The System Design Newsletter

How PayPal Was Able to Support a Billion Transactions per Day With Only 8 Virtual Machines

#30: Learn More - Awesome PayPal Engineering (4 minutes)

Neo Kim
Dec 26, 2023

This post outlines how PayPal scaled to a billion daily transactions with only 8 virtual machines. If you want to learn more, scroll to the bottom and find the references.

December 1998 - California, United States.

A team of engineers created security software for handheld devices.

Yet their business model failed.

So they pivoted to create an online payment service and called it PayPal.

As the number of users grew in the early days, they bought newer hardware to scale.

Moore’s Law

But the doubling of transistors in an integrated circuit (IC) every other year stopped making each processor core much faster.

Put another way, the single-threaded performance gain of Moore’s law slowed down. So buying newer hardware couldn't solve their scalability problems.

Yet their growth rate was explosive.

They hit 1 million transactions a day within the next 2 years.

So they scaled out by running services on more than 1000 virtual machines.

Horizontal Scaling

Although they solved the scalability issue, it created new problems.

Here are some of them:

1. Network Infrastructure

A request took more network hops to finish, which worsened latency.

Also it became expensive to maintain the network infrastructure.

2. Maintenance Costs

Adding more servers increased their infrastructure complexity.

Besides, deploying a service across every machine took more time.

And setting up autoscaling needed extra effort.

Also infrastructure management like monitoring became difficult.

3. Resource Usage

They didn’t fully use the CPU of each server.

Put another way, the server throughput was low.

It resulted in resource wastage and extra costs.


Actor Model

The code doesn’t take full advantage of the hardware unless it’s run concurrently.

Also they wanted to keep it simple and scalable.

So they moved to the actor model based on the Akka framework (Java).

It allowed them to support a billion daily transactions with only 8 virtual machines.
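
For a sense of scale, the headline numbers can be turned into an average rate. This is only a back-of-the-envelope sketch: it assumes the load is spread evenly over the day and across the 8 machines, which real payment traffic is not. The function name is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_rate_per_vm(transactions_per_day: int, vm_count: int) -> float:
    """Average transactions per second per VM, assuming perfectly even load."""
    return transactions_per_day / SECONDS_PER_DAY / vm_count


total_per_second = 1_000_000_000 / SECONDS_PER_DAY
per_vm = average_rate_per_vm(1_000_000_000, 8)

print(f"{total_per_second:,.0f} transactions/second in total")  # 11,574
print(f"{per_vm:,.0f} transactions/second per VM")              # 1,447
```

Peak traffic would be higher than these averages, so treat them as a lower bound on what each machine has to handle.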

The actor model is a conceptual model of concurrent computation.

And an actor is the fundamental unit of computation.

Here’s how the actor model offers extreme scalability:

1. Resource Usage

An actor is an extremely lightweight object. It takes fewer resources than a thread.

So it’s easy to create millions of actors if necessary.

The actor does an action when a message is received.

Yet the actor is decoupled from the source of the message.

Actors with Messages Are Assigned a Thread

A thread gets assigned to an actor when it must process a message.

The thread is released after the message is processed and can then be assigned to another actor.

The number of threads is proportional to the number of CPU cores.

Yet a small number of threads can handle a large number of concurrent actors.

Because a thread is assigned to an actor only while that actor processes a message.
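
The thread assignment described above can be sketched in plain Python. This is a minimal illustration and not how Akka is implemented: the `Actor` class, its `tell` method, and the `worker_threads` set are names made up for this example. A fixed pool sized to the CPU core count serves 10,000 actors, and an actor borrows a thread only while its mailbox has messages.

```python
import os
import threading
from collections import deque
from concurrent.futures import ThreadPoolExecutor

worker_threads = set()  # names of the pool threads that ever ran an actor


class Actor:
    def __init__(self, pool):
        self._pool = pool
        self._mailbox = deque()
        self._lock = threading.Lock()
        self._scheduled = False  # True while a pool thread is assigned to this actor
        self.handled = 0

    def tell(self, message):
        """Enqueue a message; borrow a thread only if none is assigned yet."""
        with self._lock:
            self._mailbox.append(message)
            if self._scheduled:
                return
            self._scheduled = True
        self._pool.submit(self._run)

    def _run(self):
        """Runs on a pool thread: drain the mailbox, then give the thread back."""
        while True:
            with self._lock:
                if not self._mailbox:
                    self._scheduled = False
                    return
                message = self._mailbox.popleft()
            self.receive(message)

    def receive(self, message):
        worker_threads.add(threading.current_thread().name)
        self.handled += 1


thread_count = os.cpu_count() or 4  # threads proportional to CPU cores
with ThreadPoolExecutor(max_workers=thread_count) as pool:
    actors = [Actor(pool) for _ in range(10_000)]
    for actor in actors:
        actor.tell("ping")
        actor.tell("pong")

total_handled = sum(actor.handled for actor in actors)
print(total_handled, "messages handled by", len(worker_threads), "threads")
```

The number of actors is 10,000, yet the number of threads never exceeds `thread_count`.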

2. State Information

Actors don’t share memory and are isolated from each other.

Put another way, the state of an actor is private.

They communicate with each other through messages.

Messages are simple and immutable data structures. They get passed in memory on the same machine or sent over the network to another one.

Actors Communicating Through Messages

Each actor has a mailbox. It’s like a message queue.

Messages wait in the mailbox until the actor processes them in First-in First-out (FIFO) order.
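
Here is a minimal single-threaded sketch of private state, immutable messages, and a FIFO mailbox. The names `Deposit`, `AccountActor`, `tell`, and `drain` are hypothetical and chosen only for this example; a real actor framework drains the mailbox for you.

```python
from collections import deque
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)  # frozen makes the message immutable
class Deposit:
    amount_cents: int


class AccountActor:
    def __init__(self):
        self._balance_cents = 0   # private state, changed only by this actor
        self._mailbox = deque()   # messages wait here in arrival order
        self.balance_log = []     # balance after each processed message

    def tell(self, message):
        """Other actors only ever put messages in the mailbox."""
        self._mailbox.append(message)

    def drain(self):
        """Process waiting messages one at a time, oldest first (FIFO)."""
        while self._mailbox:
            message = self._mailbox.popleft()
            self._balance_cents += message.amount_cents
            self.balance_log.append(self._balance_cents)


account = AccountActor()
for amount in (100, 250, 50):
    account.tell(Deposit(amount))
account.drain()
print(account.balance_log)  # [100, 350, 400]

try:
    Deposit(100).amount_cents = 999  # messages cannot be modified after creation
except FrozenInstanceError:
    print("messages are immutable")
```

The running balance shows that the deposits were applied in the order they arrived.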

Also actors allow the system to avoid extra network calls to a distributed cache or a database.

Because they store the local state in the application server.

Application Server Handling Requests Without Querying Database

Put another way, a stateful application server improves performance because it caches data locally.

Besides, PayPal uses consistent hashing to route a customer to the same server, so that customer's state stays on one machine.
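
The routing idea can be sketched with a small hash ring. This is a generic consistent hashing example and not PayPal's actual routing code: the server names, the `ConsistentHashRing` class, and the choice of 100 virtual nodes per server are all assumptions made for illustration.

```python
import bisect
import hashlib


class ConsistentHashRing:
    def __init__(self, servers, vnodes=100):
        # Each server is placed on the ring many times to spread the load evenly.
        self._ring = sorted(
            (self._hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(key: str) -> int:
        digest = hashlib.sha256(key.encode()).digest()
        return int.from_bytes(digest[:8], "big")

    def server_for(self, customer_id: str) -> str:
        """Walk clockwise from the customer's hash to the next server point."""
        index = bisect.bisect_right(self._points, self._hash(customer_id))
        return self._ring[index % len(self._ring)][1]


servers = [f"vm-{i}" for i in range(8)]
ring = ConsistentHashRing(servers)
customers = [f"customer-{n}" for n in range(1000)]

# The same customer always lands on the same server.
print(ring.server_for("customer-42") == ring.server_for("customer-42"))  # True

# If one server leaves, only the customers it owned have to move.
smaller_ring = ConsistentHashRing([s for s in servers if s != "vm-3"])
moved = [c for c in customers if ring.server_for(c) != smaller_ring.server_for(c)]
print(all(ring.server_for(c) == "vm-3" for c in moved))  # True
```

That second property is why consistent hashing suits stateful servers: most customers keep their locally cached state when the set of servers changes.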

3. Concurrency

Many actors can run at the same time but each actor processes messages sequentially.

An Actor Processing Messages in Sequential Order

Put another way, an actor can process only a single message at a time.

So the system needs 3 actors to process 3 messages in parallel.

Also actors work asynchronously. In other words, they don’t wait for another actor's response.

So the actor model makes concurrency easier.
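
A small asyncio sketch can show both rules: one actor never handles two messages at once, while several actors overlap. The `Actor` class and the `peak_parallelism` helper are hypothetical names for this example. Note that asyncio interleaves work on a single thread, so "in parallel" here means overlapping in time, not running on separate cores.

```python
import asyncio


class Actor:
    def __init__(self, stats):
        self.mailbox = asyncio.Queue()
        self.stats = stats      # shared counters, used only to measure overlap
        self.processed = []

    async def run(self):
        while True:
            message = await self.mailbox.get()
            if message is None:  # stop signal
                break
            self.stats["in_flight"] += 1
            self.stats["peak"] = max(self.stats["peak"], self.stats["in_flight"])
            await asyncio.sleep(0.01)  # simulated work
            self.processed.append(message)
            self.stats["in_flight"] -= 1


async def peak_parallelism(actor_count, messages):
    stats = {"in_flight": 0, "peak": 0}
    actors = [Actor(stats) for _ in range(actor_count)]
    tasks = [asyncio.create_task(actor.run()) for actor in actors]
    for i, message in enumerate(messages):
        # Fire and forget: the sender does not wait for a response.
        actors[i % actor_count].mailbox.put_nowait(message)
    for actor in actors:
        actor.mailbox.put_nowait(None)
    await asyncio.gather(*tasks)
    return stats["peak"], [actor.processed for actor in actors]


one_peak, one_order = asyncio.run(peak_parallelism(1, ["m1", "m2", "m3"]))
three_peak, _ = asyncio.run(peak_parallelism(3, ["m1", "m2", "m3"]))
print(one_peak, one_order)  # 1 [['m1', 'm2', 'm3']]
print(three_peak)           # 3
```

With 1 actor, at most 1 message is in flight at any moment. With 3 actors, all 3 messages are in flight together.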

Besides, PayPal uses the functional programming style of Akka for scalability.

It prevents side effects and makes testing easier.

Also they use pluggable code pieces with functional programming for easy scalability.

The actors could run locally or remotely on another machine.

Yet this is transparent to the sender. It sends a message the same way in both cases.

4. Fault Tolerance

An actor can create more actors and also supervise them.

Fault Tolerance in Actor Model

The supervisor actor restarts the supervised actor if it fails. Also the message can be routed to another actor.

Besides, errors propagate to the supervisor actor.

So graceful error handling can be done without code clutter.
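
The supervision idea can be sketched as follows. This is a simplified, synchronous illustration and not Akka's supervision API: `Worker`, `Supervisor`, and the "restart on any error" policy are assumptions made for this example.

```python
class Worker:
    def __init__(self, name):
        self.name = name
        self.processed = []  # private state, lost when the worker is replaced

    def receive(self, message):
        if message == "bad":
            raise ValueError(f"{self.name} cannot handle {message!r}")
        self.processed.append(message)


class Supervisor:
    def __init__(self, worker_factory):
        self._factory = worker_factory
        self.restarts = 0
        self.failures = []  # errors that propagated up from the worker
        self.worker = self._spawn()

    def _spawn(self):
        return self._factory(f"worker-{self.restarts}")

    def tell(self, message):
        try:
            self.worker.receive(message)
        except Exception as error:
            # The worker has no error handling of its own.
            # The failure reaches the supervisor, which restarts the worker.
            self.failures.append(error)
            self.restarts += 1
            self.worker = self._spawn()


supervisor = Supervisor(Worker)
for message in ["a", "bad", "b"]:
    supervisor.tell(message)

print(supervisor.restarts)          # 1
print(supervisor.worker.name)       # worker-1
print(supervisor.worker.processed)  # ['b']
```

The worker code stays free of try/except blocks because the recovery policy lives in one place. In this sketch the replacement worker starts with fresh state, so the earlier message "a" is no longer in its list.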


The actor model is not a silver bullet for scalability.

It introduces a learning curve for the developers.

Also extra care should be taken to prevent race conditions and deadlocks.

The actor model allowed PayPal to handle extreme scale with fewer resources.



References

  • https://medium.com/paypal-tech/squbs-a-new-reactive-way-for-paypal-to-build-applications-127126bf684b

  • https://en.wikipedia.org/wiki/PayPal

  • https://akka.io/

  • https://finematics.com/actor-model-explained/

  • https://www.brianstorti.com/the-actor-model/
