How PayPal Was Able to Support a Billion Transactions per Day With Only 8 Virtual Machines

#30: Learn More - Awesome PayPal Engineering (4 minutes)

Dec 26, 2023

Error

Get my system design playbook for FREE on newsletter signup:

This post outlines how PayPal scaled to a billion daily transactions with only 8 virtual machines. If you want to learn more, scroll to the bottom and find the references.

Share this post & I'll send you some rewards for the referrals.

December 1998 - California, United States.

A team of engineers creates security software for hand-held devices.

Yet their business model failed.

So they pivoted to create an online payment service and called it PayPal.

As the number of users grew in the early days, they bought newer hardware to scale.

But transistors in an integrated circuit (IC) stopped doubling every other year.

Put another way, the single-threaded performance gain of Moore’s law slowed down. So buying newer hardware couldn't solve their scalability problems.

Yet their growth rate was explosive.

And hit 1 million transactions a day in the next 2 years.

So they scaled out by running services in more than 1000 virtual machines.

Although they solved the scalability issue, it created new problems.

Here are some of them:

1. Network Infrastructure

A request took more network hops to finish and it worsened latency.

Also it became expensive to maintain the network infrastructure.

2. Maintainance Costs

Adding more servers increased their infrastructure complexity.

Besides the service deployment across every machine took more time.

And setting up autoscaling needed extra effort.

Also infrastructure management like monitoring became difficult.

3. Resource Usage

They didn’t fully use the CPU of each server.

Put another way, the server throughput was low.

It resulted in resource wastage and extra costs.

Actor Model

The code doesn’t take full advantage of the hardware unless it’s run concurrently.

Also they wanted to keep it simple and scalable.

So they moved to the actor model based on the Akka framework (Java).

It allowed them to support a billion daily transactions with only 8 virtual machines.

The actor model is a conceptual concurrent computation model.

And an actor is the fundamental unit of computation.

Here’s how the actor model offers extreme scalability:

1. Resource Usage

An actor is an extremely lightweight object. It takes fewer resources than threads.

So it’s easy to create millions of actors if necessary.

The actor does an action when a message is received.

Yet the actor is decoupled from the source of the message.

Threads Assigned to Actors With a Message to Process — Actors with Messages Are Assigned a Thread

A thread gets assigned to an actor when it must process a message.

While the thread is released after the message is processed and gets assigned to another actor.

The number of threads is proportional to the number of CPU cores.

Yet a small number of threads can handle a large number of concurrent actors.

Because a thread gets assigned to an actor only during its runtime.

2. State Information

Actors don’t share memory and are isolated from each other.

Put another way, the state of an actor is private.

They communicate with each other through messages.

Messages are simple and immutable data structures that get sent over the network.

Actors Sending Messages — Actors Communicating Through Messages

Each actor has a mailbox. It’s like a message queue.

Actors store messages in the mailbox until they get processed in a First-in First-out (FIFO) order.

Also actors allow the system to avoid extra network calls to a distributed cache or a database.

Because they store the local state in the application server.

App Server Handling Requests Without Querying Database — Application Server Handling Requests Without Querying Database

Put another way, a stateful application server improves performance. Because it caches data locally.

Besides PayPal uses consistent hashing to route a customer to the same server.

3. Concurrency

Many actors can run at the same time but each actor process messages sequentially.

An Actor Processing Messages in Sequential Order — An Actor Process Messages in Sequential Order

Put another way, an actor can process only a single message at a time.

So they need 3 actors to process 3 messages in parallel.

Also actors work asynchronously. In other words, they don’t wait for another actor's response.

So the actor model makes concurrency easier.

Besides PayPal uses the functional programming style of Akka for scalability.

It prevents side effects and makes testing easier.

Also they use pluggable code pieces with functional programming for easy scalability.

The actors could run locally or remotely on another machine.

Yet it’s transparent to the system.

4. Fault Tolerance

An actor can create more actors and also supervise them.

The supervisor actor restarts the supervised actor if it fails. Also the message can be routed to another actor.

Besides errors propagate to the supervisor actor.

So graceful error handling can be done without code clutter.

The actor model is not a silver bullet to scalability.

It introduces a learning curve for the developers.

Also extra care should be taken to prevent race conditions and deadlocks.

The actor model allowed PayPal to handle extreme scale with fewer resources.

Consider subscribing to get simplified case studies delivered straight to your inbox:

Author NK; System design case studies — **Follow me on LinkedIn | YouTube | Threads | Twitter | Instagram**

Thank you for supporting this newsletter. Consider sharing this post with your friends and get rewards. Y’all are the best.

5 Reasons Why Zoom Was Able to Support 300 Million Video Calls a Day

December 11, 2023

Read full story

Virtual Waiting Room Architecture That Handles High-Demand Ticket Sales at SeatGeek

December 19, 2023

Read full story

References

https://medium.com/paypal-tech/squbs-a-new-reactive-way-for-paypal-to-build-applications-127126bf684b
https://en.wikipedia.org/wiki/PayPal
https://akka.io/
https://finematics.com/actor-model-explained/
https://www.brianstorti.com/the-actor-model/