How Razorpay Scaled to Handle Flash Sales at 1500 Requests per Second
#46: A Case Study on Payment Gateway Scalability (5 min read)
2020 - India.
The IPL, the most famous cricket league in the world, is about to start. And more than 23 million raving cricket fans in India will stream it.
Minutes before each game, companies sell food at a discount via flash sales. And they accept online payments via Razorpay, a shiny payment gateway service.
Flash sales create a traffic spike with transactions reaching 1500 requests per second.
It’s possible to serve this traffic. But scaling the infrastructure quickly enough to handle it is hard.
This post outlines how Razorpay scales to handle flash sales. If you want to learn more, scroll to the bottom and find the references.
Consider sharing this post with someone who wants to study system design.
Note: This post is based on my research and may differ from real-world implementation.
Payment Gateway Architecture
Here are their scalability techniques for flash sales:
1. Rate Limit the Traffic
They rate limit the traffic to prevent server overload.
And use an Nginx proxy server as the rate limiter. It’s deployed as a sidecar and runs a dedicated cache for rate limiting. Imagine the sidecar pattern as extending a service by attaching an extra container to it.
They also use the fixed window algorithm for efficient rate limiting. It keeps a single atomic counter per key with an expiry time (TTL).
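The fixed window idea fits in a few lines of Python. This is an illustrative sketch, not Razorpay's implementation; their counters live in the Nginx sidecar's cache:

```python
import time

class FixedWindowRateLimiter:
    """One counter per key, reset every window (illustrative sketch)."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.counters = {}  # key -> (window_start, count)

    def allow(self, key, now=None):
        now = time.time() if now is None else now
        window_start = now - (now % self.window)  # align to window boundary
        start, count = self.counters.get(key, (window_start, 0))
        if start != window_start:  # previous window expired (the TTL)
            start, count = window_start, 0
        if count >= self.limit:
            return False  # over the limit for this window
        self.counters[key] = (start, count + 1)
        return True
```

A key could be a tenant ID or a client IP. The known trade-off of fixed windows is a possible burst at the window boundary, which sliding-window variants fix at higher cost.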
2. Connection Pooling
They use MySQL as the main database. And clients compete with each other for database connections during flash sales.
They run PHP on the application layer. But PHP uses a process-per-request model, so resources can't be shared between processes. Hence PHP doesn't support database connection pooling natively.
This means the application layer holds database connections while waiting for query results. So connection starvation is likely if queries get expensive.
Also database performance degrades as the number of idle connections grows.
So they use ProxySQL as a database proxy. It holds a pool of connections to the database.
And tenants connect to ProxySQL instead of MySQL directly. This caps the number of MySQL connections and avoids connection starvation.
Think of a tenant as an isolated data space for a specific user.
Also constantly opening and closing database connections is expensive. ProxySQL avoids this with persistent connections.
Besides ProxySQL caches the query results for low latency.
They deploy ProxySQL as a sidecar to keep the application layer stateless. And for high availability, they set up a fallback that connects directly to MySQL if ProxySQL fails.
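The pooling idea can be sketched in Python. This is a stand-in for the concept only; ProxySQL also adds multiplexing and query caching, and the connection factory here is hypothetical:

```python
import queue

class ConnectionPool:
    """Minimal connection pool sketch (conceptual stand-in for ProxySQL)."""

    def __init__(self, factory, size):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(factory())  # open all connections once, up front

    def acquire(self, timeout=None):
        # Blocks (or times out) when all connections are in use,
        # which caps the number of connections MySQL ever sees
        return self._pool.get(timeout=timeout)

    def release(self, conn):
        self._pool.put(conn)  # return the connection for reuse
```

The point is that connections are opened once and reused, and a fixed pool size turns unbounded client demand into bounded database load.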
3. Avoid the Thundering Herd
A thundering herd occurs when many clients query the server concurrently, as during a flash sale. That means degraded performance and downtime.
So they use these techniques to prevent the thundering herd problem:
Throttle incoming traffic
Add exponential backoff by the client
Include caching
They also use ProxySQL to throttle tenants that issue expensive database queries.
4. Autoscaling Isn’t Enough
It takes around 4 minutes for a newly provisioned server to become healthy. So they don't rely only on autoscaling to handle traffic spikes.
Instead they prewarm their infrastructure. And run baked container images to reduce the deployment time.
They do capacity planning based on estimated transactions and scale their servers horizontally. Also they scale down the infrastructure after flash sales with autoscaling.
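The capacity-planning arithmetic might look like this. Only the 1500 req/s peak comes from the post; the per-server throughput and headroom figures are made up for illustration:

```python
import math

def servers_needed(peak_rps, rps_per_server, headroom=1.3):
    """Back-of-the-envelope server count for an expected traffic peak."""
    # Scale the peak by a safety headroom, then divide by what one
    # server can handle, rounding up to whole servers
    return math.ceil(peak_rps * headroom / rps_per_server)

# e.g. a 1500 req/s flash-sale peak, ~100 req/s per server, 30% headroom
print(servers_needed(1500, 100))  # 20
```

Prewarming then means provisioning this count before the sale starts, instead of waiting ~4 minutes per server mid-spike.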
5. Smart Routing
They must forward traffic only to external bank gateways that are operational.
So they use machine-learning-based routing rules. The model considers success and failure events from past payments. And then predicts the success probability of each external gateway.
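A simplified stand-in for that routing logic: Razorpay uses an ML model, while this sketch just estimates each gateway's success probability from observed outcomes (with Laplace smoothing, an assumption of mine) and routes to the best one:

```python
class GatewayRouter:
    """Route payments to the gateway with the best observed success rate."""

    def __init__(self, gateways):
        self.stats = {g: {"success": 0, "failure": 0} for g in gateways}

    def record(self, gateway, success):
        # Feed back the outcome of each payment attempt
        self.stats[gateway]["success" if success else "failure"] += 1

    def success_probability(self, gateway):
        s = self.stats[gateway]
        # Laplace smoothing: an unseen gateway starts at 0.5, not 0
        return (s["success"] + 1) / (s["success"] + s["failure"] + 2)

    def pick(self):
        return max(self.stats, key=self.success_probability)
```

A failing gateway's estimate drops quickly, so new traffic shifts away from it without any manual intervention.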
6. Testing
They must resolve system bottlenecks for better performance.
So they do load testing using an open-source tool called k6. It checks the system's performance under an expected load. And provides information about latency and throughput.
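k6 scripts are written in JavaScript; to keep all sketches in one language, here is a toy Python loop that measures the same signals (latency percentiles and throughput) against any callable. Unlike a real load test, it is single-threaded:

```python
import statistics
import time

def load_test(request, total_requests=100):
    """Time a callable repeatedly; report throughput and latency percentiles."""
    latencies = []
    start = time.perf_counter()
    for _ in range(total_requests):
        t0 = time.perf_counter()
        request()  # the system under test
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    return {
        "throughput_rps": total_requests / elapsed,
        "p50_ms": statistics.median(latencies) * 1000,
        # quantiles(n=20) yields 19 cut points; the last is the 95th percentile
        "p95_ms": statistics.quantiles(latencies, n=20)[-1] * 1000,
    }
```

Real tools like k6 add concurrent virtual users, ramp-up schedules, and pass/fail thresholds on top of these basic measurements.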
7. Flywheel Effect
They profile the system for bottlenecks.
And put their learnings in a constant feedback loop. This helps them keep improving performance.
They run critical services like payments and orders as separate microservices. Because each service can then scale independently.
This case study shows that simple, proven techniques can solve most scalability problems.
PS - Are you a software engineer who wants to make better architecture decisions at work, but sometimes struggles to understand how systems fit together?
Don't worry. I'm working on deep-dive content to help you solve this problem. It will be available to paid subscribers of this newsletter.
But the subscription fee will be higher than the current pledge fee. So consider pledging now if you want access to deep-dive content at a lower price.
Consider subscribing to get simplified case studies delivered straight to your inbox:
Thank you for supporting this newsletter. Consider sharing this post with your friends and get rewards. Y’all are the best.