How Halo Scaled to 11.6 Million Users Using the Saga Design Pattern 🎮
#51: Break Into Saga Design Pattern (4 Minutes)
This post outlines the Saga design pattern. You will find references at the bottom of this page if you want to go deeper.
Why it matters: The Saga design pattern is often used to achieve data consistency and reliability in distributed systems.
Between the lines: Microsoft now owns the Halo game series.
Once upon a time, a game development company named Bungie made a strategy game.
Yet they didn’t have success with it.
So they pivoted to create a shooting game and called it Halo.
They used a single SQL database to store the entire game data.
But their growth rate was incredible.
And it became difficult to store all data in a single database.
So they set up a NoSQL database and partitioned it.
Each game can have up to 32 players.
And each player’s data can get stored in a different database partition.
Although it temporarily solved their scalability issue, it created new problems.
Here are some of them:
1. Atomicity:
Each player in the same game must see the correct game points. That means atomic writes.
Atomicity is the idea that either the writes to every partition succeed or no writes happen at all.
Yet there’s a risk of database partition failure.
This means a failed partition will have wrong data due to missing writes.
So it’s hard to achieve atomicity with a partitioned database.
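Here is a minimal sketch of that problem. It is not Halo's code: the `Partition` class, the player names, and the `naive_update` helper are all made up for illustration, and each partition is just an in-memory dictionary.

```python
class Partition:
    """Hypothetical in-memory stand-in for one database partition."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.points = {}  # player id -> game points

    def write(self, player, points):
        if not self.healthy:
            raise ConnectionError(f"partition {self.name} is unavailable")
        self.points[player] = points


def naive_update(partitions, scores):
    """Write each player's points to that player's partition, one by one."""
    for player, points in scores.items():
        partitions[player].write(player, points)


p1 = Partition("p1")
p2 = Partition("p2", healthy=False)  # simulate a failed partition
partitions = {"alice": p1, "bob": p2}

try:
    naive_update(partitions, {"alice": 10, "bob": 7})
except ConnectionError as err:
    print("update failed midway:", err)

# alice's write landed but bob's did not, so the update was not atomic.
print(p1.points, p2.points)
```

The first write succeeds and the second one fails, so the two partitions now disagree about the game.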
2. Consistency:
Consistency means changing data from one valid state to another valid state.
Yet there’s a risk of network latency and network failures.
That means some partitions will contain outdated data.
So it’s hard to achieve consistency with a partitioned database.
Saga Design Pattern
They wanted a simple & scalable failure management pattern.
So they set up Saga.
Here’s how Saga works:
1. Divide & Conquer:
It splits a transaction into sub-transactions.
And assigns a separate sub-transaction to each database partition.
That means it still looks like a single transaction.
Besides, each sub-transaction has a revert action called a compensating transaction. It’s a correction and not an undo. For example, marking a hotel booking as canceled instead of deleting the booking record.
The compensating transactions get executed only if a sub-transaction fails. They revert the sub-transactions that already succeeded.
The compensating transactions must be idempotent, so retries are possible without side effects.
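The hotel booking example can be sketched like this. It is only an illustration under assumptions: the `Bookings` class, its method names, and the booking id are hypothetical, and the store is an in-memory dictionary.

```python
class Bookings:
    """Hypothetical in-memory booking store for one partition."""

    def __init__(self):
        self.status = {}  # booking id -> "booked" or "canceled"

    def book(self, booking_id):
        # Sub-transaction: the forward action.
        self.status[booking_id] = "booked"

    def cancel(self, booking_id):
        # Compensating transaction: a correction, not an undo.
        # The record stays and only its status changes.
        # Idempotent: calling it twice leaves the same state as calling it once.
        if booking_id in self.status:
            self.status[booking_id] = "canceled"


hotel = Bookings()
hotel.book("b-1")
hotel.cancel("b-1")
hotel.cancel("b-1")  # a retry is harmless
print(hotel.status)
```

Because `cancel` sets a status instead of toggling or decrementing something, a retry after a timeout cannot corrupt the data.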
2. Interacting with Database Partitions:
They set up a separate service to manage sub-transactions and called it Orchestrator.
It let them:
Control sub-transactions
Apply compensating transactions if needed
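A rough sketch of such an orchestrator follows. It is an assumption about the general shape, not Halo's implementation: `Step`, `run_saga`, `award`, `revoke`, and `unavailable` are hypothetical names, and the partitions are replaced by one in-memory dictionary.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    action: Callable[[], None]      # the sub-transaction
    compensate: Callable[[], None]  # its compensating transaction


def run_saga(steps):
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed = []
    for step in steps:
        try:
            step.action()
        except Exception:
            for done in reversed(completed):
                done.compensate()
            return False
        completed.append(step)
    return True


awards = {}  # player -> points awarded by this saga


def award(player, points):
    awards[player] = points


def revoke(player):
    awards.pop(player, None)  # idempotent: safe to call again


def unavailable():
    raise ConnectionError("partition unavailable")


steps = [
    Step("alice", lambda: award("alice", 10), lambda: revoke("alice")),
    Step("bob", unavailable, lambda: revoke("bob")),  # simulated failure
]
ok = run_saga(steps)
print(ok, awards)
```

The second step fails, so the orchestrator revokes the first award and the saga leaves no partial result behind.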
3. State Information:
It’s necessary to store each sub-transaction's state (start & end) outside the Orchestrator.
Otherwise the Orchestrator will become a single point of failure.
So they use a durable and distributed log.
It let them:
Track if a sub-transaction failed
Find compensating transactions that must be executed
Track the state of compensating transactions
Keep the orchestrator stateless
Recover from failures
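Here is one way recovery from the log could look. This is a sketch under assumptions: the log is a plain Python list instead of a durable, distributed one, and the entry format `(saga id, step, event)`, the event names, and `pending_compensations` are all hypothetical.

```python
def pending_compensations(log, saga_id):
    """Return the steps of an unfinished saga to compensate, newest first."""
    started = []
    compensated = set()
    for entry_saga, step, event in log:
        if entry_saga != saga_id:
            continue
        if event == "saga_done":
            return []  # the saga finished, nothing to revert
        if event == "start":
            started.append(step)
        elif event == "compensated":
            compensated.add(step)
    # Any started step may have written data, so it gets compensated.
    # That is safe because compensating transactions are idempotent.
    return [step for step in reversed(started) if step not in compensated]


log = [
    ("g42", "alice", "start"),
    ("g42", "alice", "end"),
    ("g42", "bob", "start"),
    ("g42", "bob", "failed"),
    ("g42", "bob", "compensated"),
    # the orchestrator crashed here
]
print(pending_compensations(log, "g42"))
```

A new orchestrator instance can read the log, see that only the "alice" step still needs its compensating transaction, and carry on. It keeps no state of its own.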
TL;DR:
Orchestrator: manages sub-transactions & interacts with log
Log: stores the state of each sub-transaction
Compensating transaction: reverts a completed sub-transaction after a failure & is idempotent so it can be retried
🔥 Use Cases:
Saga is often used to maintain data consistency in a microservices architecture. And it prevents tight coupling.
Here are some real-world use cases of the Saga design pattern:
E-commerce app: placing an order after checking inventory and payment status.
Travel booking system: guaranteeing combined reservation of flights, hotels, and rental cars.
Banking system: transferring money from one user account to another.
Saga doesn't acquire locks on remote resources and uses compensation to handle failures. Thus it scales well.
It’s a good architectural choice when new database partitions get added over time.
And Halo remains a popular game series for Xbox with millions of unique users.