How Halo Scaled to 11.6 Million Users Using the Saga Design Pattern 🎮

#51: Break Into Saga Design Pattern (4 Minutes)

Jul 04, 2024

∙ Paid

Get my system design playbook for FREE on newsletter signup:

This post outlines the Saga design pattern. You will find references at the bottom of this page if you want to go deeper.

Refer just 3 people & I'll send you some rewards as a thank you.

Saga design pattern is often used to achieve data consistency and reliability in distributed systems. Microsoft now owns the Halo game series.

Once upon a time, a game development company named Bungie made a strategy game.

Yet they didn’t have success with it.

So they pivoted to create a shooting game and called it Halo.

They used a single SQL database to store the entire game data.

But their growth rate was incredible.

And it became difficult to store all data in a single database.

So they set up a NoSQL database and partitioned it.

While each game can have up to 32 players.

And each player’s data gets stored in a different database partition.

Storing Game Data Across Database Partitions — Storing Data Across Database Partitions

Although it temporarily solved their scalability issue, it created new problems.

Here are some of them:

1. Atomicity:

Each player in the same game must see the correct game points. That means atomic writes.

Atomicity is the idea that the writes to every partition succeed or no writes happen at all.

Yet there’s a risk of database partition failure.

This means a failed partition will have wrong data due to missing writes.

So it’s hard to achieve atomicity with a partitioned database.

2. Consistency:

Consistency means changing data from one valid state to another valid state.

Yet there’s a risk of network latency and network failures.

That means some partitions will contain outdated data.

So it’s hard to achieve consistency with a partitioned database.

Saga Design Pattern

They wanted a simple & scalable failure management pattern.

So they set up Saga.

Here’s how Saga works:

1. Divide & Conquer:

It splits a transaction into sub-transactions.

And assigns a separate sub-transaction to each database partition.

That means it still looks like a single transaction.

Saga Applying Compensating Transactions on a Failed Sub-Transaction — Saga Applying Compensating Transactions after a Failed Sub-Transaction

Besides a revert action is available for each sub-transaction - compensating transaction. It’s like a correction and not an undo. For example, canceling a hotel booking instead of removing it.

The compensating transactions get executed only if a sub-transaction fails.
The compensating transactions must be idempotent, so retries are possible without side effects.