The System Design Newsletter

How YouTube Was Able to Support 2.49 Billion Users With MySQL

#48: Break Into Vitess Architecture (4 minutes)

Neo Kim
May 31, 2024

This post outlines the Vitess architecture. If you want to learn more, see the references at the bottom of the page.

Note: This post is based on my research and may differ from real-world implementation.

Once upon a time, 3 people working for PayPal decided to build a dating site.

Yet their business model failed.

So they pivoted to create a video-sharing site and called it YouTube.

They stored video titles, descriptions, and user data in MySQL.

As more users joined, they set up MySQL in a leader-follower replication topology to scale.

Leader-Follower Replication Topology in MySQL

But at the time, MySQL replication applied changes on a single thread.

So followers couldn’t keep up with the leader under a heavy write load.
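To make the lag concrete, here is a minimal sketch in Python. It assumes constant rates, and the numbers are hypothetical rather than YouTube's: the leader commits events faster than one replay thread on the follower can apply them, so the backlog grows every second.

```python
def replication_backlog(write_rate: int, apply_rate: int, seconds: int) -> int:
    """Events the follower has not applied yet after `seconds`.

    write_rate: events per second committed on the leader (assumed constant).
    apply_rate: events per second one replay thread can apply (assumed constant).
    """
    backlog = 0
    for _ in range(seconds):
        backlog += write_rate                # leader produces new binlog events
        backlog -= min(backlog, apply_rate)  # single thread replays what it can
    return backlog


if __name__ == "__main__":
    # Hypothetical rates: 1,000 writes/s on the leader, 800 replays/s on the follower.
    print(replication_backlog(write_rate=1000, apply_rate=800, seconds=60))  # 12000
```

As long as the apply rate stays below the write rate, the follower never catches up. That is why making the replay faster mattered.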

Yet their growth rate was explosive.

They hit a whopping billion users and became the second most visited site in the world.

So they added a cache and preloaded it with the events from the MySQL binary log. That made replication memory-bound and therefore faster.

Although it temporarily solved their scalability issue, there were new problems.

Here are some of them:

1. Sharding:

MySQL must be partitioned to handle storage needs.

But transactions and joins become difficult after sharding.

So the application logic has to handle them.

This means the application must work out which shards to query.

And that extra complexity increases the chance of downtime.
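A rough sketch of what application-managed sharding can look like. The class name, the modulo scheme, and the row layout are all assumptions made for illustration, not YouTube's code. A read by the sharding key touches one shard, while a question that spans users forces the application to query every shard and merge the results itself.

```python
from collections import Counter
from typing import List, Tuple


class ShardedVideos:
    """Toy model of application-managed sharding; each shard is a list of rows."""

    def __init__(self, num_shards: int):
        self.shards: List[List[Tuple[int, str]]] = [[] for _ in range(num_shards)]

    def shard_for(self, user_id: int) -> int:
        # Simple modulo on the sharding key, chosen only for illustration.
        return user_id % len(self.shards)

    def insert(self, user_id: int, title: str) -> None:
        self.shards[self.shard_for(user_id)].append((user_id, title))

    def titles_for_user(self, user_id: int) -> List[str]:
        # Single-shard read: the sharding key says exactly where to look.
        rows = self.shards[self.shard_for(user_id)]
        return [title for uid, title in rows if uid == user_id]

    def video_counts(self) -> Counter:
        # Cross-shard read: query every shard, then merge in the application.
        totals: Counter = Counter()
        for rows in self.shards:
            totals.update(uid for uid, _ in rows)
        return totals
```

Every query path in the application needs to know which of these two cases it is in, and that is the burden the post describes.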

2. Performance:

The leader-follower replication topology causes stale data to be read from followers.

So application logic must route the reads to the leader if fresh data is necessary.

And this needs extra application logic.
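The extra logic might look something like this sketch. The `Cluster` class and the `needs_fresh_data` flag are hypothetical names used only to illustrate the routing decision: writes and freshness-sensitive reads go to the leader, and everything else can go to a follower.

```python
import random
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cluster:
    leader: str
    followers: List[str] = field(default_factory=list)

    def pick_server(self, is_write: bool, needs_fresh_data: bool = False) -> str:
        """Return the server a statement should be sent to."""
        if is_write or needs_fresh_data or not self.followers:
            return self.leader                # only the leader is guaranteed current
        return random.choice(self.followers)  # a slightly stale read is acceptable
```

For example, a user who has just changed a video title should read it back from the leader, while an anonymous viewer can be served from any follower.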

3. Protection:

There’s a risk of some queries taking too long to return data.

Also, too many MySQL connections at once can be problematic and might take down the database.
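Without a shared layer in front of MySQL, each application would need its own guard against this. Here is a minimal sketch of such a guard. The `ConnectionLimiter` name and the placeholder connection are assumptions; it caps how many callers hold a connection at once and fails fast when the cap is reached.

```python
import threading
from contextlib import contextmanager


class ConnectionLimiter:
    """Allow at most `max_connections` callers to hold a connection at once."""

    def __init__(self, max_connections: int, wait_seconds: float = 0.1):
        self._slots = threading.BoundedSemaphore(max_connections)
        self._wait_seconds = wait_seconds

    @contextmanager
    def connection(self):
        # Fail fast instead of piling more load onto a struggling database.
        if not self._slots.acquire(timeout=self._wait_seconds):
            raise TimeoutError("no free database connection")
        try:
            yield "connection"  # placeholder for a real MySQL connection object
        finally:
            self._slots.release()
```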

They wanted an abstraction layer on top of MySQL for simplicity and scalability.

So they created Vitess.

Here’s how Vitess offers extreme scalability:

1. Interacting with Database:

They installed a sidecar server in front of each MySQL instance and called it VTTablet.

VTTablet Running as a Sidecar Server

It let them:

  • Control MySQL server and manage database backups

  • Rewrite expensive queries by adding a LIMIT clause

  • Cache frequently accessed data to prevent the thundering herd problem
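As an illustration of the query-rewriting idea, here is a deliberately naive sketch. The row cap is a made-up number, and a real implementation would parse the SQL rather than use a regular expression, which can be fooled by a `limit` inside a string literal or subquery.

```python
import re

DEFAULT_ROW_LIMIT = 10_000  # hypothetical cap, not a documented Vitess default

_LIMIT_RE = re.compile(r"\blimit\s+\d+", re.IGNORECASE)


def add_row_limit(sql: str, max_rows: int = DEFAULT_ROW_LIMIT) -> str:
    """Append a LIMIT to SELECT statements that do not already have one."""
    statement = sql.strip().rstrip(";")
    if not statement.lower().startswith("select"):
        return statement  # only reads are capped
    if _LIMIT_RE.search(statement):
        return statement  # respect an explicit limit
    return f"{statement} LIMIT {max_rows}"
```

Capping the result size this way means one careless `SELECT *` cannot pull an entire table through the sidecar.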

2. Routing SQL Queries:

They set up a stateless proxy server to route the queries and called it VTGate.

VTGate Routing Queries to the Specific Shard

It let them:

  • Find the correct VTTablet to route a query based on the schema and sharding scheme

  • Keep the number of MySQL connections low via connection pooling

  • Speak MySQL protocol with the application layer

  • Act like a monolithic MySQL server for simplicity

  • Limit the number of transactions at a time for performance
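The routing step can be sketched as follows. This assumes a hypothetical scheme in which each shard owns a range of hashed key values. The SHA-256 hash, the two ranges, and the tablet names are illustrative choices, not what Vitess actually uses.

```python
import hashlib

# Hypothetical sharding scheme: each shard owns a half-open range of the
# first byte of the hashed key and maps to the VTTablet that serves it.
SHARD_RANGES = [
    (0x00, 0x80, "vttablet-shard-a"),
    (0x80, 0x100, "vttablet-shard-b"),
]


def key_byte(sharding_key: str) -> int:
    """Hash the sharding key and keep the first byte (0..255)."""
    return hashlib.sha256(sharding_key.encode("utf-8")).digest()[0]


def route(sharding_key: str) -> str:
    """Return the tablet whose range contains the hashed key."""
    b = key_byte(sharding_key)
    for low, high, tablet in SHARD_RANGES:
        if low <= b < high:
            return tablet
    raise LookupError(f"no shard covers key byte {b:#04x}")
```

Because the proxy holds no per-request state beyond this lookup table, any VTGate instance can route any query, and the application never needs to know how many shards exist.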

Scaling With Many VTGate Servers

Besides, they run many VTGate servers to scale out.

3. State Information:

They set up a distributed key-value database to store information about schemas, sharding schemes, and roles.

Key-Value Database Storing Meta Information

It also tracks the relationships between databases, such as which one is the leader and which are followers.

They use ZooKeeper to implement the key-value database.

Besides, they cache this data on VTGate for better performance.
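The post doesn't say how the cached copy is kept fresh, so the sketch below assumes a simple time-to-live. `TopologyCache`, `leader_for`, and the `fetch` callback are hypothetical names; the point is only that most lookups are answered locally instead of hitting the key-value database.

```python
import time
from typing import Callable, Dict, Tuple


class TopologyCache:
    """Cache topology lookups for `ttl_seconds` before asking the store again."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch      # e.g. a read against the key-value database
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so the cache can be tested
        self._entries: Dict[str, Tuple[float, str]] = {}

    def leader_for(self, shard: str) -> str:
        now = self._clock()
        cached = self._entries.get(shard)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]     # fresh enough: skip the remote lookup
        value = self._fetch(shard)
        self._entries[shard] = (now, value)
        return value
```

The trade-off is the usual one for caches: a longer TTL means fewer reads against the topology store but a longer window in which a proxy may route to an old leader.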

Updating Key-Value Database

They run an HTTP server to keep the key-value database updated and called it VTctld.

It gets the entire list of servers and their relationships and then updates the key-value database.

TL;DR:

High-Level Architecture of Vitess
  • VTGate: proxy server to route queries

  • Key-Value Database: configuration server for topology management

  • VTTablet: sidecar server running on each MySQL


They wrote Vitess in Go and open-sourced it.

It also supports MariaDB.

And YouTube was able to serve 2.49 billion users with the combination of Vitess and MySQL.

This case study shows that MySQL, with a layer like Vitess on top, can handle internet-scale traffic.



References

  • Vitess Official Site

  • Vitess Architecture Official Docs

  • Scaling YouTube's Backend: The Vitess Trade-offs - @Scale 2014 - Data

  • Vitess: A Distributed Scalable Database Architecture

  • What Is Vitess?

  • Vitess on GitHub

  • Scalability at YouTube

  • Vitess Supported Databases

  • Do Mysql slaves run multiple threads to read the Relay log to sync up the Master's operation

  • 11 Reasons Why YouTube Was Able to Support 100 Million Video Views a Day With Only 9 Engineers

  • Most Visited Websites In The World (May 2024)
