The System Design Newsletter

11 Reasons Why YouTube Was Able to Support 100 Million Video Views a Day With Only 9 Engineers

#5: Read Now - YouTube Top 11 Scalability Techniques (6 minutes)

Neo Kim's avatar
Neo Kim
Sep 16, 2023



This post outlines YouTube's scalability techniques from its early days. If you want to learn more, scroll to the bottom and find the references.


February 2005 - California, United States.

3 early employees from PayPal wanted to build a platform to share videos. They co-founded YouTube in their garage.

Yet they had limited financial resources. So they funded YouTube through credit card debt and borrowed infrastructure. These financial limitations forced them to create innovative scalability techniques.

Within the next year, they hit 100 million video views a day. And they did it with only 9 engineers.



Here are the 11 YouTube scalability techniques:

1. Flywheel Effect

They took a scientific approach to scalability: collect and analyze system data.

Scalability Loop

Their workflow was a constant loop: identify and fix bottlenecks.

This approach avoided the need for high-end hardware and reduced hardware costs.

2. Tech Stack

They kept their tech stack simple and used proven technologies.

YouTube Tech Stack

MySQL stored the metadata: video titles, tags, descriptions, and user data. They chose it because issues in MySQL were easy to find and fix.

The Lighttpd web server served the videos.

SUSE Linux was the operating system. They used Linux tools to inspect system behavior: strace, ssh, rsync, vmstat, and tcpdump.

Python ran on the application server because it offered many reusable libraries and they didn’t want to reinvent the wheel. In other words, Python allowed rapid and flexible development. And based on their measurements, Python was never a bottleneck.

Yet they used a Python-to-C compiler and C-language extensions to run CPU-intensive tasks.
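Their measure-first approach is easy to reproduce. Here is a minimal sketch using Python’s built-in profiler - the render_page handler is hypothetical, not YouTube’s code:

```python
import cProfile
import pstats


def render_page(video_ids):
    """Hypothetical request handler: build an HTML snippet from video ids."""
    return "".join(f"<li>video {vid}</li>" for vid in video_ids)


# Profile one request and rank functions by cumulative time,
# so optimization effort goes only to measured bottlenecks.
profiler = cProfile.Profile()
profiler.enable()
render_page(range(1000))
profiler.disable()
pstats.Stats(profiler).sort_stats("cumulative").print_stats(3)
```

If the profile shows a hot spot, that function is a candidate for a C extension; everything else stays in Python.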

3. Keep It Simple

They considered software architecture to be the root of scalability. They didn’t chase buzzwords to scale but kept the architecture simple - making code reviews easier. It also allowed them to rearchitect fast to meet changing needs. For example, they pivoted from a dating site to a video-sharing site.

They kept the network path simple because network appliances have scalability limitations.

Hardware Costs

Also they used commodity hardware. It allowed them to reduce power consumption and maintenance fees - and keep costs low.

Besides, they kept the scale-aware code abstracted away from the application developers.

4. Choose Your Battles

They outsourced their problems because they wanted to focus on the important things. They didn’t have the time or resources to build infrastructure to serve popular videos. So they put the popular videos on a third-party CDN. The benefits:

  • Low latency due to fewer network hops from the user

  • High performance because it served videos from memory

  • High availability because of automatic replication

They served less popular videos from a colocated data center. They used software RAID to improve performance through parallel multi-disk access. Also they tweaked their servers to prevent cache thrashing.

They kept their infrastructure in a colocated data center for 2 reasons: to tweak servers with ease to meet their needs, and to negotiate their contracts.
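The split between the CDN and the colocated data center can be sketched as a simple popularity rule. The threshold and names below are illustrative, not YouTube’s actual logic:

```python
# Popular videos go to the third-party CDN; the long tail is
# served from the colocated data center.
CDN_THRESHOLD = 10_000  # daily views; an illustrative cutoff


def pick_origin(daily_views: int) -> str:
    if daily_views >= CDN_THRESHOLD:
        return "cdn"   # fewer network hops, served from memory, replicated
    return "colo"      # tuned servers with software RAID
```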

Outsourcing Problems to Free Up Resources

Each video had 4 thumbnails. So they faced problems in serving many small objects: lots of disk seeks and filesystem limits. So they put the thumbnails in BigTable, a distributed data store with many benefits:

  • Avoids small file problems by clustering files

  • Improved performance

  • Low latency with a multi-level cache

  • Easy to provision

Also they faked data to avoid expensive transactions. For example, they showed an approximate video view count and updated the real counter asynchronously. A popular technique today to approximate correctness is the Bloom filter. It’s a probabilistic data structure.
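A Bloom filter fits in a few lines of Python. This is a minimal sketch - the bit-array size and hash count are arbitrary; production filters size them from the expected item count and target false-positive rate:

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits: int = 1024, hashes: int = 3):
        self.size = size_bits
        self.hashes = hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item: str):
        # Derive k independent bit positions from salted SHA-256 digests.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

It may report false positives but never false negatives, so it suits checks like "have we probably seen this video id before?".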

5. Pillars of Scalability

They relied on 3 pillars of scalability: statelessness, replication, and partitioning.

3 Pillars of Scalability

They kept their web servers stateless and scaled them out via replication.

They replicated the database server for read scalability and high availability. And load balanced the traffic among the replicas. But this approach caused problems: replication lag and limited write scalability.
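Read/write splitting can be sketched as routing by statement type - a primary for writes, round-robin over replicas for reads. The names are illustrative:

```python
import itertools

REPLICAS = ["replica-1", "replica-2", "replica-3"]
_reads = itertools.cycle(REPLICAS)


def route(query: str) -> str:
    # Writes must hit the primary; reads can go to any replica,
    # accepting that a replica may lag slightly behind the primary.
    is_write = query.lstrip().upper().startswith(("INSERT", "UPDATE", "DELETE"))
    return "primary" if is_write else next(_reads)
```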

Replication vs Partitioning

So they partitioned the database for improved write scalability, cache locality, and performance. It also reduced their hardware costs by 30%.

Besides, they studied data access patterns to determine the partition key. For example, they studied popular queries, joins, and transactional consistency. And they chose the user as the partition key.
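Partitioning by user can be sketched with a stable hash so that one user’s data always lands on the same shard. The shard count is illustrative:

```python
import hashlib

NUM_SHARDS = 8  # illustrative


def shard_for_user(user_id: str) -> int:
    # A stable (non-randomized) hash keeps per-user queries and joins
    # local to a single partition across processes and restarts.
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS
```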

6. Solid Engineering Team

A knowledgeable team is an important asset to scalability.

Cross-disciplinary Team

They kept the team size small for improved communication - 9 engineers. And the team had strong cross-disciplinary skills.

7. Don’t Repeat Yourself

They used caching to avoid repeating expensive operations. It allowed them to scale reads.

Multi-level Cache to Scale

Also they implemented caching at many levels, which reduced latency.
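Multi-level caching can be sketched as a tiny in-process cache in front of a shared one. The shared dict below stands in for something like memcached; all names and sizes are illustrative:

```python
class TwoLevelCache:
    """Sketch of multi-level caching: a small per-process cache (L1)
    in front of a larger shared cache (L2)."""

    def __init__(self, shared: dict, local_capacity: int = 128):
        self.shared = shared          # stands in for e.g. memcached
        self.local = {}               # per-process hot cache
        self.local_capacity = local_capacity

    def get(self, key, loader):
        if key in self.local:                 # L1 hit: no network hop
            return self.local[key]
        if key in self.shared:                # L2 hit: one network hop
            value = self.shared[key]
        else:                                 # miss: expensive origin fetch
            value = loader(key)
            self.shared[key] = value
        if len(self.local) >= self.local_capacity:
            self.local.pop(next(iter(self.local)))  # crude FIFO eviction
        self.local[key] = value
        return value
```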

8. Rank Your Stuff

Rank Important Traffic; 80-20 Rule

They prioritized video-watch traffic over everything else. So they kept a dedicated cluster of resources for video-watch traffic. It provided high availability.

9. Prevent the Thundering Herd

The thundering herd problem occurs when many clients query a server concurrently. It degrades performance.

The Thundering Herd Problem

So they added jitter to prevent the thundering herd problem. For example, they added jitter to the cache expiry of popular videos.
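Adding jitter is a one-line change. A sketch, with an illustrative base TTL:

```python
import random

BASE_TTL = 300  # seconds; illustrative


def ttl_with_jitter(base: int = BASE_TTL, spread: float = 0.1) -> float:
    # Spread expirations over +/-10% so cache entries for a popular
    # video don't all expire at once and stampede the backend.
    return base * (1 + random.uniform(-spread, spread))
```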

10. Play the Long Game

They focused on the macro level: algorithms and scalability. They did quick hacks to buy time to build long-term solutions. For example, they stubbed out a bad API in Python to prevent short-term problems.

Take Risks

They tolerated imperfection in their components. When they hit a bottleneck, they either rewrote the component or got rid of it.

They traded off efficiency for scalability. For example:

  • They chose Python over C

  • They kept clear boundaries between components to scale out. And tolerated the resulting latency

  • They optimized the software to be fast enough. But didn’t obsess over machine efficiency

  • They served video from a server location based on bandwidth availability. And not based on latency

11. Adaptive Evolution

They tweaked the system to meet their needs. Examples:

  • Critical components used RPC instead of HTTP REST. It improved performance

  • Custom BSON as the data serialization format. It offered high performance

  • Eventual consistency in certain parts of the application for scalability. For example, the read-your-writes consistency model for user comments

  • Studied Python to prevent common pitfalls. Also relied on profiling

  • Customized open-source software

  • Optimized database queries

  • Made non-critical real-time tasks asynchronous
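The last point - making non-critical work asynchronous - can be sketched with an in-process queue and a background worker. The queue stands in for a real task system; logging a view event is a hypothetical example:

```python
import queue
import threading

tasks: queue.Queue = queue.Queue()
processed = []


def worker():
    # Drain the queue in the background; None is a shutdown sentinel.
    while True:
        item = tasks.get()
        if item is None:
            break
        processed.append(item)  # stand-in for the real side effect


def handle_request(video_id: str) -> str:
    tasks.put(("log_view", video_id))  # fire and forget
    return f"served {video_id}"        # respond without waiting


t = threading.Thread(target=worker, daemon=True)
t.start()
handle_request("v42")
tasks.put(None)
t.join()
```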

Coding Principles

They didn’t waste time writing code to restrict people. Instead they adopted great engineering practices - coding conventions - to improve their code structure.


Google acquired YouTube in 2006. And it remains the market leader in video sharing with 5 billion video views a day.

According to Forbes, the founders of YouTube have a net worth of 100+ million USD.


👋 PS - Are you unhappy at your current job?

Preparing for system design interviews to get your dream job can be stressful.

Don’t worry, I’m working on content to help you pass the system design interview. I’ll make it easier: you spend only a few minutes each week to go from 0 to 1. Yet paid subscription fees will be higher than the current pledge fees.

So pledge now to get access at a lower price.

“An excellent newsletter to learn system design through practical case studies.” Franco


Consider subscribing to get simplified case studies delivered straight to your inbox:


Follow me on LinkedIn | YouTube | Threads | Twitter | Instagram | Bluesky

Thank you for supporting this newsletter. Consider sharing this post with your friends and get rewards. Y’all are the best.




My latest favorite post is the Realtime Chat Client and Server Challenge by John Crickett. His newsletter aims at upskilling software engineers through real-world coding challenges. So, consider subscribing to his weekly newsletter, Coding Challenges.


Word-of-mouth referrals like yours help this community grow - Thank you.

Feedback from a wonderful reader

Get featured in the newsletter: Write your feedback on this post. And tag me on Twitter, LinkedIn, and Substack Notes. Or, you can reply to this email with anonymous feedback.


References

  • Seattle Conference on Scalability: YouTube Scalability (2012). YouTube. [Accessed 14 Sep. 2023].

  • Scalability at YouTube (n.d.). www.youtube.com. [Accessed 14 Sep. 2023].

  • didip (2008). Super Sizing YouTube with Python. Available at: https://www.slideshare.net/didip/super-sizing-youtube-with-python.

  • Wikipedia Contributors (2019). YouTube. Available at: https://en.wikipedia.org/wiki/YouTube.

  • YouTube Architecture (n.d.). High Scalability. Available at: http://highscalability.com/youtube-architecture.

