The System Design Newsletter

11 Reasons Why YouTube Was Able to Support 100 Million Video Views a Day With Only 9 Engineers

#5: Read Now - YouTube Top 11 Scalability Techniques (6 minutes)

Neo Kim's avatar
Neo Kim
Sep 16, 2023



This post outlines YouTube's scalability techniques from its early days. If you want to learn more, scroll to the bottom and find the references.


February 2005 - California, United States.

3 early employees from PayPal wanted to build a platform to share videos. They co-founded YouTube in their garage.

Yet they had limited financial resources. So they funded YouTube through credit card debt and borrowed infrastructure. These financial limitations forced them to create innovative scalability techniques.

Within the next year, they hit 100 million video views a day. And they did it with only 9 engineers.



Here are the 11 YouTube scalability techniques:

1. Flywheel Effect

They took a scientific approach to scalability: collect and analyze system data.

Scalability Loop

Their workflow was a constant loop: identify and fix bottlenecks.

This approach avoided the need for high-end hardware and reduced hardware costs.

2. Tech Stack

They kept their tech stack simple and used proven technologies.

YouTube Tech Stack

MySQL stored the metadata: video titles, tags, descriptions, and user data. They chose it because issues in MySQL were easy to find and fix.

The Lighttpd web server served the videos.

SUSE Linux was the operating system. They used Linux tools to inspect system behavior: strace, ssh, rsync, vmstat, and tcpdump.

Python ran on the application server because it offered many reusable libraries and they didn’t want to reinvent the wheel. In other words, Python allowed rapid and flexible development. And based on their measurements, Python was never a bottleneck.

Yet they used a Python-to-C compiler and C-language extensions to run CPU-intensive tasks.
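Their measure-first approach is easy to reproduce. Here is a minimal sketch using Python’s built-in profiler - the render_page handler is hypothetical, not YouTube’s code:

```python
import cProfile
import pstats


def render_page(video_ids):
    """Hypothetical request handler: build an HTML snippet from video ids."""
    return "".join(f"<li>video {vid}</li>" for vid in video_ids)


# Profile one request and rank functions by cumulative time,
# so optimization effort goes only to measured bottlenecks.
profiler = cProfile.Profile()
profiler.enable()
render_page(range(1000))
profiler.disable()
pstats.Stats(profiler).sort_stats("cumulative").print_stats(3)
```

If the profile shows a hot spot, that function is a candidate for a C extension; everything else stays in Python.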

3. Keep It Simple

They considered software architecture to be the root of scalability. They didn’t chase buzzwords to scale but kept the architecture simple - making code reviews easier. It also allowed them to rearchitect fast to meet changing needs. For example, they pivoted from a dating site to a video-sharing site.

They kept the network path simple because network appliances have scalability limitations.

Hardware Costs

Also they used commodity hardware. It allowed them to reduce power consumption and maintenance fees - and keep costs low.

Besides, they kept the scale-aware code abstracted away from the application developers.

4. Choose Your Battles

They outsourced their problems because they wanted to focus on the important things. They didn’t have the time or resources to build infrastructure to serve popular videos. So they put the popular videos on a third-party CDN. The benefits:

  • Low latency due to fewer network hops from the user

  • High performance because it served videos from memory

  • High availability because of automatic replication

They served less popular videos from a colocated data center. They used software RAID to improve performance through parallel multi-disk access. Also they tweaked their servers to prevent cache thrashing.

They kept their infrastructure in a colocated data center for 2 reasons: to tweak servers with ease to meet their needs, and to negotiate their contracts.
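The split between the CDN and the colocated data center can be sketched as a simple popularity rule. The threshold and names below are illustrative, not YouTube’s actual logic:

```python
# Popular videos go to the third-party CDN; the long tail is
# served from the colocated data center.
CDN_THRESHOLD = 10_000  # daily views; an illustrative cutoff


def pick_origin(daily_views: int) -> str:
    if daily_views >= CDN_THRESHOLD:
        return "cdn"   # fewer network hops, served from memory, replicated
    return "colo"      # tuned servers with software RAID
```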

Outsourcing Problems to Free Up Resources

Each video had 4 thumbnails. So they faced problems in serving many small objects: lots of disk seeks and filesystem limits. So they put the thumbnails in BigTable, a distributed data store with many benefits:

  • Avoids small file problems by clustering files

  • Improved performance

  • Low latency with a multi-level cache

  • Easy to provision

Also they faked data to avoid expensive transactions. For example, they showed an approximate video view count and updated the real counter asynchronously. A popular technique today to approximate correctness is the Bloom filter. It’s a probabilistic data structure.
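A Bloom filter fits in a few lines of Python. This is a minimal sketch - the bit-array size and hash count are arbitrary; production filters size them from the expected item count and target false-positive rate:

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits: int = 1024, hashes: int = 3):
        self.size = size_bits
        self.hashes = hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item: str):
        # Derive k independent bit positions from salted SHA-256 digests.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

It may report false positives but never false negatives, so it suits checks like "have we probably seen this video id before?".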

5. Pillars of Scalability

They relied on 3 pillars of scalability: statelessness, replication, and partitioning.

3 Pillars of Scalability

They kept their web servers stateless and scaled them out via replication.

They replicated the database server for read scalability and high availability. And load balanced the traffic among the replicas. But this approach caused problems: replication lag and limited write scalability.
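Read/write splitting can be sketched as routing by statement type - a primary for writes, round-robin over replicas for reads. The names are illustrative:

```python
import itertools

REPLICAS = ["replica-1", "replica-2", "replica-3"]
_reads = itertools.cycle(REPLICAS)


def route(query: str) -> str:
    # Writes must hit the primary; reads can go to any replica,
    # accepting that a replica may lag slightly behind the primary.
    is_write = query.lstrip().upper().startswith(("INSERT", "UPDATE", "DELETE"))
    return "primary" if is_write else next(_reads)
```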

Replication vs Partitioning

So they partitioned the database for improved write scalability, cache locality, and performance. It also reduced their hardware costs by 30%.

Besides, they studied data access patterns to determine the partition key. For example, they studied popular queries, joins, and transactional consistency. And they chose the user as the partition key.
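Partitioning by user can be sketched with a stable hash so that one user’s data always lands on the same shard. The shard count is illustrative:

```python
import hashlib

NUM_SHARDS = 8  # illustrative


def shard_for_user(user_id: str) -> int:
    # A stable (non-randomized) hash keeps per-user queries and joins
    # local to a single partition across processes and restarts.
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS
```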

6. Solid Engineering Team

A knowledgeable team is an important asset to scalability.

Cross-disciplinary Team

They kept the team size small for improved communication - 9 engineers. And the team had strong cross-disciplinary skills.

7. Don’t Repeat Yourself

They used caching to avoid repeating expensive operations. It allowed them to scale reads.

Multi-level Cache to Scale

Also they implemented caching at many levels, which reduced latency.
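Multi-level caching can be sketched as a tiny in-process cache in front of a shared one. The shared dict below stands in for something like memcached; all names and sizes are illustrative:

```python
class TwoLevelCache:
    """Sketch of multi-level caching: a small per-process cache (L1)
    in front of a larger shared cache (L2)."""

    def __init__(self, shared: dict, local_capacity: int = 128):
        self.shared = shared          # stands in for e.g. memcached
        self.local = {}               # per-process hot cache
        self.local_capacity = local_capacity

    def get(self, key, loader):
        if key in self.local:                 # L1 hit: no network hop
            return self.local[key]
        if key in self.shared:                # L2 hit: one network hop
            value = self.shared[key]
        else:                                 # miss: expensive origin fetch
            value = loader(key)
            self.shared[key] = value
        if len(self.local) >= self.local_capacity:
            self.local.pop(next(iter(self.local)))  # crude FIFO eviction
        self.local[key] = value
        return value
```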

8. Rank Your Stuff

Rank Important Traffic; 80-20 Rule

They prioritized video-watch traffic over everything else. So they kept a dedicated cluster of resources for video-watch traffic. It provided high availability.

9. Prevent the Thundering Herd

The thundering herd problem occurs when many clients query a server concurrently. It degrades performance.

The Thundering Herd Problem

So they added jitter to prevent the thundering herd problem. For example, they added jitter to the cache expiry of popular videos.
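Adding jitter is a one-line change. A sketch, with an illustrative base TTL:

```python
import random

BASE_TTL = 300  # seconds; illustrative


def ttl_with_jitter(base: int = BASE_TTL, spread: float = 0.1) -> float:
    # Spread expirations over +/-10% so cache entries for a popular
    # video don't all expire at once and stampede the backend.
    return base * (1 + random.uniform(-spread, spread))
```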

10. Play the Long Game

They focused on the macro level: algorithms and scalability. They did quick hacks to buy time to build long-term solutions. For example, they stubbed out a bad API in Python to prevent short-term problems.

Take Risks

They tolerated imperfection in their components. When they hit a bottleneck, they either rewrote the component or got rid of it.

They traded off efficiency for scalability. For example:

  • They chose Python over C

  • They kept clear boundaries between components to scale out. And tolerated the resulting latency

  • They optimized the software to be fast enough. But didn’t obsess over machine efficiency

  • They served video from a server location based on bandwidth availability. And not based on latency

11. Adaptive Evolution

They tweaked the system to meet their needs. Examples:

  • Critical components used RPC instead of HTTP REST. It improved performance

  • Custom BSON as the data serialization format. It offered high performance

  • Eventual consistency in certain parts of the application for scalability. For example, the read-your-writes consistency model for user comments

  • Studied Python to prevent common pitfalls. Also relied on profiling

  • Customized open-source software

  • Optimized database queries

  • Made non-critical real-time tasks asynchronous
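The last point - making non-critical work asynchronous - can be sketched with an in-process queue and a background worker. The queue stands in for a real task system; logging a view event is a hypothetical example:

```python
import queue
import threading

tasks: queue.Queue = queue.Queue()
processed = []


def worker():
    # Drain the queue in the background; None is a shutdown sentinel.
    while True:
        item = tasks.get()
        if item is None:
            break
        processed.append(item)  # stand-in for the real side effect


def handle_request(video_id: str) -> str:
    tasks.put(("log_view", video_id))  # fire and forget
    return f"served {video_id}"        # respond without waiting


t = threading.Thread(target=worker, daemon=True)
t.start()
handle_request("v42")
tasks.put(None)
t.join()
```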

Coding Principles

They didn’t waste time writing code to restrict people. Instead they adopted great engineering practices - coding conventions - to improve their code structure.


Google acquired YouTube in 2006. And it remains the market leader in video sharing with 5 billion video views a day.

According to Forbes, the founders of YouTube have a net worth of 100+ million USD.


👋 PS - Are you unhappy at your current job?

Preparing for system design interviews to get your dream job can be stressful.

Don’t worry, I’m working on content to help you pass the system design interview. I’ll make it easier: you spend only a few minutes each week to go from 0 to 1. Yet paid subscription fees will be higher than the current pledge fees.

So pledge now to get access at a lower price.

“An excellent newsletter to learn system design through practical case studies.” Franco


Consider subscribing to get simplified case studies delivered straight to your inbox:


Follow me on LinkedIn | YouTube | Threads | Twitter | Instagram | Bluesky

Thank you for supporting this newsletter. Consider sharing this post with your friends and get rewards. Y’all are the best.




My latest favorite post is the Realtime Chat Client and Server Challenge by John Crickett. His newsletter aims at upskilling software engineers through real-world coding challenges. So, consider subscribing to his weekly newsletter, Coding Challenges.


Word-of-mouth referrals like yours help this community grow - Thank you.

Feedback from a wonderful reader

Get featured in the newsletter: Write your feedback on this post. And tag me on Twitter, LinkedIn, and Substack Notes. Or, you can reply to this email with anonymous feedback.


References

  • Seattle Conference on Scalability: YouTube Scalability (2012). YouTube. [Accessed 14 Sep. 2023].

  • Scalability at YouTube (n.d.). www.youtube.com. [Accessed 14 Sep. 2023].

  • didip (2008). Super Sizing YouTube with Python. Available at: https://www.slideshare.net/didip/super-sizing-youtube-with-python.

  • Wikipedia Contributors (2019). YouTube. Available at: https://en.wikipedia.org/wiki/YouTube.

  • YouTube Architecture (n.d.). High Scalability. Available at: http://highscalability.com/youtube-architecture.

