System Design Newsletter

Share this post

8 Reasons Why WhatsApp Was Able to Support 50 Billion Messages a Day With Only 32 Engineers

newsletter.systemdesign.one

Discover more from System Design Newsletter

Weekly newsletter on system design. Get the powerful system design template for FREE
Over 18,000 subscribers
Continue reading
Sign in

8 Reasons Why WhatsApp Was Able to Support 50 Billion Messages a Day With Only 32 Engineers

#1: Learn More - Awesome WhatsApp Engineering (6 minutes)

NK
Aug 27, 2023
129
Share this post

8 Reasons Why WhatsApp Was Able to Support 50 Billion Messages a Day With Only 32 Engineers

newsletter.systemdesign.one
5
Share

Get the powerful template to approach system design for FREE on newsletter sign-up:


This post outlines the incredible story of WhatsApp co-founder Jan Koum. And the engineering techniques used to scale WhatsApp. Consider sharing this post with somebody who wants to study scalability patterns.

January 2008 - California, United States.

Jan Koum, an engineer at Yahoo, applies for work at Facebook - rejected.

This was not the end - he moved on with his life.

He buys an iPhone next year and immediately recognizes the huge potential of the new App Store.

So he decided to build an instant messenger with some of his former coworkers from Yahoo. And named it WhatsApp. The vision behind WhatsApp was to replace the expensive SMS.

With 1 million people signing up each day, the growth rate of WhatsApp was mind-boggling.

WhatsApp was able to support 50 billion messages a day from 450 million daily active users. And they did it with only 32 engineers.

Although explosive product growth is a good problem to have, Jan Koum and the WhatsApp team had to adopt the best engineering practices to overcome the challenges.


WhatsApp Engineering

WhatsApp engineering practices to meet extreme scalability were:

1. Single Responsibility Principle

They put product focus only on the core feature - messaging.

And didn’t bother to build an advertising network or a social media platform.

WhatsApp Engineering; Single responsibility principle
Single responsibility principle

Also they eliminated feature creep at all costs.

Feature creep occurs when you add excessive features to a product. And make it difficult to use.

Besides they focused on the reliability of WhatsApp over everything else.

2. Technology Stack

They used Erlang to build the core functionalities of WhatsApp servers. Because it:

  • Provides high scalability with a tiny footprint

  • And supports hot-loading

Threads are a native feature of Erlang. But in Java or C++ threads belong to the operating system. So there is no need to save the entire CPU state in Erlang. And this makes context switching cheaper.

Hot loading makes it easier to deploy code changes without a server restart. Or traffic redirection. In simple words, Hot loading offers high availability.

3. Why Reinvent the Wheel?

Don’t reinvent the wheel - either use open source or buy a commercial solution.

WhatsApp Engineering; Do not reinvent the wheel
Don’t reinvent the wheel

Ejabberd is an open-source real-time messaging server written in Erlang.

And they built WhatsApp on top of ejabberd. Also they rewrote some of the ejabberd core components to meet their needs.

Besides WhatsApp leveraged third-party services such as Google Push to provide push notifications.

4. Cross-Cutting Concerns

They put huge emphasis on cross-cutting concerns to improve product quality.

Cross-cutting concerns are things that affect many parts of a product. And are hard to separate. For example, monitoring and alerting the health of the services.

WhatsApp engineering; Cross-cutting concerns
Cross-cutting concerns

And they improved the software development process with Continuous integration and Continuous delivery.

Continuous integration is the process of merging the code changes regularly into a central repository.

Continuous delivery is the process of code deployment to a testing or production environment.

5. Scalability

WhatsApp used diagonal scaling to keep the costs and operational complexity low.

Horizontal scaling is the process of increasing the number of machines in the resource pool.

Vertical scaling is the process of increasing the capacity of an existing machine, such as the CPU or memory.

And diagonal scaling is a hybrid of horizontal and vertical scaling. The computing resources get added both vertically and horizontally.

WhatsApp engineering; Scalability
Scalability

They ran WhatsApp servers on the FreeBSD operating system. Because they had previous experience with FreeBSD while working at Yahoo. Besides FreeBSD offered a reliable network stack.

Also they fine-tuned FreeBSD to accommodate 2 million+ connections per server. And modified kernel parameters such as files and sockets.

They overprovisioned servers to handle sudden traffic spikes and keep headroom for failures. For example, failures such as network partitions or hardware faults.

6. Flywheel Effect

They measured the metrics such as CPU, context switches, and system calls. Then identified and eliminated the bottlenecks. And they did this at regular intervals.

WhatsApp Engineering; Continuous feedback cycle
Continuous feedback cycle

The continuous feedback cycle tremendously improved the performance of WhatsApp.

7. Quality

They used load testing to identify single points of failure.

Load testing is the process of measuring the performance of the system under the anticipated load.

WhatsApp Engineering; Load testing
Load testing

And they used artificial production traffic and DNS configuration changes for load testing.

8. Small Team Size

The communication paths between engineers increase quadratically as the team size grows. This is a recipe for degraded productivity.

WhatsApp Engineering; Communication paths between engineers
Communication paths between engineers

So they kept the team size small - 32 engineers.


WhatsApp is one of the most successful instant messengers in the market.

In 2014, the same Facebook that rejected Jan Koum acquired WhatsApp for a whopping 19 billion USD.

According to Forbes, Jan Koum has a net worth of 14 billion USD in 2023.


Consider subscribing to get simplified case studies delivered straight to your inbox:


Loading...

Read this post summary: Twitter Thread, LinkedIn Post, and Substack Notes.

Next Two Weeks: 🚀 New Posts Incoming!

This Is How Quora Shards MySQL to Handle 13+ Terabytes

This Is How Quora Shards MySQL to Handle 13+ Terabytes

NK
·
Sep 3
Read full story
Tumblr Shares Database Migration Strategy With 60+ Billion Rows

Tumblr Shares Database Migration Strategy With 60+ Billion Rows

NK
·
Sep 10
Read full story

A huge thank you to everybody who shared their support in the last week’s email. You are now 1,823 subscribers strong, very close to 2k. I think we can get to 2,000 subscribers by September 3rd. Y’all are the best.

Share


Word-of-mouth referrals like yours help this community grow - Thank you.

Testimonial from a wonderful subscriber

Get featured in the newsletter: Write your feedback on this post. And tag me on Twitter, LinkedIn, and Substack Notes. Or, you can reply to this email with anonymous feedback.


  • http://highscalability.com/blog/2014/2/26/the-whatsapp-architecture-facebook-bought-for-19-billion.html

  • https://www.shopify.com/partners/blog/feature-creep

  • https://stackoverflow.com/questions/2708033/technically-why-are-processes-in-erlang-more-efficient-than-os-threads

  • https://www.ejabberd.im/index.html

  • https://en.wikipedia.org/wiki/Jan_Koum

  • https://www.atlassian.com/continuous-delivery/principles/continuous-integration-vs-delivery-vs-deployment

  • https://www.nops.io/blog/horizontal-vs-vertical-scaling/

  • https://www.javatpoint.com/scaling-in-cloud-computing

  • https://www.businessinsider.com/whatsapp-built-using-erlang-and-freebsd-2015-10

  • https://www.blazemeter.com/blog/performance-testing-vs-load-testing-vs-stress-testing

  • Thumbnail Photo by Anton from Pexels

  • Rick Reed. WhatsApp: Half a billion unsuspecting FreeBSD users

  • Anton Lavrik. (2018) A Reflection on Building the WhatsApp Server - Code BEAM

129
Share this post

8 Reasons Why WhatsApp Was Able to Support 50 Billion Messages a Day With Only 32 Engineers

newsletter.systemdesign.one
5
Share
Previous
Next
5 Comments
Share this discussion

8 Reasons Why WhatsApp Was Able to Support 50 Billion Messages a Day With Only 32 Engineers

newsletter.systemdesign.one
Math police
Aug 28Liked by NK

Hi,

You write "The communication paths between engineers increase exponentially as the team grows in size.", but in fact the total number of communication paths between n nodes is equal to n * (n-1) / 2, which is not an exponential growth, but an polynomial growth (quadratic to be exact), i.e. O(n^2).

Expand full comment
Reply
Share
2 replies by NK and others
Gerard Beaubrun
18 hrs agoLiked by NK

I learned so much reading this. Also thank you for the bibliography/works cited.

While diagonal Scaling is an interesting infrastructure scaling, the use of load testing and the small team size were eye-opening

This allows me to delve deeper into some of mentioned concepts

Expand full comment
Reply
Share
1 reply by NK
3 more comments...
Top
New
Community

No posts

Ready for more?

© 2023 NK
Privacy ∙ Terms ∙ Collection notice
Start WritingGet the app
Substack is the home for great writing