5 Reasons Why Zoom Was Able to Support 300 Million Video Calls a Day

#28: Learn More - Awesome Zoom Architecture (4 minutes)

Dec 11, 2023

This post outlines how Zoom's architecture supports 300 million video calls per day. If you want to learn more, scroll to the bottom and find the references.

March 2020 - Berlin, Germany.

Annika moved to a new apartment during lockdown.

She had to attend video calls for work.

Yet she had only mobile internet.

So she was sad.

She received a Zoom meeting invitation from a coworker the next day.

She installed the Zoom app and was mind-blown by its video call quality.

Zoom Architecture

Here’s how Zoom supports 300 million video calls a day:

1. Video Streaming

They use adaptive streaming because each device type needs a different video resolution.

Adjusting resolution based on device type and bandwidth is called adaptive streaming.

And the number of pixels in a video frame is called resolution.

But sending many video streams for different resolutions isn’t scalable.

So they use Scalable Video Coding (SVC) to stream videos.

Imagine SVC as a video built from Lego blocks. The lower blocks contain the basic picture, while the upper blocks contain extra details.

Figure: Scalable Video Coding

Put another way, SVC sends a single video stream that is divided into hierarchical layers. Each layer adds the detail needed for a higher resolution.

The lower layers contain basic information, while the upper layers contain extra information for higher resolution.

The receiving client decodes only the layers that match its device type and bandwidth.

SVC reduces bandwidth usage because there is only a single video stream.

It also reduces server CPU usage by avoiding the need to encode and decode many video streams.

So SVC video streaming scales well and provides low latency.
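Here's a minimal sketch of how a client could pick its layers. It's an illustration, not Zoom's code. The `Layer` class, the `LAYERS` stack, the bitrates, and `select_layers` are all made-up names and numbers.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str    # hypothetical label for the layer
    height: int  # vertical resolution reached once this layer is decoded
    kbps: int    # extra bandwidth this layer adds on top of the lower layers


# Hypothetical layer stack, ordered from the base layer upwards.
LAYERS = [
    Layer("base", 180, 150),
    Layer("enhancement-1", 360, 350),
    Layer("enhancement-2", 720, 1000),
]


def select_layers(max_height: int, available_kbps: int) -> list[Layer]:
    """Keep the base layer, then add enhancement layers while they still fit."""
    chosen: list[Layer] = []
    used_kbps = 0
    for layer in LAYERS:
        fits_screen = layer.height <= max_height
        fits_network = used_kbps + layer.kbps <= available_kbps
        # The base layer is always kept so there is something to show.
        if chosen and not (fits_screen and fits_network):
            break
        chosen.append(layer)
        used_kbps += layer.kbps
    return chosen
```

A phone on a weak connection would call `select_layers(360, 400)` and decode only the base layer. A desktop on fast internet would take all three.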

2. Video Processing

They separate video stream processing from routing.

They also don't process video streams on the server because that doesn't scale.

Figure: Stream Routing vs Processing

Instead, the server only routes the video streams, while the client processes them.

3. Video Routing

They don’t combine video streams from a video call’s participants on the server.

Instead they send separate video streams from each participant to the client. The client then decodes them.

This avoids the need for transcoding on the server.

Converting a video to a different format is called transcoding.
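Here's a rough sketch of a server that routes streams without touching them. It's a toy model under my own assumptions. The `StreamRouter` class and its methods are hypothetical, and in-memory lists stand in for network connections.

```python
from collections import defaultdict


class StreamRouter:
    """Forwards encoded video packets between participants without decoding them."""

    def __init__(self):
        # meeting_id -> {participant_id: list of (sender_id, payload)}
        self.inboxes = defaultdict(dict)

    def join(self, meeting_id, participant_id):
        self.inboxes[meeting_id][participant_id] = []

    def route(self, meeting_id, sender_id, payload):
        """Copy the payload to every other participant and return the copy count."""
        delivered = 0
        for participant_id, inbox in self.inboxes[meeting_id].items():
            if participant_id == sender_id:
                continue  # nobody needs their own video sent back
            inbox.append((sender_id, payload))  # bytes are passed through untouched
            delivered += 1
        return delivered
```

The server never decodes, mixes, or re-encodes the payload. Each client receives one stream per participant and does the decoding itself.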

Figure: Each Participant Sends Separate Video Streams

They use multimedia routing to send video streams with low latency.

The multimedia router finds the best network paths to send video between participants in a video call.
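The post doesn't say how the router picks a path. One common way to think about it is a shortest-path search over measured link latencies. The sketch below uses Dijkstra's algorithm as a stand-in. The `best_path` function and the node names in the usage note are my own assumptions.

```python
import heapq


def best_path(links, source, target):
    """Return (total_latency_ms, path) for the lowest-latency route.

    links maps each node to {neighbor: latency_ms}.
    """
    queue = [(0.0, source, [source])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, latency in links.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + latency, neighbor, path + [neighbor]))
    raise ValueError("no path between source and target")
```

For example, if Annika can reach her coworker through `relay-1` (30 ms + 10 ms) or `relay-2` (10 ms + 50 ms), the search picks `relay-1` because the total is lower.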

4. Monitoring Quality of Service

The network can be unreliable, especially if the user is on mobile internet.

Figure: Client Monitors Quality of Service

So the Zoom client monitors the Quality of Service (QoS). It does that by measuring data packet loss and latency.

The client then optimizes the video stream using proprietary algorithms to provide the best user experience.
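Zoom's algorithms are proprietary, so the following is only a sketch of the general idea. It measures packet loss and latency from timestamps, then decides whether to lower the video quality. The names `QosReport`, `measure_qos`, `pick_quality`, and the thresholds are assumptions I made for illustration.

```python
from dataclasses import dataclass


@dataclass
class QosReport:
    loss_ratio: float      # fraction of packets that never arrived
    avg_latency_ms: float  # mean delay of the packets that did arrive


def measure_qos(samples):
    """samples is a list of (sent_ms, received_ms), with None for a lost packet."""
    if not samples:
        raise ValueError("need at least one sample")
    delays = [received - sent for sent, received in samples if received is not None]
    loss_ratio = 1 - len(delays) / len(samples)
    avg_latency = sum(delays) / len(delays) if delays else float("inf")
    return QosReport(loss_ratio, avg_latency)


def pick_quality(report, max_loss=0.05, max_latency_ms=150.0):
    """Return "lower" when the network looks bad, otherwise "keep"."""
    if report.loss_ratio > max_loss or report.avg_latency_ms > max_latency_ms:
        return "lower"
    return "keep"
```

A client on mobile internet that loses 1 in 4 packets would get `"lower"` and could then drop to fewer video layers.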

5. Network Awareness

Video calls need fast delivery of data.

So they use User Datagram Protocol (UDP).

Imagine UDP as a person sending postcards without asking for a confirmation of receipt.

It's a lightweight and connectionless protocol.
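To see what connectionless means, here's a small sketch using Python's standard `socket` module. It sends one datagram to a receiver on the same machine. The function name `send_postcard` is made up for this example.

```python
import socket


def send_postcard(message: bytes) -> bytes:
    """Send one UDP datagram over loopback and return what the receiver got."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    receiver.settimeout(2.0)         # UDP gives no delivery guarantee, so don't wait forever
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # No handshake and no connection setup: the sender just sends.
        sender.sendto(message, receiver.getsockname())
        data, _address = receiver.recvfrom(2048)
        return data
    finally:
        sender.close()
        receiver.close()
```

The sender never learns whether the datagram arrived. That missing round trip is what makes UDP a good fit for live video, where a late frame is useless anyway.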

They also set up the client to fall back to TCP, HTTPS, and HTTP for a consistent user experience.
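A fallback chain like this can be sketched as a loop that tries each transport in order. This is an assumption about the general shape, not Zoom's client code. `connect_with_fallback` and the two probe functions are hypothetical stand-ins.

```python
from __future__ import annotations

from typing import Callable, Sequence


def connect_with_fallback(transports: Sequence[tuple[str, Callable[[], bool]]]) -> str:
    """Try each (name, probe) pair in order and return the first name that works."""
    for name, try_connect in transports:
        try:
            if try_connect():
                return name
        except OSError:
            continue  # this transport is blocked, so try the next one
    raise ConnectionError("no transport could connect")


# Stand-in probes for the demo. Real ones would open sockets.
def udp_blocked() -> bool:
    raise OSError("UDP blocked by firewall")


def tcp_ok() -> bool:
    return True
```

With `[("udp", udp_blocked), ("tcp", tcp_ok)]`, the call returns `"tcp"`. The user still gets into the meeting even though the fastest option failed.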

Figure: Peer-to-Peer Connection Between Two Participants in a Video Call

Zoom uses a peer-to-peer connection if there are only 2 participants in the video call because it reduces server load and provides low latency.

Otherwise, they use a client-server architecture and run microservices on Amazon Web Services (AWS).

The Zoom client connects to the closest data center for low latency.
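One simple way to define closest is the data center with the lowest measured round-trip time. That's my assumption, since the post doesn't say how Zoom measures it. The function `pick_data_center` and the location names in the usage note are hypothetical.

```python
def pick_data_center(rtt_ms):
    """Return the data center with the lowest round-trip time.

    rtt_ms maps a data center name to its measured round-trip time in milliseconds.
    """
    if not rtt_ms:
        raise ValueError("no data centers measured")
    return min(rtt_ms, key=rtt_ms.get)
```

For Annika in Berlin, `pick_data_center({"frankfurt": 18.0, "amsterdam": 24.5, "virginia": 95.0})` would return `"frankfurt"`.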

They use meeting zones to group servers, and a zone controller manages every activity that occurs within a meeting zone.
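Here's a toy sketch of what a zone controller could look like. The post only says it manages the activity in a zone. Placing each meeting on the least-loaded server is my own assumption, and the `ZoneController` class and its methods are hypothetical.

```python
class ZoneController:
    """Tracks the servers in one meeting zone and places meetings on them."""

    def __init__(self, zone, servers):
        self.zone = zone
        self.load = {server: 0 for server in servers}  # server -> active meetings
        self.placement = {}                            # meeting_id -> server

    def start_meeting(self, meeting_id):
        # Assumed policy: pick the server with the fewest active meetings.
        server = min(self.load, key=self.load.get)
        self.load[server] += 1
        self.placement[meeting_id] = server
        return server

    def end_meeting(self, meeting_id):
        server = self.placement.pop(meeting_id)
        self.load[server] -= 1
```

Because each controller only knows about its own zone, adding capacity means adding another zone rather than growing one giant cluster.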

They engineered Zoom for video streaming by keeping its architecture simple.

Author: NK

References
