5 Reasons Why Zoom Was Able to Support 300 Million Video Calls a Day
#28: Learn More - Awesome Zoom Architecture (4 minutes)
This post outlines how Zoom architecture supports 300 million video calls per day.
March 2020 - Berlin, Germany.
Annika moved to a new apartment during lockdown.
She had to attend video calls for work.
Yet she had only mobile internet.
So she was sad.
The next day, she received a Zoom meeting invitation from a coworker.
She installed the Zoom app and was blown away by its video call quality.
Here’s how Zoom supports 300 million video calls a day:
1. Video Streaming
They do adaptive streaming because each device type needs a different video resolution.
Adjusting resolution based on device type and bandwidth is called adaptive streaming.
And the number of pixels in a video frame is called resolution.
But sending a separate video stream for each resolution isn’t scalable.
So they use Scalable Video Coding (SVC) to stream video.
Imagine SVC as a video built from Lego blocks. The lower blocks contain the basic picture, while the upper blocks add extra detail.
Put another way, SVC sends a single video stream that is divided into hierarchical layers. Each layer holds a different resolution.
The lower layers contain basic information, while the upper layers contain extra information for higher resolution.
The receiving client decodes only the layers that match its device type and bandwidth.
SVC reduces bandwidth usage because there is only a single video stream.
It also reduces server CPU usage because the server doesn't need to encode and decode many video streams.
So SVC video streaming scales well and provides low latency.
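Here is a minimal sketch of how a client could pick SVC layers. The layer names, resolutions, and bitrates are made up for illustration and are not Zoom's real values. The function `select_layers` is hypothetical too.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    """One SVC layer. Names and numbers are illustrative, not Zoom's."""
    name: str
    height: int      # vertical resolution once this layer is decoded
    extra_kbps: int  # bandwidth this layer adds on top of the layers below


# Base layer first, enhancement layers after it.
LAYERS = [
    Layer("base", 180, 150),
    Layer("enhancement-1", 360, 350),
    Layer("enhancement-2", 720, 1000),
]


def select_layers(max_height: int, available_kbps: int) -> list[Layer]:
    """Pick the layers a client should decode.

    Layers are cumulative: a layer is only usable if every layer below it
    is also decoded, so we stop at the first layer that does not fit.
    """
    chosen: list[Layer] = []
    used_kbps = 0
    for layer in LAYERS:
        if layer.height > max_height:
            break  # the screen cannot show this much detail
        if used_kbps + layer.extra_kbps > available_kbps:
            break  # the connection cannot carry this layer
        chosen.append(layer)
        used_kbps += layer.extra_kbps
    return chosen
```

A phone with a 360-pixel-high video tile would call `select_layers(360, 5000)` and keep only the base layer and the first enhancement layer. The sender still produces one stream.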
2. Video Processing
They separate video stream processing from routing.
They also don’t process video streams on the server because that doesn’t scale.
Instead, the server only routes the video streams.
The client processes them.
3. Video Routing
They don’t combine video streams from a video call’s participants on the server.
Instead they send separate video streams from each participant to the client. The client then decodes them.
So it avoids the need for transcoding on the server.
Converting a video to a different format is called transcoding.
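The routing idea above can be sketched as a server that forwards encoded packets without ever decoding or mixing them. `ForwardingServer` and its in-memory inboxes are hypothetical stand-ins for real network connections. This is not Zoom's actual implementation.

```python
from collections import defaultdict


class ForwardingServer:
    """Hypothetical router: relays encoded packets without decoding them."""

    def __init__(self) -> None:
        self.participants: set[str] = set()
        # participant id -> list of (sender id, packet) delivered to them
        self.inboxes: dict[str, list[tuple[str, bytes]]] = defaultdict(list)

    def join(self, participant: str) -> None:
        self.participants.add(participant)

    def route(self, sender: str, packet: bytes) -> None:
        """Forward the packet, untouched, to everyone except the sender."""
        for participant in self.participants:
            if participant != sender:
                self.inboxes[participant].append((sender, packet))
```

Each client ends up with one separate stream per other participant and decodes them itself. The server does only a cheap copy per packet.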
They do multimedia routing to send video streams with low latency.
The multimedia router finds the best network paths to send video between participants in a video call.
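The post doesn't say how the multimedia router picks a path. One textbook way to find a lowest-latency path is Dijkstra's algorithm, sketched below as an assumption. The network map, node names, and latencies are invented for the example.

```python
import heapq

# Hypothetical map: node -> {neighbor: latency in milliseconds}
NETWORK = {
    "berlin-client": {"frankfurt": 12, "amsterdam": 20},
    "frankfurt": {"new-york": 85, "amsterdam": 5},
    "amsterdam": {"new-york": 70},
    "new-york": {"ny-client": 10},
}


def lowest_latency_path(graph, source, target):
    """Dijkstra's algorithm. Returns (path, total_latency_ms)."""
    best = {source: 0}
    previous = {}
    queue = [(0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == target:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        for neighbor, latency in graph.get(node, {}).items():
            new_cost = cost + latency
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))
    if target not in best:
        return None, float("inf")
    # Walk backwards from the target to rebuild the path.
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    path.reverse()
    return path, best[target]
```

On this toy map, the route through Frankfurt and then Amsterdam beats both direct options, even though it has more hops.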
4. Monitoring Quality of Service
The network can be unreliable, especially if the user is on mobile internet.
So the Zoom client monitors the Quality of Service (QoS). It does that by measuring data packet loss and latency.
The client then optimizes the video stream using proprietary algorithms to provide the best user experience.
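Zoom's optimization algorithms are proprietary, so the following is only a toy sketch of the measuring step and one possible reaction. The thresholds, `measure`, and `next_action` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class QosReport:
    loss_ratio: float      # fraction of packets that never arrived
    avg_latency_ms: float  # mean delay of the packets that did arrive


def measure(expected_packets: int, latencies_ms: list[float]) -> QosReport:
    """Summarize one measurement window.

    expected_packets: how many packets the sender reports having sent
    latencies_ms: one latency sample per packet that arrived
    """
    received = len(latencies_ms)
    lost = max(expected_packets - received, 0)
    loss_ratio = lost / expected_packets if expected_packets else 0.0
    avg_latency = sum(latencies_ms) / received if received else float("inf")
    return QosReport(loss_ratio, avg_latency)


def next_action(report: QosReport,
                max_loss: float = 0.05,
                max_latency_ms: float = 150.0) -> str:
    """Toy policy with made-up thresholds: back off when the network struggles."""
    if report.loss_ratio > max_loss or report.avg_latency_ms > max_latency_ms:
        return "lower_quality"
    if report.loss_ratio < max_loss / 2 and report.avg_latency_ms < max_latency_ms / 2:
        return "raise_quality"
    return "hold"
```

A client could run this once per second and use the result to choose how much video detail to request.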
5. Network Awareness
Video calls need data delivered quickly.
So they use User Datagram Protocol (UDP).
Imagine UDP as a person sending postcards without waiting for confirmation that they arrived.
It's a lightweight and connectionless protocol.
They also set up the client to fall back to TCP, HTTPS, and HTTP for a consistent user experience.
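The fallback idea can be sketched as trying each transport in order and keeping the first one that works. The order comes from the text above. The `can_connect` probe is a hypothetical function supplied by the caller, for example one that tries to open a connection with a short timeout.

```python
from typing import Callable, Optional

# Order follows the text: UDP first, then TCP, HTTPS, and HTTP as fallbacks.
TRANSPORT_ORDER = ["udp", "tcp", "https", "http"]


def pick_transport(can_connect: Callable[[str], bool]) -> Optional[str]:
    """Return the first transport the probe reports as reachable, else None."""
    for transport in TRANSPORT_ORDER:
        if can_connect(transport):
            return transport
    return None
```

On a network whose firewall blocks UDP, `pick_transport(lambda t: t != "udp")` returns `"tcp"`, so the call still connects.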
Zoom uses a peer-to-peer connection if there are only 2 participants in the video call because it reduces server load and provides low latency.
They use a client-server architecture.
They run microservices on Amazon Web Services (AWS).
The Zoom client connects to the closest data center for low latency.
They use meeting zones to group servers.
A zone controller manages every activity that occurs within a meeting zone.
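Two decisions from this section fit in a few lines: go peer-to-peer for 2 participants, otherwise connect to the closest data center. This sketch assumes "closest" means lowest measured latency. The function name and the data center names are hypothetical.

```python
def plan_connection(participant_count: int, latencies_ms: dict[str, float]) -> str:
    """Decide how a client should connect to a call.

    latencies_ms maps a data center name to its measured latency
    from this client, in milliseconds.
    """
    if participant_count == 2:
        return "peer-to-peer"
    if not latencies_ms:
        raise ValueError("no data center measurements available")
    # Assumption: "closest" means the lowest measured latency.
    nearest = min(latencies_ms, key=latencies_ms.get)
    return f"server:{nearest}"
```

For a 5-person call from Berlin with measurements `{"frankfurt": 14.0, "virginia": 95.0}`, this returns `"server:frankfurt"`.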
They engineered Zoom for video streaming by keeping its architecture simple.