WeChat Architecture That Powers 1.67 Billion Monthly Users

#17: Read Now - Architecture Overview Chat Application (4 minutes)

Neo Kim

Oct 24, 2023

This post outlines WeChat's architecture. If you want to learn more, scroll to the bottom and find the references.

WeChat is the most popular messaging app in China with 1.67 billion monthly active users.

WeChat grew from 491 concurrent users in 2011 to over 400 million concurrent users.

Their development principle was simple: release first, then optimize. This let them validate the product idea before scaling it.

Chat Application Architecture

When a message is sent, the WeChat server stores it because the receiver might be offline. The receiver gets a notification once they are back online.

The receiver then fetches the message from the server.
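Here's a minimal sketch of that store-and-forward flow in Python. The names (`ChatServer`, `send`, `connect`, `fetch`) are hypothetical and not WeChat's API, and clearing the inbox after a fetch is an assumption made to keep the example short.

```python
from collections import defaultdict


class ChatServer:
    """Toy store-and-forward server (hypothetical names, not WeChat's API)."""

    def __init__(self):
        self.inbox = defaultdict(list)   # receiver -> stored (sender, text) pairs
        self.online = set()              # currently connected users
        self.notifications = []          # (receiver, pending count) pushes sent

    def send(self, sender, receiver, text):
        # Always store first: the receiver might be offline.
        self.inbox[receiver].append((sender, text))
        if receiver in self.online:
            self._notify(receiver)

    def connect(self, user):
        # A user who comes back online is notified about stored messages.
        self.online.add(user)
        if self.inbox[user]:
            self._notify(user)

    def fetch(self, user):
        # The receiver pulls stored messages. Clearing them here is an
        # assumption of this sketch, not a documented WeChat behavior.
        messages, self.inbox[user] = self.inbox[user], []
        return messages

    def _notify(self, user):
        self.notifications.append((user, len(self.inbox[user])))


server = ChatServer()
server.send("alice", "bob", "hi")   # bob is offline: stored, no push yet
server.connect("bob")               # bob is back online: notification sent
print(server.fetch("bob"))          # [('alice', 'hi')]
```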

Figure: Message Queue Delivering Messages

They used an asynchronous message queue to deliver chat messages to the receiver because some operations take longer to process.

For example, message delivery in a group chat is a time-consuming operation.
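To make the idea concrete, here's a small sketch that uses Python's in-process `asyncio.Queue` as a stand-in for a real distributed message queue. The worker and the message tuple layout are assumptions for illustration. The point is that the send path only enqueues, while the slower fan-out to recipients happens in the background.

```python
import asyncio


async def deliver_worker(queue, delivered):
    # Consumes queued messages and fans each one out to its recipients.
    while True:
        sender, recipients, text = await queue.get()
        for recipient in recipients:
            await asyncio.sleep(0)  # stand-in for slow per-recipient delivery
            delivered.append((recipient, sender, text))
        queue.task_done()


async def main():
    queue = asyncio.Queue()
    delivered = []
    worker = asyncio.create_task(deliver_worker(queue, delivered))

    # The send path only enqueues, so it returns right away.
    await queue.put(("alice", ["bob", "carol", "dave"], "hello group"))
    await queue.put(("bob", ["alice"], "hi"))

    await queue.join()   # wait until the worker has processed everything
    worker.cancel()
    return delivered


delivered = asyncio.run(main())
print(len(delivered))  # 4
```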

They also limited group chats to 500 people and tracked each person's last-read message index to deliver unread messages.
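A rough sketch of the last-read index idea, assuming the group keeps an append-only message log and one index per member. The class and method names are hypothetical. For simplicity, a sender's own messages also count as unread for them.

```python
MAX_GROUP_SIZE = 500  # group size limit mentioned in the post


class GroupChat:
    def __init__(self):
        self.messages = []    # append-only log; a message's index is its position
        self.last_read = {}   # member -> index of the next message to deliver

    def join(self, member):
        if len(self.last_read) >= MAX_GROUP_SIZE:
            raise ValueError("group is full")
        # New members start at the current end of the log.
        self.last_read[member] = len(self.messages)

    def post(self, sender, text):
        if sender not in self.last_read:
            raise KeyError(f"{sender} is not a member")
        self.messages.append((sender, text))

    def fetch_unread(self, member):
        # Deliver everything after the member's last-read index, then advance it.
        start = self.last_read[member]
        unread = self.messages[start:]
        self.last_read[member] = len(self.messages)
        return unread


group = GroupChat()
group.join("alice")
group.join("bob")
group.post("alice", "hello")
group.post("bob", "hey")
unread = group.fetch_unread("alice")
print(unread)  # [('alice', 'hello'), ('bob', 'hey')]
```

Storing one small integer per member means the server never has to copy a group message into 500 separate inboxes.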

WeChat's servers are written in C++, and services communicate with each other via Svrkit, an RPC framework.

They built WeChat with an N-layer architecture made up of three layers: access, logic, and storage.

An N-layer architecture partitions application logic into separate logical layers, which makes the system easier to scale.

Figure: WeChat N-layer Architecture

Access Layer

It handles network calls: client-initiated requests and server-initiated pushes.

I don't know which network protocol WeChat uses, but WebSocket would be a good choice because it offers bi-directional communication between the client and the server.

Logic Layer

It contains business logic and provides an abstract interface to the client.

They broke the business logic into separate modules based on functionality and importance, and kept the modules independently deployable.

Storage Layer

It provides data access and includes the databases: MySQL and SDB.

SDB is a simple string key-value database.

They also used Memcached to improve data access efficiency and reduce database calls.
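The post doesn't say how the cache was wired in, so here's a sketch of one common pattern, cache-aside, as an assumption. Plain dictionaries stand in for Memcached and the database, and `CachedStore` is a hypothetical name.

```python
class CachedStore:
    """Cache-aside reads: dicts stand in for Memcached and the database."""

    def __init__(self, database):
        self.database = database   # stand-in for MySQL or SDB
        self.cache = {}            # stand-in for Memcached
        self.db_reads = 0          # counts how often the database is hit

    def get(self, key):
        if key in self.cache:          # cache hit: no database call
            return self.cache[key]
        self.db_reads += 1             # cache miss: read from the database
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def put(self, key, value):
        self.database[key] = value
        self.cache.pop(key, None)      # invalidate so the next read is fresh


store = CachedStore({"user:1": "alice"})
store.get("user:1")
store.get("user:1")
print(store.db_reads)  # 1
```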

They stored each person's last-read message index and contacts in SDB because it's highly performant and reliable.

They ran SDB in a leader-follower topology with asynchronous replication. SDB followers handled reads when the leader crashed.

They stored account data and chat messages in MySQL and ran it in a multi-leader topology to scale writes.

They routed reads to MySQL followers, except for reads that needed strong consistency, which got routed to a MySQL leader.
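A minimal sketch of that read routing rule. To keep it short it uses a single leader rather than several, dictionaries instead of MySQL, and an explicit `replicate()` call as a stand-in for asynchronous replication. All names are hypothetical.

```python
import random


class ReplicatedStore:
    def __init__(self, follower_count=2):
        self.leader = {}
        self.followers = [{} for _ in range(follower_count)]

    def write(self, key, value):
        # Writes go to the leader; followers catch up later.
        self.leader[key] = value

    def replicate(self):
        # Stand-in for asynchronous replication to every follower.
        for follower in self.followers:
            follower.update(self.leader)

    def read(self, key, strong=False):
        if strong:
            return self.leader.get(key)                 # always up to date
        return random.choice(self.followers).get(key)   # may lag behind


store = ReplicatedStore()
store.write("name", "neo")
stale = store.read("name")                # followers have not caught up yet
fresh = store.read("name", strong=True)   # served by the leader
store.replicate()
after = store.read("name")
print(stale, fresh, after)  # None neo neo
```

The stale first read is the replication lag problem in miniature.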

Yet replication lag and a single point of failure remained problems with this setup.

So they built KVSvr on top of MySQL and SDB.

KVSvr is a distributed key-value storage system based on a quorum protocol. It gives a strong consistency guarantee, asynchronous data replication, and high write performance. It also caches data for improved read performance.
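KVSvr's internals aren't covered in the references I found, so the sketch below only shows the general quorum idea, not KVSvr itself. It assumes N in-memory replicas, a write quorum W and a read quorum R with R + W > N, and a single global version counter. All names are hypothetical.

```python
class QuorumStore:
    """Toy quorum replication over N in-memory replicas with R + W > N."""

    def __init__(self, n=3, w=2, r=2):
        if r + w <= n:
            raise ValueError("need R + W > N so read and write sets overlap")
        self.replicas = [{} for _ in range(n)]
        self.w, self.r = w, r
        self.version = 0   # assumption: one coordinator hands out versions

    def _targets(self, reachable):
        # `reachable` lists the replica indexes that respond (all by default).
        return list(range(len(self.replicas))) if reachable is None else reachable

    def write(self, key, value, reachable=None):
        targets = self._targets(reachable)
        if len(targets) < self.w:
            raise RuntimeError("write quorum not reached")
        self.version += 1
        for i in targets:
            self.replicas[i][key] = (self.version, value)
        return self.version

    def read(self, key, reachable=None):
        targets = self._targets(reachable)
        if len(targets) < self.r:
            raise RuntimeError("read quorum not reached")
        answers = [self.replicas[i][key] for i in targets[: self.r]
                   if key in self.replicas[i]]
        # Any R replicas overlap the last write quorum, so the highest
        # version among the answers is the latest write.
        return max(answers, key=lambda a: a[0])[1] if answers else None


store = QuorumStore()
store.write("k", "v1")                       # reaches all three replicas
store.write("k", "v2", reachable=[0, 1])     # replica 2 misses this write
print(store.read("k", reachable=[1, 2]))     # v2
```

The read still returns `v2` even though replica 2 is stale, because the read set shares replica 1 with the write set.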

Data Synchronization

A person's messages, contacts, and account data are stored on the server, and the client needs to synchronize them.

So they created a snapshot of the data on the server and sent it to the client. The snapshot consisted of key-value pairs.
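Here's one way a key-value snapshot sync could look. The version number used to skip stale snapshots is my assumption, as are the class names. The post only says that the snapshot consisted of key-value pairs.

```python
class SyncServer:
    def __init__(self):
        self.data = {}     # key -> value (messages, contacts, account data)
        self.version = 0   # bumped on every change (assumption of this sketch)

    def put(self, key, value):
        self.data[key] = value
        self.version += 1

    def snapshot(self):
        # Copy the items so later server changes do not leak into the snapshot.
        return {"version": self.version, "items": dict(self.data)}


class SyncClient:
    def __init__(self):
        self.data = {}
        self.version = 0

    def apply(self, snapshot):
        if snapshot["version"] <= self.version:
            return False   # nothing newer than what the client already has
        self.data = dict(snapshot["items"])
        self.version = snapshot["version"]
        return True


server = SyncServer()
client = SyncClient()
server.put("contact:1", "bob")
server.put("msg:1", "hello")
print(client.apply(server.snapshot()))  # True
print(client.apply(server.snapshot()))  # False
```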

Multi Data Center

Their initial data center was in Shanghai. To grow further, they added data centers in Hong Kong and Canada.

They wanted each data center to be self-sufficient, so they deployed every service in each data center. This allowed them to route traffic to a healthy data center if one of them failed.

Yet data consistency between data centers remained a problem because the latency between them was high.

They also needed to avoid business logic problems caused by eventual consistency.

So they segmented users: traffic from China got routed to the Shanghai data center, while international users got routed to data centers outside China.
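The segmentation rule can be sketched as a simple lookup. The data center names come from the post, but how international users are split between Hong Kong and Canada is an assumption, and `pick_data_center` is a hypothetical helper.

```python
def pick_data_center(country_code):
    """Map a user's country code to a home data center."""
    if country_code == "CN":
        return "shanghai"            # traffic from China stays in Shanghai
    if country_code in {"CA", "US"}:
        return "canada"              # assumption: North America goes to Canada
    return "hong_kong"               # assumption: everyone else goes to Hong Kong


print(pick_data_center("CN"))  # shanghai
print(pick_data_center("SG"))  # hong_kong
```

Because each user has one home data center, their writes land in one place and don't race across regions.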

Data got replicated asynchronously between data centers. This setup reduced complexity and prevented consistency problems.

Figure: Synchronization between Data Centers

For reliability, they used a quorum-based queue to synchronize data between data centers.

They also coordinated operations across data centers for special cases, such as creating globally unique account IDs.

Figure: Data Replication across Data Centers

They decided to use the eventual consistency model for group chat because it met their needs.

WeChat was later extended to support voice messages, games, and mobile payments, and it has become China's app for everything.

There are still many open questions about WeChat's architecture, but I couldn't find more information about it. Please share in the comments if you find anything helpful.

References

https://www.infoq.cn/article/the-road-of-the-growth-weixin-background
https://github.com/radareorg/sdb
https://en.wikipedia.org/wiki/WeChat

Omar Al Raisi

Awesome work

Toan Tran

Nov 29, 2023 (edited)

Thank you for the article.

I would like to clarify information here

> When a message gets sent, the WeChat server stores it because the receiver might be offline. And a notification gets delivered to the receiver when it's back online again.

According to WeChat's documentation, it seems they only store your message if the receiver is offline, and only for a short time:

https://help.wechat.com/cgi-bin/micromsg-bin/oshelpcenter?t=help_center/topic_detail&opcode=2&plat=2&lang=en&id=160317aebr7v160317e6jj22&Channel=helpcenter

> We do not permanently retain the content of any messages on our servers whether they are text, audio or rich media files such as photos, videos, or documents, unless you or your recipient saves them as a Favorite. Once 72 hours has lapsed since you sent your chat message, or 120 hours for images, audio, videos, and files, WeChat permanently deletes the content of the message on our servers. Upon deletion, neither WeChat nor any and third party will be able to view the content of your message.
