Discover more from System Design Newsletter
This Is How Quora Shards MySQL to Handle 13+ Terabytes
#2: Must-Know Tips - Best Sharding Techniques for MySQL (8 minutes)
Get the powerful template to approach system design for FREE on newsletter sign-up:
Summer 2005 - California, United States.
Adam D'Angelo became a full-time employee at Facebook.
Later, he transitioned into the role of chief technology officer (CTO) at Facebook.
But he felt that his responsibilities no longer fit with his skills and interests.
That’s reasonable - each person has a different definition of success in life.
I took the one less traveled by,. And that has made all the difference.
- Robert Frost, The Road Not Taken
He wanted to build Quora - a platform for asking and answering questions. So, he took the leap and left Facebook.
Quora has a great engineering team. They implemented state-of-the-art sharding techniques to scale the database layer.
Data storage requirements at Quora are in the order of tens of terabytes. The queries per second (QPS) are around 100 thousand.
They stored the critical data such as questions, answers, upvotes, and comments in MySQL.
They chose MySQL for improved read performance. The NoSQL data store, such as HBase, is implemented with a Log-structured merge (LSM) tree. In general, LSM trees are not optimized for read-heavy workloads.
They wanted to further improve the read performance. So, they introduced a caching layer before the database.
But rapid data growth and high write QPS remain a problem.
Solution? Shard MySQL.
This post outlines the extraordinary story of Quora co-founder Adam D'Angelo and the sharding techniques used to shard MySQL at Quora. Consider sharing this post with somebody who wants to study database scalability patterns.
What Is MySQL Sharding?
A single server can store and handle only a limited amount of data.
Sharding is the technique of storing a large database across many servers. In other words, sharding spreads the workload of a single server across many servers.
Does MySQL Support Sharding?
It is possible to shard MySQL. But there is no support for automatic sharding. So, sharding MySQL is an extra task performed by application engineers.
How to Shard MySQL Database?
They used vertical sharding and horizontal sharding to shard MySQL.
The leader-follower is the most common replication topology in MySQL. The leader handles read-write requests. The followers handle read requests and replicate the leader.
Vertical sharding is the technique of moving a table to a separate server. Vertical sharding allows to keep different tables with different database leaders. This approach improves the write scalability.
A partition is a division of a logical database into distinct independent parts. But Quora has a different definition for partition. According to Quora, a partition is a cluster with a leader and followers.
They stored the following metadata in Apache Zookeeper:
mapping from a partition to the list of tables in that particular partition
mapping from a partition to the list of servers
They moved a large or high-traffic table to a new partition to scale out.
MySQL requires tables to be in the same partition to join them at the database level. But this is not possible on scaling out. So, they joined tables at the application level.
An outline of the drawbacks of vertical sharding is as follows:
replication lag problem on moving out large tables
reduced transactional functionality
degraded performance if the table becomes very large
A popular variant of vertical sharding is splitting a table into many tables. With this approach, some columns are in one table and the rest of the columns are in another table.
But the large tables in a database became problematic for them due to the following reasons:
Schema changes were difficult
Table movement became challenging
Unexpected errors might damage the entire table
Solution? Horizontal sharding is the technique of splitting a logical table into many physical tables.
The key design decisions taken for horizontal sharding at Quora were as follows:
1. Buy or Make Decision
They decided not to use a third-party solution (Vitess). But to build their own sharding solution. The reasons behind their decision were the following:
It was expensive to gain expertise and modify a third-party sharding service. They had fewer than 10 large tables to shard
It was easy to reuse the existing vertical sharding logic
It was easy to support custom API at the database level
There was no need for database-level joins. They performed joins at the application level
There was no extra middleware introduced. This kept latency low
2. Shard Level
There are 2 levels to sharding the database: logical database level and table level.
The shards will live either on the same server or on different servers.
Sharding at the logical database level is pretty straightforward. It creates shards with the same set of tables across each shard.
And sharding at the table level allows to shard only the large tables.
They preferred sharding at the table level. The reason was the extensive use of secondary indexes at Quora.
A secondary index is an index that is not based on the usual primary index and may contain duplicate values.
The secondary index is stored only within a shard. So, the queries relying on secondary indexes may have to query all the shards of the table. This pattern is called the scatter-and-gather pattern. And this makes sharding at the logical database level expensive.
There is a workaround. Convert the secondary index into dedicated tables and then shard them. But this process is not easy.
Sharding at the table level is workable. Because only the queries against sharded tables need to perform scatter and gather.
3. Sharding Method
A logical table is split into many physical tables via sharding. The popular sharding methods are as follows:
Range-based sharding splits database rows based on a range of values. A shard key is assigned to a particular range of values.
Hash-based sharding assigns the shard key to each database row via a hash function.
They preferred range-based sharding because range queries were the most common at Quora.
4. Metadata of Shards
They persisted metadata of the shards in Zookeeper.
5. API to Query Sharded Tables
Their database APIs took in explicit arguments to generate an SQL statement. This was due to security reasons. For example, this approach eliminated queries that are vulnerable to SQL injections.
SQL injection is the placement of malicious code in SQL statements. The SQL statements might come through user input.
They extended the database API to include the sharding column and the sharding key data.
6. Sharding Column
They chose the sharding column based on 2 factors: latency sensitivity and QPS.
A query that uses a non-sharding column must interact with every shard. This results in scatter-and-gather pattern. The latency of the scatter and gather pattern depends on the latency of the slowest shard. This is a recipe for degraded performance.
A potential optimization to this problem is to use a cross-shard index.
A cross-shard index is a separate table that maps a non-sharding column to the sharding column. Give in value of a non-sharding column to a cross-shard index. It finds the corresponding values in the sharding column.
A cross-shard index prevents the scatter and gather pattern in certain cases. For example, if there is only a single value in the sharding column for a given value of a non-sharding column. It queries only one shard.
But an overhead to querying the cross-shared index remains.
And it can also further degrade performance in certain cases. For example, if there are many values in the sharding column for a given value of the non-sharding column. It queries many shards.
There is also an increased risk of inconsistencies in the cross-shard index. This is because it lives in a separate table and atomic updates become difficult.
7. Number of Shards
The number of shards depends on the shard size. The general rule of thumb is to keep the number of shards low.
Why? Queries using a non-sharding column might need to query many shards. So, the latency will degrade if the number of shards increases.
Quora became a successful unicorn startup.
According to Forbes, Adam D' Angelo is worth around 600 million USD in 2023. He is an excellent example of a person who chased dreams to achieve success.
Consider subscribing to get simplified case studies delivered straight to your inbox:
What to Read Next?
Read this post summary in a Twitter Thread.
Next Two Weeks: 🚀 New Posts Incoming!
Last Week: 🚀 Catch up on the latest post!
A huge thank you to 92 people who shared the last post with your friends. Our goal was to reach 2,000 subscribers by 3 September. And we hit 3,000+ subscribers - Welcome new readers!
So let’s up it: 3,500 subscribers by 10 September. Nothing is more exciting than getting a new subscriber.
Special thanks for your encouragement - first supporter! Word-of-mouth referrals like yours help this community grow - Thank you.
You can share your testimonial by posting on Twitter or LinkedIn. Then send me the link by replying to this email. Or, you can reply to this email with an anonymous testimonial.
I stumbled upon this great newsletter that helps you to become an engineering leader. You can read the article on must-know tips for transitioning from a developer to a team lead fromnewsletter.
Quora Engineering. (2020). MySQL sharding at Quora. Quora.
Amazon Web Services. (n.d.). What is Database Sharding?
Manik Chhabra. (2023). Understanding MySQL Sharding Simplified 101. Hevo Data Blog
Wikipedia Contributors (2022). Adam D’Angelo. [online] Wikipedia. Available at: https://en.wikipedia.org/wiki/Adam_D%27Angelo.