How LinkedIn Adopted Protocol Buffers to Reduce Latency by 60%
#12: You Need to Read This - Awful JSON Serialization (5 minutes)
LinkedIn adopted a microservices architecture because it serves daily requests in the range of billions and needed to scale.
But microservices architecture increased their network calls and degraded latency.
This post outlines how LinkedIn adopted Protocol Buffers to reduce latency. It helps to understand data serialization and Protocol Buffers first, so let's start with the basics.
What Is Data Serialization?
Imagine a get-together of people who speak different languages. They have no choice but to speak in a language that everybody understands.
So each person must translate thoughts from their native language into the common one. The translation adds overhead and reduces communication efficiency. Data serialization works the same way.
In computing terms: translating an in-memory data structure into a format that can be stored or sent across the network is called data serialization.
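A minimal sketch of the idea using Python's standard library (the data and names here are made up for illustration):

```python
import json
import pickle

# An in-memory data structure (hypothetical example data).
profile = {"id": 42, "name": "Ada", "skills": ["C", "Go"]}

# Serialization: translate it into a format that can be stored
# or sent across the network.
as_json = json.dumps(profile).encode("utf-8")  # textual format
as_pickle = pickle.dumps(profile)              # Python-only binary format

# Deserialization: the receiver translates it back.
assert json.loads(as_json) == profile
assert pickle.loads(as_pickle) == profile
```

Every serialization format makes the same trade: some translation work on both sides, in exchange for a representation that both sides understand.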
What Is Protocol Buffers?
Protocol Buffers (Protobuf) is a data serialization format and a set of tools to exchange data.
Protobuf keeps the data and its metadata separate. And it serializes data into a binary format.
Besides, Protobuf messages can be sent over REST or RPC. And Protobuf supports many programming languages: Python, Java, Go, and C++.
Here is the Protobuf workflow:
Create a proto file
Define the payload schema: its data fields and their types.
Compile proto file to language-specific source files
Compile the proto file using the Protobuf compiler to create language-specific source files: one for the client to serialize data, and another for the server to deserialize it.
Create an executable package
Compile the generated Protobuf source file with the project code.
Serialize or deserialize data
Serialize data at runtime.
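Putting the workflow together, a proto file might look like this (the message and field names are hypothetical):

```protobuf
syntax = "proto3";

// Payload schema: data fields and their types.
message Profile {
  int64 id = 1;              // field numbers identify fields on the wire
  string name = 2;
  repeated string skills = 3;
}
```

Compiling it with the Protobuf compiler, e.g. `protoc --python_out=. profile.proto`, generates the language-specific source file used to serialize and deserialize Profile messages at runtime.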
Why Use Protocol Buffers?
JSON serialization became a performance bottleneck at LinkedIn. The textual format needed extra network bandwidth and more computing resources to compress the data. The result was poor latency and throughput.
Also, skipping unwanted data fields is not possible while parsing JSON, because there is no separation between data and metadata.
But the metadata in Protobuf allows parsing only specific data fields. That makes it a lot more efficient for a big payload.
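That separation is visible on the wire: every Protobuf field is prefixed with a key that encodes its field number and wire type, so a parser can locate or skip fields without scanning the whole payload. Here is a minimal sketch of the varint-based encoding for an integer field, reproducing the well-known example from the Protobuf encoding docs, where field 1 set to 150 encodes to the three bytes `08 96 01`:

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a Protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def encode_int_field(field_number: int, value: int) -> bytes:
    """Encode one varint field: key (field number + wire type), then value."""
    key = (field_number << 3) | 0  # wire type 0 = varint
    return encode_varint(key) + encode_varint(value)

# message Test { int32 a = 1; } with a = 150
print(encode_int_field(1, 150).hex())  # 089601
```

A parser reading that key byte knows the field number and how to skip the value if it is not needed, which is exactly what JSON cannot offer.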
Their criteria to find a JSON data serialization alternative were:
Smaller payload size. Because it reduces bandwidth needs
Improved efficiency. Because it reduces latency
Support for many programming languages. Because their tech stack was diverse
Easy to plug into the existing setup. Because they wanted to reduce the engineering effort
Protobuf satisfied all the criteria. So they moved to Protobuf.
Protobuf reduced their P99 latency by 60% for big payloads. And improved average throughput by 6.25% for response payloads.
The 99th latency percentile is called P99 latency. Put another way, 99% of requests are faster than the given latency number, and only 1% of requests are slower.
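As a quick sketch, P99 can be computed with the nearest-rank method (the sample data is made up for illustration):

```python
import math

def p99(latencies_ms):
    """Return the latency that 99% of requests are at or below (nearest-rank)."""
    ordered = sorted(latencies_ms)
    rank = math.ceil(0.99 * len(ordered))
    return ordered[rank - 1]

# 100 requests taking 1 ms, 2 ms, ..., 100 ms.
print(p99(range(1, 101)))  # 99
```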
Here is a summary of the Protobuf study by Auth0.com: when services running Java communicated with services running Python or Java, Protobuf offered 6 times better latency compared to JSON.
Protobuf Rollout at LinkedIn
And this is how they rolled out Protobuf:
Add Protobuf support to Rest.li framework
Increment the Rest.li framework version and redeploy every microservice
Release gradually via client configuration to reduce service disruption
Protocol Buffers vs JSON
I will outline the top 3 benefits and limitations of Protobuf and JSON. It might help you make better architectural decisions in your projects.
Protobuf benefits:
Support for schema validation
Improved performance with big payloads, thanks to the binary format
Support for backward compatibility
Protobuf limitations:
Hard to debug, because it is not human-readable
Extra effort needed to update the proto file
Limited language support compared to JSON
JSON benefits:
Easy to use and human-readable
Easy to change, because the schema is flexible
Support for many programming languages
JSON limitations:
No support for schema validation
Poor performance with big payloads
Backward compatibility problems
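The big-payload performance gap is easy to see by comparing encoded sizes. A rough sketch using the standard library, with `struct` standing in for a binary encoding (this is not the Protobuf wire format, just an illustration of text versus binary):

```python
import json
import struct

# 1,000 (x, y) coordinate pairs as a stand-in for a big payload.
points = [(float(i), i / 3) for i in range(1000)]

# Textual JSON encoding: digits, quotes, brackets, commas.
json_bytes = json.dumps(points).encode("utf-8")

# Fixed binary encoding: 8 bytes per float, no punctuation.
flat = [v for pair in points for v in pair]
binary_bytes = struct.pack(f"<{len(flat)}d", *flat)

print(len(json_bytes), len(binary_bytes))  # the binary form is much smaller
```

The smaller payload also means less to compress and less to push over the network, which is where the latency and throughput wins come from.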
Use Protobuf when:
Payload is big
Frequent changes to the payload schema are expected
Use JSON when:
High performance is not needed
Protobuf gave LinkedIn big performance improvements. But it is important to check whether Protobuf is the best fit for your use case, to prevent over-engineering.