The System Design Newsletter

The System Design Newsletter

Share this post

The System Design Newsletter
The System Design Newsletter
Everything You Need to Know About Gossip Protocol
Copy link
Facebook
Email
Notes
More
User's avatar
Discover more from The System Design Newsletter
Download my system design playbook for free on newsletter signup
Over 154,000 subscribers
Already have an account? Sign in

Everything You Need to Know About Gossip Protocol

#25: Learn More - How Distributed Systems Gossip (6 minutes)

Neo Kim's avatar
Neo Kim
Nov 28, 2023
34

Share this post

The System Design Newsletter
The System Design Newsletter
Everything You Need to Know About Gossip Protocol
Copy link
Facebook
Email
Notes
More
5
Share

Get my system design playbook for FREE on newsletter signup:


  • Share this post & I'll send you some rewards for the referrals.

The 2 main problems in distributed systems are state management and communication.

A peer-to-peer service like gossip protocol can be used to solve them.

The gossip protocol handles system state with high availability.

Also application-level data can be piggybacked in gossip messages as key-value pairs.

The gossip protocol is also called the epidemic protocol. Because the messages get transferred like how epidemics spread.


Break into tech in 6 weeks 💻 (Featured)

EntryLevel

Learn tech skills in 6 weeks to create a portfolio and land a remote job.

Try it


Why Use Gossip Protocol?

There are different ways to broadcast a message in distributed systems. They are:

1. Point-To-Point Broadcast

Point-To-Point Broadcast
Point-To-Point Broadcast

The producer sends the messages directly to the consumer. Also the producer retries if the consumer fails to accept the message.

Yet the message will be lost if both the producer and consumer fail simultaneously.

2. Eager Reliable Broadcast

Eager Reliable Broadcast
Eager Reliable Broadcast

Each server broadcasts a message to every other server in the system.

Although it’s fault-tolerant, this approach is problematic because:

  • High bandwidth usage due to O(n^2) messages broadcast to n number of servers

  • Network bottleneck due to O(n) linear broadcast

  • Extra storage is needed to maintain the list of nodes

3. Gossip Protocol

Gossip Protocol
Gossip Protocol

Each server periodically sends the messages to a set of random servers. And the entire system will receive a message eventually.

Gossip protocol is a good choice for communication in a large-scale system because:

  • Each server transfers only a limited number of messages

  • Limited bandwidth usage

  • Tolerant to network and server failures

The gossip protocol is reliable because many servers retransmit the messages.

Yet gossip protocol can be used to keep nodes consistent only if:

  • Operations are commutative

  • Serializability is not needed

The number of servers that receive a message from a particular server is called the Fanout.

The number of gossip rounds needed to transfer a specific message across the entire system is called the Cycle.

A case study of a gossiping system with 128 servers needed less than 2 percent of CPU and 60 KBps of bandwidth.

Here are some gossip protocol simulations:

  • Serf Convergence Simulator

  • Gossip Simulator

Gossip Protocol Properties

There is no formal definition for gossip protocol. But it’s expected to have certain properties:

  • A peer server must be selected randomly

  • Each server stores only location information. And is unaware of the entire system state

  • Interactions between servers are periodic and pairwise

Gossip Algorithm

Each server maintains a list of servers and their metadata.

Here is how the gossip algorithm works:

  1. Gossip periodically to a random server

  2. The server inspects the received gossip message

  3. The server merges the message with the highest version to local data

Gossip Protocol Implementation

The gossip protocol uses the peer sampling service to find the peer servers.

Here is how the peer sampling service works:

  1. Initialize each server with a partial view of the system

  2. Merge the server’s view with a peer server’s view on gossip exchange

A server initiating a gossip exchange sends a gossip digest synchronization message. It contains a list of gossip digests.

Here is a sample schema of gossip digest:

EndPointState:
10.0.1.41

HeartBeatState: 
generation: 1259904231, version: 681

ApplicationState: 
"average-load": 2.7, generation: 3659909691, version: 42

ApplicationState: 
"bootstrapping": pxLpassF9XD8Kymj, generation: 1281909615, version: 91

The gossip messages get sent over User Datagram Protocol (UDP) or Transmission Control Protocol (TCP).

A server is considered healthy if the heartbeat counter keeps incrementing. So the heartbeat counter of a server is incremented on each gossip exchange.

The gossip protocol removes the data from a server using a tombstone. The tombstone is a special data entry to invalidate a data key without the actual deletion of the data.


Gossip Protocol Types

There are 3 types of gossip protocols. They are categorized based on message transfer time and the network traffic created.

1. Anti-Entropy Gossip Protocol

This variant sends an unbounded number of messages without termination.

It’s usually used to reduce the entropy between replicas of a stateful service like the database. The server with the newest message sends it to other servers.

Yet it causes high bandwidth usage due to the transfer of the entire dataset.

2. Rumor-Mongering Gossip Protocol

This variant’s cycle is more frequent compared to the anti-entropy cycle. So it will likely flood the network.

Yet rumor-mongering protocol uses less bandwidth because only the latest changes get transferred.

3. Aggregation Gossip Protocol

This variant creates a system-wide value by sampling data across each server. And then combining them.


How Gossip Protocol Spreads Messages

There are different ways to spread gossip messages. So it should be chosen based on the service needs and available network conditions.

The 3 ways to spread gossip messages are:

1. Push Model

The server with the newest message sends it to a random set of servers.

The push model is efficient if there are only a few messages because it avoids the traffic overhead.

2. Pull Model

Each server actively polls a random set of servers for newer messages.

The pull model is efficient if there are many new messages.

3. Push-Pull Model

The server pushes the newest messages and also polls for newer messages.

The push model is efficient in the initialization phase when there are only a few active servers.

While the pull model becomes efficient if there are many active servers.


Gossip Protocol Use Cases

The gossip protocol can be used to implement:

  • First-in-first-out (FIFO) broadcast

  • Causality broadcast

  • Total order broadcast

Some of the popular use cases of gossip protocol are:

  • Spreading server state across the system in Amazon S3

  • Detecting failures and tracking server membership in Amazon Dynamo

  • Propagating server metadata in the Redis cluster

  • Spreading the nonce value across the mining servers in Bitcoin

  • Electing the leader and detecting agent failures in Consul

  • Transferring consistent hash ring state in Riak database

Gossip Protocol Advantages

The advantages of gossip protocol are:

  • Fault-tolerant: messages flow via many routes making it tolerant against unreliable networks

  • Scalable: each server interacts with a limited number of servers and sends a fixed number of messages. Also a server doesn’t wait for an acknowledgement

  • Decentralized: it uses a peer-to-peer communication model

Gossip Protocol Disadvantages

The disadvantages of gossip protocol are:

  • Eventually consistent: it’s slower compared to multicast. Also gossip behavior depends on network topology

  • Difficult to debug and test: non-deterministic and distributed nature makes it hard to debug and reproduce failures

  • High bandwidth usage: A single message will get sent many times across different servers

The takeaway is to avoid gossiping in the real world. But gossip in the world of distributed systems when eventual consistency is fine.


Consider subscribing to get simplified case studies delivered straight to your inbox:


Author NK; System design case studies
Follow me on LinkedIn | YouTube | Threads | Twitter | Instagram

Thank you for supporting this newsletter. Consider sharing this post with your friends and get rewards. Y’all are the best.

system design newsletter

Share


Everything You Need to Know About Consistent Hashing

Everything You Need to Know About Consistent Hashing

NK
·
November 21, 2023
Read full story
What Happens When You Type google.com in the Browser?

What Happens When You Type google.com in the Browser?

NK
·
November 26, 2023
Read full story

References

  • https://www.cs.cornell.edu/projects/Quicksilver/public_pdfs/2007PromiseAndLimitations.pdf

  • https://systemdesign.one/gossip-protocol/

Vikas Kumar Yadav's avatar
Conor's avatar
Anton Zaides's avatar
Nicola Ballotta's avatar
Madan Kumar Y's avatar
34 Likes∙
5 Restacks
34

Share this post

The System Design Newsletter
The System Design Newsletter
Everything You Need to Know About Gossip Protocol
Copy link
Facebook
Email
Notes
More
5
Share

Discussion about this post

User's avatar
8 Reasons Why WhatsApp Was Able to Support 50 Billion Messages a Day With Only 32 Engineers
#1: Learn More - Awesome WhatsApp Engineering (6 minutes)
Aug 27, 2023 â€¢ 
Neo Kim
742

Share this post

The System Design Newsletter
The System Design Newsletter
8 Reasons Why WhatsApp Was Able to Support 50 Billion Messages a Day With Only 32 Engineers
Copy link
Facebook
Email
Notes
More
24
How PayPal Was Able to Support a Billion Transactions per Day With Only 8 Virtual Machines
#30: Learn More - Awesome PayPal Engineering (4 minutes)
Dec 26, 2023 â€¢ 
Neo Kim
250

Share this post

The System Design Newsletter
The System Design Newsletter
How PayPal Was Able to Support a Billion Transactions per Day With Only 8 Virtual Machines
Copy link
Facebook
Email
Notes
More
14
How Stripe Prevents Double Payment Using Idempotent API
#45: A Simple Introduction to Idempotent API (4 minutes)
May 9, 2024 â€¢ 
Neo Kim
384

Share this post

The System Design Newsletter
The System Design Newsletter
How Stripe Prevents Double Payment Using Idempotent API
Copy link
Facebook
Email
Notes
More
30

Ready for more?

© 2025 Neo Kim
Publisher Privacy
Substack
Privacy ∙ Terms ∙ Collection notice
Start writingGet the app
Substack is the home for great culture

Share

Copy link
Facebook
Email
Notes
More

Create your profile

User's avatar

Only paid subscribers can comment on this post

Already a paid subscriber? Sign in

Check your email

For your security, we need to re-authenticate you.

Click the link we sent to , or click here to sign in.