The System Design Newsletter

How Zapier Automates Billions of Tasks

#37: Learn More - Zapier Architecture Overview (5 minutes)

Neo Kim
Feb 25, 2024


This post outlines Zapier's architecture. If you want to learn more, see the references at the bottom.


Once upon a time, there lived an office assistant named Sophie.

She was bright and smart.

But she got exhausted from repetitive office tasks.


Until one day, she heard about Zapier from a coworker.

Automation Workflow

She automated a workflow that runs whenever an event gets created in her office Google Calendar.

It sends Slack notifications and adds rows to Google Sheets automatically.

And she was stunned by its ease of automation.


Zapier Architecture

Here's how Zapier automates billions of tasks:

1. Tech Stack

They run the Nginx web server and the Python Django framework on the backend.

They store data in MySQL and Redis. For example, Zaps get stored in MySQL.

A Zap is an automated workflow that connects different tasks or services.

MySQL is a relational database management system.

Tech Stack Overview

They store the number of in-flight tasks in Redis. This count lets them throttle task execution.

Redis is an in-memory key-value database.
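Here is a minimal sketch of how an in-flight counter can drive throttling. It is not Zapier's code. The `InFlightCounter` class is a hypothetical in-memory stand-in for Redis increment and decrement operations, and the limit `MAX_IN_FLIGHT`, the key format, and the function names are assumptions made for illustration.

```python
class InFlightCounter:
    """In-memory stand-in for a Redis counter (hypothetical, for illustration)."""

    def __init__(self):
        self._counts = {}

    def incr(self, key):
        self._counts[key] = self._counts.get(key, 0) + 1
        return self._counts[key]

    def decr(self, key):
        self._counts[key] = self._counts.get(key, 0) - 1
        return self._counts[key]


MAX_IN_FLIGHT = 2  # assumed per-account limit, not a Zapier number


def try_start_task(counter, account_id):
    """Reserve a slot for a task; return False if the account is throttled."""
    key = f"in_flight:{account_id}"
    if counter.incr(key) > MAX_IN_FLIGHT:
        counter.decr(key)  # roll back the reservation
        return False
    return True


def finish_task(counter, account_id):
    """Release the slot once the task completes."""
    counter.decr(f"in_flight:{account_id}")


counter = InFlightCounter()
results = [try_start_task(counter, "acct-1") for _ in range(3)]
finish_task(counter, "acct-1")
after_finish = try_start_task(counter, "acct-1")
print(results, after_finish)
```

The third task is rejected because two are already in flight. Once one finishes, a new task can start again.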

They run custom scripts provided by the user on AWS Lambda.

Lambda is a serverless computing platform.

2. Zap Implementation

They use a directed rooted tree to represent a workflow.

Each tree node represents a task.

And the directed rooted tree is stored in MySQL for simplicity.

A directed rooted tree is a directed graph with all edges pointing away from the root node.
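One common way to keep such a tree in a relational database is a table where each row points at its parent. The sketch below assumes that layout. The table name, column names, and sample rows are hypothetical, and SQLite stands in for MySQL so the example runs with the standard library only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE nodes (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES nodes(id),  -- NULL marks the root
        action    TEXT NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO nodes (id, parent_id, action) VALUES (?, ?, ?)",
    [
        (1, None, "calendar_event_created"),
        (2, 1, "send_slack_notification"),
        (3, 1, "add_google_sheets_row"),
    ],
)


def walk_tree(conn):
    """Return (action, depth) pairs, starting at the root and moving outward."""
    return conn.execute(
        """
        WITH RECURSIVE walk(id, action, depth) AS (
            SELECT id, action, 0 FROM nodes WHERE parent_id IS NULL
            UNION ALL
            SELECT n.id, n.action, w.depth + 1
            FROM nodes AS n JOIN walk AS w ON n.parent_id = w.id
        )
        SELECT action, depth FROM walk ORDER BY depth, id
        """
    ).fetchall()


order = walk_tree(conn)
print(order)
```

Every edge points from a parent to its child, so the query reaches each task exactly once when it starts from the root.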

Directed Rooted Tree Representing Tasks

Also, tasks are kept independent of each other. Put another way, a task consumes data from the file system, performs API calls, and returns results.

So tasks are unaware of their position in the workflow.

A workflow engine orchestrates the tasks. In other words, it decides the task execution order based on the directed rooted tree.
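The split between independent tasks and an engine that knows the tree can be sketched as follows. This is an illustration under assumptions, not Zapier's engine. The task functions, the `CHILDREN` map, and `run_workflow` are hypothetical names, and data is passed in memory instead of through the file system.

```python
from collections import deque


# Each task only sees its input and returns its output.
def trigger(data):
    return {"event": "Team sync"}


def notify_slack(data):
    return {"message": f"New event: {data['event']}"}


def add_sheet_row(data):
    return {"row": [data["event"]]}


TASKS = {"trigger": trigger, "slack": notify_slack, "sheets": add_sheet_row}

# The tree lives outside the tasks: parent name -> child names.
CHILDREN = {"trigger": ["slack", "sheets"], "slack": [], "sheets": []}


def run_workflow(root, tasks, children):
    """Run each task after its parent, handing the parent's output to it."""
    results = {}
    pending = deque([(root, {})])
    while pending:
        name, data = pending.popleft()
        output = tasks[name](data)
        results[name] = output
        for child in children[name]:
            pending.append((child, output))
    return results


results = run_workflow("trigger", TASKS, CHILDREN)
print(results)
```

Only `run_workflow` reads the tree. A task can move to a different position in the workflow without any change to its code.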

They store the session data of task execution in a dedicated MySQL instance. It’s used as a key-value store with softer consistency requirements and offers low operational complexity.

Besides, they use MySQL read-only replicas for long-running background tasks, because the data these tasks read doesn’t change often and some replication lag is tolerable.

3. Asynchronous Processing

A long-lived but idle connection to the web server consumes resources. Thus it’s expensive.

So they use a message queue to avoid waiting for a request to finish. It prevents problems due to timeouts and resource bottlenecks.

They use RabbitMQ and Celery to create a distributed workflow engine. Put another way, it’s used to schedule background tasks.

RabbitMQ is a lightweight message broker.

Celery is an asynchronous task queue based on distributed message passing. It supports real-time operations and scheduling.

Celery sends messages to workers using RabbitMQ.

Put another way, Celery is a task management framework. It provides a high-level API to schedule and trigger tasks.

RabbitMQ provides a lower-level messaging API that Celery builds on. And RabbitMQ is one of several message brokers that Celery supports.

Asynchronous Processing of Tasks

They send the task ID to the message queue and the worker gets the task data from the database.

And the worker executes the task.
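The pattern of queuing only the task ID can be sketched with the standard library. Here `queue.Queue` stands in for RabbitMQ, a thread stands in for a Celery worker, and the `TASK_DB` dictionary stands in for the database. All names and sample records are assumptions for illustration.

```python
import queue
import threading

# Stand-in for the database that holds the full task data.
TASK_DB = {
    101: {"action": "send_slack_notification", "payload": "Team sync at 10:00"},
    102: {"action": "add_google_sheets_row", "payload": "Team sync"},
}

task_queue = queue.Queue()  # stand-in for the message queue
completed = []


def worker():
    """Pull task IDs off the queue, load the task data, and execute it."""
    while True:
        task_id = task_queue.get()
        if task_id is None:  # sentinel: no more work
            task_queue.task_done()
            break
        task = TASK_DB[task_id]  # the message carried only the ID
        completed.append((task_id, task["action"]))
        task_queue.task_done()


thread = threading.Thread(target=worker)
thread.start()
for task_id in (101, 102):
    task_queue.put(task_id)  # enqueue the ID, not the payload
task_queue.put(None)
thread.join()
print(completed)
```

Messages stay small this way, and the worker always reads the latest task data at execution time.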

4. Zap History

Zap history shows a user the list of tasks that ran in their account.

They use GraphQL and Next.js API routes to fetch the Zap history.

Python Django runs on the backend.

GraphQL is a query language for APIs and a runtime for executing those queries.

Storing the Results of Zap Execution

They store the results of a Zap execution in AWS S3 and emit an event to Kafka.

The emitted event contains enough information to process the execution result.

AWS S3 is an object storage service.

Kafka is a distributed event store and stream-processing platform.

Indexing the Results of Zap Execution

They use an indexer service to consume the events from Kafka. It also downloads the relevant data from S3.

And they index the processed Zap execution data in the Elasticsearch cluster.

Put another way, Elasticsearch stores the historical activity of Zaps.

Elasticsearch is a search engine based on the Lucene library. It offers full-text search through an HTTP web interface.
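The store, emit, and index flow can be sketched end to end with in-memory stand-ins. A dictionary plays the role of S3, a deque plays the Kafka topic, and a term-to-run-ID map plays Elasticsearch. The function names, key format, and event fields are assumptions, not Zapier's schema.

```python
import json
from collections import defaultdict, deque

object_store = {}                # stand-in for S3: key -> bytes
event_log = deque()              # stand-in for a Kafka topic
search_index = defaultdict(set)  # stand-in for Elasticsearch: term -> run IDs


def record_execution(run_id, result):
    """Store the full result, then emit a small event that points at it."""
    key = f"runs/{run_id}.json"
    object_store[key] = json.dumps(result).encode("utf-8")
    event_log.append({"run_id": run_id, "s3_key": key})


def run_indexer():
    """Consume events, download each result, and index its summary terms."""
    while event_log:
        event = event_log.popleft()
        result = json.loads(object_store[event["s3_key"]])
        for term in result["summary"].lower().split():
            search_index[term].add(event["run_id"])


def search(term):
    return sorted(search_index.get(term.lower(), set()))


record_execution("run-1", {"summary": "Slack notification sent"})
record_execution("run-2", {"summary": "Sheets row added"})
run_indexer()
print(search("slack"), search("row"))
```

The event carries a pointer to the stored result and not the result itself, so the event stream stays light while the indexer can still fetch everything it needs.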

5. Scalability

They use a combination of auto-scaling and auto-replacement for resilience.

Besides, they scale horizontally and replicate infrastructure for high availability.

They use jitter to spread out spikes in workload when many tasks get scheduled for the same time.

Put another way, they don’t guarantee that every task will run at the exact scheduled time.
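Jitter can be as simple as adding a small random delay to each scheduled time. The sketch below assumes a uniform delay of up to 60 seconds. The window size and the `with_jitter` helper are illustrative choices, not values from Zapier.

```python
import random
from datetime import datetime, timedelta


def with_jitter(scheduled_at, max_jitter_seconds, rng):
    """Shift a scheduled time forward by a random delay within the window."""
    offset = rng.uniform(0, max_jitter_seconds)
    return scheduled_at + timedelta(seconds=offset)


rng = random.Random(42)  # seeded so the example is repeatable
top_of_hour = datetime(2024, 2, 25, 9, 0, 0)

# Five tasks all scheduled for 09:00 end up spread across the next minute.
run_times = [with_jitter(top_of_hour, 60, rng) for _ in range(5)]
for run_time in run_times:
    print(run_time.time())
```

The tasks still run close to the requested time, but they no longer hit the workers in the same instant.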

Scaling Workers for Task Execution

They enqueue tasks on RabbitMQ. And the tasks get consumed by workers running on Kubernetes.

Kubernetes is a container orchestration system for automating software deployment, scaling, and management.

They scale the workers based on CPU usage and the number of ready tasks in RabbitMQ. This allows them to handle varying load.
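A scaling decision that combines both signals might look like the sketch below. It is modeled on how the Kubernetes Horizontal Pod Autoscaler computes a replica count per metric and then takes the largest one. The targets, the limits, and the `desired_workers` function are assumptions, not Zapier's configuration.

```python
import math


def desired_workers(current, cpu_percent, target_cpu_percent,
                    ready_tasks, tasks_per_worker,
                    min_workers=1, max_workers=50):
    """Pick a worker count that satisfies both the CPU and the queue signal."""
    # Scale the current count by how far CPU usage is from its target.
    by_cpu = math.ceil(current * cpu_percent / target_cpu_percent)
    # Enough workers so each one handles at most tasks_per_worker ready tasks.
    by_queue = math.ceil(ready_tasks / tasks_per_worker)
    # Use the larger demand, then clamp to the allowed range.
    return max(min_workers, min(max_workers, max(by_cpu, by_queue)))


print(desired_workers(4, 90, 60, 100, 10))    # queue backlog dominates
print(desired_workers(4, 30, 60, 0, 10))      # low load, scale down
print(desired_workers(4, 90, 60, 10000, 10))  # capped at max_workers
```

Taking the larger of the two signals means a long queue triggers a scale-up even when the CPU is idle, for example when workers mostly wait on external APIs.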


Zapier remains one of the leading automation tools in the market.

And this case study indicates that a simple tech stack with proven technologies is enough for high scalability.



References

  • Scaling Zapier to Automate Billions of Tasks

  • The architecture behind Zapier's Zap History pages

  • The Zapier Tech Stack

  • How Zapier uses KEDA

  • KEDA at Zapier

  • Async Celery by Example: Why and How

  • Scaling Zapier to Automate Billions of Tasks on Hacker News

  • What is the relationship between Celery and RabbitMQ?
