The System Design Newsletter

Designing a Payment Backend with Stripe Integration

#152: The complete engineering blueprint for a Stripe-integrated payment backend

Hayk and Neo Kim
Jun 10, 2026
∙ Paid

  • Block diagrams created using Eraser.


A payment backend is one of the few systems where a bug doesn’t just cause a bad user experience; it can make real money vanish from user accounts.

This makes payment system design fundamentally different from most backend engineering…

The core challenge here is NOT performance or feature richness, but correctness under failure. Every design decision, from database choice to retry policy, must answer one question: what happens when this component fails mid-transaction?

This is a complete system design blueprint for building a payment backend that integrates Stripe as the payment service provider.

Onward.



I want to reintroduce Hayk Simonyan as a guest author.

He’s a senior software engineer specializing in helping developers break through their career plateaus and secure senior roles.

If you want to master the essential system design skills and land senior developer roles, I highly recommend checking out Hayk’s YouTube channel.

His approach focuses on what top employers actually care about: system design expertise, advanced project experience, and elite-level interview performance.


Inside this newsletter, you’ll get:

  • Three approaches to accepting payments. Why no-code checkout hits limits fast, when building your own processor makes sense, and why integrating a PSP like Stripe is the right call for most companies.

  • How money actually moves. The full card transaction lifecycle from authorization to settlement, including the six entities involved and the engineering risks most developers miss.

  • High-level architecture. The seven components of a production payment backend, how they split the synchronous and async paths, and why each boundary exists.

  • Idempotency and exactly-once processing. How to design a system that never double-charges, even when servers crash mid-transaction, using idempotency keys, recovery points, and the atomic phases pattern.

  • Webhook handling and the payment state machine. How to process Stripe webhooks reliably when duplicates and out-of-order delivery are expected, and how to enforce valid state transitions at the database level.

  • Designing for high availability. How to approach 99.999% uptime across your API layer, database, and async workers, and why a circuit breaker in front of Stripe protects your own system more than it protects Stripe.



Three Approaches to Accepting Payments

Before diving into the design, it’s worth understanding the landscape…

When a company needs to accept payments, there are three broad approaches, and choosing the wrong one for your scale wastes either years of engineering time or millions in unnecessary fees.

Option 1: Build your own payment processor

This means connecting directly to card networks (Visa, Mastercard), acquiring banking licenses, achieving PCI DSS Level 1 compliance (PCI DSS, the Payment Card Industry Data Security Standard, is the set of security requirements every company that touches card data must comply with), building your own fraud detection, and managing relationships with issuing banks.

The upside is lower transaction costs.

At a massive scale, even saving 0.5% off each transaction adds up to hundreds of millions per year. Amazon, Uber, and Airbnb have all moved in this direction for parts of their payment stack.

The downside is the cost to get there: acquiring necessary licenses alone takes 12-24 months, costs millions, and comes with ongoing regulatory obligations. So this approach only makes sense once you are processing billions in annual payment volume.

For anyone else, it is premature optimization at its extreme…

Option 2: Use a payment service provider (PSP) like Stripe, PayPal, Adyen, or Braintree

A PSP is a company that handles the entire payment processing chain on your behalf: card network integrations, banking relationships, fraud tooling, PCI DSS compliance, and global payment method support.

You pay a per-transaction fee, typically around 2.9% + $0.30 with Stripe, in exchange for not having to build or maintain any of that infrastructure. This is the right approach for most companies, from early-stage startups to large-scale platforms.

Even Shopify, which processes billions in payments annually, built its own payment product (Shopify Payments) on top of Stripe’s infrastructure rather than connecting directly to card networks.

Option 3: Use a no-code payment flow, such as Stripe Checkout

Stripe offers hosted checkout pages and payment links that require almost no backend integration.

You create a checkout session, redirect the user to Stripe’s hosted page, and Stripe handles everything: UI, payment method selection, and confirmation.

This is the fastest way to accept payments, and it works well for simple use cases: selling a single product, collecting donations, or building a quick prototype.

The trade-off is limited customization and less control over the payment experience.

You cannot deeply embed it in your own UI, easily build custom subscription logic, or react to payment events in real time with full control over the flow.


How Money Moves Through the Card Network

Understanding the payment flow end-to-end is a prerequisite to designing a backend that handles it correctly…

Six entities participate in each card transaction:

  1. Cardholder

  2. Merchant (Business or individual selling the product or service)

  3. PSP (Payment Service Provider, in this case, Stripe)

  4. Acquiring bank (Merchant’s bank)

  5. Card network (Visa/Mastercard)

  6. Issuing bank (Cardholder’s bank)

Money flows through 3 distinct phases, each with different timing and failure modes…

Phase 1: Authorization (1-3 seconds)

The process starts on the backend, where a PaymentIntent gets created via Stripe’s API to track the transaction lifecycle. This returns a client_secret to the frontend.

The customer then enters their card details into Stripe Elements. These are secure iframes hosted by Stripe, even though they look like part of your site. They are driven by Stripe.js, a JavaScript library that Stripe requires you to load directly from its servers (for security reasons).

Using Stripe.js, the client-side code tokenizes the card details so the PAN (Primary Account Number: the long number on a credit or debit card, typically 13 to 19 digits and most commonly 16) never reaches the merchant’s server.

What is PaymentIntent?

A PaymentIntent is a stateful object Stripe uses to track a single payment from start to finish.

Think of it as the source of truth for a transaction…

In older systems, you’d just send a “charge” request and hope it worked. With modern payments, things are more complex because of two-factor authentication (e.g., 3D Secure) or fraud checks.

PaymentIntent manages this by moving through different states:

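  • Requires payment method: You created the intent, but the user hasn’t entered their card details yet.

  • Requires action: The user must verify the payment, for example in their bank app.

  • Processing: Stripe is talking to the banks.

  • Succeeded: The payment is complete and the funds are captured. (With manual capture, an authorized but not yet captured payment waits in a separate requires_capture status.)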

Stripe formats an ISO 8583 authorization message and sends it to the acquiring bank, which identifies the card network via the BIN (the first 6-8 digits of the card number) and forwards the request.

The card network routes it to the issuing bank, which checks card validity, available funds, CVV/AVS match, and fraud risk models.

  • The issuer returns a two-digit response code, approve or decline, back through the entire chain.

  • If approved, a hold gets placed on the cardholder’s available balance. But no money moves yet.

Mastercard reports an average network response time of 130 milliseconds; the full round-trip, including all hops, completes in 1-3 seconds.

Visa’s network handles a peak of 56,000+ messages per second.

Phase 2: Capture (immediate/delayed)

Capture is when the merchant tells the acquiring bank (the merchant’s bank) to finalize the authorized amount.

For digital goods and subscriptions, capture occurs immediately. Stripe’s default is capture_method: automatic.

For physical goods, hotels, or ride-hailing, capture is delayed until fulfillment. You can capture an amount less than or equal to what was authorized, but never more. Authorization holds typically last 5 to 10 days.

Visa allows 10 days for standard e-commerce and up to 31 days for lodging.

Here’s the Engineering Risk:

If you don’t capture within the hold window (plan for about 7 days unless your card network and merchant category allow longer), the issuing bank might cancel the hold. This is a common source of bugs. Your state machine must treat AUTHORIZED and CAPTURED as distinct states. If your system thinks a payment is authorized but the bank has dropped the hold, the capture call will fail.

So you need a background job to detect these “stuck” payments and mark them as expired.
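A minimal sketch of that background job, assuming an in-memory list of payments and a 7-day window. The `Payment` fields, the `EXPIRED` state name, and both function names are hypothetical; a real job would query the payments table instead.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Assumed hold window; the real value depends on card network and merchant category.
AUTHORIZATION_WINDOW = timedelta(days=7)


@dataclass
class Payment:
    payment_id: str
    status: str              # e.g. "AUTHORIZED", "CAPTURED", "EXPIRED"
    authorized_at: datetime
    authorized_amount: int   # minor units (cents)


def expire_stale_authorizations(payments, now):
    """Mark AUTHORIZED payments whose hold window has passed as EXPIRED."""
    expired = []
    for payment in payments:
        too_old = now - payment.authorized_at > AUTHORIZATION_WINDOW
        if payment.status == "AUTHORIZED" and too_old:
            payment.status = "EXPIRED"
            expired.append(payment.payment_id)
    return expired


def validate_capture(payment, amount, now):
    """Reject captures locally that the issuer would be expected to refuse."""
    if payment.status != "AUTHORIZED":
        raise ValueError(f"cannot capture a payment in state {payment.status}")
    if now - payment.authorized_at > AUTHORIZATION_WINDOW:
        raise ValueError("authorization hold has likely expired")
    if not 0 < amount <= payment.authorized_amount:
        raise ValueError("capture amount must be positive and at most the authorized amount")
```

Running `expire_stale_authorizations` on a schedule keeps the internal state from drifting away from what the issuing bank still honors, and `validate_capture` encodes the "less than or equal, never more" rule before any PSP call is made.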

Phase 3: Clearing and settlement (T+1 to T+3)

At the end of the business day, captured transactions get batched and sent to the card network for clearing.

The network calculates fees and exchanges transaction files overnight…

Settlement is when the actual money moves. The issuing bank sends the funds (minus fees) to the card network, which then passes them to the acquiring bank.

The acquiring bank finally deposits them into the merchant’s account.

In banking, “T” stands for the Transaction Date. T+2 means the money arrives two business days after the transaction. Stripe’s US default is T+2, though new accounts usually have an initial hold of 7 to 14 days.
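To make the T+N notation concrete, here is a small sketch that counts business days forward from a transaction date. It assumes weekends are the only non-business days; bank holidays are ignored, and `add_business_days` is a hypothetical helper name.

```python
from datetime import date, timedelta


def add_business_days(start, days):
    """Return the date `days` business days after `start` (weekends skipped, holidays ignored)."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            remaining -= 1
    return current


# A payment made on Friday 2026-06-12 with T+2 settlement:
print(add_business_days(date(2026, 6, 12), 2))  # 2026-06-16 (Tuesday)
```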

Fee Breakdown

On a $100 US online transaction, the 2.9% + $0.30 fee ($3.20 in total) gets split roughly three ways; the figures below are approximations:

  • Interchange (~$2.05): This goes to the Issuing Bank (the customer’s bank). This is the largest cut. It covers the bank’s risk and pays for the customer’s credit card rewards.

  • Assessment (~$0.16): This goes to the Card Network (Visa or Mastercard) for using their rails.

  • Markup (~$0.70): This is what Stripe keeps for providing the API, security, and infrastructure.
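The arithmetic can be checked in integer cents, which is also how amounts should be stored. This is a sketch: the half-up rounding rule and the function name are assumptions, not Stripe's documented behavior. Note that with the approximate interchange and assessment figures above, the remainder of the $3.20 fee comes to about $0.99 rather than $0.70, so treat the split as a rough illustration rather than exact accounting.

```python
def percentage_plus_fixed_fee_cents(amount_cents, percent_bps=290, fixed_cents=30):
    """Fee for a percentage-plus-fixed price, in integer cents, rounded half up."""
    # Basis points keep the arithmetic in integers: 290 bps = 2.9%.
    percent_part = (amount_cents * percent_bps + 5000) // 10000
    return percent_part + fixed_cents


total = percentage_plus_fixed_fee_cents(10_000)   # $100.00 charge
interchange, assessment = 205, 16                 # approximate figures from the text
remainder_for_psp = total - interchange - assessment
print(total, remainder_for_psp)                   # 320 99
```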


Functional and Non-Functional Requirements

Functional requirements for a payment backend serving a mid-to-large platform fall into 6 categories:

  1. Customers must be able to make one-time payments and save payment methods for reuse.

  2. Merchants or the platform must be able to accept payments, with support for marketplace-style split payments if needed.

  3. Refunds, both full and partial, must flow through a dedicated refund state machine.

  4. The system must handle subscriptions and recurring billing, including proration on plan changes, trial periods, and Stripe’s Smart Retries for failed renewal payments.

  5. The backend must process Stripe webhooks reliably, as webhooks are the source of truth for payment status.

  6. The entire payment lifecycle must be modeled as a finite-state machine with enforced transitions.
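As an illustration of the last requirement, a finite-state machine with enforced transitions can be sketched as a lookup table. The state names and the allowed-transition map below are hypothetical, chosen for this example; they are not Stripe's PaymentIntent statuses.

```python
from enum import Enum


class PaymentState(Enum):
    CREATED = "created"
    AUTHORIZED = "authorized"
    CAPTURED = "captured"
    FAILED = "failed"
    EXPIRED = "expired"
    REFUNDED = "refunded"


# Every transition not listed here is rejected.
ALLOWED = {
    PaymentState.CREATED: {PaymentState.AUTHORIZED, PaymentState.FAILED},
    PaymentState.AUTHORIZED: {PaymentState.CAPTURED, PaymentState.EXPIRED, PaymentState.FAILED},
    PaymentState.CAPTURED: {PaymentState.REFUNDED},
    PaymentState.FAILED: set(),
    PaymentState.EXPIRED: set(),
    PaymentState.REFUNDED: set(),
}


class InvalidTransition(Exception):
    pass


def transition(current, target):
    """Return the new state, or raise if the move is not allowed."""
    if current == target:
        return current  # duplicate event: applying it again is a no-op
    if target not in ALLOWED[current]:
        raise InvalidTransition(f"{current.value} -> {target.value}")
    return target
```

Treating a repeated transition as a no-op keeps duplicate webhook deliveries harmless, while the table rejects late or out-of-order events that would move a payment backwards.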

Non-functional requirements are where payment systems diverge sharply from typical backend services:

  • Exactly-once payment processing is the key requirement. A double charge on a $1,000 purchase means money taken from a person twice. True exactly-once delivery is impossible in distributed systems (Two Generals Problem), so the practical implementation is at-least-once delivery combined with idempotent processing and reconciliation.

  • High availability targets are extreme. Stripe itself maintains 99.999% API uptime.

  • Idempotency must be enforced at every layer: client-to-backend, backend-to-Stripe, and webhook processing. Every retry must produce the same result as the original request.

  • Consistency over availability. Payment systems are one of the few domains where the CAP theorem should tilt decisively toward consistency. A stale read that shows an incorrect balance or a lost write that drops a payment is far worse than brief unavailability. This is why every major payment platform, including Shopify, Uber, and Airbnb, uses SQL databases with ACID guarantees for core payment data.

Scale estimates for a mid-to-large platform

A platform processing 100,000 payments per day generates roughly 500,000-1,000,000 webhook events daily (each payment triggers 5-10 events across creation, authorization, capture, and related objects).
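A quick back-of-the-envelope check of those numbers, converted to average per-second rates. These are averages only; real traffic is bursty, so peak rates will be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60

payments_per_day = 100_000
events_per_payment = (5, 10)  # low and high estimates from the text

avg_payments_per_sec = payments_per_day / SECONDS_PER_DAY
events_per_day = tuple(payments_per_day * n for n in events_per_payment)
avg_events_per_sec = tuple(e / SECONDS_PER_DAY for e in events_per_day)

print(round(avg_payments_per_sec, 2))              # 1.16
print(events_per_day)                              # (500000, 1000000)
print([round(r, 1) for r in avg_events_per_sec])   # [5.8, 11.6]
```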

Stripe’s default API rate limit is 100 requests/second per account, with individual endpoints limited to 25 requests/second; higher limits are available by arrangement.

Payment API latency should be under 5 seconds end-to-end, including the Stripe round-trip, with internal service-to-service calls under 100ms.



High-Level Architecture

The architecture splits into seven components, each with a distinct responsibility.

It isolates the synchronous customer-facing path from asynchronous processing and keeps the webhook ingestion pipeline decoupled from business logic.

1 Payment API Service

It’s the synchronous entry point.

It receives payment requests from checkout, validates input, checks idempotency keys against the database, creates or retrieves the payment record, and returns an immediate response to the client.

For Stripe integrations, this service creates a PaymentIntent and returns the client_secret to the frontend, which uses Stripe.js to complete the payment (including 3D Secure challenges).

The API service should never block on downstream processing. It creates the payment record, dispatches work, and returns.
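The "create or retrieve" step can be sketched with a unique constraint on the idempotency key. This is a minimal illustration using SQLite from the standard library; the table layout and the function names `open_db` and `create_or_get_payment` are assumptions, and a production system would use its own SQL database and schema.

```python
import sqlite3
import uuid


def open_db():
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE payments (
               payment_id      TEXT PRIMARY KEY,
               idempotency_key TEXT NOT NULL UNIQUE,
               amount_cents    INTEGER NOT NULL,
               status          TEXT NOT NULL
           )"""
    )
    return db


def create_or_get_payment(db, idempotency_key, amount_cents):
    """Return (payment_id, created). A retry with the same key returns the original row."""
    payment_id = str(uuid.uuid4())
    try:
        with db:  # commits on success, rolls back on error
            db.execute(
                "INSERT INTO payments VALUES (?, ?, ?, 'CREATED')",
                (payment_id, idempotency_key, amount_cents),
            )
        return payment_id, True
    except sqlite3.IntegrityError:
        # The key already exists: this is a retry, so return the stored record.
        row = db.execute(
            "SELECT payment_id, amount_cents FROM payments WHERE idempotency_key = ?",
            (idempotency_key,),
        ).fetchone()
        if row[1] != amount_cents:
            raise ValueError("idempotency key reused with different parameters")
        return row[0], False
```

Letting the database reject the duplicate insert avoids a check-then-insert race: two concurrent requests with the same key cannot both create a payment.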

2 Stripe Integration Layer

It abstracts all Stripe-specific API calls behind a uniform interface.

This adapter handles Stripe’s error types, maps them to internal error codes, attaches idempotency keys to every POST request, and manages timeouts. Shopify and Airbnb both use this pattern; Airbnb calls it the “PSP adapter,” which isolates provider-specific logic from the core payment domain.

If you ever need to support a second PSP (Adyen, Braintree), only this layer changes.
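One way to picture the adapter boundary is an interface the core domain depends on, with provider-specific code behind it. Everything named below (`PaymentProvider`, `PaymentError`, `FakeProvider`, the error codes) is hypothetical and makes no network calls; a real Stripe-backed adapter would implement the same interface using Stripe's client library.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Protocol


class PaymentErrorCode(Enum):
    CARD_DECLINED = "card_declined"
    PROVIDER_UNAVAILABLE = "provider_unavailable"
    INVALID_REQUEST = "invalid_request"


class PaymentError(Exception):
    """Internal error type; provider-specific errors get mapped onto it."""

    def __init__(self, code, retryable):
        super().__init__(code.value)
        self.code = code
        self.retryable = retryable


@dataclass(frozen=True)
class AuthorizationResult:
    provider_reference: str
    status: str


class PaymentProvider(Protocol):
    def authorize(self, amount_cents: int, currency: str, idempotency_key: str) -> AuthorizationResult:
        ...


class FakeProvider:
    """In-memory stand-in for a provider adapter, useful in tests."""

    def __init__(self):
        self._seen = {}

    def authorize(self, amount_cents, currency, idempotency_key):
        if idempotency_key in self._seen:
            return self._seen[idempotency_key]  # same key, same result
        if amount_cents <= 0:
            raise PaymentError(PaymentErrorCode.INVALID_REQUEST, retryable=False)
        result = AuthorizationResult(f"fake_{len(self._seen) + 1}", "authorized")
        self._seen[idempotency_key] = result
        return result


def checkout(provider: PaymentProvider, amount_cents, idempotency_key):
    # Core payment logic sees only the interface, never a provider SDK.
    return provider.authorize(amount_cents, "usd", idempotency_key)
```

Because `checkout` depends only on the interface, swapping providers or testing without network access means passing a different object.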

3 Payment Database

This is the system of record.

It stores the current state of every payment, idempotency keys, and the immutable audit log. The schema design is covered in the next section, but the critical principle is this: the payments table holds a mutable current state (optimized for queries), while the payment events table holds an append-only, immutable log of every state change (optimized for audit, debugging, and reconciliation).

Both exist in the same SQL database and get updated within the same transaction.
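A minimal sketch of that principle, using SQLite from the standard library. The column names and the helpers `create_payment` and `change_status` are assumptions for illustration, not the schema from the full post.

```python
import sqlite3


def open_db():
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE payments (
            payment_id TEXT PRIMARY KEY,
            status     TEXT NOT NULL
        );
        CREATE TABLE payment_events (
            event_seq   INTEGER PRIMARY KEY AUTOINCREMENT,
            payment_id  TEXT NOT NULL REFERENCES payments(payment_id),
            from_status TEXT,
            to_status   TEXT NOT NULL
        );
        """
    )
    return db


def create_payment(db, payment_id):
    with db:  # one transaction for both tables
        db.execute("INSERT INTO payments (payment_id, status) VALUES (?, 'CREATED')", (payment_id,))
        db.execute(
            "INSERT INTO payment_events (payment_id, from_status, to_status) VALUES (?, NULL, 'CREATED')",
            (payment_id,),
        )


def change_status(db, payment_id, expected, new_status):
    """Update current state and append the audit event atomically."""
    with db:
        cur = db.execute(
            "UPDATE payments SET status = ? WHERE payment_id = ? AND status = ?",
            (new_status, payment_id, expected),
        )
        if cur.rowcount != 1:
            # Raising inside the block rolls the transaction back.
            raise RuntimeError("payment is not in the expected state")
        db.execute(
            "INSERT INTO payment_events (payment_id, from_status, to_status) VALUES (?, ?, ?)",
            (payment_id, expected, new_status),
        )
```

The `AND status = ?` guard makes the update conditional on the state the caller believes is current, so a stale or duplicate writer fails without leaving a misleading event in the log.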

4 Webhook Receiver

It’s a lightweight HTTP endpoint that does exactly three things: verify the Stripe signature (HMAC-SHA256 using the Stripe-Signature header), store the raw event and enqueue it for processing, and return 200 OK.

It must respond within seconds, since Stripe has an approximately 20-second timeout. All business logic happens asynchronously: workers pick the event up from the message queue.
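The signature check can be sketched with the standard library alone. The sketch assumes the header has the form `t=<timestamp>,v1=<signature>` and that the signed payload is the timestamp, a dot, and the raw request body; the function name and the 300-second tolerance are assumptions. In practice Stripe's official client libraries ship a helper for this, which is the safer choice.

```python
import hashlib
import hmac
import time


def verify_stripe_signature(payload: bytes, header: str, secret: str,
                            tolerance_seconds: int = 300, now=None) -> bool:
    """Return True if the Stripe-Signature header matches the raw payload."""
    timestamp = None
    candidates = []
    for part in header.split(","):
        key, _, value = part.strip().partition("=")
        if key == "t":
            timestamp = value
        elif key == "v1":
            candidates.append(value)
    if timestamp is None or not timestamp.isdigit() or not candidates:
        return False

    now = time.time() if now is None else now
    if abs(now - int(timestamp)) > tolerance_seconds:
        return False  # too old or too far in the future: possible replay

    signed_payload = timestamp.encode() + b"." + payload
    expected = hmac.new(secret.encode(), signed_payload, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return any(hmac.compare_digest(expected.encode(), c.encode()) for c in candidates)
```

The check must run against the raw request bytes. Parsing the JSON and re-serializing it first can change the byte sequence and make a valid signature fail.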

5 Message Queue

The message queue (Kafka, SQS, or RabbitMQ) decouples webhook receipt from processing and provides at-least-once delivery guarantees.

Uber uses Apache Kafka as the backbone of their payment platform’s async stream processing. The queue also supports the transactional outbox pattern: when a payment state change is committed to the database, an outbox record is written in the same transaction, then relayed to the queue by a separate process.
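The outbox pattern can be sketched as follows, again with SQLite standing in for the payment database. The table layout, the topic name, and the `publish` callback (a stand-in for a real queue producer) are all assumptions made for this example.

```python
import json
import sqlite3


def open_db():
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE payments (payment_id TEXT PRIMARY KEY, status TEXT NOT NULL);
        CREATE TABLE outbox (
            id        INTEGER PRIMARY KEY AUTOINCREMENT,
            topic     TEXT NOT NULL,
            payload   TEXT NOT NULL,
            published INTEGER NOT NULL DEFAULT 0
        );
        """
    )
    return db


def mark_captured(db, payment_id):
    """State change and outbox record commit together or not at all."""
    with db:
        db.execute("UPDATE payments SET status = 'CAPTURED' WHERE payment_id = ?", (payment_id,))
        db.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("payment.captured", json.dumps({"payment_id": payment_id})),
        )


def relay_once(db, publish):
    """Separate process: push unpublished rows to the queue, then mark them sent."""
    rows = db.execute(
        "SELECT id, topic, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, topic, payload in rows:
        publish(topic, payload)  # if this raises, the row stays unpublished and is retried
        with db:
            db.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

If the relay crashes after publishing but before marking the row, the message is sent again on the next run. That is the at-least-once guarantee in action, and it is why consumers must process messages idempotently.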

6 Background Job Workers

They handle 4 categories of async work:

  • Webhook event processing: dequeuing events and applying state transitions idempotently.

  • Retry workers: retrying failed PSP calls with exponential backoff.

  • A reconciliation worker: comparing internal records against Stripe’s records daily.

  • A stuck-payment detector: alerting on payments that sit in intermediate states beyond a configurable threshold.
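For the retry workers, exponential backoff can be sketched as below. The base delay, the cap, the use of full jitter, and the `RetryableError` type are assumptions for illustration. The retried operation must itself be idempotent, for example by sending the same idempotency key on every attempt.

```python
import random
import time


class RetryableError(Exception):
    """Raised by an operation for transient failures such as timeouts."""


def backoff_delays(attempts, base=0.5, cap=30.0):
    """Upper bounds for each wait: base, 2*base, 4*base, ... limited by cap."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]


def call_with_retries(operation, attempts=5, base=0.5, cap=30.0,
                      sleep=time.sleep, rng=random.random):
    """Call `operation` until it succeeds, retrying only on RetryableError."""
    delays = backoff_delays(attempts - 1, base, cap)
    for attempt in range(attempts):
        try:
            return operation()
        except RetryableError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            # Full jitter: wait a random fraction of the current upper bound.
            sleep(rng() * delays[attempt])
```

Jitter spreads retries from many workers over time, so a brief provider outage is not followed by a synchronized burst of retries.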

7 Ledger

It provides double-entry bookkeeping.

Every money movement is recorded as a balanced pair of debit and credit entries. Uber explicitly built its next-generation payment platform on double-entry bookkeeping for auditability, and Stripe’s own ledger system logs approximately 5 billion money-movement events daily.

The ledger is append-only.

If a mistake needs correction, a new reversing entry is inserted, never an update or delete.
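A minimal in-memory sketch of an append-only, double-entry ledger. The account names and the `Ledger` API are hypothetical; a real ledger would live in the SQL database with the same invariants enforced there.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Entry:
    entry_id: int
    transaction_id: str
    account: str
    debit_cents: int
    credit_cents: int


class Ledger:
    def __init__(self):
        self._entries = []   # only ever appended to
        self._ids = count(1)

    @property
    def entries(self):
        return tuple(self._entries)

    def post(self, transaction_id, debit_account, credit_account, amount_cents):
        """Record one money movement as a balanced debit/credit pair."""
        if amount_cents <= 0:
            raise ValueError("amount must be positive")
        self._entries.append(Entry(next(self._ids), transaction_id, debit_account, amount_cents, 0))
        self._entries.append(Entry(next(self._ids), transaction_id, credit_account, 0, amount_cents))

    def reverse(self, transaction_id, reversal_id):
        """Correct a mistake by appending mirror-image entries, never by editing."""
        originals = [e for e in self._entries if e.transaction_id == transaction_id]
        if not originals:
            raise KeyError(transaction_id)
        for e in originals:
            self._entries.append(
                Entry(next(self._ids), reversal_id, e.account, e.credit_cents, e.debit_cents)
            )

    def balance(self, account):
        return sum(e.debit_cents - e.credit_cents for e in self._entries if e.account == account)

    def is_balanced(self):
        return sum(e.debit_cents for e in self._entries) == sum(e.credit_cents for e in self._entries)
```

Because every posting adds equal debits and credits, `is_balanced()` holding across the whole ledger is a cheap invariant a reconciliation job can check.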

The high-level architecture tells us what to build, but database design is where correctness is either enforced or lost. Every pattern we covered above (idempotency, exactly-once processing, the state machine) depends on the database schema doing the right thing.

If we get this wrong, no amount of application-level logic will matter.

So let's see what the database design looks like…


Database Schema to Enforce Correctness
