I Struggled With System Design Interview Until I Learned This Framework

#128: How I Failed My Amazon System Design Interview (And What I Learned)

Mar 07, 2026

In 2021, I applied to Amazon and failed the system design interview.

I knew something was off during the interview, but I couldn’t figure out what. At that time, I was a Software Engineer at Trainline on the Search/Checkout Web Services team, building software for over 100 million customers across Europe. I could code, I understood distributed systems, and I had shipped real features.

On paper, everything looked fine.

Looking back, I can see exactly what I missed…

I jumped straight into designing without clarifying requirements, never validated my assumptions, and never asked what actually mattered. I just started drawing boxes and arrows. I felt great in the moment and got smiles from the interviewer, but ultimately I failed.

Fast forward to today…

I’m an L5 Software Engineer at AWS and have contributed to CloudWatch, EC2, and Regional Services. I also conduct system design interviews for candidates. That journey from failure to interviewer taught me something important: most engineers who fail these interviews aren’t bad engineers. They just don’t have a framework for turning vague problems into conversations.

This newsletter is everything I wish I had known before my 2021 interview.

I’m going to share what you actually need to pass: the framework, the pacing, the collaboration checkpoints, and the mindset that separates candidates who pass from those who don’t.

Before we get into frameworks and strategies, let’s explore why these interviews feel so different from everything else you do as an engineer.

Onward.

I want to introduce Abdirahman Jama as the guest author.

He’s a Software Engineer at AWS, where he’s contributed to production services like CloudWatch and EC2. He also conducts system design interviews for engineering candidates.

Before AWS, he worked at Trainline and BT, building software for over 100 million customers across Europe. He was an AWS Community Builder and served as a coding instructor at Code First Girls in collaboration with Skyscanner.

He regularly shares free software engineering tips, system design breakdowns, and career advice for engineers breaking into big tech.

Why System Design Interviews Feel Hard

In coding interviews, the problem is clear: “Write a function that does X.”

You can write the function, add some tests, and confirm whether it works. The feedback loop is immediate and obvious.

When you’re building systems at work, you have context.

You know your users, your constraints, and the internal components you can integrate with. You have teammates to bounce ideas off and to challenge your assumptions. You have time to assess different options, weigh pros and cons, and make informed decisions. And most importantly, you can use Google.

System design interviews strip all of that away:

You’re given a vague problem like “Design ChatGPT” and roughly 30-40 minutes to propose a solution. You can’t search for anything. Yet somehow you’re expected to show enough depth and breadth to cover important concepts.

Now that you understand the challenge, let’s look at what’s actually being evaluated. Knowing the scoring criteria changes how you prepare.

What Interviewers Look For (& Mental Model You Need)

Here’s what I learned as an interviewer: we’re not looking for the “right answer.”

There isn’t one! We’re evaluating whether you can think “systematically” about ambiguous problems…

Here are five dimensions most companies will assess you on:

1. Communication and Collaboration

Do you ask clarifying questions?
Do you validate assumptions throughout the interview?
Do you treat the interviewer as a partner, or do you just talk at them for 40 minutes straight?

This is crucial because it shows whether you’d be a good teammate.

2. Structured Thinking

Can you break down a vague problem systematically?
Do you tackle things logically or jump around between different database choices, caching strategies, and API design?

Strong candidates show how they can organise their thoughts even when the problem is ambiguous.

3. Technical Judgment

When you pick a technology, a database, a cache, or a message queue, can you explain why?
Can you walk through the trade-offs? For example, why DynamoDB over PostgreSQL for this specific use case?

This shows you choose tools based on requirements, not habit.

4. Practical Depth

Do you think about failure modes, scale, and operations?
Or do you only design the happy path?

This is where we tell apart engineers who’ve shipped production systems from those who haven’t. Real systems fail. Databases go down. Networks partition. Strong candidates anticipate and design for this.

5. Time Management

Can you pace yourself to cover breadth and depth in 40 minutes?
Or do you spend too long on a single area and run out of time for the rest?

It’s always worth asking your interviewer whether there’s an area they’d like you to focus on. If there isn’t, propose what you’ll focus on and get alignment before going deep.

Between my failed 2021 interview and the ones I passed, the biggest shift was adopting a mental model, a framework that works for any system design problem. This framework keeps people organised and shows how to solve ambiguous problems.

Let’s walk through each step…

Framework: 5 Simple Steps to System Design Interviews

Step 1: Clarify the problem

Goal: Understand the problem before you design anything.
Estimated time: 5 minutes

Start by understanding what’s in scope and what isn’t:

Are you focusing on Feature A or Feature B, or both?
Are you targeting a specific platform?
How many users are you building for?
Where are they located?

These aren’t just theoretical questions…

The answers can fundamentally change your design.

Functional Requirements (What Are We Building?)

Be specific about what your system needs to do:

What are the main use cases?
Who are the users?
What core features are you building?
What are you explicitly leaving out of scope?

Non-Functional Requirements (What Are the Constraints?)

This is where numbers matter:

How many daily and monthly users do you have?
How many requests per second?
What’s the read-to-write ratio?
What are the latency requirements? Is 200ms acceptable, or do we need sub-100ms?
What’s an acceptable downtime?

These constraints will drive every technical decision you make…

A system for 1,000 users looks completely different from one serving 200 million daily active users.
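To make that difference concrete, here is a minimal back-of-the-envelope sketch. The two user counts come from the comparison above; the requests per user, the peak multiplier, and the read fraction are illustrative assumptions you would confirm with your interviewer, not figures from any real system.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def estimate_load(daily_active_users, requests_per_user_per_day,
                  peak_multiplier, read_fraction):
    """Return rough average and peak requests per second, split into reads and writes."""
    total_requests = daily_active_users * requests_per_user_per_day
    average_rps = total_requests / SECONDS_PER_DAY
    peak_rps = average_rps * peak_multiplier
    return {
        "average_rps": average_rps,
        "peak_rps": peak_rps,
        "peak_read_rps": peak_rps * read_fraction,
        "peak_write_rps": peak_rps * (1 - read_fraction),
    }


# Assumed inputs: 10 requests per user per day, peak traffic 3x the average, 90% reads.
small = estimate_load(1_000, 10, 3, 0.9)
large = estimate_load(200_000_000, 10, 3, 0.9)

for name, load in (("1,000 users", small), ("200M daily active users", large)):
    print(f"{name}: ~{load['average_rps']:,.1f} avg rps, ~{load['peak_rps']:,.1f} peak rps")
```

Under these assumptions, the small system averages well under one request per second, while the large one averages roughly 23,000. The exact numbers matter less than the order of magnitude, because that is what decides whether a single server is enough.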

TIP: Check with your interviewer and ask if the scope and requirements make sense.

Step 2: Define Core Data & APIs

Goal: Model what you’re storing and how clients interact with it.
Estimated time: 10 minutes

Data Model

Identify the core entities and their attributes:

What are the main “things” in your system? Think about how they relate to each other.
Define your primary keys, foreign keys, and the indexes you’ll need for common queries.
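As an illustration, here is a small sketch of a data model written as SQL and run through Python's built-in sqlite3 module. The `users` and `messages` entities, their columns, and the index name are hypothetical examples chosen for this sketch; in an interview you would derive them from the requirements you agreed on in Step 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked

conn.executescript("""
CREATE TABLE users (
    id       INTEGER PRIMARY KEY,           -- primary key
    username TEXT NOT NULL UNIQUE
);
CREATE TABLE messages (
    id         INTEGER PRIMARY KEY,
    sender_id  INTEGER NOT NULL REFERENCES users(id),  -- foreign key
    body       TEXT NOT NULL,
    created_at TEXT NOT NULL
);
-- Index for the assumed common query: "latest messages sent by a user".
CREATE INDEX idx_messages_sender_created ON messages (sender_id, created_at);
""")

conn.execute("INSERT INTO users (id, username) VALUES (1, 'ada')")
conn.execute(
    "INSERT INTO messages (sender_id, body, created_at) VALUES (?, ?, ?)",
    (1, "hello", "2026-03-07T10:00:00Z"),
)

latest = conn.execute(
    "SELECT body FROM messages WHERE sender_id = ? ORDER BY created_at DESC LIMIT 10",
    (1,),
).fetchall()

# The foreign key rejects a message whose sender does not exist.
try:
    conn.execute(
        "INSERT INTO messages (sender_id, body, created_at) VALUES (?, ?, ?)",
        (999, "orphan", "2026-03-07T10:01:00Z"),
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(latest, orphan_rejected)
```

The point of the sketch is the shape of the answer: name the entities, say which key links them, and tie each index to a query you expect to run often.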

Data Characteristics

Understand the shape of your data:

How big is each record?
Is the data append-only or mutable?
What’s the growth rate?
How will the data be queried? Are you doing point lookups by ID or complex joins?
Which data needs strong consistency, and what can be eventually consistent?
What absolutely cannot be lost?

API Contracts

Define your main endpoints with clear request and response formats. Think about pagination strategies, error handling, rate limiting, and versioning.
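Here is one way an endpoint contract with cursor-based pagination and a consistent error shape could look. This is a sketch only: the `GET /v1/messages` route, the field names, the opaque base64 cursor, and the limit bounds are all assumptions made for illustration, and an in-memory list stands in for the database.

```python
import base64
import json
from dataclasses import asdict, dataclass


@dataclass
class Message:
    id: int
    body: str


# Stand-in for a table ordered by primary key.
MESSAGES = [Message(id=i, body=f"message {i}") for i in range(1, 8)]
MAX_LIMIT = 100  # assumed upper bound on page size


def encode_cursor(last_id):
    """Pack the last seen ID into an opaque token so clients cannot rely on its format."""
    raw = json.dumps({"last_id": last_id}).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor):
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))["last_id"]


def error(status, code, message):
    return {"status": status, "error": {"code": code, "message": message}}


def list_messages(limit=3, cursor=None):
    """Hypothetical handler for GET /v1/messages?limit=...&cursor=..."""
    if not 1 <= limit <= MAX_LIMIT:
        return error(400, "invalid_limit", f"limit must be between 1 and {MAX_LIMIT}")
    try:
        last_id = decode_cursor(cursor) if cursor else 0
    except (ValueError, KeyError, TypeError):
        return error(400, "invalid_cursor", "cursor is malformed")
    # Fetch one extra row to learn whether another page exists.
    rows = [m for m in MESSAGES if m.id > last_id][: limit + 1]
    page, has_more = rows[:limit], len(rows) > limit
    return {
        "status": 200,
        "data": [asdict(m) for m in page],
        "next_cursor": encode_cursor(page[-1].id) if has_more else None,
    }


first = list_messages(limit=3)
second = list_messages(limit=3, cursor=first["next_cursor"])
print([m["id"] for m in first["data"]], [m["id"] for m in second["data"]])
```

A cursor keyed on the last seen ID stays stable when new rows are inserted, which is the usual argument for it over offset-based pagination. Putting the version in the path (`/v1/`) is one common versioning choice among several.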

Database Choice

Pick a technology and explain why.

Don’t just say: “Use PostgreSQL.” Justify your choice based on the requirements, discuss alternatives, and explain why you chose one over another.

TIP: Check again with your interviewer: “Does this data model and API structure make sense for the requirements?”

Step 3: High-Level Architecture

Goal: Design system components and show how data flows through them.
Estimated time: 15 minutes

This step gets the largest share of your interview time.

Start by identifying your core components: load balancers, API servers, a cache layer, and queues. Keep this list focused and include only what your design needs.

Draw Data Flow

Walk through how a request moves through your system.

From the client, through the load balancer, to the API server, into the cache or database, and back. Do this for both reads and writes.

Discuss Asynchronous Workflows Where Appropriate

Explain why you’d decouple work into background processing, and discuss the trade-off: faster responses in exchange for eventual consistency.
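The sketch below shows the idea with an in-process queue and a worker thread. In a real design the queue would be a separate service, so treat `handle_upload`, the `202` response, and the uppercase "processing" step as hypothetical stand-ins.

```python
import queue
import threading

jobs = queue.Queue()
results = {}


def handle_upload(request_id, payload):
    """Respond quickly: enqueue the slow work and acknowledge the request."""
    jobs.put((request_id, payload))
    return {"status": 202, "request_id": request_id}  # accepted, not yet processed


def worker():
    """Background consumer that does the slow processing."""
    while True:
        item = jobs.get()
        if item is None:  # sentinel value tells the worker to stop
            jobs.task_done()
            break
        request_id, payload = item
        results[request_id] = payload.upper()  # stand-in for slow work
        jobs.task_done()


thread = threading.Thread(target=worker, daemon=True)
thread.start()

response = handle_upload("req-1", "hello")
# The caller already has a response here, but the result may not exist yet.
# That gap is the eventual-consistency trade-off.
jobs.join()  # wait for the worker; done here only so the demo is deterministic
print(response, results["req-1"])

jobs.put(None)
thread.join()
```

The request returns before the work is done, so the client needs some way to find out when the result is ready, such as polling a status endpoint or receiving a notification.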

Caching Strategy

Be specific about what you’re caching, for how long, and how you handle invalidation. Also explain the trade-off between freshness and performance.
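One common answer to those three questions is the cache-aside pattern with a time-to-live (TTL) and invalidation on write. The sketch below is a minimal in-memory version. The `TTLCache` class, the 60-second TTL, and the `user:1` key are assumptions for illustration, and a plain dictionary stands in for the database.

```python
import time


class TTLCache:
    """Tiny in-memory cache where every entry expires after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._store[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, self.clock() + self.ttl)

    def delete(self, key):
        self._store.pop(key, None)


database = {"user:1": {"name": "Ada"}}  # stand-in for the real database
stats = {"db_reads": 0}
cache = TTLCache(ttl_seconds=60)


def read_user(key):
    """Cache-aside read: try the cache, fall through to the database on a miss."""
    cached = cache.get(key)
    if cached is not None:
        return cached
    stats["db_reads"] += 1
    value = database[key]
    cache.set(key, value)
    return value


def write_user(key, value):
    """Write to the database, then invalidate so the next read sees fresh data."""
    database[key] = value
    cache.delete(key)


read_user("user:1")
read_user("user:1")  # served from the cache, no database read
write_user("user:1", {"name": "Grace"})
print(read_user("user:1"), stats)
```

A longer TTL means fewer database reads but a longer window in which a missed invalidation can serve stale data. Saying which side of that trade-off each piece of data can tolerate is the answer the interviewer is looking for.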

CAP Theorem Considerations

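For each part of your system, decide whether it should favour consistency or availability when a network partition occurs. In a distributed system partitions will happen, so tolerating them isn’t optional. Explain why different parts of the system may need different approaches.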

Step 4: Bottlenecks, Scale, and Reliability

Goal: Show you understand what breaks at scale and how to handle failures.
Estimated time: 5 minutes

Don’t just design the happy path; show you think about failure and scale.

Identify Bottlenecks

Where will your system struggle under load?

Name the component, quantify the limit, and propose a mitigation.

Single Points of Failure

What happens if a component goes down? Identify the risks and explain the mitigation.

Failure Modes and Graceful Degradation

Cache failure: fall back to a slower but functional path
Message queue backed up: serve stale data instead of failing requests
Network partition: degrade gracefully rather than returning errors
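The first item in that list can be sketched in a few lines. Everything here is hypothetical: `FlakyCache` simulates a cache client that can be switched to a failed state, and `slow_database_lookup` stands in for the slower source of truth.

```python
class CacheUnavailable(Exception):
    """Raised by the hypothetical cache client when the cache cannot be reached."""


class FlakyCache:
    """Simulated cache client that can be toggled into a failed state."""

    def __init__(self):
        self.healthy = True
        self.data = {}

    def get(self, key):
        if not self.healthy:
            raise CacheUnavailable("cache node unreachable")
        return self.data.get(key)

    def set(self, key, value):
        if not self.healthy:
            raise CacheUnavailable("cache node unreachable")
        self.data[key] = value


def slow_database_lookup(key):
    return f"value-for-{key}"  # stand-in for a slower but reliable read


def get_with_fallback(cache, key):
    """Serve from the cache when possible; never fail the request because the cache did."""
    try:
        value = cache.get(key)
        if value is not None:
            return value, "cache"
    except CacheUnavailable:
        # Degrade: skip the cache entirely rather than returning an error.
        return slow_database_lookup(key), "database (cache down)"
    value = slow_database_lookup(key)
    try:
        cache.set(key, value)
    except CacheUnavailable:
        pass  # a failed cache write should not fail the request either
    return value, "database"


cache = FlakyCache()
miss = get_with_fallback(cache, "a")      # first read populates the cache
hit = get_with_fallback(cache, "a")       # second read is served from the cache
cache.healthy = False
degraded = get_with_fallback(cache, "a")  # cache outage: slower path, same answer
print(miss, hit, degraded)
```

The caller gets the same value in all three cases, and only the latency changes. It is also worth mentioning that if the cache normally absorbs most reads, the database must be able to survive the extra load while the cache is down.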

Redundancy

Database: multi-AZ replicas for high availability
Application servers: auto-scaling across availability zones
Cache: distributed cache with replication

Monitoring, Logging, and Alerting

This is my favourite part of the interview. It shows you’re thinking about operations and observability, one of the clearest signals separating junior from senior engineers.

You should cover:

API latency at p50, p95, and p99
Error rates and thresholds
Queue depth and backlog alerts
Structured logs with request IDs for distributed tracing
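If it helps to picture these signals, the sketch below computes latency percentiles with the nearest-rank method, checks an error-rate threshold, and emits a structured log line. The sample latencies, the 1% threshold, and the log field names are invented for illustration. A production system would normally get these from a metrics library rather than hand-rolled code.

```python
import json
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def error_rate(status_codes):
    """Fraction of responses that were server errors (5xx)."""
    if not status_codes:
        return 0.0
    return sum(1 for status in status_codes if status >= 500) / len(status_codes)


def log_line(request_id, route, status, latency_ms):
    """One JSON object per request so logs can be filtered and joined by request_id."""
    return json.dumps(
        {"request_id": request_id, "route": route, "status": status, "latency_ms": latency_ms},
        sort_keys=True,
    )


# Made-up sample: most requests are fast, two are very slow.
latencies_ms = [10] * 98 + [500, 900]
summary = {f"p{p}": percentile(latencies_ms, p) for p in (50, 95, 99)}

statuses = [200] * 97 + [500] * 3
ERROR_RATE_THRESHOLD = 0.01  # assumed alert threshold: 1% of requests
should_alert = error_rate(statuses) > ERROR_RATE_THRESHOLD

print(summary, should_alert)
print(log_line("req-42", "/v1/messages", 200, 10))
```

In this sample the p50 and p95 are both 10 ms while the p99 is 500 ms. That is why the tail percentiles are worth naming in the interview: the median alone would hide the slow requests.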

TIP: Ask your interviewer: “Are there specific failure scenarios you’d like me to explore further?”

Step 5: Tradeoffs and Extensions

Goal: Summarise key decisions and discuss what you’d build next.
Estimated time: 5 minutes

Wrap up by summarising the key trade-offs you made. What did you prioritise, and what did you sacrifice? Be explicit about the why.

Then discuss extensions. What would you add if you had more time? This shows you’re thinking beyond the immediate problem. That’s the framework.

Next, let’s see it in action with a real example:

Reminder: this is a teaser of the subscriber-only post, exclusive to my golden members.

A guest post by

Abdirahman Jama

I'm a software developer with experience in multiple industries, including travel and telecommunications, currently building cool things at Amazon.