
Module 1: System Design Methodology

Primary texts: System Design Interview (Alex Xu) and the System Design Primer (donnemartin)
Selective support: Fundamentals of Software Architecture (Richards & Ford) for architectural characteristics and trade-off reasoning; S7 (DDD, architecture styles, API design) for domain boundaries and contracts

This guide is the primary teacher. You do not need to read the source books front-to-back to complete this module. You do need to leave this module able to walk into a design discussion on an unfamiliar problem and produce a defensible end-to-end design using a repeatable method.


Scope of This Module

System design is not a vocabulary test. It is a disciplined process for turning an under-specified problem into a defensible architecture under time pressure.

What it covers in depth:

  • framing a problem into functional requirements, non-functional requirements, and explicit constraints
  • back-of-envelope estimation for QPS, storage, bandwidth, memory, and latency budgets
  • identifying the genuinely hard parts of a problem before drawing any boxes
  • high-level box-and-arrow design with explicit API surface, components, and data flow
  • choosing a storage approach (SQL, NoSQL, hybrid) with sizing and access-pattern reasoning
  • placing caches, CDNs, load balancers, and message queues where they actually help
  • decomposing components into internal contracts, algorithms, and ownership
  • data model design with entities, relationships, indexing, and partitioning keys
  • consistency, concurrency, and transaction-boundary decisions
  • scaling the hot path by 10x and 100x using reasoning rather than guessing
  • reasoning about failure by asking "what happens when X dies?" for every X
  • spotting bottlenecks and single points of failure before the reviewer does
  • the four-phase interview/review structure and how to time-box it
  • articulating trade-offs in writing: "I chose A over B because..."
  • producing a design document a senior reviewer would actually approve

What it deliberately does not try to finish here:

  • deep microservice decomposition and service-boundary theory (that is Module 2)
  • full event-driven architecture, sagas, and messaging patterns (Module 3)
  • deep reliability engineering, SLI/SLO math, and capacity modeling (Module 4)
  • organizational influence, ADRs at scale, and leadership strategy (Module 5)

This module is the methodology spine the other four modules in this semester plug into.


Before You Start

Answer these closed-book before starting the main path:

  1. Given a vague prompt like "design Twitter", what is the first thing you should clarify, and why?
  2. If a system has 100 million daily active users who each post 5 times a day at 300 bytes per post, roughly how much write storage does that generate per day?
  3. What is the difference between latency and throughput, and why can a system improve one without improving the other?
  4. What is the difference between a bottleneck and a single point of failure?
  5. Why is "SQL vs NoSQL" usually the wrong question to start with?

Diagnostic Interpretation

4-5 solid answers

  • You are ready for the full path. Expect Clusters 3 and 4 to still challenge you under time pressure.

2-3 solid answers

  • Continue, but plan extra time on Cluster 1 estimation and Cluster 4 scaling/failure reasoning.

0-1 solid answers

  • Revisit S6 distributed-systems fundamentals and S7 architecture-characteristics reasoning before starting. This module assumes you already know what consistency, availability, and coupling mean in the abstract.

What This Module Is For

Every later design interview, architecture review, and ADR you write in this program follows the same shape. Without a method, you will:

  • jump to solutions before understanding the problem
  • pick databases by habit instead of access pattern
  • forget to stress-test your own design
  • lose the room by narrating implementation before stating the contract
  • produce design docs that senior reviewers cannot approve because the trade-offs are not written down

This module builds the methodology needed for:

  • S8M2 microservice decomposition (you cannot decompose what you have not framed)
  • S8M3 event-driven architecture (the methodology tells you when events win)
  • S8M4 scale/reliability/performance (Cluster 4 here is the seed)
  • S8M5 technical leadership (Cluster 5 here is the seed)
  • the capstone design interview and the semester project

You are learning to design under uncertainty, on a whiteboard, against a clock, in front of an audience that disagrees.



How To Use This Module

Work in order. The method compounds: you cannot stress-test a design whose data flow you have not drawn, and you cannot draw a data flow you have not estimated.

Cluster 1: Frame the Problem

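Order | Concept | Type | Focus
1 | Understanding the Requirements | PRIMARY | Functional, non-functional, and constraints as three different conversations
2 | Back-of-Envelope Estimation | PRIMARY | QPS, storage, bandwidth, and latency budgets from first principles
3 | Identifying the Hard Parts | PRIMARY | What is actually difficult in this problem, and what is boilerplate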
3Identifying the Hard PartsPRIMARYWhat is actually difficult in this problem, and what is boilerplate

Cluster mastery check: For any unfamiliar prompt, can you produce in 10 minutes a written list of functional requirements, three non-functional targets with numbers, three constraints, a back-of-envelope estimate for the hot path, and a ranked list of the two or three genuinely hard parts?
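The sketch below shows the hot-path arithmetic this check asks for. All inputs are hypothetical and chosen only for illustration, not taken from the source texts:

  • 10 million DAU
  • 10 reads and 1 write per user per day
  • 1 KB per write
  • a 2x peak factor
  • 5-year retention

The method is what matters: divide daily volume by seconds per day, multiply by a peak factor, and multiply bytes per write by the write count.

```python
# Back-of-envelope estimator (sketch). All inputs below are hypothetical
# planning assumptions, not measurements of any real system.
SECONDS_PER_DAY = 86_400  # round to ~10**5 when doing this in your head


def estimate(dau, reads_per_user, writes_per_user, bytes_per_write,
             peak_factor=2, retention_years=5):
    """Return rough QPS and storage numbers for one hot path."""
    avg_read_qps = dau * reads_per_user / SECONDS_PER_DAY
    avg_write_qps = dau * writes_per_user / SECONDS_PER_DAY
    bytes_per_day = dau * writes_per_user * bytes_per_write
    return {
        "avg_read_qps": avg_read_qps,
        "avg_write_qps": avg_write_qps,
        "peak_read_qps": avg_read_qps * peak_factor,
        "storage_gb_per_day": bytes_per_day / 10**9,  # decimal GB (close to 2**30 bytes)
        "storage_tb_retained": bytes_per_day * 365 * retention_years / 10**12,
    }


# Hypothetical read-heavy service: 10M DAU, 10 reads and 1 write per user per day,
# 1 KB per write, peak traffic assumed to be 2x the daily average.
est = estimate(dau=10_000_000, reads_per_user=10, writes_per_user=1, bytes_per_write=1_000)
for name, value in est.items():
    print(f"{name}: {value:,.1f}")
```

With these inputs the sketch gives:

  • roughly 1,200 average read QPS (about 2,300 at the assumed peak)
  • about 120 write QPS
  • 10 GB of new data per day
  • on the order of 18 TB over five years

In an interview you would do the same steps by hand, rounding 86,400 to about 10^5.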

Cluster 2: High-Level Design

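Order | Concept | Type | Focus
4 | Draw the Box Diagram | PRIMARY | API surface, core components, and data flow on one sketch
5 | Choose a Storage Approach | PRIMARY | SQL, NoSQL, and hybrid choices driven by access pattern and size
6 | Place the Caches, CDN, and Load Balancers | PRIMARY | Where performance-critical infrastructure earns its keep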
6Place the Caches, CDN, and Load BalancersPRIMARYWhere performance-critical infrastructure earns its keep

Cluster mastery check: Given your Cluster 1 framing, can you produce a box-and-arrow diagram in 10 minutes that names the API surface, the storage choice with reasoning, and every cache, CDN, and LB with a one-sentence justification?

Cluster 3: Deep Dive

Order | Concept | Type | Focus
7 | Decompose Each Component | PRIMARY | Internal contracts, algorithms, and ownership per box
8 | Data Model Design | PRIMARY | Entities, relationships, indexing, and the partitioning key
9 | Concurrency, Consistency, and Transaction Boundaries | PRIMARY | Choosing what is atomic, what is eventual, and where the contention lives

Cluster mastery check: For the component the reviewer points at, can you state its inputs, outputs, algorithm, data model, partitioning key, and consistency contract in five minutes?
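This sketch shows why the partitioning key matters, assuming simple hash partitioning over a fixed number of shards. The workload (half of 1,000 writes come from one hypothetical "celebrity" author), the shard count, and the helper names are illustrative assumptions. Skew here means the busiest shard's load divided by the mean load.

```python
import hashlib
from collections import Counter


def shard_for(key: str, num_shards: int) -> int:
    """Stable hash partitioning (Python's built-in hash() is salted per process)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shard_loads(records, key_fn, num_shards):
    """Count how many records land on each shard for a candidate partitioning key."""
    counts = Counter(shard_for(key_fn(r), num_shards) for r in records)
    return [counts.get(s, 0) for s in range(num_shards)]


def skew(loads):
    """Max shard load divided by mean shard load; 1.0 means perfectly even."""
    return max(loads) / (sum(loads) / len(loads))


# Hypothetical write workload of (author_id, post_id) pairs:
# the first 500 posts come from one very active author.
records = [("celebrity" if i < 500 else f"user{i}", f"post{i}") for i in range(1000)]
by_author = shard_loads(records, key_fn=lambda r: r[0], num_shards=8)
by_post = shard_loads(records, key_fn=lambda r: r[1], num_shards=8)
print("skew by author_id:", round(skew(by_author), 2))
print("skew by post_id:  ", round(skew(by_post), 2))
```

Keying by author_id puts at least half of all writes on one shard, a skew of at least 4.0 with 8 shards. Keying by post_id spreads them close to evenly. The cost of the second choice is that a query for "all posts by author X" now touches every shard, which is exactly the kind of trade-off the data-model sheet should state.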

Cluster 4: Stress Test the Design

Order | Concept | Type | Focus
10 | Scale the Hot Path: 10x and 100x Reasoning | PRIMARY | What breaks first, what breaks next, and what you would change
11 | Reason About Failure: What Happens When X Dies? | PRIMARY | Kill every box in turn and describe the user-visible result
12 | Identify Bottlenecks and Single Points of Failure | PRIMARY | Finding the choke points before a reviewer or an outage does

Cluster mastery check: Can you walk a reviewer through "what fails first at 10x traffic, what fails first in an AZ outage, and what is your current SPOF list" without pausing to invent the answer?
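The "kill every box" walk can be sketched as a reachability check over a dependency graph. The sketch assumes each edge is an alternative route a request can take (OR semantics). A component is flagged as a single point of failure when removing it disconnects the client from the component the request must reach.

The sketch does not model:

  • components that need all of their dependencies at once
  • partial degradation
  • correlated failures such as an AZ outage

The graph and component names are hypothetical.

```python
from collections import deque


def reachable(graph, start, goal, dead):
    """BFS from start to goal, skipping the component that has 'died'."""
    if dead in (start, goal):
        return False
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in graph.get(node, []):
            if nxt != dead and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def find_spofs(graph, start, goal):
    """Kill each component in turn; it is a SPOF if start can no longer reach goal."""
    nodes = set(graph) | {n for targets in graph.values() for n in targets}
    return sorted(n for n in nodes if n != start and not reachable(graph, start, goal, n))


# Hypothetical design: each edge is an alternative route a request may take.
DESIGN = {
    "client": ["lb"],
    "lb": ["app1", "app2"],
    "app1": ["cache", "db_primary"],
    "app2": ["cache", "db_primary"],
    "cache": [],
}
print(find_spofs(DESIGN, start="client", goal="db_primary"))
```

For this design the sketch flags the load balancer and the database primary. Either app server or the cache can die without cutting off the request path. Describing the user-visible effect of each death is still a judgment the code cannot make.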

Cluster 5: Communicate and Decide

Order | Concept | Type | Focus
13 | The Four-Phase Interview/Review Structure | PRIMARY | Time-boxing requirements, high-level design, deep dive, and wrap-up
14 | Articulating Trade-offs: Why This, Not That | PRIMARY | Naming the alternative you rejected and the cost you accepted
15 | Producing a Design Doc Worth Reviewing | SUPPORTING | Sections a senior reviewer expects, and which ones are deal-breakers

Cluster mastery check: Can you deliver a 45-minute design interview that hits all four phases on time, states every major trade-off explicitly, and leaves behind a document another engineer could implement from?
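This small sketch turns the four phases into clock checkpoints for a 45-minute session. The minute split is a hypothetical allocation, not a prescription from the source texts. The useful part is knowing, for example, that the deep dive should already be under way at minute 20.

```python
# Hypothetical split of a 45-minute session; adjust the minutes to your format.
PHASES = [
    ("Requirements", 8),
    ("High-level design", 12),
    ("Deep dive", 20),
    ("Wrap-up", 5),
]


def checkpoints(phases, total_minutes=45):
    """Turn phase lengths into (phase, start, end) checkpoints and verify the total."""
    planned = sum(minutes for _, minutes in phases)
    if planned != total_minutes:
        raise ValueError(f"phases sum to {planned} min, not {total_minutes}")
    plan, start = [], 0
    for name, minutes in phases:
        plan.append((name, start, start + minutes))
        start += minutes
    return plan


for name, start, end in checkpoints(PHASES):
    print(f"{start:>2}-{end:>2} min  {name}")
```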

Then work these practice pages:

Order | Practice path | Focus
1 | Estimation and Framing Lab | Requirements, numbers, and ranking what is hard
2 | High-Level Design Workshop | Box diagrams, storage choice, cache placement
3 | Stress Test Clinic | 10x, failure injection, SPOF hunt
4 | Design Interview Katas | Four classic problems walked through the full method

Take the Module Quiz after completing the concept and practice paths. Use Reference and Selective Reading and Learning Resources only for targeted reinforcement.


Learning Objectives

By the end of this module you should be able to:

  1. Translate a vague prompt into explicit functional requirements, three non-functional targets with numbers, and a named constraint list, within 10 minutes.
  2. Produce back-of-envelope estimates for QPS, storage, bandwidth, and latency budgets using the powers-of-two table and Jeff Dean's latency numbers, without a calculator.
  3. Name the two or three genuinely hard parts of a problem before drawing any boxes.
  4. Draw a high-level box-and-arrow diagram with API surface, components, data flow, storage, caches, CDN, and load balancers.
  5. Choose SQL, NoSQL, or a hybrid with a written justification grounded in access pattern, consistency needs, and size.
  6. Decompose any component in the design into its inputs, outputs, algorithm, data model, partitioning key, and consistency contract.
  7. Walk a 10x and 100x scale scenario and state what breaks first, next, and after that.
  8. Describe the user-visible effect of the death of any single component and identify all current SPOFs.
  9. Run the four-phase design process (requirements, high-level, deep dive, wrap-up) against a timer.
  10. Articulate every major design decision as "I chose A over B because C", and produce a design doc a senior reviewer would approve.
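This sketch builds a latency budget breakdown from the widely circulated latency numbers:

  • about 0.5 ms for a round trip within one data center
  • about 10 ms for a disk seek
  • about 150 ms for a California to Netherlands round trip

The 200 ms target, the hop list, and the 5 ms compute figure are hypothetical. Summing per-hop budgets is a planning approximation: the P99 of a sum of latencies is not, in general, the sum of the per-hop P99s.

```python
# Approximate figures from the widely circulated "latency numbers" list
# (order-of-magnitude only; real hardware and networks differ).
SAME_DC_ROUND_TRIP_MS = 0.5
DISK_SEEK_MS = 10.0
CA_TO_NETHERLANDS_ROUND_TRIP_MS = 150.0


def check_budget(target_ms, hops):
    """Sum per-hop budgets on a serial request path and report the headroom left."""
    total = sum(hops.values())
    return total, target_ms - total


# Hypothetical serial request path for a user far from the data center.
hops = {
    "client <-> data center (intercontinental)": CA_TO_NETHERLANDS_ROUND_TRIP_MS,
    "load balancer -> app": SAME_DC_ROUND_TRIP_MS,
    "app -> cache lookup": SAME_DC_ROUND_TRIP_MS,
    "app -> database query (assumed ~one disk seek)": DISK_SEEK_MS,
    "app compute (assumed)": 5.0,
}
total, headroom = check_budget(target_ms=200.0, hops=hops)
print(f"planned {total} ms, headroom {headroom} ms")
```

Here the intercontinental round trip alone consumes 150 of the 200 ms. That kind of finding argues for a CDN or a regional deployment before any server-side tuning.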

Outputs

  • one written framing artifact for each of at least 4 classic problems: functional reqs, 3 non-functional targets with numbers, constraints, and the "hard parts" list
  • one estimation notebook with at least 8 worked back-of-envelope calculations (QPS, storage per day, cache memory, bandwidth, P99 budget breakdown)
  • at least 4 box-and-arrow diagrams with API surface, components, storage, and cache/CDN/LB placement
  • one data-model sheet per diagram naming entities, relationships, indexes, and the partitioning key
  • one stress-test journal with 10x and 100x walk-throughs and a per-box "what happens when this dies" entry for every design
  • one trade-off log listing at least 12 "A over B because C" decisions with the rejected alternative named
  • one full design doc (sections: problem, requirements, estimates, high-level design, deep dive per component, data model, failure and scale analysis, trade-offs, open questions) for one of the four interview katas
  • one mistake log tagged with error categories such as "jumped to solution", "no numbers", "sharded on wrong key", "ignored hot partition", "assumed AZ does not fail", and "trade-off unstated"
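This sketch shows a trade-off log entry that enforces the "A over B because C" shape and records the cost accepted, matching Concept 14. The field names, the helper for spotting blank fields, and both example entries are hypothetical. The first entry illustrates the format, not a recommendation.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TradeOff:
    """One entry in a trade-off log; every field is expected to be filled in."""
    decision: str
    chosen: str
    rejected: str
    because: str
    cost_accepted: str

    def sentence(self) -> str:
        return (f"{self.decision}: I chose {self.chosen} over {self.rejected} "
                f"because {self.because}; cost accepted: {self.cost_accepted}.")


def unstated(entry: TradeOff) -> list:
    """Names of blank fields: candidates for the 'trade-off unstated' mistake tag."""
    return [f.name for f in fields(entry) if not getattr(entry, f.name).strip()]


log = [
    TradeOff("Feed storage", "a wide-column NoSQL store", "a single relational primary",
             "feed reads are key lookups and write volume outgrows one primary",
             "no multi-row transactions"),
    TradeOff("Session cache", "an in-memory cache", "", "it is fast", ""),
]
for entry in log:
    missing = unstated(entry)
    print(entry.sentence() if not missing else f"INCOMPLETE {entry.decision}: missing {missing}")
```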

Completion Standard

You have completed Module 1 when all of these are true:

  • you can frame, estimate, and rank the hard parts of an unfamiliar prompt in 10 minutes, with numbers
  • you can produce a high-level design with storage and caching justified, in 10 more minutes
  • you can deep-dive any single component the reviewer picks without re-inventing the method
  • you can survive a 10x-and-fail-one-box attack on your design without panic
  • every design decision in your doc is stated as "A over B because C"
  • your written design doc is something you would hand to another engineer and expect them to implement from

If you can draw the boxes but cannot defend any of them under pressure, the module is not complete.


Reading Policy

  • Concept pages are the main path.
  • Local book chunks from System Design Primer and Fundamentals of Software Architecture are selective reinforcement, not a second syllabus.
  • "Read only if stuck" means: try the concept page, self-check, and drill first.
  • "Optional deep dive" means additional nuance or more breadth, not required progression.
  • Written artifacts (framings, estimates, diagrams, design doc) are required deliverables, not optional enrichment.

Suggested Weekly Flow

Day | Work
1 | Concepts 1-3; produce framing and estimation for one classic prompt
2 | Concepts 4-6; draw the high-level design for the same prompt
3 | Concepts 7-9; deep-dive two components and design the data model
4 | Concepts 10-12; run the 10x walk and the kill-every-box walk on your design
5 | Concepts 13-15; convert your work into a design doc
6 | Practice pages 1-3 on one new prompt
7 | Practice page 4 (all four katas), quiz, mistake-log cleanup

Reference

If you need exact links into the local chunked books, use Reference and Selective Reading.


Build Your Own X — elective

The Front-end Framework tutorial and the Game Engine 2D tutorial both force explicit API surface decisions, the same skill this module exercises in design docs. See the Build Your Own X overview.

Rich Learning Pages

Worked Examples | Guided Labs | Case Studies | Mistake Clinic | Reading Guide | Capstone Thread