Module 1: System Design Methodology
Primary texts: System Design Interview (Alex Xu) and the System Design Primer (donnemartin)
Selective support: Fundamentals of Software Architecture (Richards & Ford) for architectural characteristics and trade-off reasoning; S7 (DDD, architecture styles, API design) for domain boundaries and contracts
This guide is the primary teacher. You do not need to read the source books front-to-back to complete this module. You do need to leave this module able to walk into a design discussion on an unfamiliar problem and produce a defensible end-to-end design using a repeatable method.
Scope of This Module
System design is not a vocabulary test. It is a disciplined process for turning an under-specified problem into a defensible architecture under time pressure.
What it covers in depth:
- framing a problem into functional requirements, non-functional requirements, and explicit constraints
- back-of-envelope estimation for QPS, storage, bandwidth, memory, and latency budgets
- identifying the genuinely hard parts of a problem before drawing any boxes
- high-level box-and-arrow design with explicit API surface, components, and data flow
- choosing a storage approach (SQL, NoSQL, hybrid) with sizing and access-pattern reasoning
- placing caches, CDNs, load balancers, and message queues where they actually help
- decomposing components into internal contracts, algorithms, and ownership
- data model design with entities, relationships, indexing, and partitioning keys
- consistency, concurrency, and transaction-boundary decisions
- scaling the hot path by 10x and 100x using reasoning rather than guessing
- reasoning about failure by asking "what happens when X dies?" for every X
- spotting bottlenecks and single points of failure before the reviewer does
- the four-phase interview/review structure and how to time-box it
- articulating trade-offs in writing: "I chose A over B because..."
- producing a design document a senior reviewer would actually approve
What it deliberately does not try to finish here:
- deep microservice decomposition and service-boundary theory (that is Module 2)
- full event-driven architecture, sagas, and messaging patterns (Module 3)
- deep reliability engineering, SLI/SLO math, and capacity modeling (Module 4)
- organizational influence, ADRs at scale, and leadership strategy (Module 5)
This module is the methodology spine the other four modules in this semester plug into.
Before You Start
Answer these closed-book before starting the main path:
- Given a vague prompt like "design Twitter", what is the first thing you should clarify, and why?
- If a system has 100 million daily active users who each post 5 times a day at 300 bytes per post, roughly how much write storage does that generate per day?
- What is the difference between latency and throughput, and why can a system improve one without improving the other?
- What is the difference between a bottleneck and a single point of failure?
- Why is "SQL vs NoSQL" usually the wrong question to start with?
Diagnostic Interpretation
4-5 solid answers
- You are ready for the full path. Expect Cluster 3 and Cluster 4 to still challenge you under time.
2-3 solid answers
- Continue, but plan extra time on Cluster 1 estimation and Cluster 4 scaling/failure reasoning.
0-1 solid answers
- Revisit S6 distributed-systems fundamentals and S7 architecture-characteristics reasoning before starting. This module assumes you already know what consistency, availability, and coupling mean in the abstract.
What This Module Is For
Every later design interview, architecture review, and ADR you write in this program follows the same shape. Without a method, you will:
- jump to solutions before understanding the problem
- pick databases by habit instead of access pattern
- forget to stress-test your own design
- lose the room by narrating implementation before stating the contract
- produce design docs that senior reviewers cannot approve because the trade-offs are not written down
This module builds the methodology needed for:
- S8M2 microservice decomposition (you cannot decompose what you have not framed)
- S8M3 event-driven architecture (the methodology tells you when events win)
- S8M4 scale/reliability/performance (Cluster 4 here is the seed)
- S8M5 technical leadership (Cluster 5 here is the seed)
- the capstone design interview and the semester project
You are learning to design under uncertainty, on a whiteboard, against a clock, in front of an audience that disagrees.
Concept Map
How To Use This Module
Work in order. The method compounds: you cannot stress-test a design whose data flow you have not drawn, and you cannot draw a data flow you have not estimated.
Cluster 1: Frame the Problem
| Order | Concept | Type | Focus |
|---|---|---|---|
| 1 | Understanding the Requirements | PRIMARY | Functional, non-functional, and constraints as three different conversations |
| 2 | Back-of-Envelope Estimation | PRIMARY | QPS, storage, bandwidth, and latency budgets from first principles |
| 3 | Identifying the Hard Parts | PRIMARY | What is actually difficult in this problem, and what is boilerplate |
Cluster mastery check: For any unfamiliar prompt, can you produce in 10 minutes a written list of functional requirements, three non-functional targets with numbers, three constraints, a back-of-envelope estimate for the hot path, and a ranked list of the two or three genuinely hard parts?
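A back-of-envelope estimate for the hot path can be sketched in a few lines. The example below reuses the figures from the diagnostic question in Before You Start (100 million daily active users, 5 posts each, 300 bytes per post). The function name `estimate_writes` and the `peak_factor` of 2 are assumptions made for illustration, and the result counts raw post payload only, with no replication, index, or metadata overhead.

```python
SECONDS_PER_DAY = 86_400  # 24 * 60 * 60


def estimate_writes(daily_active_users, posts_per_user_per_day,
                    bytes_per_post, peak_factor=2.0):
    """Back-of-envelope write load.

    peak_factor is an assumed peak-to-average multiplier, not a measured value.
    """
    posts_per_day = daily_active_users * posts_per_user_per_day
    bytes_per_day = posts_per_day * bytes_per_post
    avg_write_qps = posts_per_day / SECONDS_PER_DAY
    return {
        "posts_per_day": posts_per_day,
        "storage_gb_per_day": bytes_per_day / 1e9,  # decimal gigabytes
        "avg_write_qps": avg_write_qps,
        "peak_write_qps": avg_write_qps * peak_factor,
        "write_bandwidth_mb_per_s": bytes_per_day / SECONDS_PER_DAY / 1e6,
    }


estimate = estimate_writes(100_000_000, 5, 300)
for name, value in estimate.items():
    print(f"{name}: {value:,.1f}")
```

With these inputs the sketch gives roughly 150 GB of raw post payload per day and an average of about 5,800 writes per second. On a whiteboard the same arithmetic is done by rounding a day to about 10^5 seconds.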
Cluster 2: High-Level Design
| Order | Concept | Type | Focus |
|---|---|---|---|
| 4 | Draw the Box Diagram | PRIMARY | API surface, core components, and data flow on one sketch |
| 5 | Choose a Storage Approach | PRIMARY | SQL, NoSQL, and hybrid choices driven by access pattern and size |
| 6 | Place the Caches, CDN, and Load Balancers | PRIMARY | Where performance-critical infrastructure earns its keep |
Cluster mastery check: Given your Cluster 1 framing, can you produce a box-and-arrow diagram in 10 minutes that names the API surface, the storage choice with reasoning, and every cache, CDN, and LB with a one-sentence justification?
Cluster 3: Deep Dive
| Order | Concept | Type | Focus |
|---|---|---|---|
| 7 | Decompose Each Component | PRIMARY | Internal contracts, algorithms, and ownership per box |
| 8 | Data Model Design | PRIMARY | Entities, relationships, indexing, and the partitioning key |
| 9 | Concurrency, Consistency, and Transaction Boundaries | PRIMARY | Choosing what is atomic, what is eventual, and where the contention lives |
Cluster mastery check: For the component the reviewer points at, can you state its inputs, outputs, algorithm, data model, partitioning key, and consistency contract in five minutes?
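One way to rehearse this check is to treat the six items as a fill-in contract and see which fields are still blank. The sketch below is only an illustration: the `ComponentContract` class and the `timeline-service` example are hypothetical names, not part of the source texts.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class ComponentContract:
    """The six things the mastery check asks you to state for one box."""
    name: str
    inputs: str = ""
    outputs: str = ""
    algorithm: str = ""
    data_model: str = ""
    partitioning_key: str = ""
    consistency_contract: str = ""

    def missing(self) -> List[str]:
        """Return the names of contract fields that are still blank."""
        return [f.name for f in fields(self)
                if not getattr(self, f.name).strip()]


# Hypothetical component, filled in only partway.
timeline = ComponentContract(
    name="timeline-service",
    inputs="user_id, page cursor",
    outputs="ordered list of post ids",
    algorithm="merge precomputed feed with recent posts",
    data_model="feed rows keyed by (user_id, post_time)",
)
print(timeline.missing())  # ['partitioning_key', 'consistency_contract']
```

An empty result from `missing()` means every item in the check has at least been written down. It says nothing about whether the answers are defensible.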
Cluster 4: Stress Test the Design
| Order | Concept | Type | Focus |
|---|---|---|---|
| 10 | Scale the Hot Path: 10x and 100x Reasoning | PRIMARY | What breaks first, what breaks next, and what you would change |
| 11 | Reason About Failure: What Happens When X Dies? | PRIMARY | Kill every box in turn and describe the user-visible result |
| 12 | Identify Bottlenecks and Single Points of Failure | PRIMARY | Finding the choke points before a reviewer or an outage does |
Cluster mastery check: Can you walk a reviewer through "what fails first at 10x traffic, what fails first in an AZ outage, and what is your current SPOF list" without pausing to invent the answer?
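The kill-every-box walk can be rehearsed mechanically on a small graph: remove one box at a time and check whether a request can still travel from the client to the data it needs. The sketch below does this with a breadth-first search. The topology and box names are hypothetical, and the model covers reachability only, so it will not show degraded behaviour such as extra database load after a cache dies.

```python
from collections import deque

# Hypothetical topology: each box maps to the boxes it calls.
TOPOLOGY = {
    "client": ["lb"],
    "lb": ["app-1", "app-2"],
    "app-1": ["cache", "db-primary"],
    "app-2": ["cache", "db-primary"],
    "cache": [],
    "db-primary": [],
}


def reachable(graph, start, goal, dead=None):
    """Breadth-first search from start to goal that skips the dead box."""
    if dead in (start, goal):
        return False
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in graph.get(node, []):
            if nxt != dead and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def single_points_of_failure(graph, start, goal):
    """Boxes whose death alone cuts every path from start to goal."""
    return sorted(
        box for box in graph
        if box != start and not reachable(graph, start, goal, dead=box)
    )


spofs = single_points_of_failure(TOPOLOGY, "client", "db-primary")
print(spofs)  # ['db-primary', 'lb']
```

In this toy design the two app servers cover for each other, while the single load balancer and the single primary database are the current SPOF list. The user-visible result for each box still has to be described in words.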
Cluster 5: Communicate and Decide
| Order | Concept | Type | Focus |
|---|---|---|---|
| 13 | The Four-Phase Interview/Review Structure | PRIMARY | Time-boxing requirements, high-level design, deep dive, and wrap-up |
| 14 | Articulating Trade-offs: Why This, Not That | PRIMARY | Naming the alternative you rejected and the cost you accepted |
| 15 | Producing a Design Doc Worth Reviewing | SUPPORTING | Sections a senior reviewer expects, and which ones are deal-breakers |
Cluster mastery check: Can you deliver a 45-minute design interview that hits all four phases on time, states every major trade-off explicitly, and leaves behind a document another engineer could implement from?
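A time-box is easier to hold when the checkpoints are written down before the clock starts. The sketch below turns a phase budget into start and end minutes for a 45-minute session. The first two budgets follow this module's 10-minute targets for framing and high-level design; the 20 and 5 minute split between deep dive and wrap-up is an assumption, as is the helper name `checkpoints`.

```python
# (phase, minutes). The deep-dive and wrap-up budgets are assumptions.
PHASES = [
    ("requirements and estimation", 10),
    ("high-level design", 10),
    ("deep dive", 20),
    ("wrap-up", 5),
]


def checkpoints(phases, total_minutes=45):
    """Return (phase, start_minute, end_minute) tuples.

    Raises ValueError if the budgets do not add up to the session length.
    """
    planned = sum(minutes for _, minutes in phases)
    if planned != total_minutes:
        raise ValueError(
            f"phases sum to {planned} minutes, expected {total_minutes}")
    schedule, clock = [], 0
    for phase, minutes in phases:
        schedule.append((phase, clock, clock + minutes))
        clock += minutes
    return schedule


for phase, start, end in checkpoints(PHASES):
    print(f"{start:>2}-{end:<2} min  {phase}")
```

Adjust the split to the session you are preparing for. The check that the budgets sum to the total exists to catch plans that quietly drop the wrap-up.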
Then work these practice pages:
| Order | Practice path | Focus |
|---|---|---|
| 1 | Estimation and Framing Lab | Requirements, numbers, and ranking what is hard |
| 2 | High-Level Design Workshop | Box diagrams, storage choice, cache placement |
| 3 | Stress Test Clinic | 10x, failure injection, SPOF hunt |
| 4 | Design Interview Katas | Four classic problems walked through the full method |
Use Module Quiz after the concept and practice path. Use Reference and Selective Reading and Learning Resources only for targeted reinforcement.
Learning Objectives
By the end of this module you should be able to:
- Translate a vague prompt into explicit functional requirements, three non-functional targets with numbers, and a named constraint list, within 10 minutes.
- Produce back-of-envelope estimates for QPS, storage, bandwidth, and latency budgets using the powers-of-two table and Jeff Dean's latency numbers, without a calculator.
- Name the two or three genuinely hard parts of a problem before drawing any boxes.
- Draw a high-level box-and-arrow diagram with API surface, components, data flow, storage, caches, CDN, and load balancers.
- Choose SQL, NoSQL, or a hybrid with a written justification grounded in access pattern, consistency needs, and size.
- Decompose any component in the design into its inputs, outputs, algorithm, data model, partitioning key, and consistency contract.
- Walk a 10x and 100x scale scenario and state what breaks first, next, and after that.
- Describe the user-visible effect of the death of any single component and identify all current SPOFs.
- Run the four-phase design process (requirements, high-level, deep dive, wrap-up) against a timer.
- Articulate every major design decision as "I chose A over B because C", and produce a design doc a senior reviewer would approve.
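The 10x and 100x objective above comes down to comparing scaled load against per-component capacity and ranking what is over the line. The sketch below does this under two loud assumptions: every capacity and load figure is invented for illustration, and load is taken to scale linearly on every component, which real systems rarely do.

```python
# Hypothetical capacity and current load, in requests per second.
CAPACITY_RPS = {
    "load balancer": 200_000,
    "app tier": 40_000,
    "cache": 150_000,
    "primary database": 12_000,
}
CURRENT_LOAD_RPS = {
    "load balancer": 6_000,
    "app tier": 6_000,
    "cache": 5_000,
    "primary database": 1_000,
}


def breaking_order(capacity, load, multiplier):
    """Components over capacity at `multiplier` times current load, worst first."""
    utilisation = {
        name: load[name] * multiplier / capacity[name] for name in capacity
    }
    over = [(name, u) for name, u in utilisation.items() if u > 1.0]
    return sorted(over, key=lambda item: item[1], reverse=True)


for factor in (10, 100):
    broken = breaking_order(CAPACITY_RPS, CURRENT_LOAD_RPS, factor)
    summary = [(name, round(u, 1)) for name, u in broken]
    print(f"{factor}x:", summary or "nothing over capacity")
```

With these invented numbers only the app tier is over capacity at 10x. At 100x everything is, with the app tier and the primary database furthest over. The ranking gives you the order in which to say "this breaks first, then this", and each entry still needs a stated change.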
Outputs
- one written framing artifact for each of at least 4 classic problems: functional requirements, 3 non-functional targets with numbers, constraints, and the "hard parts" list
- one estimation notebook with at least 8 worked back-of-envelope calculations (QPS, storage per day, cache memory, bandwidth, P99 budget breakdown)
- at least 4 box-and-arrow diagrams with API surface, components, storage, and cache/CDN/LB placement
- one data-model sheet per diagram naming entities, relationships, indexes, and the partitioning key
- one stress-test journal with 10x and 100x walk-throughs and a per-box "what happens when this dies" entry for every design
- one trade-off log listing at least 12 "A over B because C" decisions with the rejected alternative named
- one full design doc (sections: problem, requirements, estimates, high-level design, deep dive per component, data model, failure and scale analysis, trade-offs, open questions) for one of the four interview katas
- one mistake log tagged with errors like "jumped to solution", "no numbers", "sharded on wrong key", "ignored hot partition", "assumed AZ does not fail", "trade-off unstated"
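The trade-off log in the outputs above can be kept as structured records so that an entry with no rejected alternative or no reason is easy to spot. The sketch below is one possible shape, not a prescribed format: the `TradeOff` class, the `unstated` helper, and both example entries are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TradeOff:
    """One 'A over B because C' decision, plus the cost accepted."""
    chosen: str
    rejected: str
    because: str
    cost_accepted: str

    def sentence(self) -> str:
        return (f"I chose {self.chosen} over {self.rejected} "
                f"because {self.because}; "
                f"the cost I accept is {self.cost_accepted}.")


def unstated(log):
    """Entries missing a rejected alternative or a reason."""
    return [t for t in log
            if not t.rejected.strip() or not t.because.strip()]


# Hypothetical entries; the second is deliberately incomplete.
log = [
    TradeOff("a write-through cache", "cache-aside",
             "reads must see the latest write", "higher write latency"),
    TradeOff("sharding by user_id", "", "", "possible hot partitions"),
]
print(log[0].sentence())
print(len(unstated(log)))  # 1
```

Anything returned by `unstated` is a candidate for the "trade-off unstated" tag in the mistake log.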
Completion Standard
You have completed Module 1 when all of these are true:
- you can frame, estimate, and rank the hard parts of an unfamiliar prompt in 10 minutes, with numbers
- you can produce a high-level design with storage and caching justified, in 10 more minutes
- you can deep-dive any single component the reviewer picks without re-inventing the method
- you can survive a 10x-and-fail-one-box attack on your design without panic
- every design decision in your doc is stated as "A over B because C"
- your written design doc is something you would hand to another engineer and expect them to implement from
If you can draw the boxes but cannot defend any of them under pressure, the module is not complete.
Reading Policy
- Concept pages are the main path.
- Local book chunks from System Design Primer and Fundamentals of Software Architecture are selective reinforcement, not a second syllabus.
- "Read only if stuck" means try the concept page, self-check, and drill first.
- "Optional deep dive" means additional nuance or more breadth, not required progression.
- Written artifacts (framings, estimates, diagrams, design doc) are required deliverables, not optional enrichment.
Suggested Weekly Flow
| Day | Work |
|---|---|
| 1 | Concepts 1-3; produce framing and estimation for one classic prompt |
| 2 | Concepts 4-6; draw the high-level design for the same prompt |
| 3 | Concepts 7-9; deep-dive two components and design the data model |
| 4 | Concepts 10-12; run the 10x walk and the kill-every-box walk on your design |
| 5 | Concepts 13-15; convert your work into a design doc |
| 6 | Practice pages 1-3 on one new prompt |
| 7 | Practice page 4 (all four katas), quiz, mistake-log cleanup |
Reference
If you need exact links into the local chunked books, use Reference and Selective Reading.
The Front-end Framework tutorial and the Game Engine 2D tutorial both force explicit API surface decisions, which is the same skill this module exercises in design docs. See the Build Your Own X overview.
Rich Learning Pages
Worked Examples | Guided Labs | Case Studies | Mistake Clinic | Reading Guide | Capstone Thread