Module 1: System Design Methodology

Primary texts: System Design Interview (Alex Xu) and the System Design Primer (donnemartin)
Selective support: Fundamentals of Software Architecture (Richards & Ford) for architectural characteristics and trade-off reasoning; S7 (DDD, architecture styles, API design) for domain boundaries and contracts

This guide is the primary teacher. You do not need to read the source books front-to-back to complete this module. You do need to leave this module able to walk into a design discussion on an unfamiliar problem and produce a defensible end-to-end design using a repeatable method.

Scope of This Module

System design is not a vocabulary test. It is a disciplined process for turning an under-specified problem into a defensible architecture under time pressure.

What it covers in depth:

framing a problem into functional requirements, non-functional requirements, and explicit constraints
back-of-envelope estimation for QPS, storage, bandwidth, memory, and latency budgets
identifying the genuinely hard parts of a problem before drawing any boxes
high-level box-and-arrow design with explicit API surface, components, and data flow
choosing a storage approach (SQL, NoSQL, hybrid) with sizing and access-pattern reasoning
placing caches, CDNs, load balancers, and message queues where they actually help
decomposing components into internal contracts, algorithms, and ownership
data model design with entities, relationships, indexing, and partitioning keys
consistency, concurrency, and transaction-boundary decisions
scaling the hot path by 10x and 100x using reasoning rather than guessing
reasoning about failure by asking "what happens when X dies?" for every X
spotting bottlenecks and single points of failure before the reviewer does
the four-phase interview/review structure and how to time-box it
articulating trade-offs in writing: "I chose A over B because..."
producing a design document a senior reviewer would actually approve

What it deliberately does not try to finish here:

deep microservice decomposition and service-boundary theory (that is Module 2)
full event-driven architecture, sagas, and messaging patterns (Module 3)
deep reliability engineering, SLI/SLO math, and capacity modeling (Module 4)
organizational influence, ADRs at scale, and leadership strategy (Module 5)

This module is the methodology spine the other four modules in this semester plug into.

Before You Start

Answer these closed-book before starting the main path:

Given a vague prompt like "design Twitter", what is the first thing you should clarify, and why?
If a system has 100 million daily active users who each post 5 times a day at 300 bytes per post, roughly how much write storage does that generate per day?
What is the difference between latency and throughput, and why can a system improve one without improving the other?
What is the difference between a bottleneck and a single point of failure?
Why is "SQL vs NoSQL" usually the wrong question to start with?

Diagnostic Interpretation

4-5 solid answers

You are ready for the full path. Expect Cluster 3 and Cluster 4 to still challenge you under time pressure.

2-3 solid answers

Continue, but plan extra time on Cluster 1 estimation and Cluster 4 scaling/failure reasoning.

0-1 solid answers

Revisit S6 distributed-systems fundamentals and S7 architecture-characteristics reasoning before starting. This module assumes you already know what consistency, availability, and coupling mean in the abstract.

What This Module Is For

Every later design interview, architecture review, and ADR you write in this program follows the same shape. Without a method, you will:

jump to solutions before understanding the problem
pick databases by habit instead of access pattern
forget to stress-test your own design
lose the room by narrating implementation before stating the contract
produce design docs that senior reviewers cannot approve because the trade-offs are not written down

This module builds the methodology needed for:

S8M2 microservice decomposition (you cannot decompose what you have not framed)
S8M3 event-driven architecture (the methodology tells you when events win)
S8M4 scale/reliability/performance (Cluster 4 here is the seed)
S8M5 technical leadership (Cluster 5 here is the seed)
the capstone design interview and the semester project

You are learning to design under uncertainty, on a whiteboard, against a clock, in front of an audience that disagrees.

Concept Map

How To Use This Module

Work in order. The method compounds: you cannot stress-test a design whose data flow you have not drawn, and you cannot draw a data flow you have not estimated.

Cluster 1: Frame the Problem

Order	Concept	Type	Focus
1	Understanding the Requirements	PRIMARY	Functional, non-functional, and constraints as three different conversations
2	Back-of-Envelope Estimation	PRIMARY	QPS, storage, bandwidth, and latency budgets from first principles
3	Identifying the Hard Parts	PRIMARY	What is actually difficult in this problem, and what is boilerplate

Cluster mastery check: For any unfamiliar prompt, can you produce in 10 minutes a written list of functional requirements, three non-functional targets with numbers, three constraints, a back-of-envelope estimate for the hot path, and a ranked list of the two or three genuinely hard parts?
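A back-of-envelope estimate for the write path can be sketched in a few lines. The inputs below reuse the numbers from the diagnostic question in Before You Start (100 million daily active users, 5 posts each, 300 bytes per post). The function name, the dictionary keys, and the 2x peak factor are assumptions made for illustration, not figures from the source texts.

```python
SECONDS_PER_DAY = 86_400


def estimate_writes(dau: int, posts_per_user: int, bytes_per_post: int,
                    peak_factor: float = 2.0) -> dict:
    """Rough write-path estimate: daily volume, average and peak QPS, storage."""
    writes_per_day = dau * posts_per_user
    avg_qps = writes_per_day / SECONDS_PER_DAY
    storage_bytes = writes_per_day * bytes_per_post
    return {
        "writes_per_day": writes_per_day,
        "avg_write_qps": avg_qps,
        # peak_factor is an assumed multiplier; replace it with a measured one if you have it
        "peak_write_qps": avg_qps * peak_factor,
        "storage_bytes_per_day": storage_bytes,
        "storage_gb_per_day": storage_bytes / 1e9,  # decimal gigabytes
    }


est = estimate_writes(dau=100_000_000, posts_per_user=5, bytes_per_post=300)
print(f"{est['writes_per_day']:,} writes/day")
print(f"{est['avg_write_qps']:,.0f} average writes/s")
print(f"{est['storage_gb_per_day']:,.0f} GB/day of raw post data")
```

With these inputs the estimate is 500 million writes per day, about 5,787 writes per second on average, and 150 GB per day before replication, indexes, or metadata. On a whiteboard you would round 86,400 to 100,000 and state the answer as "roughly 5k writes per second".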

Cluster 2: High-Level Design

Order	Concept	Type	Focus
4	Draw the Box Diagram	PRIMARY	API surface, core components, and data flow on one sketch
5	Choose a Storage Approach	PRIMARY	SQL, NoSQL, and hybrid choices driven by access pattern and size
6	Place the Caches, CDN, and Load Balancers	PRIMARY	Where performance-critical infrastructure earns its keep

Cluster mastery check: Given your Cluster 1 framing, can you produce a box-and-arrow diagram in 10 minutes that names the API surface, the storage choice with reasoning, and every cache, CDN, and LB with a one-sentence justification?
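One way to produce the one-sentence justification for a cache is to put two numbers next to it: how much load it removes from the backing store and how much memory it needs. The sketch below is a minimal model under stated assumptions. The read QPS, the hit ratio, and the idea of caching a fixed "hot fraction" of items are hypothetical inputs chosen for illustration, and `cache_plan` is not taken from any of the source texts.

```python
def cache_plan(read_qps: float, hit_ratio: float, distinct_items: int,
               bytes_per_item: int, hot_fraction: float = 0.2) -> dict:
    """Estimate backend load after a cache and the memory the cache needs."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    # Only cache misses reach the backing store.
    backend_qps = read_qps * (1.0 - hit_ratio)
    # Assumption: keeping hot_fraction of the items in memory yields hit_ratio.
    cache_bytes = distinct_items * hot_fraction * bytes_per_item
    return {"backend_qps": backend_qps, "cache_gb": cache_bytes / 1e9}


# Hypothetical inputs: 50k reads/s, 90% hit ratio, one day of 300-byte posts.
plan = cache_plan(read_qps=50_000, hit_ratio=0.9,
                  distinct_items=500_000_000, bytes_per_item=300)
print(f"backend sees about {plan['backend_qps']:.0f} reads/s")
print(f"cache needs about {plan['cache_gb']:.0f} GB")
```

Under these assumptions the backend sees roughly 5,000 reads per second instead of 50,000, at a cost of roughly 30 GB of memory. The justification then reads as a trade: "the cache removes about 90% of read load from the database for about 30 GB of RAM". If the measured hit ratio is much lower, the cache may not earn its place.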

Cluster 3: Deep Dive

Order	Concept	Type	Focus
7	Decompose Each Component	PRIMARY	Internal contracts, algorithms, and ownership per box
8	Data Model Design	PRIMARY	Entities, relationships, indexing, and the partitioning key
9	Concurrency, Consistency, and Transaction Boundaries	PRIMARY	Choosing what is atomic, what is eventual, and where the contention lives

Cluster mastery check: For the component the reviewer points at, can you state its inputs, outputs, algorithm, data model, partitioning key, and consistency contract in five minutes?
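The partitioning key is the part of this check that most often goes wrong, so it helps to see a hot partition appear in numbers. The sketch below hashes keys onto shards and compares two candidate keys for a skewed write workload. The workload (one "celebrity" account producing 900 of 1,000 writes), the shard count, and the helper names are all hypothetical. SHA-256 is used only as a stable, evenly distributed hash, not as a recommendation for any particular database.

```python
import hashlib
from collections import Counter


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard with a stable hash (the same key always gets the same shard)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def load_per_shard(keys, num_shards: int) -> Counter:
    """Count how many writes land on each shard."""
    return Counter(shard_for(k, num_shards) for k in keys)


# Hypothetical skewed workload of (user_id, post_id) writes.
writes = [("user:celebrity", f"post:{i}") for i in range(900)]
writes += [(f"user:{i}", f"post:{900 + i}") for i in range(100)]

NUM_SHARDS = 8
by_user = load_per_shard((user for user, _ in writes), NUM_SHARDS)
by_post = load_per_shard((post for _, post in writes), NUM_SHARDS)

print("busiest shard when keyed by user_id:", max(by_user.values()))
print("busiest shard when keyed by post_id:", max(by_post.values()))
```

Keyed by user, one shard absorbs at least 900 of the 1,000 writes. Keyed by post, the writes spread across shards. The cost of the second choice is that "all posts by this user" becomes a scatter-gather read across every shard, which is exactly the kind of trade-off the consistency and partitioning discussion should state out loud.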

Cluster 4: Stress Test the Design

Order	Concept	Type	Focus
10	Scale the Hot Path: 10x and 100x Reasoning	PRIMARY	What breaks first, what breaks next, and what you would change
11	Reason About Failure: What Happens When X Dies?	PRIMARY	Kill every box in turn and describe the user-visible result
12	Identify Bottlenecks and Single Points of Failure	PRIMARY	Finding the choke points before a reviewer or an outage does

Cluster mastery check: Can you walk a reviewer through "what fails first at 10x traffic, what fails first in an AZ outage, and what is your current SPOF list" without pausing to invent the answer?
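The 10x and 100x walk and the SPOF list can be rehearsed with a very small model. The sketch below assumes load on every box grows linearly with traffic, which is a simplification (caches and queues often bend that curve). The component names, current loads, capacities, and replica counts are hypothetical numbers for illustration, not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    current_qps: float   # load on this box today (assumed)
    capacity_qps: float  # rough ceiling for the whole group (assumed)
    replicas: int


def breaks_in_order(components, multiplier: float):
    """Return (name, utilization) for boxes over capacity at this multiplier, worst first."""
    scaled = [(c.name, c.current_qps * multiplier / c.capacity_qps) for c in components]
    over = [item for item in scaled if item[1] > 1.0]
    return sorted(over, key=lambda item: item[1], reverse=True)


def spofs(components):
    """A box with fewer than two replicas is a single point of failure in this model."""
    return [c.name for c in components if c.replicas < 2]


design = [
    Component("load_balancer", 6_000, 200_000, replicas=2),
    Component("app_servers", 6_000, 40_000, replicas=8),
    Component("cache", 5_000, 100_000, replicas=3),
    Component("primary_db", 1_000, 8_000, replicas=1),
]

for factor in (10, 100):
    names = [name for name, _ in breaks_in_order(design, factor)]
    print(f"{factor}x: over capacity, worst first -> {names}")
print("SPOFs:", spofs(design))
```

In this hypothetical design the app servers and then the primary database exceed capacity at 10x, every box is over capacity at 100x, and the primary database is the only SPOF. The model does not replace the narrative; it gives you the order in which to tell it.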

Cluster 5: Communicate and Decide

Order	Concept	Type	Focus
13	The Four-Phase Interview/Review Structure	PRIMARY	Time-boxing requirements, high-level design, deep dive, and wrap-up
14	Articulating Trade-offs: Why This, Not That	PRIMARY	Naming the alternative you rejected and the cost you accepted
15	Producing a Design Doc Worth Reviewing	SUPPORTING	Sections a senior reviewer expects, and which ones are deal-breakers

Cluster mastery check: Can you deliver a 45-minute design interview that hits all four phases on time, states every major trade-off explicitly, and leaves behind a document another engineer could implement from?
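Time-boxing is easier to hold to when the checkpoints are written down before the clock starts. The sketch below turns the four phases into start and end offsets for a 45-minute session. The phase names follow this module; the 10/10/20/5 split is an assumption chosen to match the 10-minute targets in the Cluster 1 and Cluster 2 checks, not a prescribed allocation.

```python
# Phase names follow the four-phase structure; the minute split is an assumption.
PHASES = [
    ("requirements", 10),
    ("high-level design", 10),
    ("deep dive", 20),
    ("wrap-up", 5),
]


def checkpoints(phases, total_minutes: int = 45):
    """Return (phase, start, end) offsets in minutes; fail loudly if the plan overruns."""
    planned = sum(minutes for _, minutes in phases)
    if planned > total_minutes:
        raise ValueError(f"plan uses {planned} min but only {total_minutes} are available")
    schedule, elapsed = [], 0
    for name, minutes in phases:
        schedule.append((name, elapsed, elapsed + minutes))
        elapsed += minutes
    return schedule


for name, start, end in checkpoints(PHASES):
    print(f"{start:>2}-{end:<2} min  {name}")
```

The useful part is the failure case: if the session is cut to 30 minutes, the same plan raises an error, which forces you to decide in advance which phase shrinks instead of discovering it at minute 28.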

Then work these practice pages:

Order	Practice path	Focus
1	Estimation and Framing Lab	Requirements, numbers, and ranking what is hard
2	High-Level Design Workshop	Box diagrams, storage choice, cache placement
3	Stress Test Clinic	10x, failure injection, SPOF hunt
4	Design Interview Katas	Four classic problems walked through the full method

Take the Module Quiz after you finish the concept and practice paths. Use the Reference and Selective Reading page and the Learning Resources page only for targeted reinforcement.

Learning Objectives

By the end of this module you should be able to:

Translate a vague prompt into explicit functional requirements, three non-functional targets with numbers, and a named constraint list, within 10 minutes.
Produce back-of-envelope estimates for QPS, storage, bandwidth, and latency budgets using the powers-of-two table and Jeff Dean's latency numbers, without a calculator.
Name the two or three genuinely hard parts of a problem before drawing any boxes.
Draw a high-level box-and-arrow diagram with API surface, components, data flow, storage, caches, CDN, and load balancers.
Choose SQL, NoSQL, or a hybrid with a written justification grounded in access pattern, consistency needs, and size.
Decompose any component in the design into its inputs, outputs, algorithm, data model, partitioning key, and consistency contract.
Walk a 10x and 100x scale scenario and state what breaks first, next, and after that.
Describe the user-visible effect of the death of any single component and identify all current SPOFs.
Run the four-phase design process (requirements, high-level, deep dive, wrap-up) against a timer.
Articulate every major design decision as "I chose A over B because C", and produce a design doc a senior reviewer would approve.
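The latency-budget objective can be practiced by summing the sequential hops on a request path and comparing the total with a target. The constants below are the commonly cited order-of-magnitude figures from the "latency numbers" list (same-datacenter round trip about 0.5 ms, disk seek about 10 ms, California to Netherlands round trip about 150 ms). The request path and the 200 ms target are hypothetical. Note that percentile latencies do not add up this simply in practice, so treat the result as a sanity check rather than a P99 prediction.

```python
# Order-of-magnitude figures in microseconds (commonly cited, rounded).
DATACENTER_ROUND_TRIP_US = 500
DISK_SEEK_US = 10_000
CROSS_CONTINENT_ROUND_TRIP_US = 150_000


def budget_report(hops, target_us: int) -> dict:
    """Sum sequential hop costs and compare the total with the latency target."""
    total = sum(cost for _, cost in hops)
    largest = max(hops, key=lambda hop: hop[1])[0]
    return {
        "total_us": total,
        "headroom_us": target_us - total,
        "within_budget": total <= target_us,
        "largest_hop": largest,
    }


# Hypothetical request path for a read that misses the cache.
path = [
    ("client <-> region", CROSS_CONTINENT_ROUND_TRIP_US),
    ("load balancer -> app", DATACENTER_ROUND_TRIP_US),
    ("app -> cache (miss)", DATACENTER_ROUND_TRIP_US),
    ("app -> database", DATACENTER_ROUND_TRIP_US),
    ("database disk seek", DISK_SEEK_US),
]

report = budget_report(path, target_us=200_000)
print(report)
```

For this path the total is 161.5 ms against a 200 ms target, and the client-to-region round trip dominates. That points the optimization conversation at geography (a CDN or a closer region) before anything inside the datacenter.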

Outputs

one written framing artifact for each of at least 4 classic problems: functional requirements, 3 non-functional targets with numbers, constraints, and the "hard parts" list
one estimation notebook with at least 8 worked back-of-envelope calculations (QPS, storage per day, cache memory, bandwidth, P99 budget breakdown)
at least 4 box-and-arrow diagrams with API surface, components, storage, and cache/CDN/LB placement
one data-model sheet per diagram naming entities, relationships, indexes, and the partitioning key
one stress-test journal with 10x and 100x walk-throughs and a per-box "what happens when this dies" entry for every design
one trade-off log listing at least 12 "A over B because C" decisions with the rejected alternative named
one full design doc (sections: problem, requirements, estimates, high-level design, deep dive per component, data model, failure and scale analysis, trade-offs, open questions) for one of the four interview katas
one mistake log tagged with errors like jumped to solution, no numbers, sharded on wrong key, ignored hot partition, assumed AZ does not fail, trade-off unstated
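The trade-off log in the list above is easier to keep honest if every entry has the same shape. The sketch below is one possible structure, assuming a four-field record; the `TradeOff` class, its field names, and the sample entries are hypothetical and only illustrate the "A over B because C" format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TradeOff:
    chosen: str
    rejected: str
    because: str
    cost_accepted: str

    def sentence(self) -> str:
        """Render the entry in the 'A over B because C' form."""
        return (f"I chose {self.chosen} over {self.rejected} because {self.because}; "
                f"the cost I accept is {self.cost_accepted}.")


def incomplete(entries):
    """Return entries that name no rejected alternative or give no reason."""
    return [e for e in entries if not e.rejected.strip() or not e.because.strip()]


# Hypothetical log entries for illustration.
log = [
    TradeOff("partitioning by post_id", "partitioning by user_id",
             "one hot account would otherwise overload a single shard",
             "scatter-gather reads for a user's timeline"),
    TradeOff("a read-through cache", "", "", "stale reads for a short window"),
]

print(log[0].sentence())
print("entries to fix:", len(incomplete(log)))
```

The second entry is flagged because it names neither the rejected alternative nor the reason, which corresponds to the "trade-off unstated" tag in the mistake log.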

Completion Standard

You have completed Module 1 when all of these are true:

you can frame, estimate, and rank the hard parts of an unfamiliar prompt in 10 minutes, with numbers
you can produce a high-level design with storage and caching justified, in 10 more minutes
you can deep-dive any single component the reviewer picks without re-inventing the method
you can survive a 10x-and-fail-one-box attack on your design without panic
every design decision in your doc is stated as "A over B because C"
your written design doc is one you would hand to another engineer and expect them to implement from without further explanation

If you can draw the boxes but cannot defend any of them under pressure, the module is not complete.

Reading Policy

Concept pages are the main path.
Local book chunks from System Design Primer and Fundamentals of Software Architecture are selective reinforcement, not a second syllabus.
"Read only if stuck" means you try the concept page, the self-check, and the drill first.
"Optional deep dive" means additional nuance or breadth; it is not required for progression.
Written artifacts (framings, estimates, diagrams, design doc) are required deliverables, not optional enrichment.

Suggested Weekly Flow

Day	Work
1	Concepts 1-3; produce framing and estimation for one classic prompt
2	Concepts 4-6; draw the high-level design for the same prompt
3	Concepts 7-9; deep-dive two components and design the data model
4	Concepts 10-12; run the 10x walk and the kill-every-box walk on your design
5	Concepts 13-15; convert your work into a design doc
6	Practice pages 1-3 on one new prompt
7	Practice page 4 (all four katas), quiz, mistake-log cleanup

Reference

If you need exact links into the local chunked books, use the Reference and Selective Reading page.

Build Your Own X — elective

The Front-end Framework tutorial and the Game Engine 2D tutorial both force explicit API surface decisions, the same skill this module exercises in design docs. See the Build Your Own X overview.

Rich Learning Pages
