Learning Resources
This module is populated from the local chunked books in library/raw/semester-08-system-design-leadership/books. Use this page as a source map, not as an instruction to read everything.
Source Stack
| Book / Source | Role | How to use it in this module |
|---|---|---|
| System Design Interview (Alex Xu) | Primary teaching reference for the end-to-end method | Cross-reference for concept pages and the four katas; Chapters 3-7 track Clusters 1-4 |
| System Design Primer (donnemartin) | Primary local chunked source | Default escalation target for API-style questions, cache/LB/CDN placement, storage family trade-offs, and latency numbers |
| Fundamentals of Software Architecture (Richards & Ford) | Selective support | Go here for architecture-characteristics language, trade-off framing, and component-based thinking; Chapters 1-4 are the most useful |
| S7 Module 3 (DDD) and S7 Module 4 (API Design) | Prerequisite knowledge | Framing uses bounded contexts; deep-dive component contracts reuse API design habits |
Resource Map by Cluster
Cluster 1: Frame the Problem
| Need | Best local chunk | Why |
|---|---|---|
| overall interview shape and framing | SDP 01: How to approach a system design interview | Best compact opener; maps onto the four-phase structure |
| performance vs scalability vocabulary | SDP 02: Performance vs scalability | Forces the distinction that framing depends on |
| latency vs throughput vocabulary | SDP 03: Latency vs throughput | Short, clarifies a common candidate error |
| back-of-envelope numbers | SDP 23: Powers of two and latency numbers | Primary estimation reference; includes the Jeff Dean latency numbers |
| architecture characteristics vocabulary | FoSA: Architecture characteristics defined | Best formal treatment of non-functional requirements |
| extracting characteristics from requirements | FoSA: Extracting characteristics from requirements | Shows how the requirements list becomes architecture pressure |
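As a minimal sketch of the back-of-envelope arithmetic that the estimation chunk supports: the function below turns traffic assumptions into average QPS, peak QPS, and daily storage. Every input (user count, per-user request rates, payload size, peak factor) is a hypothetical placeholder, not a figure from the books.

```python
SECONDS_PER_DAY = 86_400


def estimate_load(daily_active_users, reads_per_user, writes_per_user,
                  bytes_per_write, peak_factor=2):
    """Rough load estimate from assumed traffic figures.

    All parameters are illustrative assumptions; replace them with the
    numbers agreed during framing.
    """
    requests_per_day = daily_active_users * (reads_per_user + writes_per_user)
    avg_qps = requests_per_day / SECONDS_PER_DAY
    # Peak traffic is modelled as a simple multiple of the average.
    peak_qps = avg_qps * peak_factor
    storage_bytes_per_day = daily_active_users * writes_per_user * bytes_per_write
    return {
        "avg_qps": avg_qps,
        "peak_qps": peak_qps,
        "storage_gb_per_day": storage_bytes_per_day / 1e9,
    }


# Hypothetical workload: 10M daily users, 18 reads and 2 writes each, 1 KB per write.
estimate = estimate_load(10_000_000, 18, 2, 1_000)
```

With these placeholder inputs the estimate comes to roughly 2,300 average QPS and 20 GB of new data per day. The point of the drill is the order of magnitude, not the exact digits.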
Cluster 2: High-Level Design
| Need | Best local chunk | Why |
|---|---|---|
| real-world architectures to pattern-match | SDP 25: Real-world architectures | The best short catalog of proven shapes |
| application tier and microservices | SDP 11: Application layer and microservices | Compact reading on stateless app tiers |
| SQL vs NoSQL decision | SDP 15: SQL or NoSQL | Short decision table; pairs with the storage concept |
| SQL replication fundamentals | SDP 12: Database RDBMS replication | For the "why replicas" conversation |
| NoSQL family overview | SDP 14: NoSQL databases | KV, document, wide-column, graph in one place |
| sharding and federation | SDP 13: Database federation and sharding | Essential before 100× scaling conversations |
| caching overview | SDP 16: Cache overview and levels | Taxonomy of cache layers |
| cache update patterns | SDP 17: Cache update patterns | Cache-aside, write-through, write-back, refresh-ahead |
| load balancers | SDP 09: Load balancer | L4/L7, health checks, anycast |
| reverse proxies | SDP 10: Reverse proxy | Distinguishes from LBs |
| CDN | SDP 08: Content delivery network | Push vs pull CDN, invalidation |
| component-based thinking | FoSA: Component-based thinking | Box selection discipline |
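To make the cache update patterns row concrete, here is a minimal sketch of cache-aside, the first of the four patterns listed. The `CacheAside` class, the dict-backed store, and the invalidate-on-write choice are illustrative assumptions rather than anything prescribed by the primer.

```python
class CacheAside:
    """Cache-aside: the application checks the cache first and loads
    from the backing store only on a miss."""

    def __init__(self, load_from_store, write_to_store):
        self._cache = {}
        self._load = load_from_store    # hypothetical read callback
        self._write = write_to_store    # hypothetical write callback
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        self.misses += 1
        value = self._load(key)
        if value is not None:
            # Populate the cache so the next read is a hit.
            self._cache[key] = value
        return value

    def put(self, key, value):
        # Write to the store, then invalidate so the next read reloads.
        self._write(key, value)
        self._cache.pop(key, None)


# A plain dict stands in for the database in this sketch.
store = {"user:1": "Ada"}
cache = CacheAside(store.get, store.__setitem__)
```

Invalidating on write keeps the sketch short. Write-through would update the cache entry in `put` instead of dropping it.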
Cluster 3: Deep Dive
| Need | Best local chunk | Why |
|---|---|---|
| async decoupling and queues | SDP 18: Asynchronism | When to put a queue between components |
| consistency patterns | SDP 05: Consistency patterns | Weak, eventual, strong; read/write guarantees |
| availability patterns | SDP 06: Availability patterns | Fail-over, replication, health-check patterns |
| CAP for design conversations | SDP 04: CAP theorem | Short, correct framing for partition behaviour |
| data model discipline | FoSA: Database partitioning | Logical vs physical partitioning discussions |
| connascence | FoSA: Connascence | Vocabulary for "what does a change to one component force in another" |
| discovering components | FoSA: Discovering components | Actor/action and workflow-based decomposition |
Cluster 4: Stress Test the Design
| Need | Best local chunk | Why |
|---|---|---|
| how large-scale systems scale | SDP 25: Real-world architectures | Reuse; shows 100× designs at real companies |
| availability patterns for failure walk | SDP 06: Availability patterns | Fail-over classes and their time to recovery (TTR) |
| database sharding for hot partitions | SDP 13: Database federation and sharding | Primary reference for partition-key discussions |
| scalability fundamentals | SDP 02: Performance vs scalability | Bottleneck analysis vocabulary |
| analyzing trade-offs under scale | FoSA: Analyzing trade-offs | The right language for "we accept this degradation because..." |
Cluster 5: Communicate and Decide
| Need | Best local chunk | Why |
|---|---|---|
| interview-question catalog | SDP 24: System design interview questions | Bank of prompts for self-run katas |
| the interview-approach structure | SDP 01: How to approach a system design interview | Re-read now through the lens of the four phases |
| architectural thinking | FoSA: Architectural thinking | Senior-reviewer mindset; trade-off framing |
| trade-off analysis | FoSA: Analyzing trade-offs | Primary support for concept 14 |
| identifying characteristics | FoSA: Identifying architectural characteristics | For the "what is this design optimizing for" paragraph |
External Resources
Use these sparingly; the primary path is the concept pages plus local chunks. External links are included only when they add something the local material cannot: authoritative numbers, real-world cases, or up-to-date reference architectures.
Canonical Numbers & Primers
- donnemartin/system-design-primer on GitHub -- upstream repo for the primer chunks. Validated.
- Jeff Dean's "Latency numbers every programmer should know" (jboner gist) -- canonical latency numbers used across Cluster 1 estimation. Validated.
- Jeff Dean's "Numbers everyone should know" (brenocon mirror) -- summary with commentary.
- Amazon Builders' Library -- Timeouts, retries, and backoff with jitter -- operational discipline for failure walks (Cluster 4). Validated.
- Amazon Builders' Library -- Beyond five 9s -- senior-voice trade-off write-ups. Validated.
- Amazon Builders' Library -- Static stability using Availability Zones -- SPOF avoidance across AZs. Validated.
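For the retry discipline named in the timeouts, retries, and backoff entry above, a minimal sketch of capped exponential backoff with full jitter might look like the following. The function name, default values, and injectable `sleep` and `rng` parameters are assumptions made for illustration and testing, not an API from the Builders' Library article.

```python
import random
import time


def call_with_retries(operation, attempts=5, base=0.1, cap=5.0,
                      sleep=time.sleep, rng=random.random):
    """Retry `operation`, sleeping a random delay in
    [0, min(cap, base * 2**attempt)) between attempts (full jitter)."""
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            # Sketch only: real code should catch just the retryable errors.
            if attempt == attempts - 1:
                raise  # attempts exhausted, surface the last error
            ceiling = min(cap, base * 2 ** attempt)
            sleep(rng() * ceiling)
```

Randomising the whole delay, rather than adding a small random offset, spreads retries from many clients so they do not hit a recovering dependency in synchronised waves.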
Reference Architectures & Case Studies
- High Scalability (highscalability.com) -- real-world architecture case studies for pattern-matching 10×/100× walks and failure scenarios. Validated.
- AWS Architecture Center -- reference architectures. Validated.
- AWS Well-Architected Framework -- Reliability Pillar -- availability and SPOF checklist. Validated.
- Google Cloud Architecture Framework -- Reliability -- graceful degradation emphasis. Validated.
- Google SRE Workbook -- Managing risk -- SPOF/bottleneck ranking against error budget. Validated.
- Azure Architecture Center -- complementary reference designs. Validated.
Blogs & Deep-Dive Sources
- ByteByteGo Blog (Alex Xu) -- compact newsletter/blog covering estimation, scaling, interview framing. Validated.
- ByteByteGo -- System Design Interview Framework -- Alex Xu's four-step interview framework (Cluster 5). Validated.
- Martin Fowler -- Software Architecture Guide -- review-friendly architecture documentation and style. Validated.
- Martin Fowler -- Catalog of Patterns of Distributed Systems -- patterns referenced in Cluster 3 and 4 (Fixed Partitions, Follower Reads, Request Batch). Validated.
- Martin Fowler -- Polyglot Persistence -- Cluster 2 storage approach. Validated.
- Jepsen -- consistency-model hierarchy and real-world distributed-systems test reports (Cluster 3 concept 9). Validated.
- Principles of Chaos Engineering -- failure as a control variable (Cluster 4 concept 11). Validated.
- Stripe -- Engineering Writing -- voice and compression for design docs (Cluster 5 concept 15). Validated.
- Google -- How to write a design doc -- canonical reference for engineering design-doc practice. Validated.
- Michael Nygard -- Documenting Architecture Decisions (ADRs) -- ADR template for trade-off articulation. Validated.
- ThoughtWorks Technology Radar -- industry-scale trade-off statements. Validated.
- Netflix Tech Blog -- real 10×/100× transitions and chaos engineering practice. Validated.
- AWS DynamoDB -- NoSQL data modeling -- access-pattern-first data modeling (Cluster 3 concept 8). Validated.
Optional Deep Dive
- Google "Designs, Lessons and Advice from Building Large Distributed Systems" -- Jeff Dean (PDF) -- foundational talk referenced across many primer chunks.
- Software Engineering Advice from Building Large-Scale Distributed Systems -- Jeff Dean (PDF) -- expansion of the latency-numbers talk with building-block guidance.
Use Rules
- If the question is "what order should I do things in", the concept pages answer it. Do not start a design session by opening external resources.
- If the question is "what specifically does this pattern look like at a big company", use High Scalability or AWS Architecture Center.
- If the question is "what number should I use here", go to Jeff Dean's numbers first.
- Open one chunk for one concept gap; do not chain-read the whole book.
- If rereading is not fixing the problem, stop reading and redo the relevant drill from scratch -- the gap is in practice, not in reading.