Module 1: System Design Methodology: Case Studies
These case studies train the system-design loop: clarify, estimate, identify hard parts, choose a high-level design, stress it, and defend tradeoffs with evidence.
Case Study 1: Dropbox Magic Pocket As Storage Design Method
Scenario: Design a file-storage backend for billions of immutable chunks. The naive answer is "put files in object storage." A real design must also cover chunking, metadata, durability, repair, encryption, and cost constraints.
Source anchor: Dropbox's engineering writeup Inside the Magic Pocket, which describes Magic Pocket as an immutable block storage system for encrypted file chunks.
Module concepts: requirements clarification, storage estimation, metadata/data split, durability, repair, cost tradeoff.
Wrong Approach
Start drawing boxes before naming the object model and durability target.
Better Approach
Clarify:
- Object: immutable encrypted block
- Max block size:
- Metadata lookup:
- Durability target:
- Read/write ratio:
- Repair/re-replication:
- Cost per stored TB:
Then separate metadata path from blob-storage path and estimate chunk count, storage growth, replication overhead, and repair bandwidth.
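The estimates named above can be roughed out in a few lines. The sketch below is illustrative only: every input is a placeholder assumption (not a figure from Dropbox's writeup), the names `StorageAssumptions` and `estimate` are hypothetical, and repair bandwidth is simplified to "rebuild one full drive inside a target window."

```python
from dataclasses import dataclass

TB = 10**12
PB = 10**15


@dataclass(frozen=True)
class StorageAssumptions:
    """Placeholder inputs for a back-of-the-envelope estimate."""
    logical_bytes: int             # user data stored today
    avg_chunk_bytes: int           # average size of one immutable chunk
    daily_growth_bytes: int        # new logical bytes written per day
    storage_overhead: float        # raw/logical ratio (3.0 means 3x replication)
    metadata_bytes_per_chunk: int  # index entry size per chunk
    disk_bytes: int                # capacity of one drive
    repair_window_seconds: int     # target time to re-protect a failed drive


def estimate(a: StorageAssumptions) -> dict:
    chunk_count = a.logical_bytes / a.avg_chunk_bytes
    return {
        "chunk_count": chunk_count,
        "raw_bytes": a.logical_bytes * a.storage_overhead,
        "raw_growth_bytes_per_day": a.daily_growth_bytes * a.storage_overhead,
        "metadata_bytes": chunk_count * a.metadata_bytes_per_chunk,
        # Simplification: bandwidth to rebuild one full drive within the window.
        "repair_bytes_per_second": a.disk_bytes / a.repair_window_seconds,
    }


# Assumed example values; replace them with the answers to the clarifying questions.
example = StorageAssumptions(
    logical_bytes=100 * PB,
    avg_chunk_bytes=4 * 10**6,
    daily_growth_bytes=200 * TB,
    storage_overhead=3.0,
    metadata_bytes_per_chunk=100,
    disk_bytes=10 * TB,
    repair_window_seconds=24 * 3600,
)

for name, value in estimate(example).items():
    print(f"{name}: {value:.3e}")
```

With these placeholder inputs the metadata index is several orders of magnitude smaller than the blob data, which is one reason the two paths can be sized and scaled separately.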
Tradeoff Table
| Choice | Gain | Cost |
|---|---|---|
| object storage provider | low ops burden | vendor pricing and less control at scale |
| custom storage fleet | cost/control at scale | operational complexity |
| large chunks | less metadata | poor dedupe and partial update behavior |
| small chunks | better dedupe/repair | more metadata and lookup pressure |
Required Artifact
Produce a one-page design brief: requirements, rough capacity math, data model, hot path, failure path, and three rejected alternatives.
Case Study 2: News Feed Fanout Under Vague Requirements
Scenario: A prompt says "design a social feed." Some accounts have 50 followers; some have 50 million. A single fanout strategy will punish one end of that range.
Source anchor: Meta's TAO: Facebook's Distributed Data Store for the Social Graph is a useful anchor for read-heavy social workloads where feed and graph access patterns dominate design choices.
Module concepts: clarifying questions, fanout-on-write, fanout-on-read, celebrity users, cache, backfill.
Wrong Approach
"Use Kafka and Redis" before defining read/write shape.
Better Approach
Classify users:
- Normal producer: fanout-on-write to follower timelines
- Celebrity producer: fanout-on-read or hybrid injection
- Consumer read: cached timeline with cursor pagination
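The classification above can be sketched as a toy in-memory model. This is a minimal illustration, not how any production feed is built: the class `HybridFeed`, the `celebrity_threshold` value, and the `(timestamp, post_id)` cursor format are all assumptions made for the example.

```python
from collections import defaultdict


class HybridFeed:
    """Toy model; a real system would back these maps with stores and caches."""

    def __init__(self, celebrity_threshold=10_000):
        # Assumed tuning knob: derive it from the real follower distribution.
        self.celebrity_threshold = celebrity_threshold
        self.followers = defaultdict(set)          # author -> follower ids
        self.following = defaultdict(set)          # user -> author ids
        self.timelines = defaultdict(list)         # user -> precomputed entries
        self.celebrity_posts = defaultdict(list)   # author -> entries merged at read time

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def publish(self, author, ts, post_id):
        entry = (ts, post_id)
        if len(self.followers[author]) >= self.celebrity_threshold:
            # Celebrity producer: one write, merged into feeds on read.
            self.celebrity_posts[author].append(entry)
        else:
            # Normal producer: fanout-on-write to each follower timeline.
            for follower in self.followers[author]:
                self.timelines[follower].append(entry)

    def read(self, user, limit=20, before=None):
        """Return (page, next_cursor); the cursor is the last entry on the page."""
        entries = set(self.timelines.get(user, []))
        for author in self.following.get(user, ()):
            entries.update(self.celebrity_posts.get(author, []))
        ordered = sorted(entries, reverse=True)  # newest first
        if before is not None:
            ordered = [e for e in ordered if e < before]
        page = ordered[:limit]
        next_cursor = page[-1] if len(ordered) > limit else None
        return page, next_cursor
```

The sketch deliberately leaves two gaps open: a user who follows a normal producer late sees none of that producer's older posts, and an author who crosses the threshold has posts split across both stores. Both are backfill questions the design still has to answer.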
Tradeoff Table
| Choice | Gain | Cost |
|---|---|---|
| fanout-on-write | fast reads | huge celebrity write amplification |
| fanout-on-read | cheap writes | expensive reads |
| hybrid | handles skew | more logic and cache invalidation |
| ranked feed | product relevance | model freshness and explainability |
Required Artifact
Write the fanout decision with QPS estimates, celebrity threshold, cache key, pagination contract, and backfill plan.
Case Study 3: Rate Limiter Design Beyond A Counter
Scenario: An API gateway must protect login, payments, and public search. Each endpoint has different abuse and fairness requirements.
Source anchor: Cloudflare's rate limiting rules documentation is a concrete anchor for matching rate-limiting behavior to endpoint-specific abuse and fairness requirements.
Module concepts: functional requirements, NFRs, data structure choice, edge/global consistency, failure behavior.
Wrong Approach
Use one global requests_per_minute counter for every endpoint.
Better Approach
Design per operation:
- Login: per account + per IP, strict abuse control
- Payments: per customer + idempotency key, low false positives
- Search: per API key, burst tolerance
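One way to make the per-operation design concrete is a policy table that maps each endpoint to the dimensions its counters are keyed on. The sketch below is hypothetical throughout: the `Policy` type, the `rl:` key prefix, and the request field names are invented for illustration, and idempotency-key handling for payments (not counting a retried request twice) is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    # Each inner tuple names the request fields that form one counter key.
    dimensions: tuple[tuple[str, ...], ...]


# Assumed policies mirroring the per-operation list; field names are illustrative.
POLICIES = {
    "login": Policy(dimensions=(("account",), ("ip",))),
    "payments": Policy(dimensions=(("customer",),)),
    "search": Policy(dimensions=(("api_key",),)),
}


def limiter_keys(endpoint: str, request: dict[str, str]) -> list[str]:
    """Build one counter key per dimension group for this endpoint."""
    keys = []
    for dims in POLICIES[endpoint].dimensions:
        parts = ":".join(f"{d}={request[d]}" for d in dims)
        keys.append(f"rl:{endpoint}:{parts}")
    return keys


print(limiter_keys("login", {"account": "u1", "ip": "203.0.113.7"}))
```

A login request is checked against two independent counters, so either a single account under attack from many addresses or a single address trying many accounts can trip the limit.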
Tradeoff Table
| Algorithm | Gain | Cost |
|---|---|---|
| fixed window | simple | boundary bursts |
| sliding window | smoother | more state |
| token bucket | burst tolerant | tuning complexity |
| distributed global limiter | consistent policy | latency and availability cost |
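As a reference point for the token bucket row, here is a minimal single-process sketch. It assumes one bucket per limiter key held in local memory; the class name and parameters are illustrative, and a distributed deployment would need shared state or per-node budgets, which this does not attempt.

```python
import time


class TokenBucket:
    """Refills at `rate_per_sec` up to `capacity`; each request spends tokens."""

    def __init__(self, rate_per_sec, capacity, now=None):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)  # start full so an initial burst is allowed
        self.updated = time.monotonic() if now is None else now

    def allow(self, cost=1.0, now=None):
        # `now` is injectable so the behavior can be tested without sleeping.
        now = time.monotonic() if now is None else now
        if now > self.updated:
            refill = (now - self.updated) * self.rate
            self.tokens = min(self.capacity, self.tokens + refill)
            self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


bucket = TokenBucket(rate_per_sec=1, capacity=3, now=0.0)
burst = [bucket.allow(now=0.0) for _ in range(4)]   # capacity bounds the burst
later = [bucket.allow(now=2.0) for _ in range(3)]   # two tokens refilled after 2 s
print(burst, later)
```

The two tuning knobs are visible here: `capacity` sets how large a burst is tolerated, and `rate_per_sec` sets the sustained rate, which is where the "tuning complexity" cost in the table comes from.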
Required Artifact
Create a limiter design: key dimensions, algorithm, storage, TTL, edge behavior, fail-open/fail-closed decision.
Case Study 4: URL Shortener With Abuse And Analytics
Scenario: A basic URL shortener is easy. A production shortener must handle custom aliases, abuse reports, analytics, deletion, and redirects with very low latency.
Source anchor: Bitly's URL shortener product documentation is a lightweight product anchor for the real operational concerns behind short-link systems: custom aliases, redirects, analytics, and link lifecycle.
Module concepts: API design, ID generation, read-heavy system, cache, analytics pipeline, abuse workflow.
Wrong Approach
Only design POST /shorten and GET /{code}.
Better Approach
Include:
- Write path: generate code, validate custom alias, store destination
- Read path: cache lookup, redirect, async analytics event
- Safety: malware/phishing flag, takedown workflow, deletion semantics
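The three paths above can be sketched together in one small in-memory model. Everything here is an assumption for illustration: the `Shortener` class, the status values, the 7-character code length, and the choice to keep taken-down rows as tombstones so a code is never reissued. It does not describe Bitly's implementation.

```python
import secrets
import string
from collections import deque

ALPHABET = string.ascii_letters + string.digits


class Shortener:
    def __init__(self):
        self.db = {}           # code -> {"url": str, "status": str}
        self.cache = {}        # code -> url, active links only
        self.events = deque()  # stand-in for an async analytics queue

    def _random_code(self, length):
        # Random codes can collide, so retry until an unused one is found.
        while True:
            code = "".join(secrets.choice(ALPHABET) for _ in range(length))
            if code not in self.db:
                return code

    def create(self, url, alias=None, code_len=7):
        if alias is not None:
            valid = bool(alias) and all(c in ALPHABET for c in alias)
            if not valid or alias in self.db:
                raise ValueError("alias is invalid or already taken")
            code = alias
        else:
            code = self._random_code(code_len)
        self.db[code] = {"url": url, "status": "active"}
        return code

    def resolve(self, code):
        url = self.cache.get(code)
        if url is None:
            row = self.db.get(code)
            if row is None or row["status"] != "active":
                return None  # the HTTP layer would map this to 404 or 410
            url = row["url"]
            self.cache[code] = url
        # Analytics is recorded off the redirect's critical path.
        self.events.append(("click", code))
        return url

    def takedown(self, code, status="blocked"):
        # Keep the row as a tombstone and drop the cached redirect.
        self.db[code]["status"] = status
        self.cache.pop(code, None)
```

The `takedown` method shows why cached redirects carry an invalidation cost: flagging the row alone is not enough, because the cache would keep serving the destination until the entry is removed or expires.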
Tradeoff Table
| Choice | Gain | Cost |
|---|---|---|
| random code | simple distribution | collision handling |
| sequential code | compact | enumeration risk |
| cache redirects | low latency | invalidation after takedown |
| async analytics | fast redirect | eventual analytics |
Required Artifact
Draw the read/write path and write a failure-mode table for cache outage, DB outage, analytics lag, and abuse takedown.
Case Study 5: Design Review That Misses The Hard Part
Scenario: A candidate designs a chat system and spends the whole session on WebSocket servers. They never discuss offline delivery, ordering, fanout, presence, or multi-device state.
Source anchor: The Matrix Client-Server API is a useful concrete anchor for message send, sync, and multi-device conversation state in chat-like systems.
Module concepts: hard-part identification, tradeoff framing, state ownership, ordering, offline sync.
Wrong Approach
Optimize the obvious component while ignoring correctness questions.
Better Approach
Call out hard parts:
- Message order: per conversation sequence
- Delivery: online push + offline inbox
- Presence: ephemeral, best-effort
- Multi-device: read receipt and sync model
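Ordering, offline delivery, and multi-device sync can be sketched with one durable log per conversation and one cursor per device. This is an assumed toy model (the `ChatStore` name and its methods are hypothetical, and it is not the Matrix sync protocol); presence and read receipts are omitted.

```python
from collections import defaultdict


class ChatStore:
    def __init__(self):
        self.log = defaultdict(list)    # conversation -> [(seq, sender, body)]
        self.cursor = defaultdict(int)  # (conversation, device) -> last seq synced

    def send(self, conversation, sender, body):
        # Sequence numbers are scoped to one conversation, not global.
        seq = len(self.log[conversation]) + 1
        self.log[conversation].append((seq, sender, body))
        return seq

    def sync(self, conversation, device):
        """Return messages this device has not seen yet and advance its cursor."""
        last = self.cursor[(conversation, device)]
        new = self.log[conversation][last:]
        if new:
            self.cursor[(conversation, device)] = new[-1][0]
        return new


store = ChatStore()
store.send("room-1", "ann", "hi")
store.send("room-1", "ben", "hello")
store.send("room-2", "ann", "other room")   # independent sequence starts at 1
print(store.sync("room-1", "phone"))        # phone catches up on both messages
store.send("room-1", "ann", "still there?")
print(store.sync("room-1", "phone"))        # only the new message
print(store.sync("room-1", "laptop"))       # laptop was offline: full backlog
```

Because each device tracks its own position in the same log, an offline device needs no separate queue; the log acts as the durable inbox, at the storage and sync cost noted in the tradeoff table.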
Tradeoff Table
| Choice | Gain | Cost |
|---|---|---|
| global ordering | simple to reason about | expensive coordination that chat users do not need |
| per-conversation ordering | matches user model | partitioning by conversation |
| best-effort presence | scalable | occasionally stale |
| durable inbox | reliable delivery | storage and sync complexity |
Required Artifact
Write a "hard parts first" design outline with the top five risks and the component responsible for each.
Source Map
| Source | Use it for |
|---|---|
| Dropbox: Inside the Magic Pocket | large-scale storage design, immutable blocks, durability/cost tradeoffs |
| Meta: TAO | social-graph access patterns and read-heavy feed-adjacent workloads |
| Cloudflare rate limiting docs | endpoint-specific rate-limiting behavior and enforcement knobs |
| Bitly URL shortener | real product constraints for redirects, aliases, and analytics |
| Matrix Client-Server API | chat sync, message flow, and multi-device state expectations |
Completion Standard
- At least three design briefs are completed.
- Each design includes estimates before components.
- At least one design names a hot-key/skew problem.
- At least one design includes degradation behavior.