Module 1: System Design Methodology: Case Studies

These case studies train the system-design loop: clarify, estimate, identify hard parts, choose a high-level design, stress it, and defend tradeoffs with evidence.


Case Study 1: Dropbox Magic Pocket As Storage Design Method

Scenario: Design a file-storage backend for billions of immutable chunks. The naive answer is "put files in object storage." A real design must also address chunking, metadata, durability, repair, encryption, and cost constraints.

Source anchor: Dropbox's engineering writeup "Inside the Magic Pocket" describes Magic Pocket as an immutable block storage system for encrypted file chunks.

Module concepts: requirements clarification, storage estimation, metadata/data split, durability, repair, cost tradeoff.

Wrong Approach

Start drawing boxes before naming the object model and durability target.

Better Approach

Clarify:

Object: immutable encrypted block
Max block size:
Metadata lookup:
Durability target:
Read/write ratio:
Repair/re-replication:
Cost per stored TB:

Then separate metadata path from blob-storage path and estimate chunk count, storage growth, replication overhead, and repair bandwidth.
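The estimate above can be sketched as back-of-envelope arithmetic. Every input below is a placeholder assumption for illustration, not a Dropbox figure, and the names (StorageAssumptions, estimate) are hypothetical. The repair term assumes failed disks are full and that all of their contents must be rebuilt elsewhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageAssumptions:
    """Placeholder inputs; replace each with a clarified requirement."""
    logical_bytes: int               # user data before redundancy
    avg_block_bytes: int             # average size of one immutable block
    redundancy_factor: float         # raw bytes stored per logical byte
    metadata_bytes_per_block: int    # index entry: hash, location, length
    annual_growth: float             # 0.5 means 50% more data per year
    annual_disk_failure_rate: float  # fraction of disks lost per year


def estimate(a: StorageAssumptions) -> dict:
    block_count = a.logical_bytes // a.avg_block_bytes
    raw_bytes = a.logical_bytes * a.redundancy_factor
    # Data on failed disks must be re-replicated from surviving copies.
    repair_bytes_per_day = raw_bytes * a.annual_disk_failure_rate / 365
    return {
        "block_count": block_count,
        "metadata_bytes": block_count * a.metadata_bytes_per_block,
        "raw_bytes": raw_bytes,
        "raw_bytes_next_year": raw_bytes * (1 + a.annual_growth),
        "repair_bytes_per_sec": repair_bytes_per_day / 86_400,
    }


# Illustrative numbers only: 1 PB logical, 1 MB blocks, 2x redundancy.
example = StorageAssumptions(
    logical_bytes=10**15,
    avg_block_bytes=10**6,
    redundancy_factor=2.0,
    metadata_bytes_per_block=100,
    annual_growth=0.5,
    annual_disk_failure_rate=0.02,
)
result = estimate(example)
```

Even with made-up inputs, the output shows why the metadata path deserves its own design: a billion blocks at 100 bytes each is already about 100 GB of index that must stay fast to query.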

Tradeoff Table

Choice | Gain | Cost
object storage provider | low ops burden | vendor cost/control tradeoff
custom storage fleet | cost/control at scale | operational complexity
large chunks | less metadata | poor dedupe and partial update behavior
small chunks | better dedupe/repair | more metadata and lookup pressure

Required Artifact

Produce a one-page design brief: requirements, rough capacity math, data model, hot path, failure path, and three rejected alternatives.


Case Study 2: News Feed Fanout Under Vague Requirements

Scenario: A prompt says "design a social feed." Some users have 50 followers; some have 50 million. A single fanout strategy will punish one side.

Source anchor: Meta's "TAO: Facebook's Distributed Data Store for the Social Graph" is a useful anchor for read-heavy social workloads where feed and graph access patterns dominate design choices.

Module concepts: clarifying questions, fanout-on-write, fanout-on-read, celebrity users, cache, backfill.

Wrong Approach

Say "Use Kafka and Redis" before defining the read/write shape.

Better Approach

Classify users:

Normal producer:
  fanout-on-write to follower timelines

Celebrity producer:
  fanout-on-read or hybrid injection

Consumer read:
  cached timeline with cursor pagination
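The classification above can be sketched as a small in-memory model. This is a minimal illustration, not how any production feed is built: the FeedService class, the follower-count threshold, and the (timestamp, post_id) cursor are all assumptions made for the example.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 10_000  # assumed cutoff; derive it from a write budget


class FeedService:
    def __init__(self, threshold=CELEBRITY_THRESHOLD):
        self.threshold = threshold
        self.followers = defaultdict(set)         # producer -> follower ids
        self.following = defaultdict(set)         # consumer -> producer ids
        self.timelines = defaultdict(list)        # consumer -> precomputed entries
        self.posts_by_author = defaultdict(list)  # producer -> own posts

    def follow(self, consumer, producer):
        self.followers[producer].add(consumer)
        self.following[consumer].add(producer)

    def is_celebrity(self, producer):
        return len(self.followers[producer]) >= self.threshold

    def publish(self, producer, post_id, ts):
        """Store the post; return how many timeline writes were fanned out."""
        self.posts_by_author[producer].append((ts, post_id))
        if self.is_celebrity(producer):
            return 0  # fanout-on-read: merged in at read time instead
        for follower in self.followers[producer]:
            self.timelines[follower].append((ts, post_id))
        return len(self.followers[producer])

    def read(self, consumer, limit=20, before=None):
        """Return (page, next_cursor), newest first."""
        # A set removes duplicates if a producer crossed the threshold
        # after some of their posts had already been fanned out.
        candidates = set(self.timelines[consumer])
        for producer in self.following[consumer]:
            if self.is_celebrity(producer):
                candidates.update(self.posts_by_author[producer])
        if before is not None:
            candidates = {c for c in candidates if c < before}
        ordered = sorted(candidates, reverse=True)
        page = ordered[:limit]
        next_cursor = page[-1] if len(ordered) > limit else None
        return page, next_cursor


feed = FeedService(threshold=3)
for fan in ("a", "b", "c"):
    feed.follow(fan, "celeb")
feed.follow("a", "normal")
writes_normal = feed.publish("normal", "p1", ts=1)
writes_celeb = feed.publish("celeb", "p2", ts=2)
page, cursor = feed.read("a")
```

The return value of publish makes the write amplification visible, which is the number the celebrity threshold should be tuned against.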

Tradeoff Table

Choice | Gain | Cost
fanout-on-write | fast reads | huge celebrity write amplification
fanout-on-read | cheap writes | expensive reads
hybrid | handles skew | more logic and cache invalidation
ranked feed | product relevance | model freshness and explainability

Required Artifact

Write the fanout decision with QPS estimates, celebrity threshold, cache key, pagination contract, and backfill plan.


Case Study 3: Rate Limiter Design Beyond A Counter

Scenario: An API gateway must protect login, payments, and public search. Each endpoint has different abuse and fairness requirements.

Source anchor: Cloudflare's rate limiting rules documentation is a concrete anchor for matching rate-limiting behavior to endpoint-specific abuse and fairness requirements.

Module concepts: functional requirements, NFRs, data structure choice, edge/global consistency, failure behavior.

Wrong Approach

Use one global requests_per_minute counter for every endpoint.

Better Approach

Design per operation:

Login:
  per account + per IP, strict abuse control

Payments:
  per customer + idempotency key, low false positives

Search:
  per API key, burst tolerance
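One way to write the per-operation design down is as a policy table that maps each endpoint to its key dimensions and failure behavior. The sketch below is hypothetical throughout: the LimitPolicy fields, the numeric limits, the key format, and the fail-open choices are placeholders, and treating the idempotency key as a retry-dedupe field is one interpretation of the payments line above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class LimitPolicy:
    dimensions: Tuple[str, ...]  # each dimension gets its own counter
    limit: int                   # requests allowed per window (placeholder)
    window_seconds: int
    fail_open: bool              # allow traffic if the limiter store is down?
    dedupe_field: Optional[str] = None  # retries with same value count once


POLICIES = {
    "login": LimitPolicy(("account_id", "ip"), 5, 60, fail_open=False),
    "payments": LimitPolicy(("customer_id",), 30, 60, fail_open=True,
                            dedupe_field="idempotency_key"),
    "search": LimitPolicy(("api_key",), 600, 60, fail_open=True),
}


def limiter_keys(endpoint, request):
    """Build one counter key per dimension, e.g. per account and per IP."""
    policy = POLICIES[endpoint]
    return [f"rl:{endpoint}:{dim}:{request[dim]}" for dim in policy.dimensions]


def on_store_error(endpoint):
    """Decide what to do when the counter store cannot be reached."""
    return "allow" if POLICIES[endpoint].fail_open else "deny"


login_keys = limiter_keys("login", {"account_id": "alice", "ip": "203.0.113.7"})
```

Keeping the policy in data rather than code makes the fail-open/fail-closed decision reviewable per endpoint instead of being an accident of the implementation.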

Tradeoff Table

Algorithm | Gain | Cost
fixed window | simple | boundary bursts
sliding window | smoother | more state
token bucket | burst tolerant | tuning complexity
distributed global limiter | consistent policy | latency and availability cost
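As a concrete instance of the token bucket row, here is a minimal single-process sketch. The class and parameter names are illustrative, time is passed in explicitly so the behavior is deterministic, and a real deployment would still need shared storage and the edge/global consistency decisions discussed above.

```python
class TokenBucket:
    """Refills at rate_per_sec, holds at most capacity tokens."""

    def __init__(self, rate_per_sec, capacity, now=0.0):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)  # start full so an initial burst passes
        self.updated_at = now

    def allow(self, now, cost=1.0):
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated_at = max(self.updated_at, now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


bucket = TokenBucket(rate_per_sec=1.0, capacity=3)
burst = [bucket.allow(now=0.0) for _ in range(4)]  # three pass, fourth is denied
after_one_second = bucket.allow(now=1.0)           # one token has refilled
```

The two knobs map directly to the tuning cost in the table: capacity sets how large a burst is tolerated, and the refill rate sets the sustained limit.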

Required Artifact

Create a limiter design: key dimensions, algorithm, storage, TTL, edge behavior, fail-open/fail-closed decision.


Case Study 4: URL Shortener With Abuse And Analytics

Scenario: A basic URL shortener is easy. A production shortener must handle custom aliases, abuse reports, analytics, deletion, and redirects with very low latency.

Source anchor: Bitly's URL shortener product documentation is a lightweight product anchor for the real operational concerns behind short-link systems: custom aliases, redirects, analytics, and link lifecycle.

Module concepts: API design, ID generation, read-heavy system, cache, analytics pipeline, abuse workflow.

Wrong Approach

Only design POST /shorten and GET /{code}.

Better Approach

Include:

Write path:
  generate code, validate custom alias, store destination

Read path:
  cache lookup, redirect, async analytics event

Safety:
  malware/phishing flag
  takedown workflow
  deletion semantics
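The three paths above can be sketched together in an in-memory model. This is a hedged illustration, not Bitly's design: the Shortener class, the alias rules, the reserved words, the retry count, and the choice of 302/404/410 status codes are all assumptions, and the events list stands in for an asynchronous analytics queue.

```python
import re
import secrets
import string

ALPHABET = string.ascii_letters + string.digits  # 62 symbols
ALIAS_RE = re.compile(r"[A-Za-z0-9_-]{3,32}")    # assumed alias rule
RESERVED = {"api", "admin", "login"}             # hypothetical reserved words


class Shortener:
    def __init__(self):
        self.links = {}   # code -> {"url": str, "status": "active" | "blocked"}
        self.cache = {}   # code -> url, only for active links
        self.events = []  # stand-in for an async analytics queue

    def shorten(self, url, alias=None, code_len=7):
        if alias is not None:
            if not ALIAS_RE.fullmatch(alias) or alias.lower() in RESERVED:
                raise ValueError("invalid alias")
            if alias in self.links:  # blocked codes are never reused
                raise ValueError("alias taken")
            code = alias
        else:
            for _ in range(5):  # random codes can collide, so retry
                code = "".join(secrets.choice(ALPHABET) for _ in range(code_len))
                if code not in self.links:
                    break
            else:
                raise RuntimeError("could not allocate a free code")
        self.links[code] = {"url": url, "status": "active"}
        return code

    def resolve(self, code):
        """Return (http_status, destination)."""
        url = self.cache.get(code)
        if url is None:
            link = self.links.get(code)
            if link is None:
                return 404, None
            if link["status"] != "active":
                return 410, None
            url = link["url"]
            self.cache[code] = url
        self.events.append(("click", code))  # recorded off the redirect path
        return 302, url

    def takedown(self, code):
        self.links[code]["status"] = "blocked"
        self.cache.pop(code, None)  # invalidate so the block takes effect


svc = Shortener()
code = svc.shorten("https://example.com/a")
before_takedown = svc.resolve(code)
svc.takedown(code)
after_takedown = svc.resolve(code)
```

The takedown method is where the cache tradeoff becomes concrete: if the cache entry were not evicted, a flagged link would keep redirecting until it expired.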

Tradeoff Table

Choice | Gain | Cost
random code | simple distribution | collision handling
sequential code | compact | enumeration risk
cache redirects | low latency | invalidation after takedown
async analytics | fast redirect | eventual analytics

Required Artifact

Draw the read/write path and write a failure-mode table for cache outage, DB outage, analytics lag, and abuse takedown.


Case Study 5: Design Review That Misses The Hard Part

Scenario: A candidate designs a chat system and spends all the time on WebSocket servers. They never discuss offline delivery, ordering, fanout, presence, or multi-device state.

Source anchor: The Matrix Client-Server API is a useful concrete anchor for message send, sync, and multi-device conversation state in chat-like systems.

Module concepts: hard-part identification, tradeoff framing, state ownership, ordering, offline sync.

Wrong Approach

Optimize the obvious component while ignoring correctness questions.

Better Approach

Call out hard parts:

Message order:
  per conversation sequence

Delivery:
  online push + offline inbox

Presence:
  ephemeral, best-effort

Multi-device:
  read receipt and sync model
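Three of the hard parts above (ordering, offline delivery, multi-device state) can be sketched with one data model. This is an illustrative in-memory model, not the Matrix API: the ChatStore class and its method names are hypothetical, and presence is left out because it is ephemeral and does not need durable state.

```python
from collections import defaultdict


class ChatStore:
    def __init__(self):
        self.last_seq = defaultdict(int)       # conversation -> last sequence
        self.log = defaultdict(list)           # conversation -> ordered messages
        self.device_cursor = defaultdict(int)  # (device, conversation) -> seq
        self.read_up_to = defaultdict(int)     # (user, conversation) -> seq

    def send(self, conversation, sender, body):
        """Assign the next sequence number within this conversation only."""
        self.last_seq[conversation] += 1
        seq = self.last_seq[conversation]
        self.log[conversation].append((seq, sender, body))
        return seq

    def sync(self, device, conversation):
        """Offline inbox: return everything this device has not yet seen."""
        key = (device, conversation)
        missed = [m for m in self.log[conversation] if m[0] > self.device_cursor[key]]
        if missed:
            self.device_cursor[key] = missed[-1][0]
        return missed

    def mark_read(self, user, conversation, seq):
        """Read receipts are per user, shared by all devices, and never regress."""
        key = (user, conversation)
        self.read_up_to[key] = max(self.read_up_to[key], seq)


store = ChatStore()
first_seq = store.send("c1", "alice", "hi")
second_seq = store.send("c1", "alice", "are you there?")
other_seq = store.send("c2", "carol", "hello")
phone_batch = store.sync("bob-phone", "c1")
phone_again = store.sync("bob-phone", "c1")
laptop_batch = store.sync("bob-laptop", "c1")
```

Because the sequence is scoped to a conversation, the log can be partitioned by conversation without any cross-partition coordination, which is the tradeoff the per-conversation ordering choice accepts.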

Tradeoff Table

Choice | Gain | Cost
global ordering | simple story | expensive and unnecessary
per-conversation ordering | matches user model | partitioning by conversation
best-effort presence | scalable | occasionally stale
durable inbox | reliable delivery | storage and sync complexity

Required Artifact

Write a "hard parts first" design outline with the top five risks and the component responsible for each.


Source Map

Source | Use it for
Dropbox: Inside the Magic Pocket | large-scale storage design, immutable blocks, durability/cost tradeoffs
Meta: TAO | social-graph access patterns and read-heavy feed-adjacent workloads
Cloudflare rate limiting docs | endpoint-specific rate-limiting behavior and enforcement knobs
Bitly URL shortener | real product constraints for redirects, aliases, and analytics
Matrix Client-Server API | chat sync, message flow, and multi-device state expectations

Completion Standard

  • At least three design briefs are completed.
  • Each design includes estimates before components.
  • At least one design names a hot-key/skew problem.
  • At least one design includes degradation behavior.