Design Interview Katas
Four classic system design problems walked through the full methodology. Each kata targets a 45-minute timed run. Expected output per kata is listed as phase-by-phase artifacts. Repeat each kata at least twice; on the second run, aim to finish in 35 minutes with better trade-off articulation.
Format for every kata:
- Time limit: 45 minutes (target 35 on second pass).
- Artifacts: framing lists, estimation lines, box diagram, 2 component deep-dives, data model, scale/failure walks, trade-off log, bottleneck/SPOF list.
- Self-score: at the end of each phase, write a one-sentence "did I hit the time box?" and, at the end of the session, a one-paragraph "what I would do differently next time".
Kata 1: URL Shortener
Prompt: Design a globally available URL shortening service. Users submit long URLs and receive a 7-character code; anyone hitting the code is redirected to the original URL.
Phase 1 -- Requirements and estimation (7 min)
- Functional: shorten, redirect, optional custom alias, optional expiration, per-owner list.
- Non-functional: 100 M new URLs/month (~40 writes/s avg, ~120 peak); reads dominate ~100:1 (~4 K reads/s avg, ~12 K peak); redirect P99 < 100 ms globally; 99.9% availability.
- Constraints: code ≤ 7 chars; non-guessable; case-sensitive; existing cloud account.
- Estimates:
- storage = 3.3 M × ~250 B / day ≈ 1 GB/day raw, ~2 TB / 5 years
- cache (hot set) ≈ 5 GB (last 30 days × 20%)
- latency budget (100 ms): edge 20, CDN lookup 5, app 10, cache 2, misc 10, return 30 -> 77 ms; slack 23 ms
- Hard parts: (1) unique, non-guessable 7-char ID generation at scale; (2) sub-100 ms P99 globally; (3) graceful handling of long-tail reads that miss the cache.
Phase 2 -- High-level design (12 min)
Boxes: Client -> CDN -> Regional LB -> Redirect Service (read) / Write API (create) -> Redis cache / ID Gen Service -> URL store (wide-column or KV) -> Kafka -> Analytics warehouse.
Key annotations: CDN caches the 301 for hot codes (TTL minutes); Redis absorbs the cache miss tier; write path goes to the store directly, then populates cache; analytics async; read and write services are deployed independently.
Phase 3 -- Deep dive (20 min)
- Deep-dive 1: ID Generator. Approach: base62 encoding of a 42-bit counter, with each app instance pre-allocating 10 K-sized ID ranges from a central allocator. Avoids centralized per-request coordination; collisions impossible within a range; allocator traffic is only the refill rate.
- Deep-dive 2: Data model. Table
shortspartitioned onshort_code, columnslong_url, owner_user_id, created_at, expires_at. Secondary tableshorts_by_ownerpartitioned onowner_user_id, clusteredcreated_at desc. No joins; two tables per access pattern. - Consistency: strong on create (cannot double-issue a code); eventual on
shorts_by_ownervia event stream; at-least-once analytics with de-dup on event_id.
Phase 4 -- Stress test and wrap-up (6 min)
- 10× walk: cache cluster scales horizontally; read replicas absorb DB reads; ID allocator refill rate rises but stays negligible.
- 100× walk: CDN absorbs 60% of reads; store partitioned on
short_code(8-16 shards); cache per region. - Failure walk: cache cluster dark -> DB must survive cache-cold (sized for this); DB primary fails -> per-shard failover within 30-60 s; ID allocator fails -> instances run out of IDs after their current range (bounded).
- Trade-offs (top 3):
- Wide-column KV over sharded MySQL because point-lookup dominates; cost: no ad-hoc SQL on shorts.
- Centralized allocator with ranges over per-instance generation with collision retry; cost: a slow path when a range is exhausted.
- Cache-aside over write-through; cost: brief staleness on alias updates.
- SPOFs: ID allocator (accepted; pre-allocated ranges make transient failure tolerable); cache cluster (per-region clusters).
Kata 2: Distributed Rate Limiter
Prompt: Design a rate-limiting service for an API gateway serving 10 M QPS across a global fleet. Limits are configurable per tenant and per route.
Phase 1 -- Requirements and estimation (7 min)
- Functional: evaluate (tenant, route, request) -> allow/deny with remaining count and reset time; support configurable limits (per second, per minute, per hour); expose observability.
- Non-functional: 10 M QPS global; P99 allow-decision < 5 ms; 99.99% availability; survives one region outage.
- Constraints: integrates with existing gateway as an in-process filter; tenant configs from existing config service (cached); Redis is available in every region.
- Estimates:
- 10 M QPS × 24 B counter check ≈ 240 MB/s of counter reads (per region, not global)
- 100 M tenant × route combos × 64 B state ≈ 6 GB hot state
- Drift bound: batch-sync every 10 ms -> <=10 ms × 10 M QPS / N-regions drift
- Hard parts: (1) low-latency allow/deny in the gateway's critical path; (2) accurate global counting under tight latency; (3) graceful behavior under region partition.
Phase 2 -- High-level design (12 min)
Boxes: Gateway filter (in-process token bucket) -> Local counter -> Regional counter store (Redis) -> Global reconciliation (async, Kafka) -> Config service (cached locally).
Key annotations: the allow-decision never crosses a network for the "happy" common case; only occasional sync to regional counter; only periodic reconciliation across regions.
Phase 3 -- Deep dive (20 min)
- Deep-dive 1: Token bucket algorithm. 64-byte state per (tenant, route); refill computed on each check by time-delta; local to the gateway instance; TTL evicts cold state. Decision is allocate-a-token-or-reject, CPU-bound.
- Deep-dive 2: Regional sync. Gateway batches consumed-tokens every 10 ms or 1 K decisions, HINCRBY into Redis keyed by (tenant, route, window). Drift bounded to one batch interval × peak QPS; design explicitly accepts up to that drift.
- Consistency: local decisions are eventually consistent with regional counters; regional counters are eventually consistent with global counters; acceptable because the SLA is "approximately respect the limit within a bounded drift".
Phase 4 -- Stress test and wrap-up (6 min)
- 10× walk: gateway CPU; scale horizontally.
- 100× walk: Redis connection count per gateway; fix with connection pool + sharded Redis.
- Failure walk: Redis down in a region -> local buckets continue, drift grows; region isolation -> local limits still enforced, global reconciliation catches up on heal.
- Trade-offs:
- Token bucket over sliding window; cost: half the state, slightly looser interval semantics.
- Bounded drift over strict exactness; cost: acceptable over-counts under extreme failures, explicit in SLA.
- Local-first decision over Redis-first decision; cost: approximate under extreme burst across gateway instances.
- Bottlenecks: Redis sync batch size under spike; adaptive batching. SPOFs: global reconciliation service (accepted; regional decisions are authoritative).
Kata 3: Social News Feed
Prompt: Design a social news feed serving 1 B DAU. Users follow others; when a user opens the app they see recent posts from whom they follow, ranked by relevance.
Phase 1 -- Requirements and estimation (7 min)
- Functional: post, follow/unfollow, view home feed, view profile, (optional) engagement (like, comment).
- Non-functional: 1 B DAU; ~10 posts/day/user -> 10 B posts/day, ~120 K writes/s avg, ~360 K peak; ~100 reads/day/user -> 100 B reads/day, ~1.2 M/s avg, ~3.6 M peak; feed-load P99 < 200 ms; durability required for posts.
- Constraints: global footprint; per-user feed must feel recent (< 1 minute staleness typical).
- Estimates:
- raw storage: 10 B × 1 KB = 10 TB/day; 3× replication + indexes -> ~30 TB/day ingest
- cache hot set: 20% of last-day posts ≈ 2 TB distributed
- latency budget (200 ms): edge 20, regional routing 30, auth 10, cache 5, fan-in 30, ranking 30, render 20, return 30 -> 175 ms; slack 25
- Hard parts: (1) fan-out on write vs read under celebrity skew; (2) feed ranking at scale under latency; (3) write amplification / storage cost from fan-out.
Phase 2 -- High-level design (12 min)
Boxes: Clients -> CDN -> Global LB -> Regional LB -> Feed Service (read) / Post Service (write) -> Redis cache -> Timeline store (wide-column) -> Kafka -> Fan-out worker -> Ranking service -> Follow graph service.
Key annotations: fan-out on write for normal users; read-fanout for celebrities; cache hot tail of each user's timeline; async fan-out and async analytics; ranking runs server-side per request with cached per-post scores.
Phase 3 -- Deep dive (20 min)
- Deep-dive 1: Fan-out worker. Consumes
post.created; fetches followers paginated; decides materialize vs read-fanout by follower-count threshold (>100 K = celebrity); batch-writes 500 entries/batch into timeline store. - Deep-dive 2: Timeline data model.
user_timelinepartitioned byuser_id, clusteredcreated_at desc;postspartitioned by(author_id, day_bucket), clusteredcreated_at desc; intentional denormalization ofsummaryandauthor_handlefor list rendering. - Consistency: eventual across timelines (stale < 30 s target); read-your-write on own timeline via a per-user local cache on write.
Phase 4 -- Stress test and wrap-up (6 min)
- 10× walk: fan-out worker CPU rises; scale workers and raise celebrity threshold.
- 100× walk: per-user timeline partition grows fast for users who follow many active accounts; add time-bucket within
user_timelineand cold-tier older entries. - Failure walk: Kafka down -> fan-out stops, feeds become stale, no errors; ranking service down -> fallback to time-sort with a visible "degraded ranking" header; timeline store shard failover -> writes to that shard retry, users of that shard see 5-30 s blip.
- Trade-offs:
- Read-fanout for celebrities over always-materialize; cost: 30-50% slower feed reads for celebrity followers.
- Denormalized summaries in timeline rows over per-read joins; cost: extra storage, stale summaries on post edits.
- Eventual consistency on follower timelines over strong; cost: up to 30 s staleness.
- SPOFs: ranking service (accepted with fallback); follow-graph (replicate across regions).
Kata 4: Distributed Cache Service
Prompt: Design a managed distributed cache service (think Memcached/Redis-as-a-service) for internal teams. 1000+ tenants; per-tenant isolation; at-least-once delivery of invalidations; multi-region deployment.
Phase 1 -- Requirements and estimation (7 min)
- Functional: GET/SET/DEL/EXPIRE per tenant namespace; pub/sub for invalidations; capacity auto-scaling per tenant; per-tenant metrics and quotas.
- Non-functional: P99 GET < 1 ms within AZ; P99 SET < 5 ms; 99.99% availability per tenant; tenants isolated (noisy neighbor must not kill others).
- Constraints: commodity SSDs available for larger tiers; VPC-level network isolation.
- Estimates: assume 100 K QPS / tenant peak, 1000 tenants -> 100 M QPS peak across fleet; 100 TB total across tenants; typical key size 64 B, value 1 KB.
- Hard parts: (1) tenant isolation without per-tenant full cluster cost; (2) consistent hashing under node churn; (3) cross-region replication for tenants that require it.
Phase 2 -- High-level design (12 min)
Boxes: Client SDK -> Regional router (consistent-hash lookup) -> Cache node (in-memory + optional SSD tier) -> Invalidation pub/sub (regional topic) -> Cross-region replicator (async) -> Control plane (scaling, tenant onboarding, metering).
Key annotations: consistent hashing with virtual nodes; per-tenant quotas enforced at the router; tenants can opt into "cross-region replicated" at a cost; control plane is a separate deployment from the data plane.
Phase 3 -- Deep dive (20 min)
- Deep-dive 1: Consistent hashing with virtual nodes. Each physical node owns N virtual nodes; adding/removing a node moves ~1/N of keys; requests hit the router, which holds the hash ring locally (updated via a gossip or coordination store); handoff rules for moved keys.
- Deep-dive 2: Tenant isolation. Per-tenant quotas (memory, QPS); per-node cgroups/limits if co-tenanted; noisy-neighbor detection that reroutes or throttles tenants exceeding their quota; separate cache-node pools for "premium isolation" tenants who pay more.
- Consistency: eventual between replicas; best-effort invalidation across regions; clients must tolerate brief staleness; TTLs are primary correctness tool.
Phase 4 -- Stress test and wrap-up (6 min)
- 10× walk: router CPU; scale horizontally; keep ring version gossiped.
- 100× walk: per-tenant hot key saturates one node; implement hot-key detection + client-side caching of that key + rehashing of that key to additional nodes.
- Failure walk: node dies -> ring reconfigures, affected keys cold-miss for a window; region-wide replicator lag -> clients in other regions see older data (within TTL); control plane down -> data plane continues; no new tenants onboarded until restored.
- Trade-offs:
- Consistent hashing over explicit sharding; cost: slightly more complex operations, but cheap rebalancing.
- Best-effort cross-region over synchronous replication; cost: tenants must accept stale cross-region; synchronous available as premium.
- Shared cache-node pools over per-tenant clusters; cost: need strong isolation primitives; premium tier gets dedicated pools.
- SPOFs: control plane (not on the data plane path); regional router (active-active within region).
Completion Standard
- Each kata completed at least once, ending on time (45 min).
- At least one kata completed a second time at 35 min with stronger trade-off articulation.
- Each kata produced all phase-by-phase artifacts listed.
- You can explain the single most-likely-to-be-challenged design decision for each kata.
- Across the four katas, you recognized at least three recurring patterns (e.g., "celebrity skew", "range allocators", "per-tenant isolation") and named them as tools you now carry.