Design Interview Katas

Four classic system design problems walked through the full methodology. Each kata targets a 45-minute timed run. Expected output per kata is listed as phase-by-phase artifacts. Repeat each kata at least twice; on the second run, aim to finish in 35 minutes with better trade-off articulation.

Format for every kata:

Time limit: 45 minutes (target 35 on second pass).
Artifacts: framing lists, estimation lines, box diagram, 2 component deep-dives, data model, scale/failure walks, trade-off log, bottleneck/SPOF list.
Self-score: at the end of each phase, write a one-sentence "did I hit the time box?" and, at the end of the session, a one-paragraph "what I would do differently next time".

Kata 1: URL Shortener

Prompt: Design a globally available URL shortening service. Users submit long URLs and receive a 7-character code; anyone hitting the code is redirected to the original URL.

Phase 1 -- Requirements and estimation (7 min)

Functional: shorten, redirect, optional custom alias, optional expiration, per-owner list.
Non-functional: 100 M new URLs/month (~40 writes/s avg, ~120 peak); reads dominate ~100:1 (~4 K reads/s avg, ~12 K peak); redirect P99 < 100 ms globally; 99.9% availability.
Constraints: code ≤ 7 chars; non-guessable; case-sensitive; existing cloud account.
Estimates:
- storage = 3.3 M × ~250 B / day ≈ 1 GB/day raw, ~2 TB / 5 years
- cache (hot set) ≈ 5 GB (last 30 days × 20%)
- latency budget (100 ms): edge 20, CDN lookup 5, app 10, cache 2, misc 10, return 30 -> 77 ms; slack 23 ms
Hard parts: (1) unique, non-guessable 7-char ID generation at scale; (2) sub-100 ms P99 globally; (3) graceful handling of long-tail reads that miss the cache.

Phase 2 -- High-level design (12 min)

Boxes: Client -> CDN -> Regional LB -> Redirect Service (read) / Write API (create) -> Redis cache / ID Gen Service -> URL store (wide-column or KV) -> Kafka -> Analytics warehouse.

Key annotations: CDN caches the 301 for hot codes (TTL minutes); Redis absorbs the cache miss tier; write path goes to the store directly, then populates cache; analytics async; read and write services are deployed independently.

Phase 3 -- Deep dive (20 min)

Deep-dive 1: ID Generator. Approach: base62 encoding of a 42-bit counter, with each app instance pre-allocating 10 K-sized ID ranges from a central allocator. Avoids centralized per-request coordination; collisions impossible within a range; allocator traffic is only the refill rate.
Deep-dive 2: Data model. Table shorts partitioned on short_code, columns long_url, owner_user_id, created_at, expires_at. Secondary table shorts_by_owner partitioned on owner_user_id, clustered created_at desc. No joins; two tables per access pattern.
Consistency: strong on create (cannot double-issue a code); eventual on shorts_by_owner via event stream; at-least-once analytics with de-dup on event_id.

Phase 4 -- Stress test and wrap-up (6 min)

10× walk: cache cluster scales horizontally; read replicas absorb DB reads; ID allocator refill rate rises but stays negligible.
100× walk: CDN absorbs 60% of reads; store partitioned on short_code (8-16 shards); cache per region.
Failure walk: cache cluster dark -> DB must survive cache-cold (sized for this); DB primary fails -> per-shard failover within 30-60 s; ID allocator fails -> instances run out of IDs after their current range (bounded).
Trade-offs (top 3):
1. Wide-column KV over sharded MySQL because point-lookup dominates; cost: no ad-hoc SQL on shorts.
2. Centralized allocator with ranges over per-instance generation with collision retry; cost: a slow path when a range is exhausted.
3. Cache-aside over write-through; cost: brief staleness on alias updates.
SPOFs: ID allocator (accepted; pre-allocated ranges make transient failure tolerable); cache cluster (per-region clusters).

Kata 2: Distributed Rate Limiter

Prompt: Design a rate-limiting service for an API gateway serving 10 M QPS across a global fleet. Limits are configurable per tenant and per route.

Phase 1 -- Requirements and estimation (7 min)

Functional: evaluate (tenant, route, request) -> allow/deny with remaining count and reset time; support configurable limits (per second, per minute, per hour); expose observability.
Non-functional: 10 M QPS global; P99 allow-decision < 5 ms; 99.99% availability; survives one region outage.
Constraints: integrates with existing gateway as an in-process filter; tenant configs from existing config service (cached); Redis is available in every region.
Estimates:
- 10 M QPS × 24 B counter check ≈ 240 MB/s of counter reads (per region, not global)
- 100 M tenant × route combos × 64 B state ≈ 6 GB hot state
- Drift bound: batch-sync every 10 ms -> <=10 ms × 10 M QPS / N-regions drift
Hard parts: (1) low-latency allow/deny in the gateway's critical path; (2) accurate global counting under tight latency; (3) graceful behavior under region partition.

Phase 2 -- High-level design (12 min)

Boxes: Gateway filter (in-process token bucket) -> Local counter -> Regional counter store (Redis) -> Global reconciliation (async, Kafka) -> Config service (cached locally).

Key annotations: the allow-decision never crosses a network for the "happy" common case; only occasional sync to regional counter; only periodic reconciliation across regions.

Phase 3 -- Deep dive (20 min)

Deep-dive 1: Token bucket algorithm. 64-byte state per (tenant, route); refill computed on each check by time-delta; local to the gateway instance; TTL evicts cold state. Decision is allocate-a-token-or-reject, CPU-bound.
Deep-dive 2: Regional sync. Gateway batches consumed-tokens every 10 ms or 1 K decisions, HINCRBY into Redis keyed by (tenant, route, window). Drift bounded to one batch interval × peak QPS; design explicitly accepts up to that drift.
Consistency: local decisions are eventually consistent with regional counters; regional counters are eventually consistent with global counters; acceptable because the SLA is "approximately respect the limit within a bounded drift".

Phase 4 -- Stress test and wrap-up (6 min)

10× walk: gateway CPU; scale horizontally.
100× walk: Redis connection count per gateway; fix with connection pool + sharded Redis.
Failure walk: Redis down in a region -> local buckets continue, drift grows; region isolation -> local limits still enforced, global reconciliation catches up on heal.
Trade-offs:
1. Token bucket over sliding window; cost: half the state, slightly looser interval semantics.
2. Bounded drift over strict exactness; cost: acceptable over-counts under extreme failures, explicit in SLA.
3. Local-first decision over Redis-first decision; cost: approximate under extreme burst across gateway instances.
Bottlenecks: Redis sync batch size under spike; adaptive batching. SPOFs: global reconciliation service (accepted; regional decisions are authoritative).

Prompt: Design a social news feed serving 1 B DAU. Users follow others; when a user opens the app they see recent posts from whom they follow, ranked by relevance.

Phase 1 -- Requirements and estimation (7 min)

Functional: post, follow/unfollow, view home feed, view profile, (optional) engagement (like, comment).
Non-functional: 1 B DAU; ~10 posts/day/user -> 10 B posts/day, ~120 K writes/s avg, ~360 K peak; ~100 reads/day/user -> 100 B reads/day, ~1.2 M/s avg, ~3.6 M peak; feed-load P99 < 200 ms; durability required for posts.
Constraints: global footprint; per-user feed must feel recent (< 1 minute staleness typical).
Estimates:
- raw storage: 10 B × 1 KB = 10 TB/day; 3× replication + indexes -> ~30 TB/day ingest
- cache hot set: 20% of last-day posts ≈ 2 TB distributed
- latency budget (200 ms): edge 20, regional routing 30, auth 10, cache 5, fan-in 30, ranking 30, render 20, return 30 -> 175 ms; slack 25
Hard parts: (1) fan-out on write vs read under celebrity skew; (2) feed ranking at scale under latency; (3) write amplification / storage cost from fan-out.

Phase 2 -- High-level design (12 min)

Boxes: Clients -> CDN -> Global LB -> Regional LB -> Feed Service (read) / Post Service (write) -> Redis cache -> Timeline store (wide-column) -> Kafka -> Fan-out worker -> Ranking service -> Follow graph service.

Key annotations: fan-out on write for normal users; read-fanout for celebrities; cache hot tail of each user's timeline; async fan-out and async analytics; ranking runs server-side per request with cached per-post scores.

Phase 3 -- Deep dive (20 min)

Deep-dive 1: Fan-out worker. Consumes post.created; fetches followers paginated; decides materialize vs read-fanout by follower-count threshold (>100 K = celebrity); batch-writes 500 entries/batch into timeline store.
Deep-dive 2: Timeline data model. user_timeline partitioned by user_id, clustered created_at desc; posts partitioned by (author_id, day_bucket), clustered created_at desc; intentional denormalization of summary and author_handle for list rendering.
Consistency: eventual across timelines (stale < 30 s target); read-your-write on own timeline via a per-user local cache on write.

Phase 4 -- Stress test and wrap-up (6 min)

10× walk: fan-out worker CPU rises; scale workers and raise celebrity threshold.
100× walk: per-user timeline partition grows fast for users who follow many active accounts; add time-bucket within user_timeline and cold-tier older entries.
Failure walk: Kafka down -> fan-out stops, feeds become stale, no errors; ranking service down -> fallback to time-sort with a visible "degraded ranking" header; timeline store shard failover -> writes to that shard retry, users of that shard see 5-30 s blip.
Trade-offs:
1. Read-fanout for celebrities over always-materialize; cost: 30-50% slower feed reads for celebrity followers.
2. Denormalized summaries in timeline rows over per-read joins; cost: extra storage, stale summaries on post edits.
3. Eventual consistency on follower timelines over strong; cost: up to 30 s staleness.
SPOFs: ranking service (accepted with fallback); follow-graph (replicate across regions).

Kata 4: Distributed Cache Service

Prompt: Design a managed distributed cache service (think Memcached/Redis-as-a-service) for internal teams. 1000+ tenants; per-tenant isolation; at-least-once delivery of invalidations; multi-region deployment.

Phase 1 -- Requirements and estimation (7 min)

Functional: GET/SET/DEL/EXPIRE per tenant namespace; pub/sub for invalidations; capacity auto-scaling per tenant; per-tenant metrics and quotas.
Non-functional: P99 GET < 1 ms within AZ; P99 SET < 5 ms; 99.99% availability per tenant; tenants isolated (noisy neighbor must not kill others).
Constraints: commodity SSDs available for larger tiers; VPC-level network isolation.
Estimates: assume 100 K QPS / tenant peak, 1000 tenants -> 100 M QPS peak across fleet; 100 TB total across tenants; typical key size 64 B, value 1 KB.
Hard parts: (1) tenant isolation without per-tenant full cluster cost; (2) consistent hashing under node churn; (3) cross-region replication for tenants that require it.

Phase 2 -- High-level design (12 min)

Boxes: Client SDK -> Regional router (consistent-hash lookup) -> Cache node (in-memory + optional SSD tier) -> Invalidation pub/sub (regional topic) -> Cross-region replicator (async) -> Control plane (scaling, tenant onboarding, metering).

Key annotations: consistent hashing with virtual nodes; per-tenant quotas enforced at the router; tenants can opt into "cross-region replicated" at a cost; control plane is a separate deployment from the data plane.

Phase 3 -- Deep dive (20 min)

Deep-dive 1: Consistent hashing with virtual nodes. Each physical node owns N virtual nodes; adding/removing a node moves ~1/N of keys; requests hit the router, which holds the hash ring locally (updated via a gossip or coordination store); handoff rules for moved keys.
Deep-dive 2: Tenant isolation. Per-tenant quotas (memory, QPS); per-node cgroups/limits if co-tenanted; noisy-neighbor detection that reroutes or throttles tenants exceeding their quota; separate cache-node pools for "premium isolation" tenants who pay more.
Consistency: eventual between replicas; best-effort invalidation across regions; clients must tolerate brief staleness; TTLs are primary correctness tool.

Phase 4 -- Stress test and wrap-up (6 min)

10× walk: router CPU; scale horizontally; keep ring version gossiped.
100× walk: per-tenant hot key saturates one node; implement hot-key detection + client-side caching of that key + rehashing of that key to additional nodes.
Failure walk: node dies -> ring reconfigures, affected keys cold-miss for a window; region-wide replicator lag -> clients in other regions see older data (within TTL); control plane down -> data plane continues; no new tenants onboarded until restored.
Trade-offs:
1. Consistent hashing over explicit sharding; cost: slightly more complex operations, but cheap rebalancing.
2. Best-effort cross-region over synchronous replication; cost: tenants must accept stale cross-region; synchronous available as premium.
3. Shared cache-node pools over per-tenant clusters; cost: need strong isolation primitives; premium tier gets dedicated pools.
SPOFs: control plane (not on the data plane path); regional router (active-active within region).

Completion Standard

Each kata completed at least once, ending on time (45 min).
At least one kata completed a second time at 35 min with stronger trade-off articulation.
Each kata produced all phase-by-phase artifacts listed.
You can explain the single most-likely-to-be-challenged design decision for each kata.
Across the four katas, you recognized at least three recurring patterns (e.g., "celebrity skew", "range allocators", "per-tenant isolation") and named them as tools you now carry.

Kata 1: URL Shortener​

Phase 1 -- Requirements and estimation (7 min)​

Phase 2 -- High-level design (12 min)​

Phase 3 -- Deep dive (20 min)​

Phase 4 -- Stress test and wrap-up (6 min)​

Kata 2: Distributed Rate Limiter​

Phase 1 -- Requirements and estimation (7 min)​

Phase 2 -- High-level design (12 min)​

Phase 3 -- Deep dive (20 min)​

Phase 4 -- Stress test and wrap-up (6 min)​

Kata 3: Social News Feed​

Phase 1 -- Requirements and estimation (7 min)​

Phase 2 -- High-level design (12 min)​

Phase 3 -- Deep dive (20 min)​

Phase 4 -- Stress test and wrap-up (6 min)​

Kata 4: Distributed Cache Service​

Phase 1 -- Requirements and estimation (7 min)​

Phase 2 -- High-level design (12 min)​

Phase 3 -- Deep dive (20 min)​

Phase 4 -- Stress test and wrap-up (6 min)​

Completion Standard​

Kata 1: URL Shortener

Phase 1 -- Requirements and estimation (7 min)

Phase 2 -- High-level design (12 min)

Phase 3 -- Deep dive (20 min)

Phase 4 -- Stress test and wrap-up (6 min)

Kata 2: Distributed Rate Limiter

Phase 1 -- Requirements and estimation (7 min)

Phase 2 -- High-level design (12 min)

Phase 3 -- Deep dive (20 min)

Phase 4 -- Stress test and wrap-up (6 min)

Kata 3: Social News Feed

Phase 1 -- Requirements and estimation (7 min)

Phase 2 -- High-level design (12 min)

Phase 3 -- Deep dive (20 min)

Phase 4 -- Stress test and wrap-up (6 min)

Kata 4: Distributed Cache Service

Phase 1 -- Requirements and estimation (7 min)

Phase 2 -- High-level design (12 min)

Phase 3 -- Deep dive (20 min)

Phase 4 -- Stress test and wrap-up (6 min)

Completion Standard