The Four-Phase Interview/Review Structure
What This Concept Is
A 45-minute system design interview (or a 60-minute architecture review) has four phases. Running them in order, on a timer, is how you avoid the "we only got halfway through" failure that sinks many candidates.
| Phase | Time (45 min session) | Goal | Artifacts produced |
|---|---|---|---|
| 1. Requirements | 5-7 min | Convert prompt -> functional + non-functional + constraints + back-of-envelope | A written list on the board |
| 2. High-level design | 10-12 min | Draw the 6-12 box diagram; commit to API surface, storage, caches/CDN/LB | Box diagram, labeled |
| 3. Deep dive | 15-20 min | Zoom into 2-3 components of reviewer's choice; decompose and stress-test | Data model sketches, algorithm sketches |
| 4. Wrap-up | 3-5 min | Trade-offs, bottlenecks, SPOFs, what you would do differently with more time | Trade-off list, open-questions list |
Times scale roughly in proportion for longer or shorter sessions. A 60-minute review gives you roughly 10 / 15 / 25 / 10; a 30-minute whiteboard screen compresses to 4 / 8 / 13 / 5, still keeping a few minutes for wrap-up.
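A proportional baseline can be computed mechanically. The sketch below assumes the split from the 45-minute rate-limiter walkthrough later in this section (6 / 12 / 20 / 5 minutes, with 2 minutes of slack) as the reference, rounds each phase down, and holds wrap-up at a floor of 3 minutes. The function name and the floor are illustrative choices, not part of the method.

```python
# Reference split: the 45-minute rate-limiter walkthrough (6/12/20/5, 2 minutes of slack).
# Using it as the scaling baseline, and the 3-minute wrap-up floor, are assumptions.
BASE_TOTAL = 45
BASE_SPLIT = {"requirements": 6, "high_level": 12, "deep_dive": 20, "wrap_up": 5}
MIN_WRAP_UP = 3


def phase_budgets(total_minutes: int) -> dict[str, int]:
    """Scale the reference split to a session length, rounding each phase down."""
    budgets = {
        phase: minutes * total_minutes // BASE_TOTAL
        for phase, minutes in BASE_SPLIT.items()
    }
    # Never let wrap-up shrink below the floor, even in short screens.
    budgets["wrap_up"] = max(budgets["wrap_up"], MIN_WRAP_UP)
    # Whatever rounding leaves over is slack for introductions and clarifying questions.
    budgets["slack"] = total_minutes - sum(budgets.values())
    return budgets


for total in (30, 45, 60):
    print(total, phase_budgets(total))
```

For a 30-minute screen this gives 4 / 8 / 13 / 3 with 2 minutes of slack; spending the slack on wrap-up reproduces the 4 / 8 / 13 / 5 split above. The hand-tuned 60-minute split moves a few more minutes into requirements and wrap-up than pure proportion would.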
What makes the four phases work is that each one produces a specific artifact that the next phase builds on. Phase 1 outputs a requirements list -- without it, phase 2 has no target. Phase 2 outputs a diagram -- without it, phase 3 has nothing to zoom into. Phase 3 outputs decisions and trade-offs -- without them, phase 4 is hand-waving. Skipping an artifact does not save time; it forces you to improvise the missing one later under worse conditions. A framing from Fundamentals of Software Architecture: the four phases correspond to architectural thinking at increasing granularity -- scope, structure, mechanism, critique. Senior reviewers grade the transitions at least as much as the content of any single phase.
The numeric budgets are not dogma; they are defaults you adjust by 1-2 minutes based on the prompt's profile. A prompt heavy on requirements (ambiguous product, unusual scale) can spend 8 minutes in phase 1; a prompt you have essentially seen before (top-10 classic) can compress phase 1 to 4 minutes and spend the extra in deep-dive. What does not change is the order and the existence of all four phases.
Why It Matters Here
This is the methodology-shaped version of everything in Clusters 1-5. Each cluster maps onto one or two phases:
- Cluster 1 -> Phase 1
- Cluster 2 -> Phase 2
- Cluster 3 -> Phase 3
- Cluster 4 -> Phase 3 under pressure + Phase 4
- Cluster 5 -> Phase 4 and the doc
Time-boxing is not optional. The most common failure is spending 25 minutes on requirements, leaving five minutes for the high-level design and none for the deep dive. Candidates who time-box win with designs that are narrower than the expansive candidate's but deeper.
Concrete Example
45-minute interview on "Design a distributed rate limiter":
Phase 1 -- Requirements (minutes 0-6):
- Restate: "Token-bucket-style limiter across a multi-region API gateway. Per-user and per-route limits."
- Functional: allow/deny; configurable limits per tenant; expose consumed/remaining; do not undercount; do not over-count.
- Non-functional: 10 M QPS global; P99 allow-decision under 5 ms; 99.99% availability; survives one region outage.
- Constraints: must integrate with existing gateway (HTTP filter); per-tenant configs come from an existing config service.
- Numbers: 10 M QPS × 24 B per counter check ≈ 240 MB/s of counter reads; storage for 100 M tenant × route combos ≈ a few GB.
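The phase 1 arithmetic is worth writing down rather than doing in your head. This sketch reproduces the two estimates. The inputs come from the prompt above, except the 64-byte per-key state, which is borrowed from the deep dive below as an assumption about what one counter costs.

```python
# Inputs from the rate-limiter prompt; BYTES_PER_KEY is the 64-byte bucket struct
# chosen later in the deep dive (an assumption at phase-1 time).
QPS = 10_000_000             # global allow/deny decisions per second
BYTES_PER_CHECK = 24         # counter bytes read per decision
KEYS = 100_000_000           # tenant x route combinations
BYTES_PER_KEY = 64           # per-(user, route) state


def counter_read_mb_per_s(qps: int, bytes_per_check: int) -> float:
    """Counter read bandwidth in MB/s (decimal megabytes)."""
    return qps * bytes_per_check / 1e6


def state_gb(keys: int, bytes_per_key: int) -> float:
    """Total per-key state in GB (decimal), before replication or store overhead."""
    return keys * bytes_per_key / 1e9


print(f"{counter_read_mb_per_s(QPS, BYTES_PER_CHECK):.0f} MB/s of counter reads")  # prints: 240 MB/s of counter reads
print(f"{state_gb(KEYS, BYTES_PER_KEY):.1f} GB of counter state")                   # prints: 6.4 GB of counter state
```

At 64 B per key, "a few GB" becomes 6.4 GB before replication or per-key overhead in the counter store.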
Phase 2 -- High-level design (minutes 6-18):
- Diagram: gateway filter -> local in-process token bucket -> async sync to regional counter store (Redis cluster) -> async reconciliation with global counter store.
- Named trade-off on the board: local bucket = fast and approximate; regional counter = accurate but ~1 ms slower; choice depends on per-tenant config.
Phase 3 -- Deep dive (minutes 18-38):
- Deep-dive 1: the local token bucket algorithm (sliding window vs fixed window vs token bucket; pick token bucket; 64-byte struct per (user, route); TTL-cleaned).
- Deep-dive 2: the regional counter sync (batch every 10 ms or 1 K decisions; use Redis HINCRBY; drift bounded to "one batch interval × max QPS").
- Stress test 1: 100× traffic; local path is CPU-bound; horizontal scaling is trivial.
- Stress test 2: region dies. Local buckets in surviving regions keep working (they do not depend on the failed region's store); the regional counter in the DR region takes over; over-count is bounded during the failover window.
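The deep dive's token bucket is small enough to sketch. This minimal in-process version assumes lazy refill on each call, buckets keyed by (user, route), idle eviction standing in for "TTL-cleaned", and an injectable clock so it can be tested. The class names are hypothetical, and the regional sync path is omitted. Python objects are far larger than the 64-byte struct the design budgets for; the layout here only shows the logic.

```python
import time
from typing import Callable, Dict, Tuple

Clock = Callable[[], float]


class TokenBucket:
    """One bucket: `capacity` is the allowed burst, `rate` the sustained tokens per second."""

    __slots__ = ("capacity", "rate", "tokens", "last", "clock")

    def __init__(self, capacity: float, rate: float, clock: Clock = time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity          # start full so a new key can burst immediately
        self.clock = clock
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Lazy refill: credit tokens for the time since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class BucketTable:
    """Buckets keyed by (user, route); idle buckets are evicted after `idle_ttl` seconds."""

    def __init__(self, capacity: float, rate: float, idle_ttl: float, clock: Clock = time.monotonic):
        self.capacity, self.rate, self.idle_ttl, self.clock = capacity, rate, idle_ttl, clock
        self.buckets: Dict[Tuple[str, str], TokenBucket] = {}

    def allow(self, user: str, route: str) -> bool:
        key = (user, route)
        bucket = self.buckets.get(key)
        if bucket is None:
            bucket = self.buckets[key] = TokenBucket(self.capacity, self.rate, self.clock)
        return bucket.allow()

    def evict_idle(self) -> int:
        """Drop buckets untouched for longer than idle_ttl; returns how many were removed."""
        now = self.clock()
        stale = [k for k, b in self.buckets.items() if now - b.last > self.idle_ttl]
        for k in stale:
            del self.buckets[k]
        return len(stale)
```

Eviction is safe once `idle_ttl` is at least `capacity / rate`: an idle bucket would have refilled to full by then, so recreating it full changes nothing.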
Phase 4 -- Wrap-up (minutes 38-43):
- Trade-offs: accepted bounded over-count during region failover (why: user impact is small and the availability bar is high). Chose token bucket over sliding window because it keeps 5× less state.
- Bottleneck: the Redis sync batch if QPS spikes; mitigation is adaptive batch sizing.
- SPOF: the config service is a soft SPOF; with configs cached for 5 minutes, the gateway keeps running on the last-known config.
- Open questions: cross-tenant fairness under abuse, explicit retry-after header semantics, circuit-breaker integration.
Ended on time, every phase hit.
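Deep-dive 2 and the wrap-up's mitigation fit together: accumulate per-key deltas locally, flush every 10 ms or every N decisions, and let N grow when traffic spikes so the counter store sees fewer, larger writes. This sketch rests on stated assumptions. `flush_fn` is a hypothetical callback standing in for the regional store write (for example, one HINCRBY per key sent in a pipeline). The doubling/halving rule is one simple choice of adaptive sizing. The time trigger fires only when the next decision arrives; a real gateway would also flush from a timer. Growing N widens the drift bound, which is the trade-off the design accepts.

```python
import time
from collections import Counter
from typing import Callable, Dict


class BatchedCounterSync:
    """Buffers local allow-decisions and flushes per-key deltas in batches."""

    def __init__(
        self,
        flush_fn: Callable[[Dict[str, int]], None],
        interval_s: float = 0.010,      # flush at least this often (10 ms)
        batch_size: int = 1000,         # ...or after this many decisions (1 K)
        min_batch_size: int = 1000,
        max_batch_size: int = 64_000,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.flush_fn = flush_fn
        self.interval_s = interval_s
        self.batch_size = batch_size
        self.min_batch_size = min_batch_size
        self.max_batch_size = max_batch_size
        self.clock = clock
        self.pending: Counter = Counter()
        self.pending_count = 0
        self.last_flush = clock()

    def record(self, key: str) -> None:
        self.pending[key] += 1
        self.pending_count += 1
        if self.pending_count >= self.batch_size:
            # Traffic filled a batch before the interval elapsed: flush and grow the batch.
            self._flush()
            self.batch_size = min(self.max_batch_size, self.batch_size * 2)
        elif self.clock() - self.last_flush >= self.interval_s:
            # Interval elapsed first; if the batch was mostly empty, shrink toward the minimum.
            quiet = self.pending_count < self.batch_size // 2
            self._flush()
            if quiet:
                self.batch_size = max(self.min_batch_size, self.batch_size // 2)

    def _flush(self) -> None:
        if self.pending:
            self.flush_fn(dict(self.pending))
        self.pending.clear()
        self.pending_count = 0
        self.last_flush = self.clock()
```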
Concrete Example 2: 60-Minute Architecture Review
Real review in an engineering org -- "Can we replace the in-process cache with Redis?"
Phase 1 -- Requirements (minutes 0-10):
- The presenter states what problem the change solves: the in-process cache causes cold-start latency on deploy, and invalidation is broadcast across the cluster, which causes CPU bursts.
- Functional: same read API; invalidation via pub/sub; per-key TTL; optional negative caching.
- Non-functional: P99 read < 3 ms (currently 0.4 ms in-process); 99.95% availability; the blast radius of a Redis outage is bounded.
- Constraints: existing Redis infra in-region (no new cluster); budget for ~200 GB working set; migration over 2 quarters.
Phase 2 -- High-level design (minutes 10-25):
- Diagram: app -> local L1 (small, in-process) -> L2 (regional Redis) -> origin DB. Fallback path on Redis outage returns to origin DB with adaptive-timeout protection.
- Named trade-off on the whiteboard: two-tier cache doubles coherency work; chosen because P99 latency cannot regress and L1 protects against Redis brownouts.
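The read path on the diagram can be sketched in a few lines. This version assumes a dict-backed L1 with a per-entry TTL and crude FIFO eviction. It treats `l2_get`, `l2_set`, and `origin_get` as hypothetical callables (a Redis client and a DB query in practice), signals an L2 outage with `ConnectionError`, and assumes cached values are never `None`. The adaptive-timeout protection on the origin path is not modeled.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TwoTierCache:
    """app -> L1 (in-process) -> L2 (regional store) -> origin; degrades to L1 + origin if L2 is down."""

    def __init__(
        self,
        l2_get: Callable[[str], Optional[Any]],
        l2_set: Callable[[str, Any], None],
        origin_get: Callable[[str], Any],
        l1_max_entries: int = 10_000,
        l1_ttl_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.l2_get, self.l2_set, self.origin_get = l2_get, l2_set, origin_get
        self.l1_max_entries, self.l1_ttl_s, self.clock = l1_max_entries, l1_ttl_s, clock
        self.l1: Dict[str, Tuple[float, Any]] = {}   # key -> (expires_at, value)

    def get(self, key: str) -> Any:
        now = self.clock()
        entry = self.l1.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                          # L1 hit
        l2_up = True
        try:
            value = self.l2_get(key)                 # None means an L2 miss
        except ConnectionError:
            l2_up, value = False, None               # L2 outage: skip it, go to origin
        if value is None:
            value = self.origin_get(key)
            if l2_up:
                try:
                    self.l2_set(key, value)          # repopulate L2 only while it is reachable
                except ConnectionError:
                    pass
        self._put_l1(key, value, now)
        return value

    def invalidate(self, key: str) -> None:
        """Called by the invalidation subscriber: drop the L1 copy only."""
        self.l1.pop(key, None)

    def _put_l1(self, key: str, value: Any, now: float) -> None:
        if key not in self.l1 and len(self.l1) >= self.l1_max_entries:
            self.l1.pop(next(iter(self.l1)))         # evict the oldest-inserted key (FIFO)
        self.l1[key] = (now + self.l1_ttl_s, value)
```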
Phase 3 -- Deep dive (minutes 25-50):
- Deep-dive 1: invalidation protocol (pub/sub channel per table; L1 drops the entry on each message; the updater writes through to L2). Lost-message handling: a periodic full rebuild of the small L1 is an acceptable backstop.
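- Deep-dive 2: failure walk -- Redis dies: app runs on L1 only; hit rate drops from 99% to 60%, so the share of reads reaching origin rises from 1% to 40% (40×); total origin DB load, which also carries the write path, triples; origin is sized to survive this for 30 minutes (measured in load test).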
- Stress test at 10× write rate: pub/sub channel saturates at ~2 M msg/s. Remediation: partition channels by table hash; add per-partition consumer.
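The remediation for the saturated channel can be sketched as a partitioning function plus per-partition consumers. This sketch reads "partition channels by table hash" as hashing the table and row key into a fixed number of channels per table, so one hot table spreads its messages across partitions. That reading is an assumption, as are the channel naming, the partition count, and the in-memory `InvalidationBus` (a stand-in for a real pub/sub system, not a client API).

```python
import zlib
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List, Tuple

PARTITIONS = 8   # assumed channels per table


def channel_for(table: str, key: str, partitions: int = PARTITIONS) -> str:
    """Map a changed row to its invalidation channel: one channel per (table, partition)."""
    # zlib.crc32 gives the same answer in every process; the built-in hash() on str is
    # randomized per process, so publishers and subscribers could disagree.
    partition = zlib.crc32(f"{table}:{key}".encode()) % partitions
    return f"invalidate:{table}:{partition}"


class InvalidationBus:
    """In-memory stand-in for pub/sub: delivers each message to that channel's handlers."""

    def __init__(self, partitions: int = PARTITIONS):
        self.partitions = partitions
        self.handlers: DefaultDict[str, List[Callable[[str, str], None]]] = defaultdict(list)

    def subscribe(self, channel: str, handler: Callable[[str, str], None]) -> None:
        self.handlers[channel].append(handler)

    def publish(self, table: str, key: str) -> str:
        channel = channel_for(table, key, self.partitions)
        for handler in self.handlers[channel]:
            handler(table, key)
        return channel


class L1Invalidator:
    """Per-partition consumer: drops the L1 entry; a periodic full rebuild covers lost messages."""

    def __init__(self, l1: Dict[Tuple[str, str], object]):
        self.l1 = l1

    def on_message(self, table: str, key: str) -> None:
        self.l1.pop((table, key), None)

    def full_rebuild(self) -> None:
        self.l1.clear()   # subsequent reads repopulate L1 from L2
```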
Phase 4 -- Wrap-up (minutes 50-60):
- Trade-offs: accepted L1+L2 coherency complexity for P99 + availability targets; rejected "Redis only" (P99 regression); rejected "DB-level cache (pgbouncer + result cache)" because cross-region scope is needed.
- Bottlenecks: pub/sub channel; migrations-vs-rollback windows.
- SPOFs: Redis cluster is a soft SPOF; L1 + origin-survivable is the mitigation; documented.
- Open questions: per-tenant cache isolation for abuse prevention; schema-drift detection in the cache layer.
The review produces (a) a go/no-go on the change, (b) a short list of follow-up tickets, and (c) the trade-off log that goes into the ADR. Staying on structure means the go/no-go is informed, not improvised.
Common Confusion / Misconceptions
"Spending more time in phase 1 helps." It does not. 25 minutes of requirements gathering is a failure mode, not thoroughness. The target is 5-7 minutes. If the interviewer keeps expanding scope, time-box yourself and say "let me lock in what we have and design against it; we can loop back if time allows".
"The interviewer drives phase 3." Half-yes. The interviewer picks which component; you decide the shape of the deep dive. If they ask about X and X is straightforward, give a crisp answer and offer to move to the more interesting Y.
"Wrap-up is optional if we're running out of time." Wrap-up is the phase senior reviewers weight most heavily. It is where you show you can articulate trade-offs, name what is not done, and communicate open questions. Preserve the last 3-5 minutes at almost any cost.
"Interview structure differs from real reviews." Minor differences (in a real review you have more time, a pre-written doc, and more stakeholders), but the four phases map cleanly: requirements (problem statement), high-level (architecture section), deep dive (component sections), wrap-up (trade-offs + open questions).
"If I know the answer, I can skip phase 1." Almost always wrong. Skipping requirements deprives the reviewer of the shared target the rest of the design is evaluated against. They will read your design and grade it against their imagined requirements, which will not match yours. Even 2 minutes of "let me restate what I heard" buys calibration that lasts the rest of the session.
How To Use It
On a timer:
- Start a visible clock.
- Announce the time-box when you enter each phase: "Let's lock requirements, we have five minutes."
- If a phase is about to overrun, say so and offer to defer: "Let me move on; we can return to this if time allows."
- At the transition, restate where you are: "Phase 2 done; the diagram shows X, Y, Z. Want me to deep-dive on A or B?"
- Protect the wrap-up. At minute 40 of 45, stop whatever you are doing and wrap up.
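The timer discipline above is mechanical enough to express as a check. This sketch assumes phase budgets in minutes (the 6 / 12 / 20 / 5 split from the rate-limiter example) and a 5-minute wrap-up reserve, so the hard stop lands at minute 40 of 45. `check_pace` and `Pace` are illustrative names.

```python
from itertools import accumulate
from typing import Dict, NamedTuple

PHASES = ("requirements", "high_level", "deep_dive", "wrap_up")


class Pace(NamedTuple):
    expected_phase: str   # where the plan says you should be right now
    minutes_behind: int   # minutes past the planned end of the phase you are actually in
    wrap_up_now: bool     # True once the reserved wrap-up window has started


def check_pace(minute: int, current_phase: str, budgets: Dict[str, int],
               total_minutes: int = 45, wrap_up_reserve: int = 5) -> Pace:
    # Planned end minute of each phase, e.g. 6, 18, 38, 43 for a 6/12/20/5 split.
    ends = dict(zip(PHASES, accumulate(budgets[p] for p in PHASES)))
    expected = next((p for p in PHASES if minute < ends[p]), "wrap_up")
    behind = max(0, minute - ends[current_phase])
    return Pace(expected, behind, minute >= total_minutes - wrap_up_reserve)


budgets = {"requirements": 6, "high_level": 12, "deep_dive": 20, "wrap_up": 5}
print(check_pace(20, "high_level", budgets))
# Pace(expected_phase='deep_dive', minutes_behind=2, wrap_up_now=False)
```

A nonzero `minutes_behind` is the cue to say so out loud and defer; `wrap_up_now=True` means stop and wrap up regardless of where you are.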
Transfer / Where This Shows Up Later
- Cluster 5 concept 14 (trade-offs) supplies the content the wrap-up phase delivers.
- Cluster 5 concept 15 (design doc) mirrors the four phases as doc sections -- requirements -> high-level -> deep-dive -> trade-offs/wrap-up.
- S8M5 (technical leadership) applies this structure to ADR writing and architecture review ceremonies.
- S9 (cloud + DevOps) translates each phase into an operational artifact: requirements into SLOs, high-level into infrastructure-as-code modules, deep-dive into runbooks and dashboards, wrap-up into postmortem templates.
- S10 capstone/interviews: this is the interview. Running it on a timer 20+ times, on unfamiliar prompts, is the single most effective preparation for staff-level design screens.
Check Yourself
- If you have 25 minutes total for a screen, what are your phase budgets?
- If at minute 20 of 45 you are still in phase 2, what do you do next?
- What specifically goes in wrap-up that does not go in any other phase?
- Why does skipping phase 1 cost you credibility even if you know the answer?
Mini Drill or Application
On a real timer (phone, watch, tab timer), run a 45-minute design session solo on one unfamiliar prompt. Narrate out loud. At the end, score yourself:
- Did you finish phase 1 by minute 7?
- Did you hold phase 2 to 12 minutes?
- Did you deep-dive two components in phase 3?
- Did you leave 3-5 minutes for wrap-up?
- Is the board intelligible to someone who did not watch you draw it?
Repeat until all five yeses become routine.
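If you log the minute at which each phase ended, the first four questions can be scored automatically; the fifth (is the board intelligible?) still needs a human. This sketch takes its thresholds from the questions above and reads "leave 3-5 minutes for wrap-up" as: wrap-up got at least 3 minutes and finished inside the session. The function name and input shape are illustrative.

```python
from typing import Dict


def score_drill(phase_ends: Dict[str, float], deep_dives: int,
                total_minutes: float = 45) -> Dict[str, bool]:
    """phase_ends: minute at which each phase finished, measured from the start of the session."""
    req_end = phase_ends["requirements"]
    hl_end = phase_ends["high_level"]
    dd_end = phase_ends["deep_dive"]
    wu_end = phase_ends["wrap_up"]
    return {
        "phase 1 done by minute 7": req_end <= 7,
        "phase 2 held to 12 minutes": hl_end - req_end <= 12,
        "two deep dives in phase 3": deep_dives >= 2,
        "at least 3 minutes kept for wrap-up": wu_end - dd_end >= 3 and wu_end <= total_minutes,
    }


on_pace = score_drill({"requirements": 6, "high_level": 18, "deep_dive": 38, "wrap_up": 43}, deep_dives=2)
overran = score_drill({"requirements": 12, "high_level": 30, "deep_dive": 44, "wrap_up": 45}, deep_dives=1)
print(all(on_pace.values()), [name for name, ok in overran.items() if not ok])
```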
Read This Only If Stuck
- System Design Primer: How to approach a system design interview -- the canonical phase breakdown.
- System Design Primer: System design interview questions -- prompts to rehearse the four phases against.
- System Design Primer: Real-world architectures -- what a fully-expanded deep-dive looks like.
- System Design Primer: Index of patterns (caches, LBs, etc.) -- vocabulary you can reach for quickly in phase 2.
- Fundamentals: Architectural thinking -- scoping, structure, mechanism, critique as stages.
- Fundamentals: Analyzing trade-offs -- content for the wrap-up phase.
- Fundamentals: Identifying architectural characteristics -- phase 1 as characteristic elicitation.
- ByteByteGo: System Design Interview Framework -- Alex Xu's standard four-step interview framework.
- Pramp / Interviewing.io: structured design practice -- prompts with timing and rubrics.
- Martin Fowler: Software Architecture Guide -- organizing structure for reviews that last longer than 60 minutes.