Estimation and Framing Lab
Retrieval Prompts
- Name the three separate lists the framing step produces, in order. Why are they three and not one?
- State the formula for QPS from DAU and actions-per-user.
- From memory: the values of 2^10, 2^20, 2^30, and 2^40. Then the Jeff Dean latency numbers for a main-memory reference, an SSD random read, an intra-datacenter round trip, and a cross-continent round trip.
- What is the 80/20 heuristic for cache sizing, in one sentence?
- Define "hard part" in the sense used in Cluster 1, concept 3.
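Once you have answered from memory, a small sketch like the one below can check your recall. It encodes the average-QPS formula (DAU × actions per user per day / 86,400 s) and one common reading of the 80/20 cache heuristic: about 20% of items serve about 80% of reads, so the cache is sized to hold about 20% of one day's read volume. The latency values are the rounded figures commonly quoted from the Jeff Dean table. The function names, decimal units, and the sample inputs in the `__main__` block are illustrative assumptions, not part of the lab.

```python
SECONDS_PER_DAY = 86_400


def average_qps(dau: float, actions_per_user_per_day: float) -> float:
    """Average QPS = DAU x actions per user per day / 86,400 s."""
    return dau * actions_per_user_per_day / SECONDS_PER_DAY


def cache_size_bytes(daily_reads: float, bytes_per_response: float,
                     hot_fraction: float = 0.2) -> float:
    """80/20 heuristic (one common reading): ~20% of items serve ~80% of
    reads, so size the cache to hold ~20% of one day's read volume."""
    return hot_fraction * daily_reads * bytes_per_response


# 2^10, 2^20, 2^30, 2^40 and the decimal magnitude each approximates.
POWERS_OF_TWO = {10: "thousand (KiB)", 20: "million (MiB)",
                 30: "billion (GiB)", 40: "trillion (TiB)"}

# Rounded values as commonly quoted from the Jeff Dean latency table, in ns.
LATENCY_NS = {
    "main memory reference": 100,
    "SSD random read (4 KB)": 150_000,            # ~150 us
    "round trip within one datacenter": 500_000,  # ~0.5 ms
    "cross-continent round trip": 150_000_000,    # ~150 ms
}

if __name__ == "__main__":
    # Hypothetical inputs, chosen only to exercise the formulas.
    print(f"avg QPS: {average_qps(100e6, 10):,.0f}")             # 100 M DAU, 10 actions/day
    print(f"cache: {cache_size_bytes(1e9, 1_000) / 1e9:,.0f} GB")  # 1 B reads/day, 1 KB each
    for n, name in POWERS_OF_TWO.items():
        print(f"2^{n} = {2 ** n:,} ~ 1 {name}")
    for op, ns in LATENCY_NS.items():
        print(f"{op}: {ns / 1e6:g} ms")
```

With those hypothetical inputs the sketch reports roughly 11,574 average QPS and a 200 GB cache.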
Compare and Distinguish
Separate these cleanly:
- functional requirement vs non-functional requirement vs constraint
- latency vs throughput
- read:write ratio vs cache hit ratio
- bottleneck vs single point of failure
- skew (in data distribution) vs hot key
For each pair, produce a one-sentence distinction you could state out loud on a whiteboard.
Common Mistake Check
For each statement, identify the error:
- "The system must support 1 billion users." (Missing what?)
- "P99 latency under 200 ms." (Acceptable as stated, or incomplete?)
- "We need a cache." (What makes this wrong as framing?)
- "Read:write is 10:1, so we need three read replicas." (What hidden assumption is this making?)
- "At 100× scale, we will shard." (Shard on what key, why, and into how many partitions?)
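To see why "10:1, so three read replicas" skips a step, the sketch below computes a replica count. The ratio only splits traffic into reads and writes; the count also depends on absolute QPS and on how many reads one replica can sustain. The function name, the 4,000 reads/s per-replica capacity, and the assumption that the primary serves no reads are all hypothetical, chosen for illustration.

```python
import math


def read_replicas_needed(total_qps: float, reads_per_write: float,
                         per_replica_read_qps: float) -> int:
    """Replicas needed to absorb the read share of traffic.

    Assumes (hypothetically) that the primary handles writes only and that
    per_replica_read_qps is a sustainable, already-derated capacity.
    """
    read_qps = total_qps * reads_per_write / (reads_per_write + 1)
    return math.ceil(read_qps / per_replica_read_qps)


if __name__ == "__main__":
    # Same 10:1 ratio, two different absolute loads, hypothetical capacity.
    for total in (11_000, 110_000):
        print(total, "QPS ->", read_replicas_needed(total, 10, 4_000), "replicas")
```

Under these assumptions the same 10:1 ratio yields 3 replicas at 11,000 QPS and 25 replicas at 110,000 QPS, so the ratio alone cannot determine the answer.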
Mini Application
Pick two prompts you have not used yet. For each, produce in 15 minutes:
- Three lists (functional, non-functional with numbers, constraints).
- Five estimation lines: QPS avg, QPS peak, storage/day, cache size, latency-budget breakdown.
- A ranked list of 2-3 hard parts.
- The single number that would most change the design if it were 10× larger.
Candidate prompts (or invent your own):
- Global distributed rate limiter for 100 K tenants.
- Photo-sharing app with 500 M DAU, each user uploading 3 photos/day.
- Real-time sports score feed for an app with 50 M DAU.
- Centralized API usage metering service for a multi-tenant cloud platform.
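If it helps to keep the five estimation lines consistent across prompts, a template along these lines can work. It is a sketch under stated assumptions: decimal units, a hypothetical peak factor of 3× average (a rule of thumb, not a given), and the 80/20 cache heuristic read as "cache 20% of one day's read volume". The `Inputs` fields and the sample numbers are hypothetical and do not correspond to any of the candidate prompts.

```python
from __future__ import annotations

from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class Inputs:
    dau: float
    writes_per_user: float    # per day
    reads_per_user: float     # per day
    bytes_per_write: float
    bytes_per_read: float
    peak_factor: float = 3.0  # assumption: peak is roughly 2-3x average


def five_lines(i: Inputs, budget_ms: dict[str, float]) -> list[str]:
    """Return the five estimation lines: QPS avg, QPS peak, storage/day,
    cache size (80/20 heuristic), and latency-budget breakdown."""
    avg_qps = i.dau * (i.writes_per_user + i.reads_per_user) / SECONDS_PER_DAY
    peak_qps = avg_qps * i.peak_factor
    storage_day = i.dau * i.writes_per_user * i.bytes_per_write
    cache = 0.2 * (i.dau * i.reads_per_user) * i.bytes_per_read
    parts = " + ".join(f"{k} {v:g} ms" for k, v in budget_ms.items())
    return [
        f"QPS avg:        {avg_qps:,.0f}",
        f"QPS peak:       {peak_qps:,.0f}",
        f"storage/day:    {storage_day / 1e9:,.1f} GB",
        f"cache size:     {cache / 1e9:,.1f} GB",
        f"latency budget: {parts} = {sum(budget_ms.values()):g} ms",
    ]


if __name__ == "__main__":
    # Hypothetical workload: 10 M DAU, 5 writes and 50 reads per user per day.
    sample = Inputs(dau=10e6, writes_per_user=5, reads_per_user=50,
                    bytes_per_write=1_000, bytes_per_read=1_000)
    for line in five_lines(sample, {"network": 40, "app": 30, "db": 30}):
        print(line)
```

For the hypothetical sample this prints about 6,366 average QPS, 19,097 peak QPS, 50.0 GB/day of storage, a 100.0 GB cache, and a 100 ms budget. The point of the template is that every line traces back to a named input, so the "single number that would most change the design" is easy to spot by scaling one field by 10×.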
Numbers Drill (No Calculator)
Do these in under 10 minutes. Write order of magnitude only.
- 300 M DAU × 20 actions/day -> average QPS?
- 1 B posts/day × 2 KB each -> storage/day?
- 10 K QPS × 512 B request + 8 KB response -> bandwidth in MB/s each direction?
- 2 TB hot set × 3× replication -> total cluster memory needed?
- Latency budget of 150 ms with 30 ms network each way: how much budget remains for compute?
- 99.95% SLO -> minutes of allowed downtime per 30-day month?
- If your cache hit rate drops from 95% to 90%, what is the relative increase in origin load?
- Typical cross-continent round trip (Jeff Dean table) -> how many serial cross-continent hops can fit in a 400 ms P99 budget?
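After finishing the drill by hand, the sketch below can check your answers. It assumes decimal units (1 KB = 1,000 B, 1 MB = 10^6 B, 1 TB = 10^12 B), treats the request as inbound and the response as outbound for the bandwidth question, and uses the commonly quoted 150 ms cross-continent round trip from the Jeff Dean table. Using binary units instead changes some values slightly (8 KiB responses give about 82 MB/s) but not the order of magnitude.

```python
SECONDS_PER_DAY = 86_400
KB, MB, TB = 1_000, 1_000_000, 1_000_000_000_000  # decimal units (assumption)


def drill_answers() -> dict:
    """Reference values for the no-calculator numbers drill."""
    return {
        "avg_qps": 300e6 * 20 / SECONDS_PER_DAY,              # ~7 x 10^4
        "storage_tb_per_day": 1e9 * 2 * KB / TB,              # 2 TB
        "ingress_mb_s": 10_000 * 512 / MB,                    # ~5 MB/s in
        "egress_mb_s": 10_000 * 8 * KB / MB,                  # 80 MB/s out
        "cluster_memory_tb": 2 * 3,                           # 6 TB
        "compute_budget_ms": 150 - 2 * 30,                    # 90 ms
        "downtime_min_per_30d": 30 * 24 * 60 * (1 - 0.9995),  # ~21.6 min
        # Origin load scales with the miss rate: 5% -> 10% misses.
        "origin_load_multiplier": (1 - 0.90) / (1 - 0.95),    # ~2x
        # Whole serial round trips of ~150 ms that fit in 400 ms.
        "serial_xcontinent_hops": 400 // 150,                 # 2
    }


if __name__ == "__main__":
    for name, value in drill_answers().items():
        print(f"{name}: {value:,.2f}")
```

Note that the hit-rate question is about the miss rate: dropping from 95% to 90% hits doubles the misses, so origin load roughly doubles even though the hit rate fell by only five points.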
Evidence Check
This page is complete only if:
- you wrote down five estimation lines for each of your two chosen prompts, without a calculator
- you named 2-3 hard parts per prompt with one-sentence reasons
- you can recite the powers-of-two row (10, 20, 30, 40) from memory
- you identified at least one common-mistake statement in your own first-pass framing and corrected it