Back-of-Envelope Estimation
What This Concept Is
Back-of-envelope estimation is the habit of turning vague scale words ("many users", "large", "fast") into numbers you can design against, without a calculator, in under five minutes.
You need fluency in five classes of numbers:
- Traffic: daily active users, requests per day, QPS average, QPS peak.
- Storage: bytes per item, items per day, total at one year.
- Bandwidth: bytes per request × QPS, both directions.
- Memory: bytes per cached item × cached items.
- Latency budget: per-step time budgets that must sum under the P99 target.
The two cheat sheets you carry in your head are the powers-of-two table (2¹⁰ ≈ 1 thousand = 1 KB, 2²⁰ ≈ 1 million = 1 MB, 2³⁰ ≈ 1 billion = 1 GB, 2⁴⁰ ≈ 1 trillion = 1 TB) and Jeff Dean's latency numbers (L1 cache reference ≈ 0.5 ns, main-memory reference ≈ 100 ns, SSD random read ≈ 150 µs, intra-DC round trip ≈ 500 µs, cross-continent round trip ≈ 150 ms).
A useful mental split: Dean's numbers tell you what one machine can physically do; powers-of-two tell you when you have crossed into a new architectural regime. Crossing 1 GB -> 1 TB changes the storage conversation (does it fit on one node?); 1 K -> 1 M QPS changes the server count and often the protocol; 1 ms -> 100 ms changes whether the call can be synchronous at all. Estimation is not arithmetic for its own sake; it is scanning for regime crossings.
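The latency figures are easier to remember as ratios than as absolute values. A minimal sketch, using only the numbers quoted above; the dictionary keys and the `slowdown` helper are illustrative names, not part of any library:

```python
# Latency figures quoted in the text, all converted to nanoseconds.
LATENCY_NS = {
    "l1_cache": 0.5,
    "main_memory": 100,
    "ssd_random_read": 150_000,
    "intra_dc_round_trip": 500_000,
    "cross_continent_round_trip": 150_000_000,
}


def slowdown(slow: str, fast: str) -> float:
    """Return how many times slower `slow` is than `fast`."""
    return LATENCY_NS[slow] / LATENCY_NS[fast]


for slow, fast in [
    ("main_memory", "l1_cache"),
    ("ssd_random_read", "main_memory"),
    ("intra_dc_round_trip", "main_memory"),
    ("cross_continent_round_trip", "intra_dc_round_trip"),
]:
    print(f"{slow} is ~{slowdown(slow, fast):,.0f}x slower than {fast}")
```

Each step down the list costs two to four orders of magnitude, which is why moving a lookup from memory to SSD to another region counts as a regime change rather than a tuning detail.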
Why It Matters Here
Numbers are how designs are argued. Without them:
- "We need a cache" becomes unfalsifiable.
- "This database will work" has no size context.
- "We need to shard" becomes a guess instead of a decision.
With numbers, you can say: "The hot index is 400 GB, which does not fit in a single 128 GB instance's RAM; we need either to shard or to move to an SSD-backed layer with a memory-resident hot set."
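The sentence above is a one-line calculation. A rough sketch, assuming (hypothetically) that only about 75% of an instance's RAM is usable for data; `nodes_needed` and `usable_fraction` are illustrative names:

```python
import math


def nodes_needed(dataset_gb: float, ram_per_node_gb: float,
                 usable_fraction: float = 0.75) -> int:
    """Minimum number of nodes whose usable RAM holds the whole dataset.

    usable_fraction is an assumption: the share of RAM left for data after
    the OS, process overhead and headroom are accounted for.
    """
    usable_gb = ram_per_node_gb * usable_fraction
    return math.ceil(dataset_gb / usable_gb)


shards = nodes_needed(400, 128)
print(f"400 GB hot index on 128 GB instances: at least {shards} shards")
```

Any answer above 1 is the falsifiable form of "we need to shard" (or to move the cold part of the index to SSD).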
Concrete Example: Social Feed
Worked example for a social feed system.
Assumptions (proposed, then confirmed):
- 1 billion daily active users (1 B DAU).
- Each user posts on average 10 items per day.
- Average post size: 1 KB (text + small metadata; images are links).
- Each user reads roughly 100 items per day.
- Peak traffic is 3× average.
- 1 year retention for the feed hot set; full history in cold storage.
Write traffic:
- Writes/day = 1 B × 10 = 10 B writes/day.
- Writes/sec average = 10 B / 86,400 ≈ 1.2 × 10⁵ writes/sec (~120K).
- Peak writes/sec ≈ 360K.
Read traffic:
- Reads/day = 1 B × 100 = 100 B reads/day.
- Reads/sec average ≈ 1.2 × 10⁶ reads/sec (~1.2M).
- Peak reads/sec ≈ 3.6M.
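The traffic lines can be checked in a few lines of code. A minimal sketch using the assumptions listed above; `qps` is an illustrative helper name. The exact averages come out near 116K and 1.16M per second, which the text rounds to 120K and 1.2M before applying the 3× peak factor.

```python
SECONDS_PER_DAY = 86_400


def qps(users: int, actions_per_user_per_day: float, peak_factor: float = 3.0):
    """Return (average, peak) requests per second for one kind of action."""
    average = users * actions_per_user_per_day / SECONDS_PER_DAY
    return average, average * peak_factor


DAU = 1_000_000_000
write_avg, write_peak = qps(DAU, 10)    # 10 posts per user per day
read_avg, read_peak = qps(DAU, 100)     # 100 reads per user per day

print(f"writes/sec: avg {write_avg:,.0f}, peak {write_peak:,.0f}")
print(f"reads/sec:  avg {read_avg:,.0f}, peak {read_peak:,.0f}")
```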
Storage:
- Raw posts/day = 10 B × 1 KB = 10 TB/day.
- Per year = 10 TB × 365 ≈ 3.65 PB/year raw.
- With 3× replication (≈ 11 PB/year) plus index overhead: roughly 11-12 PB/year.
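The storage lines follow the same pattern. A sketch under one extra assumption that is not in the text: a 10% index overhead, chosen only to illustrate the formula. `storage_per_year` is a hypothetical helper name.

```python
KB, TB, PB = 10**3, 10**12, 10**15


def storage_per_year(writes_per_day: int, bytes_per_item: int,
                     replication: int = 1, index_overhead: float = 0.0,
                     days: int = 365) -> float:
    """Bytes stored after `days` of writes, including replicas and indexes."""
    raw_per_day = writes_per_day * bytes_per_item
    return raw_per_day * days * replication * (1 + index_overhead)


WRITES_PER_DAY = 10_000_000_000   # 1 B users x 10 posts

raw = storage_per_year(WRITES_PER_DAY, 1 * KB)
replicated = storage_per_year(WRITES_PER_DAY, 1 * KB, replication=3)
# index_overhead=0.10 is an assumed figure, not taken from the text.
with_index = storage_per_year(WRITES_PER_DAY, 1 * KB, replication=3,
                              index_overhead=0.10)

print(f"raw: {raw / PB:.2f} PB/year")
print(f"3x replicated: {replicated / PB:.2f} PB/year")
print(f"3x replicated + 10% index: {with_index / PB:.2f} PB/year")
```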
Bandwidth:
- Write ingress (client -> server) ≈ 120K × 1 KB = 120 MB/sec average.
- Read egress (server -> client) ≈ 1.2M × 1 KB = 1.2 GB/sec average, served mostly from caches.
Cache sizing (80/20 rule):
- If 20% of recent posts serve 80% of reads, cache 20% of one day's posts.
- Cache size ≈ 0.2 × 10 TB = 2 TB distributed cache for the hot set.
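Bandwidth and cache size are each a single multiplication. A minimal sketch using the rounded rates from the example; the helper names and the `hot_days` parameter (how many days of posts count as "recent") are illustrative assumptions:

```python
KB, MB, GB, TB = 10**3, 10**6, 10**9, 10**12


def bandwidth(qps: float, bytes_per_request: int) -> float:
    """Bytes per second moved in one direction."""
    return qps * bytes_per_request


def cache_size(storage_per_day: float, hot_days: float = 1,
               hot_fraction: float = 0.2) -> float:
    """Bytes to cache if `hot_fraction` of the last `hot_days` is hot (80/20 rule)."""
    return hot_fraction * hot_days * storage_per_day


write_bw = bandwidth(120_000, 1 * KB)      # average write traffic
read_bw = bandwidth(1_200_000, 1 * KB)     # average read traffic
hot_set = cache_size(10 * TB)              # 20% of one day's raw posts

print(f"write: {write_bw / MB:.0f} MB/sec, read: {read_bw / GB:.1f} GB/sec")
print(f"hot-set cache: {hot_set / TB:.1f} TB")
```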
Latency budget for a 200 ms P99 feed-load response:
- Client -> edge: ~20 ms (TLS + geo)
- Edge -> region: ~30 ms
- Auth + routing: ~10 ms
- Feed service -> cache lookup: ~5 ms (intra-DC, often < 1 ms; be generous)
- Fan-in of 20 followee timelines: ~30 ms (parallel)
- Rendering + serialization: ~20 ms
- Return trip: ~30 ms
- Budget used: ~145 ms. Slack: ~55 ms for GC pauses, retries, slow paths.
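Writing the budget as data makes the sum and the slack mechanical to check. A small sketch with the step values copied from the list above; the `budget` helper is an illustrative name:

```python
FEED_LOAD_STEPS_MS = {
    "client -> edge (TLS + geo)": 20,
    "edge -> region": 30,
    "auth + routing": 10,
    "feed service -> cache lookup": 5,
    "fan-in of 20 followee timelines (parallel)": 30,
    "rendering + serialization": 20,
    "return trip": 30,
}


def budget(steps: dict, target_ms: float):
    """Return (used_ms, slack_ms, slack as a fraction of the target)."""
    used = sum(steps.values())
    slack = target_ms - used
    return used, slack, slack / target_ms


used, slack, slack_fraction = budget(FEED_LOAD_STEPS_MS, target_ms=200)
print(f"used {used} ms, slack {slack} ms ({slack_fraction:.1%} of target)")
```

A slack share of 27.5% clears the 25% reserve recommended in the How To Use It section.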
Every one of these numbers is defensible to an order of magnitude with three seconds of arithmetic. That is the bar.
Concrete Example 2: URL Shortener Latency Budget
For a 100 ms global P99 on redirect:
- Client -> nearest PoP (TLS + geo): 20 ms.
- CDN cache hit check: 5 ms. If hit -> 301 response -> total 45 ms (20 in + 5 check + 20 return), 55 ms slack.
- On CDN miss, PoP -> regional origin: 20-30 ms.
- Regional LB -> redirect service: 1 ms.
- Redirect service -> Redis cache hit: 2 ms.
- On Redis miss -> DB lookup: 5-10 ms (SSD random read + single index lookup).
- Response serialization + return: 20 ms (mirrors ingress).
- Total on CDN miss + Redis hit: ~70-80 ms. On Redis miss + in-region DB lookup: ~75-90 ms. On both misses plus a cross-region round trip to the DB: blown.
This single sheet tells you why the redirect architecture must be cache-first, why the cache miss rate has to stay under ~2%, and why the DB must be regionally present (not just globally replicated). The numbers are the architecture.
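The per-path totals can be recomputed by summing the steps each path actually touches. A sketch using the per-step figures from the list; the 150 ms cross-region round trip is an assumption borrowed from the cross-continent latency number, and all names are illustrative:

```python
# Each step is a (best_case_ms, worst_case_ms) pair from the list above.
CLIENT_TO_POP = (20, 20)
CDN_CHECK = (5, 5)
POP_TO_ORIGIN = (20, 30)
LB_TO_SERVICE = (1, 1)
REDIS_LOOKUP = (2, 2)
DB_LOOKUP = (5, 10)
RETURN_TRIP = (20, 20)
CROSS_REGION_RTT = (150, 150)  # assumption: DB lives on another continent

TARGET_MS = 100


def path_ms(*steps):
    """Sum best and worst case over the steps on one request path."""
    return sum(s[0] for s in steps), sum(s[1] for s in steps)


cdn_hit = path_ms(CLIENT_TO_POP, CDN_CHECK, RETURN_TRIP)
origin = (CLIENT_TO_POP, CDN_CHECK, POP_TO_ORIGIN, LB_TO_SERVICE, REDIS_LOOKUP)
redis_hit = path_ms(*origin, RETURN_TRIP)
local_db = path_ms(*origin, DB_LOOKUP, RETURN_TRIP)
remote_db = path_ms(*origin, DB_LOOKUP, CROSS_REGION_RTT, RETURN_TRIP)

for name, (best, worst) in [("CDN hit", cdn_hit), ("Redis hit", redis_hit),
                            ("in-region DB", local_db), ("remote DB", remote_db)]:
    verdict = "ok" if worst <= TARGET_MS else "blown"
    print(f"{name}: {best}-{worst} ms ({verdict})")
```

Only the remote-DB path exceeds the 100 ms target, and it does so by more than the entire budget, which is the numeric form of "the DB must be regionally present".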
Common Confusion / Misconceptions
"I will be precise later." You will not. Precision is noise at this stage; order of magnitude is signal. 10⁵ vs 10⁸ changes the architecture; 1.2M vs 1.3M does not.
"The interviewer will give me the numbers." They will give you one and expect you to derive the rest. "1 B DAU" is not an answer; it is a starting number. You must convert it to QPS, storage, bandwidth yourself.
"I don't need the latency budget." You do. P99 targets get missed one step at a time. The only way to know a design meets 200 ms is to write down where the 200 ms goes.
"Dean's numbers are too old to trust." The ratios are not. A spinning-disk seek (~10 ms) is ~10⁵× slower than a main-memory reference (~100 ns), and an SSD random read (~150 µs) is ~10³× slower; an intra-DC round trip (~500 µs) is ~10³-10⁴× slower than a memory reference; a cross-continent round trip (~150 ms) is ~10²-10³× slower than an intra-DC one. Hardware has gotten faster, but the ratios and regime boundaries hold.
How To Use It
On the board, always produce five lines:
- QPS_avg = users × actions_per_user / 86,400
- QPS_peak ≈ 3 × QPS_avg (or the interviewer's number)
- Storage/day = writes/day × bytes/item × replication × (1 + index overhead)
- Cache_size ≈ 0.2 × hot_period × storage/day (apply the 80/20 heuristic)
- Latency_budget = sum of each step; reserve 25% slack
Then annotate: any line that crosses a regime threshold (roughly every factor of 1,000, i.e. every ten powers of two: 1 GB -> 1 TB, 1K QPS -> 1M QPS; for latency, 1 ms -> 100 ms) is a design-change trigger. Circle it.
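The "circle it" step can be made mechanical for sizes and rates. A rough sketch that treats every factor of 1,000 as a regime boundary; this is a simplification (2¹⁰ ≈ 10³), it does not cover the latency thresholds, and `band` and `crosses_regime` are hypothetical helper names:

```python
BANDS = ["", "K", "M", "G", "T", "P"]


def band(value: float) -> int:
    """Index of the power-of-1,000 band: 0 = units, 1 = K, 2 = M, 3 = G, ..."""
    index = 0
    while value >= 1000 and index < len(BANDS) - 1:
        value /= 1000
        index += 1
    return index


def crosses_regime(before: float, after: float) -> bool:
    """True when a change moves the quantity into a different band."""
    return band(before) != band(after)


print(crosses_regime(1.2e6, 1.3e6))          # same band: precision, not signal
print(crosses_regime(120_000, 1_200_000))    # K -> M QPS: circle it
print(crosses_regime(400e9, 4e12))           # GB -> TB: circle it
```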
Transfer / Where This Shows Up Later
- S8M3 (data patterns) translates these numbers into partition counts, replication factors, and backup cadence.
- S8M4 (scale/reliability/performance) treats the latency budget as an SLO and runs load tests against it; your back-of-envelope becomes the p99 latency SLI.
- S9 (cloud + DevOps) uses these numbers for cloud capacity planning, autoscaling policies, and cost modelling. An instance type decision is downstream of QPS/memory/storage estimates.
- S10 capstone + interviews: estimation questions are a frequent "weed-out" step in staff-level interviews. A candidate who cannot derive 1 B DAU -> 120 K writes/sec in ten seconds reads as junior regardless of the rest of the design.
Check Yourself
- Roughly how many seconds are in a day? (You must know this to two significant figures.)
- 1 KB × 1M writes/sec = how many MB/sec? How much per day, to the nearest TB?
- What is a 99.9% SLO in minutes of allowed downtime per 30-day month?
- If your P99 is 200 ms and one network round trip inside a datacenter costs 0.5 ms, how many serial intra-DC calls can you afford?
- At what storage-per-day threshold would you consider moving from a single primary to a sharded store? Defend with one sentence.
Mini Drill or Application
For each prompt, write all five estimation lines in under five minutes:
- Design a chat system with 500 M DAU sending 40 messages/day, 200 bytes each.
- Design a video-sharing service with 100 M DAU watching 10 videos/day, averaging 3 minutes each at 2 MB/min.
- Design a search index for 10 B documents at 1 KB of indexed text each, 10 K QPS query load.
- Design a rate limiter handling 10 M QPS across 50 M users with a sliding-window counter.
For each, state which number would most change if the DAU were 10× and whether that multiplier crosses an architectural regime boundary.
Read This Only If Stuck
- System Design Primer: Appendix -- Powers of two and latency numbers -- primary estimation reference with Dean's numbers in a digestible table.
- System Design Primer: Performance vs scalability -- clarifies which of the five estimation classes governs each SLO.
- System Design Primer: Latency vs throughput -- short chunk that forces the distinction.
- Fundamentals: Measuring architecture characteristics -- how to turn vague SLOs into numbers a fitness function can check.
- Fundamentals: Architecture characteristics defined -- characteristics that estimation supports.
- Jeff Dean -- Latency Numbers Every Programmer Should Know (jboner gist) -- canonical source; the numbers every estimation relies on.
- Amazon Builders' Library -- Timeouts, retries, and backoff with jitter -- real-world latency-budget reasoning at Amazon scale; shows how P99 timeouts are derived.
- High Scalability -- case studies where "the numbers" drove a specific scaling decision.