Back-of-Envelope Estimation

What This Concept Is

Back-of-envelope estimation is the habit of turning vague scale words ("many users", "large", "fast") into numbers you can design against, without a calculator, in under five minutes.

You need fluency in five classes of numbers:

Traffic: daily active users, requests per day, QPS average, QPS peak.
Storage: bytes per item, items per day, total at one year.
Bandwidth: bytes per request × QPS, both directions.
Memory: bytes per cached item × cached items.
Latency budget: per-step time budgets that must sum under the P99 target.

The two cheat sheets you carry in your head are the powers-of-two table (2¹⁰ ≈ 10³ = 1 KB, 2²⁰ ≈ 10⁶ = 1 MB, 2³⁰ ≈ 10⁹ = 1 GB, 2⁴⁰ ≈ 10¹² = 1 TB) and Jeff Dean's latency numbers (L1 cache reference ≈ 0.5 ns, main-memory reference ≈ 100 ns, SSD random read ≈ 150 µs, intra-DC round trip ≈ 500 µs, cross-continent round trip ≈ 150 ms).

A useful mental split: Dean's numbers tell you what one machine can physically do; powers-of-two tell you when you have crossed into a new architectural regime. Crossing 1 GB -> 1 TB changes the storage conversation (does it fit on one node?); 1 K -> 1 M QPS changes the server count and often the protocol; 1 ms -> 100 ms changes whether the call can be synchronous at all. Estimation is not arithmetic for its own sake; it is scanning for regime crossings.
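The idea of scanning for regime crossings can be sketched in a few lines. This is a minimal illustration, not a standard tool: the names `PREFIX_BY_EXPONENT`, `size_regime`, and `crosses_regime` are hypothetical, and it treats each multiple of 10 in the power-of-two exponent as one regime.

```python
# Unit prefix reached at each power-of-two exponent (0 means plain bytes).
PREFIX_BY_EXPONENT = {0: "B", 10: "K", 20: "M", 30: "G", 40: "T"}


def size_regime(num_bytes: int) -> str:
    """Return the largest prefix (B, K, M, G, T) that num_bytes has reached."""
    if num_bytes < 1:
        raise ValueError("num_bytes must be positive")
    # floor(log2(num_bytes)), rounded down to a multiple of 10.
    exponent = (num_bytes.bit_length() - 1) // 10 * 10
    # Anything at or beyond 2**40 is reported as "T".
    return PREFIX_BY_EXPONENT[min(exponent, 40)]


def crosses_regime(before_bytes: int, after_bytes: int) -> bool:
    """True when growth moves the data set into a different size regime."""
    return size_regime(before_bytes) != size_regime(after_bytes)


print(size_regime(400 * 2**30))                # G
print(crosses_regime(400 * 2**30, 4 * 2**40))  # True: GB -> TB
```

A 10× growth from 400 GB to 4 TB trips the check; a 2× growth from 100 GB to 200 GB would not.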

Why It Matters Here

Numbers are how designs are argued. Without them:

"We need a cache" becomes unfalsifiable.
"This database will work" has no size context.
"We need to shard" becomes a guess instead of a decision.

With numbers, you can say: "The hot index is 400 GB, which does not fit in a single 128 GB instance's RAM; we need either to shard or to move to an SSD-backed layer with a memory-resident hot set."
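The "does it fit in RAM" argument reduces to one division. A minimal sketch, assuming the 400 GB index and 128 GB instances from the sentence above; the 75% usable-memory figure is an assumption added here for headroom, not a number from the text.

```python
import math

hot_index_gb = 400       # from the text
ram_per_node_gb = 128    # from the text
usable_fraction = 0.75   # assumption: leave headroom for the OS and growth

# Lower bound: every byte of RAM holds index data.
min_shards = math.ceil(hot_index_gb / ram_per_node_gb)

# More realistic: only part of each node's RAM is available to the index.
shards_with_headroom = math.ceil(hot_index_gb / (ram_per_node_gb * usable_fraction))

print(min_shards, shards_with_headroom)  # 4 5
```

Either way the answer is "more than one node", which is the decision the numbers were needed for.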

Concrete Example: Social Feed

Worked example for a social feed system.

Assumptions (proposed, then confirmed):

1 billion daily active users (1 B DAU).
Each user posts on average 10 items per day.
Average post size: 1 KB (text + small metadata; images are links).
Each user reads roughly 100 items per day.
Peak traffic is 3× average.
1 year retention for the feed hot set; full history in cold storage.

Write traffic:

Writes/day = 1 B × 10 = 10 B writes/day.
Writes/sec average = 10 B / 86,400 ≈ 1.2 × 10⁵ writes/sec (~120K).
Peak writes/sec ≈ 360K.

Read traffic:

Reads/day = 1 B × 100 = 100 B reads/day.
Reads/sec average ≈ 1.2 × 10⁶ reads/sec (~1.2M).
Peak reads/sec ≈ 3.6M.
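The traffic lines above can be reproduced with a short helper. This is a sketch under the stated assumptions (1 B DAU, 10 posts and 100 reads per user per day, 3× peak); the function name `qps` is illustrative.

```python
SECONDS_PER_DAY = 86_400


def qps(events_per_day: float, peak_factor: float = 3.0) -> tuple[float, float]:
    """Return (average, peak) events per second for a daily event count."""
    average = events_per_day / SECONDS_PER_DAY
    return average, average * peak_factor


dau = 1_000_000_000
write_avg, write_peak = qps(dau * 10)    # 10 posts per user per day
read_avg, read_peak = qps(dau * 100)     # 100 reads per user per day

print(f"writes: {write_avg:,.0f}/s avg, {write_peak:,.0f}/s peak")
print(f"reads:  {read_avg:,.0f}/s avg, {read_peak:,.0f}/s peak")
```

The exact averages come out near 116K writes/sec and 1.16M reads/sec; rounding them to 120K and 1.2M before multiplying by 3 is what gives the 360K and 3.6M peak figures.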

Storage:

Raw posts/day = 10 B × 1 KB = 10 TB/day.
Per year = 10 TB × 365 ≈ 3.65 PB/year raw.
With 3× replication: ≈ 11 PB/year. Adding index overhead, budget roughly 11-12 PB/year.
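The storage chain can be checked the same way. A minimal sketch using decimal units (1 KB = 1,000 bytes); the 10% index overhead is an assumption made here for illustration, since the text does not give a figure.

```python
KB = 1_000
TB = 1_000**4
PB = 1_000**5

writes_per_day = 10_000_000_000   # 1 B users x 10 posts
bytes_per_post = 1 * KB
replication = 3
index_overhead = 0.10             # assumption: indexes add about 10%

raw_per_day = writes_per_day * bytes_per_post
raw_per_year = raw_per_day * 365
replicated_per_year = raw_per_year * replication
total_per_year = replicated_per_year * (1 + index_overhead)

print(f"raw/day:        {raw_per_day / TB:.0f} TB")
print(f"raw/year:       {raw_per_year / PB:.2f} PB")
print(f"replicated:     {replicated_per_year / PB:.2f} PB")
print(f"with indexes:   {total_per_year / PB:.1f} PB")
```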

Bandwidth:

Write ingress (client -> server) ≈ 120K × 1 KB = 120 MB/sec average.
Read egress (server -> client) ≈ 1.2M × 1 KB = 1.2 GB/sec average, served mostly from caches.

Cache sizing (80/20 rule):

If 20% of recent posts serve 80% of reads, cache 20% of one day's posts.
Cache size ≈ 0.2 × 10 TB = 2 TB distributed cache for the hot set.
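The bandwidth and cache figures above follow from the same inputs. A sketch using the rounded QPS values from the text; the conversion to bits per second and the 64 GB of usable memory per cache node are additions made here, and the node size in particular is an assumption.

```python
import math

KB, MB, GB, TB = 1_000, 1_000**2, 1_000**3, 1_000**4

write_qps = 120_000        # rounded average from the text
read_qps = 1_200_000       # rounded average from the text
bytes_per_post = 1 * KB

write_bytes_per_sec = write_qps * bytes_per_post
read_bytes_per_sec = read_qps * bytes_per_post

# Network links are usually quoted in bits per second.
read_gbit_per_sec = read_bytes_per_sec * 8 / 1_000**3

# 80/20 heuristic: cache 20% of one day's raw posts.
HOT_FRACTION = 0.2
raw_per_day = 10 * TB
cache_bytes = round(HOT_FRACTION * raw_per_day)

usable_per_node = 64 * GB  # assumption: usable cache memory per node
cache_nodes = math.ceil(cache_bytes / usable_per_node)

print(f"write: {write_bytes_per_sec / MB:.0f} MB/s")
print(f"read:  {read_bytes_per_sec / GB:.1f} GB/s ({read_gbit_per_sec:.1f} Gbit/s)")
print(f"cache: {cache_bytes / TB:.0f} TB across {cache_nodes} nodes")
```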

Latency budget for a 200 ms P99 feed-load response:

Client -> edge: ~20 ms (TLS + geo)
Edge -> region: ~30 ms
Auth + routing: ~10 ms
Feed service -> cache lookup: ~5 ms (intra-DC, often < 1 ms; be generous)
Fan-in of 20 followee timelines: ~30 ms (parallel)
Rendering + serialization: ~20 ms
Return trip: ~30 ms
Budget used: ~145 ms. Slack: ~55 ms for GC pauses, retries, slow paths.
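Writing the budget as data makes the sum and the slack checkable. A minimal sketch with the step values from the list above; the dictionary keys are shortened labels, and the 25% slack threshold is the rule of thumb this document uses elsewhere.

```python
P99_TARGET_MS = 200
MIN_SLACK_FRACTION = 0.25

feed_budget_ms = {
    "client -> edge": 20,
    "edge -> region": 30,
    "auth + routing": 10,
    "cache lookup": 5,
    "fan-in (20 timelines, parallel)": 30,  # parallel calls cost the slowest one, not the sum
    "render + serialize": 20,
    "return trip": 30,
}

used_ms = sum(feed_budget_ms.values())
slack_ms = P99_TARGET_MS - used_ms
slack_fraction = slack_ms / P99_TARGET_MS

print(f"used {used_ms} ms, slack {slack_ms} ms ({slack_fraction:.1%})")
print("enough slack" if slack_fraction >= MIN_SLACK_FRACTION else "budget too tight")
```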

Every one of these numbers is defensible to an order of magnitude with three seconds of arithmetic. That is the bar.

Concrete Example 2: URL Shortener Latency Budget

For a 100 ms global P99 on redirect:

Client -> nearest PoP (TLS + geo): 20 ms.
CDN cache hit check: 5 ms. If hit -> 301 response -> total 45 ms, 55 ms slack.
On CDN miss, PoP -> regional origin: 20-30 ms.
Regional LB -> redirect service: 1 ms.
Redirect service -> Redis cache hit: 2 ms.
On Redis miss -> DB lookup: 5-10 ms (SSD random read + single index lookup).
Response serialization + return: 20 ms (mirrors ingress).
Totals: CDN miss with a Redis hit ≈ 70-80 ms. Redis miss with an in-region DB lookup ≈ 75-90 ms. Both misses plus a cross-DC round trip to reach the DB: blown.
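Summing the listed steps per path shows where the budget goes. This is a sketch, not a measurement: each step is a (low, high) pair in milliseconds taken from the list above, and the 150 ms cross-region round trip is an assumption borrowed from the cross-continent latency figure. The sums come out close to the rounded totals quoted above.

```python
TARGET_MS = 100

# Each step is (low_ms, high_ms); single-valued steps repeat the number.
CLIENT_TO_POP = (20, 20)
CDN_CHECK = (5, 5)
POP_TO_ORIGIN = (20, 30)
LB_TO_SERVICE = (1, 1)
REDIS_LOOKUP = (2, 2)
DB_LOOKUP = (5, 10)
RETURN = (20, 20)
CROSS_REGION_RTT = (150, 150)  # assumption: DB lives on another continent


def total(*steps):
    """Sum the low and high ends of a sequence of (low, high) steps."""
    return sum(s[0] for s in steps), sum(s[1] for s in steps)


miss_path = (CLIENT_TO_POP, CDN_CHECK, POP_TO_ORIGIN,
             LB_TO_SERVICE, REDIS_LOOKUP, RETURN)

paths = {
    "cdn_hit": total(CLIENT_TO_POP, CDN_CHECK, RETURN),
    "redis_hit": total(*miss_path),
    "db_in_region": total(*miss_path, DB_LOOKUP),
    "db_cross_region": total(*miss_path, DB_LOOKUP, CROSS_REGION_RTT),
}

for name, (low, high) in paths.items():
    verdict = "ok" if high <= TARGET_MS else "blown"
    print(f"{name:16s} {low:3d}-{high:3d} ms  {verdict}")
```

Only the cross-region path exceeds 100 ms, which is the argument for keeping the database regionally present.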

This single sheet tells you why the redirect architecture must be cache-first, why the cache miss rate has to stay under ~2%, and why the DB must be regionally present (not just globally replicated). The numbers are the architecture.

Common Confusion / Misconceptions

"I will be precise later." You will not. Precision is noise at this stage; order of magnitude is signal. 10⁵ vs 10⁸ changes the architecture; 1.2M vs 1.3M does not.

"The interviewer will give me the numbers." They will give you one and expect you to derive the rest. "1 B DAU" is not an answer; it is a starting number. You must convert it to QPS, storage, bandwidth yourself.

"I don't need the latency budget." You do. P99 targets get missed one step at a time. The only way to know a design meets 200 ms is to write down where the 200 ms goes.

"Dean's numbers are too old to trust." The ratios are not. An SSD random read is ~10³× slower than a memory reference; an intra-DC round trip is 10³-10⁴× slower than a memory reference; a cross-continent round trip is ~10²× slower than an intra-DC one. Hardware has gotten faster, but the ratios and regime boundaries hold.

How To Use It

On the board, always produce five lines:

QPS_avg = users × actions_per_user / 86,400
QPS_peak ≈ 3 × QPS_avg (or the interviewer's number)
Storage/day = writes/day × bytes/item × replication × (1 + index overhead)
Cache_size ≈ 0.2 × hot_period_days × raw bytes/day (80/20 heuristic; the cache holds one unreplicated copy)
Latency_budget = sum of each step; reserve 25% slack

Then annotate: any line that crosses a regime threshold (1 GB -> 1 TB, 1K QPS -> 1M QPS, 1 ms -> 100 ms) is a design-change trigger. Circle it.
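The five lines can be bundled into one function so that changing an assumption updates everything at once. A minimal sketch: `Estimate` and `five_lines` are hypothetical names, the default 10% index overhead is an assumption, and the cache line uses raw (unreplicated) bytes, as in the social feed example.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class Estimate:
    qps_avg: float
    qps_peak: float
    storage_per_day_bytes: float
    cache_bytes: float
    latency_slack_ms: float


def five_lines(users, actions_per_user, bytes_per_item,
               step_latencies_ms, p99_target_ms,
               peak_factor=3.0, replication=3, index_overhead=0.10,
               hot_fraction=0.2, hot_period_days=1):
    """Compute the five board lines from a handful of assumptions."""
    events_per_day = users * actions_per_user
    qps_avg = events_per_day / SECONDS_PER_DAY
    raw_per_day = events_per_day * bytes_per_item
    return Estimate(
        qps_avg=qps_avg,
        qps_peak=peak_factor * qps_avg,
        storage_per_day_bytes=raw_per_day * replication * (1 + index_overhead),
        cache_bytes=hot_fraction * hot_period_days * raw_per_day,
        latency_slack_ms=p99_target_ms - sum(step_latencies_ms),
    )


# Social feed write path: 1 B DAU, 10 posts/day, 1 KB each, 200 ms P99.
feed = five_lines(1_000_000_000, 10, 1_000, [20, 30, 10, 5, 30, 20, 30], 200)
print(feed)
```

Rerunning with `users` multiplied by 10 is a quick way to see which lines cross a threshold.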

Transfer / Where This Shows Up Later

S8M3 (data patterns) translates these numbers into partition counts, replication factors, and backup cadence.
S8M4 (scale/reliability/performance) treats the latency budget as an SLO and runs load tests against it; your back-of-envelope becomes the p99 latency SLI.
S9 (cloud + DevOps) uses these numbers for cloud capacity planning, autoscaling policies, and cost modelling. An instance type decision is downstream of QPS/memory/storage estimates.
S10 capstone + interviews: estimation questions are a frequent "weed-out" step in staff-level interviews. A candidate who cannot get from 1 B DAU at 10 posts/day to ~120 K writes/sec in ten seconds reads as junior, regardless of the rest of the design.

Check Yourself

Roughly how many seconds are in a day? (You should know this to three significant figures.)
1 KB × 1M writes/sec = how many MB/sec? How much per day, to the nearest TB?
What is a 99.9% SLO in minutes of allowed downtime per 30-day month?
If your P99 is 200 ms and one network round trip inside a datacenter costs 0.5 ms, how many serial intra-DC calls can you afford?
At what storage-per-day threshold would you consider moving from a single primary to a sharded store? Defend with one sentence.

Mini Drill or Application

For each prompt, write all five estimation lines in under five minutes:

Design a chat system with 500 M DAU sending 40 messages/day, 200 bytes each.
Design a video-sharing service with 100 M DAU watching 10 videos/day at 2 MB/min average, 3 minutes average.
Design a search index for 10 B documents at 1 KB of indexed text each, 10 K QPS query load.
Design a rate limiter handling 10 M QPS across 50 M users with a sliding-window counter.

For each, state which number would most change if the DAU were 10× and whether that multiplier crosses an architectural regime boundary.

Read This Only If Stuck

System Design Primer: Appendix -- Powers of two and latency numbers -- primary estimation reference with Dean's numbers in a digestible table.
System Design Primer: Performance vs scalability -- clarifies which of the five estimation classes governs each SLO.
System Design Primer: Latency vs throughput -- short chunk that forces the distinction.
Fundamentals: Measuring architecture characteristics -- how to turn vague SLOs into numbers a fitness function can check.
Fundamentals: Architecture characteristics defined -- characteristics that estimation supports.
Jeff Dean -- Latency Numbers Every Programmer Should Know (jboner gist) -- canonical source; the numbers every estimation relies on.
Amazon Builders' Library -- Timeouts, retries, and backoff with jitter -- real-world latency-budget reasoning at Amazon scale; shows how P99 timeouts are derived.
High Scalability -- case studies where "the numbers" drove a specific scaling decision.
