
Back-of-Envelope Estimation

What This Concept Is

Back-of-envelope estimation is the habit of turning vague scale words ("many users", "large", "fast") into numbers you can design against, without a calculator, in under five minutes.

You need fluency in five classes of numbers:

  • Traffic: daily active users, requests per day, QPS average, QPS peak.
  • Storage: bytes per item, items per day, total at one year.
  • Bandwidth: bytes per request × QPS, both directions.
  • Memory: bytes per cached item × cached items.
  • Latency budget: per-step time budgets that must sum under the P99 target.

The two cheat sheets you carry in your head are the powers-of-two table (2¹⁰ ≈ 1 K, 2²⁰ ≈ 1 M, 2³⁰ ≈ 1 B, 2⁴⁰ ≈ 1 T) and Jeff Dean's latency numbers (L1 cache ≈ 0.5 ns, main memory ≈ 100 ns, SSD random read ≈ 150 µs, intra-DC round trip ≈ 500 µs, cross-continent round trip ≈ 150 ms).

A useful mental split: Dean's numbers tell you what one machine can physically do; powers-of-two tell you when you have crossed into a new architectural regime. Crossing 1 GB -> 1 TB changes the storage conversation (does it fit on one node?); 1 K -> 1 M QPS changes the server count and often the protocol; 1 ms -> 100 ms changes whether the call can be synchronous at all. Estimation is not arithmetic for its own sake; it is scanning for regime crossings.
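
One way to make "scanning for regime crossings" concrete is to bucket a byte count by its power-of-two unit and check whether a growth factor changes the bucket. This is a minimal sketch only; `size_regime` and `crosses_regime` are hypothetical helper names, not part of any library.

```python
# Binary units: each step up is 2**10 (about 1,000x).
UNITS = ["B", "KB", "MB", "GB", "TB", "PB"]


def size_regime(num_bytes: int) -> str:
    """Return the binary unit (B, KB, MB, ...) that a byte count falls into."""
    if num_bytes < 1:
        return UNITS[0]
    # bit_length() - 1 is floor(log2(n)); every 10 bits is one unit step.
    step = (num_bytes.bit_length() - 1) // 10
    return UNITS[min(step, len(UNITS) - 1)]


def crosses_regime(before_bytes: int, after_bytes: int) -> bool:
    """True if growth moves the data into a different unit."""
    return size_regime(before_bytes) != size_regime(after_bytes)


GB = 2**30
print(size_regime(400 * GB))                # GB
print(crosses_regime(400 * GB, 4000 * GB))  # True: 10x growth lands in TB
print(crosses_regime(2 * GB, 20 * GB))      # False: still GB
```

The same bucketing idea applies to QPS and latency if you swap the unit table for decimal thousands.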

Why It Matters Here

Numbers are how designs are argued. Without them:

  • "We need a cache" becomes unfalsifiable.
  • "This database will work" has no size context.
  • "We need to shard" becomes a guess instead of a decision.

With numbers, you can say: "The hot index is 400 GB, which does not fit in a single 128 GB instance's RAM; we need either to shard or to move to an SSD-backed layer with a memory-resident hot set."
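
The "does it fit in RAM" argument above reduces to one division. A rough sketch, where `usable_fraction` (the share of instance RAM actually available to the index) is an assumption introduced here for illustration:

```python
import math


def min_nodes_for_ram(dataset_gb: float, ram_per_node_gb: float,
                      usable_fraction: float = 1.0) -> int:
    """Smallest node count whose combined usable RAM holds the dataset."""
    usable_gb = ram_per_node_gb * usable_fraction
    return math.ceil(dataset_gb / usable_gb)


print(min_nodes_for_ram(400, 128))        # 4: lower bound, all RAM usable
print(min_nodes_for_ram(400, 128, 0.75))  # 5: assuming 25% RAM is reserved
```

Anything above 1 means a single instance cannot hold the hot index in memory, which is the trigger for the shard-or-tier decision.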

Concrete Example: Social Feed

Worked example for a social feed system.

Assumptions (proposed, then confirmed):

  • 1 billion daily active users (1 B DAU).
  • Each user posts on average 10 items per day.
  • Average post size: 1 KB (text + small metadata; images are links).
  • Each user reads roughly 100 items per day.
  • Peak traffic is 3× average.
  • 1 year retention for the feed hot set; full history in cold storage.

Write traffic:

  • Writes/day = 1 B × 10 = 10 B writes/day.
  • Writes/sec average = 10 B / 86,400 ≈ 1.2 × 10⁵ writes/sec (~120K).
  • Peak writes/sec ≈ 360K.

Read traffic:

  • Reads/day = 1 B × 100 = 100 B reads/day.
  • Reads/sec average ≈ 1.2 × 10⁶ reads/sec (~1.2M).
  • Peak reads/sec ≈ 3.6M.
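
The traffic lines above can be checked with a few lines of Python. This is a sketch using the example's assumptions (1 B DAU, 3× peak factor); the function name `qps` is illustrative.

```python
SECONDS_PER_DAY = 86_400


def qps(daily_users: int, actions_per_user: int, peak_factor: float = 3.0):
    """Return (average, peak) requests per second for one action type."""
    average = daily_users * actions_per_user / SECONDS_PER_DAY
    return average, average * peak_factor


DAU = 1_000_000_000
write_avg, write_peak = qps(DAU, 10)   # 10 posts per user per day
read_avg, read_peak = qps(DAU, 100)    # 100 reads per user per day

print(f"writes: {write_avg:,.0f}/s avg, {write_peak:,.0f}/s peak")
print(f"reads:  {read_avg:,.0f}/s avg, {read_peak:,.0f}/s peak")
```

The unrounded write average is about 115.7K/sec. The worked example rounds to 120K before applying the peak factor, which is why it shows 360K rather than roughly 347K; at envelope precision the two are the same number.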

Storage:

  • Raw posts/day = 10 B × 1 KB = 10 TB/day.
  • Per year = 10 TB × 365 ≈ 3.65 PB/year raw.
  • With 3× replication (≈ 11 PB) plus indexes: roughly 11-12 PB/year.
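
A sketch of the storage arithmetic, using decimal units to match the 1 KB × 10 B = 10 TB step above. The 10% index overhead is an assumed figure chosen for illustration; the example itself does not state one.

```python
KB = 10**3   # decimal units, as in the worked example
TB = 10**12
PB = 10**15

writes_per_day = 10 * 10**9        # 10 B writes/day
bytes_per_post = 1 * KB
replication = 3
index_overhead = 0.10              # assumption: indexes add ~10%

raw_per_day = writes_per_day * bytes_per_post
raw_per_year = raw_per_day * 365
replicated_per_year = raw_per_year * replication
total_per_year = replicated_per_year * (1 + index_overhead)

print(f"raw/day:         {raw_per_day / TB:.2f} TB")
print(f"raw/year:        {raw_per_year / PB:.2f} PB")
print(f"replicated/year: {replicated_per_year / PB:.2f} PB")
print(f"with indexes:    {total_per_year / PB:.1f} PB")
```

Replication alone brings the yearly figure to about 11 PB, so the index overhead is the only term that needs a real measurement later.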

Bandwidth:

  • Write ingress (client -> server) ≈ 120K × 1 KB = 120 MB/sec average.
  • Read egress (server -> client) ≈ 1.2M × 1 KB = 1.2 GB/sec average, served mostly from caches.

Cache sizing (80/20 rule):

  • If 20% of recent posts serve 80% of reads, cache 20% of one day's posts.
  • Cache size ≈ 0.2 × 10 TB = 2 TB distributed cache for the hot set.
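
The bandwidth and cache lines above, as a sketch. The 64 GB of usable RAM per cache node is an assumption added here to turn the 2 TB figure into a node count; it is not part of the worked example.

```python
import math

MB = 10**6
GB = 10**9
TB = 10**12

item_bytes = 1_000            # 1 KB per post
write_qps = 120_000           # rounded average from the traffic section
read_qps = 1_200_000

write_bandwidth = write_qps * item_bytes   # bytes/sec flowing into servers
read_bandwidth = read_qps * item_bytes     # bytes/sec flowing out to clients

raw_per_day = 10 * TB
cache_bytes = raw_per_day * 20 // 100      # 80/20 rule: cache 20% of a day

node_ram = 64 * GB                         # assumption: usable RAM per node
cache_nodes = math.ceil(cache_bytes / node_ram)

print(write_bandwidth // MB, "MB/sec write")   # 120
print(read_bandwidth / GB, "GB/sec read")      # 1.2
print(cache_bytes // TB, "TB cache")           # 2
print(cache_nodes, "cache nodes")              # 32
```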

Latency budget for a 200 ms P99 feed-load response:

  • Client -> edge: ~20 ms (TLS + geo)
  • Edge -> region: ~30 ms
  • Auth + routing: ~10 ms
  • Feed service -> cache lookup: ~5 ms (intra-DC, often < 1 ms; be generous)
  • Fan-in of 20 followee timelines: ~30 ms (parallel)
  • Rendering + serialization: ~20 ms
  • Return trip: ~30 ms
  • Budget used: ~145 ms. Slack: ~55 ms for GC pauses, retries, slow paths.
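
Writing the budget as data makes the sum and the slack explicit, and shows which step to attack first when the total creeps up. A minimal sketch using the step values listed above:

```python
P99_TARGET_MS = 200

# Per-step budgets from the list above, in milliseconds.
steps_ms = {
    "client -> edge": 20,
    "edge -> region": 30,
    "auth + routing": 10,
    "cache lookup": 5,
    "fan-in of 20 timelines (parallel)": 30,
    "rendering + serialization": 20,
    "return trip": 30,
}

used_ms = sum(steps_ms.values())
slack_ms = P99_TARGET_MS - used_ms
slack_fraction = slack_ms / P99_TARGET_MS

print(used_ms)                   # 145
print(slack_ms)                  # 55
print(f"{slack_fraction:.1%}")   # 27.5%
```

The fan-in counts once because the 20 lookups run in parallel; if they were serial, that single line would consume the whole budget.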

Every one of these numbers is defensible to an order of magnitude with three seconds of arithmetic. That is the bar.

Concrete Example 2: URL Shortener Latency Budget

For a 100 ms global P99 on redirect:

  • Client -> nearest PoP (TLS + geo): 20 ms.
  • CDN cache hit check: 5 ms. If hit -> 301 response -> total 45 ms, 55 ms slack.
  • On CDN miss, PoP -> regional origin: 20-30 ms.
  • Regional LB -> redirect service: 1 ms.
  • Redirect service -> Redis cache hit: 2 ms.
  • On Redis miss -> DB lookup: 5-10 ms (SSD random read + single index lookup).
  • Response serialization + return: 20 ms (mirrors ingress).
  • Budget on Redis hit (after a CDN miss): ~70-80 ms. Budget on DB miss: ~75-90 ms. On both misses plus a cross-DC round trip: blown.

This single sheet tells you why the redirect architecture must be cache-first, why the cache miss rate has to stay under ~2%, and why the DB must be regionally present (not just globally replicated). The numbers are the architecture.
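
The path totals behind that sheet can be sketched by summing the listed steps at their upper bounds. Costs that the list does not itemize (for example the origin-to-PoP return leg) are left out, so treat the results as a floor. The 150 ms cross-region penalty is an assumption borrowed from the cross-continent round-trip figure quoted earlier.

```python
TARGET_MS = 100

CLIENT_TO_POP = 20
CDN_CHECK = 5
POP_TO_ORIGIN = 30     # upper end of 20-30 ms
LB_TO_SERVICE = 1
REDIS_LOOKUP = 2
DB_LOOKUP = 10         # upper end of 5-10 ms
RETURN = 20
CROSS_REGION_RTT = 150  # assumption: DB lives on another continent

cdn_hit = CLIENT_TO_POP + CDN_CHECK + RETURN
redis_hit = (CLIENT_TO_POP + CDN_CHECK + POP_TO_ORIGIN
             + LB_TO_SERVICE + REDIS_LOOKUP + RETURN)
db_local = redis_hit + DB_LOOKUP
db_remote = db_local + CROSS_REGION_RTT

for name, total in [("CDN hit", cdn_hit), ("Redis hit", redis_hit),
                    ("local DB", db_local), ("remote DB", db_remote)]:
    verdict = "ok" if total <= TARGET_MS else "blown"
    print(f"{name}: {total} ms ({verdict})")
```

Only the remote-DB path exceeds the target, which is the numerical form of the "regionally present" requirement.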

Common Confusion / Misconceptions

"I will be precise later." You will not. Precision is noise at this stage; order of magnitude is signal. 10⁵ vs 10⁸ changes the architecture; 1.2M vs 1.3M does not.

"The interviewer will give me the numbers." They will give you one and expect you to derive the rest. "1 B DAU" is not an answer; it is a starting number. You must convert it to QPS, storage, bandwidth yourself.

"I don't need the latency budget." You do. P99 targets get missed one step at a time. The only way to know a design meets 200 ms is to write down where the 200 ms goes.

"Dean's numbers are too old to trust." The ratios are not. Using the figures above, an SSD random read is ~10³× slower than a main-memory access, an intra-DC round trip is ~10³-10⁴× slower than a memory access, and a cross-continent round trip is ~10²× slower than an intra-DC one. Hardware has gotten faster, but the ratios and regime boundaries hold.

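The ratios can be recomputed from the latency figures quoted at the top of this lesson. This is a sketch for checking orders of magnitude only; the dictionary keys are illustrative names.

```python
import math

# Latency figures quoted earlier in this lesson, in nanoseconds.
LATENCY_NS = {
    "main_memory": 100,
    "ssd_random_read": 150_000,                 # 150 µs
    "intra_dc_round_trip": 500_000,             # 500 µs
    "cross_continent_round_trip": 150_000_000,  # 150 ms
}


def ratio(slow: str, fast: str) -> float:
    """How many times slower `slow` is than `fast`."""
    return LATENCY_NS[slow] / LATENCY_NS[fast]


def order_of_magnitude(x: float) -> int:
    """Largest n such that 10**n <= x."""
    return math.floor(math.log10(x))


ssd_vs_ram = ratio("ssd_random_read", "main_memory")
dc_vs_ram = ratio("intra_dc_round_trip", "main_memory")
wan_vs_dc = ratio("cross_continent_round_trip", "intra_dc_round_trip")

print(ssd_vs_ram, order_of_magnitude(ssd_vs_ram))  # 1500.0 3
print(dc_vs_ram, order_of_magnitude(dc_vs_ram))    # 5000.0 3
print(wan_vs_dc, order_of_magnitude(wan_vs_dc))    # 300.0 2
```
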
How To Use It

On the board, always produce five lines:

  1. QPS_avg = users × actions_per_user / 86,400
  2. QPS_peak ≈ 3 × QPS_avg (or the interviewer's number)
  3. Storage/day = writes/day × bytes/item × replication × (1 + index overhead)
  4. Cache_size ≈ 0.2 × hot_period × raw storage/day (80/20 heuristic; use bytes before replication, as in the 2 TB example)
  5. Latency_budget = sum of each step; reserve 25% slack

Then annotate: any line that crosses a regime boundary (roughly a 100-1,000× jump, such as 1 GB -> 1 TB, 1K QPS -> 1M QPS, or 1 ms -> 100 ms) is a design-change trigger. Circle it.
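
The five lines can be bundled into one template function. This is a sketch under stated assumptions: `five_lines` and its parameter names are hypothetical, the cache line uses pre-replication bytes (a cache typically holds one copy), and the 10% index overhead passed in the usage call is an illustrative value.

```python
SECONDS_PER_DAY = 86_400


def five_lines(users, actions_per_user, bytes_per_item, replication,
               index_overhead, hot_days, step_ms, p99_target_ms,
               peak_factor=3.0, hot_fraction=0.2, slack_fraction=0.25):
    """Compute the five board lines from raw assumptions."""
    writes_per_day = users * actions_per_user
    raw_per_day = writes_per_day * bytes_per_item
    latency_used = sum(step_ms)
    return {
        "qps_avg": writes_per_day / SECONDS_PER_DAY,                 # line 1
        "qps_peak": peak_factor * writes_per_day / SECONDS_PER_DAY,  # line 2
        "storage_per_day": raw_per_day * replication * (1 + index_overhead),
        "cache_size": hot_fraction * hot_days * raw_per_day,         # line 4
        "latency_used_ms": latency_used,                             # line 5
        # True if the steps leave at least `slack_fraction` of the target.
        "latency_ok": latency_used <= (1 - slack_fraction) * p99_target_ms,
    }


feed = five_lines(
    users=1_000_000_000, actions_per_user=10, bytes_per_item=1_000,
    replication=3, index_overhead=0.10, hot_days=1,
    step_ms=[20, 30, 10, 5, 30, 20, 30], p99_target_ms=200,
)
for line, value in feed.items():
    print(line, value)
```

Run against the social feed assumptions, the latency check passes narrowly: 145 ms used against a 150 ms ceiling once 25% slack is reserved.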

Transfer / Where This Shows Up Later

  • S8M3 (data patterns) translates these numbers into partition counts, replication factors, and backup cadence.
  • S8M4 (scale/reliability/performance) treats the latency budget as an SLO and runs load tests against it; your back-of-envelope becomes the p99 latency SLI.
  • S9 (cloud + DevOps) uses these numbers for cloud capacity planning, autoscaling policies, and cost modelling. An instance type decision is downstream of QPS/memory/storage estimates.
  • S10 capstone + interviews: estimation questions are a frequent "weed-out" step in staff-level interviews. A candidate who cannot derive 1 B DAU at 10 posts/day -> ~120 K writes/sec in ten seconds reads as junior regardless of the rest of the design.

Check Yourself

  1. Roughly how many seconds are in a day? (You must know this to two significant figures.)
  2. 1 KB × 1M writes/sec = how many MB/sec? How much per day, to the nearest TB?
  3. What is a 99.9% SLO in minutes of allowed downtime per 30-day month?
  4. If your P99 is 200 ms and one network round trip inside a datacenter costs 0.5 ms, how many serial intra-DC calls can you afford?
  5. At what storage-per-day threshold would you consider moving from a single primary to a sharded store? Defend with one sentence.

Mini Drill or Application

For each prompt, write all five estimation lines in under five minutes:

  1. Design a chat system with 500 M DAU sending 40 messages/day, 200 bytes each.
  2. Design a video-sharing service with 100 M DAU watching 10 videos/day at 2 MB/min average, 3 minutes average.
  3. Design a search index for 10 B documents at 1 KB of indexed text each, 10 K QPS query load.
  4. Design a rate limiter handling 10 M QPS across 50 M users with a sliding-window counter.

For each, state which number would most change if the DAU were 10× and whether that multiplier crosses an architectural regime boundary.

Read This Only If Stuck