Capacity Planning: Headroom, Peaks, and Growth Modeling

What This Concept Is

Capacity planning answers two questions:

How much capacity do I need right now to serve current load within the SLO?
How much capacity will I need six to twelve months from now, given a plausible growth curve?

The three inputs to a capacity number:

Mean load: the typical traffic over a representative period. Easy to over-index on, because it hides the peaks that actually set capacity.
Peak load: the highest load observed in a recent cycle (daily peak, weekly peak, seasonal peak, event-driven peak like Black Friday).
Headroom: the gap between peak and capacity that lets the system absorb variance and spikes while staying below the latency cliff that sets in at high utilization (roughly 70-80% and above).

Output: a capacity number in deploy-relevant units (cores, instances, partitions, IOPS) that you would defend to a skeptical reviewer.
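The latency cliff behind the headroom input can be illustrated with a textbook queueing model. The sketch below assumes an M/M/1 queue (one server, random arrivals and service times), which is a simplification introduced here and not something the text above claims about any real service. The function name mm1_latency_multiplier is hypothetical.

```python
def mm1_latency_multiplier(utilization: float) -> float:
    """Mean response time divided by mean service time in an M/M/1 queue.

    Assumption: M/M/1 is only a rough stand-in for a real service, but it
    shows the shape of the curve: flat at low load, steep near saturation.
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return 1.0 / (1.0 - utilization)


for rho in (0.5, 0.7, 0.8, 0.9, 0.95):
    factor = mm1_latency_multiplier(rho)
    print(f"utilization {rho:.0%}: latency ~{factor:.1f}x service time")
```

Under this model, going from 50% to 70% utilization costs far less latency than going from 80% to 95%, which is why headroom targets sit well below 100%.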

Why It Matters Here

Under-provisioning is a latency outage waiting for the next peak. Over-provisioning is money you cannot spend elsewhere. The cost of getting capacity wrong is asymmetric: the business tolerates some waste but never tolerates an outage on the biggest sales day of the year.

Your capacity model is also your understanding of the system. Every coefficient you cannot justify (peak-to-mean ratio, headroom target, growth rate) is a gap between your mental model and reality that an incident will eventually surface.

Concrete Example

A web service has one month of traffic data.

Mean traffic (lambda_mean) = 500 req/s.
Peak traffic observed (lambda_peak) = 1,800 req/s (Tuesday 11am local, from a marketing push).
Peak-to-mean ratio: 1,800 / 500 = 3.6x.
Per-instance capacity (load-tested): 200 req/s at p99 = 300ms, which is the SLO.

Current capacity. At a peak of 1,800 req/s, naive sizing is 1,800 / 200 = 9 instances. But we cannot run at 100% utilization, because queueing delay rises sharply as utilization approaches saturation. Target 70% utilization, safely below that latency cliff:

instances_needed = ceil(1800 / (200 * 0.7)) = ceil(12.86) = 13 instances

That is 13 instances to serve peak at 70% utilization. The 4 instances beyond the naive 9 are the headroom.
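The sizing arithmetic above can be wrapped in a small helper so the inputs stay explicit. This is a minimal sketch using the numbers from this example; the function name instances_needed and its parameter names are illustrative choices.

```python
import math


def instances_needed(peak_rps: float, per_instance_rps: float,
                     target_utilization: float) -> int:
    """Instances required to serve peak_rps while staying at or below
    target_utilization of each instance's load-tested capacity."""
    if per_instance_rps <= 0:
        raise ValueError("per_instance_rps must be positive")
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    return math.ceil(peak_rps / (per_instance_rps * target_utilization))


naive = instances_needed(1800, 200, 1.0)   # running at 100% utilization
sized = instances_needed(1800, 200, 0.7)   # running at the 70% target
print(f"naive: {naive}, with headroom: {sized}, headroom: {sized - naive}")
```

Keeping the utilization target as a parameter makes it easy to show a reviewer how the instance count moves if the target is tightened or relaxed.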

Growth model. Weekly data shows traffic growing at 4% per week, compound. After 26 weeks:

growth_factor = 1.04 ^ 26 ≈ 2.772
lambda_peak_6mo ≈ 1800 * 2.772 ≈ 4,990 req/s
instances_needed_6mo = ceil(4990 / (200 * 0.7)) = ceil(35.64) = 36 instances

So we need to go from 13 to 36 instances over six months. That drives the infrastructure budget and the work to ensure the service can actually scale to that count (Amdahl, USL, database capacity, config limits).
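The growth projection can be reproduced in a few lines. This sketch assumes the 4% weekly compound rate holds for all 26 weeks, which is the same assumption the example makes; projected_peak is a hypothetical helper name.

```python
import math


def projected_peak(current_peak_rps: float, weekly_growth: float,
                   weeks: int) -> float:
    """Peak load after compounding weekly_growth for the given number of weeks."""
    return current_peak_rps * (1.0 + weekly_growth) ** weeks


peak_6mo = projected_peak(1800, 0.04, 26)          # just under 5,000 req/s
instances_6mo = math.ceil(peak_6mo / (200 * 0.7))  # same 70% utilization target
print(f"projected peak: {peak_6mo:.0f} req/s, instances: {instances_6mo}")
```

Rerunning with 3% or 5% weekly growth is a quick way to see how sensitive the plan is to the growth coefficient.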

Worst-case model. A plausible marketing-driven spike is a one-day traffic multiplier of 2x on top of trend. At t = 6mo, peak becomes ~10,000 req/s; we would need 72 instances or automated scale-out within minutes of detection. Either design choice is defensible; "we'll handle it" is not.
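The worst-case figure follows from applying the spike multiplier to the projected peak. The sketch below uses the example's numbers; the variable names are illustrative, and the 2x multiplier is the example's own assumption rather than a measured value.

```python
import math

PER_INSTANCE_RPS = 200
TARGET_UTILIZATION = 0.7
SPIKE_MULTIPLIER = 2.0  # assumed one-day marketing spike on top of trend

nominal_peak = 1800 * 1.04 ** 26
worst_peak = nominal_peak * SPIKE_MULTIPLIER

usable_rps = PER_INSTANCE_RPS * TARGET_UTILIZATION
nominal_instances = math.ceil(nominal_peak / usable_rps)
worst_instances = math.ceil(worst_peak / usable_rps)

# The gap is what must be either pre-provisioned or reachable by scale-out.
gap = worst_instances - nominal_instances
print(f"nominal: {nominal_instances}, worst case: {worst_instances}, gap: {gap}")
```

The gap is the number to bring to the design discussion: it is either standing capacity you pay for or scale-out capacity you must prove you can reach in time.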

Common Confusion / Misconception

"We have enough headroom." Measured where? If headroom is measured at the mean, it is a lie. Headroom must be computed at peak, with a peak-to-mean multiplier justified by data.

"We scale automatically, so capacity planning is obsolete." Auto-scaling buys you faster reaction, not infinite capacity. Every auto-scaler has lag (detect overload, start new instance, warm up), an upper bound (quotas, instance types), and a failure mode (cannot scale if the control plane is also overloaded). You still plan capacity; auto-scaling just lets you plan for the mean rather than the peak, and exposes the peak as a reaction problem.

"Linear extrapolation is enough." Rarely. Traffic growth is often geometric, step-function (after marketing events), or tied to external cycles (tax day, Black Friday, an election). Always model nominal and worst-case, and state the assumptions.

How To Use It

Measure peak, not mean. Capture daily, weekly, monthly peaks. Know the peak-to-mean ratio for your service.
Pick a headroom target. 70% utilization at peak is a common default (past that, the latency curve steepens). Use a stricter target for latency-sensitive services and a looser one for batch.
Load test to find per-instance capacity. Include realistic request mix and realistic dependencies. "200 req/s in staging with a mocked DB" is not capacity; it is a marketing number.
Model growth. Fit a curve to at least three months of data. Extrapolate. Annotate assumptions ("assumes marketing push continues; assumes no new geo expansion").
Produce two numbers: nominal 6mo and worst-case 6mo. Document the difference.
Review quarterly. The model degrades as reality changes. An old capacity plan is a wrong capacity plan.
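For the "model growth" step, one simple approach is to fit a compound weekly growth rate with a least-squares line through the logarithm of the weekly peaks. This is a sketch under the assumption that growth is roughly geometric; the function name fit_weekly_growth is hypothetical, and the input below is synthetic data generated for illustration, not real traffic.

```python
import math


def fit_weekly_growth(weekly_peaks: list[float]) -> float:
    """Estimate the compound weekly growth rate from a series of weekly peaks.

    Fits log(peak) = a + b * week by ordinary least squares and returns
    exp(b) - 1. Assumes every peak is positive and weeks are evenly spaced.
    """
    n = len(weekly_peaks)
    if n < 2:
        raise ValueError("need at least two weekly peaks")
    xs = range(n)
    ys = [math.log(p) for p in weekly_peaks]
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return math.exp(sxy / sxx) - 1.0


# Synthetic stand-in: 13 weeks (about three months) of exact 4% weekly growth.
weekly_peaks = [1800 * 1.04 ** week for week in range(13)]
rate = fit_weekly_growth(weekly_peaks)
print(f"fitted weekly growth: {rate:.2%}")
```

Real data will be noisier, so it is worth plotting the fit against the raw peaks and recording any weeks excluded as one-off events among the plan's assumptions.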

Check Yourself

Why is "we are at 40% CPU" not a complete capacity argument?
Name two sources of peak-to-mean variance that a weekly-average graph will hide.
What assumption should any growth extrapolation state explicitly?

Mini Drill or Application

Take traffic data (or a believable stand-in) for a service you know. Compute mean, peak (daily and weekly), peak-to-mean ratio. Pick a headroom target. Propose a 6mo capacity plan: nominal and worst-case. Write down the three assumptions most likely to be wrong.
