Performance Profiling Lab

Retrieval Prompts

State the USE method's three axes and which resource types they apply to.
State the RED method's three metrics and which service types they apply to.
State the Four Golden Signals and how they relate to USE and RED.
Define p50, p95, p99, and p999 in one sentence each.
State Amdahl's Law in its formula form and name the term that bounds maximum speedup.
State the Universal Scalability Law's two penalty terms and what each represents.

Separate these pairs cleanly in writing:

For each statement, identify the error:

"Our average response time is 50ms, so users are happy."
"We added 10 more CPUs and throughput only went up 2x - the load balancer must be broken."
"CPU is at 60% so we have 40% headroom."
"p95 of p99 across our ten servers was 200ms."
"At 95% CPU utilization we're making maximum use of the machine."

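The two utilization statements above can be sanity-checked against a simple queueing model. The following is a minimal sketch, assuming a single M/M/1 queue (random arrivals, one server, exponentially distributed service times), where mean response time is the service time divided by (1 - utilization). Real multi-core machines and schedulers do not follow this model exactly, so treat the curve as an illustration of shape only. The function name and the 10 ms service time are hypothetical.

```python
def mm1_response_time(service_time_ms, utilization):
    """Mean response time of an M/M/1 queue: S / (1 - rho)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time_ms / (1 - utilization)

SERVICE_TIME_MS = 10.0  # hypothetical mean service time, not from the lab

for rho in (0.50, 0.60, 0.80, 0.95):
    latency = mm1_response_time(SERVICE_TIME_MS, rho)
    print(f"utilization {rho:.0%}: mean response about {latency:.0f} ms")
```

Under this model the mean response time at 60% utilization is already 2.5 times the bare service time, and at 95% it is 20 times, which is why remaining utilization does not translate linearly into usable headroom.
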
You have two services, both serving 10,000 requests/minute.

Service A latency distribution (milliseconds, 10 sampled buckets representing the distribution):

[20, 25, 28, 30, 32, 35, 40, 50, 60, 80]

Service B latency distribution:

[20, 22, 24, 26, 28, 30, 34, 40, 50, 126]

Compute the mean latency for each.
Compute p50, p90, and p99 for each (use the sorted sample directly).
One team proposes "the two services have essentially the same performance because their averages are close." Write a 3-sentence rebuttal grounded in the numbers.
For a user opening a page that fans out to 20 calls against the service, estimate the probability that at least one of those 20 calls is as slow as Service B's p99 latency or slower. (Hint: tail-at-scale.)
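
The tail-at-scale hint can be sketched as a one-line probability calculation. This assumes the fanned-out calls are independent and identically distributed, which real services only approximate, since slow calls tend to be correlated. The function name and the fan-out of 100 are hypothetical and chosen so the lab question is left to work by hand.

```python
def prob_any_call_in_tail(fanout, quantile):
    """Probability that at least one of `fanout` independent calls
    lands beyond the given latency quantile (e.g. 0.99 for p99)."""
    if fanout < 1 or not 0 < quantile < 1:
        raise ValueError("need fanout >= 1 and 0 < quantile < 1")
    # Each call stays under the quantile with probability `quantile`;
    # all of them do so with probability quantile ** fanout.
    return 1 - quantile ** fanout

# Hypothetical page that fans out to 100 backend calls.
for q in (0.95, 0.99, 0.999):
    p = prob_any_call_in_tail(100, q)
    print(f"fanout 100, quantile {q}: {p:.1%} of page loads hit the tail")
```

With 100 calls, roughly 63% of page loads include at least one call slower than p99, so a latency that only 1 in 100 calls sees becomes the common case for the page as a whole.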

For a single Kubernetes Node running a Postgres database, design:

The USE view (resources × U, S, E) -- list at least 5 resources and the specific metric you would graph for each.
The RED view for the Postgres query workload -- rate, errors, duration.
Which panels would you promote to SLO alerts? Which are investigation-only?
Cite the one concrete signal that would have told you Postgres is saturated before query latency degrades.
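
The RED view reduces to three numbers per observation window. Below is a minimal sketch of that reduction, assuming query outcomes are already available as in-memory records; in practice they would come from the database's own statistics or an exporter. `QueryRecord`, `red_summary`, and the sample window are hypothetical names and data, and p95 is computed with the nearest-rank method.

```python
import math
from dataclasses import dataclass

@dataclass
class QueryRecord:
    duration_ms: float
    ok: bool

def red_summary(records, window_seconds):
    """Rate, error ratio, and nearest-rank p95 duration for one window."""
    if not records:
        return {"rate_per_s": 0.0, "error_ratio": 0.0, "p95_ms": None}
    durations = sorted(r.duration_ms for r in records)
    rank = math.ceil(95 * len(durations) / 100)  # nearest-rank p95
    errors = sum(1 for r in records if not r.ok)
    return {
        "rate_per_s": len(records) / window_seconds,
        "error_ratio": errors / len(records),
        "p95_ms": durations[rank - 1],
    }

# Hypothetical 10-second window: 18 fast queries, one slow, one failed.
window = (
    [QueryRecord(5.0, True)] * 18
    + [QueryRecord(40.0, True), QueryRecord(250.0, False)]
)
summary = red_summary(window, window_seconds=10)
print(summary)
```

Note that RED describes the workload as callers experience it; it says nothing about why a resource is queueing, which is what the USE view is for.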

A workload has serial fraction s = 0.05 under Amdahl's Law, or α = 0.05 and β = 0.0005 under the USL, with reference throughput C(1) = 100 req/s.

Compute Amdahl's predicted speedup at N = 10 and N = 100.
Compute USL throughput at N = 10, 50, 100, 200. Identify the peak N.
Past the peak, USL predicts throughput decreases with more nodes. Give one real mechanism that could cause that in a distributed service.
Which law is more useful for predicting peak capacity, and which is more useful for sanity-checking a parallel algorithm's ceiling?
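
Both laws are short enough to evaluate directly. The sketch below uses the standard forms: Amdahl speedup S(N) = 1 / (s + (1 - s) / N), and USL throughput X(N) = C(1) * N / (1 + α(N - 1) + βN(N - 1)), whose maximum falls at N = sqrt((1 - α) / β). The function names and the parameter values in the usage lines are hypothetical and deliberately differ from the exercise, so the lab numbers are still left to compute.

```python
import math

def amdahl_speedup(n, serial_fraction):
    """Amdahl's Law: S(N) = 1 / (s + (1 - s) / N)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)

def usl_throughput(n, alpha, beta, c1):
    """USL: X(N) = C(1) * N / (1 + alpha*(N - 1) + beta*N*(N - 1))."""
    return c1 * n / (1.0 + alpha * (n - 1) + beta * n * (n - 1))

def usl_peak_n(alpha, beta):
    """Real-valued N that maximizes USL throughput: sqrt((1 - alpha) / beta)."""
    return math.sqrt((1.0 - alpha) / beta)

# Hypothetical parameters, not the ones from the exercise.
S, ALPHA, BETA, C1 = 0.10, 0.10, 0.001, 100.0

for n in (1, 10, 30, 100):
    print(
        f"N={n:>3}  Amdahl speedup={amdahl_speedup(n, S):6.2f}  "
        f"USL throughput={usl_throughput(n, ALPHA, BETA, C1):7.1f} req/s"
    )
print(f"USL peak near N = {usl_peak_n(ALPHA, BETA):.1f}")
```

Amdahl's curve flattens toward 1/s but never turns down, while the USL curve rises, peaks, and then falls; comparing the two columns makes that regime change visible.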

You are told "latency doubled over the last week." You have access to production CPU flame graphs from a week ago and today.

Describe what you would look for in the diff (new columns, widened columns, shifted flames).
Name one change category each for: (a) a new code path, (b) a dependency slowdown, (c) a lock contention issue.
What instrumentation would you add before the next deploy to shorten your next such investigation?
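
A flame graph diff can also be done numerically before looking at any picture. The sketch below assumes both profiles are available in the folded stack text format that common flame graph tooling consumes (one `frame;frame;frame count` entry per line). It compares each frame's inclusive share of samples rather than raw counts, so the two profiles do not need the same sample total. The function names, the threshold, and the sample stacks are hypothetical.

```python
from collections import Counter

def inclusive_shares(folded_lines):
    """Fraction of samples in which each frame appears anywhere on the stack."""
    counts, total = Counter(), 0
    for line in folded_lines:
        line = line.strip()
        if not line:
            continue
        stack, _, samples = line.rpartition(" ")
        n = int(samples)
        total += n
        for frame in set(stack.split(";")):  # count recursion once per sample
            counts[frame] += n
    return {frame: c / total for frame, c in counts.items()} if total else {}

def share_diff(before, after, min_delta=0.02):
    """Frames whose share of samples moved by at least min_delta, largest first."""
    frames = set(before) | set(after)
    deltas = {f: after.get(f, 0.0) - before.get(f, 0.0) for f in frames}
    moved = [(f, d) for f, d in deltas.items() if abs(d) >= min_delta]
    return sorted(moved, key=lambda item: abs(item[1]), reverse=True)

# Hypothetical folded stacks for the two weeks.
last_week = ["main;handle;parse 60", "main;handle;db_query 40"]
today = [
    "main;handle;parse 30",
    "main;handle;db_query 40",
    "main;handle;spin_lock 30",
]

for frame, delta in share_diff(inclusive_shares(last_week), inclusive_shares(today)):
    print(f"{frame:>10}: {delta:+.0%} of samples")
```

A frame that appears only in the newer profile corresponds to a new column, and a frame whose share grew corresponds to a widened one. Because these are on-CPU profiles, time spent blocked (for example, sleeping on a lock or waiting on a dependency) will not appear here and needs off-CPU data.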

This practice page is complete only if you can:

Compute percentiles from a small sample and explain why averaging them across shards is invalid.
Draw USE and RED dashboards for a real service from memory.
Apply Amdahl and USL numerically to a proposed scale-out plan and identify the regime change.
Tell the difference between "more CPU" and "faster CPU" for a given bottleneck.
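
For the first checklist item, the following minimal sketch shows why per-shard percentiles cannot be averaged. It uses the nearest-rank definition of a percentile and two hypothetical shards with different traffic volumes; the shard names and latency values are invented for illustration.

```python
import math

def nearest_rank(samples, q):
    """Nearest-rank percentile for an integer q in 1..100."""
    ordered = sorted(samples)
    rank = math.ceil(q * len(ordered) / 100)
    return ordered[rank - 1]

# Hypothetical shards: one busy and healthy, one quiet with a slow tail.
shard_busy = [10] * 990 + [50] * 10    # 1000 requests
shard_quiet = [10] * 80 + [900] * 20   # 100 requests

per_shard_p99 = [nearest_rank(shard_busy, 99), nearest_rank(shard_quiet, 99)]
averaged_p99 = sum(per_shard_p99) / len(per_shard_p99)
pooled_p99 = nearest_rank(shard_busy + shard_quiet, 99)

print("per-shard p99:", per_shard_p99)
print("average of p99s:", averaged_p99)
print("p99 of pooled samples:", pooled_p99)
```

The averaged figure is a latency that no request in either shard experienced, and it differs from the percentile of the combined traffic. A fleet-wide percentile has to be computed from pooled samples or from mergeable histograms, not from per-shard percentiles.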