Module 4: Scale, Reliability & Performance: Mistake Clinic

This clinic turns wrong moves into reusable judgment. Use it after each practice page and again before the quiz or checkpoint.

Module-Specific Mistake Radar

Start with these traps. Replace or extend them with real mistakes from your own work.

Mistake to look for	Where it shows up	Symptom	Repair evidence
Finishing Performance Profiling Lab with only a final answer	Performance Profiling Lab	The work has no failed case, trace, test, proof gap, or design stress point.	Add the smallest broken example and show the repair that changes the result.
Finishing Scaling Design Workshop with only a final answer	Scaling Design Workshop	The work has no failed case, trace, test, proof gap, or design stress point.	Add the smallest broken example and show the repair that changes the result.
Finishing Reliability and SLO Clinic with only a final answer	Reliability and SLO Clinic	The work has no failed case, trace, test, proof gap, or design stress point.	Add the smallest broken example and show the repair that changes the result.
Finishing Scale, Reliability, and Performance Katas with only a final answer	Scale, Reliability, and Performance Katas	The work has no failed case, trace, test, proof gap, or design stress point.	Add the smallest broken example and show the repair that changes the result.
Treating Latency, Throughput, Utilization, and the USE / RED / Four Golden Signals as vocabulary instead of a tool	Latency, Throughput, Utilization, and the USE / RED / Four Golden Signals	The explanation names the concept but cannot decide between two cases.	Write one example, one non-example, and the rule that separates them.
Treating Percentile Latency and Why Averages Lie as vocabulary instead of a tool	Percentile Latency and Why Averages Lie	The explanation names the concept but cannot decide between two cases.	Write one example, one non-example, and the rule that separates them.

Practice Mistake Checks

Pull any miss from these checks into your mistake log.

Performance Profiling Lab

Source: practice/01-performance-profiling-lab.md

For each statement, identify the error:

"Our average response time is 50ms, so users are happy."
"We added 10 more CPUs and throughput only went up 2x - the load balancer must be broken."
"CPU is at 60% so we have 40% headroom."
"p95 of p99 across our ten servers was 200ms."
"At 95% CPU utilization we're making maximum use of the machine."
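A few small calculations can help test these statements before you write up the error. The sketch below is illustrative only: the latency samples are invented, `percentile` uses the nearest-rank definition (one of several in common use), `amdahl_speedup` assumes a fixed serial fraction, and `mm1_slowdown` assumes a single-server queue with random arrivals. None of these names come from the lab itself.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

def mean(samples):
    return sum(samples) / len(samples)

def amdahl_speedup(parallel_fraction, workers):
    """Upper bound on speedup when only part of the work runs in parallel (assumed model)."""
    return 1 / ((1 - parallel_fraction) + parallel_fraction / workers)

def mm1_slowdown(utilization):
    """Response time relative to service time in an M/M/1 queue (assumed model)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return 1 / (1 - utilization)

# Hypothetical latencies in ms: 95 fast requests and 5 slow ones.
latencies = [20] * 95 + [620] * 5
avg_ms = mean(latencies)            # 50.0
p99_ms = percentile(latencies, 99)  # 620

# Hypothetical fleet: nine healthy servers, one with 5 slow requests out of 100.
servers = [[10] * 100 for _ in range(9)] + [[10] * 95 + [500] * 5]
per_server_p99 = [percentile(s, 99) for s in servers]
p95_of_p99 = percentile(per_server_p99, 95)                  # 500
pooled_p99 = percentile([x for s in servers for x in s], 99)  # 10

print(avg_ms, p99_ms, p95_of_p99, pooled_p99)
print(round(amdahl_speedup(0.5, 10), 2))
print(round(mm1_slowdown(0.60), 1), round(mm1_slowdown(0.95), 1))
```

In this made-up data the mean is 50 ms while one request in twenty takes 620 ms, and the percentile of per-server percentiles differs from the percentile of the pooled requests by a factor of 50. The two model functions are only as good as their assumptions, but they give you a number to argue against.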

Scaling Design Workshop

Source: practice/02-scaling-design-workshop.md

For each, identify the error:

"We made the service horizontally scalable by adding a load balancer."
"Sticky sessions are fine as long as the load balancer is smart."
"Write-behind is safe because we eventually write to the DB."
"We have a CDN, so we don't need any other caching."
"The cache was slow so we doubled its memory."
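For the write-behind statement, it can help to reproduce the failure on a toy model. The sketch below is a hypothetical `WriteBehindCache` (not taken from the workshop) that acknowledges writes before they reach the backing store, with a plain dict standing in for the database and a `crash` method standing in for losing the cache node's memory.

```python
class WriteBehindCache:
    """Toy write-behind cache: writes are acknowledged before they are durable."""

    def __init__(self, store):
        self.store = store  # dict standing in for the database
        self.cache = {}
        self.dirty = set()  # keys acknowledged but not yet written to the store

    def put(self, key, value):
        self.cache[key] = value
        self.dirty.add(key)

    def flush(self):
        """Write every dirty key through to the store."""
        for key in self.dirty:
            self.store[key] = self.cache[key]
        self.dirty.clear()

    def crash(self):
        """Simulate losing cache memory; return the keys whose writes were lost."""
        lost = sorted(self.dirty)
        self.cache.clear()
        self.dirty.clear()
        return lost

db = {}
cache = WriteBehindCache(db)
cache.put("order:1", "paid")
cache.flush()                 # order:1 is now durable
cache.put("order:2", "paid")  # acknowledged, still only in memory
lost = cache.crash()

print(lost)  # ['order:2']
print(db)    # {'order:1': 'paid'}
```

The caller was told both writes succeeded, yet only one survives. Real systems narrow this window with replication or a durable log; the sketch only shows that "eventually" depends on the cache staying alive until the flush.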

Reliability and SLO Clinic

Source: practice/03-reliability-and-slo-clinic.md

For each, identify the error:

"Our SLO is 99.999% because that's what AWS offers."
"Availability was 99.93% this month, so we're within the 99.9% SLO."
"We have redundancy across three servers, so correlated failure is impossible."
"Chaos engineering is just deliberately breaking things in production."
"The dashboard is green so nothing is wrong."
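Turning the percentages into minutes often makes the hidden assumption easier to spot. The sketch below assumes a 30-day window and time-based availability, neither of which the clinic specifies, and the function names are illustrative. The replica formula holds only if failures are independent, which is exactly the assumption the third statement takes for granted.

```python
MINUTES_PER_30_DAYS = 30 * 24 * 60  # 43200; the window length is an assumption

def allowed_downtime_minutes(slo, window_minutes=MINUTES_PER_30_DAYS):
    """Error budget expressed as minutes of downtime in the window."""
    return (1 - slo) * window_minutes

def budget_consumed(measured, slo):
    """Fraction of the error budget already used at the measured availability."""
    return (1 - measured) / (1 - slo)

def replicated_availability(per_replica, replicas, shared_dependency=1.0):
    """Availability of N replicas IF failures are independent, capped by any
    dependency they all share (same rack, same deploy, same config push)."""
    return shared_dependency * (1 - (1 - per_replica) ** replicas)

print(round(allowed_downtime_minutes(0.999), 2))         # minutes allowed at 99.9%
print(round(allowed_downtime_minutes(0.99999) * 60, 2))  # seconds allowed at 99.999%
print(round(budget_consumed(0.9993, 0.999), 2))          # share of budget used
print(round(replicated_availability(0.99, 3), 6))
print(round(replicated_availability(0.99, 3, shared_dependency=0.999), 6))
```

Under these assumptions a 99.9% SLO allows about 43 minutes of downtime per 30 days, a 99.999% SLO allows about 26 seconds, and a month at 99.93% has already spent roughly 70% of the 99.9% budget. Whether that counts as "within the SLO" depends on the window and on what the availability number actually measures.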

Repair Protocol

For each real mistake:

Reproduce the failure on the smallest example, trace, proof, query, command, or design sketch.
Name the hidden assumption.
Repair the artifact.
Save evidence of the change: a failing-then-passing test, corrected proof step, revised diagram, safer command, benchmark, or review note.
Add one retrieval card beginning with "Check ... before ..." or "Do not use ... when ...".
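As one way to picture the protocol, the sketch below walks a made-up mistake through it: a percentile helper with an off-by-one index. The functions and the three-element input are hypothetical; the point is the shape of the evidence, a check that fails on the original artifact and passes on the repaired one.

```python
import math

def percentile_buggy(samples, p):
    """Original artifact. Hidden assumption: p is always below 100."""
    ordered = sorted(samples)
    return ordered[int(p / 100 * len(ordered))]  # index == len(ordered) at p=100

def percentile_fixed(samples, p):
    """Repaired artifact: nearest-rank, with the rank clamped to at least 1."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

def reports_max_at_p100(fn):
    """Smallest reproduction: three samples, p=100 should return the largest."""
    try:
        return fn([1, 2, 3], 100) == 3
    except IndexError:
        return False

before = reports_max_at_p100(percentile_buggy)  # fails: the index runs off the end
after = reports_max_at_p100(percentile_fixed)   # passes after the repair

print(before, after)  # False True
```

A matching retrieval card might read: "Check the index at p=100 before trusting a percentile helper."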

Mistake Log

Date	Mistake	Symptom	Root cause	Repair evidence	Retrieval card
Starter	Pick one radar row above	Explain how it would fail in this module	Name the assumption	Add a counterexample or corrected artifact	Write the card before closing the page

Completion Standard

At least five real mistakes are logged.
At least two mistakes include a counterexample or failing test.
At least one mistake connects to an older semester skill.
At least one correction changes code, a proof, a diagram, a command transcript, a query, or a design decision.
