Module 5: Distributed Systems Fundamentals: Mistake Clinic

This clinic turns wrong moves into reusable judgment. Use it after each practice page and again before the quiz or checkpoint.


Module-Specific Mistake Radar

Start with these traps. Replace or extend them with real mistakes from your own work.

| Mistake to look for | Where it shows up | Symptom | Repair evidence |
| --- | --- | --- | --- |
| Finishing Time and Ordering Lab with only a final answer | Time and Ordering Lab | The work has no failed case, trace, test, proof gap, or design stress point. | Add the smallest broken example and show the repair that changes the result. |
| Finishing Failure Model Workshop with only a final answer | Failure Model Workshop | The work has no failed case, trace, test, proof gap, or design stress point. | Add the smallest broken example and show the repair that changes the result. |
| Finishing Consensus Reasoning Clinic with only a final answer | Consensus Reasoning Clinic | The work has no failed case, trace, test, proof gap, or design stress point. | Add the smallest broken example and show the repair that changes the result. |
| Finishing Distributed Systems Code Katas with only a final answer | Distributed Systems Code Katas | The work has no failed case, trace, test, proof gap, or design stress point. | Add the smallest broken example and show the repair that changes the result. |
| Treating The Eight Fallacies of Distributed Computing as vocabulary instead of a tool | The Eight Fallacies of Distributed Computing | The explanation names the concept but cannot decide between two cases. | Write one example, one non-example, and the rule that separates them. |
| Treating Partial Failure: The Single Defining Property as vocabulary instead of a tool | Partial Failure: The Single Defining Property | The explanation names the concept but cannot decide between two cases. | Write one example, one non-example, and the rule that separates them. |

Practice Mistake Checks

Pull any miss from these checks into your mistake log.

Time and Ordering Lab

Source: practice/01-time-and-ordering-lab.md

For each, identify the error:

  1. "Our servers run NTP so timestamp-based last-write-wins is safe."
  2. "I used System.nanoTime() to stamp events and compared them across two services."
  3. "Lamport timestamps L(a) = 3 and L(b) = 3 mean a and b are concurrent."
  4. "Vector clocks give a total order over events."
  5. "We resolve write conflicts by the timestamp with the largest value."
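When probing the statement about vector clocks, a small comparison function can serve as the "smallest example". The sketch below is illustrative only: the dict-based clock representation, the process names `p1` and `p2`, and the function names are assumptions, not part of the lab. It compares two clocks component by component and reports when neither event precedes the other.

```python
from typing import Dict

VectorClock = Dict[str, int]  # hypothetical representation: process id -> counter


def happened_before(a: VectorClock, b: VectorClock) -> bool:
    """True if a -> b: no component of a exceeds b, and at least one is smaller."""
    nodes = set(a) | set(b)
    all_le = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    some_lt = any(a.get(n, 0) < b.get(n, 0) for n in nodes)
    return all_le and some_lt


def relation(a: VectorClock, b: VectorClock) -> str:
    """Classify a pair of events as before, after, equal, or concurrent."""
    if happened_before(a, b):
        return "before"
    if happened_before(b, a):
        return "after"
    nodes = set(a) | set(b)
    if all(a.get(n, 0) == b.get(n, 0) for n in nodes):
        return "equal"
    return "concurrent"  # neither clock dominates the other


# Two events on different processes with no message between them.
event_a = {"p1": 2, "p2": 0}
event_b = {"p1": 0, "p2": 1}
# A later event on p2 that has already received a message carrying event_a's clock.
event_c = {"p1": 2, "p2": 2}

a_vs_b = relation(event_a, event_b)
a_vs_c = relation(event_a, event_c)
```

Here `a_vs_b` comes out as `"concurrent"` while `a_vs_c` is `"before"`, which is a useful pair to keep in the mistake log when deciding whether the order is total or partial.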

Failure Model Workshop

Source: practice/02-failure-model-workshop.md

For each statement, identify the error:

  1. "Since we run a modern cloud, we don't have partial failures anymore."
  2. "TCP told us the connection is broken, so the peer process is down."
  3. "We need Byzantine fault tolerance because our nodes sometimes return weird data."
  4. "Our heartbeat is every 10ms with a 30ms timeout, so we detect failures fast."
  5. "A failure detector that always says 'all alive' is at least safe."
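The heartbeat and "all alive" statements can be stress-tested with a tiny timeout-based detector. This is a sketch under assumed conditions: the arrival traces are invented for illustration, and the function names are hypothetical. It shows one trace where a live but paused peer is suspected, and one where a detector that never suspects anyone misses a real crash.

```python
from typing import List


def missed_deadlines(arrivals_ms: List[int], timeout_ms: int) -> List[int]:
    """Return each deadline (ms) that passed before the next heartbeat arrived."""
    missed = []
    for prev, nxt in zip(arrivals_ms, arrivals_ms[1:]):
        if nxt - prev > timeout_ms:
            missed.append(prev + timeout_ms)
    return missed


def timeout_detector(last_arrival_ms: int, now_ms: int, timeout_ms: int) -> bool:
    """Suspect the peer once more than timeout_ms has elapsed since its last heartbeat."""
    return now_ms - last_arrival_ms > timeout_ms


def always_alive_detector(last_arrival_ms: int, now_ms: int, timeout_ms: int) -> bool:
    """Never suspects anyone, regardless of input."""
    return False


# Assumed trace: the peer is alive throughout but stalls for 50 ms
# (for example a GC pause or a delayed packet) between 20 ms and 70 ms.
paused_peer = [0, 10, 20, 70, 80]
false_alarms = missed_deadlines(paused_peer, timeout_ms=30)
no_alarms = missed_deadlines(paused_peer, timeout_ms=100)

# Assumed trace: the peer crashed after its heartbeat at 20 ms; we check at 10 s.
caught = timeout_detector(20, 10_000, 30)
missed_crash = always_alive_detector(20, 10_000, 30)
```

The short timeout flags the live peer at 50 ms, the long one does not, and the always-alive detector never reports the crashed peer. Which of those outcomes counts as "safe" depends on what the system does with a suspicion, which is the assumption worth naming in the log.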

Consensus Reasoning Clinic

Source: practice/03-consensus-reasoning-clinic.md

For each, identify the error:

  1. "Raft guarantees exactly-once client semantics."
  2. "A 4-node Raft cluster is fine - it can tolerate one failure."
  3. "If a Paxos proposer gets no reply, it should propose a new value with a lower proposal number."
  4. "Under Raft, if the leader crashes, any follower can become the new leader."
  5. "FLP says consensus is unsolvable, so Paxos and Raft are unsound."
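For the cluster-size statement, the majority-quorum arithmetic is short enough to check directly. The sketch below assumes a plain majority quorum with crash failures only. The helper names are hypothetical.

```python
def majority(n: int) -> int:
    """Smallest number of nodes that forms a strict majority of n."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Nodes that can crash while a majority can still be assembled."""
    return n - majority(n)


# Cluster size -> (quorum size, crash failures tolerated)
table = {n: (majority(n), tolerated_failures(n)) for n in range(3, 6)}
```

With these definitions a 3-node and a 4-node cluster both tolerate one crash, while the 4-node cluster needs a larger quorum. That comparison is a compact counterexample to record next to the statement.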

Repair Protocol

For each real mistake:

  1. Reproduce the failure on the smallest example, trace, proof, query, command, or design sketch.
  2. Name the hidden assumption.
  3. Repair the artifact.
  4. Save evidence of what changed: a failing then passing test, corrected proof step, revised diagram, safer command, benchmark, or review note.
  5. Add one retrieval card beginning with "Check ... before ..." or "Do not use ... when ...".
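The protocol above can be sketched on one of the radar traps, timestamp-based last-write-wins. Everything here is a hypothetical illustration: the `Write` type, the skewed clock values, and the per-key `version` counter (which assumes a single coordinator assigns versions) are not prescribed by the module, and the version counter is only one possible repair.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Write:
    value: str
    wall_clock_ms: int  # timestamp taken from the writer's local clock
    version: int        # assumed: per-key counter assigned by one coordinator


def lww_by_wall_clock(writes: List[Write]) -> str:
    """Original artifact: the write with the largest timestamp wins."""
    return max(writes, key=lambda w: w.wall_clock_ms).value


def lww_by_version(writes: List[Write]) -> str:
    """Repaired artifact: the write with the highest version wins."""
    return max(writes, key=lambda w: w.version).value


# Step 1: smallest reproduction. Node A's clock runs fast, so its earlier
# write carries a larger timestamp than node B's later write.
earlier = Write("old", wall_clock_ms=1_000_200, version=1)
later = Write("new", wall_clock_ms=1_000_050, version=2)
writes = [earlier, later]

# Step 2: the hidden assumption is that clocks on different nodes agree.
broken_result = lww_by_wall_clock(writes)   # keeps the earlier write
repaired_result = lww_by_version(writes)    # keeps the later write
```

Saving both results side by side gives the "failing then passing" evidence for step 4. A matching retrieval card might read: "Check whose clock produced a timestamp before comparing it with another node's."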

Mistake Log

| Date | Mistake | Symptom | Root cause | Repair evidence | Retrieval card |
| --- | --- | --- | --- | --- | --- |
| Starter | Pick one radar row above | Explain how it would fail in this module | Name the assumption | Add a counterexample or corrected artifact | Write the card before closing the page |

Completion Standard

  • At least five real mistakes are logged.
  • At least two mistakes include a counterexample or failing test.
  • At least one mistake connects to an older semester skill.
  • At least one correction changes code, a proof, a diagram, a command transcript, a query, or a design decision.