Distributed Systems Code Katas

Focused, repeatable exercises to build fluency with the core distributed-systems patterns. Complete each kata at least twice, ideally in different languages or against different coordination services.

Kata 1: Simulate Lamport and Vector Clocks on a Small System

Time limit: 20 minutes per run
Goal: concrete intuition for logical time

Setup: implement a simulator in any language. Represent processes as objects with:

a local Lamport counter (integer)
a local vector clock (array of length N)
a method local_event() that increments both
a method send(receiver) that increments both clocks, tags the message with both timestamps, and delivers it (optionally with random delay)
a method receive(msg) that updates both according to the rules
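One possible shape for such a process object is sketched below in Python. The names Message, Process, pid, and n are illustrative assumptions, not part of the kata. As a simplification, send() returns the stamped message and leaves delivery (and any random delay) to the caller rather than taking a receiver argument.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    lamport: int        # sender's Lamport timestamp at send time
    vector: list[int]   # copy of the sender's vector clock at send time


@dataclass
class Process:
    pid: int            # 0-based index of this process in the vector (assumed layout)
    n: int              # total number of processes
    lamport: int = 0
    vector: list[int] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.vector:
            self.vector = [0] * self.n

    def local_event(self) -> None:
        # Both clocks tick on every local event.
        self.lamport += 1
        self.vector[self.pid] += 1

    def send(self) -> Message:
        # A send is itself an event: tick first, then stamp the message.
        self.local_event()
        return Message(self.lamport, list(self.vector))

    def receive(self, msg: Message) -> None:
        # Lamport rule: move past both the local and the received timestamp.
        self.lamport = max(self.lamport, msg.lamport) + 1
        # Vector rule: element-wise max, then tick this process's own slot.
        self.vector = [max(a, b) for a, b in zip(self.vector, msg.vector)]
        self.vector[self.pid] += 1


p1, p2 = Process(0, 3), Process(1, 3)
p1.local_event()
p2.receive(p1.send())
print(p2.lamport, p2.vector)  # prints: 3 [2, 1, 0]
```

Returning the message instead of delivering it directly makes it easy to hold messages in a queue and deliver them out of order, which is what the random interleaving in this kata needs.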

Run this script:

3 processes P1, P2, P3.
Random interleaving: each tick, a random process either does a local event, sends to a random peer, or delivers one queued message.
After 50 ticks, print every event's Lamport timestamp, vector timestamp, and process id.
List every concurrent pair (detected via vector clocks).
Verify: for every a -> b in the trace, L(a) < L(b), and V(a) <= V(b) in every component with V(a) != V(b).
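The concurrency check and the Lamport half of the verification can be sketched as follows. The Event tuple layout and the function names are assumptions made for illustration. This sketch uses the vector-clock order as a stand-in for the true happens-before relation; a stricter verification would derive a -> b from program order and send/receive edges recorded in the trace.

```python
from itertools import combinations

# Assumed event layout: (process id, Lamport timestamp, vector timestamp)
Event = tuple[int, int, list[int]]


def happens_before(va: list[int], vb: list[int]) -> bool:
    """True when va <= vb in every component and the two vectors differ."""
    return all(x <= y for x, y in zip(va, vb)) and va != vb


def concurrent(va: list[int], vb: list[int]) -> bool:
    """Two distinct timestamps are concurrent when neither precedes the other."""
    return va != vb and not happens_before(va, vb) and not happens_before(vb, va)


def concurrent_pairs(events: list[Event]) -> list[tuple[Event, Event]]:
    return [(a, b) for a, b in combinations(events, 2) if concurrent(a[2], b[2])]


def lamport_consistent(events: list[Event]) -> bool:
    """Whenever the vector clocks order a before b, Lamport must agree."""
    return all(
        a[1] < b[1]
        for a in events
        for b in events
        if happens_before(a[2], b[2])
    )


# Hand-built trace: P0 does a local event and a send; P1 does a local event,
# then receives P0's message.
trace: list[Event] = [
    (0, 1, [1, 0, 0]),
    (0, 2, [2, 0, 0]),
    (1, 1, [0, 1, 0]),
    (1, 3, [2, 2, 0]),
]
print(len(concurrent_pairs(trace)), lamport_consistent(trace))  # prints: 2 True
```

Note that the converse does not hold for Lamport clocks: L(a) < L(b) does not imply a -> b, which is why the concurrent pairs are detected with the vector timestamps.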

Repeat until: you can write the update rules without looking them up and your verification never fails.

Kata 2: Implement a Toy Raft Leader Election

Time limit: 45-60 minutes per run
Goal: internalize the election protocol and its invariants

Setup: implement, in any language:

Node objects with id, currentTerm, votedFor, state (Follower/Candidate/Leader), log (may be empty for this kata).
RequestVote(term, candidateId, lastLogIndex, lastLogTerm) RPC.
A randomized election timeout per node (150-300ms).
A heartbeat from the leader every 50ms (empty AppendEntries).

Behavior requirements:

A Follower with an expired election timer becomes a Candidate, increments currentTerm, votes for itself, sends RequestVote to all peers.
On RequestVote receipt: if term < currentTerm, reject. If you haven't voted this term (or already voted for this same candidate) and the candidate's log is at least as up-to-date as yours (compare the last entry's term first, then its index), grant the vote.
On winning a majority, become Leader and start heartbeats.
On seeing a higher term, revert to Follower.
Test: inject a leader crash by stopping its heartbeats. Verify exactly one node becomes the new leader within a bounded time.
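The voting rules above might be sketched as below. This is a minimal, single-threaded sketch with no timers or networking. The method names (observe_term, start_election, handle_request_vote) and the helpers election_timeout_ms and run_election are assumptions for illustration, and the log is modeled as a list holding the term of each entry.

```python
import random
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class State(Enum):
    FOLLOWER = "follower"
    CANDIDATE = "candidate"
    LEADER = "leader"


@dataclass
class Node:
    id: int
    current_term: int = 0
    voted_for: Optional[int] = None
    state: State = State.FOLLOWER
    log: list[int] = field(default_factory=list)  # term of each entry; may stay empty

    def last_log(self) -> tuple[int, int]:
        """(lastLogTerm, lastLogIndex); (0, 0) stands for an empty log."""
        return (self.log[-1], len(self.log)) if self.log else (0, 0)

    def observe_term(self, term: int) -> None:
        # Seeing a higher term always forces a step down to Follower.
        if term > self.current_term:
            self.current_term = term
            self.voted_for = None
            self.state = State.FOLLOWER

    def start_election(self) -> None:
        self.state = State.CANDIDATE
        self.current_term += 1
        self.voted_for = self.id

    def handle_request_vote(self, term: int, candidate_id: int,
                            last_log_index: int, last_log_term: int) -> bool:
        if term < self.current_term:
            return False
        self.observe_term(term)
        can_vote = self.voted_for in (None, candidate_id)
        # Tuple comparison checks the term first, then the index.
        up_to_date = (last_log_term, last_log_index) >= self.last_log()
        if can_vote and up_to_date:
            self.voted_for = candidate_id  # a full node would also reset its timer here
            return True
        return False


def election_timeout_ms() -> float:
    """Randomized timeout in the 150-300ms range used by this kata."""
    return random.uniform(150, 300)


def run_election(nodes: list[Node], candidate: Node) -> bool:
    """Hypothetical driver: the candidate asks every peer and counts votes."""
    candidate.start_election()
    last_term, last_index = candidate.last_log()
    votes = 1 + sum(
        peer.handle_request_vote(candidate.current_term, candidate.id,
                                 last_index, last_term)
        for peer in nodes if peer is not candidate
    )
    if votes > len(nodes) // 2:
        candidate.state = State.LEADER
    return candidate.state is State.LEADER
```

Because voted_for is only cleared when the term advances, each node grants at most one vote per term, which is the property that rules out two leaders in the same term.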

Bonus: simulate a network partition that splits a 5-node cluster 2-3; verify the 3-node side elects a leader and the 2-node side does not.

Repeat until: you can write the state machine from scratch and your test never sees two leaders in the same term.

Kata 3: Design an Idempotent HTTP API

Time limit: 30 minutes per run
Goal: apply the idempotency pattern end-to-end

Setup: sketch (on paper or in code) a POST /payments endpoint with the following behavior:

Accepts Idempotency-Key: <uuid> header.
Body: { "amount": N, "currency": "USD", "card_token": "..." }.
On first call with a given key: process the charge, store (key, request_hash, response) with 24h TTL, return response.
On duplicate call with same key and same request body: return the stored response without reprocessing.
On duplicate call with same key but different body: return 422 Unprocessable Entity (key reuse is a client bug).
On duplicate call where the first is still in flight: either return 409 Conflict or block until the first finishes.
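The four cases above can be sketched as a single handler. This is an in-memory illustration under stated assumptions: the store dict stands in for a shared store with an atomic insert-if-absent, charge is a hypothetical callable wrapping the downstream charge API, and the names Record, body_hash, and post_payment are invented for the example. It takes the 409 option for in-flight duplicates and does not handle a crash between the charge and the final store write.

```python
import hashlib
import json
import time
from dataclasses import dataclass
from typing import Callable, Optional

TTL_SECONDS = 24 * 60 * 60  # 24h retention for idempotency records


@dataclass
class Record:
    request_hash: str
    status: str                 # "in_flight" or "done"
    response: Optional[dict]
    created_at: float


# Stand-in for a shared store; a real one needs an atomic insert-if-absent.
store: dict[str, Record] = {}


def body_hash(body: dict) -> str:
    # Canonical JSON so that key order does not change the hash.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def post_payment(key: str, body: dict, charge: Callable[[dict], dict],
                 now: Optional[float] = None) -> tuple[int, dict]:
    now = time.time() if now is None else now
    h = body_hash(body)

    rec = store.get(key)
    if rec is not None and now - rec.created_at > TTL_SECONDS:
        del store[key]          # expired: treat the key as unseen
        rec = None

    if rec is not None:
        if rec.request_hash != h:
            return 422, {"error": "idempotency key reused with a different body"}
        if rec.status == "in_flight":
            return 409, {"error": "original request still in progress"}
        return 200, rec.response  # replay the stored response, no reprocessing

    # First call with this key: mark it in flight, charge, then record the result.
    store[key] = Record(h, "in_flight", None, now)
    response = charge(body)
    store[key] = Record(h, "done", response, now)
    return 200, response
```

Calling post_payment twice with the same key and body invokes charge once and returns the stored response the second time; changing the body under the same key yields 422.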

Deliverables:

The handler pseudo-code.
The schema of the idempotency store (key, body hash, response, status, created_at).
A failure-mode table: what happens if the store is unavailable? If the downstream charge API returns but our response is lost?
An argument for why this guarantees effectively-once processing despite at-least-once delivery.

Repeat until: you cover at least four failure modes correctly and the handler is under 30 lines.

Kata 4: Analyze a Real Distributed Outage Postmortem

Time limit: 60 minutes per postmortem
Goal: recognize module concepts in the wild

Setup: pick one public postmortem from the list below (or equivalent). Read it once, then write a 1-2 page analysis answering each question.

Completion Standard

Ran Kata 1 at least twice and can reproduce the clock rules from memory.
Ran Kata 2 at least twice and your election never produced two leaders in the same term.
Wrote Kata 3 in under 30 minutes with at least four failure modes covered.
Analyzed at least two postmortems in Kata 4.
You can explain each kata's core technique in one sentence.
