Skip to main content

Code Katas

Focused, repeatable exercises that force the material from this module into working code. Complete each kata multiple times until it is automatic.

Format for every kata:

  • Time limit: as stated per kata
  • Polya comments: in your code, include four comments: # Understand, # Plan, # Invariant/Recurrence, # Reflect
  • Tests: write at least three test cases before coding, including one edge case and (where possible) one stress-test comparison against a reference

Most katas assume Linux. On macOS/WSL most work with minor substitutions (dtruss for strace, for example).

Kata 1: Implement a Mini Scheduler in Python

Time limit: 45 minutes
Goal: Implement FCFS, SJF, SRTF, and Round-Robin as pure functions over a workload and produce correct metrics.

Data model:

Job = namedtuple("Job", "pid arrival length")
Event = namedtuple("Event", "t pid") # pid running at time t

Implement:

  1. run_fcfs(jobs) -> list[Event]
  2. run_sjf(jobs) -> list[Event] (non-preemptive)
  3. run_srtf(jobs) -> list[Event] (preemptive, 1-unit ticks)
  4. run_rr(jobs, q) -> list[Event]
  5. metrics(events, jobs) -> dict returning average turnaround, waiting, response, and throughput

Tests: the two workloads (A, B) from the Scheduling Policy Lab, plus a stress test that asserts sum(job.length) == total CPU time used.

Repeat until: all four policies are correct on both workloads in under 20 minutes from scratch.

Kata 2: Measure Context-Switch Cost

Time limit: 45 minutes
Goal: Measure actual context-switch cost on your machine with a micro-benchmark.

Approach: two processes (or two threads) ping-ponging a single byte over a pipe/eventfd, and measuring wall time across N round trips. Each round trip is approximately two context switches, plus two small syscalls.

// pseudo-code
int fd[2]; pipe(fd);
if (fork() == 0) {
for (i = 0; i < N; i++) { read(fd[0], &b, 1); write(fd[1], &b, 1); }
} else {
clock_gettime(&t0);
for (i = 0; i < N; i++) { write(fd[1], &b, 1); read(fd[0], &b, 1); }
clock_gettime(&t1);
printf("ctx-switch-pair: %.1f ns\n", elapsed_ns(t0, t1) / (double)N);
}

Tasks:

  1. Run with N = 1_000_000 on the same core (taskset -c 0 ./bench) and across two cores (taskset -c 0,1).
  2. Repeat with threads instead of processes.
  3. Report cost-per-switch for: (same-core processes, cross-core processes, same-core threads, cross-core threads).
  4. Explain the ordering of the four numbers in 2-3 sentences each.

Repeat until: you can articulate why cross-core is usually more expensive (cache coherence, migration cost) and why threads beat processes (no full TLB flush needed).

Kata 3: Trace fork via strace

Time limit: 30 minutes
Goal: See the process lifecycle from the kernel's side.

Steps:

  1. Write a small C program that fork()s once, the child execves /bin/ls, the parent waitpids.
  2. Run strace -f -e trace=clone,execve,wait4,exit ./prog and capture the output.
  3. Annotate the trace: for each syscall, write (a) which process made it, (b) what it returned, (c) what process state changed.
  4. Repeat with strace -f -e trace=process ./prog and compare.

Answer in writing:

  • Why does strace show clone and not fork? (Hint: glibc.)
  • What is the exit status you see in wait4's result?
  • If you remove the waitpid, what happens to the child and why does ps show it as <defunct>?

Repeat until: you can read a strace -f of any fork/exec/wait program and narrate what each line is doing.

Kata 4: Compute CFS Virtual Runtime by Hand

Time limit: 25 minutes
Goal: Simulate CFS's vruntime logic on paper and in a spreadsheet.

Setup: three tasks with weights w1 = 1024 (nice 0), w2 = 1024, w3 = 3121 (nice -5). Each runs for 10 ms when dispatched, then another task becomes leftmost.

  1. Initialize vruntime to 0 for all three.
  2. Simulate 10 rounds: pick the leftmost task, run it for 10 ms of real time, increment its vruntime by 10 * (1024 / weight).
  3. Record each task's accumulated real-CPU time and vruntime after each round.
  4. Confirm that the real-CPU ratio is approximately the weight ratio.

Compare your numbers to the formula: for runnable weights w_i, long-run share is w_i / sum(w_j).

Repeat until: you can explain why this pick-leftmost rule naturally produces weighted fair sharing without explicit time quanta.

Kata 5: Affinity and Migrations

Time limit: 30 minutes
Goal: Observe load balancing and affinity empirically.

Steps:

  1. Start four CPU-bound loops on a 4-core host; run mpstat -P ALL 1 and perf stat -a -e migrations -- sleep 10. Record migrations and the per-core load.
  2. Pin all four with taskset -c 0. Re-run both commands. Explain what you expect and what you actually observe.
  3. Pin one worker per core: taskset -c 0 ... & taskset -c 1 .... Re-run and explain.

Deliverable: a three-paragraph note explaining the measurements and why they match (or do not match) the multi-core model in Concept 13.

Repeat until: you can predict the mpstat output and the migration count before running the command.

Kata 6: Priority Inversion Simulator

Time limit: 30 minutes
Goal: Build a small simulator that exhibits priority inversion and then fixes it with priority inheritance.

Model three tasks L, M, H with priorities 1, 2, 3 and a single mutex m. L and H share the mutex, M does not use it. Simulate on a single CPU with non-preemptive critical sections.

  1. Construct an arrival pattern where H's completion is blocked by M because L holds m.
  2. Compute H's response time without priority inheritance.
  3. Implement priority inheritance (L's effective priority jumps to 3 while holding m).
  4. Recompute H's response time.

Repeat until: you can explain the Mars Pathfinder incident to another engineer without notes.

Kata 7: cgroup Throttle Experiment

Time limit: 25 minutes
Goal: Cause and then diagnose a cgroup CPU throttle.

Steps (requires cgroup v2 and root):

  1. Create a cgroup test; set cpu.max = "20000 100000" (20% of one core).
  2. Start a CPU-bound worker in it and observe cat /sys/fs/cgroup/test/cpu.stat over time.
  3. Note nr_throttled and throttled_usec.
  4. Raise the cap to "200000 100000" (2 cores). Observe the change.

Deliverable: a one-paragraph note explaining why throttling was happening even though the host CPU had headroom, and which cgroup field proves it.

Repeat until: you can read a cpu.stat file and decide in under 30 seconds whether a workload is being throttled.

Kata 8: Triage Sprint -- "my process is slow"

Time limit: 20 minutes
Goal: Given a vague complaint, pick the right next measurement in under 60 seconds per scenario.

For each scenario, state: (a) the most likely mechanism, (b) the single command you would run next, (c) what you would look for in its output.

  1. "My pod's p99 latency went up after deployment; CPU usage is at 30%."
  2. "My batch job is slow; top shows load average 50 on a 4-core box."
  3. "My server's tail latency doubled when a neighbor started a backup."
  4. "My multi-threaded program is slower with 16 threads than with 4 on a 16-core box."
  5. "My realtime loop misses deadlines randomly a few times per minute."
  6. "My program is fast the first time I run it after boot and slow after that."

Repeat until: you can triage all six in under 20 minutes total with a correct first diagnostic for each.

Completion Standard

  • Katas 1, 2, 3 completed at least once with working code or captured output
  • Katas 1 and 2 completed three or more times, within time limit, from scratch
  • Katas 4 and 6 completed with hand-computed numbers that match the formulas
  • Katas 5 and 7 have captured real measurements and written explanations
  • Polya comments present in every final code solution
  • You can recite the core pattern (invariant / recurrence / exchange / state / triage-tree) for every kata without reference