Code Katas
Focused, repeatable exercises that force the material from this module into working code. Complete each kata multiple times until it is automatic.
Format for every kata:
- Time limit: as stated per kata
- Polya comments: in your code, include four comments:
# Understand,# Plan,# Invariant/Recurrence,# Reflect - Tests: write at least three test cases before coding, including one edge case and (where possible) one stress-test comparison against a reference
Most katas assume Linux. On macOS/WSL most work with minor substitutions (dtruss for strace, for example).
Kata 1: Implement a Mini Scheduler in Python
Time limit: 45 minutes
Goal: Implement FCFS, SJF, SRTF, and Round-Robin as pure functions over a workload and produce correct metrics.
Data model:
Job = namedtuple("Job", "pid arrival length")
Event = namedtuple("Event", "t pid") # pid running at time t
Implement:
run_fcfs(jobs) -> list[Event]run_sjf(jobs) -> list[Event](non-preemptive)run_srtf(jobs) -> list[Event](preemptive, 1-unit ticks)run_rr(jobs, q) -> list[Event]metrics(events, jobs) -> dictreturning average turnaround, waiting, response, and throughput
Tests: the two workloads (A, B) from the Scheduling Policy Lab, plus a stress test that asserts sum(job.length) == total CPU time used.
Repeat until: all four policies are correct on both workloads in under 20 minutes from scratch.
Kata 2: Measure Context-Switch Cost
Time limit: 45 minutes
Goal: Measure actual context-switch cost on your machine with a micro-benchmark.
Approach: two processes (or two threads) ping-ponging a single byte over a pipe/eventfd, and measuring wall time across N round trips. Each round trip is approximately two context switches, plus two small syscalls.
// pseudo-code
int fd[2]; pipe(fd);
if (fork() == 0) {
for (i = 0; i < N; i++) { read(fd[0], &b, 1); write(fd[1], &b, 1); }
} else {
clock_gettime(&t0);
for (i = 0; i < N; i++) { write(fd[1], &b, 1); read(fd[0], &b, 1); }
clock_gettime(&t1);
printf("ctx-switch-pair: %.1f ns\n", elapsed_ns(t0, t1) / (double)N);
}
Tasks:
- Run with
N = 1_000_000on the same core (taskset -c 0 ./bench) and across two cores (taskset -c 0,1). - Repeat with threads instead of processes.
- Report cost-per-switch for: (same-core processes, cross-core processes, same-core threads, cross-core threads).
- Explain the ordering of the four numbers in 2-3 sentences each.
Repeat until: you can articulate why cross-core is usually more expensive (cache coherence, migration cost) and why threads beat processes (no full TLB flush needed).
Kata 3: Trace fork via strace
Time limit: 30 minutes
Goal: See the process lifecycle from the kernel's side.
Steps:
- Write a small C program that
fork()s once, the childexecves/bin/ls, the parentwaitpids. - Run
strace -f -e trace=clone,execve,wait4,exit ./progand capture the output. - Annotate the trace: for each syscall, write (a) which process made it, (b) what it returned, (c) what process state changed.
- Repeat with
strace -f -e trace=process ./progand compare.
Answer in writing:
- Why does
straceshowcloneand notfork? (Hint: glibc.) - What is the exit status you see in
wait4's result? - If you remove the
waitpid, what happens to the child and why doespsshow it as<defunct>?
Repeat until: you can read a strace -f of any fork/exec/wait program and narrate what each line is doing.
Kata 4: Compute CFS Virtual Runtime by Hand
Time limit: 25 minutes
Goal: Simulate CFS's vruntime logic on paper and in a spreadsheet.
Setup: three tasks with weights w1 = 1024 (nice 0), w2 = 1024, w3 = 3121 (nice -5). Each runs for 10 ms when dispatched, then another task becomes leftmost.
- Initialize vruntime to 0 for all three.
- Simulate 10 rounds: pick the leftmost task, run it for
10 msof real time, increment its vruntime by10 * (1024 / weight). - Record each task's accumulated real-CPU time and vruntime after each round.
- Confirm that the real-CPU ratio is approximately the weight ratio.
Compare your numbers to the formula: for runnable weights w_i, long-run share is w_i / sum(w_j).
Repeat until: you can explain why this pick-leftmost rule naturally produces weighted fair sharing without explicit time quanta.
Kata 5: Affinity and Migrations
Time limit: 30 minutes
Goal: Observe load balancing and affinity empirically.
Steps:
- Start four CPU-bound loops on a 4-core host; run
mpstat -P ALL 1andperf stat -a -e migrations -- sleep 10. Record migrations and the per-core load. - Pin all four with
taskset -c 0. Re-run both commands. Explain what you expect and what you actually observe. - Pin one worker per core:
taskset -c 0 ... & taskset -c 1 .... Re-run and explain.
Deliverable: a three-paragraph note explaining the measurements and why they match (or do not match) the multi-core model in Concept 13.
Repeat until: you can predict the mpstat output and the migration count before running the command.
Kata 6: Priority Inversion Simulator
Time limit: 30 minutes
Goal: Build a small simulator that exhibits priority inversion and then fixes it with priority inheritance.
Model three tasks L, M, H with priorities 1, 2, 3 and a single mutex m. L and H share the mutex, M does not use it. Simulate on a single CPU with non-preemptive critical sections.
- Construct an arrival pattern where
H's completion is blocked byMbecauseLholdsm. - Compute
H's response time without priority inheritance. - Implement priority inheritance (
L's effective priority jumps to3while holdingm). - Recompute
H's response time.
Repeat until: you can explain the Mars Pathfinder incident to another engineer without notes.
Kata 7: cgroup Throttle Experiment
Time limit: 25 minutes
Goal: Cause and then diagnose a cgroup CPU throttle.
Steps (requires cgroup v2 and root):
- Create a cgroup
test; setcpu.max = "20000 100000"(20% of one core). - Start a CPU-bound worker in it and observe
cat /sys/fs/cgroup/test/cpu.statover time. - Note
nr_throttledandthrottled_usec. - Raise the cap to
"200000 100000"(2 cores). Observe the change.
Deliverable: a one-paragraph note explaining why throttling was happening even though the host CPU had headroom, and which cgroup field proves it.
Repeat until: you can read a cpu.stat file and decide in under 30 seconds whether a workload is being throttled.
Kata 8: Triage Sprint -- "my process is slow"
Time limit: 20 minutes
Goal: Given a vague complaint, pick the right next measurement in under 60 seconds per scenario.
For each scenario, state: (a) the most likely mechanism, (b) the single command you would run next, (c) what you would look for in its output.
- "My pod's p99 latency went up after deployment; CPU usage is at 30%."
- "My batch job is slow;
topshows load average 50 on a 4-core box." - "My server's tail latency doubled when a neighbor started a backup."
- "My multi-threaded program is slower with 16 threads than with 4 on a 16-core box."
- "My realtime loop misses deadlines randomly a few times per minute."
- "My program is fast the first time I run it after boot and slow after that."
Repeat until: you can triage all six in under 20 minutes total with a correct first diagnostic for each.
Completion Standard
- Katas 1, 2, 3 completed at least once with working code or captured output
- Katas 1 and 2 completed three or more times, within time limit, from scratch
- Katas 4 and 6 completed with hand-computed numbers that match the formulas
- Katas 5 and 7 have captured real measurements and written explanations
- Polya comments present in every final code solution
- You can recite the core pattern (invariant / recurrence / exchange / state / triage-tree) for every kata without reference