Code Katas

Five katas. Each targets one canonical systems-programming skill. For every kata:

Read the specification.
Predict the syscall sequence (which calls, in which order, with which arguments) before you open an editor.
Implement it. Use -Wall -Wextra -pthread (where relevant) and -g -O0 for the first build.
Verify the behavior with the listed tool (strace, ps, valgrind, gdb, or a direct comparison against the system utility).
Repeat from scratch until you can produce a working version in the time limit without reference.

Kata 1: Mini Shell with One Pipe

Time limit: 60 minutes
Targets: fork, execvp, pipe, dup2, waitpid.

Write a shell msh that:

reads a line from stdin
splits it on whitespace
handles exactly one | (e.g., ls -l | wc -l)
runs each side in a child with stdin/stdout rewired through a single pipe
closes both pipe ends in the parent before waiting
loops until EOF

Verification: echo hello | ./msh with input cat | wc -c should print 6. Run ./msh under strace -f and confirm you see exactly one pipe2 (or pipe), two forks, two execves, four closes in the parent path, and two wait4s.

Stretch: support < file and > file redirection.

Kata 2: `cat` and `wc` From Raw Syscalls

Time limit: 45 minutes combined
Targets: open, read, write, close, EINTR handling, byte loops.

Write mycat and mywc:

mycat [FILE...] -- behave like cat; if no args, read stdin. Byte-exact output against system cat.
mywc [FILE...] -- print lines words bytes FILE like wc, and a total line when given multiple files.

Both must: loop read and write, handle EINTR, use a 4 KB buffer, and close every opened fd.

Verification: compare output with cat and wc on /etc/hostname, /etc/services, and any 1 MB text file (diff must be empty).

Stretch: make mycat support -n (line numbers) using only raw I/O.

Kata 3: Producer-Consumer with Condition Variables

Time limit: 45 minutes
Targets: pthread, pthread_mutex, pthread_cond, bounded buffer.

Write a program where:

2 producer threads each push integers 1..100000 into a shared bounded queue (capacity 16)
4 consumer threads pop and accumulate into a single shared total (protected by the same mutex, or an atomic counter)
at shutdown, each producer pushes a sentinel per consumer
the main thread joins all 6 threads and prints the final total
the final total equals 2 * 100000 * (100000 + 1) / 2 = 10000100000

Annotate your source with a comment on every synchronization call stating which race it prevents.

Verification: run 50 times with ./pc | sort -u | wc -l -- should always be exactly 1 (one unique correct answer).

Kata 4: Tiny HTTP Server

Time limit: 60 minutes
Targets: socket, bind, listen, accept, recv, send, HTTP framing, short-write loops.

Write tinyhttp that:

listens on port 8080 with SO_REUSEADDR
accepts one connection at a time (single-threaded is fine)
reads the HTTP request headers until it sees \r\n\r\n
ignores the method and path and responds with:

HTTP/1.1 200 OK\r\n
Content-Type: text/plain\r\n
Content-Length: 14\r\n
\r\n
hello, world\n

Verification: curl -v http://localhost:8080/ returns 200 OK and hello, world. curl repeated many times never leaks file descriptors (ls /proc/$(pidof tinyhttp)/fd | wc -l is stable).

Stretch: add a fork-per-connection model with a SIGCHLD reaper. Serve two curls in parallel.

Kata 5: `gdb` Hunt on a Planted Bug

Time limit: 30 minutes
Targets: gdb breakpoints, watchpoints, core dumps.

Build and debug this program (name it bughunt.c):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct { int id; char name[16]; int score; } User;

static User *users;
static int count;

static void load(const char *names[], int n) {
    users = calloc(n, sizeof *users);
    count = n;
    for (int i = 0; i < n; i++) {
        users[i].id = i;
        strcpy(users[i].name, names[i]);   /* bug: no length check */
        users[i].score = 0;
    }
}

static void bump(int id, int delta) { users[id].score += delta; }

int main(void) {
    const char *names[] = { "alice", "bob", "this-name-is-definitely-way-too-long", "dave" };
    load(names, 4);
    for (int i = 0; i < 4; i++) bump(i, 10);
    printf("%s has score %d\n", users[2].name, users[2].score);
    return 0;
}

Tasks:

Build with -g -O0, run, observe either a crash or wrong score for a later user.
Enable core dumps and open the core in gdb. Produce a backtrace.
Set a watchpoint on users[3].score at the start of main and run until it changes. Record the instruction address and the caller.
Explain which write corrupted what, in one sentence.
Fix load by using strncpy with sizeof users[i].name - 1 and an explicit null terminator. Rebuild. Confirm the watchpoint no longer triggers unexpectedly.

Repeat until: you can enter gdb, set a breakpoint, run, print locals, and produce a backtrace without looking anything up.

How to Score Yourself

Kata 1: complete if ./msh runs ls | wc -l and cat | wc -c correctly and strace -f shows no leaked fds after a pipeline finishes.
Kata 2: complete if diff against system cat/wc is empty on three distinct inputs.
Kata 3: complete if the program produces the same correct total across 50 runs with no helgrind warnings (valgrind --tool=helgrind ./pc).
Kata 4: complete if curl reports 200 OK and tinyhttp survives 1,000 sequential requests without leaking fds.
Kata 5: complete if you can reproduce, locate, and fix the overflow in under 30 minutes without asking anyone.

Kata 1: Mini Shell with One Pipe​

Kata 2: cat and wc From Raw Syscalls​

Kata 3: Producer-Consumer with Condition Variables​

Kata 4: Tiny HTTP Server​

Kata 5: gdb Hunt on a Planted Bug​

How to Score Yourself​