Build Your Own Shell
"A shell is just a `while(1) { read; parse; fork; exec; wait; }` loop." - every Unix systems professor
A shell is the canonical "tiny but real" systems project. Three hundred lines of C will give you a working interactive shell with pipes, redirection, builtins, and job control basics. Every line of it teaches one of the core Unix system calls.
1. Overview & motivation
A shell does three things:
- Reads a command line.
- Parses it into commands, arguments, redirections, pipes.
- Executes by spawning processes with `fork`/`exec`/`wait`, wiring up file descriptors with `pipe`/`dup2`.
What you can only learn by building one:
- Why `fork` is conceptually weird (one call returns twice: once in the parent, once in the child).
- Why pipes are unidirectional and why `dup2` is the right tool to wire them.
- Why `wait`/`waitpid` and zombie processes exist.
- Why signal handling in shells is harder than it looks (Ctrl-C should kill the child, not the shell).
- Why command parsing is a tiny language with its own grammar.
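The first and third points can be seen in a few lines. The sketch below is illustrative only: `run_and_wait` is a hypothetical helper name, and the convention of returning 128 plus the signal number for a killed child is borrowed from common shell practice rather than taken from this tutorial.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs argv[0] with arguments argv. Returns the child's
   exit status, 128 + signal number if a signal killed it, or -1 on error. */
int run_and_wait(char *const argv[]) {
    pid_t pid = fork();                 /* one call, two returns */
    if (pid < 0) { perror("fork"); return -1; }
    if (pid == 0) {
        /* Child: fork returned 0. Replace this process image. */
        execvp(argv[0], argv);
        perror(argv[0]);                /* reached only if exec failed */
        _exit(127);
    }
    /* Parent: fork returned the child's pid. Waiting collects its status,
       so the child does not linger as a zombie. */
    int status;
    if (waitpid(pid, &status, 0) < 0) { perror("waitpid"); return -1; }
    if (WIFEXITED(status)) return WEXITSTATUS(status);
    if (WIFSIGNALED(status)) return 128 + WTERMSIG(status);
    return -1;
}
```

Calling it with `{"sh", "-c", "exit 3", NULL}` should return 3; a command that cannot be found takes the `_exit(127)` path in the child.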
2. Where this fits in the degree
- Phase: Systems
- Semester: 4 (Systems Programming)
- Modules deepened: Module 1 (C fundamentals) for string handling and struct layout; Module 4 (systems-level programming) for the entire point: `fork`, `exec`, `pipe`, `dup2`, `wait`, signals.
Cross-phase relevance:
- Background for the Docker / container tutorial — containers extend the same primitives with namespaces.
- Parsing skeleton transfers directly to the Interpreter project.
3. Prerequisites
- C: strings, pointers, structs. Comfort with `argv`-style arrays.
- Basic Unix: knowing what `ls | grep foo > out.txt` does conceptually.
4. Theory & research
Required reading
- Stephen Brennan, "Tutorial - Write a Shell in C" (brennan.io/2015/01/16/write-a-shell-in-c/) — the canonical 250-line tutorial. ⭐ start here.
- Eli Bendersky, "Let's Build a Shell" (eli.thegreenplace.net) — another excellent walkthrough.
Recommended
- Stevens & Rago, Advanced Programming in the UNIX Environment — Chapters 8 (process control), 14 (advanced I/O), 15 (interprocess communication). The Unix systems-programming bible.
- Kerrisk, The Linux Programming Interface, Chapters 24 (process creation), 27 (program execution with `execve`), 44 (pipes and FIFOs). Even better than APUE for modern Linux.
- Bash source code: `parse.y` is illustrative. Don't read it all; sample it after writing your own.
For parsing specifically
- A recursive descent parser is enough. No need for lex/yacc. Most BYO-X shell tutorials parse with hand-written code.
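To give a sense of how small hand-written parsing can be, here is a sketch for just the pipeline level of the grammar, roughly `pipeline := command ('|' command)*`. It is a single pass over already-tokenized input rather than a full recursive descent parser, and `split_pipeline` is a hypothetical name, not something from the referenced tutorials.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper. tokens is a NULL-terminated array produced by a
   whitespace tokenizer. Each "|" token is overwritten with NULL so that every
   stage becomes its own NULL-terminated argv; commands[i] points at stage i.
   Returns the number of stages, or -1 for an empty stage (including an empty
   line) or more than max_commands stages. */
int split_pipeline(char **tokens, char ***commands, int max_commands) {
    int n = 0;
    char **start = tokens;              /* first token of the current stage */
    for (char **t = tokens; ; t++) {
        int at_end = (*t == NULL);
        if (at_end || strcmp(*t, "|") == 0) {
            if (t == start) return -1;          /* e.g. "| ls" or "ls |" */
            if (n >= max_commands) return -1;
            commands[n++] = start;
            if (at_end) break;
            *t = NULL;                          /* terminate this stage's argv */
            start = t + 1;
        }
    }
    return n;
}
```

The output shape (`char ***commands` plus a count) is chosen so it can be handed to a function that takes an array of argv arrays. Quoting, redirection operators, and `&` would each add a case to the loop or justify moving to a real recursive descent structure.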
5. Curated tutorial list (from BYO-X)
- C: Tutorial - Write a Shell in C — Stephen Brennan ⭐ recommended primary
- C: Let's build a shell! — Indradhanush Gupta
- C: Writing a UNIX Shell — Sotirios Mantziaris and others
- C: Build Your Own Shell — App Codecrafters
- C: Write a shell in C
- Go: Writing a simple shell in Go — Adam Presley
- Rust: Build Your Own Shell using Rust — Joseph Lenton
6. Recommended primary path
Stephen Brennan's "Write a Shell in C" is the tutorial. ~250 lines. Walks through:
- The REPL loop.
- Tokenizing input.
- Executing one command with `fork`/`exec`.
- Built-ins (`cd`, `exit`, `help`).
Read it once. Then implement your own version without copying. Then extend with pipes, redirection, and job control.
For a more challenging path with paid feedback: CodeCrafters' "Build Your Own Shell" course goes deeper into POSIX compliance.
7. Implementation milestones
Milestone 1: REPL with one command, no arguments
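```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    char line[1024];
    while (1) {
        printf("$ ");
        fflush(stdout);                      /* the prompt has no newline, so flush it */
        if (!fgets(line, sizeof line, stdin)) break;   /* EOF (Ctrl-D) */
        line[strcspn(line, "\n")] = 0;       /* strip the trailing newline */
        if (line[0] == '\0') continue;       /* empty input: just prompt again */
        if (strcmp(line, "exit") == 0) break;
        pid_t pid = fork();
        if (pid < 0) { perror("fork"); continue; }
        if (pid == 0) {
            /* Child: replace this process with the command. */
            execlp(line, line, (char *)NULL);
            perror("exec");                  /* reached only if exec failed */
            _exit(1);
        }
        wait(NULL);                          /* parent: block until the child exits */
    }
    return 0;
}
```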
Evidence: Run ls, pwd, whoami, exit. Demonstrate that pressing Ctrl-D exits cleanly.
Milestone 2: Tokenizer and arguments
Split input on whitespace. Build a char *argv[] and pass to execvp.
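```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Splits line in place on whitespace. The returned array is NULL-terminated
   and its entries point into line, so the caller frees only the array. */
char **tokenize(char *line) {
    int bufsize = 64, position = 0;
    char **tokens = malloc(bufsize * sizeof(char *));
    if (!tokens) { perror("malloc"); exit(1); }
    char *token = strtok(line, " \t\r\n");
    while (token) {
        tokens[position++] = token;
        if (position >= bufsize) {           /* grow, keeping room for the NULL */
            bufsize *= 2;
            char **grown = realloc(tokens, bufsize * sizeof(char *));
            if (!grown) { perror("realloc"); free(tokens); exit(1); }
            tokens = grown;
        }
        token = strtok(NULL, " \t\r\n");
    }
    tokens[position] = NULL;                 /* execvp expects a NULL-terminated argv */
    return tokens;
}
```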
Evidence: Run ls -la /tmp, echo hello world, cat /etc/passwd | head.
(The last one won't work yet — pipes are Milestone 4.)
Milestone 3: Built-ins
cd, exit, pwd, help. Built-ins must be handled in the parent process — cd in a child has no effect on the parent.
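```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int builtin_cd(char **args) {
    const char *dir = args[1] ? args[1] : getenv("HOME");   /* bare cd goes home */
    if (!dir) fprintf(stderr, "cd: HOME not set\n");
    else if (chdir(dir) != 0) perror("cd");
    return 1;                                /* 1 tells the REPL to keep running */
}
```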
Evidence: cd /tmp; pwd — must print /tmp. If cd were not a built-in, this would print the original directory.
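One common way to keep built-ins in the parent is a small lookup table consulted before any `fork`. The sketch below is an assumption about how that could look, not the tutorial's code: `try_builtin`, `do_cd`, and `do_pwd` are hypothetical names, and it returns 0 on success and -1 for "not a built-in", which differs from a return-1-to-continue convention.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static int do_cd(char **args) {
    const char *dir = args[1] ? args[1] : getenv("HOME");
    if (!dir) { fprintf(stderr, "cd: HOME not set\n"); return 1; }
    if (chdir(dir) != 0) { perror("cd"); return 1; }
    return 0;
}

static int do_pwd(char **args) {
    (void)args;
    char buf[4096];
    if (!getcwd(buf, sizeof buf)) { perror("pwd"); return 1; }
    puts(buf);
    return 0;
}

struct builtin { const char *name; int (*fn)(char **args); };
static const struct builtin builtins[] = { {"cd", do_cd}, {"pwd", do_pwd} };

/* Runs args[0] in the current process if it is a built-in and returns its
   status. Returns -1 otherwise, so the caller knows to fork and exec. */
int try_builtin(char **args) {
    for (size_t i = 0; i < sizeof builtins / sizeof builtins[0]; i++)
        if (strcmp(args[0], builtins[i].name) == 0)
            return builtins[i].fn(args);
    return -1;
}
```

The REPL would call `try_builtin(argv)` first and only fork when it returns -1. Adding `exit` or `help` is one more table row.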
Milestone 4: Pipes
The big one. cat foo | grep bar | wc -l needs:
- Parse into N commands.
- Create N-1 pipes.
- `fork` for each command.
- In each child, `dup2` the right ends of the pipes to stdin/stdout.
- Close every fd the child doesn't need.
- `exec`.
- Parent `wait`s for all children.
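```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* commands[i] is the NULL-terminated argv of stage i; n is the number of stages. */
int execute_pipeline(char ***commands, int n) {
    if (n < 1) return 1;
    int npipes = n - 1, started = 0;
    int pipes[2 * npipes + 1];        /* +1 avoids a zero-length array when n == 1 */
    for (int i = 0; i < npipes; i++) {
        if (pipe(pipes + 2 * i) == 0) continue;
        perror("pipe");
        for (int j = 0; j < 2 * i; j++) close(pipes[j]);
        return 1;
    }
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0) { perror("fork"); break; }
        if (pid == 0) {
            if (i > 0) dup2(pipes[2 * (i - 1)], 0);     /* stdin: read end of the previous pipe */
            if (i < n - 1) dup2(pipes[2 * i + 1], 1);   /* stdout: write end of the next pipe */
            for (int j = 0; j < 2 * npipes; j++) close(pipes[j]);
            execvp(commands[i][0], commands[i]);
            perror("exec"); _exit(1);
        }
        started++;
    }
    /* Parent: close every pipe fd so readers see EOF, then reap the children. */
    for (int j = 0; j < 2 * npipes; j++) close(pipes[j]);
    for (int i = 0; i < started; i++) wait(NULL);
    return 1;
}
```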
The single biggest source of bugs: close every fd you do not need. Forgetting one leaves pipes open and your shell hangs forever.
Evidence: ls /etc | grep conf | wc -l produces the same output as in bash.
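The close-every-fd rule is easiest to see with a single pipe. In the sketch below (hypothetical helper `capture_output`, not part of the tutorial), the parent reads a child's stdout until EOF. A reader only sees EOF once every copy of the write end is closed, so the parent's own `close(fds[1])` is what lets the loop terminate.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs argv and copies up to size - 1 bytes of its
   stdout into buf (NUL-terminated). Returns the byte count, or -1 on error. */
ssize_t capture_output(char *const argv[], char *buf, size_t size) {
    int fds[2];                       /* fds[0] = read end, fds[1] = write end */
    if (size == 0 || pipe(fds) < 0) return -1;
    pid_t pid = fork();
    if (pid < 0) { close(fds[0]); close(fds[1]); return -1; }
    if (pid == 0) {
        dup2(fds[1], 1);              /* child's stdout now feeds the pipe */
        close(fds[0]);
        close(fds[1]);                /* the duplicate on fd 1 stays open */
        execvp(argv[0], argv);
        _exit(127);
    }
    close(fds[1]);                    /* omit this and read() never returns 0 */
    size_t total = 0;
    ssize_t n;
    while (total < size - 1 &&
           (n = read(fds[0], buf + total, size - 1 - total)) > 0)
        total += (size_t)n;
    buf[total] = '\0';
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return (ssize_t)total;
}
```

Commenting out the parent's `close(fds[1])` and running it with `echo hello` is a quick way to reproduce the hang on purpose. The sketch does not retry on `EINTR`, which a real shell would want to handle.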
Milestone 5: Redirection
>, >>, <, 2>. Open the file with the appropriate flags and dup2 over the right fd.
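```c
int fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC, 0644);   /* needs <fcntl.h> */
if (fd < 0) { perror("open"); _exit(1); }   /* runs in the child, before exec */
dup2(fd, 1); close(fd);                     /* stdout now refers to the file */
```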
Evidence: echo hello > out; cat out prints hello.
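The four operators differ only in the `open` flags and in which fd gets replaced, so a table-like mapping keeps the code short. This is a sketch under assumptions: `redirect_spec` and `apply_redirect` are hypothetical names, and the 0644 mode simply mirrors the snippet above.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Maps a redirection operator to open() flags and the fd it replaces.
   Returns 0 on success, -1 for an unknown operator. */
int redirect_spec(const char *op, int *flags, int *target_fd) {
    if (strcmp(op, ">") == 0)  { *flags = O_WRONLY | O_CREAT | O_TRUNC;  *target_fd = 1; return 0; }
    if (strcmp(op, ">>") == 0) { *flags = O_WRONLY | O_CREAT | O_APPEND; *target_fd = 1; return 0; }
    if (strcmp(op, "<") == 0)  { *flags = O_RDONLY;                      *target_fd = 0; return 0; }
    if (strcmp(op, "2>") == 0) { *flags = O_WRONLY | O_CREAT | O_TRUNC;  *target_fd = 2; return 0; }
    return -1;
}

/* Intended to run in the child, after fork and before exec. Returns 0 or -1. */
int apply_redirect(const char *op, const char *path) {
    int flags, target_fd;
    if (redirect_spec(op, &flags, &target_fd) < 0) return -1;
    int fd = open(path, flags, 0644);   /* mode only matters when O_CREAT is set */
    if (fd < 0) { perror(path); return -1; }
    if (fd != target_fd) {              /* open may already have returned the target */
        if (dup2(fd, target_fd) < 0) { perror("dup2"); close(fd); return -1; }
        close(fd);
    }
    return 0;
}
```

Keeping the mapping separate from the side effects makes the flag choices easy to test without touching the filesystem.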
Milestone 6: Job control & signals
Foreground vs background (`&`). Handle `SIGINT` so Ctrl-C kills the foreground child, not the shell. Reap background processes with a `SIGCHLD` handler or non-blocking `waitpid`.
This is hard to get fully right. A subset that handles foreground Ctrl-C correctly is enough for the tutorial.
Evidence: Run sleep 30 &, then sleep 30, then Ctrl-C. The foreground sleep dies. The background sleep keeps running. jobs (if implemented) lists it.
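A minimal subset, without process groups, might look like the sketch below. It assumes the simplest possible design: the shell ignores `SIGINT`, foreground children get the default action back before `exec`, and background children keep the ignored disposition (ignored signals stay ignored across `exec`). All function names are hypothetical, and full job control with `setpgid` and `tcsetpgrp` is deliberately left out.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Call once at startup: Ctrl-C must not kill the shell itself. */
void shell_ignore_sigint(void) {
    signal(SIGINT, SIG_IGN);
}

/* Hypothetical helper: starts argv and returns the child's pid, or -1. */
pid_t spawn_job(char *const argv[], int background) {
    pid_t pid = fork();
    if (pid < 0) { perror("fork"); return -1; }
    if (pid == 0) {
        /* Foreground children die on Ctrl-C again; background children
           inherit SIG_IGN from the shell and so survive it. */
        if (!background) signal(SIGINT, SIG_DFL);
        execvp(argv[0], argv);
        perror(argv[0]);
        _exit(127);
    }
    return pid;
}

/* Foreground: block until this child finishes and report how it ended. */
int wait_job(pid_t pid) {
    int status;
    if (waitpid(pid, &status, 0) < 0) return -1;
    if (WIFSIGNALED(status)) return 128 + WTERMSIG(status);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Background: call before each prompt. Reaps every child that has already
   exited without blocking, and returns how many were reaped. */
int reap_background(void) {
    int reaped = 0;
    while (waitpid(-1, NULL, WNOHANG) > 0) reaped++;
    return reaped;
}
```

Polling with `reap_background` before each prompt avoids writing an async-signal-safe `SIGCHLD` handler, at the cost of zombies lingering until the next prompt.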
Milestone 7 (optional): Globbing, variables, history
* and ? expansion. $HOME, $PATH. Up-arrow history with readline library.
This is where bash starts diverging from POSIX. Stop here unless you genuinely want a usable interactive shell.
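For `*` and `?`, the POSIX `glob` function does the matching, so the shell only has to call it per word. The sketch below is a thin wrapper with a hypothetical name, `expand_glob`. It uses `GLOB_NOCHECK` on the assumption that an unmatched pattern should be passed through unchanged, as POSIX shells do.

```c
#include <glob.h>
#include <stdio.h>

/* Expands one word into out->gl_pathv (out->gl_pathc entries). With
   GLOB_NOCHECK, a pattern that matches nothing comes back as itself.
   Returns 0 on success (caller must globfree(out)), -1 on failure. */
int expand_glob(const char *word, glob_t *out) {
    int rc = glob(word, GLOB_NOCHECK, NULL, out);
    if (rc != 0) {
        fprintf(stderr, "glob: could not expand %s\n", word);
        return -1;
    }
    return 0;
}
```

A shell would run this over each token after tokenizing and splice the results into `argv`. Passing `GLOB_APPEND` on the second and later calls lets one `glob_t` accumulate the whole argument list.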
8. Tests & evidence
| Test | How |
|---|---|
| Single command | ls, echo foo, whoami |
| Args | ls -la, echo hello world |
| Built-ins | cd /tmp; pwd correctly changes directory |
| Pipes | `ls |
| Long pipes | 5-stage pipe runs correctly |
| Redirection | echo x > /tmp/y; cat /tmp/y |
| Append | echo a > f; echo b >> f; cat f produces a\nb |
| Stdin redirection | wc -l < /etc/passwd |
| Stderr redirection | ls /nonexistent 2> err; cat err |
| Empty input | Pressing Enter on empty line does nothing |
| EOF | Ctrl-D exits cleanly |
| Background | sleep 10 & doesn't block; child reaped on completion |
| No zombie processes | After running 100 commands, ps shows no zombies for your shell |
9. Common pitfalls
- Forgetting to close unused pipe fds. Causes hangs. The rule: in each child, once `dup2` has copied the ends you need onto stdin/stdout, close every original pipe fd, including the ones you just duplicated. In the parent, close all pipe fds before `wait`ing.
- Using `exit` instead of `_exit` on exec failure. If `execvp` returns, it failed; you must `_exit`, not `exit` (which would run atexit handlers in the child).
- Built-ins in subshells. `cd` only works in the parent. If you fork before checking for built-ins, the `cd` does nothing.
- Forgetting to wait. Without `wait`, you leave zombie children.
- Ignoring signals. Without handling `SIGINT`, Ctrl-C kills your shell along with the running command, because the terminal sends the signal to the whole foreground process group.
- Naive tokenizer. `echo "hello world"` should pass one argument to `echo`, not two. The simple tutorials skip quoting; if you add it, write a proper state-machine tokenizer.
- Reading from `stdin` after a child closed `stdin`. If the foreground child has stdin redirected, your shell's stdin can get into a strange state. This happens when the `dup2` runs in the shell process; apply redirections in the child, after `fork`.
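For the tokenizer pitfall, a state machine with three states (outside quotes, inside single quotes, inside double quotes) is usually enough. The sketch below is one possible shape, not the tutorial's code: `tokenize_quoted` is a hypothetical name, it rewrites the line in place, and its escaping rules are simplified relative to POSIX (inside double quotes only `\"` and `\\` are treated as escapes).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits line in place into argv, holding at most
   max - 1 words plus a terminating NULL. Returns the word count, or -1 on an
   unterminated quote or too many words. */
int tokenize_quoted(char *line, char **argv, int max) {
    enum { NORMAL, IN_SINGLE, IN_DOUBLE } state = NORMAL;
    int argc = 0, in_word = 0;
    char *out = line;                 /* write cursor; never passes the reader */

    for (char *in = line; *in; in++) {
        char c = *in;
        if (state == NORMAL && strchr(" \t\r\n", c)) {
            if (in_word) { *out++ = '\0'; in_word = 0; }   /* end of word */
            continue;
        }
        if (!in_word) {               /* first character of a new word */
            if (argc >= max - 1) return -1;
            argv[argc++] = out;
            in_word = 1;
        }
        if (state == NORMAL) {
            if (c == '\'') state = IN_SINGLE;
            else if (c == '"') state = IN_DOUBLE;
            else if (c == '\\' && in[1]) *out++ = *++in;   /* escaped char */
            else *out++ = c;
        } else if (state == IN_SINGLE) {
            if (c == '\'') state = NORMAL;
            else *out++ = c;          /* everything is literal in '...' */
        } else {                      /* IN_DOUBLE */
            if (c == '"') state = NORMAL;
            else if (c == '\\' && (in[1] == '"' || in[1] == '\\')) *out++ = *++in;
            else *out++ = c;
        }
    }
    if (state != NORMAL) return -1;   /* unterminated quote */
    if (in_word) *out = '\0';
    argv[argc] = NULL;
    return argc;
}
```

Because quotes are removed while copying, adjacent pieces such as `foo"bar baz"` join into a single word, which matches how shells treat them.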
10. Extensions
- Quoting (`"..."`, `'...'`) and escaping (`\`). Tokenizer becomes a small state machine.
- Environment variables with `$VAR` expansion.
- Aliases (`alias ll='ls -la'`).
- Tab completion via the `readline` library.
- Scripting: read commands from a file (`sh foo.sh`).
- Subshells with `$(...)` substitution.
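As an illustration of the `$VAR` extension, the sketch below expands plain `$NAME` references using `getenv`. It is intentionally limited and rests on stated assumptions: `expand_vars` is a hypothetical name, unset variables expand to the empty string, and `${NAME}`, quoting interactions, and names longer than 127 characters are not handled.

```c
#define _POSIX_C_SOURCE 200809L   /* assumption: exposes POSIX environment helpers such as setenv under strict -std modes */
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Copies in to out, replacing each $NAME with getenv("NAME"), or with
   nothing if NAME is unset. A '$' not followed by a letter or underscore is
   copied literally. Returns 0 on success, -1 if out is too small. */
int expand_vars(const char *in, char *out, size_t outsz) {
    size_t o = 0;
    if (outsz == 0) return -1;
    while (*in) {
        if (*in == '$' && (isalpha((unsigned char)in[1]) || in[1] == '_')) {
            char name[128];
            size_t n = 0;
            in++;                                   /* skip the '$' */
            while ((isalnum((unsigned char)*in) || *in == '_') && n < sizeof name - 1)
                name[n++] = *in++;
            name[n] = '\0';
            const char *val = getenv(name);
            if (!val) val = "";
            size_t len = strlen(val);
            if (o + len >= outsz) return -1;
            memcpy(out + o, val, len);
            o += len;
        } else {
            if (o + 1 >= outsz) return -1;
            out[o++] = *in++;
        }
    }
    out[o] = '\0';
    return 0;
}
```

In a shell this would run on each word after tokenizing and before globbing, which is roughly the order POSIX shells use.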
A complete shell is a small operating system, which is why bash, zsh, fish are all serious software. Know where to stop.
11. Module integration
| Module | What the shell deepens |
|---|---|
| Sem 4 Module 1 — C fundamentals | String parsing, struct design, memory ownership of token arrays. |
| Sem 4 Module 4 — Systems-level programming | The entire point: fork, exec, pipe, dup2, wait. |
| Sem 5 Module 1 — Processes & scheduling | Direct application — you see process states change. |
| Sem 5 Module 3 — Concurrency | Pipes are concurrent producers/consumers across processes. |
| Docker tutorial | A container is "fork + namespaces + chroot + exec". The shell is the foundation. |
| Interpreter tutorial | Your tokenizer/parser code transfers directly. |
12. Portfolio framing
What to publish:
- Clean C source with `src/`, `tests/`, `Makefile`, and a `README.md` showing usage examples.
- The list of tests above, with output transcripts.
- A "what this does not do" section: job control beyond background, scripting, quoting.
Reviewer entry points:
- `src/main.c`: the REPL loop.
- `src/execute.c`: pipeline and redirection.
- `src/builtins.c`: `cd`, `exit`, etc.
- README must include a section "How is this different from bash?" (required honesty).
Source
This tutorial draws from the BYO-X catalog "Shell" section. Brennan's tutorial is the canonical starting point; APUE and TLPI are the systems-programming references.