
Build Your Own Shell

"A shell is just a while(1) { read; parse; fork; exec; wait; } loop." — every Unix systems professor

A shell is the canonical "tiny but real" systems project. About three hundred lines of C give you a working interactive shell with pipes, redirection, built-ins, and basic job control, and nearly every part of it exercises one of the core Unix system calls.


1. Overview & motivation

A shell does three things:

  1. Reads a command line.
  2. Parses it into commands, arguments, redirections, pipes.
  3. Executes by spawning processes with fork/exec/wait, wiring up file descriptors with pipe/dup2.

What you can only learn by building one:

  • Why fork is conceptually weird: it returns twice, once in the parent (with the child's PID) and once in the child (with 0).
  • Why pipes are unidirectional and why dup2 is the right tool to wire them.
  • Why wait/waitpid and zombie processes exist.
  • Why signal handling in shells is harder than it looks (Ctrl-C should kill the child, not the shell).
  • Why command parsing is a tiny language with its own grammar.
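To make the first and third points concrete, here is a minimal sketch (the function name fork_twice_demo is hypothetical) in which a single fork call returns in two processes. The child exits with status 42, and the parent collects that status with waitpid. Between the child's exit and the parent's waitpid, the child is a zombie: it has finished running, but its exit status is still held for the parent.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* fork() returns twice: 0 in the child, the child's PID in the parent.
 * Returns the child's exit status as seen by the parent, or -1 on error. */
int fork_twice_demo(void) {
    pid_t pid = fork();
    if (pid < 0) { perror("fork"); return -1; }
    if (pid == 0) {
        /* Child: a separate process running a copy of the parent's memory. */
        _exit(42);
    }
    /* Parent: until waitpid() collects the status, the exited child is a zombie. */
    int status;
    if (waitpid(pid, &status, 0) < 0) { perror("waitpid"); return -1; }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```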

2. Where this fits in the degree

  • Phase: Systems
  • Semester: 4 (Systems Programming)
  • Modules deepened: Module 1 (C fundamentals) — string handling, struct layout. Module 4 (systems-level programming) — the entire point: fork, exec, pipe, dup2, wait, signals.

Cross-phase relevance:


3. Prerequisites

  • C: strings, pointers, structs. Comfort with argv-style arrays.
  • Basic Unix: knowing what ls | grep foo > out.txt does conceptually.

4. Theory & research

Required reading

  • Stevens & Rago, Advanced Programming in the UNIX Environment: Chapters 8 (process control), 14 (advanced I/O), 15 (interprocess communication). The Unix systems-programming bible.
  • Kerrisk, The Linux Programming Interface: Chapters 24 (process creation), 27 (program execution with execve), 44 (pipes and FIFOs). Even better than APUE for modern Linux.
  • Bash source code: parse.y is illustrative. Don't read it all; sample it after writing your own.

For parsing specifically

  • A recursive descent parser is enough. No need for lex/yacc. Most BYO-X shell tutorials parse with hand-written code.
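Here is a sketch of how small such a parser can be. It uses a hypothetical two-rule grammar: pipelines of plain words, with no redirections and no quoting. It follows the recursive-descent style of one function per grammar rule, and the repetition in the pipeline rule becomes a loop rather than recursion. The sketch assumes a NULL-terminated token array like the one tokenize() in Milestone 2 produces. It fills the char ***commands shape that execute_pipeline in Milestone 4 takes. The names parse_pipeline and parse_command are illustrative.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical grammar for this sketch (no redirections, no quoting):
 *   pipeline := command ( "|" command )*
 *   command  := WORD+
 */

/* command: consumes one or more non-"|" words starting at *pos. */
static int parse_command(char **tokens, int *pos) {
    if (!tokens[*pos] || strcmp(tokens[*pos], "|") == 0)
        return -1;                              /* a command needs at least one word */
    while (tokens[*pos] && strcmp(tokens[*pos], "|") != 0)
        (*pos)++;
    return 0;
}

/* pipeline: stores up to max argv pointers into tokens in commands[],
 * overwriting each "|" with NULL so every command becomes its own argv.
 * Returns the number of commands, or -1 on a syntax error. */
int parse_pipeline(char **tokens, char ***commands, int max) {
    int pos = 0, n = 0;
    for (;;) {
        int start = pos;
        if (n == max || parse_command(tokens, &pos) < 0)
            return -1;
        commands[n++] = &tokens[start];
        if (!tokens[pos])
            return n;                           /* end of input */
        tokens[pos++] = NULL;                   /* consume "|" and end this argv */
    }
}
```

An empty token array also returns -1, so check args[0] before calling it. On a syntax error, some "|" tokens may already have been overwritten.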

5. Curated tutorial list (from BYO-X)


Stephen Brennan's "Write a Shell in C" is the tutorial: about 250 lines of C that walk through:

  1. The REPL loop.
  2. Tokenizing input.
  3. Executing one command with fork/exec.
  4. Built-ins (cd, exit, help).

Read it once. Then implement your own version without copying. Then extend with pipes, redirection, and job control.

For a more challenging path with paid feedback: CodeCrafters' "Build Your Own Shell" course goes deeper into POSIX compliance.


7. Implementation milestones

Milestone 1: REPL with one command, no arguments

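```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    char line[1024];
    while (1) {
        printf("$ ");
        fflush(stdout);                        /* the prompt has no newline, so flush it */
        if (!fgets(line, sizeof line, stdin)) break;   /* Ctrl-D (EOF) ends the loop */
        line[strcspn(line, "\n")] = 0;         /* strip the trailing newline */
        if (line[0] == '\0') continue;         /* empty input: just prompt again */
        if (strcmp(line, "exit") == 0) break;
        pid_t pid = fork();
        if (pid < 0) { perror("fork"); continue; }
        if (pid == 0) {                        /* child: replace this process with the command */
            execlp(line, line, (char *)NULL);
            perror("exec"); _exit(1);          /* exec returns only on failure */
        }
        waitpid(pid, NULL, 0);                 /* parent: wait for this child */
    }
    return 0;
}
```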

Evidence: Run ls, pwd, whoami, exit. Demonstrate that pressing Ctrl-D exits cleanly.

Milestone 2: Tokenizer and arguments

Split input on whitespace. Build a NULL-terminated char *argv[] and pass it to execvp.

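```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TOK_DELIM " \t\r\n"

/* Splits line in place. The tokens point into line, so the caller frees only the array. */
char **tokenize(char *line) {
    size_t bufsize = 64, position = 0;
    char **tokens = malloc(bufsize * sizeof(char *));
    if (!tokens) { perror("malloc"); exit(1); }
    char *token = strtok(line, TOK_DELIM);
    while (token) {
        tokens[position++] = token;
        if (position >= bufsize) {             /* always keep room for the final NULL */
            bufsize *= 2;
            char **grown = realloc(tokens, bufsize * sizeof(char *));
            if (!grown) { perror("realloc"); free(tokens); exit(1); }
            tokens = grown;
        }
        token = strtok(NULL, TOK_DELIM);
    }
    tokens[position] = NULL;                   /* execvp needs a NULL-terminated argv */
    return tokens;
}
```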

Evidence: Run ls -la /tmp, echo hello world, cat /etc/passwd | head.

(The last one won't work yet — pipes are Milestone 4.)
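Here is a minimal sketch of the fork/execvp/waitpid step for a tokenized command. The function name launch is an assumption; Brennan's tutorial has a similar helper. The 127 and 128+signal return values follow common shell conventions rather than anything this milestone requires.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs argv (NULL-terminated) in a child process and returns its exit status:
 * 127 if exec failed, 128 + signal number if it was killed, -1 on fork/wait error. */
int launch(char **argv) {
    pid_t pid = fork();
    if (pid < 0) { perror("fork"); return -1; }
    if (pid == 0) {
        execvp(argv[0], argv);       /* searches PATH, like a real shell */
        perror(argv[0]);             /* only reached if exec failed */
        _exit(127);                  /* conventional "command not found" status */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0) { perror("waitpid"); return -1; }
    if (WIFEXITED(status)) return WEXITSTATUS(status);
    return 128 + WTERMSIG(status);   /* without WUNTRACED, the only other case is a signal */
}
```

In the REPL this becomes `char **args = tokenize(line); if (args[0]) launch(args); free(args);`. The tokens point into line, so only the array is freed.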

Milestone 3: Built-ins

cd, exit, pwd, help. Built-ins must be handled in the parent process — cd in a child has no effect on the parent.

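```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int builtin_cd(char **args) {   /* must run in the shell process, not in a child */
    const char *dir = args[1] ? args[1] : getenv("HOME");
    if (!dir) fprintf(stderr, "cd: HOME not set\n");
    else if (chdir(dir) != 0) perror("cd");
    return 1;                   /* 1 = keep the REPL loop running */
}
```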

Evidence: cd /tmp; pwd — must print /tmp. If cd were not a built-in, this would print the original directory.
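A common way to choose between a built-in and fork/exec is a lookup table that the shell checks before forking. This sketch assumes the lsh-style convention that a built-in returns 1 to keep the loop running and 0 to exit. The names builtin_fn, bi_exit, bi_pwd, and find_builtin are hypothetical, and builtin_cd above would slot in as one more row.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Built-ins return 1 to keep the REPL running and 0 to exit (lsh-style convention). */
typedef int (*builtin_fn)(char **args);

static int bi_exit(char **args) { (void)args; return 0; }

static int bi_pwd(char **args) {
    (void)args;
    char buf[4096];                          /* assumed large enough for this sketch */
    if (getcwd(buf, sizeof buf)) puts(buf);
    else perror("pwd");
    return 1;
}

static const struct { const char *name; builtin_fn fn; } builtins[] = {
    { "exit", bi_exit },
    { "pwd",  bi_pwd  },
};

/* Returns the built-in for args[0], or NULL if the command must be fork/exec'd. */
builtin_fn find_builtin(char **args) {
    if (!args[0]) return NULL;
    for (size_t i = 0; i < sizeof builtins / sizeof builtins[0]; i++)
        if (strcmp(args[0], builtins[i].name) == 0) return builtins[i].fn;
    return NULL;
}
```

In the REPL, call find_builtin(args) first and fork only when it returns NULL. A return of 0 from the built-in ends the loop.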

Milestone 4: Pipes

The big one. cat foo | grep bar | wc -l needs:

  1. Parse into N commands.
  2. Create N-1 pipes.
  3. fork for each command.
  4. In each child, dup2 the right ends of the pipes to stdin/stdout.
  5. Close every fd the child doesn't need.
  6. exec.
  7. Parent waits for all children.
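
```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* commands[i] is a NULL-terminated argv; requires n >= 1. Exits on resource errors to stay short. */
int execute_pipeline(char ***commands, int n) {
    int nfds = 2 * (n - 1);
    int *pipes = malloc((nfds + 1) * sizeof *pipes);   /* heap, not a VLA: n == 1 would give size 0 */
    pid_t *pids = malloc(n * sizeof *pids);
    if (!pipes || !pids) { perror("malloc"); exit(1); }
    for (int i = 0; i < n - 1; i++)
        if (pipe(pipes + 2 * i) < 0) { perror("pipe"); exit(1); }
    for (int i = 0; i < n; i++) {
        pids[i] = fork();
        if (pids[i] < 0) { perror("fork"); exit(1); }
        if (pids[i] == 0) {
            if (i > 0) dup2(pipes[2 * (i - 1)], STDIN_FILENO);     /* read end of previous pipe */
            if (i < n - 1) dup2(pipes[2 * i + 1], STDOUT_FILENO);  /* write end of next pipe */
            for (int j = 0; j < nfds; j++) close(pipes[j]);        /* the dup2'd copies stay open */
            execvp(commands[i][0], commands[i]);
            perror("exec"); _exit(1);
        }
    }
    for (int j = 0; j < nfds; j++) close(pipes[j]);         /* or the last reader never sees EOF */
    for (int i = 0; i < n; i++) waitpid(pids[i], NULL, 0);  /* only this pipeline's children */
    free(pipes);
    free(pids);
    return 1;
}
```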

The single biggest source of bugs: close every fd you do not need. Forgetting one leaves pipes open and your shell hangs forever.
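The hang comes from EOF semantics: read() on a pipe returns 0 only after every copy of the write end, in every process, has been closed. This minimal sketch (pipe_eof_demo is a hypothetical name) has one writer child and uses the parent as the reader. If you delete the parent's close(fds[1]), the read loop blocks forever, which is exactly the shell hang described above.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Counts the bytes a reader sees before EOF. read() returns 0 only after
 * every copy of the pipe's write end, in every process, has been closed. */
ssize_t pipe_eof_demo(void) {
    int fds[2];
    if (pipe(fds) < 0) { perror("pipe"); return -1; }
    pid_t pid = fork();
    if (pid < 0) { perror("fork"); close(fds[0]); close(fds[1]); return -1; }
    if (pid == 0) {                      /* child: the writer */
        close(fds[0]);
        ssize_t n = write(fds[1], "hello\n", 6);
        _exit(n == 6 ? 0 : 1);           /* exiting closes the child's write end */
    }
    close(fds[1]);   /* the parent's copy: without this close, the loop below never sees EOF */
    char buf[64];
    ssize_t total = 0, r;
    while ((r = read(fds[0], buf, sizeof buf)) > 0) total += r;
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return total;
}
```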

Evidence: ls /etc | grep conf | wc -l produces the same output as in bash.

Milestone 5: Redirection

>, >>, <, 2>. In the child, after fork and before exec, open the file with the appropriate flags and dup2 it over the right fd.

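```c
int fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC, 0644);   /* ">" truncates */
if (fd < 0) { perror(filename); _exit(1); }                     /* in the child, after fork */
dup2(fd, STDOUT_FILENO);                                        /* fd 1 now refers to the file */
close(fd);                                                      /* the original fd is no longer needed */
```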

Evidence: echo hello > out; cat out prints hello.
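Here is a sketch that extends the snippet to all four operators. It assumes the operator and filename have already been pulled out of the token list, and the names redirect_spec and apply_redirect are hypothetical. apply_redirect is meant to run in the child after fork and before exec, so the shell's own descriptors are never touched.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Maps a redirection operator to open() flags and the fd it replaces.
 * Returns 0 on success, -1 for an unknown operator. */
int redirect_spec(const char *op, int *flags, int *target_fd) {
    if (strcmp(op, ">") == 0) {
        *flags = O_WRONLY | O_CREAT | O_TRUNC;  *target_fd = STDOUT_FILENO;
    } else if (strcmp(op, ">>") == 0) {
        *flags = O_WRONLY | O_CREAT | O_APPEND; *target_fd = STDOUT_FILENO;
    } else if (strcmp(op, "<") == 0) {
        *flags = O_RDONLY;                      *target_fd = STDIN_FILENO;
    } else if (strcmp(op, "2>") == 0) {
        *flags = O_WRONLY | O_CREAT | O_TRUNC;  *target_fd = STDERR_FILENO;
    } else {
        return -1;                              /* not a redirection operator */
    }
    return 0;
}

/* Call in the child after fork() and before exec. Returns -1 on error. */
int apply_redirect(const char *op, const char *filename) {
    int flags, target;
    if (redirect_spec(op, &flags, &target) < 0) return -1;
    int fd = open(filename, flags, 0644);      /* mode only matters when O_CREAT creates the file */
    if (fd < 0) { perror(filename); return -1; }
    if (fd != target) {                         /* open() may already have returned the target fd */
        if (dup2(fd, target) < 0) { perror("dup2"); close(fd); return -1; }
        close(fd);
    }
    return 0;
}
```

For example, a child handling `cat < in > out` would call apply_redirect("<", "in") and apply_redirect(">", "out"), then _exit if either returns -1, before calling execvp.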

Milestone 6: Job control & signals

Foreground vs background (&). Handle SIGINT so Ctrl-C kills the child, not the shell. Reap background processes with a SIGCHLD handler or with a non-blocking waitpid (WNOHANG) before each prompt.

This is hard to get fully right. A subset that handles foreground Ctrl-C correctly is enough for the tutorial.

Evidence: Run sleep 30 &, then sleep 30, then Ctrl-C. The foreground sleep dies. The background sleep keeps running. jobs (if implemented) lists it.
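Here is one minimal approach for a shell without process groups, sketched under that assumption with hypothetical function names. The shell ignores SIGINT and SIGQUIT. Each foreground child restores the default dispositions between fork and exec. Finished background children are reaped with a non-blocking waitpid before every prompt. Background children skip the reset and keep the inherited SIG_IGN, which survives exec. That is how shells without job control stop Ctrl-C from reaching them. Full job control replaces this scheme with setpgid and tcsetpgrp.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Shell process: ignore Ctrl-C and Ctrl-\ so they do not kill the shell itself. */
void shell_ignore_signals(void) {
    struct sigaction sa = {0};
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGINT, &sa, NULL);
    sigaction(SIGQUIT, &sa, NULL);
}

/* Foreground child, between fork() and exec: restore the defaults so Ctrl-C kills
 * the command. SIG_IGN survives exec, so background children skip this call. */
void child_default_signals(void) {
    struct sigaction sa = {0};
    sa.sa_handler = SIG_DFL;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGINT, &sa, NULL);
    sigaction(SIGQUIT, &sa, NULL);
}

/* Call before each prompt: reaps every finished child without blocking.
 * Returns how many children were reaped. */
int reap_background(void) {
    int count = 0, status;
    while (waitpid(-1, &status, WNOHANG) > 0) count++;
    return count;
}
```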

Milestone 7 (optional): Globbing, variables, history

Glob expansion of * and ?. Variable expansion such as $HOME and $PATH. Up-arrow history with the GNU Readline library.

Globbing and $VAR expansion are part of POSIX sh, but interactive extras such as Readline history are where shells like bash go well beyond it. Stop here unless you genuinely want a usable interactive shell.
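Here is a sketch of the two expansions using only standard calls: getenv for a word that is exactly $NAME, and POSIX glob(3) for * and ?. The helper names expand_var and expand_glob are hypothetical. Real shells also expand variables inside words (a$HOME/b), ${...} forms, and special parameters such as $?, none of which this sketch handles. GLOB_NOCHECK makes an unmatched pattern pass through unchanged, which is what sh does.

```c
#define _POSIX_C_SOURCE 200809L
#include <glob.h>
#include <stdlib.h>
#include <string.h>

/* Expands a whole-word "$NAME" to its environment value ("" if unset).
 * Returns a malloc'd string (NULL if out of memory); other words are copied as-is. */
char *expand_var(const char *word) {
    if (word[0] != '$' || word[1] == '\0') return strdup(word);
    const char *val = getenv(word + 1);
    return strdup(val ? val : "");
}

/* Expands * and ? in pattern. With GLOB_NOCHECK an unmatched pattern comes back
 * as the only result. Returns glob()'s status; the caller must globfree(g). */
int expand_glob(const char *pattern, glob_t *g) {
    return glob(pattern, GLOB_NOCHECK, NULL, g);
}
```

Each element of g.gl_pathv becomes one argv entry. Call globfree(&g) once the command has been launched.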


8. Tests & evidence

Test                  How
Single command        ls, echo foo, whoami
Args                  ls -la, echo hello world
Built-ins             cd /tmp; pwd prints /tmp
Pipes                 ls /etc | grep conf | wc -l matches bash
Long pipes            A 5-stage pipeline runs correctly
Redirection           echo x > /tmp/y; cat /tmp/y prints x
Append                echo a > f; echo b >> f; cat f prints a, then b on the next line
Stdin redirection     wc -l < /etc/passwd
Stderr redirection    ls /nonexistent 2> err; cat err shows the error
Empty input           Pressing Enter on an empty line does nothing
EOF                   Ctrl-D exits cleanly
Background            sleep 10 & does not block; the child is reaped when it exits
No zombie processes   After 100 commands, ps shows no zombies under your shell

9. Common pitfalls

  • Forgetting to close unused pipe fds. Causes hangs. The rule: in each child, close every pipe fd except the one you dup2'd. In the parent, close all pipe fds before waiting.
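  • Calling exit instead of _exit when exec fails. If execvp returns, it failed; the child must call _exit, not exit. exit would run the parent's atexit handlers and flush stdio buffers copied from the parent, which can print buffered output twice.
  • Built-ins in subshells. cd only works in the parent. If you fork before checking for built-ins, only the child changes directory, and that change disappears when the child exits.
  • Forgetting to wait. Without wait, you leave zombie children.
  • Ignoring signals. Without handling SIGINT, Ctrl-C kills your shell along with the running command, because the terminal sends SIGINT to every process in the foreground process group.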
  • Naive tokenizer. echo "hello world" should be one argument, not two. The simple tutorials skip quoting; if you add it, write a proper state-machine tokenizer.
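  • Redirecting in the wrong process. Do the open and dup2 for a redirection in the child after fork. A child that replaces or closes its own stdin cannot affect the shell's fd 0. If the dup2 runs in the parent, however, the shell keeps reading from the redirected file instead of the terminal.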

10. Extensions

  • Quoting ("...", '...') and escaping (\). Tokenizer becomes a small state machine.
  • Environment variables with $VAR expansion.
  • Aliases (alias ll='ls -la').
  • Tab completion via the readline library.
  • Scripting — read commands from a file (sh foo.sh).
  • Command substitution with $(...), which runs a command in a subshell and substitutes its output.
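Here is a sketch of that state machine. It assumes the same in-place, pointers-into-the-line ownership as tokenize() in Milestone 2, and the name tokenize_quoted is hypothetical. The machine has three states: outside quotes, inside '...', and inside "...". It handles backslash escapes but performs no $ or glob expansion. Because quotes and backslashes are dropped, the write cursor never overtakes the read cursor, so the words can be rewritten in the input buffer itself.

```c
#include <stddef.h>
#include <string.h>

enum tok_state { NORMAL, IN_SINGLE, IN_DOUBLE };

/* Splits line in place, honoring '...', "..." and backslash escapes.
 * Stores at most max-1 word pointers in argv plus a terminating NULL.
 * Returns the word count, or -1 on an unterminated quote or too many words. */
int tokenize_quoted(char *line, char **argv, int max) {
    enum tok_state state = NORMAL;
    char *w = line;                          /* write cursor; never passes the read cursor r */
    int argc = 0, in_word = 0;
    for (char *r = line; *r; r++) {
        char c = *r;
        if (state == IN_SINGLE) {            /* everything is literal until the closing ' */
            if (c == '\'') state = NORMAL; else *w++ = c;
        } else if (state == IN_DOUBLE) {     /* in this sketch, \ escapes only " and \ here */
            if (c == '"') state = NORMAL;
            else if (c == '\\' && (r[1] == '"' || r[1] == '\\')) *w++ = *++r;
            else *w++ = c;
        } else if (strchr(" \t\r\n", c)) {   /* NORMAL: unquoted whitespace ends a word */
            if (in_word) { *w++ = '\0'; in_word = 0; }
        } else {
            if (!in_word) {                  /* first character of a new word */
                if (argc >= max - 1) return -1;
                argv[argc++] = w;
                in_word = 1;
            }
            if (c == '\'') state = IN_SINGLE;
            else if (c == '"') state = IN_DOUBLE;
            else if (c == '\\' && r[1]) *w++ = *++r;   /* escaped char taken literally */
            else *w++ = c;
        }
    }
    if (state != NORMAL) return -1;
    if (in_word) *w = '\0';
    argv[argc] = NULL;
    return argc;
}
```

With this tokenizer, `echo "hello world"` yields two arguments, and `""` yields one empty argument, as in sh.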

A complete shell is a small operating system, which is why bash, zsh, and fish are all serious software. Know where to stop.


11. Module integration

Module                                      What the shell deepens
Sem 4 Module 1: C fundamentals              String parsing, struct design, memory ownership of token arrays.
Sem 4 Module 4: Systems-level programming   The entire point: fork, exec, pipe, dup2, wait.
Sem 5 Module 1: Processes & scheduling      Direct application: you see process states change.
Sem 5 Module 3: Concurrency                 Pipes are concurrent producers/consumers across processes.
Docker tutorial                             A container is "fork + namespaces + chroot + exec". The shell is the foundation.
Interpreter tutorial                        Your tokenizer/parser code transfers directly.

12. Portfolio framing

What to publish:

  • Clean C source with src/, tests/, Makefile, and a README.md showing usage examples.
  • The list of tests above, with output transcripts.
  • A "what this does not do" section: job control beyond background, scripting, quoting.

Reviewer entry points:

  • src/main.c: the REPL loop.
  • src/execute.c: pipeline and redirection.
  • src/builtins.c: cd, exit, etc.
  • README must include a section "How is this different from bash?" — required honesty.

Source

This tutorial draws from the BYO-X catalog "Shell" section. Brennan's tutorial is the canonical starting point; APUE and TLPI are the systems-programming references.