
dup, pipe, and Building Shell Redirection

What This Concept Is

Three primitives, used together, explain how ls | wc and cmd > out.txt work:

  • int pipe(int fds[2]) -- create a unidirectional pipe. fds[0] is the read end, fds[1] is the write end. Bytes written to fds[1] can be read from fds[0].
  • int dup(int oldfd) -- allocate the lowest-numbered free fd and make it a duplicate of oldfd. Both fds now refer to the same open file description.
  • int dup2(int oldfd, int newfd) -- forcibly make newfd a duplicate of oldfd, closing newfd first if it was open. Used to "rebind" standard fds.

The clever move is that fds 0, 1, and 2 are just numbers, and open fds survive exec. If you dup2(some_fd, 1) before exec, the new program's printf (which eventually calls write(1, ...)) will transparently write to wherever some_fd points -- a file, a pipe, a socket.

Why It Matters Here

This is the single most beautiful piece of UNIX design. Because open fds survive exec and fd 1 is just an integer, shells do not need a special "run this program and redirect" syscall. They fork, rearrange fds in the child, and exec. The kernel sees only ordinary fork, dup2, close, and exec calls, and the new program has no idea redirection happened.

Every shell pipeline and every shell redirection runs on this pattern.

Concrete Example

A program that implements ls | wc -l from scratch:

#include <sys/wait.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int p[2];
    if (pipe(p) < 0) { perror("pipe"); exit(1); }

    pid_t c1 = fork();
    if (c1 < 0) { perror("fork"); exit(1); }
    if (c1 == 0) {
        dup2(p[1], STDOUT_FILENO);   /* ls writes to pipe instead of terminal */
        close(p[0]); close(p[1]);    /* fd 1 is now this child's only pipe fd */
        char *a[] = {"ls", NULL};
        execvp("ls", a);
        _exit(127);                  /* reached only if exec failed */
    }

    pid_t c2 = fork();
    if (c2 < 0) { perror("fork"); exit(1); }
    if (c2 == 0) {
        dup2(p[0], STDIN_FILENO);    /* wc reads from pipe instead of terminal */
        close(p[0]); close(p[1]);    /* fd 0 is now this child's only pipe fd */
        char *a[] = {"wc", "-l", NULL};
        execvp("wc", a);
        _exit(127);                  /* reached only if exec failed */
    }

    /* CRITICAL: parent must close both ends of the pipe.
       Otherwise wc never sees EOF because the pipe still has a writer. */
    close(p[0]); close(p[1]);

    waitpid(c1, NULL, 0);
    waitpid(c2, NULL, 0);
    return 0;
}

The parent's close call, the one flagged CRITICAL in the comment, is the call most beginners forget. A pipe reader gets EOF only after every copy of the write end has been closed. If the parent keeps p[1] open, wc blocks forever in read.

A simpler case -- cmd > out.txt:

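#include <fcntl.h>
#include <sys/wait.h>
#include <unistd.h>
#include <stdio.h>

int main(void) {
    pid_t pid = fork();
    if (pid == 0) {
        int fd = open("out.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); _exit(1); }
        dup2(fd, STDOUT_FILENO);   /* fd 1 now refers to out.txt */
        close(fd);                 /* the original fd is no longer needed */
        execvp("ls", (char *[]){"ls", "-l", NULL});
        _exit(127);                /* reached only if exec failed */
    }
    waitpid(pid, NULL, 0);
    return 0;
}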

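Common Confusion / Misconception

"I dup2'd onto stdout, so do I still need to close the original fd I was dup'ing from?" Yes, and this is where most "why does my pipe hang?" bugs live. After dup2(p[1], 1), both fd 1 and p[1] refer to the write end, so the original is one more open writer. The counting model to hold: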

A pipe reader sees EOF when and only when all buffered data has been read and every fd that refers to the pipe's write end, in every process, is closed. The kernel reference-counts them.

Every fork duplicates the full fd table. If you have a pipe and you fork, both processes now have both ends. Unless every process closes the ends it does not use, the pipe will have at least one writer alive, and the reader will wait forever.

Another trap: "dup and dup2 copy the file." They do not copy the file, they copy the fd. Both fds share the same underlying open file description, including the file offset. Writing to fd 3 advances the same offset that fd 7 sees, if they were dup'd from each other.

How To Use It

When implementing any redirection or pipeline, apply this recipe:

  1. Before fork: create the pipe(s) and/or open the file(s).
  2. fork.
  3. In each child, dup2 the right source onto STDIN_FILENO / STDOUT_FILENO / STDERR_FILENO.
  4. Close every fd the child does not need -- including both original pipe fds, since after dup2 the copy on fd 0 or fd 1 is the only one the program will use.
  5. exec.
  6. In the parent, close every fd the parent does not need -- especially both ends of any pipe.
  7. waitpid children in an order that does not deadlock (usually any order is fine once all fds are closed).

Check Yourself

  1. Why does the parent in a pipe pipeline have to close both ends, even though it did not use them?
  2. What is the difference between dup(fd) and dup2(fd, 7)?
  3. What would happen in the ls | wc example if c1 forgot to close p[0]?

Mini Drill or Application

Extend the pipeline program above. Do all four:

  1. Change the pipeline to ls | grep c | wc -l. You will need two pipes. Draw the fd table for each of the three children before exec.
  2. Add 2>err.txt to ls in the original pipeline (redirect stderr to a file but keep stdout going to the pipe).
  3. Deliberately forget one of the parent's close calls. Run it under strace -f and find the line where wc's read blocks.
  4. In one sentence, explain why shells do not need a "run with redirection" syscall.

Read This Only If Stuck