The Translation Pipeline
What This Concept Is
When you run gcc hello.c -o hello, several distinct programs run in sequence. Collectively they turn text on disk into a runnable executable. The steps:
- Preprocessor (cpp) - resolves #include, expands #define, handles #if/#ifdef. Input: .c file. Output: a single "translation unit" of pure C text.
- Compiler (cc1) - parses C, type-checks, and emits assembly for the target architecture. Input: preprocessed C. Output: .s assembly file.
- Assembler (as) - converts assembly into machine code in a relocatable object file. Input: .s. Output: .o (ELF on Linux, Mach-O on macOS, COFF on Windows).
- Linker (ld) - combines one or more .o files with libraries, resolves symbol references, assigns final addresses, and writes an executable. Input: .o files plus libraries. Output: executable.
GCC hides all four steps behind one command, but each has observable output if you ask for it.
Why It Matters Here
Almost every practical C problem lives at one of these stages:
- missing #include - preprocessor
- type error or wrong format specifier - compiler
- "undefined reference to foo" - linker
- "shared library not found" at runtime - dynamic linker
- segfault - runtime, usually from an undefined-behavior bug the compiler was not required to catch
If you cannot name which stage is complaining, you cannot fix the error efficiently.
Concrete Example
Save this as hello.c:
#include <stdio.h>
#define GREETING "hello, world"
int main(void) {
    puts(GREETING);
    return 0;
}
Then:
- gcc -E hello.c -o hello.i - stop after preprocessing. hello.i contains all of <stdio.h> inline plus your main, with GREETING substituted.
- gcc -S hello.c -o hello.s - stop after compilation. hello.s is assembly text.
- gcc -c hello.c -o hello.o - stop after assembly. hello.o is a relocatable object.
- gcc hello.o -o hello - link. hello is the executable.
- ./hello prints hello, world.
Every one of those stages runs as part of the default gcc hello.c -o hello. The -E, -S, and -c flags only tell gcc to stop earlier and keep the intermediate output.
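To tie the stages back to individual source lines, here is a sketch of a hello.c variant annotated with the stage that handles each part. The function name greet is an illustrative addition, not part of the original example, and the comments describe typical behavior rather than any one toolchain's exact output.

```c
#include <stdio.h>   /* preprocessor: pastes in the declaration of puts */

#define GREETING "hello, world"   /* preprocessor: remembered, then substituted as text */

/* compiler: parses and type-checks this function, then emits assembly that
   refers to puts only by name */
int greet(void) {
    /* after preprocessing, this line reads: return puts("hello, world"); */
    return puts(GREETING);   /* puts returns a non-negative value on success */
}

/* assembler: encodes that assembly into an object file in which puts is
   still an undefined symbol */
/* linker: resolves puts against the C library and fixes up the call */
```

Calling greet() from main and stopping the build at each of -E, -S, and -c should let you see each comment's claim in the corresponding intermediate file.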
Common Confusion / Misconception
"The compiler finds all my errors." It does not. The compiler only sees one translation unit at a time. A missing definition of a function you declared is a linker error, not a compiler error. The linker does not type-check; it only resolves symbols by name.
"Header files are imported." They are textually inserted. The preprocessor literally pastes the file contents. That is why header guards exist: without them, double inclusion produces duplicate definitions.
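A minimal sketch of a header guard, simulated in a single file. The body of a hypothetical header (square.h, with the guard macro SQUARE_H and the function square, all illustrative names) is written out twice, which is what the preprocessor would produce if two #include lines pulled in the same file.

```c
/* First pasted copy of the hypothetical square.h. SQUARE_H is not yet
   defined, so the body is kept and the macro is defined. */
#ifndef SQUARE_H
#define SQUARE_H
static inline int square(int x) { return x * x; }
#endif

/* Second pasted copy. SQUARE_H is already defined, so the preprocessor
   discards everything up to the matching #endif. */
#ifndef SQUARE_H
#define SQUARE_H
static inline int square(int x) { return x * x; }
#endif

/* Only one definition of square reaches the compiler. */
int sum_of_squares(int a, int b) {
    return square(a) + square(b);
}
```

If the #ifndef, #define, and #endif lines are deleted, two definitions of square land in the same translation unit and the compiler reports a redefinition.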
How To Use It
When a build fails, first identify the stage:
- Message mentions an #include or undeclared macro - preprocessor.
- Message mentions a type, a missing prototype, or a line number in your source - compiler.
- Message says undefined reference or multiple definition - linker.
- Message at runtime says error while loading shared libraries - dynamic linker.
Then fix the problem at that stage.
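One way to practice this triage is a single file in which each stage's failure can be switched on by uncommenting one line. This is a sketch under assumptions: no_such_header.h, missing_helper, and run_demo are hypothetical names, and the exact wording of each diagnostic varies by compiler and version.

```c
#include <stdio.h>

// Preprocessor: uncommenting the next line stops the build before any C is
// parsed, because the (hypothetical) header does not exist.
// #include "no_such_header.h"

// A declaration with no definition anywhere. On its own it is harmless,
// because nothing below calls it.
int missing_helper(void);

int run_demo(void) {
    int count = 3;

    // Compiler: uncommenting the next line produces a diagnostic about
    // converting a string to int (a warning or an error, depending on the
    // compiler and its version).
    // count = "three";

    // Linker: uncommenting the next line still compiles, but linking fails
    // with an undefined reference to missing_helper.
    // count += missing_helper();

    printf("count = %d\n", count);
    return count;
}
```

Enable one line at a time and note which program name appears in the error output. That name usually identifies the stage.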
Check Yourself
- What is a translation unit, and how many does a project with 5 .c files have?
- Why can a program compile cleanly but fail to link?
- Which stage is responsible for finding printf's code? The compiler or the linker?
Mini Drill or Application
Write a tiny hello.c and run, in order:
- gcc -E hello.c -o hello.i and inspect with wc -l hello.i. Explain the line count.
- gcc -S hello.c -o hello.s and read the first 20 lines of hello.s.
- gcc -c hello.c -o hello.o and run nm hello.o - list the symbols and classify each.
- gcc hello.o -o hello and run file hello.
- Now write bad.c that calls a function fancy() without defining it. Which command fails, and what does the error say?