The Translation Pipeline

What This Concept Is

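When you run gcc hello.c -o hello, the gcc driver runs several distinct stages in sequence. (Modern GCC performs preprocessing inside cc1 rather than as a separate cpp process, but the stages remain logically separate.) Together they turn text on disk into a runnable executable. The stages: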

  1. Preprocessor (cpp) - resolves #include, expands #define, handles #if/#ifdef. Input: .c file. Output: a single "translation unit" of pure C text.
  2. Compiler (cc1) - parses C, type-checks, and emits assembly for the target architecture. Input: preprocessed C. Output: .s assembly file.
  3. Assembler (as) - converts assembly into machine code in a relocatable object file. Input: .s. Output: .o (ELF on Linux, Mach-O on macOS, COFF on Windows).
  4. Linker (ld) - combines one or more .o files with libraries, resolves symbol references, assigns final addresses, and writes an executable. Input: .o files plus libraries. Output: executable.

GCC hides all four steps behind one command, but each has observable output if you ask for it.

Why It Matters Here

Almost every practical C problem lives at one of these stages:

  • #include naming a header that cannot be found - preprocessor
  • type error or wrong format specifier - compiler
  • "undefined reference to foo" - linker
  • "shared library not found" at runtime - dynamic linker
  • segfault - runtime, usually from an undefined-behavior bug the compiler was not required to catch

If you cannot name which stage is complaining, you cannot fix the error efficiently.

Concrete Example

Save this as hello.c:

#include <stdio.h>
#define GREETING "hello, world"

int main(void) {
    puts(GREETING);
    return 0;
}

Then:

  • gcc -E hello.c -o hello.i - stop after preprocessing. hello.i contains all of <stdio.h> inline plus your main, with GREETING substituted.
  • gcc -S hello.c -o hello.s - stop after compilation. hello.s is assembly text.
  • gcc -c hello.c -o hello.o - stop after assembly. hello.o is a relocatable object.
  • gcc hello.o -o hello - link. hello is the executable.
  • ./hello prints hello, world.

Each stage shown above (preprocess, compile, assemble, link) also runs as part of the default gcc hello.c -o hello. The -E, -S, and -c flags simply tell gcc to stop early.

Common Confusion / Misconception

"The compiler finds all my errors." It does not. The compiler only sees one translation unit at a time. A missing definition of a function you declared is a linker error, not a compiler error. The linker does not type-check; it only resolves symbols by name.

"Header files are imported." They are textually inserted. The preprocessor literally pastes the file contents. That is why header guards exist: without them, double inclusion produces duplicate definitions.

How To Use It

When a build fails, first identify the stage:

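  1. Message says a header cannot be found (for example, fatal error: foo.h: No such file or directory) or comes from an #error directive - preprocessor.
  2. Message mentions a type, an undeclared identifier, or a missing prototype - compiler. A macro that was never defined also lands here, because its name passes through the preprocessor unchanged and the compiler sees an unknown identifier.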
  3. Message says undefined reference or multiple definition - linker.
  4. Message at runtime says error while loading shared libraries - dynamic linker.

Then fix the problem at that stage.

Check Yourself

  1. What is a translation unit, and how many does a project with 5 .c files have?
  2. Why can a program compile cleanly but fail to link?
  3. Which stage is responsible for finding printf's code? The compiler or the linker?

Mini Drill or Application

Write a tiny hello.c and run, in order:

  1. gcc -E hello.c -o hello.i and inspect with wc -l hello.i. Explain the line count.
  2. gcc -S hello.c -o hello.s and read the first 20 lines of hello.s.
  3. gcc -c hello.c -o hello.o and run nm hello.o - list the symbols and classify each.
  4. gcc hello.o -o hello and run file hello.
  5. Now write bad.c that declares a function (for example, void fancy(void);) and calls it without defining it anywhere. Which command fails, and what does the error say?

Read This Only If Stuck