A 12 Preprocessing
This page is a generated reference surface for selective reading. It exists to keep the learner apps guide-first while still preserving source access.
Learning objectives
- Explain the main ideas and vocabulary in A 12 Preprocessing.
- Work through the source examples for A 12 Preprocessing without depending on raw chunk order.
- Use A 12 Preprocessing as selective reference when learner modules point back to The C Programming Language.
Prerequisites
- None curated yet.
Module targets
module-01-c-programming-fundamentals
AI companion modes
- Explain simply
- Socratic tutor
- Quiz me
- Challenge my understanding
- Diagnose my confusion
- Generate extra practice
- Revision mode
- Connect forward / backward
Source-of-truth note
This unit is anchored to The C Programming Language and the source chapter "A 12 Preprocessing". Use external resources only to clarify, extend, or modernize details without replacing the chapter's conceptual spine.
External enrichment
No chapter-specific enrichment resources are curated yet. Add them in the unit manifest when a source clearly improves learning.
Source provenance
- Primary source: The C Programming Language - Source chapter: A 12 Preprocessing
- Raw source file: 075-a-12-preprocessing.md
Merged source
A.12 Preprocessing
A preprocessor performs macro substitution, conditional compilation, and inclusion of named files. Lines beginning with #, perhaps preceded by white space, communicate with this preprocessor. The syntax of these lines is independent of the rest of the language; they may appear anywhere and have effect that lasts (independent of scope) until the end of the translation unit. Line boundaries are significant; each line is analyzed individually (but see Par.A.12.2 for how to adjoin lines). To the preprocessor, a token is any language token, or a character sequence giving a file name as in the #include directive (Par.A.12.4); in addition, any character not otherwise defined is taken as a token. However, the effect of white space characters other than space and horizontal tab is undefined within preprocessor lines.
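The point that a directive may be indented, may appear anywhere, and stays in effect regardless of scope can be illustrated with a small sketch. The names LIMIT, first_user, second_user, and scope_selfcheck are hypothetical and chosen only for this example.

```c
#include <assert.h>

int first_user(void)
{
    /* A directive may be preceded by white space and may sit inside a block. */
    #define LIMIT 10
    return LIMIT;
}

/* The closing brace above did not end the definition: LIMIT remains
   defined until the end of the translation unit (or an #undef). */
int second_user(void)
{
    return LIMIT * 2;
}

/* Hypothetical self-check collecting the expectations of this sketch. */
void scope_selfcheck(void)
{
    assert(first_user() == 10);
    assert(second_user() == 20);
}
```

Calling scope_selfcheck() from a test driver should pass silently. Defining macros inside a function body is legal but can surprise readers, so it is shown here only to make the scope rule visible.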
Preprocessing itself takes place in several logically successive phases that may, in a particular implementation, be condensed.
- First, trigraph sequences as described in Par.A.12.1 are replaced by their equivalents. Should the operating system environment require it, newline characters are introduced between the lines of the source file.
- Each occurrence of a backslash character \ followed by a newline is deleted, thus splicing lines (Par.A.12.2).
- The program is split into tokens separated by white-space characters; comments are replaced by a single space. Then preprocessing directives are obeyed, and macros (Pars.A.12.3-A.12.10) are expanded.
- Escape sequences in character constants and string literals (Pars. A.2.5.2, A.2.6) are replaced by their equivalents; then adjacent string literals are concatenated.
- The result is translated, then linked together with other programs and libraries, by collecting the necessary programs and data, and connecting external functions and object references to their definitions.
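Some observable consequences of the phase order above can be checked with a short sketch. It is an illustration under stated assumptions, not part of the source chapter: the names STR, GREETING, demo_greeting, demo_comment, demo_escape, and phases_selfcheck are hypothetical, and STR relies on the # operator that is described later in Par.A.12.3.

```c
#include <assert.h>
#include <string.h>

/* The # operator (Par.A.12.3) turns a macro argument into a string literal. */
#define STR(x) #x

/* The backslash-newline pair is deleted first, so this is one directive. */
#define GREETING "hello, " \
                 "world"

/* Adjacent string literals are concatenated in a late phase. */
const char *demo_greeting(void) { return GREETING; }

/* The comment is replaced by a single space before macros are expanded,
   so the argument is the two tokens a and b separated by white space. */
const char *demo_comment(void) { return STR(a/* removed */b); }

/* Escape sequences are replaced before concatenation, so this is the
   character with value 4 followed by '1', not the single escape \x41. */
const char *demo_escape(void) { return "\x4" "1"; }

/* Hypothetical self-check collecting the expectations of this sketch. */
void phases_selfcheck(void)
{
    assert(strcmp(demo_greeting(), "hello, world") == 0);
    assert(strcmp(demo_comment(), "a b") == 0);
    assert(strlen(demo_escape()) == 2);
    assert(demo_escape()[0] == 4 && demo_escape()[1] == '1');
}
```

Calling phases_selfcheck() from a test driver should pass silently. The last case is the one most likely to matter in practice: splitting a literal into two adjacent pieces is a common way to stop a hexadecimal escape from absorbing the digits that follow it.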
A.12.1 Trigraph Sequences
The character set of C source programs is contained within seven-bit ASCII, but is a superset of the ISO 646-1983 Invariant Code Set. In order to enable programs to be represented in the reduced set, all occurrences of the following trigraph sequences are replaced by the corresponding single character. This replacement occurs before any other processing.
??=  #        ??(  [        ??<  {
??/  \        ??)  ]        ??>  }
??'  ^        ??!  |        ??-  ~
No other such replacements occur.
Trigraph sequences are new with the ANSI standard.
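The replacement rule can be modeled as an ordinary string transformation. The sketch below is not how any particular compiler implements the first phase; replace_trigraphs and the two lookup tables are hypothetical names introduced for illustration. The code avoids spelling a trigraph literally in its own source, so it reads the same whether or not the compiler performs trigraph replacement (some current compilers leave it off unless asked).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Third character of each of the nine trigraphs, and the single
   character that replaces it, listed in matching order. */
static const char tri_third[] = "=(/)'<!>-";
static const char tri_repl[]  = "#[\\]^{|}~";

/* Hypothetical helper: copy src to dst, replacing every trigraph by its
   equivalent. dst must have room for strlen(src) + 1 characters. */
void replace_trigraphs(const char *src, char *dst)
{
    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        const char *hit = NULL;

        /* Two question marks followed by one of the nine listed characters. */
        if (src[0] == '?' && src[1] == '?' && src[2] != '\0')
            hit = strchr(tri_third, src[2]);

        if (hit != NULL) {
            *dst++ = tri_repl[hit - tri_third];
            src += 3;
        } else {
            *dst++ = *src++;   /* no other replacements occur */
        }
    }
    *dst = '\0';
}
```

When writing inputs for this helper as C string literals, the second question mark can be spelled with the escape \? so that the literal itself never contains a trigraph. For example, the literal "?\?=define N" is the text of a define line written with the trigraph for #, and the helper turns it into "#define N".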
A.12.2 Line Splicing
Lines that end with the backslash character \ are folded by deleting the backslash and the following newline character. This occurs before division into tokens.
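Because splicing happens before the text is divided into tokens, a backslash-newline pair may even fall in the middle of an identifier or a string literal. The following sketch shows both cases; BUFFER_LIMIT, spliced_identifier, spliced_string, and splice_selfcheck are hypothetical names used only here.

```c
#include <assert.h>
#include <string.h>

#define BUFFER_LIMIT 42

int spliced_identifier(void)
{
    /* The identifier BUFFER_LIMIT is broken across two source lines.
       The continuation starts in the first column, because any leading
       white space would separate the two halves into different tokens. */
    return BUFFER_\
LIMIT;
}

const char *spliced_string(void)
{
    /* The same folding applies inside a string literal. */
    return "one\
two";
}

/* Hypothetical self-check collecting the expectations of this sketch. */
void splice_selfcheck(void)
{
    assert(spliced_identifier() == 42);
    assert(strcmp(spliced_string(), "onetwo") == 0);
}
```

Splitting tokens this way is rarely good style; the usual use of splicing is to continue a long macro definition onto the next line.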