C Is a Portable Assembler

What This Concept Is

C is usually described as "a high-level language," but for this module it is more useful to think of it as an abstract assembler for a machine that looks like a sequence of addressable bytes.

C exposes:

  • memory as a linear array of bytes with addresses
  • objects, which are regions of that memory with a type, a lifetime, and (usually) a name
  • operations that read and write those bytes using the object's type
  • well-defined, implementation-defined, unspecified, and undefined behavior, depending on how you use those operations

Undefined behavior is the category that separates C from friendlier languages. If you break the rules, the C standard places no requirement on what happens, and the compiler is allowed to assume you did not break them.

Why It Matters Here

Every later topic in this module and the next few modules depends on this mental model:

  • pointers are not "references to objects" in a Java sense; they are addresses with a type
  • arrays decay to pointers in most expressions because an address carries no record of how many elements follow it
  • integer promotion rules exist because C has to pick a width before the machine can execute an operation
  • many security bugs in C come from undefined behavior (buffer overruns, signed overflow, use after free), not from language features being hard to spell

If you only think in "variables that hold values," you will misread C code and blame the machine for bugs that are really contract violations.

Concrete Example

Consider:

#include <stdio.h>

int main(void) {
    int x = 5;
    int *p = &x;
    *p = 7;
    printf("%d\n", x);
    return 0;
}

At the C level this is "change x through a pointer." At the abstract-machine level:

  • x is an object of type int, occupying sizeof(int) bytes at some address.
  • p is a different object, of type int *, holding the address of x.
  • *p = 7 writes through that address using type int.
  • printf("%d\n", x) reads x as an int and prints 7.

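Now change one line to char *q = (char *)&x; *q = 0x7F;. This is still defined: character types may alias any object, although which byte of x changes depends on the platform's byte order. But double *d = (double *)&x; *d = 0.0; is undefined: the strict aliasing rule forbids reading or writing an int object through an unrelated double *, and on most platforms a double is also larger than an int, so the write would run past the end of x.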

Common Confusion / Misconception

"Undefined behavior means it crashes." It does not. Undefined behavior means the standard imposes no requirement. The program can appear to work for years, then break when you change an unrelated line, upgrade the compiler, or build at a different optimization level.

"The machine will do what I wrote." Often yes, but optimizers reason against the abstract machine. If your code relies on signed integer overflow wrapping around, the compiler may assume the overflow never happens and delete a branch that would have handled it.

How To Use It

When you read or write C, mentally run the code against the abstract machine, not the CPU you are sitting in front of:

  1. Name each object: its type, its storage duration, and its address if relevant.
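  2. For every read or write, confirm the type used matches the object's type (or is a character type).
  3. When you do arithmetic, check that the result fits the type the operation is actually performed in, after integer promotions and conversions, and then the type you store it in.
  4. When you call a function, mentally pass every argument by value. A pointer argument is a copy of an address: the callee can change the object it points to, but not the caller's pointer.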

If any step is undefined behavior, stop and rewrite. Do not ship code that "only works at -O0."

Check Yourself

  1. What is the difference between an object and a variable in C?
  2. Give an example of implementation-defined behavior and an example of undefined behavior.
  3. Why can two compilers produce different output from the same C source and both be correct?

Mini Drill or Application

Read the following snippets and classify each as defined, implementation-defined, unspecified, or undefined. Justify each:

  1. int a = INT_MAX; a = a + 1;
  2. unsigned u = UINT_MAX; u = u + 1;
  3. int a[3] = {0}; int x = a[3];
  4. int i = 0; int j = i++ + i++;
  5. char c = 200; int x = c; (on a platform where char is signed)

Read This Only If Stuck