
From C to Assembly: Reading Disassembly and Recognizing Patterns

What This Concept Is

Compilers do not translate C line-by-line. They map C constructs -- loops, conditionals, field accesses, function calls -- onto idioms in assembly. Once you know those idioms you can read almost any optimized output.

The core idioms to internalize:

C construct                           Typical x86_64 idiom (AT&T syntax)
local variable in register            %rax, %rbx, ... (no memory reference)
spilled local on the stack            -0x8(%rbp), -0x10(%rbp), ... (or offsets from %rsp when the frame pointer is omitted)
array index a[i], 4-byte int elements mov (%rdi,%rsi,4), %eax
struct field p->next at offset 8      mov 0x8(%rdi), %rax
if (x == 0)                           test %eax, %eax; jne .Lskip (jump past the body when x != 0)
for (i = 0; i < n; ++i)               xor %ecx, %ecx; .L1: ...; inc %ecx; cmp %edx, %ecx; jl .L1
function call                         mov ..., %rdi; call foo; mov %rax, ...
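To see several of these idioms in isolation, it can help to compile one tiny function per construct. The sketch below is illustrative only: the function names (load_elem, next_of, is_zero, count_positive) are made up for this example, and the instructions named in the comments are what gcc -O2 on x86_64 typically emits, not a guarantee.

```c
#include <stddef.h>

struct node {
    int v;              /* offset 0 */
    struct node *next;  /* offset 8 on x86_64 (after 4 bytes of padding) */
};

/* Array indexing with 4-byte elements: expect a scaled-index load,
   something like mov (%rdi,%rsi,4), %eax. */
int load_elem(const int *a, size_t i) { return a[i]; }

/* Struct field access: expect a fixed displacement off the pointer,
   something like mov 0x8(%rdi), %rax. */
struct node *next_of(const struct node *p) { return p->next; }

/* Comparison against zero: expect test %edi, %edi followed by a
   conditional jump, setcc, or cmov (the optimizer may go branch-free). */
int is_zero(int x) { return x == 0 ? 10 : 20; }

/* Counted loop: expect a counter or pointer register, a compare,
   and a backward conditional jump. */
int count_positive(const int *a, int n) {
    int count = 0;
    for (int i = 0; i < n; ++i)
        if (a[i] > 0) ++count;
    return count;
}
```

Compile with gcc -O2 -S (or paste into Compiler Explorer) and match each function against its row in the table. Where the output differs, the difference is usually an optimization worth understanding in its own right.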

Why It Matters Here

Disassembly is the ground truth. When a profile says "90% of the time is in this function," you need to be able to look at the instructions and say what is slow: a mispredicted branch, a stalled load, a bad inlining decision. Guessing from source is how performance myths spread.

Concrete Example

long sum(const long *a, int n) {
    long s = 0;
    for (int i = 0; i < n; ++i) s += a[i];
    return s;
}

With gcc -O2 for x86_64 this becomes roughly:

sum:
    test  %esi, %esi            # n <= 0?
    jle   .Lempty               # yes: return 0
    mov   %esi, %eax            # eax = n (zero-extended into rax; safe because n > 0)
    lea   (%rdi,%rax,8), %rdx   # rdx = &a[n] (end pointer)
    xor   %eax, %eax            # s = 0
.Lloop:
    add   (%rdi), %rax          # s += *p
    add   $8, %rdi              # p++
    cmp   %rdx, %rdi            # p != end?
    jne   .Lloop
    ret
.Lempty:
    xor   %eax, %eax            # return 0
    ret

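Every instruction maps back to the source. The compiler replaced the index i with a pointer walk (%rdi advances by 8 bytes per iteration) and compares against an end pointer (%rdx) instead of counting up to n -- that is idiomatic loop strength reduction, and it removes i from the loop entirely. The return value comes back in %rax.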
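The transformation is easier to internalize when written back in C. The following is a hand-written sketch of what the compiler effectively did. sum_pointer is a hypothetical name, and the comments pair each statement with the matching instruction from the listing above.

```c
/* The original, index-based form. */
long sum_indexed(const long *a, int n) {
    long s = 0;
    for (int i = 0; i < n; ++i) s += a[i];
    return s;
}

/* The same loop after strength reduction, written by hand. */
long sum_pointer(const long *a, int n) {
    if (n <= 0) return 0;        /* test %esi, %esi; jle .Lempty */
    const long *end = a + n;     /* lea (%rdi,%rax,8), %rdx      */
    long s = 0;                  /* xor %eax, %eax               */
    do {
        s += *a;                 /* add (%rdi), %rax             */
        ++a;                     /* add $8, %rdi                 */
    } while (a != end);          /* cmp %rdx, %rdi; jne .Lloop   */
    return s;
}
```

Both functions should compile to essentially the same loop at -O2. The guard on n up front is what lets the loop body run as a do-while with a single compare-and-branch at the bottom.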

Common Confusion / Misconception

"This assembly has more instructions than my C, so the compiler is bad." Instruction count is the wrong metric. A well-compiled tight loop may look verbose because the compiler unrolled it, software-pipelined it, or split it into vector and scalar paths. What matters is instructions per cycle, cache behaviour, and branch predictability -- not the line count.

Also: at -O0 the output is full of redundant loads and stores because every local lives on the stack. Never judge a compiler's quality from -O0. Always read -O2 at minimum.

How To Use It

Recommended workflow for studying a function:

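  1. Paste it into Compiler Explorer with -O2 -std=c11 -march=native (or -march=rv64g with a RISC-V compiler selected). Note that -march=native there targets the CPU of the Compiler Explorer server, not your own machine.
  2. Choose Intel or AT&T syntax in the output options to match what you are used to. The listings here use AT&T.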
  3. Map the source lines to assembly chunks. Colouring in Compiler Explorer helps.
  4. Identify the loop body. Count loads, stores, and arithmetic ops per iteration -- that gives you a rough lower bound on cycles per iteration.
  5. Look at the prologue and epilogue only if you suspect an ABI issue; otherwise skim them.
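As a small illustration of step 4, here is a loop whose per-iteration work is easy to count from the source before looking at the assembly. scale_add is a hypothetical function, and the counts in the comment assume scalar code with no unrolling or vectorization.

```c
/* y[i] += k * x[i]
 *
 * Expected work per iteration (scalar, not unrolled):
 *   2 loads   (x[i] and y[i])
 *   1 multiply
 *   1 add
 *   1 store   (y[i])
 *   plus loop overhead: pointer/index update, compare, branch.
 *
 * On x86_64 a load or store may be folded into an arithmetic
 * instruction as a memory operand, so count memory operands,
 * not just mov instructions.
 */
void scale_add(long *restrict y, const long *restrict x, long k, int n) {
    for (int i = 0; i < n; ++i)
        y[i] += k * x[i];
}
```

Assuming, purely for illustration, a core that can issue two loads and one store per cycle, the memory traffic alone puts the floor at about one cycle per iteration. If the compiler unrolls or vectorizes the loop, apply the same counting to the widened body and divide by the number of elements it handles.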

For already-built binaries, use objdump -d -M intel binary | less (drop -M intel to get AT&T syntax, as used in the listings here), or gdb with disassemble function and layout asm.

Check Yourself

  1. Why does gcc -O2 often replace for (i=0; i<n; ++i) sum += a[i] with a pointer-compare loop?
  2. What does test %eax, %eax; je .L compute?
  3. How do you recognize a struct field access in disassembly?
  4. Why is -O0 a misleading baseline for understanding what the compiler actually does?

Mini Drill or Application

For each C snippet, predict the assembly shape (loads, stores, branches, and register moves), then verify in Compiler Explorer:

int is_even(int x) { return (x & 1) == 0; }
int first_nonzero(const int *a, int n);
double dot(const double *a, const double *b, int n);
struct node { int v; struct node *next; };
int list_length(struct node *h);

Write a one-paragraph annotation for each disassembly, pointing out the idioms.
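Several of the drill snippets are prototypes only, so there is nothing to compile until they have bodies. One plausible set of bodies is sketched below. The semantics are assumptions, not part of the exercise as stated: first_nonzero is taken to return the index of the first nonzero element (or -1 if there is none), and dot is taken to return a double.

```c
#include <stddef.h>

struct node { int v; struct node *next; };

/* Likely branch-free: a mask of the low bit and a compare or flip. */
int is_even(int x) { return (x & 1) == 0; }

/* Assumed semantics: index of the first nonzero element, or -1. */
int first_nonzero(const int *a, int n) {
    for (int i = 0; i < n; ++i)
        if (a[i] != 0) return i;   /* early exit: a second branch in the loop */
    return -1;
}

/* Assumed to return double; look for floating-point registers (%xmm0, ...). */
double dot(const double *a, const double *b, int n) {
    double s = 0.0;
    for (int i = 0; i < n; ++i) s += a[i] * b[i];
    return s;
}

/* Pointer chasing: each iteration loads h->next from a fixed offset. */
int list_length(struct node *h) {
    int len = 0;
    for (; h != NULL; h = h->next) ++len;
    return len;
}
```

Predict the shape of each function first, then compare against the compiler output. list_length is a good one to contrast with the array loops, because the next address is not known until the previous load completes.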

Read This Only If Stuck