From C to Assembly: Reading Disassembly and Recognizing Patterns

What This Concept Is

Compilers do not translate C line-by-line. They map C constructs -- loops, conditionals, field accesses, function calls -- onto idioms in assembly. Once you know those idioms you can read almost any optimized output.

The core idioms to internalize:

C construct	Typical x86_64 idiom
local variable in register	`%rax`, `%rbx`, ... (no memory reference)
spilled local on the stack	`-0x8(%rbp)`, `-0x10(%rbp)`, ...
array index `a[i]`, `int` elements, 64-bit `i`	`mov (%rdi,%rsi,4), %eax`
struct field `p->next` at offset 8	`mov 0x8(%rdi), %rax`
`if (x == 0)`	`test %eax, %eax; je .Lzero` (jump taken when x is 0)
`for (i = 0; i < n; ++i)`	`xor %ecx, %ecx; .L1: ...; inc %ecx; cmp %edx, %ecx; jl .L1`
function call	`mov`...; `call foo`; `mov %rax, ...`
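
To see several of these idioms side by side, a small file like the following can be compiled with -O2 and inspected. It is a sketch only: the function names are hypothetical, the struct layout assumes a typical x86_64 ABI (4-byte int, 8-byte pointers), and the exact instructions vary between compilers and versions.

```c
#include <stddef.h>

/* Hypothetical names throughout; none come from the table above. */
struct node {
    int v;              /* offset 0 */
    struct node *next;  /* offset 8 on typical x86_64 ABIs (after padding) */
};

/* Array index with int elements and a 64-bit index:
   expect a scaled load such as mov (%rdi,%rsi,4), %eax. */
int elem(const int *a, size_t i) { return a[i]; }

/* Struct field at offset 8: expect mov 0x8(%rdi), %rax. */
struct node *next_of(const struct node *p) { return p->next; }

/* Comparison with zero: expect test %edi, %edi followed by a
   conditional jump or a flag-setting instruction, depending on the compiler. */
int is_zero(int x) {
    if (x == 0) return 1;
    return 0;
}

/* Counted loop: expect a compare and a backward conditional jump. */
int count_positive(const int *a, int n) {
    int c = 0;
    for (int i = 0; i < n; ++i)
        if (a[i] > 0) ++c;
    return c;
}
```

Declaring the index as size_t keeps the load in the one-instruction form from the table; with an int index the compiler usually has to sign-extend it first.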

Why It Matters Here

Disassembly is the ground truth. When a profile says "90% of the time is in this function," you need to be able to look at the instructions and say what is slow: a mispredicted branch, a stalled load, a bad inlining decision. Guessing from source is how performance myths spread.

Concrete Example

long sum(const long *a, int n) {
    long s = 0;
    for (int i = 0; i < n; ++i) s += a[i];
    return s;
}

With gcc -O2 for x86_64 this becomes roughly:

sum:
    test    %esi, %esi              # n <= 0?
    jle     .Lempty                 # yes: return 0
    mov     %esi, %eax              # eax = n
    lea     (%rdi,%rax,8), %rdx     # rdx = &a[n]  (end pointer)
    xor     %eax, %eax              # s = 0
.Lloop:
    add     (%rdi), %rax            # s += *p
    add     $8, %rdi                # p++
    cmp     %rdx, %rdi              # p != end?
    jne     .Lloop
    ret
.Lempty:
    xor     %eax, %eax
    ret

Every line maps to something. The compiler converted the index i into a pointer walk (%rdi) and compared against an end pointer (%rdx) -- that is idiomatic loop strength reduction. The return value comes back in %rax.
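
The same transformation can be written by hand in C, which makes the correspondence easier to follow. This is a sketch, not compiler output: sum_indexed and sum_ptr are hypothetical names, and the comments pair each statement with the instruction from the listing above that plays the same role.

```c
/* Indexed form, as in the example above. */
long sum_indexed(const long *a, int n) {
    long s = 0;
    for (int i = 0; i < n; ++i) s += a[i];
    return s;
}

/* Hand-written pointer-walk equivalent (hypothetical name). */
long sum_ptr(const long *a, int n) {
    if (n <= 0) return 0;        /* test %esi, %esi ; jle .Lempty */
    const long *end = a + n;     /* lea (%rdi,%rax,8), %rdx       */
    long s = 0;                  /* xor %eax, %eax                */
    do {
        s += *a;                 /* add (%rdi), %rax              */
        ++a;                     /* add $8, %rdi                  */
    } while (a != end);          /* cmp %rdx, %rdi ; jne .Lloop   */
    return s;
}
```

The up-front guard is what lets the loop test sit at the bottom: once n is known to be positive, the body can run at least once without checking first.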

Common Confusion / Misconception

"This assembly has more instructions than my C, so the compiler is bad." Instruction count is the wrong metric. A well-compiled tight loop may look verbose because the compiler unrolled it, software-pipelined it, or split it into vector and scalar paths. What matters is instructions per cycle, cache behaviour, and branch predictability -- not the line count.

Also: at -O0 the output is full of redundant loads and stores because every local lives on the stack. Never judge a compiler's quality from -O0. Always read -O2 at minimum.
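
To make the unrolling point concrete, here is what a four-way unrolled sum looks like when written out by hand. It is an illustration of the shape only, under the assumption of four accumulators; sum_unrolled4 is a hypothetical name, and a real compiler chooses its own unroll factor or vectorizes instead.

```c
/* Hand-unrolled version of sum (hypothetical name). */
long sum_unrolled4(const long *a, int n) {
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    int i = 0;

    /* Main path: four independent accumulators per iteration. */
    for (; n - i >= 4; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }

    long s = s0 + s1 + s2 + s3;

    /* Remainder path: the last n % 4 elements. */
    for (; i < n; ++i) s += a[i];
    return s;
}
```

The source is several times longer than the one-line loop, yet it does the same work with fewer loop-control instructions per element. Longer output is not evidence of worse code.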

How To Use It

Recommended workflow for studying a function:

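Paste it into Compiler Explorer with -O2 -std=c11 and an explicit -march for the CPU you care about (for example -march=x86-64-v3, or -march=rv64g with a RISC-V compiler). Avoid -march=native there: it targets the machine running the compiler, not yours.
Choose Intel or AT&T syntax with the "Intel asm syntax" output option to match what you are used to. The listings here use AT&T.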
Map the source lines to assembly chunks. Colouring in Compiler Explorer helps.
Identify the loop body. Count loads, stores, and arithmetic ops per iteration -- that gives you a rough lower bound on cycles per iteration.
Look at the prologue and epilogue only if you suspect an ABI issue; otherwise skim them.

For already-built binaries, use objdump -d binary | less (add -M intel if you prefer Intel syntax; the listings here use AT&T), or gdb with disassemble function and layout asm.
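
One practical snag with built binaries: a small static helper that gets inlined may leave no separate body to disassemble. A possible workaround, sketched below, is to mark the function noinline while you study it. The attribute spelling is a GCC/Clang extension, and sum_first_three is a hypothetical caller added only for illustration.

```c
/* Kept out of line so the binary still contains a separate body for sum.
   __attribute__((noinline)) is a GCC/Clang extension. */
static __attribute__((noinline)) long sum(const long *a, int n) {
    long s = 0;
    for (int i = 0; i < n; ++i) s += a[i];
    return s;
}

/* Hypothetical caller: without noinline, an optimizer could fold this
   call into a constant and drop the static helper entirely. */
long sum_first_three(void) {
    static const long data[] = {10, 20, 30};
    return sum(data, 3);
}
```

Remove the attribute again before measuring performance, since it changes the code being measured. Even with it, the compiler may emit a specialised copy under a slightly different symbol name, so search the objdump output for names starting with sum.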

Check Yourself

Why does gcc -O2 often replace for (i=0; i<n; ++i) sum += a[i] with a pointer-compare loop?
What does test %eax, %eax; je .L compute?
How do you recognize a struct field access in disassembly?
Why is -O0 a misleading baseline for understanding what the compiler actually does?

Mini Drill or Application

For each C snippet, predict the assembly shape (loads, stores, branches, and register moves), then verify in Compiler Explorer:

int is_even(int x) { return (x & 1) == 0; }
int first_nonzero(const int *a, int n);
double dot(const double *a, const double *b, int n);
struct node { int v; struct node *next; };
int list_length(struct node *h);

Write a one-paragraph annotation for each disassembly, pointing out the idioms.
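
The prototypes above need bodies before Compiler Explorer can show anything. One possible set of implementations follows. The contracts are assumptions, not part of the drill: first_nonzero is taken to return the index of the first nonzero element or -1 if there is none, and dot is taken to return a double.

```c
#include <stddef.h>

struct node { int v; struct node *next; };

int is_even(int x) { return (x & 1) == 0; }

/* Assumed contract: index of the first nonzero element, or -1 if none. */
int first_nonzero(const int *a, int n) {
    for (int i = 0; i < n; ++i)
        if (a[i] != 0) return i;
    return -1;
}

/* Assumed to return double, matching the element type. */
double dot(const double *a, const double *b, int n) {
    double s = 0.0;
    for (int i = 0; i < n; ++i) s += a[i] * b[i];
    return s;
}

/* Walks the list until the NULL terminator. */
int list_length(struct node *h) {
    int len = 0;
    for (; h != NULL; h = h->next) ++len;
    return len;
}
```

Make your predictions before compiling: the early return in first_nonzero, the floating-point registers in dot, and the dependent load in list_length each give a distinct shape.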
