Skip to main content

Control Flow at the Machine Level: Branches, Calls, and Returns

What This Concept Is

At the ISA level, control flow is the set of instructions that change PC to something other than the next instruction:

  • unconditional jumps -- jmp target / j target
  • conditional branches -- je, jne, jl, jg, jz (x86) or beq, bne, blt, bge (RISC-V)
  • direct calls -- call target pushes the return address and jumps
  • indirect calls -- call *%rax jumps through a register (used by vtables, function pointers)
  • returns -- ret pops a return address from the stack and jumps to it

Conditional branches depend on flags (x86) or on a register comparison in the branch opcode itself (RISC-V). A "taken" branch changes PC to the target; a "not-taken" branch falls through.

Why It Matters Here

Branches are where performance most often leaks out. A modern CPU prefetches and speculatively executes past them; a mispredict forces it to throw away 10-20 cycles of work. A call-return pair walks the stack; a mismatched pair corrupts it. A function pointer that cannot be resolved statically becomes an indirect call that the predictor may or may not cope with.

You will also see control flow at the bottom of every if, for, and while. Compiled loops look like "straight line code, back-branch" -- and the branch predictor's view of that back-branch shapes the loop's throughput.

Concrete Example

int max(int a, int b) { return a > b ? a : b; }

compiles (with -O2) to a branch-free cmov sequence on x86_64:

max:
mov %edi, %eax # eax = a
cmp %esi, %edi # a - b, set flags
cmovl %esi, %eax # if a < b, eax = b
ret

No jump is needed. But:

void loop(int *a, int n) {
for (int i = 0; i < n; ++i) a[i]++;
}

becomes:

loop:
test %esi, %esi
jle .Ldone # skip if n <= 0
movslq %esi, %rsi
lea (%rdi,%rsi,4), %rdx # end pointer
.L1:
addl $1, (%rdi)
add $4, %rdi
cmp %rdx, %rdi
jne .L1 # back-edge, taken (n-1) times
.Ldone:
ret

The jne at .L1 is strongly predicted taken. It mispredicts only once (when the loop exits), which is why well-structured loops are fast.

A function call looks like:

    ...                           # arguments in rdi, rsi, ...
call foo # push return address, jump to foo
... # resume here; foo's result is in rax

and the matching ret inside foo pops the address back.

Common Confusion / Misconception

"Branches are slow." Branches are only slow when they mispredict. A loop that iterates a million times with a taken back-edge has essentially zero branch cost -- the predictor gets it right every time. Random, data-dependent branches (e.g. if (a[i] > threshold) on sorted vs unsorted data) are the ones that cause mispredicts and visible slowdowns. This is the entire moral of Mike Acton's sorted-vs-unsorted microbenchmark.

Another trap: ret does not magically know where to return. It reads the top of the stack. If anyone has corrupted the return address, ret will happily jump into whatever bytes are there -- the root of many exploits and most "random" crashes.

How To Use It

  1. When you see cmp a, b followed by a conditional jump, map it back to the C comparison. Watch the sense (signed vs unsigned; jl uses signed flags, jb uses carry).
  2. When you see a long if/else chain, ask whether a cmov or a jump table would be faster -- compilers choose both.
  3. For loops, find the back-edge branch and ask: is it taken most of the time? If yes, the predictor handles it. If no, it is a perf hazard.
  4. For virtual dispatch, find the indirect call (call *(%rax) or jalr). That is where a polymorphic call really happens.

Check Yourself

  1. What are the three operations a call instruction performs atomically?
  2. Why does ret not need an explicit target operand?
  3. How does the compiler turn a ternary expression into branch-free code on x86?
  4. Why is a tight, counted loop's back-edge essentially free, predictor-wise?

Mini Drill or Application

For each C fragment, decide whether the compiler will likely use a jump, a conditional move, or a jump table, and check in Compiler Explorer at -O2:

int abs3(int x) { return x < 0 ? -x : x; }
int select(int flag, int a, int b) { return flag ? a : b; }
int classify(int c) {
switch (c) { case 1: return 10; case 2: return 20; case 3: return 30;
case 4: return 40; default: return -1; }
}

Also trace, by hand, a call foo followed by a ret -- show exactly what %rsp and (%rsp) look like before the call, inside foo, and after the ret.

Read This Only If Stuck