Skip to main content

The ALU and How Numbers Are Added and Multiplied

What This Concept Is

The arithmetic-logic unit (ALU) is the part of the CPU that computes on fixed-width integers. For every cycle it is driven by control signals it produces one of a small set of results:

  • arithmetic: add, subtract, negate, compare, increment, decrement
  • logical: AND, OR, XOR, NOT
  • shifts: left, logical right, arithmetic right
  • flags: zero (Z), sign (S/N), carry (C), overflow (V/O)

Integers are stored in two's complement: the top bit is a sign, -x is ~x + 1, and the same binary adder handles signed and unsigned addition. Addition ripples carries through a chain of full-adders; multiplication is traditionally a sequence of shifts and adds, though modern ALUs use a multi-cycle hardware multiplier (often Booth or Wallace-tree) that produces a 64-bit product from two 32-bit operands in a handful of cycles.

Why It Matters Here

Nearly every instruction ultimately lands on the ALU. Understanding it lets you:

  • predict the cost of add vs mul vs div (one cycle, a few cycles, tens of cycles)
  • read flag-dependent branches (jz, jo, js, jc) without guessing
  • notice when a compiler converts x * 17 into (x << 4) + x because shifts and adds are cheaper than multiplies
  • reason about integer overflow, signed vs unsigned comparison, and the subtle bugs that come from mixing them

Concrete Example

Two's complement addition on an 8-bit ALU:

   01001101   (77)
+ 00110110 (54)
-----------
10000011 (-125 as signed, 131 as unsigned)
carry out = 0, overflow = 1 -> signed overflow occurred

The bits are the same; the interpretation changes. The ALU sets V=1 because two positives summed to a negative -- the signed result is wrong. For unsigned math the answer is fine and C=0.

Compare x == y via subtraction:

    sub     %rbx, %rax       # rax -= rbx; flags set from the result
je equal # jump if Z==1

There is no separate equality circuit; subtraction is the comparison, and the Z flag carries the answer.

Multiplication via shift-and-add (how the hardware does it for a 4x4):

  a = 1011  (11)
b = 0110 (6)

b bit 0 = 0 -> 0000
b bit 1 = 1 -> 10110 (a << 1)
b bit 2 = 1 -> 101100 (a << 2)
b bit 3 = 0 -> 0

sum = 1000010 = 66 ✔

Modern CPUs do all four partial products in parallel, then collapse them with a carry-save adder tree, so an integer multiply typically costs 3-5 cycles, not one.

Common Confusion / Misconception

"== is free but * is expensive." Equality still goes through the ALU (it is a subtract and a flag read), and on most modern cores it costs exactly the same as any other ALU op. Multiplication has higher latency than add but is not serial; the CPU can often issue an independent multiply every cycle. Integer division is the one that really hurts -- often 20-40 cycles and not pipelined.

How To Use It

  • When tuning inner loops, prefer add/sub/shift/AND/OR/XOR. Replace multiplies by constants with shifts and adds when the compiler will not.
  • Respect signed overflow in C: int overflow is undefined behaviour. Use unsigned when you truly want modular arithmetic.
  • For comparisons, remember that < on int uses the signed flags (S, V) while < on unsigned uses the carry flag (C). That is why mixing signed and unsigned in a < b can return the wrong answer.
  • For cost modelling, remember: add ~1 cycle, multiply ~3-5 cycles, divide ~20+ cycles, shifts by a constant ~1 cycle.

Check Yourself

  1. Why does the same adder work for both signed and unsigned addition?
  2. What do the Z, N, C, and V flags represent, and which instruction sets them?
  3. How does a cmp a, b followed by jl target actually decide whether to jump?
  4. Why does integer division cost so much more than multiplication on a modern CPU?

Mini Drill or Application

For each expression, list the ALU operations and flag reads you would expect in compiled x86_64 code:

  1. x == y
  2. x < y where both are int
  3. x < y where both are unsigned
  4. a * 9
  5. a / 7 where a is a non-constant unsigned long

Check in Compiler Explorer at -O2. Note the strength-reduction tricks the compiler uses for constant multiplies and divides (magic numbers and multi-shifts).

Read This Only If Stuck