Threads, Shared State, and Non-Atomic Operations
What This Concept Is
A thread is an independently scheduled stream of instructions that runs inside a process and shares the process's address space with every other thread in that process.
Two facts follow directly:
- if two threads touch the same variable, they are reading and writing the same memory cell
- the OS can preempt a thread at almost any instruction boundary
A third fact is less obvious and more dangerous: most "operations" that look atomic in source code are not atomic at the hardware level. counter = counter + 1 compiles to three steps on almost every ISA:
load counter -> registeradd 1 to registerstore register -> counter
Between any two of those steps the scheduler can switch to another thread on the same core, or another core can be executing its own load-modify-store on the same memory at the same moment.
Why It Matters Here
The rest of this module is built on this single observation. Locks exist because high-level operations decompose into multiple hardware steps. Memory barriers exist because even those steps can be reordered. Lock-free programming exists because we have hardware instructions that are atomic at that lower level.
If you stay at source-code granularity, you will not see the bug. You must read concurrent code at the interleaving level.
Concrete Example
Two threads, each incrementing a shared counter 1,000,000 times:
int counter = 0;
void *worker(void *arg) {
for (int i = 0; i < 1000000; i++) {
counter = counter + 1;
}
return NULL;
}
Expected final value: 2,000,000. Actual final value on a real machine: something smaller, often much smaller, and different every run.
The schedule that loses an update:
| Step | Thread A | Thread B | counter in memory |
|---|---|---|---|
| 1 | load counter (= 100) into regA | 100 | |
| 2 | load counter (= 100) into regB | 100 | |
| 3 | add 1, regA = 101 | 100 | |
| 4 | add 1, regB = 101 | 100 | |
| 5 | store regA -> counter | 101 | |
| 6 | store regB -> counter | 101 |
Two +1 operations both happened, but the counter only advanced by 1.
Common Confusion / Misconception
"If my test never prints a wrong value, there is no race."
A race is a property of the program and the memory model, not of the executions you happened to see. The compiler can reorder instructions, the CPU can reorder memory operations, and the scheduler chooses different interleavings every run. A data race that is "lucky today" can be catastrophic under load, on a different CPU, or after a compiler upgrade.
A second misconception: "this variable is just a flag, it is one byte, so writes to it are atomic." On most architectures single-byte loads and stores are indeed atomic, but that does not mean the update is atomic, that the visibility between cores is guaranteed, or that the compiler will not reorder it with surrounding code.
How To Use It
When you look at concurrent code:
- Identify every shared variable (globals, heap data reachable from multiple threads, captured references).
- For each access, ask what hardware-level steps it decomposes into.
- For each pair of accesses from different threads, ask whether one is a write.
- If yes, that pair is a candidate race. Either synchronize it or prove the memory model makes it safe.
If you cannot name the shared cells, you cannot reason about concurrency.
Check Yourself
- Why is
counter = counter + 1not one step at the hardware level? - If you never observe the bug during testing, why might the race still exist?
- Name three kinds of state that are shared between threads in a single process.
Mini Drill or Application
Write the following in C with pthreads, run it, and record the final value over 10 runs:
int counter = 0;
void *inc(void *_) {
for (int i = 0; i < 100000; i++) counter++;
return NULL;
}
Create two threads that both call inc. Expected value is 200000. Log the actual values. Then write one sentence explaining why the spread exists.