Spinlocks vs Blocking Mutexes
What This Concept Is
A lock is the simplest mechanism that enforces mutual exclusion on a critical section. Two implementation strategies compete:
- Spinlock. The waiting thread stays on the CPU and tests the lock in a tight loop until it succeeds.
- Blocking mutex. The waiting thread is removed from the scheduler's run queue, put to sleep on a wait queue, and woken by the OS when the lock is released.
They look identical in source code (lock(m); ...; unlock(m);). They have opposite performance profiles:
| Dimension | Spinlock | Blocking mutex |
|---|---|---|
| Cost when uncontended | tiny (one atomic op) | tiny (one atomic op fast path) |
| Cost when held briefly | tiny (spin a few cycles) | large (syscall + context switch) |
| Cost when held long | terrible (wastes CPU) | fine (sleeps) |
| Works if holder can be preempted | poorly (waiters burn CPU until the holder runs again) | yes |
| Usable in interrupt / atomic kernel context | yes | no (it may sleep) |
Most production libraries combine both: try a bounded spin first, then fall back to blocking. This is called an adaptive mutex or two-phase lock.
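As a minimal sketch of the two pure strategies, the C11 code below puts a test-and-set spinlock next to a pthread_mutex_t guarding the same kind of counter. The names demo_spinlock, demo_spin_lock, demo_spin_unlock, increment_with_spinlock, and increment_with_mutex are illustrative and not part of any library. The spinlock has no backoff or fairness; it only shows the shape of the retry loop.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical spinlock type: one atomic flag, set while the lock is held. */
typedef struct { atomic_flag held; } demo_spinlock;

static void demo_spin_lock(demo_spinlock *l) {
    /* test_and_set returns the previous value: true means another thread
       holds the lock, so stay on the CPU and retry. */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire)) {
        /* busy-wait */
    }
}

static void demo_spin_unlock(demo_spinlock *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

static demo_spinlock spin = { ATOMIC_FLAG_INIT };
static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
long counter_spin = 0;
long counter_mtx = 0;

/* Same call shape, different waiting behavior under contention. */
void increment_with_spinlock(void) {
    demo_spin_lock(&spin);
    counter_spin++;
    demo_spin_unlock(&spin);
}

void increment_with_mutex(void) {
    int rc = pthread_mutex_lock(&mtx);   /* may put the caller to sleep */
    assert(rc == 0);
    (void)rc;
    counter_mtx++;
    pthread_mutex_unlock(&mtx);
}
```

Both increment functions are safe to call from several threads at once; only the cost of waiting differs.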
Why It Matters Here
Choosing the wrong lock type turns a correct program into a slow program. Choosing the wrong lock type in the kernel can deadlock the machine. The decision also affects how you write critical sections: under a spinlock you must never take a page fault, allocate memory in a way that can sleep, or make a blocking call.
Understanding the difference is also the only way to read kernel code, database engines, or high-performance runtimes, where both kinds of lock coexist.
Concrete Example
Think about a counter that is incremented thousands of times per second by many threads.
- If each increment takes a few nanoseconds of critical-section work, a spinlock is almost always better: the cost of a context switch (1-10 microseconds) dwarfs the spin.
- If the critical section involves a disk read, a spinlock is a disaster: all waiters burn CPU while the holder is blocked on I/O. A blocking mutex is required.
A real-world collapse: a web server used a spinlock to guard the critical section that protected its cache. One request happened to take a page fault inside that section. The other cores spun for the entire page-fault latency, and throughput fell to a single core's worth.
Common Confusion / Misconception
"Spinlocks are faster." Only when the critical section is short, the expected spin is short, and the holder cannot be preempted. Outside that envelope they are slower and sometimes catastrophic.
"pthread_mutex_t is a blocking mutex, so it never spins." Modern implementations take the uncontended fast path with a single atomic operation, and some spin briefly before sleeping: musl does so by default, and glibc does so when the mutex type is PTHREAD_MUTEX_ADAPTIVE_NP. The classification is about behavior under contention, not about the API surface.
"If I am in user space I cannot use a spinlock." You can, but a preemption of the holder can make every other spinner waste a full time slice. In user space, prefer adaptive mutexes unless you have measured the workload.
How To Use It
Choose a spinlock only if all of the following hold:
- The critical section is short: well under the cost of a context switch, so at most a few microseconds.
- The holder will not block, fault, or be preempted.
- Contention is low, or your spinlock is backoff-aware and queue-fair.
Otherwise choose a blocking mutex, which in practice means pthread_mutex_t, std::mutex, synchronized in Java, or the equivalent. Reach for adaptive two-phase locks when you have measurable contention with short holds.
Check Yourself
- Why does a spinlock that guards a disk read destroy throughput?
- Why is a context switch expensive compared to an atomic operation?
- Under what condition is a spinlock strictly better than a blocking mutex even in user space?
Mini Drill or Application
Implement two versions of the shared-counter increment workload:
- A spinlock using __atomic_test_and_set with a tight retry loop.
- A blocking mutex using pthread_mutex_t.
Run each with 2, 4, and 8 threads. Record throughput. Explain the cross-over point in one sentence.
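One possible starting harness for the drill is sketched below. It assumes GCC or Clang (for the __atomic builtins) and a POSIX system. The names spin_worker, mutex_worker, run_workload, and run_drill are hypothetical, and the measurement is deliberately crude: a single run per configuration, no warm-up, no thread pinning.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define MAX_THREADS 64

static char spin_flag;                                  /* byte for __atomic_test_and_set */
static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static long counter;                                    /* protected by the lock under test */
static long iters_per_thread;

void *spin_worker(void *arg) {
    (void)arg;
    for (long i = 0; i < iters_per_thread; i++) {
        /* Tight retry loop: returns true while another thread holds the flag. */
        while (__atomic_test_and_set(&spin_flag, __ATOMIC_ACQUIRE)) { }
        counter++;
        __atomic_clear(&spin_flag, __ATOMIC_RELEASE);
    }
    return NULL;
}

void *mutex_worker(void *arg) {
    (void)arg;
    for (long i = 0; i < iters_per_thread; i++) {
        pthread_mutex_lock(&mtx);
        counter++;
        pthread_mutex_unlock(&mtx);
    }
    return NULL;
}

/* Runs nthreads copies of worker; returns elapsed seconds and stores the
   final counter value in *final_count. */
double run_workload(void *(*worker)(void *), int nthreads, long iters,
                    long *final_count) {
    pthread_t tids[MAX_THREADS];
    struct timespec t0, t1;

    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    counter = 0;
    iters_per_thread = iters;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tids[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    *final_count = counter;
    return (double)(t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

/* Prints throughput for 2, 4, and 8 threads with each lock type. */
void run_drill(long iters) {
    static const int thread_counts[] = {2, 4, 8};
    for (size_t i = 0; i < sizeof thread_counts / sizeof thread_counts[0]; i++) {
        long n;
        double s = run_workload(spin_worker, thread_counts[i], iters, &n);
        printf("spin  %d threads: %.0f increments/s\n", thread_counts[i], n / s);
        s = run_workload(mutex_worker, thread_counts[i], iters, &n);
        printf("mutex %d threads: %.0f increments/s\n", thread_counts[i], n / s);
    }
}
```

Calling run_drill(1000000) from main and compiling with -pthread prints one throughput line per configuration. Where the cross-over falls depends on the core count and the scheduler, so the numbers are only meaningful for the machine that produced them.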
Hybrid: Adaptive and Two-Phase Locks
Real mutexes are rarely pure. Many production mutexes are adaptive (or two-phase) locks: on contention the waiter spins for a short, bounded number of iterations; if the lock is still not free, it falls back to a blocking wait, which on Linux is a futex wait that parks the thread in the kernel. glibc's pthread_mutex_t behaves this way when the mutex type is PTHREAD_MUTEX_ADAPTIVE_NP; its default type skips the spin phase and goes straight to the futex wait. The two-phase design captures the best of both: cheap acquire when the holder is about to finish, cheap waiting when the holder is long-running or blocked. The spin phase typically issues a spin-wait hint on each iteration (PAUSE on x86, YIELD on Arm) so that the waiting core does not hammer the memory system. Java's synchronized and Go's sync.Mutex follow the same spin-then-park pattern. When you implement your own lock, mimic this structure rather than choosing a single strategy.
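The spin-then-block structure can be sketched on top of an ordinary pthread_mutex_t. This is an illustration under stated assumptions, not how any particular library implements it: two_phase_lock, cpu_relax, and SPIN_LIMIT are hypothetical names, the bound of 100 iterations is arbitrary, and a production lock would usually read the lock word before attempting the atomic operation.

```c
#include <assert.h>
#include <pthread.h>

/* Spin-wait hint; expands to nothing on architectures not listed here. */
#if defined(__x86_64__) || defined(__i386__)
#  define cpu_relax() __builtin_ia32_pause()
#elif defined(__aarch64__)
#  define cpu_relax() __asm__ __volatile__("yield")
#else
#  define cpu_relax() ((void)0)
#endif

enum { SPIN_LIMIT = 100 };   /* assumed bound; tune by measurement */

/* Phase 1: try to take the lock without sleeping, a bounded number of times.
   Phase 2: give up spinning and block in pthread_mutex_lock.
   Returns 1 if the lock was acquired while spinning, 0 if the caller blocked. */
int two_phase_lock(pthread_mutex_t *m) {
    for (int i = 0; i < SPIN_LIMIT; i++) {
        if (pthread_mutex_trylock(m) == 0)
            return 1;
        cpu_relax();
    }
    int rc = pthread_mutex_lock(m);
    assert(rc == 0);
    (void)rc;
    return 0;
}
```

The lock is released with a plain pthread_mutex_unlock(m). The return value exists only so that a caller can count how often the spin phase was enough.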
Priority Inversion: A Gotcha for Blocking Mutexes
A low-priority thread holds a mutex; a high-priority thread tries to acquire it and blocks; a medium-priority thread runs and preempts the low-priority holder indefinitely. The high-priority thread is now effectively waiting on the medium-priority thread, even though the two share no lock. This is priority inversion, and it famously caused the repeated system resets on Mars Pathfinder in 1997. Fixes include priority inheritance (the holder temporarily inherits the highest priority of any waiter) and priority ceiling (the holder runs at a predeclared ceiling priority). Spinlocks are not immune. They avoid the problem only where the holder cannot be preempted, as with kernel spinlocks that disable preemption on the holding CPU. In user space under strict priority scheduling, a high-priority thread that spins on a lock held by a preempted low-priority thread on the same core can spin forever, which is worse than blocking.
The Kernel Rule: Spin Only
Inside an operating-system kernel the choice is often forced: you must use a spinlock because you cannot block. Interrupt handlers in particular run in a context where the scheduler cannot put the current code to sleep, so taking a blocking mutex there is a bug that can hang the machine. Linux enforces this with two separate APIs: spin_lock (never sleeps, so it is usable in atomic context; use the spin_lock_irqsave variant when the lock is shared with an interrupt handler) and mutex_lock (may sleep; forbidden in atomic context). Device drivers that break this rule produce the classic "BUG: scheduling while atomic" oops. The lesson for user-space code is the same in miniature: any code that must not block (signal handlers, real-time audio callbacks) must not take a blocking mutex either. A spinlock is not automatically the answer, because a signal handler that spins on a lock held by the thread it interrupted never returns. Prefer lock-free shared state, or a try-lock that skips the work when the lock is busy.
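For the user-space case, the lock-free option can be as small as an atomic counter updated from a signal handler. The sketch below assumes a platform where atomic_int is lock-free (true on mainstream targets); events, on_signal, install_handler, and events_seen are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>

/* Shared state touched from the handler: a lock-free atomic, no lock at all. */
static atomic_int events;

static void on_signal(int signo) {
    (void)signo;
    /* A single atomic read-modify-write cannot block or deadlock, even if
       the interrupted thread was in the middle of reading the counter. */
    atomic_fetch_add_explicit(&events, 1, memory_order_relaxed);
}

/* Installs on_signal for signo. Returns 0 on success, -1 on failure. */
int install_handler(int signo) {
    struct sigaction sa;

    assert(signo > 0);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Number of signals counted so far. */
int events_seen(void) {
    return atomic_load_explicit(&events, memory_order_relaxed);
}
```

The rest of the program polls events_seen() at a convenient point and never shares a lock with the handler, so the handler cannot wait on a holder that it has itself interrupted.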