Skip to main content

Test-and-Set, Compare-and-Swap, and Lock Implementations

What This Concept Is

Locks are software objects built on top of atomic hardware instructions. The canonical constructions are:

TAS-based spinlock (test-and-set)

typedef struct { int held; } spinlock_t;

void spin_lock(spinlock_t *l) {
while (__atomic_test_and_set(&l->held, __ATOMIC_ACQUIRE)) {
/* spin */
}
}
void spin_unlock(spinlock_t *l) {
__atomic_clear(&l->held, __ATOMIC_RELEASE);
}

Correct but unfair: the thread that happens to see held == 0 first wins. Under contention it can starve waiters on other cores.

Ticket lock (fetch-and-add)

Every waiter takes a ticket; the lock broadcasts the currently-serving ticket. This gives FIFO fairness and bounded waiting.

typedef struct { unsigned ticket, serving; } ticket_t;

void ticket_lock(ticket_t *l) {
unsigned my = __atomic_fetch_add(&l->ticket, 1, __ATOMIC_RELAXED);
while (__atomic_load_n(&l->serving, __ATOMIC_ACQUIRE) != my) ;
}
void ticket_unlock(ticket_t *l) {
__atomic_fetch_add(&l->serving, 1, __ATOMIC_RELEASE);
}

CAS (compare-and-swap)

cas(addr, expected, new) atomically sets *addr = new if *addr == expected. It is the universal primitive: anything lock-free can be built from it. A lock using CAS:

while (!__atomic_compare_exchange_n(&l->held, &(int){0}, 1, false,
__ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) ;

Queue-based locks (MCS, CLH)

Each waiter enqueues a local node and spins on its own cache line. When the predecessor releases, it writes to the successor's node. Because each waiter spins on a separate cache line, the lock scales far better under heavy contention.

Why It Matters Here

Locks are not magic. They are short programs. Reading them teaches you:

  • that fairness is an explicit design choice
  • that cache-line contention is a real lock-implementation concern
  • that memory ordering is half of every correct lock
  • that CAS is the primitive every lock-free algorithm is built on

When you understand how to build a lock, you understand what a lock is costing you.

Concrete Example

Compare TAS and ticket under 16 contending threads doing 1M tiny increments each:

  • TAS: throughput collapses; the cache line ping-pongs between whatever core happens to arbitrate next.
  • Ticket: higher throughput, predictable latency, but still a global counter (serving) that every waiter reads, so cache-line contention remains.
  • MCS: best scaling, because each waiter spins only on its local node.

The correctness is the same in all three. The difference is entirely in how the hardware coherence protocol handles the access pattern.

Common Confusion / Misconception

"CAS never fails when there is no contention." True, but that is the point of contention measurement: CAS retries under contention, and the retry loop is where performance disappears. Lock-free does not mean wait-free; starvation is still possible.

"Ticket locks are strictly better than TAS." Only under contention. Under no contention, ticket and TAS are indistinguishable in cost, and ticket uses two cache-line words instead of one.

How To Use It

When you need a lock:

  1. Prefer the platform's pthread_mutex_t or std::mutex. It is measured, tuned, and adaptive.
  2. Write your own only for pedagogy, kernel code, or measured hot paths where a specific lock variant wins.
  3. When writing your own, pick the variant that matches the contention profile: TAS for light contention, ticket for fair FIFO, MCS/CLH for heavy many-core contention.
  4. Always specify the memory ordering explicitly: acquire on lock, release on unlock.

Check Yourself

  1. Why does a TAS spinlock starve cores under heavy contention?
  2. What invariant does the ticket lock maintain between ticket and serving?
  3. Why do MCS locks scale better than ticket locks on many cores?

Mini Drill or Application

Implement a ticket lock in C using __atomic_fetch_add and __atomic_load_n. Protect a shared counter with it and verify correctness with two threads. Then instrument: count how many iterations of the spin loop each thread does under contention. Describe the pattern.

Read This Only If Stuck