Skip to main content

Lock Granularity: Coarse vs Fine-Grained

What This Concept Is

Lock granularity is the size of the state protected by a single lock.

  • Coarse-grained locking uses one lock for a large region of state (for example, one mutex for an entire hash table).
  • Fine-grained locking uses many locks for disjoint pieces of state (for example, one mutex per hash bucket, or per node in a linked list).

Coarse is simple and always safe, but serializes all access. Fine is concurrent but multiplies correctness obligations: you must pick a consistent lock order across all operations, and you must handle structural changes that cross lock boundaries (splits, rehashes, splice operations).

Intermediate strategies exist: lock striping (a fixed number of locks covering buckets by hash), range locks, and hand-over-hand locking for linked structures.

Why It Matters Here

Granularity is where correctness meets scalability. A correct-but-slow hash table is easy; a correct-and-scalable one is a real design.

The two classic failure modes:

  • Coarse locking that does not scale. A web cache behind one global mutex bottlenecks at one core regardless of load.
  • Fine locking that is wrong. Two operations that look local (for example, delete(x) and insert(y)) can interleave during a rebalance and leave the structure inconsistent.

Understanding granularity lets you read real systems: Linux kernel spinlocks, Java's ConcurrentHashMap, Postgres page locks.

Concrete Example

A concurrent linked list, node-by-node.

Coarse: one mutex for the whole list. All traversals and modifications serialize. Throughput is one operation at a time.

Hand-over-hand:

// find and remove node with key k
pthread_mutex_lock(&head->m);
node_t *pred = head;
pthread_mutex_lock(&head->next->m);
node_t *cur = head->next;
while (cur && cur->key != k) {
pthread_mutex_unlock(&pred->m);
pred = cur;
pthread_mutex_lock(&cur->next->m);
cur = cur->next;
}
/* pred->m and cur->m are both held; safe to unlink cur */

This lets multiple threads traverse concurrently as long as they do not overlap. The invariant that protects correctness: a thread holds locks on both the predecessor and current node before unlinking. Without the predecessor lock, another thread could concurrently remove pred and leave the list broken.

Common Confusion / Misconception

"Finer is always faster." False. Fine-grained locking adds per-lock overhead (acquisition cost, cache-line footprint) and acquisition-order complexity. For low-contention workloads, one mutex is often strictly faster. Measure.

"If my locks never deadlock in testing, the granularity is fine." Deadlocks in fine-grained locking emerge from rare cross-operation interleavings. Lock-order invariants must be established by construction, not by testing.

Another misconception: "Read-only operations do not need to lock." They do, unless the data structure is explicitly designed for lock-free reads and the memory model provides the right ordering. Otherwise you are reading partially written nodes.

How To Use It

When designing a concurrent data structure:

  1. Start with one global mutex. Verify correctness first.
  2. Measure contention. If the global lock is not the bottleneck, stop.
  3. If it is, partition state into disjoint regions with natural boundaries (buckets, shards, ranges).
  4. Define a total order over locks. Document it. Every operation acquires locks in that order.
  5. For operations that cross regions (for example, move between buckets), acquire both locks in order; never optimistically release one mid-operation.

Check Yourself

  1. Why does one mutex per node in a linked list not automatically give you more throughput?
  2. What invariant does hand-over-hand locking maintain between predecessor and current node?
  3. When does lock striping outperform per-bucket locking?

Mini Drill or Application

Implement a concurrent hash table two ways:

  1. One global mutex.
  2. One mutex per bucket (lock striping).

Measure throughput at 1, 2, 4, and 8 threads doing 50/50 read/write workloads. Record where the global-mutex version stops scaling. Write one paragraph explaining the crossover.

Read This Only If Stuck