Lock Granularity: Coarse vs Fine-Grained
What This Concept Is
Lock granularity is the size of the state protected by a single lock.
- Coarse-grained locking uses one lock for a large region of state (for example, one mutex for an entire hash table).
- Fine-grained locking uses many locks for disjoint pieces of state (for example, one mutex per hash bucket, or per node in a linked list).
Coarse is simple and always safe, but serializes all access. Fine is concurrent but multiplies correctness obligations: you must pick a consistent lock order across all operations, and you must handle structural changes that cross lock boundaries (splits, rehashes, splice operations).
Intermediate strategies exist: lock striping (a fixed number of locks covering buckets by hash), range locks, and hand-over-hand locking for linked structures.
Why It Matters Here
Granularity is where correctness meets scalability. A correct-but-slow hash table is easy; a correct-and-scalable one is a real design.
The two classic failure modes:
- Coarse locking that does not scale. A web cache behind one global mutex bottlenecks at one core regardless of load.
- Fine locking that is wrong. Two operations that look local (for example,
delete(x)andinsert(y)) can interleave during a rebalance and leave the structure inconsistent.
Understanding granularity lets you read real systems: Linux kernel spinlocks, Java's ConcurrentHashMap, Postgres page locks.
Concrete Example
A concurrent linked list, node-by-node.
Coarse: one mutex for the whole list. All traversals and modifications serialize. Throughput is one operation at a time.
Hand-over-hand:
// find and remove node with key k
pthread_mutex_lock(&head->m);
node_t *pred = head;
pthread_mutex_lock(&head->next->m);
node_t *cur = head->next;
while (cur && cur->key != k) {
pthread_mutex_unlock(&pred->m);
pred = cur;
pthread_mutex_lock(&cur->next->m);
cur = cur->next;
}
/* pred->m and cur->m are both held; safe to unlink cur */
This lets multiple threads traverse concurrently as long as they do not overlap. The invariant that protects correctness: a thread holds locks on both the predecessor and current node before unlinking. Without the predecessor lock, another thread could concurrently remove pred and leave the list broken.
Common Confusion / Misconception
"Finer is always faster." False. Fine-grained locking adds per-lock overhead (acquisition cost, cache-line footprint) and acquisition-order complexity. For low-contention workloads, one mutex is often strictly faster. Measure.
"If my locks never deadlock in testing, the granularity is fine." Deadlocks in fine-grained locking emerge from rare cross-operation interleavings. Lock-order invariants must be established by construction, not by testing.
Another misconception: "Read-only operations do not need to lock." They do, unless the data structure is explicitly designed for lock-free reads and the memory model provides the right ordering. Otherwise you are reading partially written nodes.
How To Use It
When designing a concurrent data structure:
- Start with one global mutex. Verify correctness first.
- Measure contention. If the global lock is not the bottleneck, stop.
- If it is, partition state into disjoint regions with natural boundaries (buckets, shards, ranges).
- Define a total order over locks. Document it. Every operation acquires locks in that order.
- For operations that cross regions (for example, move between buckets), acquire both locks in order; never optimistically release one mid-operation.
Check Yourself
- Why does one mutex per node in a linked list not automatically give you more throughput?
- What invariant does hand-over-hand locking maintain between predecessor and current node?
- When does lock striping outperform per-bucket locking?
Mini Drill or Application
Implement a concurrent hash table two ways:
- One global mutex.
- One mutex per bucket (lock striping).
Measure throughput at 1, 2, 4, and 8 threads doing 50/50 read/write workloads. Record where the global-mutex version stops scaling. Write one paragraph explaining the crossover.