Concurrency and Debugging Clinic
Retrieval Prompts
- List what threads in a process share and what each has its own copy of.
- Write the canonical pthread_cond_wait pattern with mutex, predicate, and the while loop.
- Name two races the mutex prevents in a bounded-buffer queue and name one race the condition variable prevents.
- State, in one sentence, when atomic is sufficient and when you must still take a mutex.
- Describe what gdb watch does and when you would reach for it over a breakpoint.
Compare and Distinguish
- mutex vs atomic
- signal vs broadcast
- breakpoint vs watchpoint
- perf record vs strace
- "spurious wake-up" vs "lost wake-up"
Common Mistake Check
- Writing if (empty) cond_wait(...) instead of while (empty) cond_wait(...).
- Using a plain or volatile int done as a cross-thread stop flag instead of an atomic, and then wondering why the worker loops forever once the optimizer is on.
- Changing the predicate without holding the mutex and then signalling the condition variable, and losing the wake-up. (Signalling after unlocking is legal in POSIX; the wake-up is lost when the predicate changes outside the lock.)
- Passing &local to pthread_create and returning from the enclosing function before the thread reads it.
- Attempting to debug an -O3 build with gdb and being surprised that half the locals are "value optimized out."
Mini Application: Producer-Consumer From Memory
Write, with no references, a producer-consumer program where:
- One producer emits the integers 1..10000.
- One consumer sums them.
- The queue capacity is 8.
- The program exits by having the producer push a sentinel (e.g., -1) after the last value.
- Final sum is printed and equals 50005000.
Walk through your own source and annotate each line of q_put and q_get with the race it prevents (mutex vs cond-wait vs signal).
Mini Application: Debug a Planted Race
Start from this deliberately buggy increment program:
#include <pthread.h>
#include <stdio.h>

long counter = 0;

void *bump(void *_) {
    /* planted bug: counter++ is an unsynchronized read-modify-write */
    for (int i = 0; i < 1000000; i++) counter++;
    return NULL;
}

int main(void) {
    pthread_t t[4];
    for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, bump, NULL);
    for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
    printf("%ld\n", counter); /* expected 4000000, usually less */
}
Tasks:
- Run it 20 times. Record the distribution of final values.
- Fix with atomic_long. Re-run and confirm 4000000 every time.
- Fix again with a pthread_mutex_t (revert the atomic). Measure the runtime difference.
- In a paragraph, explain which fix you would use in a real server that increments a metrics counter 10 M times per second.
Mini Application: Core-Dump Autopsy
Given this program:
#include <string.h>

/* planted bug: strcpy does no bounds check on dst */
void crash(char *dst, const char *src) { strcpy(dst, src); }

int main(void) {
    char buf[8];
    crash(buf, "this is way too long");
    return 0;
}
- Enable core dumps (ulimit -c unlimited), build with -g -O0, and run.
- Open the core with gdb ./crash core and produce a full backtrace.
- In the main frame, print buf and sizeof(buf). Explain the discrepancy.
- Rebuild with -fsanitize=address and rerun. Quote the first four lines of the ASan report and point at the line number that caused the overflow.
Mini Application: strace the Hang
Take any program that uses a mutex. Deliberately forget to call pthread_mutex_unlock, causing a deadlock. Start the program, then attach to it with strace -f -p <pid>. Identify the line of output that shows a thread blocked in a futex call with a FUTEX_WAIT operation. Restore the unlock and re-run.
Scenarios
- A multi-threaded web cache sometimes returns the wrong URL's body to the client. Under helgrind, one read of cache[url] is unprotected. Why does this matter if writes are protected?
- A queue uses cond_signal per put and one consumer. Throughput is fine. A team adds three more consumers; throughput collapses. Diagnose.
- A program is correct under -O0 and wrong under -O2. The symptom is a stale read of a shared flag. What is the root cause and the fix?
- A gdb session shows counter = 0 repeatedly, but the program prints counter = 4000000. The program is multi-threaded. Why might gdb be showing a different thread's local copy?
- perf record -g on a lock-heavy workload shows 60% of CPU in __lll_lock_wake. What does that tell you, and what are your three likely fixes?
Evidence Check
Complete when: your producer-consumer runs with 4 producers and 4 consumers without losing or duplicating any item, both of your planted-race fixes print the expected output on all 20 runs, and you can read a core dump and name the offending line in under two minutes.