Context Switch Clinic
A clinic for context-switch mechanics, hidden costs, and the threads-vs-processes tradeoff. This is where the hand-wavy parts of scheduling meet measurable CPU-level detail.
Retrieval Prompts
- List, in order, what the CPU and kernel do during a process-to-process context switch on x86-64 Linux.
- State what the TLB contains and why flushing it matters.
- List the "hidden" components of context-switch cost beyond register save/restore.
- State what is shared and what is isolated between threads of the same process.
- Define the clone() flags that make the difference between "thread" and "process" semantics.
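The shared-versus-isolated prompt can be checked empirically. The sketch below is one possible demonstration, not a reference answer: a thread writes to a module-level object and the write is visible to its creator, while a forked child writes to its own copy-on-write copy. The names counter, bump, run_in_thread, and run_in_child_process are made up for illustration, and the code assumes a Unix-like system where os.fork() is available. In clone() terms, the difference it shows corresponds roughly to whether the address space is shared (CLONE_VM) or not.

```python
import os
import threading

counter = [0]  # module-level object; lives in the process's heap


def bump():
    """Increment the counter in whichever task calls this."""
    local_scratch = 1  # local variable: private to the calling thread's stack
    counter[0] += local_scratch


def run_in_thread():
    """A thread shares the address space, so its write is visible here."""
    t = threading.Thread(target=bump)
    t.start()
    t.join()
    return counter[0]


def run_in_child_process():
    """A forked child writes to its own copy; the parent's value is unchanged."""
    pid = os.fork()
    if pid == 0:  # child
        try:
            bump()
        finally:
            os._exit(0)  # leave without running the parent's cleanup code
    os.waitpid(pid, 0)  # parent: wait for the child to finish
    return counter[0]


if __name__ == "__main__":
    print("after thread:", run_in_thread())        # after thread: 1
    print("after fork:", run_in_child_process())   # after fork: 1
```

The thread's increment shows up in the parent; the child's increment does not, which is the isolation property the prompt asks about. Note that each thread still has its own stack: local_scratch is never shared.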
Compare and Distinguish
Separate these pairs:
- direct cost versus indirect cost of a context switch
- process switch versus thread switch (same process)
- kernel-thread switch versus user-thread switch
- voluntary yield versus preemption
- a timer interrupt versus a system call as a trigger for a switch
- TLB flush versus cache flush
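The voluntary-versus-preemption pair can be observed from user space. The sketch below assumes a Unix-like system where resource.getrusage() reports the ru_nvcsw (voluntary) and ru_nivcsw (involuntary) counters; the helper names switch_counts, measure, blocking_work, and cpu_work are illustrative only, and the exact counts depend on the machine and its load.

```python
import resource
import time


def switch_counts():
    """Return (voluntary, involuntary) context-switch counts for this process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_nvcsw, usage.ru_nivcsw


def measure(work):
    """Run work() and return how many switches of each kind it added."""
    v0, i0 = switch_counts()
    work()
    v1, i1 = switch_counts()
    return v1 - v0, i1 - i0


def blocking_work():
    # Each sleep blocks in the kernel, so the task gives up the CPU voluntarily.
    for _ in range(20):
        time.sleep(0.001)


def cpu_work():
    # Pure computation: switches here typically come from preemption.
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total


if __name__ == "__main__":
    print("blocking (voluntary, involuntary):", measure(blocking_work))
    print("cpu-bound (voluntary, involuntary):", measure(cpu_work))
```

On a typical Linux system the blocking loop should raise mostly the voluntary counter and the CPU-bound loop mostly the involuntary one, though an idle machine may preempt the CPU-bound loop rarely or not at all.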
Common Mistake Check
Identify the error:
- "A context switch just saves and restores registers."
- "Switching between threads of the same process costs the same as switching between processes."
- "An ASID-capable CPU never has to flush the TLB on a context switch."
- "Interrupts are the only way to force a context switch."
- "The scheduler picks the next task before the trap into the kernel."
- "Threads share the stack."
Mini Application
Walk the switch
For each scenario, walk through the switch in 6-10 steps, labeling each step with (a) where it happens (user, kernel, or hardware) and (b) which hardware resource it touches:
- A timer interrupt fires in the middle of a user-space computation and the scheduler decides to switch.
- A process calls read() on an empty socket.
- A higher-priority thread wakes up in the same process.
- A process calls sched_yield().
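The blocking-read scenario can be exercised with a small ping-pong experiment. The sketch below is a rough illustration under stated assumptions, not a calibrated benchmark: two processes bounce one byte over a socket pair, so each recv() on an empty socket blocks and the kernel must run the other process. The function name ping_pong and the round count are made up for this example, and the code assumes a Unix-like system with os.fork().

```python
import os
import socket
import time


def ping_pong(rounds=2000):
    """Return the mean round-trip time in nanoseconds over a socket pair."""
    parent_sock, child_sock = socket.socketpair()
    pid = os.fork()
    if pid == 0:  # child: echo each byte back to the parent
        try:
            parent_sock.close()
            for _ in range(rounds):
                child_sock.sendall(child_sock.recv(1))
        finally:
            os._exit(0)
    child_sock.close()
    start = time.perf_counter_ns()
    for _ in range(rounds):
        parent_sock.sendall(b"x")
        parent_sock.recv(1)  # blocks until the child has run and replied
    elapsed = time.perf_counter_ns() - start
    os.waitpid(pid, 0)
    parent_sock.close()
    return elapsed / rounds


if __name__ == "__main__":
    print(f"mean round trip: {ping_pong():.0f} ns")
```

The reported figure includes system-call and interpreter overhead as well as switch cost, so it is an upper bound on two switches at best. On a multi-core machine the two processes may also run on different cores, in which case the number reflects cross-core wake-ups rather than switches on one CPU; pinning both to one core (for example with os.sched_setaffinity on Linux) brings the experiment closer to the scenario in the list.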
Estimate costs
Given:
- register save/restore: ~100 ns
- kernel trap + iret: ~300 ns (with mitigations enabled)
- TLB flush (no PCID): ~500 ns, plus 10-100 ns per subsequent TLB miss (page-table walk) to repopulate
- L1 cache cold start: ~30 cycles × ~few hundred lines worth of misses ≈ 10 µs in the worst case
- branch predictor warm-up: ~1000 mispredictions × ~10 ns ≈ 10 µs
Estimate the total wall-clock cost of:
- Process -> process switch, different address space, worst case cache/TLB.
- Thread -> thread switch, same process, warm caches.
- Process -> process switch with PCID/ASID enabled.
State what changes in each case.
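One way to organize the arithmetic is to sum only the components that apply to each case. The sketch below is a possible reading of the exercise, not the answer key: the constants are the figures from the "Given" list, while ASSUMED_TLB_REFILLS (how many TLB entries get repopulated) and the choice to treat the PCID/ASID case as still having cold caches are assumptions introduced here.

```python
# Figures from the "Given" list, in nanoseconds.
REGISTERS = 100
TRAP_AND_IRET = 300
TLB_FLUSH = 500
TLB_REFILL_EACH = 100      # upper end of the 10-100 ns range
L1_COLD_START = 10_000
BRANCH_WARMUP = 10_000

# Assumption, not from the list: number of TLB entries repopulated afterwards.
ASSUMED_TLB_REFILLS = 64


def switch_cost_ns(flush_tlb, cold_caches):
    """Sum the cost components that apply to one switch."""
    total = REGISTERS + TRAP_AND_IRET  # direct cost, paid in every case
    if flush_tlb:
        total += TLB_FLUSH + ASSUMED_TLB_REFILLS * TLB_REFILL_EACH
    if cold_caches:
        total += L1_COLD_START + BRANCH_WARMUP
    return total


if __name__ == "__main__":
    print("process -> process, worst case:", switch_cost_ns(True, True))    # 27300
    print("thread -> thread, warm caches: ", switch_cost_ns(False, False))  # 400
    print("process -> process, PCID/ASID: ", switch_cost_ns(False, True))   # 20400
```

Under these assumptions the indirect costs (cache and branch-predictor warm-up) dominate by more than an order of magnitude, and PCID/ASID removes only the TLB portion. Changing ASSUMED_TLB_REFILLS or the cold-cache assumption shifts the totals, which is the point of the "state what changes" step.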
Threads versus processes decision
For each design, pick "threads" or "processes" and justify in two sentences:
- A web server that handles thousands of concurrent requests, each mostly I/O.
- A Chromium-style browser with one tab per site for security isolation.
- A build system that runs gcc on thousands of small files.
- A scientific simulation on one NUMA node, sharing a 200 GB dataset.
- An embedded control plane with a strict requirement that one faulty component cannot corrupt another.
Evidence Check
This page is complete only if:
- You can walk a context switch from user-mode instruction through kernel scheduler back to a different user process.
- You can estimate context-switch cost in nanoseconds within an order of magnitude and explain which step dominates.
- You can justify, for any concurrency problem, a choice between threads and processes with isolation and cost reasoning.
- You can explain why a process switch generally costs more than a thread switch.