Context Switch Clinic

A clinic for context-switch mechanics, hidden costs, and the threads-vs-processes tradeoff. This is where the hand-wavy parts of scheduling meet measurable CPU-level detail.

Retrieval Prompts

List, in order, what the CPU and kernel do during a process-to-process context switch on x86-64 Linux.
State what the TLB contains and why flushing it matters.
List the "hidden" components of context-switch cost beyond register save/restore.
State what is shared and what is isolated between threads of the same process.
Define the clone() flags that distinguish "thread" semantics from "process" semantics.
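Here is a minimal Python sketch of what threads share and what processes isolate. It assumes Linux with glibc. CPython's threading module runs on pthreads, and glibc's pthread_create calls clone() with flags that include CLONE_VM, CLONE_FS, CLONE_FILES, CLONE_SIGHAND, and CLONE_THREAD. fork() omits CLONE_VM, so the child writes to its own copy-on-write copy of the parent's memory. The helper names are illustrative and do not come from any library.

```python
import os
import threading

counter = 0  # one variable in this process's address space


def bump_in_thread() -> int:
    """Increment `counter` from a second thread and return what the caller sees."""
    global counter
    counter = 0

    def worker() -> None:
        global counter
        counter += 1  # same address space: the caller will see this write

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return counter


def bump_in_child_process() -> tuple[int, int]:
    """Increment `counter` in a forked child; return (parent's view, child's view)."""
    global counter
    counter = 0
    pid = os.fork()  # no CLONE_VM: the child gets a copy-on-write copy of memory
    if pid == 0:
        counter += 1       # modifies only the child's copy
        os._exit(counter)  # report the child's view through its exit status
    _, status = os.waitpid(pid, 0)
    return counter, os.WEXITSTATUS(status)
```

The thread's write is visible to the caller (1). The forked child's write stays in the child, so the parent still sees 0 while the child saw 1.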

Compare and Distinguish

Separate these pairs:

direct cost versus indirect cost of a context switch
process switch versus thread switch (same process)
kernel-thread switch versus user-thread switch
voluntary yield versus preemption
a timer interrupt versus a system call as a trigger for a switch
TLB flush versus cache flush
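You can observe the split between voluntary yield and preemption from user space. Linux's getrusage() keeps separate counters for voluntary switches, where the task blocked or yielded, and involuntary ones, where the scheduler preempted a task that was still runnable. The following Python sketch assumes Linux, and its helper names are hypothetical.

```python
import resource
import time


def ctx_switches() -> tuple[int, int]:
    """(voluntary, involuntary) context switches so far for this process."""
    ru = resource.getrusage(resource.RUSAGE_SELF)
    return ru.ru_nvcsw, ru.ru_nivcsw


def delta(fn) -> tuple[int, int]:
    """Run fn() and return the voluntary/involuntary switches that happened meanwhile."""
    v0, i0 = ctx_switches()
    fn()
    v1, i1 = ctx_switches()
    return v1 - v0, i1 - i0


def block_repeatedly() -> None:
    for _ in range(20):
        time.sleep(0.001)  # the task sleeps in the kernel: counted as voluntary


def spin(seconds: float = 0.2) -> None:
    deadline = time.perf_counter() + seconds
    while time.perf_counter() < deadline:
        pass  # never blocks; only preemption can switch this task out
```

delta(block_repeatedly) should report voluntary switches, roughly one per sleep. delta(spin) typically reports few or no involuntary switches on an idle machine and more under load, because preemption needs another runnable task competing for the CPU.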

Common Mistake Check

Identify the error:

"A context switch just saves and restores registers."
"Switching between threads of the same process costs the same as switching between processes."
"An ASID-capable CPU never has to flush the TLB on a context switch."
"Interrupts are the only way to force a context switch."
"The scheduler picks the next task before the trap into the kernel."
"Threads share the stack."

Mini Application

Walk the switch

For each scenario, walk through the switch in 6-10 steps. Label each step with (a) where it happens (user, kernel, or hardware) and (b) which hardware resource it touches:

A timer interrupt fires in the middle of a user-space computation and the scheduler decides to switch.
A process calls read() on an empty socket.
A higher-priority thread wakes up in the same process.
A process calls sched_yield().
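You can observe scenario 2, a read() on an empty socket, from user space. This Python sketch assumes Linux because it uses RUSAGE_THREAD. A socketpair() stands in for a network socket. The reader thread blocks in recv(), the kernel switches it out, and the writer's data makes it runnable again. The reader's per-thread voluntary-switch counter should therefore go up by at least one across the blocking call. The function name and the 50 ms delay are illustrative choices.

```python
import resource
import socket
import threading
import time


def blocking_read_switches(delay: float = 0.05) -> tuple[bytes, int]:
    """Block a reader thread in recv() on an empty socket; return (data, its voluntary switches)."""
    reader_sock, writer_sock = socket.socketpair()
    ready = threading.Event()
    result: dict = {}

    def reader() -> None:
        before = resource.getrusage(resource.RUSAGE_THREAD).ru_nvcsw
        ready.set()
        data = reader_sock.recv(1)  # nothing queued yet: the thread sleeps in the kernel
        after = resource.getrusage(resource.RUSAGE_THREAD).ru_nvcsw
        result["data"] = data
        result["switches"] = after - before

    t = threading.Thread(target=reader)
    t.start()
    ready.wait()
    time.sleep(delay)          # give the reader time to reach recv() and block
    writer_sock.sendall(b"x")  # data arrives: the kernel marks the reader runnable
    t.join()
    reader_sock.close()
    writer_sock.close()
    return result["data"], result["switches"]
```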

Estimate costs

Given:

register save/restore: ~100 ns
kernel trap + iret: ~300 ns (with mitigations enabled)
TLB flush (no PCID): ~500 ns, plus 10-100 ns per subsequent TLB miss (hardware page walk) to repopulate
L1 cache cold start: ~30 cycles per miss × ~1,000 misses ≈ 30,000 cycles ≈ 10 µs at ~3 GHz in the worst case
branch predictor warm-up: ~1000 mispredictions × ~10 ns ≈ 10 µs
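The givens add up linearly, so each estimate comes down to deciding which components apply. The following small Python calculator takes the upper end (100 ns) of the per-miss refill cost. It leaves the number of post-switch TLB misses as an input you have to assume. The constant and function names are hypothetical.

```python
# Per-component costs in nanoseconds, copied from the "Given" list above.
REGS_NS = 100                  # register save/restore
TRAP_IRET_NS = 300             # kernel trap + iret, mitigations enabled
TLB_FLUSH_NS = 500             # TLB flush without PCID
TLB_REFILL_NS = 100            # per TLB miss afterwards (upper end of 10-100 ns)
CACHE_COLD_NS = 10_000         # L1 cache cold start, worst case
BRANCH_WARMUP_NS = 1_000 * 10  # ~1000 mispredictions x ~10 ns


def switch_cost_ns(flush_tlb: bool, cold_caches: bool, tlb_misses: int = 0) -> int:
    """Sum the components that apply to one switch; tlb_misses is an assumed input."""
    cost = REGS_NS + TRAP_IRET_NS  # paid on every switch
    if flush_tlb:
        cost += TLB_FLUSH_NS + tlb_misses * TLB_REFILL_NS
    if cold_caches:
        cost += CACHE_COLD_NS + BRANCH_WARMUP_NS
    return cost
```

For example, switch_cost_ns(True, True, tlb_misses=100) gives 30,900 ns (about 31 µs). Only 400 ns of that is register save/restore plus trap/iret; the cache and branch-predictor warm-up terms dominate. The exercise is to decide which arguments apply to the thread case and the PCID case.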

Estimate the total wall-clock cost of:

Process -> process switch, different address space, worst case cache/TLB.
Thread -> thread switch, same process, warm caches.
Process -> process switch with PCID/ASID enabled.

State what changes in each case.
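One classic way to check an estimate on a real machine is a pipe ping-pong between two processes pinned to one CPU. Each round trip then forces roughly two context switches plus four system calls. This Python sketch assumes Linux, since it uses sched_setaffinity. It also measures interpreter overhead, so treat the result as an upper bound. perf bench sched pipe runs the same pattern natively. Pinning to the lowest-numbered allowed CPU is an arbitrary choice.

```python
import os
import time


def pipe_pingpong_ns(rounds: int = 2_000) -> float:
    """Mean round-trip time (ns) of a 1-byte ping-pong between two processes on one CPU."""
    old_mask = os.sched_getaffinity(0)
    os.sched_setaffinity(0, {min(old_mask)})  # pin before fork so the child inherits it
    p2c_r, p2c_w = os.pipe()
    c2p_r, c2p_w = os.pipe()
    try:
        pid = os.fork()
        if pid == 0:
            try:
                for _ in range(rounds):
                    os.read(p2c_r, 1)      # blocks until the parent writes
                    os.write(c2p_w, b"x")  # wakes the parent
            finally:
                os._exit(0)  # never return into the parent's code path
        start = time.perf_counter_ns()
        for _ in range(rounds):
            os.write(p2c_w, b"x")  # wakes the child
            os.read(c2p_r, 1)      # parent blocks; the CPU switches to the child
        elapsed = time.perf_counter_ns() - start
        os.waitpid(pid, 0)
    finally:
        for fd in (p2c_r, p2c_w, c2p_r, c2p_w):
            os.close(fd)
        os.sched_setaffinity(0, old_mask)  # restore the caller's CPU mask
    return elapsed / rounds
```

Dividing the round-trip time by two gives a rough upper bound on the cost of one switch, including syscall and interpreter overhead.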

Threads versus processes decision

For each design, pick "threads" or "processes" and justify in two sentences:

A web server that handles thousands of concurrent requests, each mostly I/O.
A Chromium-style browser with one tab per site for security isolation.
A build system that runs gcc on thousands of small files.
A scientific simulation on one NUMA node, sharing a 200 GB dataset.
An embedded control plane with a strict requirement that one faulty component cannot corrupt another.
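For the embedded control plane and the per-site browser, the case for processes rests on fault containment. A crashing process takes down only itself, while a fatal signal delivered to a process terminates every thread in it. Here is a minimal Python sketch, assuming a POSIX system. SIGKILL is a hypothetical stand-in for a real crash such as a segmentation fault, and the component functions are placeholders.

```python
import os
import signal
from typing import Callable, Optional


def run_isolated(component: Callable[[], None]) -> Optional[int]:
    """Run component() in a forked child; return the signal that killed it, else None."""
    pid = os.fork()
    if pid == 0:
        try:
            component()
        finally:
            os._exit(0)  # never return into the parent's code path
    _, status = os.waitpid(pid, 0)
    return os.WTERMSIG(status) if os.WIFSIGNALED(status) else None


def faulty_component() -> None:
    # Hypothetical stand-in for a crash: the component kills its own process.
    os.kill(os.getpid(), signal.SIGKILL)


def healthy_component() -> None:
    pass
```

run_isolated(faulty_component) returns signal.SIGKILL and the parent keeps running. If the same faulty code had run in a thread of the parent, os.kill(os.getpid(), signal.SIGKILL) would have ended the parent too.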

Evidence Check

This page is complete only if:

You can walk a context switch from a user-mode instruction, through the kernel scheduler, and back to a different user process.
You can estimate context-switch cost in nanoseconds within an order of magnitude and explain which step dominates.
You can justify, for any concurrency problem, a choice between threads and processes with isolation and cost reasoning.
You can explain why a process switch generally costs more than a thread switch.
