Module 1: Processes & Scheduling: Case Studies
These case studies make CPU virtualization visible: run queues, fairness, latency, priority inversion, context-switch cost, and cgroup isolation.
Case Study 1: Web Worker Starved By CPU Batch Job
Scenario: A batch analytics process runs beside a latency-sensitive web worker. The web worker is technically "up" but p99 latency spikes whenever the batch job starts.
Source anchor: The Linux kernel's CFS Scheduler documentation covers CFS fairness and virtual runtime.
Module concepts: CFS, virtual runtime, latency, fairness, nice value.
Wrong Approach
"The machine has free CPU on average, so scheduling is fine."
Better Approach
Measure runnable latency and separate classes of work:
- web worker: latency-sensitive
- batch job: throughput-sensitive
- control: nice value, cgroup CPU weight/shares, or a separate pool
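To get a feel for how far a nice value shifts the split, here is a minimal sketch. It assumes the commonly cited approximation that nice 0 maps to a CFS weight of 1024 and each nice step scales the weight by about 1.25; the kernel uses a precomputed table, so real values differ slightly. The helper names `approx_weight` and `cpu_shares` are hypothetical, and the model only covers tasks that are always runnable on one contended CPU.

```python
def approx_weight(nice: int) -> float:
    """Approximate CFS load weight: 1024 at nice 0, about 1.25x per nice step."""
    if not -20 <= nice <= 19:
        raise ValueError("nice must be in [-20, 19]")
    return 1024 / (1.25 ** nice)


def cpu_shares(nice_by_task: dict) -> dict:
    """Fraction of one contended CPU each always-runnable task would receive."""
    weights = {name: approx_weight(n) for name, n in nice_by_task.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


if __name__ == "__main__":
    # Compare a web worker at nice 0 against a batch job at several nice values.
    for batch_nice in (0, 5, 10, 19):
        shares = cpu_shares({"web": 0, "batch": batch_nice})
        print(f"batch nice={batch_nice:>2}: web gets {shares['web']:.0%}")
```

The share only describes the long-run split under contention. It does not by itself bound wakeup latency, which is why the diagnosis should still include a measured latency metric.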
Tradeoff Table
| Choice | Gain | Cost |
|---|---|---|
| same priority | simple | latency spikes |
| lower batch priority | protects web | slower batch |
| separate hosts/pools | isolation | cost |
| cgroup CPU control | container-friendly | tuning required |
Failure Mode
CPU-bound batch threads keep the run queues busy, so each time the web worker wakes up it can wait behind them for a time slice. Those waits add up to request queueing and p99 spikes even though average CPU utilization still looks acceptable.
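One way to make this failure visible is to compare time spent running with time spent waiting on a run queue. The sketch below assumes a Linux kernel that exposes `/proc/<pid>/schedstat` with three fields (on-CPU time in nanoseconds, run-queue wait time in nanoseconds, and number of time slices). The names `SchedStat`, `parse_schedstat`, `read_schedstat`, and `wait_fraction` are hypothetical helpers.

```python
from pathlib import Path
from typing import NamedTuple


class SchedStat(NamedTuple):
    run_ns: int      # time spent on a CPU
    wait_ns: int     # time spent runnable but waiting on a run queue
    timeslices: int  # number of time slices run


def parse_schedstat(text: str) -> SchedStat:
    """Parse the three whitespace-separated fields of a schedstat line."""
    run_ns, wait_ns, slices = (int(x) for x in text.split()[:3])
    return SchedStat(run_ns, wait_ns, slices)


def read_schedstat(pid="self") -> SchedStat:
    """Read schedstat for a process (assumes Linux with scheduler stats available)."""
    return parse_schedstat(Path(f"/proc/{pid}/schedstat").read_text())


def wait_fraction(before: SchedStat, after: SchedStat) -> float:
    """Share of (run + wait) time between two samples that was spent waiting."""
    run = after.run_ns - before.run_ns
    wait = after.wait_ns - before.wait_ns
    total = run + wait
    return wait / total if total else 0.0
```

Sampling the web worker before and after the batch job starts gives a before/after metric: a rising wait fraction with flat utilization points at run-queue contention rather than slow application code.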
Project / Capstone Connection
Use this when shaping CPU policy for a web API, worker system, or student platform where background jobs and request handlers share the same node.
Required Artifact
Write a scheduler diagnosis: runnable tasks, latency symptom, policy knob, and before/after metric.
Case Study 2: Container CPU Limit Misread
Scenario: A service is limited to one CPU in a container. The process runs eight worker threads and is frequently throttled, which inflates tail latency.
Source anchor: The Linux kernel's cgroup v2 documentation covers CPU quotas (`cpu.max`), throttling, and controller behavior.
Module concepts: cgroups, CPU quota, throttling, threads, container isolation.
Wrong Approach
"Eight threads means eight CPUs."
Better Approach
Match concurrency to quota:
- CPU quota: 1 core equivalent
- Worker count: tune to CPU-bound vs I/O-bound workload
- Metric: throttled time, run queue latency, p99
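Matching concurrency to quota can be sketched as follows. The parser assumes the cgroup v2 `cpu.max` format of `<quota> <period>` in microseconds, or `max <period>` when no quota is set. The sizing rule in `suggested_workers` is a rule of thumb, not a kernel guarantee, and both function names are hypothetical.

```python
import math
from typing import Optional


def cpu_limit_cores(cpu_max_text: str) -> Optional[float]:
    """Convert cpu.max contents into a core-equivalent limit, or None if unlimited."""
    quota, period = cpu_max_text.split()
    if quota == "max":
        return None  # no quota configured
    return int(quota) / int(period)


def suggested_workers(limit_cores: Optional[float], host_cpus: int,
                      io_wait_fraction: float = 0.0) -> int:
    """Heuristic worker count: about one CPU-bound worker per core of quota.

    io_wait_fraction is the share of time a worker spends blocked on I/O,
    in [0, 1). Workers that mostly wait can be oversubscribed.
    """
    if not 0.0 <= io_wait_fraction < 1.0:
        raise ValueError("io_wait_fraction must be in [0, 1)")
    cores = host_cpus if limit_cores is None else min(limit_cores, host_cpus)
    return max(1, math.floor(cores / (1.0 - io_wait_fraction)))
```

For example, `suggested_workers(cpu_limit_cores("100000 100000"), host_cpus=8)` yields one worker for a CPU-bound service, while an I/O wait fraction of 0.75 yields four. Inside a container the file is often readable at `/sys/fs/cgroup/cpu.max`, though that path depends on how the cgroup filesystem is mounted.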
Tradeoff Table
| Choice | Gain | Cost |
|---|---|---|
| keep 8 CPU-bound threads | easy default | heavy throttling |
| match threads to quota | steadier latency | lower peak parallelism |
| separate CPU-bound and I/O-bound pools | better fit by workload | more tuning |
Failure Mode
The container hits its quota early in each period, then spends time throttled while runnable threads wait, which inflates tail latency.
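The arithmetic behind this failure mode is simple enough to sketch. The model below assumes every thread is CPU-bound, the host has enough free CPUs to run them all in parallel, and quota is consumed smoothly; real accounting is coarser. `throttle_profile` is a hypothetical helper name.

```python
def throttle_profile(quota_ms: float, period_ms: float, busy_threads: int):
    """Return (run_ms, throttled_ms) of wall-clock time per enforcement period."""
    demand_ms = busy_threads * period_ms  # CPU time the threads would use if unthrottled
    if demand_ms <= quota_ms:
        return period_ms, 0.0  # the quota is never exhausted
    # N parallel threads burn quota N times faster than wall-clock time passes.
    run_ms = quota_ms / busy_threads
    return run_ms, period_ms - run_ms


if __name__ == "__main__":
    for threads in (1, 2, 8):
        run_ms, throttled_ms = throttle_profile(100, 100, threads)
        print(f"{threads} threads: run {run_ms} ms, throttled {throttled_ms} ms per 100 ms")
```

With a 100 ms quota per 100 ms period, eight busy threads exhaust the quota after 12.5 ms and then sit throttled for the remaining 87.5 ms, so a request that arrives just after the quota runs out can wait most of a period before any thread runs.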
Project / Capstone Connection
Apply this in containerized services, CI runners, or sidecar-heavy deployments where thread count is configured separately from cgroup CPU limits.
Required Artifact
Create a cgroup CPU report with quota, worker count, throttling, and tuning decision.
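The throttling part of the report can be drawn from the cgroup v2 `cpu.stat` file. This sketch assumes the CPU controller is enabled so that the `nr_periods`, `nr_throttled`, and `throttled_usec` keys are present. The helper names are hypothetical, and the sample text in the usage note uses made-up numbers.

```python
def parse_cpu_stat(text: str) -> dict:
    """Parse 'key value' lines from cgroup v2 cpu.stat into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if value:
            stats[key] = int(value)
    return stats


def throttle_summary(stats: dict) -> dict:
    """Summarize how often and for how long the cgroup was throttled."""
    periods = stats.get("nr_periods", 0)
    throttled = stats.get("nr_throttled", 0)
    return {
        "throttled_ratio": throttled / periods if periods else 0.0,
        "throttled_ms": stats.get("throttled_usec", 0) / 1000,
    }
```

Given illustrative contents such as `nr_periods 200`, `nr_throttled 150`, and `throttled_usec 12000000`, the summary reports a throttled ratio of 0.75 and 12000 ms of throttled time. Taking the same reading after changing the worker count gives the tuning decision a before/after number.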
Case Study 3: Priority Inversion In A Real-Time Path
Scenario: A high-priority audio thread waits on a lock held by a low-priority logging thread. A medium-priority CPU task keeps running, delaying the lock holder.
Source anchor: POSIX real-time scheduling and Linux priority-inheritance mutexes; see the pthread mutex protocols (`pthread_mutexattr_setprotocol` with `PTHREAD_PRIO_INHERIT`).
Module concepts: priority inversion, real-time scheduling, mutex, priority inheritance.
Wrong Approach
"The high-priority thread always runs first."
Better Approach
Avoid shared locks on real-time paths or use priority inheritance:
audio thread:
- no blocking I/O
- bounded lock hold time
- priority-inheritance mutex if unavoidable
Tradeoff Table
| Choice | Gain | Cost |
|---|---|---|
| ordinary mutex | simple | inversion risk |
| priority inheritance mutex | bounds inversion | added complexity |
| remove shared lock from RT path | strongest latency control | redesign effort |
Failure Mode
The low-priority lock holder cannot run long enough to release the mutex because a medium-priority task keeps preempting it, so the high-priority thread misses timing goals.
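The three-thread interaction can be traced with a toy model. The following is a single-CPU, tick-based simulation of fixed-priority preemptive scheduling written for illustration; it is not how the kernel implements priority inheritance. The task set (L, H, M), their arrival times, and the names `Task`, `make_tasks`, and `simulate` are all assumptions.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class Task:
    name: str
    priority: int    # larger number = higher priority
    arrival: int     # tick at which the task becomes ready
    work: int        # total ticks of CPU needed
    lock_ticks: int  # leading ticks of work that need the shared lock
    done: int = 0


def make_tasks():
    return [
        Task("L", priority=1, arrival=0, work=3, lock_ticks=3),  # logging
        Task("H", priority=3, arrival=1, work=1, lock_ticks=1),  # audio
        Task("M", priority=2, arrival=2, work=5, lock_ticks=0),  # CPU task
    ]


def simulate(tasks, inherit: bool):
    """Return the name of the task that runs in each tick."""
    holder, trace, tick = None, [], 0
    while any(t.done < t.work for t in tasks):
        ready = [t for t in tasks if t.arrival <= tick and t.done < t.work]
        # A task is blocked if it needs the lock and another task holds it.
        blocked = [t for t in ready
                   if t.done < t.lock_ticks and holder not in (None, t)]
        runnable = [t for t in ready if t not in blocked]

        def effective(t):
            # With inheritance, the holder borrows the priority of its waiters.
            boost = [b.priority for b in blocked] if inherit and t is holder else []
            return max([t.priority] + boost)

        if not runnable:
            trace.append("-")
        else:
            current = max(runnable, key=effective)
            if current.done < current.lock_ticks:
                holder = current
            current.done += 1
            if holder is current and current.done >= current.lock_ticks:
                holder = None  # critical section finished, release the lock
            trace.append(current.name)
        tick += 1
    return trace


if __name__ == "__main__":
    print("no inheritance:", "".join(simulate(make_tasks(), inherit=False)))
    print("inheritance:   ", "".join(simulate(make_tasks(), inherit=True)))
```

Without inheritance, M runs for its full five ticks while L still holds the lock, so H is served last. With inheritance, L finishes its critical section at H's priority and H runs before M starts.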
Project / Capstone Connection
This maps directly to robotics, media, or control-system capstones where one timing-critical thread shares data with background logging or telemetry.
Required Artifact
Draw the three-thread schedule and identify where priority inheritance changes it.
Case Study 4: Fork/Exec In A Request Path
Scenario: A server shells out to an external command on each request. Under load, process creation overhead and zombie cleanup become visible.
Source anchor: Linux process APIs are documented in man pages such as fork(2), execve(2), and wait(2).
Module concepts: fork, exec, wait, process lifecycle, context switch.
Wrong Approach
"Creating a child process is as cheap as a function call."
Better Approach
Choose the execution model:
- rare admin task: fork/exec acceptable
- hot request path: long-lived worker pool or library call
- must fork: timeout, wait/reap, resource limits
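For the "must fork" case, a minimal sketch of a guarded child process might look like this. It assumes a POSIX system, since it uses the `resource` module and `preexec_fn`. The helper name `run_tool` and the default limits are hypothetical, and `preexec_fn` should be used with care in multi-threaded servers.

```python
import resource
import subprocess
import sys


def run_tool(argv, timeout_s: float = 5.0, cpu_seconds: int = 2):
    """Run an external command with a wall-clock timeout and a CPU-time limit.

    Returns (returncode, stdout), or None if the timeout expired.
    """
    def limit_child():
        # Runs in the child between fork and exec: lower only the soft CPU limit.
        _, hard = resource.getrlimit(resource.RLIMIT_CPU)
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, hard))

    try:
        done = subprocess.run(
            argv,
            capture_output=True,
            text=True,
            timeout=timeout_s,
            preexec_fn=limit_child,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before re-raising,
        # so no zombie is left behind on this path.
        return None
    return done.returncode, done.stdout


if __name__ == "__main__":
    print(run_tool([sys.executable, "-c", "print('ok')"]))
```

The timeout bounds wall-clock time, the rlimit bounds CPU time, and `subprocess.run` reaps the child in both the normal and the timeout path. A request handler would still need to decide what to return to the client when the result is `None`.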
Tradeoff Table
| Choice | Gain | Cost |
|---|---|---|
| fork/exec per request | process isolation | high overhead |
| worker pool | lower latency | lifecycle management |
| library call in-process | fastest path | less isolation |
Failure Mode
Per-request process creation multiplies fork/exec cost, context switches, and process-table churn; if children are not reaped with wait, zombies also accumulate. Under load, throughput collapses.
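The lifecycle behind this failure can be traced directly with the low-level calls. This is a sketch for a POSIX system with Python 3.9 or later (for `os.waitstatus_to_exitcode`); `fork_exec_wait` is a hypothetical helper name and the trace wording is illustrative.

```python
import os
import sys


def fork_exec_wait(argv):
    """Return (trace, exit_code) for one fork -> exec -> exit -> wait cycle."""
    trace = [f"parent pid={os.getpid()} calls fork"]
    pid = os.fork()
    if pid == 0:
        # Child: replace the process image. Code after exec runs only on failure.
        try:
            os.execvp(argv[0], argv)
        except OSError:
            os._exit(127)
    trace.append(f"child pid={pid} execs {argv[0]}")
    # Until waitpid returns, an exited child remains in the process table
    # as a zombie. Reaping it here is what prevents buildup.
    _, status = os.waitpid(pid, 0)
    code = os.waitstatus_to_exitcode(status)
    trace.append(f"parent reaped child pid={pid}, exit code {code}")
    return trace, code


if __name__ == "__main__":
    events, code = fork_exec_wait([sys.executable, "-c", "raise SystemExit(3)"])
    print("\n".join(events))
```

Every request that takes this path pays for a fork, an exec, and a wait. Skipping the `waitpid` call is what leaves zombies behind.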
Project / Capstone Connection
Use this when evaluating whether a capstone service should shell out to tools, media converters, or scripts inside a request path.
Required Artifact
Write a process lifecycle trace: parent, child, exec, exit, wait, timeout, cleanup.
Case Study 5: Real-Time Deadline Miss
Scenario: A robotics control loop must run every 10 ms. A general-purpose scheduling policy misses deadlines under competing load.
Source anchor: Linux scheduler docs and POSIX scheduling APIs explain normal and real-time policy distinctions. See sched(7).
Module concepts: real-time scheduling, deadline, RMS/EDF intuition, preemption.
Wrong Approach
"Fast average runtime is enough."
Better Approach
Budget worst case:
- period: 10 ms
- worst-case execution: measured under load
- scheduling policy: real-time (for example SCHED_FIFO, SCHED_RR, or SCHED_DEADLINE) if justified
- failure: safe fallback on missed deadline
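A worst-case budget can be sanity-checked with the classic utilization tests. The sketch below assumes a single CPU and independent periodic tasks whose deadlines equal their periods. It uses the Liu-Layland bound as a sufficient (not necessary) test for rate-monotonic scheduling and U <= 1 as the EDF test. The function names are hypothetical, and the task set in the usage note is made up.

```python
def utilization(tasks) -> float:
    """tasks: iterable of (wcet_ms, period_ms) pairs."""
    return sum(wcet / period for wcet, period in tasks)


def rms_bound(n: int) -> float:
    """Liu-Layland utilization bound for n tasks under rate-monotonic scheduling."""
    return n * (2 ** (1 / n) - 1)


def budget_check(tasks) -> dict:
    tasks = list(tasks)
    u = utilization(tasks)
    return {
        "utilization": u,
        # Failing this bound does not prove a miss; it means a finer analysis is needed.
        "passes_rms_bound": u <= rms_bound(len(tasks)),
        "passes_edf_test": u <= 1.0,
    }
```

For example, a 10 ms control loop with a 2 ms WCET next to a 20 ms telemetry task with a 3 ms WCET has utilization 0.35 and passes both tests. The result is only as good as the WCET numbers, which is why they need to be measured under load.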
Tradeoff Table
| Choice | Gain | Cost |
|---|---|---|
| general-purpose scheduling | simple deployment | missed deadlines under load |
| real-time policy | better deadline control | risk to system fairness |
| reduce competing load | safer timing margin | less consolidation |
Failure Mode
Average execution time looks fine, but a burst of competing work stretches worst-case latency beyond the 10 ms period and causes deadline misses.
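The gap between average and worst case is easy to show with a small report over measured loop latencies. `deadline_report` is a hypothetical helper, and the sample values in the usage note are illustrative rather than measured.

```python
from statistics import mean


def deadline_report(latencies_ms, period_ms: float) -> dict:
    """Summarize per-iteration latencies against a fixed period."""
    latencies_ms = list(latencies_ms)
    misses = [x for x in latencies_ms if x > period_ms]
    return {
        "mean_ms": mean(latencies_ms),
        "worst_ms": max(latencies_ms),
        "misses": len(misses),
        "miss_rate": len(misses) / len(latencies_ms),
    }


if __name__ == "__main__":
    samples = [2.0, 3.0, 2.0, 14.0, 2.0, 1.0]  # illustrative values only
    print(deadline_report(samples, period_ms=10.0))
```

With these samples the mean is 4 ms, well inside the 10 ms period, yet the 14 ms outlier is a missed deadline. A deadline analysis should therefore report the worst case and the miss count, not just the average.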
Project / Capstone Connection
This fits embedded, robotics, or streaming capstones where periodic control or media tasks need bounded scheduling behavior.
Required Artifact
Write a deadline analysis with period, WCET, competing tasks, policy, and miss behavior.
Source Map
| Source | Use it for |
|---|---|
| Linux CFS Scheduler | fairness and virtual runtime |
| Linux cgroup v2 | resource control for processes/containers |
| pthread mutex protocols | priority inheritance |
| fork(2), execve(2), wait(2) | process lifecycle |
| sched(7) | Linux scheduling policies |
Completion Standard
- At least three artifacts are completed.
- At least one artifact includes a schedule trace.
- At least one artifact includes cgroup or priority tuning.