Module 1: Processes & Scheduling: Case Studies

These case studies make CPU virtualization visible: runnable queues, fairness, latency, priority inversion, context-switch cost, and cgroup isolation.


Case Study 1: Web Worker Starved By CPU Batch Job

Scenario: A batch analytics process runs beside a latency-sensitive web worker. The web worker is technically "up" but p99 latency spikes whenever the batch job starts.

Source anchor: The Linux kernel's CFS Scheduler documentation describes CFS fairness and virtual runtime.

Module concepts: CFS, virtual runtime, latency, fairness, nice value.

Wrong Approach

"The machine has free CPU on average, so scheduling is fine."

Better Approach

Measure runnable latency and separate classes of work:

web worker:
  latency-sensitive

batch job:
  throughput-sensitive

control:
  nice value, cgroup CPU weight (cpu.weight in cgroup v2, cpu.shares in v1), or a separate pool
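To see why the nice knob matters, the following is a toy model of weighted fair sharing, not the kernel's actual scheduler. It assumes two tasks that are always runnable on one CPU and a fixed 1 ms slice; the function name `cpu_shares` is made up for this sketch. The weights are the values the kernel's nice-to-weight table assigns to those nice levels.

```python
# Toy model of weighted fair scheduling on one CPU; not the kernel's real algorithm.
NICE_0_WEIGHT = 1024
# Load weights for a few nice levels (from the kernel's nice-to-weight table).
NICE_TO_WEIGHT = {0: 1024, 5: 335, 10: 110, 19: 15}


def cpu_shares(nice_by_task, slices=10_000, slice_ms=1.0):
    """Return each always-runnable task's fraction of CPU time."""
    vruntime = {name: 0.0 for name in nice_by_task}
    ran_ms = {name: 0.0 for name in nice_by_task}
    for _ in range(slices):
        # Run the task that has accumulated the least virtual runtime so far.
        current = min(vruntime, key=vruntime.get)
        weight = NICE_TO_WEIGHT[nice_by_task[current]]
        ran_ms[current] += slice_ms
        # Heavier (lower-nice) tasks accumulate virtual runtime more slowly,
        # so they are picked more often.
        vruntime[current] += slice_ms * NICE_0_WEIGHT / weight
    total = sum(ran_ms.values())
    return {name: ran / total for name, ran in ran_ms.items()}


if __name__ == "__main__":
    print(cpu_shares({"web": 0, "batch": 0}))   # equal weights
    print(cpu_shares({"web": 0, "batch": 10}))  # batch job reniced to 10
```

With equal nice values the two tasks split the CPU evenly; with the batch job at nice 10 the web task's share converges toward 1024 / (1024 + 110), roughly 90%. A real web worker sleeps between requests, so this only illustrates the contended case.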

Tradeoff Table

Choice | Gain | Cost
same priority | simple | latency spikes
lower batch priority | protects web | slower batch
separate hosts/pools | isolation | cost
cgroup CPU control | container-friendly | tuning required

Failure Mode

The batch job keeps its threads runnable continuously, so web worker threads wait longer on the run queue each time they wake. Requests queue even though average CPU utilization still looks acceptable.
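One way to make that queueing visible is to sample a task's run-queue wait time rather than overall utilization. The sketch below assumes a Linux kernel that exposes /proc/<pid>/schedstat, whose three fields are time on CPU (ns), time waiting on a run queue (ns), and the number of timeslices run. The helper names are illustrative, and the file covers a single task, so a multi-threaded worker would need the per-thread files under /proc/<pid>/task/ as well.

```python
from pathlib import Path


def parse_schedstat(text):
    """Parse the three whitespace-separated fields of a schedstat file."""
    on_cpu_ns, runq_wait_ns, timeslices = (int(field) for field in text.split())
    return {"on_cpu_ns": on_cpu_ns, "runq_wait_ns": runq_wait_ns, "timeslices": timeslices}


def wait_ratio(before, after):
    """Fraction of (running + waiting) time spent waiting between two samples."""
    ran = after["on_cpu_ns"] - before["on_cpu_ns"]
    waited = after["runq_wait_ns"] - before["runq_wait_ns"]
    busy = ran + waited
    return waited / busy if busy else 0.0


def read_schedstat(pid):
    # Linux-only; assumes scheduler statistics are available on this kernel.
    return parse_schedstat(Path(f"/proc/{pid}/schedstat").read_text())
```

Taking one sample before and one after the batch job starts, then comparing `wait_ratio`, gives a before/after metric that moves even when average utilization does not.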

Project / Capstone Connection

Use this when shaping CPU policy for a web API, worker system, or student platform where background jobs and request handlers share the same node.

Required Artifact

Write a scheduler diagnosis: runnable tasks, latency symptom, policy knob, and before/after metric.


Case Study 2: Container CPU Limit Misread

Scenario: A service is limited to one CPU in a container. The process has eight worker threads and spends time throttled, causing tail latency.

Source anchor: The Linux kernel's cgroup v2 documentation covers CPU quotas, throttling, and controller behavior.

Module concepts: cgroups, CPU quota, throttling, threads, container isolation.

Wrong Approach

"Eight threads means eight CPUs."

Better Approach

Match concurrency to quota:

CPU quota:
  1 core equivalent

Worker count:
  tune to CPU-bound vs I/O-bound workload

Metric:
  throttled time, run queue latency, p99
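The quota and the throttling metric can both be read from cgroup v2 interface files. The sketch below parses the text of `cpu.max` (a quota and a period in microseconds, or `max` for no limit) and `cpu.stat` (which includes `nr_periods`, `nr_throttled`, and `throttled_usec`). The function names and the report layout are assumptions for illustration; where the files live (often /sys/fs/cgroup/ inside a container) depends on the deployment.

```python
def parse_cpu_max(text):
    """Return the quota in CPUs from a cpu.max line, or None if unlimited."""
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


def parse_cpu_stat(text):
    """Turn cpu.stat's 'key value' lines into a dict of ints."""
    pairs = (line.split() for line in text.splitlines() if line.strip())
    return {key: int(value) for key, value in pairs}


def throttle_report(cpu_max_text, cpu_stat_text, workers):
    """Combine quota, worker count, and throttling counters into one record."""
    stat = parse_cpu_stat(cpu_stat_text)
    periods = stat.get("nr_periods", 0)
    throttled = stat.get("nr_throttled", 0)
    return {
        "quota_cpus": parse_cpu_max(cpu_max_text),
        "workers": workers,
        # Share of enforcement periods in which the group was throttled.
        "throttled_period_fraction": throttled / periods if periods else 0.0,
        "throttled_ms": stat.get("throttled_usec", 0) / 1000,
    }
```

A high `throttled_period_fraction` alongside a worker count well above `quota_cpus` is the pattern this case study describes.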

Tradeoff Table

Choice | Gain | Cost
keep 8 CPU-bound threads | easy default | heavy throttling
match threads to quota | steadier latency | lower peak parallelism
separate CPU-bound and I/O-bound pools | better fit by workload | more tuning

Failure Mode

The container hits its quota early in each period, then spends time throttled while runnable threads wait, which inflates tail latency.
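The arithmetic behind that failure can be sketched as follows. This is an idealized estimate, assuming every thread is purely CPU-bound and the quota is consumed evenly; the 100 ms period in the example is an assumed value, and the real kernel hands out quota in smaller slices, so measured numbers will differ.

```python
def throttle_per_period(quota_ms, period_ms, busy_threads, cores):
    """Estimate wall-clock run time and throttled time within one period."""
    parallelism = min(busy_threads, cores)
    # Quota is CPU time, so N threads running in parallel burn it N times
    # faster than wall-clock time passes.
    run_ms = min(period_ms, quota_ms / parallelism)
    return run_ms, period_ms - run_ms


if __name__ == "__main__":
    # One-CPU quota (100 ms per 100 ms period), eight busy threads, eight cores.
    print(throttle_per_period(100, 100, busy_threads=8, cores=8))
    # Same quota with a single busy thread.
    print(throttle_per_period(100, 100, busy_threads=1, cores=8))
```

Under these assumptions, eight busy threads exhaust a one-CPU quota after 12.5 ms of wall-clock time and then sit throttled for the remaining 87.5 ms of the period, while a single thread is never throttled.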

Project / Capstone Connection

Apply this in containerized services, CI runners, or sidecar-heavy deployments where thread count is configured separately from cgroup CPU limits.

Required Artifact

Create a cgroup CPU report with quota, worker count, throttling, and tuning decision.


Case Study 3: Priority Inversion In A Real-Time Path

Scenario: A high-priority audio thread waits on a lock held by a low-priority logging thread. A medium-priority CPU task keeps running, delaying the lock holder.

Source anchor: POSIX real-time scheduling and Linux priority-inheritance mutexes are the practical anchor. See pthread mutex protocols.

Module concepts: priority inversion, real-time scheduling, mutex, priority inheritance.

Wrong Approach

"The high-priority thread always runs first."

Better Approach

Avoid shared locks on real-time paths or use priority inheritance:

audio thread:
  no blocking I/O
  bounded lock hold time
  priority-inheritance mutex if unavoidable
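The effect of priority inheritance on the three-thread scenario can be modeled with a small tick-based simulation. This is a teaching model only: it assumes one CPU, strict fixed-priority preemption, and made-up arrival times and work amounts, and the task names are illustrative. Real code would request the behavior from the mutex itself, for example with the POSIX `PTHREAD_PRIO_INHERIT` protocol.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    name: str
    priority: int      # larger number = more urgent
    arrival: int       # tick at which the task becomes ready
    work: int          # ticks of CPU still needed
    needs_lock: bool   # True if all of its work runs under the shared mutex
    finished_at: Optional[int] = None


def simulate(inherit):
    """Return each task's finish tick, with or without priority inheritance."""
    tasks = [
        Task("logger", priority=1, arrival=0, work=3, needs_lock=True),
        Task("audio", priority=3, arrival=1, work=1, needs_lock=True),
        Task("cruncher", priority=2, arrival=2, work=5, needs_lock=False),
    ]
    owner = None  # task currently holding the mutex
    tick = 0
    while any(t.finished_at is None for t in tasks):
        ready = [t for t in tasks if t.arrival <= tick and t.finished_at is None]

        def blocked(t):
            return t.needs_lock and owner is not None and owner is not t

        waiters = [t for t in ready if blocked(t)]
        runnable = [t for t in ready if not blocked(t)]

        def effective(t):
            # With inheritance, the owner runs at its highest waiter's priority.
            if inherit and t is owner and waiters:
                return max(t.priority, max(w.priority for w in waiters))
            return t.priority

        if runnable:
            current = max(runnable, key=effective)
            if current.needs_lock:
                owner = current
            current.work -= 1
            if current.work == 0:
                current.finished_at = tick + 1
                if owner is current:
                    owner = None  # release the mutex on completion
        tick += 1
    return {t.name: t.finished_at for t in tasks}


if __name__ == "__main__":
    print("no inheritance:", simulate(inherit=False))
    print("inheritance:   ", simulate(inherit=True))
```

In this model the audio task finishes at tick 9 without inheritance, because the medium-priority task runs ahead of the lock holder, and at tick 4 with inheritance, because the logger is boosted until it releases the mutex.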

Tradeoff Table

Choice | Gain | Cost
ordinary mutex | simple | inversion risk
priority inheritance mutex | bounds inversion | added complexity
remove shared lock from RT path | strongest latency control | redesign effort

Failure Mode

The low-priority lock holder cannot run long enough to release the mutex because a medium-priority task keeps preempting it, so the high-priority thread misses timing goals.

Project / Capstone Connection

This maps directly to robotics, media, or control-system capstones where one timing-critical thread shares data with background logging or telemetry.

Required Artifact

Draw the three-thread schedule and identify where priority inheritance changes it.


Case Study 4: Fork/Exec In A Request Path

Scenario: A server shells out to an external command on each request. Under load, process creation overhead and zombie cleanup become visible.

Source anchor: Linux process APIs are documented in man pages such as fork(2), execve(2), and wait(2).

Module concepts: fork, exec, wait, process lifecycle, context switch.

Wrong Approach

"Creating a child process is as cheap as a function call."

Better Approach

Choose the execution model:

rare admin task:
  fork/exec acceptable

hot request path:
  long-lived worker pool or library call

must fork:
  timeout, wait/reap, resource limits
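For the "must fork" case, a minimal sketch of the timeout and reap steps using Python's standard `subprocess` module is shown below. The wrapper name `run_tool` and the return convention are assumptions for this example, and resource limits are left out.

```python
import subprocess
import sys


def run_tool(argv, timeout_s):
    """Run an external command with a deadline.

    Returns (exit_code, stdout), or None if the command timed out.
    """
    try:
        done = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child and waits for it before re-raising,
        # so the timed-out child is reaped and no zombie is left behind.
        return None
    return done.returncode, done.stdout


if __name__ == "__main__":
    print(run_tool([sys.executable, "-c", "print('ok')"], timeout_s=5))
    print(run_tool([sys.executable, "-c", "import time; time.sleep(5)"], timeout_s=0.2))
```

Passing the command as an argument list rather than a shell string avoids an extra shell process per request.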

Tradeoff Table

Choice | Gain | Cost
fork/exec per request | process isolation | high overhead
worker pool | lower latency | lifecycle management
library call in-process | fastest path | less isolation

Failure Mode

Per-request process creation amplifies context-switch cost and process-table churn, and children that are never reaped accumulate as zombies, until throughput collapses under load.
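The lifecycle behind that churn can be traced with the low-level calls directly. The sketch below is Unix-only and uses Python's `os` wrappers around fork, execv, and waitpid; the function name `spawn_and_reap` and the trivial child command are assumptions for illustration.

```python
import os
import sys


def spawn_and_reap(exit_code):
    """fork, exec a trivial child, then wait for it.

    Returns (child_pid, exit_status).
    """
    pid = os.fork()
    if pid == 0:
        # Child: replace this process image with a fresh interpreter.
        try:
            os.execv(sys.executable, [sys.executable, "-c", f"raise SystemExit({exit_code})"])
        finally:
            os._exit(127)  # only reached if exec failed
    # Parent: after the child exits it remains in the process table as a
    # zombie until waitpid() collects its status.
    _, status = os.waitpid(pid, 0)
    return pid, os.WEXITSTATUS(status)


if __name__ == "__main__":
    print(spawn_and_reap(7))
```

A server that skips the `waitpid` step on some code path, such as an early return after a timeout, is the usual source of zombie buildup.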

Project / Capstone Connection

Use this when evaluating whether a capstone service should shell out to tools, media converters, or scripts inside a request path.

Required Artifact

Write a process lifecycle trace: parent, child, exec, exit, wait, timeout, cleanup.


Case Study 5: Real-Time Deadline Miss

Scenario: A robotics control loop must run every 10 ms. A general-purpose scheduling policy misses deadlines under competing load.

Source anchor: Linux scheduler docs and POSIX scheduling APIs explain normal and real-time policy distinctions. See sched(7).

Module concepts: real-time scheduling, deadline, RMS/EDF intuition, preemption.

Wrong Approach

"Fast average runtime is enough."

Better Approach

Budget worst case:

period:
  10 ms

worst-case execution:
  measured under load

scheduling policy:
  real-time if justified

failure:
  safe fallback on missed deadline
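A first-pass budget check can use the classic utilization tests that the RMS/EDF intuition refers to. The sketch below assumes independent periodic tasks on one CPU with deadlines equal to their periods; the task sets in the example are hypothetical. The rate-monotonic bound is sufficient but not necessary, so failing it means "not guaranteed by this test", not "certain to miss".

```python
def utilization(tasks):
    """tasks: list of (wcet_ms, period_ms) pairs."""
    return sum(wcet / period for wcet, period in tasks)


def rms_bound(n):
    """Liu and Layland utilization bound for n tasks under rate-monotonic priorities."""
    return n * (2 ** (1 / n) - 1)


def check(tasks):
    u = utilization(tasks)
    return {
        "utilization": u,
        "rms_guaranteed": u <= rms_bound(len(tasks)),
        "edf_feasible": u <= 1.0,
    }


if __name__ == "__main__":
    # Hypothetical: 10 ms control loop with 3 ms WCET plus a 50 ms telemetry task.
    print(check([(3, 10), (20, 50)]))
    # Same loop with a 6 ms WCET measured under load and a lighter second task.
    print(check([(6, 10), (15, 50)]))
```

The inputs must be worst-case execution times measured under load; plugging in averages would make the check pass for exactly the systems this case study warns about.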

Tradeoff Table

Choice | Gain | Cost
general-purpose scheduling | simple deployment | missed deadlines under load
real-time policy | better deadline control | risk to system fairness
reduce competing load | safer timing margin | less consolidation

Failure Mode

Average execution time looks fine, but a burst of competing work stretches worst-case latency beyond the 10 ms period and causes deadline misses.
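The gap between average and worst case is easy to show from recorded loop latencies. The sketch below summarizes a list of per-iteration completion times against the period; the function name and the sample values are made up for illustration and are not measurements.

```python
from statistics import mean


def deadline_summary(samples_ms, period_ms):
    """Summarize per-iteration latencies against a fixed period."""
    misses = sum(1 for sample in samples_ms if sample > period_ms)
    return {
        "mean_ms": mean(samples_ms),
        "worst_ms": max(samples_ms),
        "misses": misses,
        "miss_rate": misses / len(samples_ms),
    }


if __name__ == "__main__":
    # Hypothetical trace: 98 quick iterations and two stretched by a load burst.
    samples = [2.0] * 98 + [14.0, 18.0]
    print(deadline_summary(samples, period_ms=10.0))
```

For this made-up trace the mean is about 2.3 ms, well inside the 10 ms period, yet two iterations miss the deadline, which is why the analysis has to report the worst case and the miss count rather than the mean.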

Project / Capstone Connection

This fits embedded, robotics, or streaming capstones where periodic control or media tasks need bounded scheduling behavior.

Required Artifact

Write a deadline analysis with period, WCET, competing tasks, policy, and miss behavior.


Source Map

Source | Use it for
Linux CFS Scheduler | fairness and virtual runtime
Linux cgroup v2 | resource control for processes/containers
pthread mutex protocols | priority inheritance
fork(2), execve(2), wait(2) | process lifecycle
sched(7) | Linux scheduling policies

Completion Standard

  • At least three artifacts are completed.
  • At least one artifact includes a schedule trace.
  • At least one artifact includes cgroup or priority tuning.