Module 1: Processes & Scheduling

Primary text: Operating Systems: Three Easy Pieces (OSTEP)
Selective support: Operating System Concepts (Silberschatz et al.) for real-time scheduling, multi-processor scheduling detail, and case-study depth

This guide is the primary teacher. You do not need to read the source books front-to-back to complete this module. You do need to become operationally strong at reasoning about processes, scheduling policies, context-switch cost, and the tradeoffs between fairness, latency, and throughput.

Scope of This Module

This module is not a tour of Linux internals. It is where the CPU-virtualization illusion becomes something you can defend.

What it covers in depth:

the process abstraction: address space, kernel state, and lifecycle
the process control block (PCB), process states, and kernel bookkeeping
creating processes with fork, replacing them with exec, and the process tree
FCFS, SJF, SRTF as baseline policies and why SJF minimizes average turnaround
round-robin, preemption, and the time-quantum tradeoff between response time and overhead
Multi-Level Feedback Queue (MLFQ) and how Linux CFS extends the same fairness problem
scheduler metrics: turnaround, response time, throughput, fairness, and why they conflict
priorities, nice values, and priority inversion
real-time scheduling: hard versus soft, earliest-deadline-first (EDF), rate-monotonic (RMS)
context switching mechanics, the register save/restore dance, and trap/return
the hidden cost of context switching: TLB flush, cache cold-start, branch predictor pollution
threads versus processes and what they share versus isolate
multi-core scheduling, per-CPU queues, work stealing, and affinity
Linux CFS internals: red-black tree ordering by virtual runtime
containers, cgroups, and scheduling isolation at the control-group level

What it deliberately does not try to finish here:

memory management and virtual memory (Module 2)
synchronization primitives and deadlock (Module 3)
file systems and I/O scheduling (Module 4)
network-side request handling (Module 5)

This is where the "CPU is a shared, virtualized resource" claim becomes operational.

Before You Start

Answer these closed-book before starting the main path:

What information does a kernel need to save before it stops running one process and starts another?
Why does running N processes on 1 CPU require a scheduler at all, and what is the scheduler deciding each time it runs?
If two processes each want the CPU and one is "short" while the other is "long", why does running the short one first improve average turnaround?
What is the difference between an interrupt and a system call, and which one can trigger a context switch?
Why is creating a thread typically cheaper than creating a process?

Diagnostic Interpretation

4-5 solid answers

You are ready for the full path.

2-3 solid answers

Continue, but expect extra time in the context-switch and multi-core clusters.

0-1 solid answers

Revisit Semester 4 (Systems Programming) briefly for the mode-switch and trap mechanics this module assumes.

What This Module Is For

Scheduling and process management is the language in which later systems claims are made precise. Across this semester and the rest of the program you will repeatedly be asked:

why is this service's tail latency bad even though average throughput looks fine?
why does doubling the thread count hurt performance instead of helping?
why does the same workload cost more CPU inside a container than on bare metal?
what is the kernel actually doing between two printf calls from different processes?
which of fork, thread, or async task should I reach for, and why?

This module builds the scheduling intuition needed for:

concurrency, synchronization, and deadlock (Module 3)
I/O and disk scheduling (Module 4)
network service performance and request handling (Module 5)
observability work: interpreting CPU, run-queue length, context-switch rate
container runtimes, cgroup limits, and noisy-neighbor debugging in cloud semesters

You are learning to reason about shared CPU without handwaving.

Concept Map

How To Use This Module

Work in order. The later clusters only make sense if the earlier mental models are stable.

Cluster 1: What a Process Is

Order	Concept	Type	Focus
1	The Process Abstraction: Address Space, Registers, PC	PRIMARY	What the OS virtualizes when it gives you a process
2	The Process Control Block and Kernel State	PRIMARY	The bookkeeping the kernel keeps per process
3	Creating Processes: fork, exec, and Process Trees	PRIMARY	How Unix separates "make a copy" from "replace the image"

Cluster mastery check: Can you name every piece of state the kernel must track per process and explain why fork and exec are separate calls?

Cluster 2: CPU Scheduling Policies

Order	Concept	Type	Focus
4	FCFS, SJF, SRTF, and Why SJF Minimizes Average Turnaround	PRIMARY	Three baseline policies with worked Gantt charts
5	Round-Robin and the Time-Quantum Tradeoff	PRIMARY	Preemption, time slices, and the response vs overhead tradeoff
6	MLFQ and How Linux CFS Builds on It	PRIMARY	Learning from history and generalizing fairness

Cluster mastery check: Given arrival and burst times, can you draw the Gantt chart, compute turnaround and response time, and justify which policy matches which workload?

Cluster 3: Scheduler Metrics and Fairness

Order	Concept	Type	Focus
7	Turnaround, Response Time, Throughput, and Fairness	PRIMARY	What we are actually measuring and why metrics conflict
8	Priorities, Nice Values, and Priority Inversion	SUPPORTING	How static preference interacts with locks
9	Real-Time Scheduling: Hard vs Soft, EDF, RMS	SUPPORTING	Deadlines, schedulability, and the real-time family

Cluster mastery check: Can you describe a workload where minimizing average turnaround makes tail latency worse, and pick a metric to prioritize with a reason?

Cluster 4: Context Switching and Overhead

Order	Concept	Type	Focus
10	How a Context Switch Actually Works	PRIMARY	Save, switch stack, restore, `iret`
11	The Hidden Cost: TLB, Cache, and Branch Predictor	PRIMARY	Why a switch costs more than the register save itself
12	Threads vs Processes: Shared vs Isolated State	PRIMARY	What cloning an address space buys and what it costs

Cluster mastery check: Can you estimate the cost of a context switch in microseconds and explain why measuring just register save/restore understates it?

Cluster 5: Scheduling in Modern Systems

Order	Concept	Type	Focus
13	Multi-Core Scheduling and Load Balancing	PRIMARY	Per-CPU run queues, migration, affinity, work stealing
14	Linux CFS Internals: Red-Black Tree and Virtual Runtime	SUPPORTING	How CFS keeps fairness in `O(log n)`
15	Containers, cgroups, and Scheduling Isolation	SUPPORTING	Grouping processes so fairness is per-group, not per-process

Cluster mastery check: Given two containers with different CPU shares on a four-core host, can you predict what happens under contention and explain why?

Then work these practice pages:

Order	Practice path	Focus
1	Scheduling Policy Lab	Hand-executing FCFS, SJF, SRTF, RR, and MLFQ with metrics
2	Process Model Workshop	Address space, PCB, `fork`, `exec`, and process states
3	Context-Switch Clinic	Switch cost, threads vs processes, CFS, and cgroup reasoning
4	Code Katas	Implement a mini scheduler, measure context-switch cost, trace `fork` via `strace`

Use Module Quiz after the concept and practice path. Use Reference and Selective Reading and Learning Resources only for targeted reinforcement.

Learning Objectives

By the end of this module you should be able to:

Describe the process abstraction completely, listing address space, registers, PCB, and kernel-owned state.
Explain the separation between fork and exec and walk a simple shell through the sequence.
Hand-execute FCFS, SJF, SRTF, round-robin, and MLFQ on a small workload, producing Gantt charts and metrics.
Prove informally why SJF minimizes average turnaround time for a given set of arrivals.
Pick a time quantum with a written justification for a given target tradeoff.
Distinguish turnaround, response time, throughput, and fairness, and recognize when they conflict.
Explain priority inversion and one mitigation (priority inheritance) with a concrete example.
State the EDF and RMS feasibility tests for real-time workloads on a uniprocessor.
Walk through a context switch step by step, including what happens to TLB and cache, and estimate its cost.
Compare threads and processes with a table of what is shared and what is isolated.
Describe per-CPU run queues, load balancing, and CPU affinity in a multi-core Linux scheduler.
Explain CFS's virtual runtime and why a red-black tree gives O(log n) pick-next.
Explain cgroup CPU shares and predict how two containers compete under contention.

Outputs

a process-lifecycle notebook with hand-traced fork + exec sequences for at least 5 shells or pipelines
one Gantt-chart worksheet with at least 6 workloads hand-scheduled by FCFS, SJF, SRTF, RR, and MLFQ, with metrics computed
one mini scheduler implementation (Python or C) that simulates at least three policies against the same arrival/burst list
one context-switch cost measurement, using getrusage, perf, or lmbench, with at least three runs and an interpretation note
one strace -f log of a shell running a pipeline, annotated to show fork, execve, wait, and I/O syscalls
one short memo on threads versus processes, with a table of shared state and an example application decision
one cgroup experiment: two cpu-limited containers contending on a single core, with the observed scheduling behavior
one mistake log naming at least 10 scheduling and process errors (forgot to wait on child, fork bomb, priority inversion without inheritance, no affinity for NUMA, time quantum too small, etc.)

Completion Standard

You have completed Module 1 when all of these are true:

you can draw the PCB from memory, including pointers into scheduling structures
you can compute turnaround, response time, and waiting time for any Gantt chart
you can pick a policy for a workload and defend the choice in one paragraph
you have measured context-switch cost on a real Linux machine and can explain what drove the number
you have written or read a scheduler that implements at least round-robin and MLFQ
you can describe a priority-inversion scenario and its mitigation without reference
you can explain why moving from "bare Linux" to "Linux inside a cgroup-limited container" changes scheduling without changing the scheduler

If you can recite definitions but cannot compute a Gantt chart or explain a context switch step by step, the module is not complete.

Reading Policy

Concept pages are the main path.
Local book chunks are selective reinforcement, not a second syllabus.
Read only if stuck means try the concept page, self-check, and drill first.
Optional deep dive means additional nuance or exercise volume, not required progression.
Because scheduling intuition is used throughout the rest of the semester, written arguments (not only numerical answers) are required.

Suggested Weekly Flow

Day	Work
1	Concepts 1-3, write a one-page description of the process abstraction from memory
2	Concepts 4-6, hand-schedule one workload under all five policies
3	Concepts 7-9, write a one-paragraph metric-conflict example
4	Concepts 10-12, walk through a context switch in sentences and estimate its cost
5	Concepts 13-15, read one LWN article on Linux CFS
6	Practice 1 + 2
7	Practice 3 + 4 (katas)
8	Quiz, mistake-log cleanup, and cgroup experiment

Reference

If you need exact links into the local chunked books, use Reference and Selective Reading.

Rich Learning Pages

Scope of This Module​

Before You Start​

Diagnostic Interpretation​

What This Module Is For​

Concept Map​

How To Use This Module​

Cluster 1: What a Process Is​

Cluster 2: CPU Scheduling Policies​

Cluster 3: Scheduler Metrics and Fairness​

Cluster 4: Context Switching and Overhead​

Cluster 5: Scheduling in Modern Systems​

Learning Objectives​

Outputs​

Completion Standard​

Reading Policy​

Suggested Weekly Flow​

Reference​

Rich Learning Pages​