Skip to main content

Module 1: Processes & Scheduling

Primary text: Operating Systems: Three Easy Pieces (OSTEP)
Selective support: Operating System Concepts (Silberschatz et al.) for real-time scheduling, multi-processor scheduling detail, and case-study depth

This guide is the primary teacher. You do not need to read the source books front-to-back to complete this module. You do need to become operationally strong at reasoning about processes, scheduling policies, context-switch cost, and the tradeoffs between fairness, latency, and throughput.


Scope of This Module

This module is not a tour of Linux internals. It is where the CPU-virtualization illusion becomes something you can defend.

What it covers in depth:

  • the process abstraction: address space, kernel state, and lifecycle
  • the process control block (PCB), process states, and kernel bookkeeping
  • creating processes with fork, replacing them with exec, and the process tree
  • FCFS, SJF, SRTF as baseline policies and why SJF minimizes average turnaround
  • round-robin, preemption, and the time-quantum tradeoff between response time and overhead
  • Multi-Level Feedback Queue (MLFQ) and how Linux CFS extends the same fairness problem
  • scheduler metrics: turnaround, response time, throughput, fairness, and why they conflict
  • priorities, nice values, and priority inversion
  • real-time scheduling: hard versus soft, earliest-deadline-first (EDF), rate-monotonic (RMS)
  • context switching mechanics, the register save/restore dance, and trap/return
  • the hidden cost of context switching: TLB flush, cache cold-start, branch predictor pollution
  • threads versus processes and what they share versus isolate
  • multi-core scheduling, per-CPU queues, work stealing, and affinity
  • Linux CFS internals: red-black tree ordering by virtual runtime
  • containers, cgroups, and scheduling isolation at the control-group level

What it deliberately does not try to finish here:

  • memory management and virtual memory (Module 2)
  • synchronization primitives and deadlock (Module 3)
  • file systems and I/O scheduling (Module 4)
  • network-side request handling (Module 5)

This is where the "CPU is a shared, virtualized resource" claim becomes operational.


Before You Start

Answer these closed-book before starting the main path:

  1. What information does a kernel need to save before it stops running one process and starts another?
  2. Why does running N processes on 1 CPU require a scheduler at all, and what is the scheduler deciding each time it runs?
  3. If two processes each want the CPU and one is "short" while the other is "long", why does running the short one first improve average turnaround?
  4. What is the difference between an interrupt and a system call, and which one can trigger a context switch?
  5. Why is creating a thread typically cheaper than creating a process?

Diagnostic Interpretation

4-5 solid answers

  • You are ready for the full path.

2-3 solid answers

  • Continue, but expect extra time in the context-switch and multi-core clusters.

0-1 solid answers

  • Revisit Semester 4 (Systems Programming) briefly for the mode-switch and trap mechanics this module assumes.

What This Module Is For

Scheduling and process management is the language in which later systems claims are made precise. Across this semester and the rest of the program you will repeatedly be asked:

  • why is this service's tail latency bad even though average throughput looks fine?
  • why does doubling the thread count hurt performance instead of helping?
  • why does the same workload cost more CPU inside a container than on bare metal?
  • what is the kernel actually doing between two printf calls from different processes?
  • which of fork, thread, or async task should I reach for, and why?

This module builds the scheduling intuition needed for:

  • concurrency, synchronization, and deadlock (Module 3)
  • I/O and disk scheduling (Module 4)
  • network service performance and request handling (Module 5)
  • observability work: interpreting CPU, run-queue length, context-switch rate
  • container runtimes, cgroup limits, and noisy-neighbor debugging in cloud semesters

You are learning to reason about shared CPU without handwaving.


Concept Map


How To Use This Module

Work in order. The later clusters only make sense if the earlier mental models are stable.

Cluster 1: What a Process Is

OrderConceptTypeFocus
1The Process Abstraction: Address Space, Registers, PCPRIMARYWhat the OS virtualizes when it gives you a process
2The Process Control Block and Kernel StatePRIMARYThe bookkeeping the kernel keeps per process
3Creating Processes: fork, exec, and Process TreesPRIMARYHow Unix separates "make a copy" from "replace the image"

Cluster mastery check: Can you name every piece of state the kernel must track per process and explain why fork and exec are separate calls?

Cluster 2: CPU Scheduling Policies

OrderConceptTypeFocus
4FCFS, SJF, SRTF, and Why SJF Minimizes Average TurnaroundPRIMARYThree baseline policies with worked Gantt charts
5Round-Robin and the Time-Quantum TradeoffPRIMARYPreemption, time slices, and the response vs overhead tradeoff
6MLFQ and How Linux CFS Builds on ItPRIMARYLearning from history and generalizing fairness

Cluster mastery check: Given arrival and burst times, can you draw the Gantt chart, compute turnaround and response time, and justify which policy matches which workload?

Cluster 3: Scheduler Metrics and Fairness

OrderConceptTypeFocus
7Turnaround, Response Time, Throughput, and FairnessPRIMARYWhat we are actually measuring and why metrics conflict
8Priorities, Nice Values, and Priority InversionSUPPORTINGHow static preference interacts with locks
9Real-Time Scheduling: Hard vs Soft, EDF, RMSSUPPORTINGDeadlines, schedulability, and the real-time family

Cluster mastery check: Can you describe a workload where minimizing average turnaround makes tail latency worse, and pick a metric to prioritize with a reason?

Cluster 4: Context Switching and Overhead

OrderConceptTypeFocus
10How a Context Switch Actually WorksPRIMARYSave, switch stack, restore, iret
11The Hidden Cost: TLB, Cache, and Branch PredictorPRIMARYWhy a switch costs more than the register save itself
12Threads vs Processes: Shared vs Isolated StatePRIMARYWhat cloning an address space buys and what it costs

Cluster mastery check: Can you estimate the cost of a context switch in microseconds and explain why measuring just register save/restore understates it?

Cluster 5: Scheduling in Modern Systems

OrderConceptTypeFocus
13Multi-Core Scheduling and Load BalancingPRIMARYPer-CPU run queues, migration, affinity, work stealing
14Linux CFS Internals: Red-Black Tree and Virtual RuntimeSUPPORTINGHow CFS keeps fairness in O(log n)
15Containers, cgroups, and Scheduling IsolationSUPPORTINGGrouping processes so fairness is per-group, not per-process

Cluster mastery check: Given two containers with different CPU shares on a four-core host, can you predict what happens under contention and explain why?

Then work these practice pages:

OrderPractice pathFocus
1Scheduling Policy LabHand-executing FCFS, SJF, SRTF, RR, and MLFQ with metrics
2Process Model WorkshopAddress space, PCB, fork, exec, and process states
3Context-Switch ClinicSwitch cost, threads vs processes, CFS, and cgroup reasoning
4Code KatasImplement a mini scheduler, measure context-switch cost, trace fork via strace

Use Module Quiz after the concept and practice path. Use Reference and Selective Reading and Learning Resources only for targeted reinforcement.


Learning Objectives

By the end of this module you should be able to:

  1. Describe the process abstraction completely, listing address space, registers, PCB, and kernel-owned state.
  2. Explain the separation between fork and exec and walk a simple shell through the sequence.
  3. Hand-execute FCFS, SJF, SRTF, round-robin, and MLFQ on a small workload, producing Gantt charts and metrics.
  4. Prove informally why SJF minimizes average turnaround time for a given set of arrivals.
  5. Pick a time quantum with a written justification for a given target tradeoff.
  6. Distinguish turnaround, response time, throughput, and fairness, and recognize when they conflict.
  7. Explain priority inversion and one mitigation (priority inheritance) with a concrete example.
  8. State the EDF and RMS feasibility tests for real-time workloads on a uniprocessor.
  9. Walk through a context switch step by step, including what happens to TLB and cache, and estimate its cost.
  10. Compare threads and processes with a table of what is shared and what is isolated.
  11. Describe per-CPU run queues, load balancing, and CPU affinity in a multi-core Linux scheduler.
  12. Explain CFS's virtual runtime and why a red-black tree gives O(log n) pick-next.
  13. Explain cgroup CPU shares and predict how two containers compete under contention.

Outputs

  • a process-lifecycle notebook with hand-traced fork + exec sequences for at least 5 shells or pipelines
  • one Gantt-chart worksheet with at least 6 workloads hand-scheduled by FCFS, SJF, SRTF, RR, and MLFQ, with metrics computed
  • one mini scheduler implementation (Python or C) that simulates at least three policies against the same arrival/burst list
  • one context-switch cost measurement, using getrusage, perf, or lmbench, with at least three runs and an interpretation note
  • one strace -f log of a shell running a pipeline, annotated to show fork, execve, wait, and I/O syscalls
  • one short memo on threads versus processes, with a table of shared state and an example application decision
  • one cgroup experiment: two cpu-limited containers contending on a single core, with the observed scheduling behavior
  • one mistake log naming at least 10 scheduling and process errors (forgot to wait on child, fork bomb, priority inversion without inheritance, no affinity for NUMA, time quantum too small, etc.)

Completion Standard

You have completed Module 1 when all of these are true:

  • you can draw the PCB from memory, including pointers into scheduling structures
  • you can compute turnaround, response time, and waiting time for any Gantt chart
  • you can pick a policy for a workload and defend the choice in one paragraph
  • you have measured context-switch cost on a real Linux machine and can explain what drove the number
  • you have written or read a scheduler that implements at least round-robin and MLFQ
  • you can describe a priority-inversion scenario and its mitigation without reference
  • you can explain why moving from "bare Linux" to "Linux inside a cgroup-limited container" changes scheduling without changing the scheduler

If you can recite definitions but cannot compute a Gantt chart or explain a context switch step by step, the module is not complete.


Reading Policy

  • Concept pages are the main path.
  • Local book chunks are selective reinforcement, not a second syllabus.
  • Read only if stuck means try the concept page, self-check, and drill first.
  • Optional deep dive means additional nuance or exercise volume, not required progression.
  • Because scheduling intuition is used throughout the rest of the semester, written arguments (not only numerical answers) are required.

Suggested Weekly Flow

DayWork
1Concepts 1-3, write a one-page description of the process abstraction from memory
2Concepts 4-6, hand-schedule one workload under all five policies
3Concepts 7-9, write a one-paragraph metric-conflict example
4Concepts 10-12, walk through a context switch in sentences and estimate its cost
5Concepts 13-15, read one LWN article on Linux CFS
6Practice 1 + 2
7Practice 3 + 4 (katas)
8Quiz, mistake-log cleanup, and cgroup experiment

Reference

If you need exact links into the local chunked books, use Reference and Selective Reading.


Rich Learning Pages

Worked Examples | Guided Labs | Case Studies | Mistake Clinic | Reading Guide | Capstone Thread