Module 2: Sorting, Searching & Fundamental Structures
Primary texts: The Algorithm Design Manual (Skiena) for problem-driven intuition and Introduction to Algorithms (CLRS) for proofs and cost analysis
Selective support: Algorithms (Sedgewick) for implementation texture; Grokking Algorithms for first-pass framing; Competitive Programming for exercise volume
This guide is the primary teacher. You do not need to read the source books front-to-back to complete this module. You do need to become fluent in sorting, searching, hashing, and heap-based priority queues -- both as algorithms and as a vocabulary for expressing cost tradeoffs.
Scope of This Module
This module is not a catalog of every sort ever invented. It is where you learn to reason about why a particular operation costs what it does and when to pay that cost.
What it covers in depth:
- the three comparison sorts that matter operationally: insertion, merge, quick
- the
Omega(n log n)lower bound proved through decision trees - linear-time specialty sorts (counting, radix) and the assumptions they pay for their speed with
- external and practical sorting -- what real libraries actually do
- binary search as an invariant-driven template, not a trick
- selection and order statistics in linear time (quickselect, median-of-medians)
- arrays vs linked lists vs trees vs hash tables as a choice about access patterns
- hashing as a randomized model: expected
O(1), universal hashing, load factor, collision resolution - binary heaps as implicit trees and the priority-queue API they enable
- heapsort and the pattern-by-pattern use of priority queues in simulation, Dijkstra, top-K, median maintenance, and
k-way merge
What it deliberately does not try to finish here:
- balanced search trees beyond a mention (Module 5 handles them)
- graph algorithms beyond naming Dijkstra as a PQ client (Module 3)
- amortized analysis beyond the resize of a hash table (Module 5)
- advanced heaps such as Fibonacci heaps (Module 5)
This is a foundations module. If you can implement these algorithms but cannot explain their cost profiles in a sentence, you are not done.
Before You Start
Answer these closed-book before starting the main path:
- Why does binary search require random access to be
O(log n)? - What makes mergesort
O(n log n)rather thanO(n^2)? Specifically, what is the recurrence? - What does "stable sort" mean, and when do you need it?
- Why is the
O(1)guarantee for hash-table lookup "expected," not worst case? - In a max-heap stored as an array, how do you compute the parent of the node at index
i?
Diagnostic Interpretation
4-5 solid answers
- You are ready for the full path.
2-3 solid answers
- Continue, but expect extra time in the lower-bound and hashing clusters.
0-1 solid answers
- Revisit Module 1 recurrence analysis and the Semester 1 probability indicator-variable techniques before proceeding.
What This Module Is For
Sorting, searching, hashing, and priority queues are the plumbing of almost every non-trivial algorithm. Later work repeatedly asks:
- how do I find one element or one order statistic out of many?
- how do I organize data so that insert, lookup, and delete are all fast?
- how do I keep track of the next-most-important item in a dynamic collection?
- which algorithm costs what on which input distribution?
- when does a library default hide a performance or correctness trap?
This module trains the reasoning needed for:
- graph algorithms (Module 3) that use PQs for Dijkstra, Prim, A*
- dynamic programming (Module 4) that often needs sorted input or fast lookup
- advanced structures (Module 5) that generalize the "tree in an array" and amortization ideas introduced here
- systems, database, and compiler work that depend on cache-aware sorting and hashing
You are learning to reason about data-handling cost precisely and pick the right tool for a job.
Concept Map
How To Use This Module
Work in order. Later clusters depend on earlier habits -- in particular the invariant-writing reflex from Cluster 1 and the probability-of-collision reasoning from Cluster 4.
Cluster 1: Comparison-Based Sorting and the Lower Bound
| Order | Concept | Type | Focus |
|---|---|---|---|
| 1 | Insertion Sort Teaches Lessons That Survive | SUPPORTING | Invariants, adaptivity, and why library sorts switch to insertion sort on small subarrays |
| 2 | Mergesort Is the Divide-and-Conquer Exemplar | PRIMARY | Recurrence T(n) = 2T(n/2) + Theta(n), stability, and guaranteed O(n log n) |
| 3 | Quicksort and Why Randomization Helps | PRIMARY | Partition, expected O(n log n), pivot choice, adversarial vs unlucky inputs |
| 4 | Comparison Sorting Has an Omega(n log n) Lower Bound | PRIMARY | Decision trees, log2(n!), and what counts as a "comparison" |
Cluster mastery check: Can you write a loop invariant, derive Theta(n log n) from the mergesort recurrence, and explain why no comparison-based sort can do asymptotically better?
Cluster 2: Linear-Time and Specialized Sorting
| Order | Concept | Type | Focus |
|---|---|---|---|
| 5 | Counting Sort and When Integer Keys Break the Lower Bound | SUPPORTING | Theta(n + k) histogram sort, stability, and why it does not violate the lower bound |
| 6 | Radix Sort and Linear in Key Width | SUPPORTING | LSD vs MSD, the real meaning of "linear," and how radix plugs into counting sort |
| 7 | External and Practical Sorting | SUPPORTING | Introsort, Timsort, external merge sort, and which library sort your language really runs |
Cluster mastery check: Can you pick the right sort given the input shape, key structure, and memory budget -- and justify why the textbook asymptotics alone do not decide?
Cluster 3: Searching in Sorted and Unsorted Data
| Order | Concept | Type | Focus |
|---|---|---|---|
| 8 | Binary Search and Its Invariant | PRIMARY | The correctness template that generalizes to lower_bound, upper_bound, and predicate-based search |
| 9 | Selection and Order Statistics in O(n) | PRIMARY | Quickselect, median-of-medians, and the partition primitive as a general tool |
| 10 | Search-Data-Structure Choice | SUPPORTING | Array vs linked list vs balanced BST vs hash table, by access pattern |
Cluster mastery check: Can you state the invariant for a binary search before writing any code, and can you match a storage shape to its operation-cost profile?
Cluster 4: Hash Tables as a Dictionary Primitive
| Order | Concept | Type | Focus |
|---|---|---|---|
| 11 | Hashing as a Randomized Model | PRIMARY | Expected O(1) as a probabilistic guarantee, and what breaks it |
| 12 | Collision Resolution: Chaining vs Open Addressing | PRIMARY | Two schemes, their load-factor behavior, and deletion traps |
| 13 | Universal Hashing, Load Factor, and Resizing | SUPPORTING | Universal families, cryptographic vs non-cryptographic hashing, amortized resizes |
Cluster mastery check: Can you explain why hash-table O(1) is expected rather than worst case, and can you configure a table to behave well under adversarial input?
Cluster 5: Heaps and Priority Queues
| Order | Concept | Type | Focus |
|---|---|---|---|
| 14 | The Binary Heap as an Implicit Tree | PRIMARY | Array-backed heap layout, sift-up/down, and the O(n) build argument |
| 15 | Heapsort and the Priority-Queue API | PRIMARY | In-place O(n log n) sort, and the PQ operations all graph algorithms depend on |
| 16 | Priority-Queue Applications | SUPPORTING | Event simulation, Dijkstra relaxation, top-K, median maintenance, k-way merge |
Cluster mastery check: Can you spot a problem whose loop is "pick the best remaining element, do work, possibly produce new elements," and translate it to a heap-backed priority queue?
Then work these practice pages:
| Order | Practice path | Focus |
|---|---|---|
| 1 | Sorting Correctness and Analysis Lab | Invariants, complexity derivations, stability, and complexity-distinguishing drills |
| 2 | Searching and Selection Workshop | Binary search variants, quickselect, predicate-based search |
| 3 | Hashing and Priority-Queue Clinic | Collision reasoning, universal hashing, heap operations, PQ applications |
| 4 | Code Katas | Timed implementations: mergesort, quicksort, heapsort, hash table, binary search, k-smallest |
Use Module Quiz after the concept and practice path. Use Reference and Selective Reading and Learning Resources only for targeted reinforcement.
Learning Objectives
By the end of this module you should be able to:
- Write insertion sort, mergesort, and quicksort from scratch with correct loop invariants.
- Derive the
Theta(n log n)cost of mergesort and the expectedO(n log n)of randomized quicksort. - Prove the
Omega(n log n)comparison-sort lower bound via decision trees. - Implement counting sort and radix sort, and explain the assumptions that let them beat the lower bound.
- Choose the right sorting algorithm given input shape, key structure, memory budget, and stability needs.
- Implement binary search and its variants (
lower_bound,upper_bound, predicate-based) using the invariant template. - Implement quickselect and describe when linear-time selection beats sorting.
- Implement a hash table with chaining and with open addressing, and explain load-factor tradeoffs.
- Implement a binary-heap-backed priority queue with
insert,extract_min,peek, and understand howdecrease_keyis typically supported. - Identify priority-queue patterns in real problems: simulation, shortest path, top-K, median maintenance,
k-way merge.
Outputs
- an algorithms notebook or repo with tested implementations of insertion, merge, quick, counting, radix, and heap sorts
- a binary search library covering
search,lower_bound,upper_bound, and one predicate-based variant, each with a written invariant - a quickselect implementation and one worked example comparing it to sorting-then-indexing
- a hand-rolled hash table with chaining and with linear probing, plus a written load-factor study
- a priority-queue implementation with at least three application demos (Dijkstra on a small graph, top-K, median maintenance)
- one benchmark note comparing library sort, your own mergesort, and your own quicksort on random, sorted, and reverse-sorted inputs
- one mistake log naming at least 10 real errors such as
off-by-one in binary search,assumed stability,forgot tombstones on delete,confused peek and extract,miscounted recurrence levels - one short memo explaining how Module 2 tools feed into graph algorithms (Module 3) and advanced structures (Module 5)
Completion Standard
You have completed Module 2 when all of these are true:
- you can write a correct sort from memory and state its invariant in one sentence
- you can derive the
Theta(n log n)recurrence for mergesort without looking it up - you can explain the
Omega(n log n)bound using the decision-tree argument - you can write binary search and at least one variant correctly on the first try
- you can explain why hash-table
O(1)is "expected" rather than worst case, and what breaks the expectation - you can implement a binary heap and the basic PQ operations, and you recognize priority-queue shaped problems
- you can argue why a particular library sort is or is not the right choice for a given workload
If the code works but you cannot justify the cost or the correctness in words, the module is not complete.
Reading Policy
- Concept pages are the main path.
- Local book chunks are selective reinforcement, not a second syllabus.
Read only if stuckmeans try the concept page, self-check, and drill first.Optional deep divemeans additional nuance or exercise volume, not required progression.- External URLs appear only in
resources.mdunder "Read-If-Curious"; concept pages never include raw web links.
Suggested Weekly Flow
| Day | Work |
|---|---|
| 1 | Concepts 1-2 and one hand-traced mergesort with a written invariant |
| 2 | Concepts 3-4 and one randomized-quicksort implementation + decision-tree sketch |
| 3 | Concepts 5-7 and a counting/radix sort implementation on integer input |
| 4 | Concepts 8-10 and binary search + lower_bound written from scratch |
| 5 | Concepts 11-13 and a chaining hash table implemented end-to-end |
| 6 | Concepts 14-16 and a binary-heap PQ with a median-maintenance demo |
| 7 | Practice pages, code katas, quiz, and mistake-log cleanup |
Reference
If you need exact links into the local chunked books, use Reference and Selective Reading.
Rich Learning Pages
Worked Examples | Guided Labs | Case Studies | Mistake Clinic | Reading Guide | Capstone Thread
Model Artifact Calibration
For benchmark calibration, compare your measurements and decision writeup to the algorithm benchmark note model artifact.