Module 2: Sorting, Searching & Fundamental Structures

Primary texts: The Algorithm Design Manual (Skiena) for problem-driven intuition and Introduction to Algorithms (CLRS) for proofs and cost analysis
Selective support: Algorithms (Sedgewick) for implementation texture; Grokking Algorithms for first-pass framing; Competitive Programming for exercise volume

This guide is the primary teacher. You do not need to read the source books front-to-back to complete this module. You do need to become fluent in sorting, searching, hashing, and heap-based priority queues -- both as algorithms and as a vocabulary for expressing cost tradeoffs.

Scope of This Module

This module is not a catalog of every sort ever invented. It is where you learn to reason about why a particular operation costs what it does and when to pay that cost.

What it covers in depth:

the three comparison sorts that matter operationally: insertion, merge, quick
the Omega(n log n) lower bound proved through decision trees
linear-time specialty sorts (counting, radix) and the assumptions they pay for their speed with
external and practical sorting -- what real libraries actually do
binary search as an invariant-driven template, not a trick
selection and order statistics in linear time (quickselect, median-of-medians)
arrays vs linked lists vs trees vs hash tables as a choice about access patterns
hashing as a randomized model: expected O(1), universal hashing, load factor, collision resolution
binary heaps as implicit trees and the priority-queue API they enable
heapsort and the pattern-by-pattern use of priority queues in simulation, Dijkstra, top-K, median maintenance, and k-way merge

What it deliberately does not try to finish here:

balanced search trees beyond a mention (Module 5 handles them)
graph algorithms beyond naming Dijkstra as a PQ client (Module 3)
amortized analysis beyond the resize of a hash table (Module 5)
advanced heaps such as Fibonacci heaps (Module 5)

This is a foundations module. If you can implement these algorithms but cannot explain their cost profiles in a sentence, you are not done.

Before You Start

Answer these closed-book before starting the main path:

Why does binary search require random access to be O(log n)?
What makes mergesort O(n log n) rather than O(n^2)? Specifically, what is the recurrence?
What does "stable sort" mean, and when do you need it?
Why is the O(1) guarantee for hash-table lookup "expected," not worst case?
In a max-heap stored as an array, how do you compute the parent of the node at index i?

Diagnostic Interpretation

4-5 solid answers

You are ready for the full path.

2-3 solid answers

Continue, but expect extra time in the lower-bound and hashing clusters.

0-1 solid answers

Revisit Module 1 recurrence analysis and the Semester 1 probability indicator-variable techniques before proceeding.

What This Module Is For

Sorting, searching, hashing, and priority queues are the plumbing of almost every non-trivial algorithm. Later work repeatedly asks:

how do I find one element or one order statistic out of many?
how do I organize data so that insert, lookup, and delete are all fast?
how do I keep track of the next-most-important item in a dynamic collection?
which algorithm costs what on which input distribution?
when does a library default hide a performance or correctness trap?

This module trains the reasoning needed for:

graph algorithms (Module 3) that use PQs for Dijkstra, Prim, A*
dynamic programming (Module 4) that often needs sorted input or fast lookup
advanced structures (Module 5) that generalize the "tree in an array" and amortization ideas introduced here
systems, database, and compiler work that depend on cache-aware sorting and hashing

You are learning to reason about data-handling cost precisely and pick the right tool for a job.

Concept Map

How To Use This Module

Work in order. Later clusters depend on earlier habits -- in particular the invariant-writing reflex from Cluster 1 and the probability-of-collision reasoning from Cluster 4.

Cluster 1: Comparison-Based Sorting and the Lower Bound

Order	Concept	Type	Focus
1	Insertion Sort Teaches Lessons That Survive	SUPPORTING	Invariants, adaptivity, and why library sorts switch to insertion sort on small subarrays
2	Mergesort Is the Divide-and-Conquer Exemplar	PRIMARY	Recurrence `T(n) = 2T(n/2) + Theta(n)`, stability, and guaranteed `O(n log n)`
3	Quicksort and Why Randomization Helps	PRIMARY	Partition, expected `O(n log n)`, pivot choice, adversarial vs unlucky inputs
4	Comparison Sorting Has an Omega(n log n) Lower Bound	PRIMARY	Decision trees, `log2(n!)`, and what counts as a "comparison"

Cluster mastery check: Can you write a loop invariant, derive Theta(n log n) from the mergesort recurrence, and explain why no comparison-based sort can do asymptotically better?

Cluster 2: Linear-Time and Specialized Sorting

Order	Concept	Type	Focus
5	Counting Sort and When Integer Keys Break the Lower Bound	SUPPORTING	`Theta(n + k)` histogram sort, stability, and why it does not violate the lower bound
6	Radix Sort and Linear in Key Width	SUPPORTING	LSD vs MSD, the real meaning of "linear," and how radix plugs into counting sort
7	External and Practical Sorting	SUPPORTING	Introsort, Timsort, external merge sort, and which library sort your language really runs

Cluster mastery check: Can you pick the right sort given the input shape, key structure, and memory budget -- and justify why the textbook asymptotics alone do not decide?

Cluster 3: Searching in Sorted and Unsorted Data

Order	Concept	Type	Focus
8	Binary Search and Its Invariant	PRIMARY	The correctness template that generalizes to `lower_bound`, `upper_bound`, and predicate-based search
9	Selection and Order Statistics in O(n)	PRIMARY	Quickselect, median-of-medians, and the partition primitive as a general tool
10	Search-Data-Structure Choice	SUPPORTING	Array vs linked list vs balanced BST vs hash table, by access pattern

Cluster mastery check: Can you state the invariant for a binary search before writing any code, and can you match a storage shape to its operation-cost profile?

Cluster 4: Hash Tables as a Dictionary Primitive

Order	Concept	Type	Focus
11	Hashing as a Randomized Model	PRIMARY	Expected `O(1)` as a probabilistic guarantee, and what breaks it
12	Collision Resolution: Chaining vs Open Addressing	PRIMARY	Two schemes, their load-factor behavior, and deletion traps
13	Universal Hashing, Load Factor, and Resizing	SUPPORTING	Universal families, cryptographic vs non-cryptographic hashing, amortized resizes

Cluster mastery check: Can you explain why hash-table O(1) is expected rather than worst case, and can you configure a table to behave well under adversarial input?

Cluster 5: Heaps and Priority Queues

Order	Concept	Type	Focus
14	The Binary Heap as an Implicit Tree	PRIMARY	Array-backed heap layout, sift-up/down, and the `O(n)` build argument
15	Heapsort and the Priority-Queue API	PRIMARY	In-place `O(n log n)` sort, and the PQ operations all graph algorithms depend on
16	Priority-Queue Applications	SUPPORTING	Event simulation, Dijkstra relaxation, top-K, median maintenance, `k`-way merge

Cluster mastery check: Can you spot a problem whose loop is "pick the best remaining element, do work, possibly produce new elements," and translate it to a heap-backed priority queue?

Then work these practice pages:

Order	Practice path	Focus
1	Sorting Correctness and Analysis Lab	Invariants, complexity derivations, stability, and complexity-distinguishing drills
2	Searching and Selection Workshop	Binary search variants, quickselect, predicate-based search
3	Hashing and Priority-Queue Clinic	Collision reasoning, universal hashing, heap operations, PQ applications
4	Code Katas	Timed implementations: mergesort, quicksort, heapsort, hash table, binary search, k-smallest

Use Module Quiz after the concept and practice path. Use Reference and Selective Reading and Learning Resources only for targeted reinforcement.

Learning Objectives

By the end of this module you should be able to:

Write insertion sort, mergesort, and quicksort from scratch with correct loop invariants.
Derive the Theta(n log n) cost of mergesort and the expected O(n log n) of randomized quicksort.
Prove the Omega(n log n) comparison-sort lower bound via decision trees.
Implement counting sort and radix sort, and explain the assumptions that let them beat the lower bound.
Choose the right sorting algorithm given input shape, key structure, memory budget, and stability needs.
Implement binary search and its variants (lower_bound, upper_bound, predicate-based) using the invariant template.
Implement quickselect and describe when linear-time selection beats sorting.
Implement a hash table with chaining and with open addressing, and explain load-factor tradeoffs.
Implement a binary-heap-backed priority queue with insert, extract_min, peek, and understand how decrease_key is typically supported.
Identify priority-queue patterns in real problems: simulation, shortest path, top-K, median maintenance, k-way merge.

Outputs

an algorithms notebook or repo with tested implementations of insertion, merge, quick, counting, radix, and heap sorts
a binary search library covering search, lower_bound, upper_bound, and one predicate-based variant, each with a written invariant
a quickselect implementation and one worked example comparing it to sorting-then-indexing
a hand-rolled hash table with chaining and with linear probing, plus a written load-factor study
a priority-queue implementation with at least three application demos (Dijkstra on a small graph, top-K, median maintenance)
one benchmark note comparing library sort, your own mergesort, and your own quicksort on random, sorted, and reverse-sorted inputs
one mistake log naming at least 10 real errors such as off-by-one in binary search, assumed stability, forgot tombstones on delete, confused peek and extract, miscounted recurrence levels
one short memo explaining how Module 2 tools feed into graph algorithms (Module 3) and advanced structures (Module 5)

Completion Standard

You have completed Module 2 when all of these are true:

you can write a correct sort from memory and state its invariant in one sentence
you can derive the Theta(n log n) recurrence for mergesort without looking it up
you can explain the Omega(n log n) bound using the decision-tree argument
you can write binary search and at least one variant correctly on the first try
you can explain why hash-table O(1) is "expected" rather than worst case, and what breaks the expectation
you can implement a binary heap and the basic PQ operations, and you recognize priority-queue shaped problems
you can argue why a particular library sort is or is not the right choice for a given workload

If the code works but you cannot justify the cost or the correctness in words, the module is not complete.

Reading Policy

Concept pages are the main path.
Local book chunks are selective reinforcement, not a second syllabus.
Read only if stuck means try the concept page, self-check, and drill first.
Optional deep dive means additional nuance or exercise volume, not required progression.
External URLs appear only in resources.md under "Read-If-Curious"; concept pages never include raw web links.

Suggested Weekly Flow

Day	Work
1	Concepts 1-2 and one hand-traced mergesort with a written invariant
2	Concepts 3-4 and one randomized-quicksort implementation + decision-tree sketch
3	Concepts 5-7 and a counting/radix sort implementation on integer input
4	Concepts 8-10 and binary search + `lower_bound` written from scratch
5	Concepts 11-13 and a chaining hash table implemented end-to-end
6	Concepts 14-16 and a binary-heap PQ with a median-maintenance demo
7	Practice pages, code katas, quiz, and mistake-log cleanup

Reference

If you need exact links into the local chunked books, use Reference and Selective Reading.

Rich Learning Pages

Model Artifact Calibration

For benchmark calibration, compare your measurements and decision writeup to the algorithm benchmark note model artifact.

Scope of This Module​

Before You Start​

Diagnostic Interpretation​

What This Module Is For​

Concept Map​

How To Use This Module​

Cluster 1: Comparison-Based Sorting and the Lower Bound​

Cluster 2: Linear-Time and Specialized Sorting​

Cluster 3: Searching in Sorted and Unsorted Data​

Cluster 4: Hash Tables as a Dictionary Primitive​

Cluster 5: Heaps and Priority Queues​

Learning Objectives​

Outputs​

Completion Standard​

Reading Policy​

Suggested Weekly Flow​

Reference​

Rich Learning Pages​

Model Artifact Calibration​