Skip to main content

Module 2: Sorting, Searching & Fundamental Structures

Primary texts: The Algorithm Design Manual (Skiena) for problem-driven intuition and Introduction to Algorithms (CLRS) for proofs and cost analysis
Selective support: Algorithms (Sedgewick) for implementation texture; Grokking Algorithms for first-pass framing; Competitive Programming for exercise volume

This guide is the primary teacher. You do not need to read the source books front-to-back to complete this module. You do need to become fluent in sorting, searching, hashing, and heap-based priority queues -- both as algorithms and as a vocabulary for expressing cost tradeoffs.


Scope of This Module

This module is not a catalog of every sort ever invented. It is where you learn to reason about why a particular operation costs what it does and when to pay that cost.

What it covers in depth:

  • the three comparison sorts that matter operationally: insertion, merge, quick
  • the Omega(n log n) lower bound proved through decision trees
  • linear-time specialty sorts (counting, radix) and the assumptions they pay for their speed with
  • external and practical sorting -- what real libraries actually do
  • binary search as an invariant-driven template, not a trick
  • selection and order statistics in linear time (quickselect, median-of-medians)
  • arrays vs linked lists vs trees vs hash tables as a choice about access patterns
  • hashing as a randomized model: expected O(1), universal hashing, load factor, collision resolution
  • binary heaps as implicit trees and the priority-queue API they enable
  • heapsort and the pattern-by-pattern use of priority queues in simulation, Dijkstra, top-K, median maintenance, and k-way merge

What it deliberately does not try to finish here:

  • balanced search trees beyond a mention (Module 5 handles them)
  • graph algorithms beyond naming Dijkstra as a PQ client (Module 3)
  • amortized analysis beyond the resize of a hash table (Module 5)
  • advanced heaps such as Fibonacci heaps (Module 5)

This is a foundations module. If you can implement these algorithms but cannot explain their cost profiles in a sentence, you are not done.


Before You Start

Answer these closed-book before starting the main path:

  1. Why does binary search require random access to be O(log n)?
  2. What makes mergesort O(n log n) rather than O(n^2)? Specifically, what is the recurrence?
  3. What does "stable sort" mean, and when do you need it?
  4. Why is the O(1) guarantee for hash-table lookup "expected," not worst case?
  5. In a max-heap stored as an array, how do you compute the parent of the node at index i?

Diagnostic Interpretation

4-5 solid answers

  • You are ready for the full path.

2-3 solid answers

  • Continue, but expect extra time in the lower-bound and hashing clusters.

0-1 solid answers

  • Revisit Module 1 recurrence analysis and the Semester 1 probability indicator-variable techniques before proceeding.

What This Module Is For

Sorting, searching, hashing, and priority queues are the plumbing of almost every non-trivial algorithm. Later work repeatedly asks:

  • how do I find one element or one order statistic out of many?
  • how do I organize data so that insert, lookup, and delete are all fast?
  • how do I keep track of the next-most-important item in a dynamic collection?
  • which algorithm costs what on which input distribution?
  • when does a library default hide a performance or correctness trap?

This module trains the reasoning needed for:

  • graph algorithms (Module 3) that use PQs for Dijkstra, Prim, A*
  • dynamic programming (Module 4) that often needs sorted input or fast lookup
  • advanced structures (Module 5) that generalize the "tree in an array" and amortization ideas introduced here
  • systems, database, and compiler work that depend on cache-aware sorting and hashing

You are learning to reason about data-handling cost precisely and pick the right tool for a job.


Concept Map


How To Use This Module

Work in order. Later clusters depend on earlier habits -- in particular the invariant-writing reflex from Cluster 1 and the probability-of-collision reasoning from Cluster 4.

Cluster 1: Comparison-Based Sorting and the Lower Bound

OrderConceptTypeFocus
1Insertion Sort Teaches Lessons That SurviveSUPPORTINGInvariants, adaptivity, and why library sorts switch to insertion sort on small subarrays
2Mergesort Is the Divide-and-Conquer ExemplarPRIMARYRecurrence T(n) = 2T(n/2) + Theta(n), stability, and guaranteed O(n log n)
3Quicksort and Why Randomization HelpsPRIMARYPartition, expected O(n log n), pivot choice, adversarial vs unlucky inputs
4Comparison Sorting Has an Omega(n log n) Lower BoundPRIMARYDecision trees, log2(n!), and what counts as a "comparison"

Cluster mastery check: Can you write a loop invariant, derive Theta(n log n) from the mergesort recurrence, and explain why no comparison-based sort can do asymptotically better?

Cluster 2: Linear-Time and Specialized Sorting

OrderConceptTypeFocus
5Counting Sort and When Integer Keys Break the Lower BoundSUPPORTINGTheta(n + k) histogram sort, stability, and why it does not violate the lower bound
6Radix Sort and Linear in Key WidthSUPPORTINGLSD vs MSD, the real meaning of "linear," and how radix plugs into counting sort
7External and Practical SortingSUPPORTINGIntrosort, Timsort, external merge sort, and which library sort your language really runs

Cluster mastery check: Can you pick the right sort given the input shape, key structure, and memory budget -- and justify why the textbook asymptotics alone do not decide?

Cluster 3: Searching in Sorted and Unsorted Data

OrderConceptTypeFocus
8Binary Search and Its InvariantPRIMARYThe correctness template that generalizes to lower_bound, upper_bound, and predicate-based search
9Selection and Order Statistics in O(n)PRIMARYQuickselect, median-of-medians, and the partition primitive as a general tool
10Search-Data-Structure ChoiceSUPPORTINGArray vs linked list vs balanced BST vs hash table, by access pattern

Cluster mastery check: Can you state the invariant for a binary search before writing any code, and can you match a storage shape to its operation-cost profile?

Cluster 4: Hash Tables as a Dictionary Primitive

OrderConceptTypeFocus
11Hashing as a Randomized ModelPRIMARYExpected O(1) as a probabilistic guarantee, and what breaks it
12Collision Resolution: Chaining vs Open AddressingPRIMARYTwo schemes, their load-factor behavior, and deletion traps
13Universal Hashing, Load Factor, and ResizingSUPPORTINGUniversal families, cryptographic vs non-cryptographic hashing, amortized resizes

Cluster mastery check: Can you explain why hash-table O(1) is expected rather than worst case, and can you configure a table to behave well under adversarial input?

Cluster 5: Heaps and Priority Queues

OrderConceptTypeFocus
14The Binary Heap as an Implicit TreePRIMARYArray-backed heap layout, sift-up/down, and the O(n) build argument
15Heapsort and the Priority-Queue APIPRIMARYIn-place O(n log n) sort, and the PQ operations all graph algorithms depend on
16Priority-Queue ApplicationsSUPPORTINGEvent simulation, Dijkstra relaxation, top-K, median maintenance, k-way merge

Cluster mastery check: Can you spot a problem whose loop is "pick the best remaining element, do work, possibly produce new elements," and translate it to a heap-backed priority queue?

Then work these practice pages:

OrderPractice pathFocus
1Sorting Correctness and Analysis LabInvariants, complexity derivations, stability, and complexity-distinguishing drills
2Searching and Selection WorkshopBinary search variants, quickselect, predicate-based search
3Hashing and Priority-Queue ClinicCollision reasoning, universal hashing, heap operations, PQ applications
4Code KatasTimed implementations: mergesort, quicksort, heapsort, hash table, binary search, k-smallest

Use Module Quiz after the concept and practice path. Use Reference and Selective Reading and Learning Resources only for targeted reinforcement.


Learning Objectives

By the end of this module you should be able to:

  1. Write insertion sort, mergesort, and quicksort from scratch with correct loop invariants.
  2. Derive the Theta(n log n) cost of mergesort and the expected O(n log n) of randomized quicksort.
  3. Prove the Omega(n log n) comparison-sort lower bound via decision trees.
  4. Implement counting sort and radix sort, and explain the assumptions that let them beat the lower bound.
  5. Choose the right sorting algorithm given input shape, key structure, memory budget, and stability needs.
  6. Implement binary search and its variants (lower_bound, upper_bound, predicate-based) using the invariant template.
  7. Implement quickselect and describe when linear-time selection beats sorting.
  8. Implement a hash table with chaining and with open addressing, and explain load-factor tradeoffs.
  9. Implement a binary-heap-backed priority queue with insert, extract_min, peek, and understand how decrease_key is typically supported.
  10. Identify priority-queue patterns in real problems: simulation, shortest path, top-K, median maintenance, k-way merge.

Outputs

  • an algorithms notebook or repo with tested implementations of insertion, merge, quick, counting, radix, and heap sorts
  • a binary search library covering search, lower_bound, upper_bound, and one predicate-based variant, each with a written invariant
  • a quickselect implementation and one worked example comparing it to sorting-then-indexing
  • a hand-rolled hash table with chaining and with linear probing, plus a written load-factor study
  • a priority-queue implementation with at least three application demos (Dijkstra on a small graph, top-K, median maintenance)
  • one benchmark note comparing library sort, your own mergesort, and your own quicksort on random, sorted, and reverse-sorted inputs
  • one mistake log naming at least 10 real errors such as off-by-one in binary search, assumed stability, forgot tombstones on delete, confused peek and extract, miscounted recurrence levels
  • one short memo explaining how Module 2 tools feed into graph algorithms (Module 3) and advanced structures (Module 5)

Completion Standard

You have completed Module 2 when all of these are true:

  • you can write a correct sort from memory and state its invariant in one sentence
  • you can derive the Theta(n log n) recurrence for mergesort without looking it up
  • you can explain the Omega(n log n) bound using the decision-tree argument
  • you can write binary search and at least one variant correctly on the first try
  • you can explain why hash-table O(1) is "expected" rather than worst case, and what breaks the expectation
  • you can implement a binary heap and the basic PQ operations, and you recognize priority-queue shaped problems
  • you can argue why a particular library sort is or is not the right choice for a given workload

If the code works but you cannot justify the cost or the correctness in words, the module is not complete.


Reading Policy

  • Concept pages are the main path.
  • Local book chunks are selective reinforcement, not a second syllabus.
  • Read only if stuck means try the concept page, self-check, and drill first.
  • Optional deep dive means additional nuance or exercise volume, not required progression.
  • External URLs appear only in resources.md under "Read-If-Curious"; concept pages never include raw web links.

Suggested Weekly Flow

DayWork
1Concepts 1-2 and one hand-traced mergesort with a written invariant
2Concepts 3-4 and one randomized-quicksort implementation + decision-tree sketch
3Concepts 5-7 and a counting/radix sort implementation on integer input
4Concepts 8-10 and binary search + lower_bound written from scratch
5Concepts 11-13 and a chaining hash table implemented end-to-end
6Concepts 14-16 and a binary-heap PQ with a median-maintenance demo
7Practice pages, code katas, quiz, and mistake-log cleanup

Reference

If you need exact links into the local chunked books, use Reference and Selective Reading.


Rich Learning Pages

Worked Examples | Guided Labs | Case Studies | Mistake Clinic | Reading Guide | Capstone Thread


Model Artifact Calibration

For benchmark calibration, compare your measurements and decision writeup to the algorithm benchmark note model artifact.