Module 5: Advanced Data Structures & Algorithmic Tradeoffs: Case Studies
Advanced structures earn their place when a workload needs their guarantees. These cases force the choice through operations, invariants, and failure modes.
Case Study 1: Priority Queue for Job Scheduling
Scenario: A worker service runs background jobs by priority and retry time. The first implementation scans the full list of pending jobs every second to find the next job.
Source anchor: Python heapq documentation.
Module concepts:
- heaps
- priority queues
- comparator keys
- lazy deletion
Wrong Approach
Keep a list because scanning is straightforward and the queue is usually small.
Better Approach
Use a heap keyed by (runAt, priority, sequenceNumber) so fetching the next runnable job is efficient and stable for ties. Handle cancellation with lazy deletion or an index map, and test tie-breaking explicitly.
Tradeoff Table
| Choice | Gain | Cost |
|---|---|---|
| Linear scan | Simple code | Cost grows with pending jobs |
| Heap | Efficient next-job selection | Update/delete needs care |
| Indexed priority queue | Better updates | More implementation complexity |
Failure Mode
At peak traffic, every worker repeatedly scans thousands of delayed jobs.
Required Artifact
Create an operation table for insert, peek, pop, cancel, reprioritize, and explain how the chosen structure handles each one.
Project / Capstone Connection
Use this table before introducing any advanced structure in the project.
Case Study 2: Balanced Tree for Range Queries
Scenario: An analytics feature must answer "show all orders between two timestamps" and also insert late-arriving orders. A hash map by order ID makes exact lookup fast but does not support ordered range scans.
Source anchor: MIT OpenCourseWare balanced binary search trees lecture.
Module concepts:
- balanced binary search trees
- ordered maps
- range queries
- invariant maintenance
Wrong Approach
Store everything in a hash map and sort all timestamps whenever a range query arrives.
Better Approach
Use an ordered structure such as a balanced tree or database index when range queries are core operations. Maintain ordering on insert, then answer ranges by seeking to the lower bound and scanning until the upper bound.
Tradeoff Table
| Choice | Gain | Cost |
|---|---|---|
| Hash map | Fast exact lookup | Poor ordered scans |
| Sort on demand | Easy to add | Repeats sorting work |
| Balanced tree/index | Efficient ordered operations | More complex invariants |
Failure Mode
Range dashboards slow down as data grows because every request performs a fresh full sort.
Required Artifact
Write an access-pattern matrix comparing exact lookup, insert, delete, range scan, and ordered iteration.
Project / Capstone Connection
Use this matrix for choosing between arrays, maps, heaps, trees, and indexes.
Case Study 3: Database Index as a Data Structure Choice
Scenario: A product search endpoint filters by category, price, and created_at. Developers add application-side filtering after loading thousands of rows because the database query seems simpler.
Source anchor: SQLite query planner documentation.
Module concepts:
- indexes as data structures
- selectivity
- composite index order
- query-plan inspection
Wrong Approach
Assume database performance is outside algorithm analysis.
Better Approach
Treat the index as a search structure maintained by the database. Inspect the query plan, choose indexes based on equality filters, range filters, and ordering needs, then benchmark representative queries.
Tradeoff Table
| Choice | Gain | Cost |
|---|---|---|
| App-side filtering | Simple query | Moves work and data unnecessarily |
| Single-column indexes | Easy to add | May not match compound query |
| Composite index | Fits workload | Extra write and storage cost |
Failure Mode
The endpoint works in tests but times out when a category contains millions of products.
Required Artifact
Produce a query-plan note with workload, proposed index, expected seek/scan behavior, and before/after measurements.
Project / Capstone Connection
Use this note when explaining real-world uses of trees, hash indexes, and ordered search.
Case Study 4: Trie for Autocomplete
Scenario: A command palette autocompletes command names. The first version scans every command for every keystroke. It works with 200 commands but becomes sluggish with thousands of plugin commands.
Source anchor: The module's tries and prefix-search concepts.
Module concepts:
- trie
- prefix search
- memory tradeoffs
- ranking layer separation
Wrong Approach
Optimize the full scan with micro-tweaks before changing the access pattern.
Better Approach
Use a trie or prefix index for candidate retrieval, then apply ranking separately. Keep payloads small at trie nodes, define update behavior for plugin install/uninstall, and benchmark memory as well as latency.
Tradeoff Table
| Choice | Gain | Cost |
|---|---|---|
| Full scan | Simple and flexible | Work grows with all commands |
| Trie | Fast prefix lookup | Higher memory and update complexity |
| Search index | Rich ranking | More infrastructure |
Failure Mode
Autocomplete latency grows linearly with total commands instead of matching prefix selectivity.
Required Artifact
Create a prefix workload table with command count, average prefix length, memory estimate, lookup latency, and update requirements.
Project / Capstone Connection
Use this workload table for any prefix, dictionary, or autocomplete feature.
Source Map
| Source | Use it for |
|---|---|
| Python heapq documentation | Heap and priority-queue behavior. |
| MIT OpenCourseWare balanced binary search trees lecture | Balanced tree invariants and ordered operation reasoning. |
| SQLite query planner documentation | Database indexes as practical search structures. |