Skip to main content

Module 5: Advanced Data Structures & Algorithmic Tradeoffs: Case Studies

Advanced structures earn their place when a workload needs their guarantees. These cases force the choice through operations, invariants, and failure modes.


Case Study 1: Priority Queue for Job Scheduling

Scenario: A worker service runs background jobs by priority and retry time. The first implementation scans the full list of pending jobs every second to find the next job.

Source anchor: Python heapq documentation.

Module concepts:

  • heaps
  • priority queues
  • comparator keys
  • lazy deletion

Wrong Approach

Keep a list because scanning is straightforward and the queue is usually small.

Better Approach

Use a heap keyed by (runAt, priority, sequenceNumber) so fetching the next runnable job is efficient and stable for ties. Handle cancellation with lazy deletion or an index map, and test tie-breaking explicitly.

Tradeoff Table

ChoiceGainCost
Linear scanSimple codeCost grows with pending jobs
HeapEfficient next-job selectionUpdate/delete needs care
Indexed priority queueBetter updatesMore implementation complexity

Failure Mode

At peak traffic, every worker repeatedly scans thousands of delayed jobs.

Required Artifact

Create an operation table for insert, peek, pop, cancel, reprioritize, and explain how the chosen structure handles each one.

Project / Capstone Connection

Use this table before introducing any advanced structure in the project.


Case Study 2: Balanced Tree for Range Queries

Scenario: An analytics feature must answer "show all orders between two timestamps" and also insert late-arriving orders. A hash map by order ID makes exact lookup fast but does not support ordered range scans.

Source anchor: MIT OpenCourseWare balanced binary search trees lecture.

Module concepts:

  • balanced binary search trees
  • ordered maps
  • range queries
  • invariant maintenance

Wrong Approach

Store everything in a hash map and sort all timestamps whenever a range query arrives.

Better Approach

Use an ordered structure such as a balanced tree or database index when range queries are core operations. Maintain ordering on insert, then answer ranges by seeking to the lower bound and scanning until the upper bound.

Tradeoff Table

ChoiceGainCost
Hash mapFast exact lookupPoor ordered scans
Sort on demandEasy to addRepeats sorting work
Balanced tree/indexEfficient ordered operationsMore complex invariants

Failure Mode

Range dashboards slow down as data grows because every request performs a fresh full sort.

Required Artifact

Write an access-pattern matrix comparing exact lookup, insert, delete, range scan, and ordered iteration.

Project / Capstone Connection

Use this matrix for choosing between arrays, maps, heaps, trees, and indexes.


Case Study 3: Database Index as a Data Structure Choice

Scenario: A product search endpoint filters by category, price, and created_at. Developers add application-side filtering after loading thousands of rows because the database query seems simpler.

Source anchor: SQLite query planner documentation.

Module concepts:

  • indexes as data structures
  • selectivity
  • composite index order
  • query-plan inspection

Wrong Approach

Assume database performance is outside algorithm analysis.

Better Approach

Treat the index as a search structure maintained by the database. Inspect the query plan, choose indexes based on equality filters, range filters, and ordering needs, then benchmark representative queries.

Tradeoff Table

ChoiceGainCost
App-side filteringSimple queryMoves work and data unnecessarily
Single-column indexesEasy to addMay not match compound query
Composite indexFits workloadExtra write and storage cost

Failure Mode

The endpoint works in tests but times out when a category contains millions of products.

Required Artifact

Produce a query-plan note with workload, proposed index, expected seek/scan behavior, and before/after measurements.

Project / Capstone Connection

Use this note when explaining real-world uses of trees, hash indexes, and ordered search.


Case Study 4: Trie for Autocomplete

Scenario: A command palette autocompletes command names. The first version scans every command for every keystroke. It works with 200 commands but becomes sluggish with thousands of plugin commands.

Source anchor: The module's tries and prefix-search concepts.

Module concepts:

  • trie
  • prefix search
  • memory tradeoffs
  • ranking layer separation

Wrong Approach

Optimize the full scan with micro-tweaks before changing the access pattern.

Better Approach

Use a trie or prefix index for candidate retrieval, then apply ranking separately. Keep payloads small at trie nodes, define update behavior for plugin install/uninstall, and benchmark memory as well as latency.

Tradeoff Table

ChoiceGainCost
Full scanSimple and flexibleWork grows with all commands
TrieFast prefix lookupHigher memory and update complexity
Search indexRich rankingMore infrastructure

Failure Mode

Autocomplete latency grows linearly with total commands instead of matching prefix selectivity.

Required Artifact

Create a prefix workload table with command count, average prefix length, memory estimate, lookup latency, and update requirements.

Project / Capstone Connection

Use this workload table for any prefix, dictionary, or autocomplete feature.


Source Map

SourceUse it for
Python heapq documentationHeap and priority-queue behavior.
MIT OpenCourseWare balanced binary search trees lectureBalanced tree invariants and ordered operation reasoning.
SQLite query planner documentationDatabase indexes as practical search structures.