Priority-Queue Applications: Simulation, Dijkstra, Top-K, Median
What This Concept Is
Once you have a priority queue, a small set of patterns turn it into the right tool for many apparently different problems:
- Event-driven simulation -- events live on a min-heap keyed by timestamp; always process the next event in simulated-time order.
- Dijkstra's shortest path -- a min-heap of
(tentative_distance, vertex); always relax the closest un-settled vertex. - Top-K with a bounded heap -- a min-heap of size (K) holds the current largest (K) values; replace the root when a larger value arrives.
- Median maintenance with two heaps -- a max-heap holds the lower half, a min-heap holds the upper half; their roots give the running median.
- Huffman coding -- repeatedly pop the two smallest-frequency nodes, merge, push back.
- K-way merge -- a min-heap of (K) "current" elements, one from each sorted stream.
- Scheduling / load balancing -- a heap of machine loads; assign each new job to the least-loaded machine ("power of two choices" often samples two and picks the min).
- A* search -- a min-heap keyed by (f(n) = g(n) + h(n)); a Dijkstra-shaped loop with a heuristic.
- Retry / rate-limiter queues -- a heap of
(earliest_retry_time, request). - Best-first branch-and-bound -- a heap of partial solutions keyed by optimistic bound.
All of these share the structural move: repeatedly pick the extreme of a changing set and produce new items to consider.
Why It Matters Here
These patterns turn "knowing what a heap is" into "knowing what a heap is for." Each reappears in later semesters:
- Dijkstra and Prim in the graph module (S2 M3).
- Event simulation in concurrency and distributed systems (S4, S8).
- Median maintenance and sketches in data-heavy systems (S6 streaming analytics).
- Top-K in search, recommendation, and analytics (S6, S8).
- A* and branch-and-bound in optimization-style problems (S2 M4 onward).
The common insight: if the algorithm has a loop of the form "pop the best next thing -> do work -> push any newly created things," use a priority queue. If you find yourself re-scanning a set to find the current minimum, you are almost certainly paying (O(n)) where (O(\log n)) was on offer.
Concrete Examples
Example 1 -- median maintenance over a stream.
import heapq
lower, upper = [], [] # max-heap (via negation), min-heap
def add(x):
if not lower or x <= -lower[0]:
heapq.heappush(lower, -x)
else:
heapq.heappush(upper, x)
if len(lower) > len(upper) + 1:
heapq.heappush(upper, -heapq.heappop(lower))
elif len(upper) > len(lower):
heapq.heappush(lower, -heapq.heappop(upper))
def median():
if len(lower) > len(upper):
return -lower[0]
return (-lower[0] + upper[0]) / 2
Invariants: (1) every element in lower (\leq) every element in upper; (2) (|lower| \in {|upper|, |upper|+1}). Each add costs (O(\log n)); each median is (O(1)). Over (n) inserts the total is (O(n \log n)) -- equivalent to sorting, but with streaming access and (O(1)) median reads throughout.
Example 2 -- top-K with a bounded min-heap.
def top_k(stream, k):
h = []
for x in stream:
if len(h) < k:
heapq.heappush(h, x)
elif x > h[0]:
heapq.heapreplace(h, x)
return sorted(h, reverse=True)
Cost: (O(n \log k)) time, (O(k)) space. Compared to sorted(stream, reverse=True)[:k], which is (O(n \log n)) time and (O(n)) space, top-K beats the full sort whenever (k \log k \ll n \log n) -- essentially always when (k) is a small constant or (k = O(\sqrt{n})). It is also streaming-friendly: the full input never needs to fit in memory.
Common Confusion / Misconceptions
"Reach for a priority queue any time you sort." If all data is present up front and you need every element in order, one sort is cheaper than (n) heap operations (same big-O, better constants, and stable by default with Timsort). Use a PQ when data arrives incrementally, or when you only need the top (K), median, or next-best.
"Dijkstra works with any queue." With an unsorted array you get (O(V^2)) -- fine for dense graphs, bad for sparse. With a FIFO queue you get BFS, which only gives shortest paths when all edge weights are equal. With a deque + 0/1 weights you get 0-1 BFS. Only a priority queue gives (O((V+E) \log V)) for general non-negative weights.
"Stale entries are fine." When you "update" a priority by pushing a new entry (lazy decrease_key), old entries remain. If you forget to filter them on pop, your algorithm returns correct answers on small inputs and silently wrong ones once duplicates accumulate. Always check if d > dist[u]: continue (or equivalent) on pop.
"Two-heap median is obvious." The rebalancing conditions are subtle. Off-by-one in either direction breaks the "lower contains floor((n+1)/2)" invariant. Write the invariants down, then code, then test against a sorted reference on randomized input.
How To Use It
Pattern: pick-extreme-and-update loop.
- Initialize the PQ with starting items.
- Loop: pop the best item; do work; compute any newly created items; push them.
- Filter stale entries on pop if you use lazy
decrease_key. - Terminate when the PQ empties or a goal condition holds.
Map specific problems to this pattern:
| Problem | PQ key | Work on pop |
|---|---|---|
| Dijkstra | tentative distance | relax outgoing edges |
| A* | (f(n) = g(n) + h(n)) | expand neighbors with heuristic |
| Event simulator | timestamp | advance time, spawn new events |
| Huffman | frequency | merge two smallest, push merged node |
| k-way merge | current stream value | write to output, push next from that stream |
| Top-K | value (min-heap of size K) | replace root if larger arrives |
| Retry scheduler | earliest retry time | issue request, push next-try on failure |
| Branch-and-bound | optimistic bound | expand children, prune dominated |
If your loop is shaped like this without a PQ, you are almost certainly paying more time than you need to.
Transfer / Where This Shows Up Later
- S2 M3 (graphs). Dijkstra, Prim, A* directly reuse this pattern; Johnson's algorithm runs Dijkstra from every vertex.
- S2 M4 (DP & greedy). Huffman coding, scheduling with deadlines, and interval-covering problems reach for heaps.
- S3 (software design). The "observer + priority queue" pattern structures event systems; clean-code exemplars use the PQ as the canonical place where the ADT/implementation split pays off.
- S4 (systems programming). Linux
hrtimerwheels, Tokio'ssleeptimer, Go runtime timers, and epoll timeout queues are priority queues under the hood. - S5 (OS). EDF (earliest-deadline-first) and rate-monotonic real-time schedulers are PQ loops. Linux CFS uses a red-black tree instead of a heap, a trade-off whose rationale ("we also need ordered traversal") links back to concept 14.
- S6 (databases). External merge sort's final k-way merge,
ORDER BY ... LIMIT k, streaming analytics' heavy-hitters (top-K) and quantile sketches (median/percentile maintenance with richer data structures), and LSM compaction all sit on this pattern. - S7 (architecture). ADRs for retry, rate-limiting, and backpressure cite PQ-based schedulers; SLO enforcement often uses
(deadline, request)heaps. - S8 (system design). Distributed schedulers (Kubernetes priority preemption, Spark's DAG scheduler, YARN fair scheduler), leader-election timeouts, and "least-loaded backend" load balancers are heap-based PQs. Consistent hashing rings combine with PQs for repair/reallocation.
Check Yourself
- State the two invariants that the two-heap median algorithm preserves.
- Why is Dijkstra (O((V + E) \log V)) with a binary heap, and when does that improve to (O(E + V \log V))?
- When is
sorted(a)[:k]actually faster than a bounded-heap top-K, and when is it worse? - How does lazy
decrease_keywork, and what bug appears if you forget to filter stale entries? - Give a problem where a PQ is not the right tool even though the statement says "smallest remaining."
Mini Drill or Application
- Implement median maintenance for a stream of 1M doubles; verify against a sorted array at 100 random checkpoints.
- Simulate a bank with (k) tellers using an event-driven PQ; compute the average customer wait time as (k) varies.
- Use
heapq.merge(or implement equivalent) to merge 10 sorted files into one sorted output without loading them all into memory; verify the output is sorted and contains every input record. - Implement A* search on an 8-puzzle; show how heuristic quality (Manhattan vs. misplaced-tiles) changes the number of heap ops.
- Build a
RetrySchedulerkeyed by earliest-allowed-time; test that jittered retries respect the minimum backoff.
Read This Only If Stuck
- CLRS: 6.5 Priority Queues (applications)
- CLRS: 6.5 Priority Queues (more)
- CLRS: 24 Single-Source Shortest Paths (Dijkstra analysis via heaps)
- Skiena: 3.5 Priority Queues
- Skiena: 4.3.2 Constructing Heaps
- Sedgewick: Priority Queues (applications)
- Grokking Algorithms: Dijkstra's Algorithm
- Competitive Programming: Non-linear DS with built-in libraries
- CP-Algorithms: Dijkstra on sparse graphs (priority-queue implementation) -- explicit lazy-deletion PQ code with analysis.
- Princeton Sedgewick: 2.4 Priority Queues (applications) -- event-driven simulation, top-K, and multiway merge worked side-by-side.