Skip to main content

Priority-Queue Applications: Simulation, Dijkstra, Top-K, Median

What This Concept Is

Once you have a priority queue, a small set of patterns turn it into the right tool for many apparently different problems:

  • Event-driven simulation -- events live on a min-heap keyed by timestamp; always process the next event in simulated-time order.
  • Dijkstra's shortest path -- a min-heap of (tentative_distance, vertex); always relax the closest un-settled vertex.
  • Top-K with a bounded heap -- a min-heap of size (K) holds the current largest (K) values; replace the root when a larger value arrives.
  • Median maintenance with two heaps -- a max-heap holds the lower half, a min-heap holds the upper half; their roots give the running median.
  • Huffman coding -- repeatedly pop the two smallest-frequency nodes, merge, push back.
  • K-way merge -- a min-heap of (K) "current" elements, one from each sorted stream.
  • Scheduling / load balancing -- a heap of machine loads; assign each new job to the least-loaded machine ("power of two choices" often samples two and picks the min).
  • A* search -- a min-heap keyed by (f(n) = g(n) + h(n)); a Dijkstra-shaped loop with a heuristic.
  • Retry / rate-limiter queues -- a heap of (earliest_retry_time, request).
  • Best-first branch-and-bound -- a heap of partial solutions keyed by optimistic bound.

All of these share the structural move: repeatedly pick the extreme of a changing set and produce new items to consider.

Why It Matters Here

These patterns turn "knowing what a heap is" into "knowing what a heap is for." Each reappears in later semesters:

  • Dijkstra and Prim in the graph module (S2 M3).
  • Event simulation in concurrency and distributed systems (S4, S8).
  • Median maintenance and sketches in data-heavy systems (S6 streaming analytics).
  • Top-K in search, recommendation, and analytics (S6, S8).
  • A* and branch-and-bound in optimization-style problems (S2 M4 onward).

The common insight: if the algorithm has a loop of the form "pop the best next thing -> do work -> push any newly created things," use a priority queue. If you find yourself re-scanning a set to find the current minimum, you are almost certainly paying (O(n)) where (O(\log n)) was on offer.

Concrete Examples

Example 1 -- median maintenance over a stream.

import heapq
lower, upper = [], [] # max-heap (via negation), min-heap

def add(x):
if not lower or x <= -lower[0]:
heapq.heappush(lower, -x)
else:
heapq.heappush(upper, x)
if len(lower) > len(upper) + 1:
heapq.heappush(upper, -heapq.heappop(lower))
elif len(upper) > len(lower):
heapq.heappush(lower, -heapq.heappop(upper))

def median():
if len(lower) > len(upper):
return -lower[0]
return (-lower[0] + upper[0]) / 2

Invariants: (1) every element in lower (\leq) every element in upper; (2) (|lower| \in {|upper|, |upper|+1}). Each add costs (O(\log n)); each median is (O(1)). Over (n) inserts the total is (O(n \log n)) -- equivalent to sorting, but with streaming access and (O(1)) median reads throughout.

Example 2 -- top-K with a bounded min-heap.

def top_k(stream, k):
h = []
for x in stream:
if len(h) < k:
heapq.heappush(h, x)
elif x > h[0]:
heapq.heapreplace(h, x)
return sorted(h, reverse=True)

Cost: (O(n \log k)) time, (O(k)) space. Compared to sorted(stream, reverse=True)[:k], which is (O(n \log n)) time and (O(n)) space, top-K beats the full sort whenever (k \log k \ll n \log n) -- essentially always when (k) is a small constant or (k = O(\sqrt{n})). It is also streaming-friendly: the full input never needs to fit in memory.

Common Confusion / Misconceptions

"Reach for a priority queue any time you sort." If all data is present up front and you need every element in order, one sort is cheaper than (n) heap operations (same big-O, better constants, and stable by default with Timsort). Use a PQ when data arrives incrementally, or when you only need the top (K), median, or next-best.

"Dijkstra works with any queue." With an unsorted array you get (O(V^2)) -- fine for dense graphs, bad for sparse. With a FIFO queue you get BFS, which only gives shortest paths when all edge weights are equal. With a deque + 0/1 weights you get 0-1 BFS. Only a priority queue gives (O((V+E) \log V)) for general non-negative weights.

"Stale entries are fine." When you "update" a priority by pushing a new entry (lazy decrease_key), old entries remain. If you forget to filter them on pop, your algorithm returns correct answers on small inputs and silently wrong ones once duplicates accumulate. Always check if d > dist[u]: continue (or equivalent) on pop.

"Two-heap median is obvious." The rebalancing conditions are subtle. Off-by-one in either direction breaks the "lower contains floor((n+1)/2)" invariant. Write the invariants down, then code, then test against a sorted reference on randomized input.

How To Use It

Pattern: pick-extreme-and-update loop.

  1. Initialize the PQ with starting items.
  2. Loop: pop the best item; do work; compute any newly created items; push them.
  3. Filter stale entries on pop if you use lazy decrease_key.
  4. Terminate when the PQ empties or a goal condition holds.

Map specific problems to this pattern:

ProblemPQ keyWork on pop
Dijkstratentative distancerelax outgoing edges
A*(f(n) = g(n) + h(n))expand neighbors with heuristic
Event simulatortimestampadvance time, spawn new events
Huffmanfrequencymerge two smallest, push merged node
k-way mergecurrent stream valuewrite to output, push next from that stream
Top-Kvalue (min-heap of size K)replace root if larger arrives
Retry schedulerearliest retry timeissue request, push next-try on failure
Branch-and-boundoptimistic boundexpand children, prune dominated

If your loop is shaped like this without a PQ, you are almost certainly paying more time than you need to.

Transfer / Where This Shows Up Later

  • S2 M3 (graphs). Dijkstra, Prim, A* directly reuse this pattern; Johnson's algorithm runs Dijkstra from every vertex.
  • S2 M4 (DP & greedy). Huffman coding, scheduling with deadlines, and interval-covering problems reach for heaps.
  • S3 (software design). The "observer + priority queue" pattern structures event systems; clean-code exemplars use the PQ as the canonical place where the ADT/implementation split pays off.
  • S4 (systems programming). Linux hrtimer wheels, Tokio's sleep timer, Go runtime timers, and epoll timeout queues are priority queues under the hood.
  • S5 (OS). EDF (earliest-deadline-first) and rate-monotonic real-time schedulers are PQ loops. Linux CFS uses a red-black tree instead of a heap, a trade-off whose rationale ("we also need ordered traversal") links back to concept 14.
  • S6 (databases). External merge sort's final k-way merge, ORDER BY ... LIMIT k, streaming analytics' heavy-hitters (top-K) and quantile sketches (median/percentile maintenance with richer data structures), and LSM compaction all sit on this pattern.
  • S7 (architecture). ADRs for retry, rate-limiting, and backpressure cite PQ-based schedulers; SLO enforcement often uses (deadline, request) heaps.
  • S8 (system design). Distributed schedulers (Kubernetes priority preemption, Spark's DAG scheduler, YARN fair scheduler), leader-election timeouts, and "least-loaded backend" load balancers are heap-based PQs. Consistent hashing rings combine with PQs for repair/reallocation.

Check Yourself

  1. State the two invariants that the two-heap median algorithm preserves.
  2. Why is Dijkstra (O((V + E) \log V)) with a binary heap, and when does that improve to (O(E + V \log V))?
  3. When is sorted(a)[:k] actually faster than a bounded-heap top-K, and when is it worse?
  4. How does lazy decrease_key work, and what bug appears if you forget to filter stale entries?
  5. Give a problem where a PQ is not the right tool even though the statement says "smallest remaining."

Mini Drill or Application

  1. Implement median maintenance for a stream of 1M doubles; verify against a sorted array at 100 random checkpoints.
  2. Simulate a bank with (k) tellers using an event-driven PQ; compute the average customer wait time as (k) varies.
  3. Use heapq.merge (or implement equivalent) to merge 10 sorted files into one sorted output without loading them all into memory; verify the output is sorted and contains every input record.
  4. Implement A* search on an 8-puzzle; show how heuristic quality (Manhattan vs. misplaced-tiles) changes the number of heap ops.
  5. Build a RetryScheduler keyed by earliest-allowed-time; test that jittered retries respect the minimum backoff.

Read This Only If Stuck