Priority-Queue Applications: Simulation, Dijkstra, Top-K, Median

What This Concept Is

Once you have a priority queue, a small set of patterns turn it into the right tool for many apparently different problems:

Event-driven simulation -- events live on a min-heap keyed by timestamp; always process the next event in simulated-time order.
Dijkstra's shortest path -- a min-heap of (tentative_distance, vertex); always relax the closest un-settled vertex.
Top-K with a bounded heap -- a min-heap of size (K) holds the current largest (K) values; replace the root when a larger value arrives.
Median maintenance with two heaps -- a max-heap holds the lower half, a min-heap holds the upper half; their roots give the running median.
Huffman coding -- repeatedly pop the two smallest-frequency nodes, merge, push back.
K-way merge -- a min-heap of (K) "current" elements, one from each sorted stream.
Scheduling / load balancing -- a heap of machine loads; assign each new job to the least-loaded machine ("power of two choices" often samples two and picks the min).
A* search -- a min-heap keyed by (f(n) = g(n) + h(n)); a Dijkstra-shaped loop with a heuristic.
Retry / rate-limiter queues -- a heap of (earliest_retry_time, request).
Best-first branch-and-bound -- a heap of partial solutions keyed by optimistic bound.

All of these share the structural move: repeatedly pick the extreme of a changing set and produce new items to consider.

Why It Matters Here

These patterns turn "knowing what a heap is" into "knowing what a heap is for." Each reappears in later semesters:

Dijkstra and Prim in the graph module (S2 M3).
Event simulation in concurrency and distributed systems (S4, S8).
Median maintenance and sketches in data-heavy systems (S6 streaming analytics).
Top-K in search, recommendation, and analytics (S6, S8).
A* and branch-and-bound in optimization-style problems (S2 M4 onward).

The common insight: if the algorithm has a loop of the form "pop the best next thing -> do work -> push any newly created things," use a priority queue. If you find yourself re-scanning a set to find the current minimum, you are almost certainly paying (O(n)) where (O(\log n)) was on offer.

Concrete Examples

Example 1 -- median maintenance over a stream.

import heapq
lower, upper = [], []   # max-heap (via negation), min-heap

def add(x):
    if not lower or x <= -lower[0]:
        heapq.heappush(lower, -x)
    else:
        heapq.heappush(upper, x)
    if len(lower) > len(upper) + 1:
        heapq.heappush(upper, -heapq.heappop(lower))
    elif len(upper) > len(lower):
        heapq.heappush(lower, -heapq.heappop(upper))

def median():
    if len(lower) > len(upper):
        return -lower[0]
    return (-lower[0] + upper[0]) / 2

Invariants: (1) every element in lower (\leq) every element in upper; (2) (|lower| \in {|upper|, |upper|+1}). Each add costs (O(\log n)); each median is (O(1)). Over (n) inserts the total is (O(n \log n)) -- equivalent to sorting, but with streaming access and (O(1)) median reads throughout.

Example 2 -- top-K with a bounded min-heap.

def top_k(stream, k):
    h = []
    for x in stream:
        if len(h) < k:
            heapq.heappush(h, x)
        elif x > h[0]:
            heapq.heapreplace(h, x)
    return sorted(h, reverse=True)

Cost: (O(n \log k)) time, (O(k)) space. Compared to sorted(stream, reverse=True)[:k], which is (O(n \log n)) time and (O(n)) space, top-K beats the full sort whenever (k \log k \ll n \log n) -- essentially always when (k) is a small constant or (k = O(\sqrt{n})). It is also streaming-friendly: the full input never needs to fit in memory.

Common Confusion / Misconceptions

"Reach for a priority queue any time you sort." If all data is present up front and you need every element in order, one sort is cheaper than (n) heap operations (same big-O, better constants, and stable by default with Timsort). Use a PQ when data arrives incrementally, or when you only need the top (K), median, or next-best.

"Dijkstra works with any queue." With an unsorted array you get (O(V^2)) -- fine for dense graphs, bad for sparse. With a FIFO queue you get BFS, which only gives shortest paths when all edge weights are equal. With a deque + 0/1 weights you get 0-1 BFS. Only a priority queue gives (O((V+E) \log V)) for general non-negative weights.

"Stale entries are fine." When you "update" a priority by pushing a new entry (lazy decrease_key), old entries remain. If you forget to filter them on pop, your algorithm returns correct answers on small inputs and silently wrong ones once duplicates accumulate. Always check if d > dist[u]: continue (or equivalent) on pop.

"Two-heap median is obvious." The rebalancing conditions are subtle. Off-by-one in either direction breaks the "lower contains floor((n+1)/2)" invariant. Write the invariants down, then code, then test against a sorted reference on randomized input.

How To Use It

Pattern: pick-extreme-and-update loop.

Initialize the PQ with starting items.
Loop: pop the best item; do work; compute any newly created items; push them.
Filter stale entries on pop if you use lazy decrease_key.
Terminate when the PQ empties or a goal condition holds.

Map specific problems to this pattern:

Problem	PQ key	Work on pop
Dijkstra	tentative distance	relax outgoing edges
A*	(f(n) = g(n) + h(n))	expand neighbors with heuristic
Event simulator	timestamp	advance time, spawn new events
Huffman	frequency	merge two smallest, push merged node
k-way merge	current stream value	write to output, push next from that stream
Top-K	value (min-heap of size K)	replace root if larger arrives
Retry scheduler	earliest retry time	issue request, push next-try on failure
Branch-and-bound	optimistic bound	expand children, prune dominated

If your loop is shaped like this without a PQ, you are almost certainly paying more time than you need to.

Transfer / Where This Shows Up Later

S2 M3 (graphs). Dijkstra, Prim, A* directly reuse this pattern; Johnson's algorithm runs Dijkstra from every vertex.
S2 M4 (DP & greedy). Huffman coding, scheduling with deadlines, and interval-covering problems reach for heaps.
S3 (software design). The "observer + priority queue" pattern structures event systems; clean-code exemplars use the PQ as the canonical place where the ADT/implementation split pays off.
S4 (systems programming). Linux hrtimer wheels, Tokio's sleep timer, Go runtime timers, and epoll timeout queues are priority queues under the hood.
S5 (OS). EDF (earliest-deadline-first) and rate-monotonic real-time schedulers are PQ loops. Linux CFS uses a red-black tree instead of a heap, a trade-off whose rationale ("we also need ordered traversal") links back to concept 14.
S6 (databases). External merge sort's final k-way merge, ORDER BY ... LIMIT k, streaming analytics' heavy-hitters (top-K) and quantile sketches (median/percentile maintenance with richer data structures), and LSM compaction all sit on this pattern.
S7 (architecture). ADRs for retry, rate-limiting, and backpressure cite PQ-based schedulers; SLO enforcement often uses (deadline, request) heaps.
S8 (system design). Distributed schedulers (Kubernetes priority preemption, Spark's DAG scheduler, YARN fair scheduler), leader-election timeouts, and "least-loaded backend" load balancers are heap-based PQs. Consistent hashing rings combine with PQs for repair/reallocation.

Check Yourself

State the two invariants that the two-heap median algorithm preserves.
Why is Dijkstra (O((V + E) \log V)) with a binary heap, and when does that improve to (O(E + V \log V))?
When is sorted(a)[:k] actually faster than a bounded-heap top-K, and when is it worse?
How does lazy decrease_key work, and what bug appears if you forget to filter stale entries?
Give a problem where a PQ is not the right tool even though the statement says "smallest remaining."

Mini Drill or Application

Implement median maintenance for a stream of 1M doubles; verify against a sorted array at 100 random checkpoints.
Simulate a bank with (k) tellers using an event-driven PQ; compute the average customer wait time as (k) varies.
Use heapq.merge (or implement equivalent) to merge 10 sorted files into one sorted output without loading them all into memory; verify the output is sorted and contains every input record.
Implement A* search on an 8-puzzle; show how heuristic quality (Manhattan vs. misplaced-tiles) changes the number of heap ops.
Build a RetryScheduler keyed by earliest-allowed-time; test that jittered retries respect the minimum backoff.

Read This Only If Stuck

CLRS: 6.5 Priority Queues (applications)
CLRS: 6.5 Priority Queues (more)
CLRS: 24 Single-Source Shortest Paths (Dijkstra analysis via heaps)
Skiena: 3.5 Priority Queues
Skiena: 4.3.2 Constructing Heaps
Sedgewick: Priority Queues (applications)
Grokking Algorithms: Dijkstra's Algorithm
Competitive Programming: Non-linear DS with built-in libraries
CP-Algorithms: Dijkstra on sparse graphs (priority-queue implementation) -- explicit lazy-deletion PQ code with analysis.
Princeton Sedgewick: 2.4 Priority Queues (applications) -- event-driven simulation, top-K, and multiway merge worked side-by-side.

What This Concept Is​

Why It Matters Here​

Concrete Examples​

Common Confusion / Misconceptions​

How To Use It​

Transfer / Where This Shows Up Later​

Check Yourself​

Mini Drill or Application​

Read This Only If Stuck​