Skip to main content

Hashing and Priority-Queue Clinic

Retrieval Prompts

  1. State the simple-uniform-hashing assumption in one sentence.
  2. Write the inequality that defines a universal family of hash functions.
  3. Give the expected search cost for chaining at load factor alpha.
  4. Give the expected successful search cost for open addressing at load factor alpha.
  5. State the parent and child index formulas for a 0-indexed binary heap.
  6. Give the total cost of build_heap and sketch why it is O(n), not O(n log n).
  7. Name three priority-queue problem shapes beyond Dijkstra.

Compare and Distinguish

  • chaining vs open addressing (performance profile, memory layout, deletion)
  • universal hashing vs cryptographic hashing
  • hash table vs balanced BST (strengths of each)
  • binary heap vs binary search tree (what each preserves)
  • extract_min vs peek on a heap
  • decrease_key with an index map vs "lazy" decrease-key with duplicates

Common Mistake Check

For each statement, identify the error:

  1. "A hash table is O(1) worst case."
  2. "Open addressing is always faster than chaining."
  3. "Deleting a key in linear probing just means marking the slot empty."
  4. "A binary heap is fully sorted in its array representation."
  5. "Heapsort is strictly faster than quicksort because of its better worst case."
  6. "Dijkstra's algorithm requires Fibonacci heaps to be efficient."
  7. "You can search for an arbitrary key in a heap in O(log n)."

Mini Application -- Hash Tables

  1. Implement a chaining hash table with resize-on-load-factor. Parameters: initial capacity 8, target alpha = 0.75, doubling on resize. Test with insert, lookup, delete, and one resize trigger.

  2. Implement linear-probing hash table with tombstone-based deletion. Include a rehash at alpha = 0.5 (since tombstones count as occupied for probe-sequence correctness).

  3. Using linearity of expectation, compute:

    • expected number of empty buckets when n = 200, m = 256
    • expected number of collisions (pairs of keys with h(i) = h(j)) when n = 200 keys hash uniformly into m = 256 buckets
  4. Demonstrate a collision-attack scenario: using a trivially predictable hash h(x) = x mod m, build 100 keys that all land in bucket 0 and show the resulting worst-case lookup time.

Mini Application -- Binary Heaps

  1. Implement sift_up and sift_down for an array-backed min-heap. Test on adversarial inputs (sorted ascending, sorted descending, all duplicates).

  2. Implement build_heap in O(n) by sifting down from index n/2 - 1 to 0. Verify on [7, 4, 9, 2, 6, 8, 1, 5, 3].

  3. Implement a priority queue with decrease_key(handle, new_key) backed by a heap and a handle -> index dictionary. Test by running Dijkstra on a 10-node graph.

Mini Application -- PQ Patterns

Pick three of the following and implement:

  1. Event-driven simulation of a single-server queue (Poisson arrivals, exponential service). Report mean wait time.
  2. Top-K over a stream of 1 million floats using a min-heap of size K.
  3. Median maintenance over the same stream with a max-heap/min-heap split.
  4. k-way merge of 10 sorted integer files using heapq.merge.
  5. Huffman code construction for a small alphabet with listed frequencies.
  6. Dijkstra shortest path on a 20-node weighted graph.

Analysis Drills

  • Given load factor alpha = 0.5 in linear probing, how many probes does a successful search do on average? Unsuccessful search?
  • For a heap, derive the O(n) build bound by summing sum_{h=0}^{log n} (n / 2^{h+1}) * h.
  • For Dijkstra using a binary heap on a graph with V vertices and E edges, show that the total PQ work is O((V + E) log V).

Evidence Check

This page is complete only if you can:

  • explain expected O(1) hashing in terms of a probability model and name what breaks it
  • implement chaining and open-addressing hash tables from scratch and explain the deletion difference
  • write sift-up and sift-down without looking at templates
  • recognize a priority-queue problem shape and implement it

Integrated Hashing and Heap Drill

Complete these small exercises before the larger katas:

  1. Insert 12 keys into 8 buckets using chaining. Draw the buckets and compute load factor.
  2. Repeat with linear probing. Show every probe for at least four insertions.
  3. Delete a key from the open-addressed table and explain why a tombstone may be needed.
  4. Build a min-heap from [9, 4, 7, 1, 3, 6, 2] using bottom-up heapify. Show the array after each sift-down.
  5. Use a priority queue to merge three sorted streams and write the runtime in terms of total items and number of streams.

Evidence check: include tests that force collisions and tests that verify heap order after every public operation.