Skip to main content

Searching and Selection Workshop

Retrieval Prompts

  1. State the loop invariant for binary search on a closed interval [lo, hi].
  2. State the invariant change needed to compute lower_bound instead of equality.
  3. Define predicate-based binary search and name one condition the predicate must satisfy.
  4. Describe the recurrence for expected-time quickselect.
  5. What does median-of-medians guarantee that random quickselect does not?
  6. When is "min-heap of size K" better than sorted(data)[:K] for top-K, and when is it worse?

Compare and Distinguish

  • binary search (equality) vs lower_bound vs upper_bound
  • "search on a sorted array" vs "binary search on the answer" (predicate over a value range)
  • quickselect vs median-of-medians
  • top-K via heap of size K vs top-K via select + partition
  • linear search in an unsorted array vs hash-table lookup

Common Mistake Check

For each statement, identify the error:

  1. "Binary search on a linked list is O(log n)."
  2. "mid = (lo + hi) / 2 is always safe."
  3. "Quickselect is worst-case O(n) because the recurrence is T(n) = T(3n/4) + n."
  4. "To find the median of n elements you must first sort them all."
  5. "A predicate-based binary search works on any boolean function of the position."
  6. "std::nth_element gives you a fully sorted array."

Mini Application -- Invariant-First Coding

Before writing any code for any of these, declare the region convention (closed or half-open), write the invariant in one sentence, and only then code.

  1. binary_search(A, target) -> index | -1: classical equality search.
  2. lower_bound(A, x) -> index: smallest i with A[i] >= x. If no such i, return len(A).
  3. upper_bound(A, x) -> index: smallest i with A[i] > x.
  4. first_occurrence(A, x) and last_occurrence(A, x): two one-liners using the two bounds above.
  5. Predicate search: given a monotone non-decreasing boolean function P(k) on k in [lo, hi], return the smallest k with P(k) = true.

Mini Application -- Selection Problems

  1. Implement quickselect(A, k) with a random pivot. Test that it returns the same value as sorted(A)[k-1] over 1000 random trials.

  2. Solve: given an array of size n and a value k, return the k smallest elements. Do it three ways:

    • sort, slice
    • quickselect for the k-th, then partition
    • min-heap of size k scanning the stream Measure the running times for n = 10^6 and various k.
  3. Solve: given a matrix sorted rowwise and columnwise, return the k-th smallest element in O(k log min(k, n)) using a heap.

Mini Application -- Binary Search on the Answer

  1. Given n houses at integer positions and m mailboxes to place, minimize the maximum distance from any house to its nearest mailbox. Frame this as predicate-based binary search.

  2. Given n jobs with durations and k workers, each worker processes a contiguous batch. Minimize the maximum worker load. Define the predicate and argue its monotonicity.

  3. Given a sorted array, find the square root of an integer x using only integer binary search.

Evidence Check

This page is complete only if you can:

  • write binary search and its two bound variants without looking at templates
  • explain what a predicate must satisfy to support binary search on the answer
  • pick among quickselect, heap-of-K, and full-sort for a specific top-K task and justify

Integrated List and Search Drill

Implement and test each operation without using a library shortcut that hides the algorithmic work:

  1. Remove all elements that satisfy a predicate while preserving order.
  2. Merge two sorted lists into a new sorted list.
  3. Return whether any two values sum to a target using a hash-backed lookup.
  4. Reverse a list in place.
  5. Rotate a list by k positions.
  6. Implement linear search and binary search with the same return convention.

Evidence check:

  • unit tests for empty, one-item, duplicate, and already-sorted cases
  • a short note comparing when linear search beats binary search in practice despite worse asymptotic growth
  • one benchmark or hand count showing the cost of sorting before searching many times