Debugging Complex Algorithms Systematically

What This Concept Is

Algorithm debugging is not print-statement archaeology. It is a disciplined process:

Reproduce with a minimal failing input. Shrink the input until removing anything else makes the failure disappear.
State the expected invariant at each step. What should be true of the data structure after iteration $k$?
Compare observed state against invariant. The first step where they diverge is the bug.
Classify the failure. Wrong recurrence, wrong state definition, wrong comparator, off-by-one in loop bounds, missing base case, integer overflow, stale cached value, mutated input, aliasing.
Fix the root cause, not the symptom. Adding special cases to suppress the failure you observed almost always hides a second bug that will surface on another input.

The loop here is "shrink -> invariant -> find first divergence -> classify -> root-cause fix -> re-test." Skipping any step costs time disproportionate to the time saved.

Why It Matters Here

Every paradigm in this module produces different bug patterns:

Recurrence DP. Wrong state, wrong transition, off-by-one on base case, overlap missed, inner-loop order wrong (writing to dp[w] before reading dp[w - weight]).
Greedy. Counterexample not considered; the algorithm is wrong, not buggy, so patching individual cases is futile. The fix is a new algorithm, not new code.
Divide-and-conquer. Merge step handles empty halves incorrectly; recursion on same-size input (non-termination); off-by-one in the split point.
Backtracking. Failing to restore state after a branch returns; incorrect pruning that skips feasible subtrees; branch ordering that delays finding solutions.
Asymptotic miscalculations. A loop inside a library call (in on a list, string concatenation) inflates constants by orders of magnitude.

Recognizing the pattern matters more than pounding on the code. A mis-classified bug can absorb days; a correctly-classified one usually resolves in minutes.

Concrete Example

Example 1 (coin change bug hunt). Suppose min_coins([1, 3, 4], 6) returns 3 instead of 2.

Step 1. Reproduce. Confirm with exactly that input on exactly your code.

Step 2. State invariant. After iteration $t$ of the outer loop, dp[0..t] holds the minimum-coin count for amounts $0..t$.

Step 3. Observe dp after each iteration. Print only dp[0..t] on iteration $t$, not the full table.

t=0: [0, INF, INF, INF, INF, INF, INF]
t=1: [0, 1,   INF, ...]
t=2: [0, 1,   2,   INF, ...]
t=3: [0, 1,   2,   1,   ...]     # dp[3] = 1 using coin 3
t=4: [0, 1,   2,   1,   1, ...]  # dp[4] = 1 using coin 4
t=5: [0, 1,   2,   1,   1,   2,  ...]  # dp[5] = dp[5-1]+1 = 2
t=6: [0, 1,   2,   1,   1,   2,  2]    # expected 2 (use 3+3)

If instead at t=6 we see 3, likely causes:

The inner loop does not iterate over coin 3 for $t = 6$ (off-by-one on bounds: maybe c < t instead of c <= t).
dp[6-3] is read before it was finalized for the bottom-up ordering you wrote.
Integer overflow / sentinel confusion if INF is not actually larger than 3.

The discipline: do not guess. Run, print state at each step, find the first divergence, read the two or three lines that update state between the last-good and first-violated step. The bug is right there.

Example 2 (backtracking state leak). A sudoku solver returns a grid with two 5s in a row. First suspect: the undo after a failed branch did not reset the candidate set. Instrument: after every return from solve(cell), assert board[cell] equals its pre-call value. The first violation names the exact branch that forgot to undo.

Example 3 (D&C merge bug). Mergesort returns a mostly-sorted array with one element misplaced. Shrink: binary-search on input prefixes until the smallest failing array is found (often length 2 or 3). For those, step through merge([a], [b]) by hand: if you see the off-by-one (while i < len(L) and j <= len(R) - note the mismatched <=), the bug is obvious in the minimal case but invisible in the original.

Common Confusion / Misconceptions

"Just look at the code." You will look at the expression that seems fishy, not the expression that is wrong. Debugging by eyeballing scales with how carefully you wrote the code, not with the size of the bug. Use the invariant to tell you where to look.
"Add more prints until something stands out." Too many prints drown the signal. Pick the invariant you want to verify and print exactly that. Emit structured lines (one per iteration, consistent columns).
"The bug is in the complicated part." Bugs live in the boring parts: loop bounds, base cases, comparators, integer overflow, copy-vs-reference semantics, aliasing. The complicated part is where the algorithm lives; it was written more carefully.
"Reduce later; fix now." Reducing the failing input makes every subsequent step easier. A 3-element failure is 100x faster to debug than a 3000-element failure with the same root cause.
"Patch the case I saw." The case you saw is often one of several. Root-cause fixes handle the family; case-patches postpone the next discovery.
"Debuggers are a crutch." Step debuggers (pdb, gdb, the IDE) are a force multiplier for recursive code. Print debugging is fine; step debugging lets you watch state evolve without writing any instrumentation.

How To Use It

When stuck:

Minimize the failing input. Delete elements until failure disappears; restore the last one. Binary-search on input prefixes for speed.
Write down the invariant the inner loop or recursion should preserve.
Instrument with a print that checks exactly that invariant, not a dump of everything.
Find the first violation - the first iteration or call where observation contradicts invariant.
Read the two or three lines that updated the state between the last-good and first-violated step. The bug is in those lines 90% of the time.
Classify the bug (see list above) before fixing. The classification often reveals related bugs you would otherwise miss.
Fix the root cause, not the symptom. Add a regression test for the minimal input.

Also: always keep a brute-force oracle for algorithms where one is cheap (concept 15). Many "bugs" turn out to be wrong algorithms, which only the oracle can distinguish. A greedy debugging loop that never runs against an oracle is indistinguishable from polishing a wrong answer.

Transfer / Where This Shows Up Later

S1 M5 (problem solving) introduced "stop, look, find the smallest failing case." This page is the disciplined version.
S4 (systems) applies the same skeleton to race-condition debugging: shrink the thread schedule (deterministic replay, rr), state the concurrency invariant (e.g., "only one thread holds the lock"), find first violation.
S5 (OS) debugs kernel bugs by crash-dump analysis - the crash is the first violation; the invariant is some kernel data-structure property.
S6 (databases) debugs query planner mistakes by running EXPLAIN ANALYZE (observed state) against an oracle cost model (invariant).
S7 (architecture) debugs drift in architectural properties: the fitness function (concept 13 of S7 M5) is the invariant; a failing CI run is the first violation; the diff between last-green and first-red is the shrunk input.
S8 (distributed systems) debugs consensus failures with log-ordering invariants and deterministic replay from transcripts; the "minimal failing trace" of messages is the shrink step.

Concept 15 (testing) produces the failing input this page debugs; concept 17 tells you when the "bug" is actually constants, not correctness.

Check Yourself

Why is a minimal failing input more valuable than a complete failing input?
Describe one bug pattern specific to backtracking, and one specific to DP.
When should you conclude that your "bug" is actually that the algorithm itself is wrong?
Your DP returns the correct answer on random inputs but wrong on a specific hand-crafted input. What does that pattern suggest about the bug?
A print-based debug on a recursive function produces 40,000 lines. How do you reduce the noise without losing signal?

Mini Drill or Application

For each scenario, write a minimal triage plan (inputs to try, invariants to check).

Your quicksort returns the array mostly sorted but with one element out of place on large random inputs.
Your DP for edit_distance returns 0 for ("abc", "abd").
Your backtracking sudoku solver returns a grid where one row has two 5s.
Your greedy interval scheduler picks fewer intervals than the brute force oracle on input $[(0,3), (1,2), (2,5), (4,6)]$.
Your mergesort crashes with RecursionError on large inputs in Python.
Implementation drill. Take a deliberately-broken memoized lis(A) (longest increasing subsequence) where the recurrence is off-by-one. Given a failing input of length 20, shrink it by binary-searching on prefixes until the minimum failing length is found (expect length 2 or 3). Print the DP table at each step and identify the first violated invariant. Then fix the recurrence and add a regression test.

Read This Only If Stuck

Skiena: War story - Skiena for the defense (debugging lesson)
Skiena: War story - Give me a ticket on an airplane (implementation bug)
Skiena: War story - And then I failed (debugging under pressure)
Skiena: War story - Hard against the clock
Competitive Programming: Tip 5 - Master the art of testing code
Skiena: Reasoning about efficiency (distinguish "slow" from "wrong")
CLRS: Analyzing algorithms (invariant-based verification)
Python pdb documentation (step debugger for recursive algorithms)
Hypothesis shrinking explained (automatic minimization of failing inputs)
rr: record and replay debugger (for language-level and systems debugging with deterministic replay)

What This Concept Is​

Why It Matters Here​

Concrete Example​

Common Confusion / Misconceptions​

How To Use It​

Transfer / Where This Shows Up Later​

Check Yourself​

Mini Drill or Application​

Read This Only If Stuck​