Optimal Substructure and Overlapping Subproblems
What This Concept Is
Dynamic programming applies when a problem has two properties simultaneously (CLRS §14.3):
- Optimal substructure. An optimal solution to the whole problem contains optimal solutions to its subproblems. You can write a recurrence of the form $$OPT(x) = \text{combine over choices},(OPT(\text{smaller subproblems})).$$
- Overlapping subproblems. The same subproblem reappears many times in a naive recursive expansion. Without caching, the work blows up exponentially.
If only optimal substructure is present and subproblems do not overlap, divide-and-conquer (concept 9) is the right tool. If subproblems overlap, DP (memoization or tabulation, concept 13) replaces the recomputation with lookups.
A DP solution has three named artifacts that you must be able to state before writing code:
- State. The minimal set of parameters that describe a subproblem (e.g. "prefix length and target remaining"). The state space is the product of the ranges of those parameters.
- Recurrence (transition). How $OPT(\text{state})$ relates to $OPT$ of smaller states.
- Base cases. Values of $OPT$ at the boundaries of the state space.
Together these define the DP. Total runtime is (number of states) $\times$ (work per state), space is (number of states) unless you can reuse slots (concept 13).
Why It Matters Here
Fibonacci, coin change, edit distance, longest common subsequence, longest increasing subsequence, matrix-chain multiplication, knapsack, optimal BSTs, and most problems where "try every way to split" naturally leads to exponential blowup become polynomial under DP.
The paradigm also forces you to name the state clearly, which is the most useful thing you can do on a hard problem. Solutions where students get stuck usually fail in one of three places: wrong state (too few parameters - an answer depends on something the state doesn't track), wrong transition (the recurrence misses a case), or missing base case. All three are easier to see when you write state, recurrence, base out loud before coding.
Concrete Example
Example 1 (Fibonacci - the overlap microscope).
Naive recursion:
def fib(n):
if n < 2: return n
return fib(n - 1) + fib(n - 2)
Runtime: $T(n) = T(n-1) + T(n-2) + 1 = \Theta(\varphi^n)$ where $\varphi \approx 1.618$. fib(40) already takes seconds because fib(38) is computed twice, fib(37) three times, fib(36) five times, ... - a Fibonacci number of redundant calls.
Optimal substructure: $F_n$ depends on $F_{n-1}$ and $F_{n-2}$, both computed optimally (trivially, they are just numbers).
Overlap: on the call tree from fib(n), the subproblem fib(k) is visited $\Theta(\varphi^{n-k})$ times for each $k$.
Caching (concept 13) gives $\Theta(n)$ time, $\Theta(n)$ memory (reducible to $\Theta(1)$ since each state only depends on the last two).
Example 2 (coin change - minimum coins). Denominations $C$, target $T$. Recurrence: $$OPT(t) = \begin{cases} 0 & t = 0 \ 1 + \min{OPT(t - c) : c \in C, , c \le t} & t > 0, \text{if any } c \text{ applies} \ \infty & \text{otherwise} \end{cases}$$
- State. A single integer $t \in [0, T]$. $T + 1$ states.
- Transition. Try each coin $c$, take the best outcome.
- Base. $OPT(0) = 0$.
- Work per state. $|C|$.
- Total runtime. $\Theta(T \cdot |C|)$.
Optimal substructure: if using a coin of value $c$ is part of the optimal solution at $t$, then the $(t - c)$-subproblem must be solved optimally (any cheaper solution to $t - c$ would give a cheaper solution to $t$).
Overlap: coins(t-c1) and coins(t-c2) both call coins(t-c1-c3) when $c3 = c2 - c1$, creating massive reuse.
Example 3 (edit distance). Given strings $X[0..m]$ and $Y[0..n]$, transform $X$ into $Y$ using insert, delete, replace; find minimum operations.
- State. $(i, j)$ = length of $X$-prefix, length of $Y$-prefix. $(m+1)(n+1)$ states.
- Recurrence. $$OPT(i, j) = \begin{cases} j & i = 0 \ i & j = 0 \ OPT(i-1, j-1) & X[i-1] = Y[j-1] \ 1 + \min(OPT(i-1, j), OPT(i, j-1), OPT(i-1, j-1)) & \text{otherwise} \end{cases}$$
- Total runtime. $\Theta(mn)$.
- Overlap. Naive recursion has $T(m, n) = O(3^{m+n})$; DP collapses it to quadratic.
Common Confusion / Misconceptions
- "DP is recursion with a dictionary." That is the mechanism, not the principle. If the recurrence is wrong or the state definition is wrong, memoizing it just caches wrong answers faster. The content of DP is choosing the right state.
- "Using DP when there is no overlap." A divide-and-conquer recursion has optimal substructure but each subproblem is encountered once, so memoization buys nothing. Mergesort should not be "DP'd."
- "All subproblems must be visited." No - top-down memoization visits only the subproblems actually reachable from the starting call. Bottom-up tabulation visits all states whether needed or not (concept 13).
- "The state is always obvious." State design is the hard part. For LIS, the state could be "index of last included element" ($\Theta(n^2)$) or "value of last included element" (gives the $\Theta(n \log n)$ patience-sort formulation). Same problem, different states, different runtimes.
- "DP gives optimal solutions, greedy gives fast solutions." Both are correct when their preconditions hold. DP is a broader hammer; greedy is a sharper one. See concept 14.
- "Dimension count of the state equals the asymptotic exponent." Often true ($O(n^2)$ for a 2-state DP over $[0, n] \times [0, n]$) but not always: the transition cost can be expensive ($\Theta(n)$ per state) or cheap ($\Theta(1)$ per state), changing the final complexity.
How To Use It
Design protocol:
- Identify the decision. What is chosen at each step (which coin, which split point, which letter to match, which item to take)?
- Define the state. What parameters describe a subproblem completely so that the decision does not depend on anything outside them?
- Write the recurrence. Relate $OPT(\text{state})$ to $OPT(\text{smaller states})$. Enumerate every possible choice; take the best (min/max/sum/count as appropriate).
- State base cases. The boundaries of the state space.
- Verify overlap. Draw the subproblem graph for a small example; does the same state reappear?
- Size the state space. Count distinct states. Runtime $\le$ (states) $\times$ (work per state).
- Prove correctness. Argue optimal substructure: any optimal solution to a larger state contains optimal solutions to the referenced smaller states.
Before writing code, you should be able to state the recurrence and the state count. If either is unclear, stop and think. The hardest DP problems are the ones where the right state is not obvious - often you need an extra dimension that tracks something the simpler formulation missed.
Transfer / Where This Shows Up Later
- S1 M2 (counting) gives you the state-space sizing tools (choose, permutations) you use to count DP states.
- S1 M3 (probability) extends DP to expected values: "probabilistic DP" = Bellman equations; you meet this again in RL-adjacent material and in queueing.
- S4 (systems) uses DP for compiler register allocation (interval graph coloring as a DP on chordal graphs) and cost-based instruction scheduling.
- S5 (OS) formulates optimal page-replacement (Belady's / offline caching) as a DP; online approximations benchmark against it.
- S6 (databases) runs query planning as DP over join trees: state = set of relations joined so far, transition = add next join. Selinger's optimizer is this DP.
- S7 (architecture) uses DP to optimize deployment topologies under latency and cost constraints.
- S8 (distributed systems) uses DP for optimal sharding (state = partitioning so far) and for Viterbi-style consensus log reconciliation.
Concept 13 implements this page; concept 14 contrasts it with greedy; concept 18 tells you how to spot DP in triage.
Check Yourself
- Define optimal substructure and overlapping subproblems in your own words, and give an example of a problem that has one but not the other.
- Why does the naive Fibonacci recursion take exponential time, and what is the ratio of overlap to non-overlap?
- What does "state" mean in the DP formulation of edit distance, and why is two dimensions enough?
- Give a problem where adding an extra state dimension changes the correct answer, not just the runtime.
- In weighted interval scheduling, why must the state include "last included interval index" rather than just "intervals processed so far"?
Mini Drill or Application
For each problem, write down:
- the state (what defines a subproblem)
- the recurrence
- the base cases
- the number of distinct states
- total runtime (states $\times$ work per state)
- Fibonacci numbers.
- Minimum coin change with denominations $C = [1, 5, 10, 25]$, target $T$.
- Longest increasing subsequence of an array $A$ (state with index; also state with "length $k$ whose last value is minimized" for $O(n \log n)$).
- Edit distance between strings $X[0..i]$ and $Y[0..j]$.
- $0/1$ knapsack with capacity $W$ and items of (weight, value).
- Matrix-chain multiplication: minimum scalar multiplications to compute $A_1 A_2 \cdots A_n$.
- Implementation drill. Implement top-down memoized
edit_distance(X, Y). Print the recursion tree foredit_distance("kitten", "sitting")showing which states are visited more than once. Derive the runtime $\Theta(mn)$ by counting states times per-state work.
Read This Only If Stuck
- Skiena: Fibonacci numbers by recursion (shows blowup)
- Skiena: Fibonacci numbers by caching
- CLRS: Rod cutting (classic DP example)
- CLRS: Elements of dynamic programming
- Competitive Programming: Dynamic programming
- Competitive Programming: DP illustration
- Grokking: Dynamic programming, Part 1
- CP Algorithms: Introduction to DP (worked top-down -> bottom-up walkthrough on Fibonacci)
- MIT 6.006 Fall 2011 - Dynamic programming lectures (Demaine's state-recurrence-base methodology)