Memoization Versus Tabulation

What This Concept Is

Once you have a DP recurrence (concept 12), there are two standard implementations:

Top-down memoization. Write the recursive function exactly as the recurrence states; cache the answer for each state the first time it is computed; return the cached value on later calls. Often implemented as @lru_cache or an explicit dict.
Bottom-up tabulation. Fill a table of states in an order such that every state is computed after its dependencies; no recursion. Typically a loop nest over the state dimensions.

Both have the same asymptotic runtime (state count times work per state) when the whole state space is fully explored. They differ in:

Which states they touch. Top-down visits only states reachable from the starting call; bottom-up visits everything.
Memory behavior. Bottom-up enables aggressive space-optimization: if the recurrence only depends on the previous row/column, the table collapses to $O(\text{width})$ from $O(mn)$.
Stack usage. Top-down uses recursion depth proportional to the longest dependency chain; bottom-up uses $O(1)$ call stack.
How easy they are to write correctly. Top-down matches the recurrence line-for-line; bottom-up requires a valid topological iteration order over the state graph.

Why It Matters Here

Choosing between them is an engineering decision you will make every time you implement a DP. Memoization handles complex state spaces, sparse evaluation, and recurrences whose dependency order is hard to schedule manually. Tabulation wins on constant factors, enables space-optimization tricks, and avoids recursion depth limits (Python's default of ~1000 will bite you on large inputs).

In competitive programming, tabulation is the default because contests care about wall-clock speed and predictable memory. In exploratory prototyping, memoization is the default because code velocity matters more than constants. In production, the choice follows the input regime: dense and small -> tabulate; sparse or tree-shaped state -> memoize.

Concrete Example

Example 1 (Fibonacci, three implementations).

Top-down:

from functools import lru_cache

@lru_cache(maxsize=None)
def fib_td(n):
    if n < 2: return n
    return fib_td(n - 1) + fib_td(n - 2)

Bottom-up:

def fib_bu(n):
    if n < 2: return n
    dp = [0] * (n + 1)
    dp[1] = 1
    for i in range(2, n + 1):
        dp[i] = dp[i - 1] + dp[i - 2]
    return dp[n]

Space-optimized bottom-up (only last two entries needed):

def fib_opt(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

The top-down version matches the recurrence line for line. The bottom-up version exposes that the table only needs two slots, which the third version exploits. All three are $\Theta(n)$ time; memory is $\Theta(n)$, $\Theta(n)$, $\Theta(1)$ respectively.

Example 2 (coin change, bottom-up).

def min_coins(C, T):
    INF = float('inf')
    dp = [INF] * (T + 1)
    dp[0] = 0
    for t in range(1, T + 1):
        for c in C:
            if c <= t and dp[t - c] + 1 < dp[t]:
                dp[t] = dp[t - c] + 1
    return dp[T] if dp[T] != INF else -1

Top-down visits only the amounts reachable from $T$ by subtracting coins; bottom-up visits every amount $0..T$ even if many are unreachable. For denominations ${1, 5, 10, 25}$ and $T = 10^5$, both are fine; for pathological denominations where most values are unreachable, top-down can be 10x faster.

Example 3 ($0/1$ knapsack with space optimization). Full DP is $\Theta(nW)$ space. Observation: dp[i][w] depends only on dp[i-1][...]. Process items sequentially, keeping only one row:

def knapsack(weights, values, W):
    dp = [0] * (W + 1)
    for i in range(len(weights)):
        for w in range(W, weights[i] - 1, -1):   # reverse crucial: don't reuse item
            dp[w] = max(dp[w], dp[w - weights[i]] + values[i])
    return dp[W]

The reversed inner loop is the subtle part: forward iteration would allow using the same item multiple times (unbounded knapsack). The bottom-up formulation makes this optimization visible and correct; the top-down form obscures it.

Example 4 (longest common subsequence with row rollover). $\Theta(mn)$ time, $\Theta(\min(m, n))$ memory by keeping only the last two rows.

Common Confusion / Misconceptions

"Top-down is always cleaner." Clean to write, yes, but it carries call-stack overhead and is subject to recursion limits. For $T = 10^5$ coin change on Python's default recursion limit, bottom-up is the safer choice. C++ stacks handle millions easily; Python's do not.
"Bottom-up is always faster." In constants, usually. But if the state space is sparse (only a small fraction of states are reachable), top-down touches fewer states and can win by a large margin.
"Bottom-up requires no thought about dependencies." It requires you to know a valid iteration order over states. If your recurrence depends on a future state (e.g., $dp[i][j]$ uses $dp[i+1][j]$), the order is wrong or the recurrence is.
"Space optimization is free." It destroys the ability to reconstruct the solution path unless you save extra information. If you need the optimal coin sequence, not just the count, you need either the full table or a separate back-pointer array.
"Memoize means decorate and go." Memoizing a function with mutable arguments (lists, dicts) silently fails - Python's lru_cache requires hashable arguments. Convert to tuples or encode state as integers.
"If two implementations agree, the code is right." They often agree on small inputs because both bugs trigger only at the boundary. Stress test (concept 15) with a brute-force oracle, not with the other DP.

How To Use It

Defaults:

Prototype with memoization. The code looks like the recurrence; if the recurrence is correct, the code is.
If the state space is dense and small, convert to bottom-up to shrink constants.
Space-optimize when the recurrence depends only on the last few rows/columns.
If you hit recursion limits or cache-inefficiency, switch to bottom-up.
Test both implementations against a brute-force oracle on small inputs before trusting either.
Identify the iteration order explicitly for bottom-up: a state can be computed when all its dependencies have been computed.
Keep a path-reconstruction plan. If you need the actual optimal solution, not just the value, decide upfront: full table, back-pointers, or recompute by tracing.

Transfer / Where This Shows Up Later

S4 (systems) applies the same memoization vs table trade-off to compiler caches (inlined code vs LUT), CPU branch predictors (history tables), and JIT compilers (keep warm traces in a table).
S5 (OS) caches expensive kernel computations (e.g., fair-share scheduling weights) with memoization when the input distribution is skewed and tabulation when it is uniform.
S6 (databases) caches query plans (memoization on parameterized query shape) and materializes views (tabulation over the state space of queries).
S7 (architecture) uses memoization for expensive policy evaluations (authz decisions cached by principal/resource/action tuple) and tabulation for precomputed routing / sharding maps.
S8 (distributed systems) uses memoization for idempotency keys on retry paths and tabulation for precomputed consistent-hash rings.

Concept 12 defines the DP; concept 14 compares DP with greedy; concept 17 reminds you that constants decide the close calls.

Check Yourself

Why do top-down and bottom-up have the same big-O runtime for a given recurrence?
When is bottom-up strictly worse in wall-clock time than top-down?
How does space optimization work for the $0/1$ knapsack DP, and why is the order of the inner loop crucial?
If the DP needs to recover the optimal solution (not just its cost), which of the two implementations makes reconstruction easier?
What goes wrong if you memoize a recursive function whose arguments include a Python list?

Mini Drill or Application

For each problem, implement both top-down and bottom-up.

Fibonacci (and space-optimize the bottom-up version).
Minimum coin change.
Longest common subsequence of $X$ and $Y$; write the $O(\min(|X|, |Y|))$-space version.
$0/1$ knapsack; write the $O(W)$-space version.
Matrix-chain multiplication; include parenthesization reconstruction.
Implementation drill. Solve coin change for $C = [1, 3, 4]$ and $T = 10^4$. Compare:
- Top-down memoization with @lru_cache (time, peak memory).
- Bottom-up tabulation.
- Space-optimized bottom-up (if the recurrence admits it). Complexity derivation: $\Theta(T \cdot |C|)$ time, bottom-up $\Theta(T)$ space, top-down $\Theta(\text{reachable} \cdot |C|)$ time with same space bound.

Read This Only If Stuck

Skiena: Fibonacci by caching (memoization)
CLRS: Rod cutting (both implementations)
CLRS: Elements of dynamic programming
Grokking: Dynamic programming, Part 1
Grokking: Filling in the grid (DP tabulation intuition)
Competitive Programming: DP illustration
Competitive Programming: Classical DP examples
CP Algorithms: Introduction to dynamic programming (explicit top-down / bottom-up / space-optimized Fibonacci walkthrough)
CSES Problem Set - Dynamic Programming (standard DP problems graded by difficulty)
Python functools.lru_cache documentation (stdlib memoization primitive)

What This Concept Is​

Why It Matters Here​

Concrete Example​

Common Confusion / Misconceptions​

How To Use It​

Transfer / Where This Shows Up Later​

Check Yourself​

Mini Drill or Application​

Read This Only If Stuck​