Bitmask DP: TSP and Subset-Enumeration DP

What This Concept Is

Bitmask DP encodes a subset of a small universe as an integer bitmask (bit $i$ = element $i$ is in the subset). The state space is typically $2^n \cdot n$ or $2^n \cdot k$ for some small $n$ (usually $n \le 20$ to keep $2^n \le 10^6$). The topological order on the subset lattice is "fill by increasing popcount(mask)."

The canonical problem is the Traveling Salesman Problem (TSP) on $n$ cities, solved by the Held-Karp DP (1962):

Let $dp[S][j]$ = minimum cost of a path that starts at city $0$, visits exactly the cities in $S$, and ends at city $j \in S$ (assume $0 \in S$).

$$dp[S][j] = \min_{k \in S \setminus {j},\ k \ne 0 \text{ unless } S \setminus {j} = {0}} \big( dp[S \setminus {j}][k] + c(k, j) \big)$$

Base: $dp[{0}][0] = 0$.

Final answer: $\min_{j \ne 0} \big( dp[\mathrm{full}][j] + c(j, 0) \big)$, the cost of returning to start.

Time: $O(2^n \cdot n^2)$. Space: $O(2^n \cdot n)$. Dramatically faster than naive $O(n!)$ enumeration for $n$ up to $\sim 20$ -- Held-Karp pushes TSP from factorial to exponential-with-small-base.

Beyond TSP, bitmask DP appears for any problem where the state is "which of a small universe have I already used/visited/matched?" -- assignment, set partitioning, coloring, Hamiltonian variants, set-cover approximations.

Why It Matters Here

Bitmask DP is the standard technique for "which subset have I used so far" problems where $n$ is small. It covers assignment problems, tour problems, set-cover-like selections, and many "visit every node exactly once" variants. Past $n = 20$ or so, $2^n$ blows up and another approach is needed -- branch-and-bound, approximation (Christofides for metric TSP gives a $3/2$ factor), or problem-specific structure (Euclidean TSP has PTAS).

It is also the first DP where topological order on a lattice becomes non-trivial. "Fill by increasing popcount" is correct but subtle; implementing it naively (iterating mask from 0 to 2^n - 1) works because smaller subsets have strictly smaller integer values, but only for the specific transition "add one element."

Concrete Examples

Example 1: TSP with 4 cities. Distance matrix

1  2  3
[ 0, 2, 9, 10]
[ 1, 0, 6,  4]
[15, 7, 0,  8]
[ 6, 3, 12, 0]

Compute $dp[S][j]$ for increasing $|S|$ (subsets containing $0$).

After filling, the optimal tour cost is $21$ via $0 \to 2 \to 3 \to 1 \to 0$ (check: $9 + 8 + 3 + 1 = 21$).

Subset iteration:

for mask in range(1, 1 << n):
    for j in range(n):
        if not (mask & (1 << j)): continue
        if mask == (1 << j):              # only city j visited
            dp[mask][j] = cost(0, j)
            continue
        prev = mask ^ (1 << j)
        dp[mask][j] = min(dp[prev][k] + cost(k, j)
                          for k in range(n) if prev & (1 << k))

Example 2: Assignment problem. Given an $n \times n$ cost matrix, assign each worker to a distinct task minimizing total cost. State $dp[\mathrm{mask}]$ = min cost to assign workers $0..\mathrm{popcount}(\mathrm{mask}) - 1$ to tasks in $\mathrm{mask}$. Transition: worker $i = \mathrm{popcount}(\mathrm{mask}) - 1$ picks some task $j$ already in $\mathrm{mask}$; pre-decrement.

$$dp[\mathrm{mask}] = \min_{j \in \mathrm{mask}} \big( dp[\mathrm{mask} \setminus {j}] + c(\mathrm{popcount}(\mathrm{mask}) - 1,\ j) \big).$$

Time $O(2^n \cdot n)$; cleaner than Held-Karp because only one index is needed alongside the mask.

Example 3: Subset-sum count via bitmask. Count subsets of an array with given sum. For $n \le 20$ (small), iterate all $2^n$ subsets directly; for larger $n$, use subset-sum DP on value, not on mask. The bitmask form is useful when you want to enumerate specific subsets (e.g., for reconstruction).

Memoized Held-Karp:

@lru_cache(None)
def f(mask, j):
    if mask == (1 << j): return cost(0, j)
    prev = mask ^ (1 << j)
    return min(f(prev, k) + cost(k, j)
               for k in range(n) if prev & (1 << k))

Only states reachable from the full mask are computed -- a minor saving for TSP where essentially all states are reached, but a clean formulation nonetheless.

Common Confusion / Misconceptions

"$n$ is deceptively limited." $2^{20} \approx 10^6$, so $n = 20$ with an inner $n^2$ loop gives $4 \cdot 10^8$ ops -- on the edge of tractable. $n = 25$ is $2^{25} \cdot 25^2 \approx 2 \cdot 10^{10}$, too much. Students who try $n = 30$ without thinking hit a wall. Budget: $n \le 20$ for $n \cdot 2^n$, $n \le 16$ for $n^2 \cdot 2^n$ or $3^n$ submask work.

"Bitmask is a set indicator." Sometimes yes. But bitmask as ordered path prefix is different. TSP uses subsets; the DP does not care about the visit order within $S$, only that $S$ has been visited. Problems that depend on visit order need a different state (often a sequence, not a mask).

"All $2^n$ states are reachable." Not necessarily. For Hamiltonian-path variants starting at a fixed vertex, only masks containing that vertex matter -- halving the state space. For problems with forbidden transitions, many states have value $+\infty$ and can be skipped at transition time for speedups.

"Bitmask DP is always exact." It is, for the subset-indexed DP it defines. But the problem (e.g., TSP) remains NP-hard; bitmask DP just gives the fastest known exact algorithm for small $n$. For large $n$, switch to approximation or heuristics -- Lin-Kernighan, concorde-style branch-and-cut, simulated annealing.

"Iterating masks from 1 to $2^n - 1$ respects topology." It does for transitions that add one element (smaller subset has smaller integer value). It does not for transitions that change multiple bits at once. For SOS DP (sum over subsets), iterate in a specific bit-by-bit order.

How To Use It

Check that $n \le \sim 20$ (or the relevant $n \cdot 2^n$ fits your time budget, typically $10^7$-$10^8$).
Let $dp[\mathrm{mask}][\ldots]$ encode "elements in $\mathrm{mask}$ have been used/visited."
Transition by adding one new element (or removing one, depending on direction).
Fill by increasing $\mathrm{popcount}(\mathrm{mask})$ -- a specific topological order on the subset lattice.
Read answer from $dp[\mathrm{full_mask}][\ldots]$ or iterate over ending states.
For reconstruction, store predecessor[S][j] = k alongside the DP value.
If transitions depend on "all subsets of $\mathrm{mask}$," use submask iteration (next concept): $\sum_S 2^{\mathrm{popcount}(S)} = 3^n$.

Transfer / Where This Shows Up Later

S3 Software Design. Plugin / capability systems with at most 20-30 flags can be optimized at plan time using bitmask DP ("select the cheapest combination of flags that satisfies all constraints").
S5 Networking. Small multi-hop routing under "visit each data-center once" constraints (debug tours, synchronized backups) reduces to Held-Karp.
S6 Databases. Multi-way join enumeration for small $n$ (say $n \le 14$) uses bitmask DP over join subsets -- exactly the DPhyp algorithm in modern query optimizers.
S7 Architecture. Feature-selection under constraints among a small set of architectural alternatives is bitmask DP over the option set.
Phase 7 AI. Factored Markov Decision Processes with small factored state use bitmask DP variants. Set-cover in ML pipelines (selecting features whose combined coverage is maximal) uses the same pattern with approximation when $n$ grows.

Check Yourself

Why does bitmask DP fail when $n = 30$? What is the crossover to other techniques?
For TSP, why does $dp[S][j]$ not also index by "starting city"?
How would you modify the DP to forbid certain edges?
Show that iterating mask from 0 to 2^n - 1 respects the fill order for "add one element" transitions.
Express the assignment-problem recurrence in one line.
For $n = 18$, estimate the runtime of Held-Karp on a modern CPU.

Mini Drill or Application

Implement TSP via Held-Karp for $n \le 16$ and verify on a small random graph vs brute-force factorial.
Implement "assignment problem" via bitmask DP: given an $n \times n$ cost matrix, assign each worker to a distinct task minimizing total cost. State: $dp[\mathrm{mask}]$ = min cost to assign workers $0..\mathrm{popcount}(\mathrm{mask}) - 1$ to tasks in $\mathrm{mask}$.
Implement "minimum cost Hamiltonian path" (no return to start) as a variant of TSP.
Add path reconstruction to TSP: along with $dp[S][j]$, store the predecessor $k$ that achieved the minimum.
Count the number of Hamiltonian paths (not tours) in a given graph via bitmask DP with summation instead of min.
Implement "minimum number of groups to partition $n$ items such that each group satisfies a compatibility mask" -- bitmask DP with submask iteration.

What This Concept Is​

Why It Matters Here​

Concrete Examples​

Common Confusion / Misconceptions​

How To Use It​

Transfer / Where This Shows Up Later​

Check Yourself​

Mini Drill or Application​

Read This Only If Stuck​