Skip to main content

Termination Needs a Decreasing Measure

What This Concept Is

An algorithm terminates if every execution path reaches a stopping state in finitely many steps. The standard way to prove this is to name a variant (also called a ranking function or termination measure): a value that is bounded below and strictly decreases at every step. "At every step" is the load-bearing phrase - one step where the variant does not decrease is enough to ruin the argument.

If the variant takes values in a well-founded ordering - any ordering in which every strictly decreasing sequence is finite, such as the non-negative integers, lexicographic tuples on $\mathbb{N}^k$, or countable ordinals - then the variant cannot decrease forever, so the algorithm must stop. The non-negative integers are the overwhelmingly common case; reach for tuples or ordinals only when a single number obviously cannot work.

Three standard shapes of variant:

  • Scalar, bounded below by 0. Loop counter, remaining work, recursive size argument.
  • Lexicographic tuple. $(\text{remaining problems}, \text{remaining work in current problem})$. Decreases if either coordinate strictly decreases with no later coordinate increasing.
  • Multiset ordering. Used when the state is a collection that shrinks "in some sense" - e.g. Dershowitz-Manna ordering in term-rewriting proofs.
  • Ordinal. Assigns to each state an ordinal number below some countable ordinal. The recursion terminates because the ordinals below any given ordinal are well-ordered. Used for Ackermann-like or nested-recursion proofs.

In practice a vast majority of algorithms have a scalar variant; reach for lexicographic when you see "outer iteration reduces X, inner iteration may increase X but decreases Y," and reach for multiset/ordinal only in term-rewriting or very nested recursion.

Termination is orthogonal to correctness (concept 6): correctness says "if it returns, it returns the right answer"; termination says "it does return." Both are needed. Dijkstra's quip applies: partial correctness plus termination equals total correctness, and neither alone is enough to ship an algorithm you can rely on.

Why It Matters Here

Correctness is useless if the algorithm never returns. Recursive functions can silently infinite-loop when the argument does not actually shrink. Loops can fail to exit when their exit condition is never established (off-by-one, concurrent modification, floating-point non-convergence, data races that flip a flag back to its pre-check value between check and use).

A scalar variant failing to strictly decrease is also the canonical signal that an algorithm is subtly wrong: the programmer meant for the state to shrink, and it doesn't. Finding the variant is therefore both a correctness exercise and a debugging heuristic.

Termination arguments also sharpen your understanding of why the algorithm works, because the variant often tells you what quantity is being driven toward zero. Euclid's variant is b; binary search's variant is hi - lo; mergesort's variant is input length; Dijkstra's variant is the number of unsettled vertices. Naming the variant is frequently the cleanest way to see the invariant.

In systems courses the same skeleton appears as liveness: every pending request is eventually served. Liveness proofs are termination proofs on an infinite-state transition system with a well-founded rank.

(Termination is orthogonal to complexity: an algorithm may terminate in at most $k$ steps where $k$ is huge - Ackermann is the canonical example. Termination proofs tell you "finite"; complexity proofs tell you "how big.")

Concrete Example

Example 1 (Euclid's algorithm).

def gcd(a, b):
if b == 0:
return a
return gcd(b, a % b)

Variant. b. At every recursive call, the new second argument is a % b, which is strictly less than b whenever b > 0. Values are non-negative integers, so b cannot decrease forever. The recursion must reach b == 0 and return. Together with correctness (GCD $(a, b) =$ GCD $(b, a \mod b)$), this proves Euclid halts and returns the right answer.

Example 2 (binary search). Variant: hi - lo. Each iteration either returns or replaces [lo, hi] with a strictly smaller interval: lo = mid + 1 or hi = mid - 1, both shrinking the interval by at least 1. When hi < lo, the loop exits. Variant is a non-negative integer; must terminate.

Example 3 (mergesort). Variant: $\text{len}(A)$. Every recursive call passes a strictly shorter list. Base case at length $\le 1$. Terminates.

Example 3b (quicksort). Variant: $\text{len}(A)$. Every recursive call passes a strictly shorter sublist provided the pivot sits somewhere in the middle. With a bad pivot that always lands at the end, one side of the partition has length $\text{len}(A) - 1$, and the other has length 0 - strictly shorter, still. Termination holds even on worst-case input; runtime does not, but runtime is a separate story (concept 3).

Example 4 (Ackermann-like lexicographic variant).

def A(m, n):
if m == 0: return n + 1
if n == 0: return A(m - 1, 1)
return A(m - 1, A(m, n - 1))

Variant. Lexicographic pair $(m, n)$. Each recursive call reduces $m$ or keeps $m$ fixed and reduces $n$. $(m, n)$ is well-founded under lexicographic order, so termination holds - even though the running time is colossal. This is the canonical example where a single scalar does not suffice.

Common Confusion / Misconceptions

  • "The variable gets smaller sometimes." Not enough. The variant must decrease at every step (or on every recursive call, along every branch). A loop that sometimes decreases and sometimes increases the variant might still loop forever; that is the point of well-foundedness.
  • "The variant must be a loop index." It can be any measurable aspect of state: number of unresolved items in a queue, number of un-visited nodes in a graph, the sum of pending tasks plus the maximum priority, a lexicographic tuple, or even a tree height.
  • "One scalar is always enough." For nested loops or recursion plus iteration, a single scalar is sometimes not. Use a tuple with lexicographic ordering, e.g. (remaining_problems, remaining_work_in_current_problem).
  • "If it terminates in tests, it terminates." Collatz-style state machines can terminate on every test input yet have no known termination proof. Either prove the variant exists or caveat the claim.
  • "Termination implies a good runtime bound." Termination says finite; it says nothing about "fast." The Ackermann function terminates but is not computable within a reasonable time.
  • "Floating-point comparisons are fine for termination." They usually are not. A loop while x != target with floating-point arithmetic may never see the exact target because of rounding. Use $|x - \text{target}| < \epsilon$ for a fixed $\epsilon$ - the variant becomes the number of iterations until $\epsilon$ is reached, not $x$ itself.
  • "If every recursive call reduces the input, termination follows." Only if the reduction is by at least 1 each time and reaches a base case. Consider f(n) that recurses on n - 0.5 - each call "reduces" the argument but never reaches the base case. The variant must reach the bottom of a well-founded order.

How To Use It

To argue termination:

  1. Identify a quantity the algorithm touches or computes.
  2. Check whether it is bounded below (usually by 0 or by some fixed value).
  3. Check whether every step (every iteration of every loop, every recursive call) strictly decreases it - or decreases it lexicographically if you chose a tuple.
  4. Conclude termination from well-foundedness of the chosen ordering.
  5. If the algorithm branches, the variant must decrease on every branch.
  6. If the variant is not a scalar, spell out the ordering you are using and why it is well-founded.
  7. Write the variant down as a comment next to the loop or function. If a future maintainer cannot guess it from the code, the comment is load-bearing documentation.
  8. When the variant depends on external input (e.g., timeout in a retry loop), make the input's finiteness an explicit precondition: "terminates provided MAX_RETRIES is finite."

When a termination proof fails, it almost always reveals a real bug: an off-by-one that leaves the loop index unchanged, a recursive call on the same or larger input, a missing base case, or an exit condition that the state can never establish.

For concurrent code, the right generalization is liveness under fair scheduling: every runnable thread eventually runs. The variant becomes "number of threads of higher priority that have not yielded," plus a fairness assumption from the scheduler. Without fairness, even a finite-state concurrent program can fail to make progress.

Transfer / Where This Shows Up Later

  • S1 M3 (probability) introduced stopping times; a variant is the deterministic version of an almost-surely-finite stopping time.
  • S4 (systems) demands termination arguments for every lock acquisition: "we either succeed or release and retry," with the retry count bounded by an exponential-backoff variant.
  • S5 (OS / scheduling) calls termination "liveness": each runnable task must eventually run. The variant is "number of tasks of higher priority" or "number of quanta until fair-share deadline."
  • S6 (databases) requires termination of transaction-retry loops (MVCC abort/retry until success or give-up) and of query optimizer rewrite rules (each rewrite reduces an estimated cost - an explicit variant).
  • S7 (architecture) expresses progress fitness functions in terms of bounded retries, bounded queue depth, and bounded convergence time.
  • S8 (distributed systems) proves consensus and replication protocols terminate under fair scheduling by a variant on "leader term plus un-committed entries."

Concept 5 (loop invariants) is the iterative companion: invariant + variant = full correctness. Concept 6 (recursive correctness) pairs with this page so that every recursive proof has both.

Within this semester: S2 M2 shows that quicksort terminates on any pivot because sub-list lengths strictly decrease; S2 M3 shows that BFS/DFS terminate because the unvisited-set cardinality strictly decreases; S2 M4 shows that DP terminates because the state rank under topological order is a well-founded variant; S2 M5 shows that union-find with path compression terminates because each node is visited at most tree-depth-times on any single operation.

Check Yourself

  1. What does it mean for a variant to be "bounded below," and why is bounded-below + strictly-decreasing the minimum pair of properties needed for termination?
  2. Why is a strictly decreasing sequence of non-negative integers necessarily finite?
  3. Sketch a variant for a loop while x != y: x = (x + y) // 2. What happens on integer vs floating-point?
  4. Give a reason a scalar variant might not suffice, and a worked example where a lexicographic tuple does.
  5. Why is the Collatz recursion while n > 1: n = n // 2 if n % 2 == 0 else 3*n + 1 not a counterexample to your usual termination intuition? What is known and what is not?
  6. State a lexicographic variant for a nested loop for i in 1..n: for j in 1..i: work(). Why is the single outer counter alone insufficient?
  7. Your distributed algorithm retries on network failure with exponential backoff capped at 60s. What is the variant under fair scheduling, and why does it fail without the cap?

Mini Drill or Application

Give a termination measure and justify it for each. State the ordering explicitly.

  1. The recursive Fibonacci function that calls fib(n-1) + fib(n-2).
  2. A loop while n > 1: n = n // 2 if n % 2 == 0 else 3*n + 1. (Be honest: this is the Collatz conjecture; state what is and is not known.)
  3. Quicksort: argue termination even though recursive call sizes depend on the pivot.
  4. Depth-first search on a finite graph with a visited set.
  5. A polygon triangulation algorithm that repeatedly removes ears: variant = number of vertices remaining.
  6. A topological sort based on repeatedly removing a vertex with in-degree 0: variant = number of vertices remaining. What breaks if the graph is not a DAG?
  7. A work-queue loop that processes tasks and may spawn new sub-tasks: variant = sum of remaining work across all pending tasks. What assumption does termination need on sub-task cost?
  8. Implementation drill. Write pseudocode for a retry loop while attempts < MAX and not success: attempt(); wait(backoff) and state both an invariant and a variant. Complexity derivation: at most MAX iterations, each $\Theta(\text{attempt})$, so total worst case $\Theta(\text{MAX} \cdot \text{attempt})$. Why does the variant argument fail without an explicit upper bound on attempts?

Read This Only If Stuck