Carrying Out the Plan Requires Per-Step Verification
What This Concept Is
Carrying out the plan is the disciplined execution of the strategy you devised, with verification after every step -- not only at the end.
Execution has its own skills, distinct from understanding and planning:
- translating plan steps into concrete moves (arithmetic, algebra, code)
- verifying each step before moving to the next ("why should I believe this line?")
- detecting when a step is stuck (technique is right but slow) versus when a step is wrong (technique cannot work here)
- deciding when to revise the plan versus when to keep pushing
Polya's advice is sharp: check each step. Do not barrel through. The reason is that errors in step 5 invisibly corrupt steps 6 through the end; the extra seconds spent justifying each step cost less than re-executing the remainder.
Per-step verification is different from answer-checking. Answer-checking ("plug the result back in") catches the subset of bugs whose effect survives to the final answer. Many bugs cancel each other out halfway through and yield an answer that happens to be correct. Argument-checking is the only discipline that catches those.
This phase is also where executability becomes a constraint that phase two could afford to ignore. A plan that said "iterate until convergence" can execute; a plan that said "choose the best pivot" cannot execute until you specify what "best" means operationally. Execution turns plan sentences into ground-level moves, and each move must be both correct and effective (finite work in a known language).
Why It Matters Here
Between a correct plan and a correct answer, many things can go wrong: sign errors, misapplied lemmas, implementation bugs, wrong base cases, stale intermediate variables. Per-step verification is the difference between "I solved it" and "I think I solved it."
In CS, this phase is where proofs become code and code becomes tests. The discipline is parallel: after every step, ask "why should I believe this?" In a proof you write a one-word margin justification; in code you write an assertion or a unit test; in a system you write a monitor.
Forward pipeline: Semester 2 (algorithms) extends this into loop invariants -- a written predicate that holds at every iteration, verified separately. Semester 4 (systems programming) uses assertions on every kernel boundary. Semester 5 (protocols) verifies every message handler against its state-machine invariant. Semester 7 (architecture reviews) treats "what would break this step?" as a first-class risk question.
Concrete Examples
Example 1 -- induction with a margin ledger
Plan: prove by induction that $\sum_{k=1}^n k = \frac{n(n+1)}{2}$.
Execute step by step, each with a justification:
- Base case. $n = 1$: LHS $= 1$, RHS $= \frac{1 \cdot 2}{2} = 1$. [arithmetic]
- Inductive hypothesis. Assume $\sum_{k=1}^n k = \frac{n(n+1)}{2}$ for some $n \geq 1$. [given]
- Inductive step goal. Show $\sum_{k=1}^{n+1} k = \frac{(n+1)(n+2)}{2}$. [restate target]
- Rewrite. $\sum_{k=1}^{n+1} k = \sum_{k=1}^{n} k + (n+1)$. [definition of sum]
- Apply hypothesis. $= \frac{n(n+1)}{2} + (n+1)$. [IH]
- Simplify. $= (n+1)\left(\frac{n}{2} + 1\right) = \frac{(n+1)(n+2)}{2}$. [algebra: $\frac{n}{2} + 1 = \frac{n+2}{2}$]
- Conclude. By induction, the claim holds for all $n \geq 1$. [induction rule]
Each step has an explicit one-word justification. If step 6 had been wrong, the rest of the proof would rest on a falsehood, and the ledger would have caught it.
Example 2 -- binary search with a loop invariant
Plan: find the index of target $t$ in a sorted array $a[0..n)$ by binary search.
lo, hi = 0, n
while lo < hi:
mid = (lo + hi) // 2
if a[mid] < t:
lo = mid + 1
else:
hi = mid
Execute step by step:
- State invariant. At loop entry, if $t$ exists in $a$, its index is in $[lo, hi)$. [chosen invariant]
- Base case (first entry). $[0, n)$ covers the whole array. [initialisation]
- Case $a[mid] < t$. Then $t$ cannot be in $[lo, mid]$; setting $lo = mid + 1$ preserves the invariant. [sortedness + transitivity]
- Case $a[mid] \geq t$. Then if $t$ exists, its index is $\leq mid$; setting $hi = mid$ preserves the invariant. [sortedness]
- Termination measure. $hi - lo$ strictly decreases by at least 1. Bounded below by 0. [monovariant]
- Post-condition. When $lo = hi$, if $a[lo] = t$ then $lo$ is the answer; else absent. [invariant + termination]
Six verified steps, one short function. Any bug now localises to one line and one justification.
Common Confusion / Misconceptions
Momentum loss. Learners often lose discipline mid-execution: they stop verifying as momentum builds, especially right before the punchline. That is exactly where arithmetic and sign errors accumulate.
Tweak-without-rethink. When a step fails, learners tweak the execution rather than reconsidering the plan. Sometimes the plan was wrong and no amount of patient retrying will help.
Answer-check instead of argument-check. Confusing "I checked the answer" with "I checked the argument." Answer-checking catches some bugs; argument-checking catches the bugs that silently become habits.
Silent assumption accretion. Importing a lemma or a library function without stating what it requires. A correct-looking step that relies on an unstated precondition is a bomb for the next person (including future you).
How To Use It
Use a margin ledger: next to each step, write a one-word justification (algebra, IH, def, lemma 2, previous case, monotonicity, pigeonhole). If you cannot write the justification, the step is unverified.
When a step stalls for more than 10-15 minutes:
- Ask whether the plan is wrong or the step is hard.
- Re-read the plan and check that the current step matches it.
- Try executing a simpler sub-step to confirm the technique works.
- If still stalled, return to the plan phase and declare the switch explicitly.
For code execution:
- add assertions that encode the invariant at the top of each loop
- write the smallest possible failing test before fixing anything
- keep a "last known good" version; diff to localise what broke
This is the difference between stuck-but-making-progress and thrashing.
Transfer / Where This Shows Up Later
- Semester 2 (algorithm correctness). Loop invariants and recurrence proofs are phase-three discipline at scale.
- Semester 3 (refactoring). Every refactor step preserves behaviour; per-step verification is the discipline that proves it.
- Semester 4 (debugging). Margin-ledger thinking becomes assertion-and-logging thinking. A debug session is a ledger with timestamps.
- Semester 5 (protocol design). Each message handler transitions one state; the ledger is the state-machine diagram.
- Semester 7 (code review). Reviewers look for the step whose justification is missing; that step is the most likely bug.
- Semester 10 (capstone). CI with per-commit tests is institutional per-step verification.
Check Yourself
- Why is per-step verification different from verifying the final answer? Give a bug that would survive answer-checking but not step-checking.
- When should you revise the plan rather than retry the step? What signal tells you which to do?
- What does a margin ledger buy you in proof writing that raw derivation does not?
- Why is an unstated precondition on an imported lemma a ledger-level bug, not a plan-level bug?
- For the binary-search example, what one-line modification to the code would break the termination measure? Explain.
Mini Drill or Application
Drill A. Take any proof you wrote in Module 1 that was more than a few lines. Retroactively add a margin ledger: a one-word justification for every equality, inequality, and implication. Find at least one step you cannot justify, and decide whether it was a mistake or just an unwritten assumption.
Drill B. Compute $\gcd(252, 198)$ using the Euclidean algorithm, writing a margin justification for every division step. Then verify by factoring both numbers and taking the minimum power of each common prime -- the agreement is your argument-check.
Drill C (unseen). For the algorithm below, state the loop invariant, the termination measure, and the post-condition. Then construct a minimal input that violates each if the < on the sixth line is weakened to <=.
def max_idx(a):
best = 0
for i in range(1, len(a)):
if a[best] < a[i]:
best = i
return best
Read This Only If Stuck
- Dromey: 1.4.4 Debugging programs -- execution-level bug localisation
- Dromey: 1.5 Program verification -- pre-/post-conditions and per-step justification
- Dromey: 1.5.5 Verification of program segments with branches -- invariants across branches
- Dromey: 1.5.8 Proof of termination -- termination measures
- Dromey: 1.3.6 Termination of loops -- finiteness discipline
- MCS: 2.1 Well-ordering and proof templates -- per-step template for induction proofs
- External: Wikipedia: Mathematical induction -- the canonical per-step verification pattern