Amortized Analysis: Aggregate, Accounting, Potential

What This Concept Is

Amortized analysis bounds the average cost per operation over a worst-case sequence. It is still deterministic: there is no probability and no average-input distribution. You name a sequence length $n$, and you bound the total cost $T(n)$ of the worst such sequence; the amortized cost per operation is $T(n)/n$. Every op in the sequence may have wildly different real costs, but their sum is controlled, and that sum divided by $n$ is the number you report.

Three methods give the same answer for different structures; the choice is for convenience.

Aggregate. Bound the total cost $T(n)$ of any sequence of $n$ operations and declare the amortized cost per operation as $T(n)/n$. Same amortized cost for every operation type in the sequence. Good first method because it reveals whether amortization is possible at all.
Accounting. Assign an "amortized charge" $\hat c_i$ to each operation that may exceed or under-shoot its real cost $c_i$; store surplus as credit on data-structure components; use saved credit to pay for expensive operations later. Invariant: total credit never goes negative, so $\sum \hat c_i \ge \sum c_i = T(n)$. Works well when you can visualize "each unit of state carries a few credits."
Potential. Define a potential function $\Phi$ on data-structure states with $\Phi(D_i) \ge \Phi(D_0)$ for every reachable $D_i$. The amortized cost of operation $i$ is $\hat c_i = c_i + \Phi(D_i) - \Phi(D_{i-1})$. Summing telescopes: $$\sum_{i=1}^{n} \hat c_i = \sum_{i=1}^{n} c_i + \Phi(D_n) - \Phi(D_0) \ge T(n).$$ The last inequality uses $\Phi(D_n) \ge \Phi(D_0)$. The sum of amortized costs therefore upper-bounds the real total cost, which is the point: you can reason with $\hat c$ instead of $c$ and still get a correct total.

All three are provably equivalent in power; they differ in convenience. Aggregate is the simplest when total work is easy to bound. Accounting is most intuitive when specific operations "create debt." Potential is the strongest when the data structure has state that changes incrementally and you can define $\Phi$ to track slack.

Formally, amortization is a redistribution: assign each operation $i$ a charge $\hat c_i$ whose sum upper-bounds the real total cost. The per-op cost you report is then $\max_i \hat c_i$ (worst amortized) or an average $\frac{1}{n}\sum \hat c_i$ (aggregate). The key soundness check is that the redistribution is conservative: $\sum_{i \le k} \hat c_i \ge \sum_{i \le k} c_i$ for every prefix, not just the full sequence.

Why It Matters Here

Some operations are cheap almost always but expensive occasionally. Worst-case-per-operation analysis overestimates total cost. Dynamic arrays (resizing), hash tables (rehashing), union-find with path compression, splay trees, Fibonacci heaps, the LSM-tree's compaction step, persistent data structures that occasionally rebuild, and many incremental algorithms only make sense under amortized bounds.

Saying "worst case is $\Theta(n)$ per op" on a dynamic array would give you a total bound of $\Theta(n^2)$ for $n$ operations - off by a factor of $n$. Amortized analysis recovers the tight $\Theta(n)$ total without lying.

If you can only state worst case per operation, you cannot justify that $n$ appends to a Python list take $\Theta(n)$ total rather than $\Theta(n^2)$, you cannot explain why the B-tree insert in an LSM is "cheap," and you cannot budget capacity for any system that batches or resizes.

Amortized analysis is also the cleanest bridge between the theoretical analysis of this module and the systems statements you will make in S5-S8 ("writes are $O(\log n)$ amortized," "compaction cost is linear in bytes written").

Later in this semester, S2 M5 (advanced structures) revisits the potential and accounting methods formally as concepts; S2 M2 uses amortized bounds for hash-table resizing; S2 M3 analyzes Fibonacci-heap DECREASE-KEY as amortized $O(1)$, enabling Dijkstra in $O(E + V \log V)$.

Concrete Example

Example 1 (dynamic array with doubling). Each append is normally $\Theta(1)$, but on overflow we allocate a new array of size $2k$ and copy all $k$ elements.

Aggregate argument. Consider $n$ appends starting from size 1. Resizes happen at sizes $1, 2, 4, \ldots$, up to the largest power of 2 $\le n$. Total copy work is $$1 + 2 + 4 + \cdots + 2^{\lfloor \log_2 n \rfloor} \le 2n.$$ Total append work is $n$ for the writes plus $\le 2n$ for copies, so total $\le 3n = \Theta(n)$. Amortized cost per append is $O(1)$.

Accounting view. Charge each append an amortized cost of 3. Pay 1 for the write itself and bank 2. When a resize from $k$ to $2k$ happens there are $k$ elements, and only the last $k/2$ of them banked credits after the previous resize (the older ones' credits were spent on the previous resize). Those $k/2$ elements carry 2 credits each, totaling $k$ units, exactly covering the $k$ copies. No operation ever overdraws, so $\sum \hat c_i \ge \sum c_i$ and amortized $O(1)$ holds.

Potential view. Let $\Phi(D) = 2 \cdot \text{size} - \text{capacity}$. Before first append $\Phi = 0$. On a non-resizing append, $\text{size}$ grows by 1, $\text{capacity}$ unchanged, so $\Delta \Phi = 2$, giving amortized $\hat c = 1 + 2 = 3$. On a resizing append, $\text{size}$ grows by 1 and $\text{capacity}$ doubles from $k$ to $2k$, so $\Delta \Phi = 2(k+1) - 2k - (k - (2k-k)) = 2 - k$; amortized $\hat c = (k+1) + (2 - k) = 3$. $\Phi$ is always $\ge 0$ provided size $\ge$ capacity$/2$ after any resize, which the doubling policy guarantees.

All three methods conclude: $n$ appends cost $\Theta(n)$, even though one specific append can cost $\Theta(n)$ on its own.

The doubling factor matters: growing by a constant additive amount $c$ instead of doubling gives resize work $\sum_{i} k_i \cdot c = \Theta(n^2)$ total, an amortized $\Theta(n)$ per op. The geometric growth is what makes the amortized sum telescope to linear. Java's ArrayList and Python's list both grow geometrically (factors near 1.5 and 1.125 respectively) for exactly this reason.

Example 2 (binary counter). Flipping bits to increment an $n$-bit counter from 0 to $2^n$. Bit 0 flips every increment ($n$ flips), bit 1 every 2 ($n/2$), bit $k$ every $2^k$. Total flips over $n$ increments: $$\sum_{k=0}^{\lfloor \log_2 n \rfloor} \left\lfloor \frac{n}{2^k} \right\rfloor < 2n = \Theta(n).$$ Amortized cost per increment: $O(1)$.

Example 3 (stack with MULTI-POP). Operations are PUSH(x) (cost 1), POP (cost 1), and MULTI-POP(k) (pop up to $k$ elements, cost = actual pops $\le k$). A single MULTI-POP can cost $\Theta(n)$ if the stack has $n$ items.

Aggregate. Each pushed element is popped at most once across the entire sequence. Over $n$ operations, total pops $\le$ total pushes $\le n$. Total work $\le 2n = \Theta(n)$. Amortized $O(1)$ per op.

Potential. $\Phi = \text{stack size}$. PUSH: $\hat c = 1 + 1 = 2$. POP: $\hat c = 1 - 1 = 0$. MULTI-POP(k): $\hat c = k' - k' = 0$ where $k' \le k$ is actual pops. Max amortized cost across any op: 2. Same answer, cleaner derivation.

Common Confusion / Misconceptions

"Amortized equals average-case." Average-case uses a probability distribution over inputs. Amortized bounds hold for every sequence of operations on any input, averaging only across the sequence. They are adversarial guarantees, not stochastic expectations.
"Amortized equals expected." Expected cost is an average over the algorithm's random choices (e.g., a randomized pivot). You can have all four - worst-case, average-case, expected, and amortized - simultaneously describing the same operation at different resolutions.
"Picking a potential function is guessing." It feels like guessing but it's design. A good $\Phi$ increases cheaply on cheap operations (paid for by their amortized cost) and decreases sharply on expensive ones (paying for them). Typical forms: count-of-something, work-left-to-do, imbalance.
"Amortized is weaker than worst-case." Stronger in many settings. An amortized $O(\log n)$-per-op guarantee over $n$ ops gives $O(n \log n)$ total, just like a worst-case guarantee. Only problems that care about single-operation latency (real-time, safety-critical) prefer worst-case.
"Amortized always hides a big spike." Not always: amortized bounds plus worst-case bounds on a specific op (e.g., "insert is amortized $O(\log n)$ and worst-case $O(\log n)$") are achievable and common.
"Amortized $O(1)$ means every op is $O(1)$ in the long run." It means the total work across $n$ ops is $O(n)$. A specific op in the sequence can still cost $\Theta(n)$; the other ops pre-paid for it. When you need a per-op cap for real-time guarantees, use de-amortization techniques (incremental rebuilding, doubled structures).
"You can choose $\Phi$ freely." You can choose its form, but three constraints are non-negotiable: $\Phi(D_0) \le \Phi(D_i)$ for all $i$ (never below start), $\Phi$ must be a function of observable state (no future knowledge), and the amortized costs it produces must agree with the real costs on the final operation of any sequence (so the bound is tight).

How To Use It

Side-by-side cheat sheet for the three methods, applied to dynamic array doubling:

Method	Charge per cheap op	Charge per resize	Conservation check
Aggregate	- (average)	-	Total copies $\le 2n$; total appends $n$; bound $T(n) \le 3n$
Accounting	3 (pay 1 write, bank 2)	$k$ copies paid by $k/2$ banked credits $\times 2$ each	Bank never negative after any prefix
Potential	$\hat c = 1 + \Delta\Phi = 3$	$\hat c = (k+1) + (2 - k) = 3$	$\Phi = 2,\text{size} - \text{capacity} \ge 0$ always

All three yield "amortized $O(1)$ per append." Pick the one whose conservation check is easiest to state in prose for your specific data structure.

When you see a data structure whose operations are "usually cheap, sometimes expensive":

Ask whether expensive operations happen rarely enough to be paid for by cheap ones. If rare-op cost grows with state size and cheap-op cost does not, amortization is likely available.

Compute frequency $\times$ cost for the expensive op and compare with the total budget from the cheap ops. If the cheap ops can absorb it, amortization holds; if not, you need a different structure.
If expensive ops are triggered by a deterministic counter (every $k$-th op), aggregate is easiest. If triggered by geometric state (e.g. size crossing a threshold), potential is cleaner.

Pick a method. Aggregate is simplest when total work is easy to bound uniformly. Accounting is good when specific operations create debt. Potential is best for incremental proofs on a structure with a natural "slack" measure.
State $\hat c_i = c_i + \Phi(D_i) - \Phi(D_{i-1})$ for each op, verify the sum telescopes across any sequence, and check $\Phi \ge \Phi(D_0)$ always.
Convert the amortized bound into a total: $T(n) \le n \cdot (\text{amortized per op}) + \Phi(D_0)$.
Sanity check numerically: simulate the worst-case sequence (e.g., appends that always hit a resize boundary) and confirm the empirical average matches the theoretical amortized cost.
If the amortized bound is what you want to expose to a latency-sensitive caller, add a de-amortization step: spread the expensive work across subsequent cheap operations (e.g., background rebuilding) so the worst single cost also matches the amortized bound.
When you cannot choose the operation sequence (adversarial API consumers), your amortized bound must be robust to any sequence. This is the standard setting; do not weaken the claim to "on random inputs."

A practical shape of potential: if the expensive op happens when a counter reaches some threshold, let $\Phi$ be a positive linear function of distance to the threshold - cheap ops reduce the distance by a constant (paying into $\Phi$), and the expensive op pays itself by crashing $\Phi$ back near zero. This is the skeleton that doubling, chunking, and buffer-flush analyses share.

Transfer / Where This Shows Up Later

S1 M3 (probability) offered the ensemble view; amortization is the deterministic sibling and composes cleanly with it (expected amortized bounds for randomized structures).
S4 (systems / memory hierarchy) analyzes block-moving algorithms where single I/O is expensive but most operations hit cache; the amortized framework is exactly right.
S5 (OS / scheduling) bounds context-switch cost amortized over many small tasks; work-stealing has an amortized $O(1)$ steal claim that rides on potential arguments.
S6 (databases) is the biggest beneficiary: LSM compaction write amplification ($O(\log n)$ amortized per key), B-tree splits ($O(1)$ amortized after ledger accounting), hash-table resize ($\Theta(1)$ amortized), transaction-log rotation.
S7 (architecture fitness functions) lets you state "amortized cost of X remains below Y" as an automated architectural check.
S8 (scaling) costs background compaction, rebalancing shards, and queue-draining with amortized bounds so capacity planning doesn't need to budget for worst-case spikes that balance out over time.

A practical warning for systems work: amortized bounds do not compose blindly. If you have two amortized-$O(1)$ operations used in alternation by an adversary, the adversary can sometimes force your potential function to keep rising, and the amortized bound stops describing reality. Check composition with a joint potential when combining amortized data structures.

Check Yourself

Why is amortized analysis not the same thing as average-case analysis?
For dynamic array doubling, what potential function $\Phi$ would you choose and why does it work?
Give an operation where worst case per op is $\Theta(n)$ but amortized per op is $\Theta(1)$, and name one system in later modules where this shows up.
Under what conditions is the accounting method easier than the potential method?
Can amortized analysis give a tighter bound than worst-case per operation? Explain.
What potential $\Phi$ makes the MULTI-POP stack analysis trivially telescope? Why does any bounded-above, non-negative function of stack size work?
Why is "expected amortized cost" well-defined for randomized data structures (splay trees, skip lists, hash tables) and what does it buy you over deterministic amortization?

Mini Drill or Application

Show that $n$ increment operations on a binary counter cost $\Theta(n)$ total. Repeat the analysis for a ternary counter (digits in ${0, 1, 2}$): is it still $\Theta(n)$?
Show that a stack with MULTI-POP(k) (which pops up to $k$ elements) has $O(1)$ amortized cost per operation even though a single MULTI-POP can cost $\Theta(n)$. Do it twice, once by aggregate and once by potential.
Explain informally why union-find with path compression amortizes to nearly $O(\alpha(n))$ per operation, where $\alpha$ is the inverse Ackermann function. (Hint: $\Phi$ charges each element by its rank and rewards path-compression work.)
Dynamic array with halving: the array halves in capacity when less than $1/4$ full. Prove that a mixed sequence of inserts and deletes has amortized $O(1)$ cost per op. Why does halving at $1/2$ full break this? (Hint: an adversary alternating insert/delete at the threshold forces a resize every op.)
Splay tree ACCESS: why is the amortized cost $O(\log n)$ even though a single access can visit $\Theta(n)$ nodes? Sketch the potential function (sum of $\log \text{size(subtree)}$ over all nodes).
Implementation drill. Implement a DynamicList in pseudocode with the doubling / quartering policy. State the explicit potential function $\Phi$ and verify by simulation that $\sum c_i \le \sum \hat c_i$ for $10^4$ random operations. Complexity derivation: over $n$ operations with amortized $\hat c = O(1)$, total time $\sum \hat c_i = O(n)$, and $\sum c_i \le \sum \hat c_i + \Phi(D_0) - \Phi(D_n) \le O(n)$. Report the observed max real cost of any single op and the observed mean.

Read This Only If Stuck

CLRS: Aggregate analysis
CLRS: The accounting method
CLRS: The potential method
CLRS: Dynamic tables
Skiena: Reasoning about efficiency (with constant factors)
Competitive Programming: Do algorithm analysis
MIT 6.046J Spring 2015 - Amortized analysis lecture (Demaine - dynamic tables and splay trees)
CMU 15-451 amortized analysis notes (formal treatment with additional exercises)
Tarjan, "Amortized Computational Complexity" (the 1985 paper that introduced the potential method to a wide audience)
Jeff Erickson, Algorithms - Amortized analysis chapter (free PDF; accessible and lightly formal)
Brown CS2560 / CSE373 notes on amortized analysis (instructor notes; useful worked problems)

What This Concept Is​

Why It Matters Here​

Concrete Example​

Common Confusion / Misconceptions​

How To Use It​

Transfer / Where This Shows Up Later​

Check Yourself​

Mini Drill or Application​

Read This Only If Stuck​