Diagonalization and Similarity Make Repeated Action Computable
What This Concept Is
Similarity means two square matrices $A$ and $B$ represent the same linear transformation in different bases: there exists an invertible $P$ such that $B = P^{-1} A P$. Similarity preserves everything that is a property of the transformation rather than the coordinates: eigenvalues, determinant, trace, characteristic polynomial, rank, Jordan structure.
Diagonalization is the best case of similarity. If $A$ has $n$ linearly independent eigenvectors $v_1, \ldots, v_n$ with eigenvalues $\lambda_1, \ldots, \lambda_n$, stacking them as columns of $S$ gives
$$A = S \Lambda S^{-1}, \quad \Lambda = \text{diag}(\lambda_1, \ldots, \lambda_n).$$
Equivalently $S^{-1} A S = \Lambda$. Diagonal matrices are trivial to power, invert, exponentiate, and interpret -- so if $A$ is similar to one, hard problems about $A$ collapse to easy problems about $\Lambda$. The miracle identity is
$$A^k = S \Lambda^k S^{-1},$$
turning matrix powers into coordinate-wise scalar powers. For $f$ any analytic function, $f(A) = S f(\Lambda) S^{-1}$; in particular $e^{At} = S e^{\Lambda t} S^{-1}$ is how you solve linear ODEs in closed form.
Distinguish diagonalization from its generalizations and special cases:
- Diagonalization $A = S\Lambda S^{-1}$: requires a full eigenbasis; not every matrix has one.
- Spectral theorem (symmetric case) $A = Q \Lambda Q^T$: for symmetric $A$, you can always choose an orthonormal eigenbasis $Q$, so $S^{-1} = S^T$.
- Jordan form $A = P J P^{-1}$: every square matrix over $\mathbb{C}$ is similar to a block-diagonal Jordan matrix; diagonalization is the case where every Jordan block has size 1.
- Schur decomposition $A = U T U^*$: every square matrix is unitarily similar to an upper triangular $T$. Always exists; cheaper than Jordan; the numerical workhorse for eigenvalue computation.
- SVD $A = U\Sigma V^T$: a different kind of decomposition that exists universally, works for rectangular $A$, and uses two orthogonal matrices instead of one (Cluster 5).
When $A$ cannot be diagonalized (defective case), you fall back to Jordan or Schur; the long-run behavior story becomes more complicated but does not disappear.
Why It Matters Here
Diagonalization is not just a prettier form. It turns difficult repeated action into easy scalar arithmetic. For recurrences, discrete-time linear systems, powers-of-matrix behavior, and linear ODEs, the eigenvalue/eigenvector picture is the primary tool. It is also one of the main places where "choose the right basis" becomes computationally decisive: in the eigenbasis, the action of $A$ is just coordinate-wise scaling.
It also gives operational content to Cluster 4's broader point: for iterative processes, long-run behavior is governed by the dominant eigenvalue. The dominant mode eventually overwhelms smaller magnitudes, which is why power iteration converges to the top eigenvector and why Fibonacci-like sequences grow at the golden-ratio rate.
Concrete Examples
Example 1 -- diagonalize and power. For $A = \begin{pmatrix} 2 & 1 \ 1 & 2 \end{pmatrix}$ from the previous concept: $\lambda_1 = 3$ with $v_1 = (1, 1)^T$, $\lambda_2 = 1$ with $v_2 = (1, -1)^T$. Then
$$S = \begin{pmatrix} 1 & 1 \ 1 & -1 \end{pmatrix}, \quad \Lambda = \begin{pmatrix} 3 & 0 \ 0 & 1 \end{pmatrix}, \quad S^{-1} = \tfrac{1}{2}\begin{pmatrix} 1 & 1 \ 1 & -1 \end{pmatrix}.$$
Compute
$$A^{10} = S \Lambda^{10} S^{-1} = \tfrac{1}{2}\begin{pmatrix} 1 & 1 \ 1 & -1 \end{pmatrix}\begin{pmatrix} 59049 & 0 \ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 1 \ 1 & -1 \end{pmatrix} = \tfrac{1}{2}\begin{pmatrix} 59050 & 59048 \ 59048 & 59050 \end{pmatrix}.$$
Without diagonalization, you would need to multiply $A$ by itself 10 times, which is both more work and less insight.
Example 2 -- Fibonacci via diagonalization. The Fibonacci recurrence $F_{n+1} = F_n + F_{n-1}$ is the linear system
$$\begin{pmatrix} F_{n+1} \ F_n \end{pmatrix} = \underbrace{\begin{pmatrix} 1 & 1 \ 1 & 0 \end{pmatrix}}{A}\begin{pmatrix} F_n \ F{n-1} \end{pmatrix}.$$
Characteristic polynomial $\lambda^2 - \lambda - 1 = 0$, with roots $\phi = (1+\sqrt{5})/2$ (golden ratio) and $\psi = (1-\sqrt{5})/2$. Diagonalize to obtain Binet's formula
$$F_n = \frac{\phi^n - \psi^n}{\sqrt{5}}.$$
Since $|\psi| < 1$, $F_n \sim \phi^n/\sqrt{5}$ for large $n$. This is the MCS "linear recurrence" chapter restated in matrix form.
Example 3 -- not diagonalizable. Revisit $A = \begin{pmatrix} 4 & 1 \ 0 & 4 \end{pmatrix}$. Only one eigenvector exists for $\lambda = 4$, so $S$ cannot be invertible. Diagonalization fails. Jordan form gives $A = PJP^{-1}$ with $J = \begin{pmatrix} 4 & 1 \ 0 & 4 \end{pmatrix}$ (i.e., $J = A$ here). Powers of Jordan blocks grow polynomially times geometrically: $J^k = \begin{pmatrix} 4^k & k\cdot 4^{k-1} \ 0 & 4^k \end{pmatrix}$. The extra $k$ factor is the hallmark of defective spectra.
Common Confusion / Misconceptions
"Every matrix is diagonalizable because every matrix has a characteristic polynomial." False. Defective matrices (repeated eigenvalues with too few eigenvectors) cannot be diagonalized over $\mathbb{R}$ or $\mathbb{C}$. They can always be put in Jordan form, and every matrix is unitarily similar to a triangular one (Schur), but diagonalization is strictly narrower.
"Similarity means the matrices happen to look alike." Similarity is coordinate equivalence -- two numerically different matrices representing the same map in different bases -- not visual resemblance. A diagonal matrix and its similar-under-any-$P$ version look nothing alike.
"Diagonalization is just about making things pretty." It is computationally decisive for $A^k$, $e^{At}$, and more generally any function applied to a matrix. It also exposes the invariant directions and their growth rates, which is the content of spectral analysis.
"$S$ is unique." Eigenvectors are defined only up to nonzero scalar, and in eigenspaces of dimension $> 1$, up to any basis choice. So $S$ is a family of choices; typical conventions include normalizing columns ($|v_i| = 1$) or choosing orthogonal eigenvectors in the symmetric case.
How To Use It
When asked about repeated action:
- Compute the eigenvalues (characteristic polynomial or library call).
- Check whether enough independent eigenvectors exist: $\sum \text{(geom mult)} = n$?
- If yes, diagonalize: stack eigenvectors into $S$; build $\Lambda$; compute what you need.
- If no, say clearly why diagonalization fails; use Jordan or Schur for structural arguments; use numerical libraries for actual computation.
- Use this especially for recurrences, long-run behavior, and linear ODEs.
- For symmetric $A$: use
np.linalg.eigh; the eigenvectors are orthonormal and $A = Q\Lambda Q^T$ with $Q^{-1} = Q^T$, cheaper and more stable.
Transfer / Where This Shows Up Later
- S2 (algorithms): Linear recurrences and generating functions are solved by diagonalization. Fast matrix exponentiation for recurrence evaluation (the $O(\log n)$ Fibonacci algorithm) is matrix diagonalization turned into an algorithmic technique.
- S4 (computer organization): Numerical eigensolvers (
LAPACK/dgeev) sit at the bottom of almost every scientific-computing pipeline; GPUs provide blocked Schur implementations. - S5 (queueing & Markov chains): The stationary distribution is the left-eigenvector of $P$ for $\lambda = 1$. Mixing time is $O(\log n / \log(1/|\lambda_2|))$ -- the spectral gap directly.
- S7 (architecture): Systems with feedback loops are analyzed via the eigenvalues of the coupling matrix; unstable modes reveal feedback divergence.
- S8 (ranking): PageRank is "diagonalize to find the dominant eigenvector"; spectral clustering uses the bottom eigenvectors of the Laplacian.
- Phase 7 (ML): The power method approximates top eigenvectors of covariance matrices (essentially PCA without full diagonalization). The Jacobian of a recurrent net evaluated repeatedly is a product whose "eigenvalue" behavior (spectral radius) decides if gradients vanish or explode. Matrix exponentials appear in continuous-time neural ODEs and reservoir computing.
Check Yourself
- Why does diagonalization reduce matrix powers to scalar powers?
- What is the conceptual meaning of similarity?
- What exact condition is needed for diagonalization over $\mathbb{R}$ or $\mathbb{C}$?
- For symmetric $A$, why can $S$ be chosen orthogonal?
- What replaces diagonalization when $A$ is defective, and what does the long-run behavior of $A^k$ look like then?
- Why does the spectral radius govern stability of $A^k$?
Mini Drill or Application
- Determine whether $A = \begin{pmatrix} 4 & 1 \ 0 & 4 \end{pmatrix}$ is diagonalizable. Justify by counting independent eigenvectors.
- Determine whether $B = \begin{pmatrix} 2 & 0 \ 0 & 3 \end{pmatrix}$ is diagonalizable; write $B^k$ explicitly and describe long-run growth by direction.
- In NumPy: for the Fibonacci matrix $A = \begin{pmatrix} 1 & 1 \ 1 & 0 \end{pmatrix}$, compute
w, V = np.linalg.eig(A); Ak_direct = np.linalg.matrix_power(A, 30); Ak_diag = V @ np.diag(w**30) @ np.linalg.inv(V). Verifynp.allclose(Ak_direct, Ak_diag)and compute $F_{30}$. - Prove: if $A$ is diagonalizable with all $|\lambda_i| < 1$, then $A^k \to 0$. What changes if even one $|\lambda_i| = 1$?
- For a Markov transition matrix $P$ of your choice, compute its eigenvalues; find the left eigenvector for $\lambda = 1$ and verify that the entries sum to 1 after normalization. That vector is the stationary distribution.
Read This Only If Stuck
- LAIA 5.2 Diagonalization of a Matrix (Part 1)
- LAIA 5.2 Diagonalization of a Matrix (Part 3)
- LAIA 5.3 Difference Equations and Powers Ak (Part 1)
- LAIA 5.3 Difference Equations and Powers Ak (Part 2)
- LAIA 5.6 Similarity Transformations (Part 2)
- LAIA Appendix B: The Jordan Form (Part 1)
- MCS 22.3 Linear Recurrences
- MCS 22.3 Linear Recurrences (continuation)
- Wikipedia: Diagonalizable matrix
- 3Blue1Brown: Change of basis and eigenbasis
- MIT OCW 18.06 Lecture 22: Diagonalization and Powers of A