Skip to main content

Diagonalization and Similarity Make Repeated Action Computable

What This Concept Is

Similarity means two square matrices $A$ and $B$ represent the same linear transformation in different bases: there exists an invertible $P$ such that $B = P^{-1} A P$. Similarity preserves everything that is a property of the transformation rather than the coordinates: eigenvalues, determinant, trace, characteristic polynomial, rank, Jordan structure.

Diagonalization is the best case of similarity. If $A$ has $n$ linearly independent eigenvectors $v_1, \ldots, v_n$ with eigenvalues $\lambda_1, \ldots, \lambda_n$, stacking them as columns of $S$ gives

$$A = S \Lambda S^{-1}, \quad \Lambda = \text{diag}(\lambda_1, \ldots, \lambda_n).$$

Equivalently $S^{-1} A S = \Lambda$. Diagonal matrices are trivial to power, invert, exponentiate, and interpret -- so if $A$ is similar to one, hard problems about $A$ collapse to easy problems about $\Lambda$. The miracle identity is

$$A^k = S \Lambda^k S^{-1},$$

turning matrix powers into coordinate-wise scalar powers. For $f$ any analytic function, $f(A) = S f(\Lambda) S^{-1}$; in particular $e^{At} = S e^{\Lambda t} S^{-1}$ is how you solve linear ODEs in closed form.

Distinguish diagonalization from its generalizations and special cases:

  • Diagonalization $A = S\Lambda S^{-1}$: requires a full eigenbasis; not every matrix has one.
  • Spectral theorem (symmetric case) $A = Q \Lambda Q^T$: for symmetric $A$, you can always choose an orthonormal eigenbasis $Q$, so $S^{-1} = S^T$.
  • Jordan form $A = P J P^{-1}$: every square matrix over $\mathbb{C}$ is similar to a block-diagonal Jordan matrix; diagonalization is the case where every Jordan block has size 1.
  • Schur decomposition $A = U T U^*$: every square matrix is unitarily similar to an upper triangular $T$. Always exists; cheaper than Jordan; the numerical workhorse for eigenvalue computation.
  • SVD $A = U\Sigma V^T$: a different kind of decomposition that exists universally, works for rectangular $A$, and uses two orthogonal matrices instead of one (Cluster 5).

When $A$ cannot be diagonalized (defective case), you fall back to Jordan or Schur; the long-run behavior story becomes more complicated but does not disappear.

Why It Matters Here

Diagonalization is not just a prettier form. It turns difficult repeated action into easy scalar arithmetic. For recurrences, discrete-time linear systems, powers-of-matrix behavior, and linear ODEs, the eigenvalue/eigenvector picture is the primary tool. It is also one of the main places where "choose the right basis" becomes computationally decisive: in the eigenbasis, the action of $A$ is just coordinate-wise scaling.

It also gives operational content to Cluster 4's broader point: for iterative processes, long-run behavior is governed by the dominant eigenvalue. The dominant mode eventually overwhelms smaller magnitudes, which is why power iteration converges to the top eigenvector and why Fibonacci-like sequences grow at the golden-ratio rate.

Concrete Examples

Example 1 -- diagonalize and power. For $A = \begin{pmatrix} 2 & 1 \ 1 & 2 \end{pmatrix}$ from the previous concept: $\lambda_1 = 3$ with $v_1 = (1, 1)^T$, $\lambda_2 = 1$ with $v_2 = (1, -1)^T$. Then

$$S = \begin{pmatrix} 1 & 1 \ 1 & -1 \end{pmatrix}, \quad \Lambda = \begin{pmatrix} 3 & 0 \ 0 & 1 \end{pmatrix}, \quad S^{-1} = \tfrac{1}{2}\begin{pmatrix} 1 & 1 \ 1 & -1 \end{pmatrix}.$$

Compute

$$A^{10} = S \Lambda^{10} S^{-1} = \tfrac{1}{2}\begin{pmatrix} 1 & 1 \ 1 & -1 \end{pmatrix}\begin{pmatrix} 59049 & 0 \ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 1 \ 1 & -1 \end{pmatrix} = \tfrac{1}{2}\begin{pmatrix} 59050 & 59048 \ 59048 & 59050 \end{pmatrix}.$$

Without diagonalization, you would need to multiply $A$ by itself 10 times, which is both more work and less insight.

Example 2 -- Fibonacci via diagonalization. The Fibonacci recurrence $F_{n+1} = F_n + F_{n-1}$ is the linear system

$$\begin{pmatrix} F_{n+1} \ F_n \end{pmatrix} = \underbrace{\begin{pmatrix} 1 & 1 \ 1 & 0 \end{pmatrix}}{A}\begin{pmatrix} F_n \ F{n-1} \end{pmatrix}.$$

Characteristic polynomial $\lambda^2 - \lambda - 1 = 0$, with roots $\phi = (1+\sqrt{5})/2$ (golden ratio) and $\psi = (1-\sqrt{5})/2$. Diagonalize to obtain Binet's formula

$$F_n = \frac{\phi^n - \psi^n}{\sqrt{5}}.$$

Since $|\psi| < 1$, $F_n \sim \phi^n/\sqrt{5}$ for large $n$. This is the MCS "linear recurrence" chapter restated in matrix form.

Example 3 -- not diagonalizable. Revisit $A = \begin{pmatrix} 4 & 1 \ 0 & 4 \end{pmatrix}$. Only one eigenvector exists for $\lambda = 4$, so $S$ cannot be invertible. Diagonalization fails. Jordan form gives $A = PJP^{-1}$ with $J = \begin{pmatrix} 4 & 1 \ 0 & 4 \end{pmatrix}$ (i.e., $J = A$ here). Powers of Jordan blocks grow polynomially times geometrically: $J^k = \begin{pmatrix} 4^k & k\cdot 4^{k-1} \ 0 & 4^k \end{pmatrix}$. The extra $k$ factor is the hallmark of defective spectra.

Common Confusion / Misconceptions

"Every matrix is diagonalizable because every matrix has a characteristic polynomial." False. Defective matrices (repeated eigenvalues with too few eigenvectors) cannot be diagonalized over $\mathbb{R}$ or $\mathbb{C}$. They can always be put in Jordan form, and every matrix is unitarily similar to a triangular one (Schur), but diagonalization is strictly narrower.

"Similarity means the matrices happen to look alike." Similarity is coordinate equivalence -- two numerically different matrices representing the same map in different bases -- not visual resemblance. A diagonal matrix and its similar-under-any-$P$ version look nothing alike.

"Diagonalization is just about making things pretty." It is computationally decisive for $A^k$, $e^{At}$, and more generally any function applied to a matrix. It also exposes the invariant directions and their growth rates, which is the content of spectral analysis.

"$S$ is unique." Eigenvectors are defined only up to nonzero scalar, and in eigenspaces of dimension $> 1$, up to any basis choice. So $S$ is a family of choices; typical conventions include normalizing columns ($|v_i| = 1$) or choosing orthogonal eigenvectors in the symmetric case.

How To Use It

When asked about repeated action:

  1. Compute the eigenvalues (characteristic polynomial or library call).
  2. Check whether enough independent eigenvectors exist: $\sum \text{(geom mult)} = n$?
  3. If yes, diagonalize: stack eigenvectors into $S$; build $\Lambda$; compute what you need.
  4. If no, say clearly why diagonalization fails; use Jordan or Schur for structural arguments; use numerical libraries for actual computation.
  5. Use this especially for recurrences, long-run behavior, and linear ODEs.
  6. For symmetric $A$: use np.linalg.eigh; the eigenvectors are orthonormal and $A = Q\Lambda Q^T$ with $Q^{-1} = Q^T$, cheaper and more stable.

Transfer / Where This Shows Up Later

  • S2 (algorithms): Linear recurrences and generating functions are solved by diagonalization. Fast matrix exponentiation for recurrence evaluation (the $O(\log n)$ Fibonacci algorithm) is matrix diagonalization turned into an algorithmic technique.
  • S4 (computer organization): Numerical eigensolvers (LAPACK/dgeev) sit at the bottom of almost every scientific-computing pipeline; GPUs provide blocked Schur implementations.
  • S5 (queueing & Markov chains): The stationary distribution is the left-eigenvector of $P$ for $\lambda = 1$. Mixing time is $O(\log n / \log(1/|\lambda_2|))$ -- the spectral gap directly.
  • S7 (architecture): Systems with feedback loops are analyzed via the eigenvalues of the coupling matrix; unstable modes reveal feedback divergence.
  • S8 (ranking): PageRank is "diagonalize to find the dominant eigenvector"; spectral clustering uses the bottom eigenvectors of the Laplacian.
  • Phase 7 (ML): The power method approximates top eigenvectors of covariance matrices (essentially PCA without full diagonalization). The Jacobian of a recurrent net evaluated repeatedly is a product whose "eigenvalue" behavior (spectral radius) decides if gradients vanish or explode. Matrix exponentials appear in continuous-time neural ODEs and reservoir computing.

Check Yourself

  1. Why does diagonalization reduce matrix powers to scalar powers?
  2. What is the conceptual meaning of similarity?
  3. What exact condition is needed for diagonalization over $\mathbb{R}$ or $\mathbb{C}$?
  4. For symmetric $A$, why can $S$ be chosen orthogonal?
  5. What replaces diagonalization when $A$ is defective, and what does the long-run behavior of $A^k$ look like then?
  6. Why does the spectral radius govern stability of $A^k$?

Mini Drill or Application

  1. Determine whether $A = \begin{pmatrix} 4 & 1 \ 0 & 4 \end{pmatrix}$ is diagonalizable. Justify by counting independent eigenvectors.
  2. Determine whether $B = \begin{pmatrix} 2 & 0 \ 0 & 3 \end{pmatrix}$ is diagonalizable; write $B^k$ explicitly and describe long-run growth by direction.
  3. In NumPy: for the Fibonacci matrix $A = \begin{pmatrix} 1 & 1 \ 1 & 0 \end{pmatrix}$, compute w, V = np.linalg.eig(A); Ak_direct = np.linalg.matrix_power(A, 30); Ak_diag = V @ np.diag(w**30) @ np.linalg.inv(V). Verify np.allclose(Ak_direct, Ak_diag) and compute $F_{30}$.
  4. Prove: if $A$ is diagonalizable with all $|\lambda_i| < 1$, then $A^k \to 0$. What changes if even one $|\lambda_i| = 1$?
  5. For a Markov transition matrix $P$ of your choice, compute its eigenvalues; find the left eigenvector for $\lambda = 1$ and verify that the entries sum to 1 after normalization. That vector is the stationary distribution.

Read This Only If Stuck