Column Space, Nullspace, and Rank Explain What a Matrix Can Produce

What This Concept Is

For a matrix $A \in \mathbb{R}^{m \times n}$, the four fundamental subspaces are the first serious map of what the matrix does. The two most immediately useful are:

$\text{Col}(A) \subseteq \mathbb{R}^m$: all outputs $Ax$. Also called the range or image. A vector $b$ lies in $\text{Col}(A)$ iff $Ax = b$ has at least one solution. Equivalently: the span of the columns of $A$.
$\text{Null}(A) \subseteq \mathbb{R}^n$: all inputs sent to zero, ${x : Ax = 0}$. Also called the kernel. Captures what information $A$ destroys. Equivalently: the solution set of the homogeneous system.

The other two complete the picture. The row space $\text{Row}(A) = \text{Col}(A^T) \subseteq \mathbb{R}^n$ is the column space of the transpose and is orthogonal complement of $\text{Null}(A)$. The left nullspace $\text{Null}(A^T) \subseteq \mathbb{R}^m$ is orthogonal complement of $\text{Col}(A)$. Strang's "Big Picture" arranges these in a cross-diagram that is worth internalizing because it organizes almost every matrix decomposition you will meet.

In the SVD picture (Cluster 5), these four subspaces are read straight off the factorization $A = U\Sigma V^T$: the first $r$ columns of $U$ span $\text{Col}(A)$; the remaining columns of $U$ span $\text{Null}(A^T)$; the first $r$ columns of $V$ span $\text{Row}(A)$; the remaining columns of $V$ span $\text{Null}(A)$. The fundamental-subspaces picture is not a separate theory; it is the coordinate system that the SVD makes visible.

These are complementary viewpoints. Column space tells you what the matrix can produce; nullspace tells you what information the matrix loses; row space tells you which directions in the input are "heard"; left nullspace tells you which directions in the output $A$ cannot reach.

A compact way to remember the shapes. If $A \in \mathbb{R}^{m \times n}$ has rank $r$, then $\dim \text{Col}(A) = \dim \text{Row}(A) = r$; $\dim \text{Null}(A) = n - r$; $\dim \text{Null}(A^T) = m - r$. The input side $\mathbb{R}^n$ splits into a "heard" part of dimension $r$ and a "silenced" part of dimension $n - r$. The output side $\mathbb{R}^m$ splits into a "reached" part of dimension $r$ and an "unreached" part of dimension $m - r$. Rank is the common meter; the two complements pair off.

Rank is the dimension of the column space, written $r = \text{rank}(A)$. Equivalently, it is the number of pivots after elimination, the dimension of the row space, the maximum number of linearly independent columns, and the maximum number of linearly independent rows (these four are all equal -- a non-obvious theorem often called the row-rank equals column-rank theorem). Nullity is the dimension of the nullspace.

The Rank-Nullity theorem states $\text{rank}(A) + \text{nullity}(A) = n$, where $n$ is the number of columns. This is the numerical accounting of the fundamental subspaces: every input dimension is either "heard" by a pivot (contributes to rank) or "silenced" by the nullspace (contributes to nullity).

The orthogonality relations that pair these spaces are:

$$\text{Null}(A) \perp \text{Row}(A), \qquad \text{Null}(A^T) \perp \text{Col}(A).$$

Proof of the first: if $x \in \text{Null}(A)$, then $Ax = 0$ means every row $r_i^T$ of $A$ satisfies $r_i^T x = 0$, so $x$ is orthogonal to every row, hence to the entire row space. Same argument on $A^T$ gives the second. These orthogonal pairings are the skeleton on which least squares (Cluster 3) is built and are directly visible from the SVD (Cluster 5).

Why It Matters Here

This is where linear algebra stops being about solving one system and starts being about understanding an operator. Everything downstream uses the fundamental-subspaces picture: solvability of $Ax = b$ reduces to "is $b \in \text{Col}(A)$?"; uniqueness reduces to "is $\text{Null}(A) = {0}$?"; least squares projects $b$ onto $\text{Col}(A)$; SVD writes $A$ as a sum of rank-1 pieces aligned with these subspaces.

In applied work, rank answers questions like "how many genuinely distinct features does this data matrix have?" and "can a linear model possibly distinguish these inputs?" Low rank signals compressibility, hidden redundancy, and latent structure. Nullspace size signals how many inputs collapse to the same output -- the core of why collisions happen in hashing-like linear projections, and why some data directions are invisible to a given model.

A second, quieter payoff: whenever you write a pseudoinverse, least-squares solution, projection, or SVD argument in later clusters, you are manipulating these four subspaces. The fundamental-subspaces picture is the coordinate system for all of linear-algebra-for-engineers.

Concrete Examples

Example 1 -- rank 1 from duplicated rows. Let

$$A = \begin{pmatrix} 1 & 2 & 3 \ 2 & 4 & 6 \end{pmatrix}.$$

Row 2 is $2 \times$ row 1, so row reduction produces a single pivot. $\text{rank}(A) = 1$. Every output $Ax$ is a scalar multiple of $(1, 2)^T$, so $\text{Col}(A)$ is the line through $(1, 2)$ in $\mathbb{R}^2$. The nullspace is the set of $(x_1, x_2, x_3)$ with $x_1 + 2x_2 + 3x_3 = 0$, a 2-plane in $\mathbb{R}^3$. Nullity is 2, rank is 1, and $1 + 2 = 3$ columns -- the rank-nullity theorem ticks. The row space is the line through $(1, 2, 3)^T$, and you can verify it is orthogonal to the 2-plane nullspace by substituting any nullspace vector into the row equation.

Example 2 -- rank 2 with structure. Let

$$A = \begin{pmatrix} 1 & 0 & 1 & 2 \ 0 & 1 & 1 & 1 \ 1 & 1 & 2 & 3 \end{pmatrix}.$$

Row-reduce: $R_3 \leftarrow R_3 - R_1 - R_2$ gives a zero row. Rank is 2. The pivot columns in the original $A$ (columns 1 and 2) form a basis for $\text{Col}(A) \subseteq \mathbb{R}^3$; $\text{Col}(A)$ is the plane spanned by $(1, 0, 1)^T$ and $(0, 1, 1)^T$. Note: you read the basis of $\text{Col}(A)$ from the original matrix, not the reduced one, because row operations preserve the row space but change the column space.

For the nullspace, write columns 3 and 4 in terms of 1 and 2: column 3 $=$ column 1 $+$ column 2, giving the nullspace vector $(1, 1, -1, 0)^T$; column 4 $=$ $2\cdot$ column 1 $+$ column 2, giving $(2, 1, 0, -1)^T$. Nullity is 2. Rank + nullity = 4 = number of columns. Not every $b \in \mathbb{R}^3$ is reachable: $b$ must lie in the 2-plane $\text{Col}(A)$.

Example 3 -- numerical rank vs exact rank. Let

$$A = \begin{pmatrix} 1 & 1 \ 1 & 1 + 10^{-12} \end{pmatrix}.$$

Exact rank is 2 (the second column differs from the first by a nonzero vector). But the singular values are approximately $2$ and $5 \times 10^{-13}$. Any numerical rank decision that uses a tolerance $> 10^{-13}$ will call this matrix rank 1; at double precision (~$10^{-16}$), the right call is context-dependent. This is the engineering reason numpy.linalg.matrix_rank takes a tolerance parameter: exact rank is algebraically crisp but numerically fragile for nearly-dependent columns.

Common Confusion / Misconceptions

"Use the reduced columns as a basis for $\text{Col}(A)$." Row reduction preserves row relations but changes the column space. Pivot columns for $\text{Col}(A)$ must be pulled from the original matrix. Pivot columns of the reduced matrix tell you which columns of the original are independent, not what they are.

"Rank counts nonzero rows in the original matrix." Rank is revealed after row reduction, not before. Three nonzero original rows can reduce to one pivot if they are linearly dependent.

"Column space and row space are the same space." They have the same dimension ($= \text{rank}$), but they live in different ambient spaces: $\text{Col}(A) \subseteq \mathbb{R}^m$ and $\text{Row}(A) \subseteq \mathbb{R}^n$. The equality of dimensions is the "row rank equals column rank" theorem.

"Rank is invariant under arbitrary matrix transformations." Rank is preserved under invertible pre- or post-multiplication: $\text{rank}(PAQ) = \text{rank}(A)$ when $P, Q$ are invertible. Pre-multiplication by a singular $P$ can reduce rank (it squashes some columns to dependent combinations), and post-multiplication by a singular $Q$ can reduce rank too. "Invertible wrappers don't change rank" is the clean statement; more general transformations can.

"Nullspace is just the trivial zero vector when $A$ is square." Only if $A$ has full rank. A square matrix with nontrivial nullspace is singular; this is the same fact as "$\det A = 0$."

"Rank is independent of the ground field." Mostly yes for the fields you will encounter, but subtle in edge cases. A matrix of integers has the same rank over $\mathbb{Q}$ and over $\mathbb{R}$; over finite fields like $\mathbb{F}_2$, rank can be different (e.g., $\begin{pmatrix} 1 & 1 \ 1 & 1 \end{pmatrix}$ has rank 1 everywhere, but a matrix with $2$s in it becomes zero over $\mathbb{F}_2$ and its rank may drop). Coding theory and cryptography care about finite-field ranks; everyday ML/data work does not.

How To Use It

When analyzing a matrix:

Row-reduce $A$ to reduced row echelon form.
Count pivot columns to get rank.
For a basis of $\text{Col}(A)$, list the pivot columns from the original $A$.
For $\text{Null}(A)$, solve $Ax = 0$ by back-substitution with free variables parameterized; the coefficient vectors of the free variables form a basis.
Apply rank-nullity to sanity-check: $r + (n - r) = n$.
Interpret: high nullity $\Rightarrow$ many inputs become indistinguishable; low rank $\Rightarrow$ outputs live in a compressed lower-dimensional subspace.
For solvability, check $b \in \text{Col}(A)$ by augmenting and seeing if $\text{rank}([A \mid b]) = \text{rank}(A)$.
For numerical rank, use np.linalg.matrix_rank(A, tol=...) with an explicit tolerance; never trust the default for scientific claims, because the scale of the matrix changes the default threshold.
When rank-deficiency is structural (e.g., a graph Laplacian's zero eigenvalue for connected components), name the structural source and use it as a sanity check rather than a surprise.

Transfer / Where This Shows Up Later

S2 (algorithms): Graph Laplacian rank reveals connected components; the nullspace is spanned by connected-component indicator vectors.
S4 (computer organization): Vectorized kernels operate on tensor dimensions that correspond to column/row space axes; cache efficiency depends on which axis is "fast" in memory. Contiguous strided access along a row-major layout is reading from the row space of the tensor view.
S4 detail: Data-structure decisions like "store features as columns vs rows" are really choices about which ambient space ($\mathbb{R}^m$ or $\mathbb{R}^n$) you are optimizing locality for. Pivot column lookups, rank queries, and nullspace projections stress different axes, so the right storage depends on the dominant query.
S5 (databases): Functional dependencies in a relation correspond (loosely) to rank-deficiency in the data matrix of its attributes; compression and schema normalization share linear-algebraic intuition.
S7 (architecture): The rank of an interaction matrix between modules bounds the number of independent communication channels you can reason about before redundancy creeps in.
S8 (ranking & recommendation): Collaborative filtering's rank-$k$ factorization $R \approx UV^T$ assumes the data is well-approximated by a low-rank matrix. The approximation quality is bounded by how quickly singular values of $R$ decay (the SVD viewpoint, Cluster 5).
Phase 7 (ML): The "effective dimension" of learned features is often estimated by the numerical rank of the activation matrix; embedding collapse is nullspace growth during training. PCA keeps the top-$k$ principal components by projecting onto a rank-$k$ approximation of the data covariance. Low-rank adaptation (LoRA) for large language models constrains fine-tuning updates to be explicitly rank-$k$ factored: $\Delta W = BA$ with $B \in \mathbb{R}^{d \times k}$ and $A \in \mathbb{R}^{k \times d}$ for small $k$, cutting parameter counts by orders of magnitude while preserving most of the task lift.
Across phases: Any time you see "dominant directions," "effective features," "information bottleneck," or "latent dimensions," you are reading about rank and column-space thinking in disguise.

Check Yourself

What does $b \notin \text{Col}(A)$ tell you about $Ax = b$?
Why must basis vectors for $\text{Col}(A)$ come from the original matrix rather than from the reduced form?
What does a nontrivial nullspace say about information loss?
Why is $\text{rank}(A) = \text{rank}(A^T)$, and what does this mean for row space vs column space?
How do you read the nullspace basis directly from the reduced row echelon form?
Why does rank-nullity fail if you forget the "number of columns" part and use "number of rows"?
How do the four fundamental subspaces pair into orthogonal complements? Which theorem packages this cleanly?
Why is numerical rank tolerance-dependent, and how do you choose the tolerance in practice?
If $A \in \mathbb{R}^{m \times n}$ has rank $r$, what are the dimensions of each of the four fundamental subspaces?

Mini Drill or Application

For $A = \begin{pmatrix} 1 & 0 & 1 & 2 \ 0 & 1 & 1 & 1 \ 1 & 1 & 2 & 3 \end{pmatrix}$: row-reduce $A$; find the rank; give a basis for $\text{Col}(A)$; give a parametric description of $\text{Null}(A)$; state whether every $b \in \mathbb{R}^3$ is reachable as $Ax$.
In NumPy: A = np.array([...]); U, s, Vt = np.linalg.svd(A); rank = np.sum(s > 1e-10); print(rank). Compare against np.linalg.matrix_rank(A). Why does the singular-value threshold matter numerically?
Construct a $4\times 4$ matrix with rank 2, then verify that the nullspace has dimension 2 by solving $Ax = 0$ symbolically.
For the graph Laplacian $L$ of a graph with $c$ connected components, prove that $\text{nullity}(L) = c$ and the nullspace is spanned by component indicator vectors.
Pick a real dataset (e.g., sklearn.datasets.load_iris().data) and compute its numerical rank at several tolerances; explain what near-zero singular values tell you about feature redundancy.
Demonstrate orthogonality of $\text{Null}(A)$ and $\text{Row}(A)$ numerically: build a random $A$, compute a basis of each via scipy.linalg.null_space(A) and scipy.linalg.null_space(A.T), and confirm their inner products are at floating-point noise levels.
Construct a $3 \times 5$ matrix with exact rank 2 (e.g., two independent columns and three dependent ones). Predict the dimensions of all four fundamental subspaces and verify via np.linalg.svd.
Build a connected graph on six nodes, form its Laplacian $L = D - A$, and check by np.linalg.eigvalsh that the smallest eigenvalue is zero with multiplicity equal to the number of connected components. Add an edge; watch the spectrum update.

What This Concept Is​

Why It Matters Here​

Concrete Examples​

Common Confusion / Misconceptions​

How To Use It​

Transfer / Where This Shows Up Later​

Check Yourself​

Mini Drill or Application​

Read This Only If Stuck​