Independence, Basis, and Dimension Choose Coordinates Without Waste

What This Concept Is

Linear independence means no vector in a set can be built from the others. Formally, $\{v_1, \ldots, v_k\}$ is independent iff $c_1 v_1 + \cdots + c_k v_k = 0$ implies every $c_i = 0$. Equivalently, each vector adds a genuinely new direction; equivalently, the matrix with these columns has trivial nullspace.
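The trivial-nullspace criterion translates directly into a rank check. The following is a minimal sketch, assuming NumPy is available; the helper name `is_independent` and the sample vectors are illustrative, not part of the module.

```python
import numpy as np

def is_independent(vectors):
    """Return True when the given vectors are linearly independent.

    The vectors become the columns of a matrix; they are independent
    exactly when the numerical rank equals the number of columns.
    """
    M = np.column_stack(vectors)
    return bool(np.linalg.matrix_rank(M) == M.shape[1])

e1 = np.array([1.0, 0.0, 0.0])
e2 = np.array([0.0, 1.0, 0.0])

print(is_independent([e1, e2]))           # two distinct axis directions
print(is_independent([e1, e2, e1 + e2]))  # third vector is a combination of the first two
```

The rank here is a floating-point rank, so nearly dependent vectors may be judged either way depending on the tolerance NumPy picks.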

A basis of a subspace $V$ is a set that is both independent and spanning. Every vector in $V$ has a unique representation as a linear combination of basis vectors -- uniqueness is the critical property, and it follows directly from independence. The number of vectors in any basis of $V$ is the dimension $\dim V$, and this number does not depend on which basis you choose (the "invariance of dimension" theorem).

Three near-synonyms that students often conflate:

A spanning set reaches every vector in $V$ but may have redundancy.
An independent set has no redundancy but may not reach everything.
A basis is the Goldilocks point: spans without waste.

A basis can be reached from either direction. Given any finite spanning set of $V$, you can remove redundant vectors to obtain a basis. Given any independent set in a finite-dimensional $V$, you can add vectors to obtain a basis. Bases are maximal independent sets and minimal spanning sets -- two descriptions of the same object.

Distinguish basis from coordinate system. A basis $B = \{b_1, \ldots, b_n\}$ induces a coordinate map $[\cdot]_B : V \to \mathbb{R}^n$ that assigns every vector its unique combination coefficients. Different bases give different coordinates for the same vector, which is why change-of-basis matrices exist. The standard basis $\{e_1, \ldots, e_n\}$ of $\mathbb{R}^n$ is one special choice; any independent spanning set works equally well.

Explicit coordinate formulas. Take $V = \mathbb{R}^n$ and let $B$ also denote the basis matrix with columns $b_1, \ldots, b_n$. Then the coordinates of $v$ satisfy $v = B [v]_B$, so $[v]_B = B^{-1} v$. Changing basis from $B$ to $B'$ uses $[v]_{B'} = (B')^{-1} B\,[v]_B$. When both bases are orthonormal, $B^{-1} = B^T$ and $(B')^{-1} = (B')^T$, so the coordinate formulas collapse to transpose multiplications -- a cheap, numerically stable operation. This is one reason the orthonormal-basis machinery of Cluster 3 is worth the extra construction cost.
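A short numerical sketch of these formulas, assuming NumPy. The matrices `B`, `B_prime`, and `Q` and the vector `v` are made-up illustrative values; `np.linalg.solve` is used in place of an explicit inverse.

```python
import numpy as np

# Columns of B and B_prime are two bases of R^2 (illustrative values).
B = np.array([[1.0, 1.0],
              [0.0, 1.0]])
B_prime = np.array([[2.0, 0.0],
                    [0.0, 1.0]])
v = np.array([3.0, 2.0])

# [v]_B solves B @ coords = v; solve() avoids forming B^{-1} explicitly.
coords_B = np.linalg.solve(B, v)

# Change of basis: [v]_{B'} = (B')^{-1} B [v]_B.
coords_B_prime = np.linalg.solve(B_prime, B @ coords_B)

# Orthonormal case: the inverse is the transpose, so no solve is needed.
theta = np.pi / 6
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
coords_Q = Q.T @ v

print(coords_B, coords_B_prime, coords_Q)
```

In each case, multiplying the basis matrix by the computed coordinates should reproduce `v`, which is a quick way to check the result.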

Why It Matters Here

Basis and dimension control the rest of the module. Coordinates of vectors depend on basis choice; diagonalization depends on finding enough eigenvectors to form a basis; $QR$ and SVD build especially useful bases (orthonormal, and singular-vector aligned); rank is the dimension of the column space; uniqueness of solution sets is a dimension statement.

In engineering terms, basis choice is model representation. A bad representation hides structure (features of different variances mixed together, strong correlations that waste parameters). A good representation makes the structure obvious -- eigenbases align the action of a matrix with the coordinate axes; principal-component bases concentrate variance; Fourier bases diagonalize convolution. Choosing the right basis is frequently a more impactful design decision than choosing the right model.

Dimension is also the cleanest capacity statistic we have. Storage, bandwidth, and computational cost all scale with the dimension of the representation, not with the number of raw features you happened to record. A collection of ten-thousand-dimensional feature vectors whose span is really 20-dimensional can be stored, transmitted, and multiplied roughly 500× more cheaply per vector (20 coordinates instead of 10,000) once you find a basis for that 20-dimensional subspace. Compression schemes such as PCA, JPEG, and low-rank fine-tuning cash in on this fact, exactly or approximately.
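The storage argument can be made concrete on a small scale. The sketch below assumes NumPy; the sizes `d`, `k`, `n`, the random data, and the threshold `1e-10` are all illustrative choices rather than values from the module.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 1000, 5, 200   # ambient dimension, true span dimension, number of vectors

# Build n vectors in R^d that all lie in a hidden k-dimensional subspace.
hidden_basis = rng.standard_normal((d, k))
X = hidden_basis @ rng.standard_normal((k, n))   # shape (d, n), one vector per column

# Recover an orthonormal basis for the span from the SVD.
U, s, _ = np.linalg.svd(X, full_matrices=False)
r = int(np.sum(s > 1e-10 * s[0]))                # numerical dimension of the span
Q = U[:, :r]                                     # d x r, orthonormal columns

coords = Q.T @ X          # r coordinates per vector instead of d
X_rebuilt = Q @ coords    # reconstruction from basis + coordinates

raw_cost = X.size                        # numbers stored without a basis
compressed_cost = Q.size + coords.size   # basis stored once, plus coordinates
print(r, raw_cost, compressed_cost)
```

The basis itself has to be stored once, so the saving per vector approaches the ratio `d / r` only when many vectors share the same subspace.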

Concrete Examples

Example 1 -- dependent set in $\mathbb{R}^3$. Consider

$$v_1 = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}, \quad v_2 = \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}, \quad v_3 = \begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix}.$$

Check: $v_3 = v_1 + v_2$. So $1\cdot v_1 + 1\cdot v_2 + (-1)\cdot v_3 = 0$ with nontrivial coefficients; the set is dependent. These three vectors span only a 2-plane. A basis for that plane is $\{v_1, v_2\}$; dimension is 2.
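The dependency relation in Example 1 can also be recovered numerically. This is a sketch assuming NumPy: the rank gives the dimension of the span, and the right singular vector belonging to the zero singular value gives the coefficients of the relation, up to scale.

```python
import numpy as np

v1 = np.array([1.0, 0.0, 1.0])
v2 = np.array([0.0, 1.0, 1.0])
v3 = np.array([1.0, 1.0, 2.0])
V = np.column_stack([v1, v2, v3])

rank = np.linalg.matrix_rank(V)   # dimension of the span

# Singular values come back in descending order, so the last row of vt
# is the right singular vector for the (numerically) zero singular value.
_, s, vt = np.linalg.svd(V)
c = vt[-1] / vt[-1][0]            # rescale so the first coefficient is 1

print(rank)
print(np.round(c, 6))             # coefficients of the relation c1*v1 + c2*v2 + c3*v3 = 0
print(np.round(V @ c, 6))         # should be the zero vector up to rounding
```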

The lesson is not merely that one vector was redundant. The lesson is that coordinates should be assigned only after redundancy is removed -- otherwise the representation has ambiguity, and downstream algorithms (regression, classification, compression) silently waste parameters or become ill-conditioned.

Example 2 -- basis for a kernel. Let

$$A = \begin{pmatrix} 1 & 1 & 1 & 1 \end{pmatrix}.$$

The nullspace is $\{x \in \mathbb{R}^4 : x_1 + x_2 + x_3 + x_4 = 0\}$, a 3-dimensional hyperplane. Parameterize $x_2, x_3, x_4$ freely and set $x_1 = -(x_2 + x_3 + x_4)$. The basis reads off directly:

$$b_1 = \begin{pmatrix} -1 \\ 1 \\ 0 \\ 0 \end{pmatrix}, \quad b_2 = \begin{pmatrix} -1 \\ 0 \\ 1 \\ 0 \end{pmatrix}, \quad b_3 = \begin{pmatrix} -1 \\ 0 \\ 0 \\ 1 \end{pmatrix}.$$

Verify each satisfies $Ab_i = 0$ and they are independent. $\dim \text{Null}(A) = 3$, consistent with rank-nullity (rank 1, 4 columns, nullity 3).
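Both verification steps for Example 2 fit in a few lines. A minimal sketch assuming NumPy, with the three basis vectors stored as the columns of a matrix called `N` (a name chosen here for illustration):

```python
import numpy as np

A = np.array([[1.0, 1.0, 1.0, 1.0]])

# Columns are b1, b2, b3 from the example.
N = np.array([[-1.0, -1.0, -1.0],
              [ 1.0,  0.0,  0.0],
              [ 0.0,  1.0,  0.0],
              [ 0.0,  0.0,  1.0]])

residual = A @ N                       # each entry is A @ b_i; all should be 0
rank_A = np.linalg.matrix_rank(A)      # rank of A
nullity = np.linalg.matrix_rank(N)     # 3 independent columns means dim Null(A) >= 3

print(residual, rank_A, nullity, rank_A + nullity)
```

Since rank plus nullity must equal the number of columns of $A$, three independent vectors in the nullspace are already a full basis for it.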

Example 3 -- two bases for the same plane. In the plane spanned by $v_1, v_2$ of Example 1, the vector $w = \begin{pmatrix} 3 \\ 2 \\ 5 \end{pmatrix}$ has coordinates $[w]_B = (3, 2)^T$ in the basis $B = \{v_1, v_2\}$ because $w = 3v_1 + 2v_2$. Now switch to the basis $B' = \{v_1 + v_2, v_1 - v_2\}$. Then $w = \tfrac{5}{2}(v_1 + v_2) + \tfrac{1}{2}(v_1 - v_2)$, so $[w]_{B'} = (\tfrac{5}{2}, \tfrac{1}{2})^T$. Same vector, different coordinates, related by an invertible change-of-basis matrix.
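Example 3 can be checked numerically. Because the basis matrices here are $3 \times 2$ rather than square, this sketch (assuming NumPy) uses least squares to find coordinates; the residual is zero because $w$ lies in the plane. The matrix `P` is written out by hand as an illustration of the change-of-basis matrix.

```python
import numpy as np

v1 = np.array([1.0, 0.0, 1.0])
v2 = np.array([0.0, 1.0, 1.0])
w = np.array([3.0, 2.0, 5.0])

B = np.column_stack([v1, v2])
B_prime = np.column_stack([v1 + v2, v1 - v2])

# Coordinates of w in each basis (least squares, exact here since w is in the plane).
coords_B, *_ = np.linalg.lstsq(B, w, rcond=None)
coords_Bp, *_ = np.linalg.lstsq(B_prime, w, rcond=None)

# P expresses the B' vectors in B coordinates: B_prime = B @ P.
# It therefore converts B'-coordinates into B-coordinates.
P = np.array([[1.0,  1.0],
              [1.0, -1.0]])

print(coords_B, coords_Bp, P @ coords_Bp)
```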

Example 4 -- extending an independent set. In $\mathbb{R}^3$, start with the independent set $\{(1, 0, 1)^T, (0, 1, 1)^T\}$. To extend to a basis, pick any vector not in their span: $(1, 0, 0)^T$ works, because $(1, 0, 0) \ne a(1, 0, 1) + b(0, 1, 1)$ for any $a, b$ (the third coordinate would force $a + b = 0$ while the first forces $a = 1$ and the second $b = 0$, contradiction). So $\{(1, 0, 1), (0, 1, 1), (1, 0, 0)\}$ is a basis of $\mathbb{R}^3$; verify by computing the determinant of the matrix with those columns ($= -1 \ne 0$). This is the "extension to a basis" operation that dimension-counting arguments rely on.
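One simple way to automate the extension step is to try the standard basis vectors in order and keep each one that raises the rank. This is a sketch of that greedy strategy, assuming NumPy; the function name `extend_to_basis` is hypothetical, and other candidate vectors would work just as well.

```python
import numpy as np

def extend_to_basis(vectors, n):
    """Greedily extend independent vectors in R^n to a basis of R^n.

    Tries e_1, ..., e_n in order and keeps a candidate whenever
    appending it increases the rank. Returns the basis as matrix columns.
    """
    basis = [np.asarray(v, dtype=float) for v in vectors]
    for i in range(n):
        if len(basis) == n:
            break
        candidate = np.eye(n)[i]
        trial = np.column_stack(basis + [candidate])
        if np.linalg.matrix_rank(trial) > len(basis):
            basis.append(candidate)
    return np.column_stack(basis)

M = extend_to_basis([[1, 0, 1], [0, 1, 1]], n=3)
print(M)
print(np.linalg.det(M))   # nonzero, so the columns form a basis of R^3
```

For the starting set of Example 4, the first candidate $e_1 = (1, 0, 0)^T$ is already accepted, which matches the choice made by hand.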

Common Confusion / Misconceptions

"Three vectors in $\mathbb{R}^3$ always form a basis." False. They must be independent. Three vectors on a plane through the origin are three vectors in $\mathbb{R}^3$ but span only a 2D subspace.

"Any minimal spanning set and any maximal independent set are unrelated notions." In finite-dimensional spaces, they are two equivalent ways of recognizing a basis. Every minimal spanning set is independent; every maximal independent set spans.

"Basis means orthonormal basis." Only in some contexts. An orthonormal basis additionally has $b_i \cdot b_j = \delta_{ij}$. Most bases you meet in raw data are not orthonormal -- Gram-Schmidt or $QR$ is how you upgrade them when orthonormality is needed.

"Dimension is the number of equations written down." No. For a subspace cut out by $k$ independent homogeneous equations in $\mathbb{R}^n$, the dimension is $n - k$. Dimension is the count of free directions after constraints apply, not the count of constraints.

"Two independent vectors automatically span a plane containing a given third vector." Only if the third vector lies in that plane. "Independent" is a statement about the set; "spans $w$" is a separate claim that requires $w$ to sit inside the span. Many students conflate them and end up believing in bases that do not exist.

How To Use It

When given a vector set:

Put the vectors into a matrix as columns.
Row-reduce.
Pivot columns (in the original matrix) identify an independent subset.
Count pivots to get dimension of the span.
If you need a basis, keep the pivot columns; drop the rest.
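The vector-set procedure above can be sketched with exact arithmetic, so pivots are never blurred by rounding. The code below uses only the standard library; the function name `pivot_columns` is hypothetical, and the input matrix reuses the three vectors of Example 1 as columns.

```python
from fractions import Fraction

def pivot_columns(rows):
    """Row-reduce a matrix (given as a list of rows) exactly.

    Returns the indices of the pivot columns; those columns of the
    ORIGINAL matrix form a basis for its column space.
    """
    M = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(M), len(M[0])
    pivots = []
    r = 0  # next row that still needs a pivot
    for c in range(n_cols):
        # Look for a nonzero entry in column c at or below row r.
        pivot_row = next((i for i in range(r, n_rows) if M[i][c] != 0), None)
        if pivot_row is None:
            continue  # no pivot here: this column depends on earlier ones
        M[r], M[pivot_row] = M[pivot_row], M[r]
        pv = M[r][c]
        M[r] = [x / pv for x in M[r]]  # scale the pivot to 1
        for i in range(n_rows):
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
        if r == n_rows:
            break
    return pivots

# Columns are v1, v2, v3 from Example 1.
A = [[1, 0, 1],
     [0, 1, 1],
     [1, 1, 2]]
pivots = pivot_columns(A)
basis = [[row[c] for row in A] for c in pivots]  # pivot columns of the original matrix
print(pivots, len(pivots), basis)
```

The number of pivots is the dimension of the span, and the basis is read from the original columns, not from the reduced matrix.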

When given a subspace description:

Identify the defining equations or spanning vectors.
Use constraints to solve for the degrees of freedom.
Express general vectors in parametric form.
Read off a basis from the free parameters.

Use basis language to replace vague statements like "there are a few directions here" with exact dimensional claims. When a tool returns "the input is low-rank," it is telling you "a small basis suffices." When a dataset has high effective dimension, it is telling you "no small basis will do; expect expensive models."

For numerical work, prefer np.linalg.matrix_rank(A) and scipy.linalg.null_space(A) over hand row reduction -- both use a thresholded SVD internally, which gives a well-defined "numerical dimension" even when floating-point rounding makes the exact rank ambiguous. When building a basis by hand, always sanity-check it: for a nullspace basis with the vectors as columns of $B$, confirm $A B = 0$; for a column-space basis, take $b$ from the purported basis and confirm that $A x = b$ is solvable.
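To show what a thresholded SVD does, here is a NumPy-only sketch of a nullspace routine. It is an illustration of the idea, not the SciPy implementation: the function name `null_space_svd` and the tolerance `rtol=1e-10` are assumptions made for this example, and the test matrix is made up.

```python
import numpy as np

def null_space_svd(A, rtol=1e-10):
    """Orthonormal basis for Null(A), returned as matrix columns.

    Singular values below rtol * (largest singular value) are treated
    as zero. rtol is an illustrative choice, not a library default.
    """
    A = np.atleast_2d(np.asarray(A, dtype=float))
    _, s, vt = np.linalg.svd(A)            # vt is n x n (full_matrices=True)
    tol = rtol * (s[0] if s.size else 0.0)
    rank = int(np.sum(s > tol))
    return vt[rank:].T                     # remaining right singular vectors

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])            # second row is twice the first

N = null_space_svd(A)
rank = np.linalg.matrix_rank(A)

print(rank, N.shape)
print(np.round(A @ N, 10))                 # sanity check: A @ N should be zero
```

The final product is the same $AB = 0$ sanity check recommended for hand-built bases, and the column count of `N` plus the rank should equal the number of columns of `A`.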

Transfer / Where This Shows Up Later

S2 (algorithms): The cycle space and cut space of a graph each have a natural basis; spanning-tree algorithms exploit this.
S4 (computer organization): SIMD register width determines the natural "hardware basis" for vectorized loops; loop-unroll factors are chosen to align with lane counts.
S5 (databases & queueing): Principal components of a correlation matrix provide a basis aligned with variance directions -- the underpinning of many analytics dashboards.
S7 (architecture): A "reference architecture" plays the role of a basis: a small independent set from which team-specific architectures are constructed by combination.
S8 (ranking): Word embeddings are coordinates in a learned basis for semantic directions (gender, tense, plurality). The basis is not given; it is learned.
Phase 7 (ML): PCA selects an orthonormal basis aligned with variance. Random projections (Johnson-Lindenstrauss) use random directions that nearly preserve distances. Transformer attention can be read as a learnable change-of-basis over token embeddings. Sparse-coding dictionaries are overcomplete "almost-bases" that relax independence to allow redundant but expressive representations.
Across phases: Any argument that begins "express $v$ in terms of ..." is implicitly fixing a basis. Making that basis explicit -- and justifying its fitness for the task -- is one of the cheapest rigor improvements in any mathematical writeup.

Check Yourself

Why does dependence imply representational waste?
Can a set be independent without spanning the whole ambient space? Give an example.
How do pivots reveal a basis candidate?
Why is the dimension of a subspace independent of the basis you choose?
If $\dim V = n$ and $\{v_1, \ldots, v_n\} \subset V$, why does spanning imply independence (and vice versa) in this case?
What is the difference between "dimension" and "number of generators I wrote down"?
If a set of four vectors in $\mathbb{R}^3$ is given, why must it be dependent? What is the largest independent subset possible, and how do you find it?
In what sense are "basis" and "coordinate system" two words for the same idea?

Mini Drill or Application

Determine whether $\{(1,1,0), (1,0,1), (0,1,1)\}$ is independent. If it is, state what subspace it spans and why.
Find a basis and dimension for the subspace of $\mathbb{R}^4$ defined by $x_1 + x_2 + x_3 + x_4 = 0$ and $x_1 - x_2 = 0$.
In NumPy: A = np.array([[1,0,1,2],[0,1,1,1],[1,1,2,3]]); rank = np.linalg.matrix_rank(A); ns = scipy.linalg.null_space(A); print(rank, ns.shape). Confirm rank + nullity equals 4.
Extend the independent set $\{(1, 2, 0), (0, 1, 3)\}$ to a basis of $\mathbb{R}^3$; verify your choice by computing a determinant.
Explain in one sentence why "dimension" is not the same as "number of equations written down," and give a counterexample where the equation count overstates the constraint.
Given three feature vectors in $\mathbb{R}^{1000}$, each a linear combination of the same two base features, describe exactly which basis your downstream model should be regressing against and why using the raw features induces rank deficiency.
