Linear Transformations Are the Real Object and Matrices Are Representations

What This Concept Is

A linear transformation (or linear map) is a function $T : V \to W$ between vector spaces that preserves the two basic vector operations:

$T(u + v) = T(u) + T(v)$
$T(c u) = c\,T(u)$

These two conditions together imply $T(0) = 0$, which is why a map $T(x) = Ax + b$ with $b \ne 0$ is not linear (it is affine). They also imply that $T$ is completely determined by its action on any basis: once you know what $T$ does to basis vectors, the action on every vector follows by linearity.
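
The two axioms can also be probed numerically. The following is a minimal sketch, assuming NumPy is available; `looks_linear` is a hypothetical helper name, and the matrix `A` and offset `b` are illustrative values. Passing the probe is only evidence of linearity, while a single failure is a genuine disproof.

```python
import numpy as np

def looks_linear(f, dim, trials=100, seed=0):
    """Probe additivity and homogeneity on random inputs.

    Returns False as soon as either axiom fails; True means no
    counterexample was found, which is evidence rather than proof.
    """
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        u = rng.normal(size=dim)
        v = rng.normal(size=dim)
        c = rng.normal()
        additive = np.allclose(f(u + v), f(u) + f(v))
        homogeneous = np.allclose(f(c * u), c * f(u))
        if not (additive and homogeneous):
            return False
    return True

# Illustrative values (assumptions, not fixed by the text above).
A = np.array([[2.0, -1.0],
              [1.0, 3.0]])
b = np.array([1.0, 0.0])

linear_map = lambda x: A @ x        # T(x) = Ax
affine_map = lambda x: A @ x + b    # T(x) = Ax + b with b != 0

print(looks_linear(linear_map, dim=2))  # True
print(looks_linear(affine_map, dim=2))  # False
```

The affine map fails additivity because $T(u) + T(v)$ contains the offset $b$ twice while $T(u + v)$ contains it once.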

Matrices are how we represent linear transformations once bases are chosen. Given bases $B$ for $V$ and $C$ for $W$, each linear map $T$ has a unique matrix $[T]_C^B$ whose $j$-th column is the coordinate vector of $T(b_j)$ in basis $C$. Different bases produce different matrices for the same underlying map. This means the matrix is not the deepest object; the transformation is. The matrix is a coordinate encoding of a coordinate-free operator.

Distinguish several related objects:

Linear transformation $T$: the map itself, existing independently of any basis.
Matrix representation $[T]$: a numeric encoding, once bases are fixed.
Change-of-basis matrix $P$: converts coordinates from one basis to another; if the columns of $P$ are the $B'$ vectors written in $B$ coordinates, then $[v]_{B'} = P^{-1}[v]_B$.
Similarity transformation: $[T]_{B'} = P^{-1}[T]_B P$, showing that two matrices are the "same map in different clothes."
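
The coordinate-conversion rule can be sketched in NumPy. This assumes the convention that the columns of $P$ are the new basis vectors written in the old basis; the basis $(1, 1), (1, -1)$ and the vector `v_std` are illustrative choices.

```python
import numpy as np

# Columns of P are the new basis vectors (1, 1) and (1, -1),
# written in the old (standard) basis.
P = np.array([[1.0, 1.0],
              [1.0, -1.0]])

v_std = np.array([2.0, 1.0])        # coordinates of v in the old basis

# [v]_{B'} = P^{-1} [v]_B; solving P x = v avoids forming the inverse.
v_new = np.linalg.solve(P, v_std)

# Rebuilding v from its new coordinates recovers the same vector.
rebuilt = P @ v_new

print(v_new)                        # new coordinates: 1.5 and 0.5
print(np.allclose(rebuilt, v_std))  # True
```

The vector itself never changes; only the list of numbers used to describe it does.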

Three fundamental operations on transformations correspond to three operations on matrices. Composition of $T$ then $S$ gives matrix $[S][T]$ (note order). Addition of $T + S$ gives $[T] + [S]$. Scalar scaling $cT$ gives $c[T]$. The kernel and image of $T$ correspond to the nullspace and column space of $[T]$.
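
A short NumPy sketch of these correspondences, assuming the standard basis throughout; the shear and the quarter-turn rotation are illustrative choices.

```python
import numpy as np

T = np.array([[1.0, 1.0],
              [0.0, 1.0]])    # shear
S = np.array([[0.0, -1.0],
              [1.0, 0.0]])    # rotation by 90 degrees

v = np.array([1.0, 2.0])

# "T then S" applied to v equals the single matrix S @ T applied to v.
step_by_step = S @ (T @ v)
composed = (S @ T) @ v
print(np.allclose(step_by_step, composed))   # True

# Order matters: S @ T and T @ S are different maps here.
print(np.allclose(S @ T, T @ S))             # False

# Addition and scalar multiplication of maps act entrywise on matrices.
print(np.allclose((T + S) @ v, T @ v + S @ v))    # True
print(np.allclose((3.0 * T) @ v, 3.0 * (T @ v)))  # True
```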

Why It Matters Here

This viewpoint stabilizes many later ideas. Change of basis becomes "pick coordinates that make the action obvious." Diagonalization becomes "pick a basis of eigenvectors so the matrix becomes diagonal." SVD becomes "pick input and output bases so that $T$ acts by coordinate-wise scaling." Orthogonal representations become "pick an orthonormal basis so that coordinates preserve lengths."
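
Two of these slogans can be checked on small examples. The sketch below assumes NumPy; the symmetric matrix `A` is an illustrative choice whose eigenvectors happen to be $(1, 1)$ and $(1, -1)$, and `shear` is the matrix $\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$.

```python
import numpy as np

# Diagonalization: in a basis of eigenvectors the matrix becomes diagonal.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
P = np.array([[1.0, 1.0],
              [1.0, -1.0]])          # columns are eigenvectors of A
D = np.linalg.solve(P, A @ P)        # P^{-1} A P
print(np.allclose(D, np.diag([3.0, 1.0])))   # True

# SVD: with orthonormal input basis (columns of V) and output basis
# (columns of U), the map acts by coordinate-wise scaling.
shear = np.array([[1.0, 1.0],
                  [0.0, 1.0]])
U, s, Vt = np.linalg.svd(shear)
scaled = U.T @ shear @ Vt.T
print(np.allclose(scaled, np.diag(s)))       # True
```

The shear has no basis of eigenvectors, so it cannot be diagonalized, yet the SVD still finds a pair of bases in which it is a pure scaling.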

If you think a matrix is the action itself rather than a coordinate encoding of the action, change-of-basis arguments remain mysterious. If you internalize the distinction, then similarity, diagonalization, SVD, and coordinate-free arguments in physics and graphics become natural. The transformation-first viewpoint is also what lets you compare two matrices that look completely different and ask whether they represent the same operator.

Concrete Examples

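Example 1 -- a shear and its standard matrix. Consider $T : \mathbb{R}^2 \to \mathbb{R}^2$ defined by $T(x, y) = (x + y, y)$. Linearity follows by direct computation: each output component is a sum of scalar multiples of $x$ and $y$, with no constant term and no products, so both axioms hold. In the standard basis $\{e_1, e_2\}$, the matrix is built column by column: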

$$T(e_1) = T(1, 0) = (1, 0), \quad T(e_2) = T(0, 1) = (1, 1),$$

$$[T]_{std} = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}.$$

Each column tells you how the corresponding basis vector moves. The unit square with corners $0, e_1, e_1 + e_2, e_2$ becomes the parallelogram with corners $0, T(e_1), T(e_1) + T(e_2), T(e_2)$, that is, $(0, 0), (1, 0), (2, 1), (1, 1)$.

Example 2 -- same map, different basis. Keep the same $T$ but switch to basis $B' = \{(1, 1), (1, -1)\}$. Compute $T(1, 1) = (2, 1)$ and express it in $B'$: $(2, 1) = \tfrac{3}{2}(1, 1) + \tfrac{1}{2}(1, -1)$. Similarly $T(1, -1) = (0, -1) = -\tfrac{1}{2}(1, 1) + \tfrac{1}{2}(1, -1)$. Then

$$[T]_{B'} = \begin{pmatrix} 3/2 & -1/2 \\ 1/2 & 1/2 \end{pmatrix}.$$

This matrix looks nothing like the standard matrix, yet it represents exactly the same transformation. The change-of-basis matrix $P = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$, whose columns are the vectors of $B'$, relates them via $[T]_{B'} = P^{-1}[T]_{std} P$.
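
As a check, the product can be worked out by hand. This is a sketch of the arithmetic, using the standard $2 \times 2$ inverse formula with $\det P = -2$:

$$P^{-1} = \tfrac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \quad [T]_{std}\,P = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 1 & -1 \end{pmatrix},$$

$$P^{-1}[T]_{std}\,P = \tfrac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}\begin{pmatrix} 2 & 0 \\ 1 & -1 \end{pmatrix} = \tfrac{1}{2}\begin{pmatrix} 3 & -1 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} 3/2 & -1/2 \\ 1/2 & 1/2 \end{pmatrix},$$

which agrees with the matrix obtained column by column from $T(1, 1)$ and $T(1, -1)$.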

Example 3 -- a non-linear impostor. Consider $S(x, y) = (2x - y + 1, x + 3y)$. Check: $S(0, 0) = (1, 0) \ne (0, 0)$. This fails the requirement $S(0) = 0$, so $S$ is not linear. It is affine: it can be written as $S(v) = Av + c$ with $A = \begin{pmatrix} 2 & -1 \\ 1 & 3 \end{pmatrix}$ and $c = (1, 0)^T$. The distinction matters because matrix representations, the fundamental subspaces, eigen-analysis, and SVD are all built on linearity.
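
The affine decomposition can be recovered mechanically. This is a minimal NumPy sketch in which the function name `S` mirrors the example and the test vector `v` is arbitrary: the offset is $c = S(0)$, and the columns of $A$ are $S(e_j) - c$.

```python
import numpy as np

def S(v):
    """The affine map S(x, y) = (2x - y + 1, x + 3y) from Example 3."""
    x, y = v
    return np.array([2 * x - y + 1, x + 3 * y], dtype=float)

# The translation part is whatever S does to the origin.
c = S(np.zeros(2))

# Subtracting c leaves a linear map; its columns are the images of e_1, e_2.
A = np.column_stack([S(e) - c for e in np.eye(2)])

v = np.array([3.0, -2.0])              # arbitrary test vector
print(np.allclose(S(v), A @ v + c))    # True
```

Here `c` comes out as $(1, 0)$ and `A` has rows $(2, -1)$ and $(1, 3)$, matching the formula for $S$.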

Common Confusion / Misconceptions

"Test linearity by whether the formula looks simple." Unreliable. A simple-looking formula like $T(x) = x + 1$ fails linearity; a complicated-looking one like the Fourier transform (viewed on a finite-dimensional periodic space) satisfies it exactly. Always test with the two axioms.

"A matrix always represents the same action in any basis." A matrix only has meaning relative to a basis choice. The same numeric matrix applied under two different basis choices represents two different transformations.

"Affine maps are linear." Only when the translation component is zero. Affine maps $T(v) = Av + b$ with $b \ne 0$ are crucial in graphics, neural networks (biases!), and robotics, but they live in a strictly bigger class than linear maps.

"The kernel and image of $T$ depend on the basis choice." They do not. Kernel and image are properties of the map, invariant under basis change. What changes is the coordinate representation of vectors in the kernel and image -- not the kernel and image themselves.

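The invariance of the kernel can be illustrated numerically. The sketch below assumes NumPy and uses an illustrative singular map, the projection onto the $x$-axis, together with the basis $(1, 1), (1, -1)$; the variable names are hypothetical.

```python
import numpy as np

# Projection onto the x-axis in the standard basis; its kernel is span{e_2}.
A_std = np.array([[1.0, 0.0],
                  [0.0, 0.0]])

# Columns of P are the new basis vectors (1, 1) and (1, -1).
P = np.array([[1.0, 1.0],
              [1.0, -1.0]])
A_new = np.linalg.solve(P, A_std @ P)      # P^{-1} A P

k_std = np.array([0.0, 1.0])               # kernel vector, standard coordinates
k_new = np.linalg.solve(P, k_std)          # same vector, new coordinates

# The same geometric vector is killed by both representations.
print(np.allclose(A_std @ k_std, 0.0))     # True
print(np.allclose(A_new @ k_new, 0.0))     # True

# The dimension of the image (the rank) does not change either.
print(np.linalg.matrix_rank(A_std) == np.linalg.matrix_rank(A_new))  # True
```

The coordinate lists `k_std` and `k_new` differ, but `P @ k_new` reproduces `k_std`: they name the same vector.
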
How To Use It

When given a transformation:

Test linearity directly with both axioms.
Apply $T$ to each basis vector of the domain.
Express each output in the codomain basis.
Stack those coordinate vectors as columns; that matrix is $[T]$.
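
The last three steps can be sketched as a small NumPy function. `matrix_of` is a hypothetical helper, not a library routine; it assumes each basis is passed as a matrix whose columns are the basis vectors, and that the codomain basis matrix is square and invertible.

```python
import numpy as np

def matrix_of(T, domain_basis, codomain_basis):
    """Matrix of the linear map T relative to the given bases.

    Each basis is a 2-D array whose columns are the basis vectors,
    written in standard coordinates.
    """
    # Apply T to each domain basis vector (iterate over columns).
    images = np.column_stack([T(b) for b in domain_basis.T])
    # Express each image in the codomain basis: solve C x = T(b_j).
    return np.linalg.solve(codomain_basis, images)

shear = lambda v: np.array([v[0] + v[1], v[1]])

std = np.eye(2)
B_prime = np.array([[1.0, 1.0],
                    [1.0, -1.0]])

M_std = matrix_of(shear, std, std)
M_new = matrix_of(shear, B_prime, B_prime)
print(M_std)
print(M_new)

# A map from R^3 to R^2 gives a matrix with 2 rows and 3 columns.
drop = lambda v: np.array([v[0] + v[1], v[2]])
print(matrix_of(drop, np.eye(3), np.eye(2)).shape)   # (2, 3)
```

Solving a linear system for each image is what "express each output in the codomain basis" means in coordinates; with the standard basis the solve is a no-op.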

When given a matrix:

Ask which vector space and basis it is written in. If unstated, assume standard basis.
Interpret its columns as images of basis vectors.
Use composition via multiplication to combine transformations.
If you want a cleaner representation, pick a basis adapted to the map (eigenvectors, singular vectors, invariant subspaces).

When given two square matrices $M$ and $M'$ that should represent the same map on the same space, check similarity: is $M' = P^{-1}MP$ for some invertible $P$?
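
A quick screening test compares quantities that similar matrices must share. The sketch below assumes NumPy; `shares_similarity_invariants` is a hypothetical helper, and equal trace, determinant, and rank are necessary conditions only, not a proof of similarity.

```python
import numpy as np

def shares_similarity_invariants(M1, M2):
    """Necessary (not sufficient) conditions for M1 and M2 to be similar."""
    same_trace = np.isclose(np.trace(M1), np.trace(M2))
    same_det = np.isclose(np.linalg.det(M1), np.linalg.det(M2))
    same_rank = np.linalg.matrix_rank(M1) == np.linalg.matrix_rank(M2)
    return bool(same_trace and same_det and same_rank)

M_std = np.array([[1.0, 1.0],
                  [0.0, 1.0]])      # shear, standard basis
M_new = np.array([[1.5, -0.5],
                  [0.5, 0.5]])      # shear, basis (1, 1), (1, -1)
identity = np.eye(2)

print(shares_similarity_invariants(M_std, M_new))     # True
print(shares_similarity_invariants(M_std, identity))  # True, yet not similar

# With an explicit P the relation can be confirmed directly.
P = np.array([[1.0, 1.0],
              [1.0, -1.0]])
print(np.allclose(np.linalg.solve(P, M_std @ P), M_new))   # True
```

The identity passes the screen but is not similar to the shear, since $P^{-1} I P = I$ for every invertible $P$. Matching invariants can rule similarity out, never in.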

Transfer / Where This Shows Up Later

S2 (algorithms): Polynomial evaluation and multiplication are linear in their coefficients; FFT is a change of basis (monomial to Fourier) that makes convolution diagonal.
S4 (computer organization): Graphics pipelines apply chains of linear (rotation, scaling) and affine (translation) transformations; the 4×4 homogeneous matrix trick converts affine maps into linear ones so they compose uniformly.
S5 (databases & queueing): State updates in discrete-time linear systems (including Markov chains) are linear transformations; change of basis reveals modes (eigenvectors) that govern long-run behavior.
S7 (architecture): An adapter or bridge component between two subsystems is literally a linear (or more general) map between their state spaces; the architecture review asks whether that map is well-posed.
S8 (ranking): Learned embedding functions for entities are (near-)linear maps from sparse feature spaces to dense vector spaces; similarity in embedding space is the pullback of the inner product.
Phase 7 (ML): A typical dense layer of a neural network applies a linear map, adds a bias (which makes the layer affine), and then applies a nonlinearity. Weight matrices are matrix representations of those linear maps. Initialization schemes (Xavier, He) are chosen so that signal magnitude is roughly preserved as it passes through the layers.
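
The homogeneous-coordinate trick mentioned for graphics pipelines can be sketched in two dimensions, where the matrix is $3 \times 3$ rather than $4 \times 4$. `homogeneous` is a hypothetical helper and the numeric values are illustrative; NumPy is assumed.

```python
import numpy as np

def homogeneous(A, b):
    """Embed the affine map v -> A v + b as one (n+1) x (n+1) matrix."""
    n = A.shape[0]
    H = np.eye(n + 1)
    H[:n, :n] = A      # linear part in the top-left block
    H[:n, n] = b       # translation in the last column
    return H

A1 = np.array([[2.0, -1.0],
               [1.0, 3.0]])
b1 = np.array([1.0, 0.0])
A2 = np.array([[0.0, -1.0],
               [1.0, 0.0]])         # rotation by 90 degrees
b2 = np.array([0.0, 5.0])

H1 = homogeneous(A1, b1)
H2 = homogeneous(A2, b2)

v = np.array([3.0, -2.0])
lifted = np.append(v, 1.0)          # (x, y) -> (x, y, 1)

# One matrix-vector product reproduces the affine map.
print(np.allclose((H1 @ lifted)[:2], A1 @ v + b1))                     # True

# Affine maps now compose by plain matrix multiplication.
print(np.allclose((H2 @ H1 @ lifted)[:2], A2 @ (A1 @ v + b1) + b2))    # True
```

The lifted map is linear on the larger space, which is why translations and rotations can share one multiplication chain.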

Check Yourself

Why do the columns of a matrix encode the images of basis vectors?
What fails if a function contains a constant offset, such as $T(x) = Ax + b$ with $b \ne 0$?
Why can two different matrices represent the same transformation?
If $T : \mathbb{R}^3 \to \mathbb{R}^2$, what is the shape of its matrix?
How does a change-of-basis matrix relate the coordinates of the same vector in two bases?
Why are kernel and image invariant under basis change, even though their coordinate representations change?

Mini Drill or Application

Decide whether each map is linear:
- $T_1(x, y) = (2x - y, x + 3y)$
- $T_2(x, y) = (2x - y + 1, x + 3y)$
- $T_3(x, y) = (x^2, y)$
For the linear one, find the standard matrix.
For the rotation map $R_\theta$ on $\mathbb{R}^2$, derive the standard matrix by applying it to $e_1$ and $e_2$. Verify $R_\theta R_\phi = R_{\theta+\phi}$.
In NumPy: let T = lambda v: np.array([2*v[0] - v[1], v[0] + 3*v[1]]). Compute np.column_stack([T(np.array([1,0])), T(np.array([0,1]))]) and confirm it equals the standard matrix you derived.
Change basis: for the shear matrix $\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$, compute its representation in the basis $B' = \{(1, 1), (1, -1)\}$ by forming $P^{-1}[T]_{std} P$ with $P = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$.
Explain how you would compute the matrix of the projection onto the plane $x + y + z = 0$ in the standard basis, starting from its action on an orthonormal basis of that plane.
