Linear Transformations
- The Central Question: What Does a Function That Preserves Structure Look Like?
- What Is a Linear Transformation?
- Every Linear Transformation Is a Matrix
- The Kernel (Nullspace): What Gets Destroyed
- The Image (Range): What Gets Produced
- The Rank-Nullity Theorem: The Conservation Law
- Injectivity, Surjectivity, and Invertibility
- Standard Transformations Gallery
- Composition: Transformations Multiply
- Summary
- Applications in Data Science and Machine Learning
- Guided Problems
- References
The Central Question: What Does a Function That Preserves Structure Look Like?
We want to understand functions that act on vectors. In machine learning, data flows through layers of operations: a neural network layer takes an input vector $x$ and produces an output $T(x)$. The simplest such functions, the building blocks of everything from image classifiers to language models, are linear transformations.
Consider these scenarios:
- Rotating an image: Each pixel coordinate maps to a new location
- Scaling features: A preprocessing step multiplies each feature by a different constant
- Neural network layer: An input vector $x$ (a flattened image) transforms to $Wx$ (a hidden layer)
What do these operations have in common? They all preserve the fundamental structure of linear combinations. If you know what happens to basic building blocks, you know what happens to everything built from them. Understanding this structure unlocks powerful tools: matrix representation, composition via multiplication, and the Rank-Nullity Theorem that tells us exactly what information a transformation preserves and what it destroys.
What Is a Linear Transformation?
Consider the function $T: \mathbb{R}^2 \to \mathbb{R}^2$ that rotates every vector by 90° counterclockwise: $T(x, y) = (-y, x)$.

Take the vector $v = (2, 1)$:

$$T(2, 1) = (-1, 2)$$

The point $(2, 1)$ rotates to $(-1, 2)$.

Now let's check a crucial property. Take two vectors $u = (1, 0)$ and $v = (0, 1)$:

$$T(u + v) = T(1, 1) = (-1, 1), \qquad T(u) + T(v) = (0, 1) + (-1, 0) = (-1, 1)$$

They're equal: $T(u + v) = T(u) + T(v)$. This is no coincidence. It's the defining property of linearity.
Geometrically, a linear transformation:
- Preserves the origin: The zero vector always maps to zero: $T(0) = 0$
- Preserves lines through the origin: A line through the origin maps to another line (or point) through the origin
- Preserves parallelism: Parallel lines remain parallel after transformation
- Preserves ratios on lines: If $m$ is the midpoint of the segment from $p$ to $q$, then $T(m)$ is the midpoint of the segment from $T(p)$ to $T(q)$
Think of it as "stretching, rotating, reflecting, or projecting" the entire space while keeping the origin fixed.
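These properties are easy to verify numerically. A minimal sketch in NumPy, using the 90° rotation as the running example (the specific test vectors are arbitrary choices):

```python
import numpy as np

# 90-degree counterclockwise rotation: (x, y) -> (-y, x)
A = np.array([[0.0, -1.0],
              [1.0,  0.0]])

def T(v):
    return A @ v

u = np.array([1.0, 2.0])
v = np.array([3.0, -1.0])
c = 2.5

# Additivity: T(u + v) == T(u) + T(v)
assert np.allclose(T(u + v), T(u) + T(v))
# Homogeneity: T(c v) == c T(v)
assert np.allclose(T(c * v), c * T(v))
# Origin fixed: T(0) == 0
assert np.allclose(T(np.zeros(2)), np.zeros(2))
```

Any candidate function that fails one of these assertions is not linear.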

Figure: A linear transformation maps the standard grid to a parallelogram grid. Grid lines remain straight and parallel. The origin stays fixed.
A function $T: V \to W$ between vector spaces is a linear transformation (or linear map) if it satisfies two properties for all vectors $u, v \in V$ and all scalars $c$:
- Additivity: $T(u + v) = T(u) + T(v)$
- Homogeneity: $T(cv) = c\,T(v)$

Equivalently (combining both): $T(au + bv) = a\,T(u) + b\,T(v)$ for all scalars $a, b$.

Every linear transformation maps the zero vector to the zero vector: $T(0) = 0$.

Proof: $T(0) = T(0 \cdot v) = 0 \cdot T(v) = 0$ for any vector $v$.
Key Properties:
| Property | Description | Example |
|---|---|---|
| Additivity | Rotating $u + v$ = rotating $u$ + rotating $v$ | $T(u + v) = T(u) + T(v)$ |
| Homogeneity | Rotating $2v$ = twice rotating $v$ | $T(2v) = 2\,T(v)$ |
| Origin fixed | Rotation keeps the origin in place | $T(0) = 0$ |
Every Linear Transformation Is a Matrix
Consider the 90° rotation from before. Let's find what it does to the standard basis vectors:

$$T(e_1) = T(1, 0) = (0, 1), \qquad T(e_2) = T(0, 1) = (-1, 0)$$

Now, any vector $v = (x, y)$ can be written as $v = x\,e_1 + y\,e_2$.

By linearity:

$$T(v) = x\,T(e_1) + y\,T(e_2) = x(0, 1) + y(-1, 0) = (-y, x)$$

This is exactly matrix multiplication:

$$T(v) = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} -y \\ x \end{pmatrix}$$

The matrix columns are the images of the basis vectors.
The geometric insight: The matrix of a linear transformation tells you where the "coordinate axes" go:
- Column 1: Where $e_1$ lands
- Column 2: Where $e_2$ lands
- And so on...
Once you know where the basis vectors go, linearity determines everything else.
Let $T: \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation. Then there exists a unique $m \times n$ matrix $A$ such that:

$$T(x) = Ax \quad \text{for all } x \in \mathbb{R}^n$$

The matrix is constructed by placing the images of the standard basis vectors as columns:

$$A = \begin{pmatrix} | & & | \\ T(e_1) & \cdots & T(e_n) \\ | & & | \end{pmatrix}$$
📌 Proof
Let $e_1, \dots, e_n$ be the standard basis for $\mathbb{R}^n$. Any vector $x = (x_1, \dots, x_n)$ can be written as:

$$x = x_1 e_1 + x_2 e_2 + \cdots + x_n e_n$$

By linearity of $T$:

$$T(x) = x_1 T(e_1) + x_2 T(e_2) + \cdots + x_n T(e_n)$$

Define the matrix $A$ with columns $T(e_1), \dots, T(e_n)$. Then:

$$Ax = x_1 T(e_1) + \cdots + x_n T(e_n) = T(x)$$

For uniqueness: if $Ax = Bx$ for all $x$, then $Ae_i = Be_i$ for each $i$; but $Ae_i$ is the $i$-th column of $A$ and $Be_i$ is the $i$-th column of $B$, so $A = B$.
Key Properties:
| Property | Matrix Form | Example |
|---|---|---|
| Column $i$ of $A$ | $Ae_i$ = image of $i$-th basis vector | For rotation: $T(e_1) = (0, 1)$, $T(e_2) = (-1, 0)$ |
| Applying $T$ | $T(x) = Ax$ | For rotation: $T(2, 3) = (-3, 2)$ |
| Size of $A$ | $m \times n$ for $T: \mathbb{R}^n \to \mathbb{R}^m$ | Rotation $\mathbb{R}^2 \to \mathbb{R}^2$: $2 \times 2$ matrix |
To find the matrix of a linear transformation $T: \mathbb{R}^n \to \mathbb{R}^m$:
- Compute $T(e_1), T(e_2), \dots, T(e_n)$
- Place these as columns: $A = [\,T(e_1) \mid T(e_2) \mid \cdots \mid T(e_n)\,]$
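This recipe translates directly to code. A sketch using a hypothetical linear map `T` (the particular formula is invented for illustration):

```python
import numpy as np

def T(v):
    # A hypothetical linear map R^3 -> R^2, used only to demonstrate the recipe
    x, y, z = v
    return np.array([x + 2 * y, y - z])

n = 3  # domain dimension
# Step 1: compute T(e_1), ..., T(e_n); rows of the identity are the basis vectors
columns = [T(e) for e in np.eye(n)]
# Step 2: place the images as columns
A = np.column_stack(columns)

# A now reproduces T via matrix multiplication
v = np.array([1.0, 2.0, 3.0])
assert np.allclose(A @ v, T(v))
```

The same two steps work for any linear `T`, in any dimension.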
The Kernel (Nullspace): What Gets Destroyed
Consider the projection $T: \mathbb{R}^3 \to \mathbb{R}^3$ that projects vectors onto the $xy$-plane:

$$T(x, y, z) = (x, y, 0)$$

The matrix is $A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}$.

Which vectors get mapped to zero? We need $(x, y, 0) = (0, 0, 0)$, i.e. $x = 0$ and $y = 0$, with $z$ free.

The kernel is the entire $z$-axis:

$$\ker(T) = \{(0, 0, z) : z \in \mathbb{R}\}$$
Geometrically, the kernel consists of all vectors that get "crushed" to zero by the transformation. For our projection:
- The $z$-axis gets flattened onto the origin
- All information about "height" is lost
- Once you project, you can't recover the original $z$-coordinate

Figure: The kernel of projection onto the $xy$-plane is the $z$-axis. Every point on the $z$-axis maps to the origin.
The kernel (or nullspace) of a linear transformation $T: V \to W$ is the set of all vectors that map to zero:

$$\ker(T) = \{v \in V : T(v) = 0\}$$

For a matrix $A$, this is the nullspace $N(A) = \{x : Ax = 0\}$.
The kernel of any linear transformation is a subspace of the domain.
Proof sketch: If $T(u) = 0$ and $T(v) = 0$, then $T(u + v) = T(u) + T(v) = 0 + 0 = 0$. Similarly for scalar multiplication: $T(cu) = c\,T(u) = 0$.
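For a concrete matrix, a kernel basis can be computed numerically from the SVD: the right singular vectors with (near-)zero singular values span the nullspace. A sketch using the projection matrix from the example:

```python
import numpy as np

# Projection onto the xy-plane
A = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0]])

# Kernel via SVD: the last n - rank right singular vectors span N(A)
U, s, Vt = np.linalg.svd(A)
rank = int(np.sum(s > 1e-10))
kernel_basis = Vt[rank:]          # shape (n - rank, n)

assert kernel_basis.shape == (1, 3)        # the kernel is one-dimensional
assert np.allclose(A @ kernel_basis[0], 0) # every basis vector maps to zero
```

Here the single basis vector is (up to sign) $(0, 0, 1)$: the $z$-axis, as expected.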
Key Properties:
| Property | Description | Our Example |
|---|---|---|
| Subspace | $\ker(T)$ is always a subspace of the domain | The $z$-axis is a subspace of $\mathbb{R}^3$ |
| Contains zero | $0 \in \ker(T)$ always | The origin is on the $z$-axis |
| Information loss | Vectors in the kernel lose all their information | Any $(0, 0, z)$ becomes $(0, 0, 0)$ |
The Image (Range): What Gets Produced
Using the same projection $T(x, y, z) = (x, y, 0)$:

What vectors can we reach as outputs? Every output has the form $(x, y, 0)$, and every such vector is an output (it is its own projection).

The image is the entire $xy$-plane:

$$\operatorname{im}(T) = \{(x, y, 0) : x, y \in \mathbb{R}\}$$
Geometrically, the image tells you the "range of possibilities" for the output. For our projection:
- Every point in $\mathbb{R}^3$ lands somewhere on the $xy$-plane
- The $xy$-plane is the "shadow" or "footprint" of the transformation
- Points off the $xy$-plane (like $(0, 0, 1)$) can never be outputs
The image (or range) of a linear transformation $T: V \to W$ is the set of all possible outputs:

$$\operatorname{im}(T) = \{T(v) : v \in V\}$$

For a matrix $A$, this is the column space $C(A)$.
The image of any linear transformation is a subspace of the codomain.
Proof sketch: If $T(u)$ and $T(v)$ are in the image, then $T(u) + T(v) = T(u + v)$ is also in the image; similarly $c\,T(u) = T(cu)$.
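Numerically, the image's dimension is the matrix rank, and reachability of a vector $b$ can be checked by solving $Ax = b$ in the least-squares sense and testing whether the residual vanishes. A sketch with the same projection matrix (the test vectors are arbitrary):

```python
import numpy as np

A = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0]])

# Dimension of the image = rank of the matrix
rank = np.linalg.matrix_rank(A)
assert rank == 2                      # the xy-plane is 2-dimensional

# b is reachable iff Ax = b has an exact solution
b_ok = np.array([3.0, -1.0, 0.0])
x, *_ = np.linalg.lstsq(A, b_ok, rcond=None)
assert np.allclose(A @ x, b_ok)       # (3, -1, 0) is in the image

b_bad = np.array([0.0, 0.0, 1.0])
x, *_ = np.linalg.lstsq(A, b_bad, rcond=None)
assert not np.allclose(A @ x, b_bad)  # (0, 0, 1) is not reachable
```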
Key Properties:
| Property | Description | Our Example |
|---|---|---|
| Subspace | $\operatorname{im}(T)$ is always a subspace of the codomain | The $xy$-plane is a subspace of $\mathbb{R}^3$ |
| Span of columns | $\operatorname{im}(T) = C(A)$, the span of the columns of $A$ | Columns $(1, 0, 0)$ and $(0, 1, 0)$ span the $xy$-plane |
| Reachability | $b \in \operatorname{im}(T) \iff Ax = b$ has a solution | Only vectors with $z = 0$ are reachable |
The Rank-Nullity Theorem: The Conservation Law
Our projection onto the $xy$-plane:
- Domain dimension: $\dim(\mathbb{R}^3) = 3$
- Kernel dimension (nullity): $1$ (the $z$-axis is a line)
- Image dimension (rank): $2$ (the $xy$-plane)

Notice: $3 = 1 + 2$. The input dimension equals nullity plus rank.
The key insight: The Rank-Nullity Theorem says dimension is conserved.
Think of the input space as having a certain "budget" of dimensions. A linear transformation either:
- Preserves a dimension (it goes into the image), or
- Destroys a dimension (it goes into the kernel)
No dimension can be created or lost. They're redistributed between "preserved" and "destroyed."
Conservation of Information:
| Component | Role | ML Interpretation |
|---|---|---|
| Input dimension | Total information entering | Feature space dimension |
| Rank | Information that survives | Dimensions the model uses |
| Nullity | Information destroyed | Dimensions the model ignores |
See Defining the Four Subspaces for detailed proofs and the complete subspace picture.

Figure: The Rank-Nullity Theorem visualized. The domain decomposes into kernel (destroyed dimensions) and a complement (preserved dimensions). The preserved dimensions map isomorphically onto the image.
Let $T: V \to W$ be a linear transformation where $V$ is finite-dimensional. Then:

$$\dim(\ker T) + \dim(\operatorname{im} T) = \dim(V)$$

Or equivalently:

$$\text{nullity}(T) + \text{rank}(T) = \dim(V)$$

For an $m \times n$ matrix $A$:

$$\dim N(A) + \operatorname{rank}(A) = n$$
📌 Proof Sketch
Let $v_1, \dots, v_k$ be a basis for $\ker(T)$.

Extend this to a basis $v_1, \dots, v_k, w_1, \dots, w_r$ for all of $V$.

Claim: $T(w_1), \dots, T(w_r)$ is a basis for $\operatorname{im}(T)$.

Spans: Any $T(v) = T(a_1 v_1 + \cdots + a_k v_k + b_1 w_1 + \cdots + b_r w_r) = b_1 T(w_1) + \cdots + b_r T(w_r)$ (since $T(v_i) = 0$).

Independent: If $c_1 T(w_1) + \cdots + c_r T(w_r) = 0$, then $T(c_1 w_1 + \cdots + c_r w_r) = 0$, so $c_1 w_1 + \cdots + c_r w_r \in \ker(T)$. Linear independence of the extended basis forces all $c_i = 0$.

Therefore: $\dim(V) = k + r = \text{nullity}(T) + \text{rank}(T)$.
Key Properties:
| Property | Formula | Our Example |
|---|---|---|
| Conservation | $\text{rank} + \text{nullity} = \dim(\text{domain})$ | $2 + 1 = 3$ ✓ |
| Rank from nullity | $\text{rank} = n - \text{nullity}$ | $3 - 1 = 2$ |
| Nullity from rank | $\text{nullity} = n - \text{rank}$ | $3 - 2 = 1$ |
The theorem is powerful for deduction:
- If you know the matrix is $m \times n$ and has rank 2, the nullspace has dimension $n - 2$
- If you know the nullspace is trivial (nullity $= 0$), the rank equals the number of columns
- If rank = number of columns, the transformation is injective (one-to-one)
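The conservation law is easy to confirm numerically; a sketch with a random fat matrix (the shape is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)

# A random "fat" matrix: 3 rows, 5 columns
A = rng.standard_normal((3, 5))
m, n = A.shape

rank = np.linalg.matrix_rank(A)
U, s, Vt = np.linalg.svd(A)
nullity = n - int(np.sum(s > 1e-10))

# Rank-Nullity: rank + nullity equals the number of columns
assert rank + nullity == n
# A random fat matrix has full row rank almost surely, so nullity = n - m
assert rank == 3 and nullity == 2
```

Swapping in any other matrix leaves the first assertion true: dimensions are only redistributed, never created.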
Injectivity, Surjectivity, and Invertibility
Injective (One-to-One)
A transformation is injective if different inputs always produce different outputs: $T(u) = T(v) \implies u = v$.

$T$ is injective if and only if $\ker(T) = \{0\}$.

Why? If $T(u) = T(v)$, then $T(u - v) = 0$, so $u - v \in \ker(T)$. If the kernel is trivial, then $u - v = 0$, so $u = v$.
Surjective (Onto)
A transformation $T: V \to W$ is surjective if every possible output is actually achieved: $\operatorname{im}(T) = W$.

$T: \mathbb{R}^n \to \mathbb{R}^m$ is surjective if and only if $\operatorname{rank}(T) = m$.
Invertibility
A transformation is invertible (bijective) if it's both injective and surjective.
For $T: \mathbb{R}^n \to \mathbb{R}^n$ with matrix $A$:

$$T \text{ is invertible} \iff \ker(T) = \{0\} \iff \operatorname{rank}(A) = n \iff A \text{ is invertible}$$
Summary Table:
| Property | Condition | Matrix Test |
|---|---|---|
| Injective | $\ker(T) = \{0\}$ | Nullspace is trivial; rank = number of columns |
| Surjective | $\operatorname{im}(T) = \mathbb{R}^m$ | Rank = number of rows |
| Invertible | Both | Square matrix with rank $= n$ |
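The matrix tests in the table can be packaged as small helper functions (the function names are illustrative, not from any library):

```python
import numpy as np

def is_injective(A):
    # Trivial nullspace <=> rank equals the number of columns
    return np.linalg.matrix_rank(A) == A.shape[1]

def is_surjective(A):
    # Image is the whole codomain <=> rank equals the number of rows
    return np.linalg.matrix_rank(A) == A.shape[0]

def is_invertible(A):
    return A.shape[0] == A.shape[1] and is_injective(A)

tall = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])  # R^2 -> R^3
assert is_injective(tall) and not is_surjective(tall)

fat = tall.T                                           # R^3 -> R^2
assert is_surjective(fat) and not is_injective(fat)

rot = np.array([[0.0, -1.0], [1.0, 0.0]])              # 90-degree rotation
assert is_invertible(rot)
```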
For the complete list of 14 equivalent conditions (including determinant, eigenvalues, and the four fundamental subspaces), see The Invertible Matrix Theorem.
Standard Transformations Gallery
Rotation (2D)
Rotation by angle $\theta$ counterclockwise:

$$R_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$$

- Kernel: $\{0\}$ (nothing gets crushed)
- Image: All of $\mathbb{R}^2$
- Invertible: Yes, $R_\theta^{-1} = R_{-\theta}$
Projection
Projection onto the $x$-axis in $\mathbb{R}^2$:

$$P = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$$

- Kernel: The $y$-axis (dimension 1)
- Image: The $x$-axis (dimension 1)
- Rank-Nullity: $1 + 1 = 2$ ✓
Reflection
Reflection across the $x$-axis:

$$F = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$$

- Kernel: $\{0\}$
- Image: All of $\mathbb{R}^2$
- Invertible: Yes, $F^{-1} = F$ (self-inverse)
Scaling
Uniform scaling by factor $c$:

$$S = cI = \begin{pmatrix} c & 0 \\ 0 & c \end{pmatrix}$$

- Kernel: $\{0\}$ if $c \neq 0$; all of $\mathbb{R}^2$ if $c = 0$
- Invertible: Yes if $c \neq 0$, with $S^{-1} = \frac{1}{c} I$
Shear
Horizontal shear by factor $k$:

$$H = \begin{pmatrix} 1 & k \\ 0 & 1 \end{pmatrix}$$

- Kernel: $\{0\}$
- Image: All of $\mathbb{R}^2$
- Invertible: Yes, $H^{-1} = \begin{pmatrix} 1 & -k \\ 0 & 1 \end{pmatrix}$
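The gallery's claims can be spot-checked numerically; a sketch with arbitrary parameter values:

```python
import numpy as np

theta, k, c = np.pi / 3, 1.5, 2.0

R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # rotation
P = np.array([[1.0, 0.0], [0.0, 0.0]])            # projection onto x-axis
F = np.array([[1.0, 0.0], [0.0, -1.0]])           # reflection across x-axis
S = c * np.eye(2)                                 # uniform scaling
H = np.array([[1.0, k], [0.0, 1.0]])              # horizontal shear

I = np.eye(2)
assert np.allclose(R @ R.T, I)          # rotation is orthogonal: R^{-1} = R_{-theta}
assert np.linalg.matrix_rank(P) == 1    # rank 1 + nullity 1 = 2
assert np.allclose(F @ F, I)            # reflection is self-inverse
assert np.allclose(np.linalg.inv(S), I / c)
assert np.allclose(np.linalg.inv(H), np.array([[1.0, -k], [0.0, 1.0]]))
```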
Composition: Transformations Multiply
Let $T$ be 90° rotation (matrix $A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$) and $S$ be scaling by 2 (matrix $B = 2I$).

Apply rotation first, then scaling:

$$BA = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 0 & -2 \\ 2 & 0 \end{pmatrix}$$

Apply scaling first, then rotation:

$$AB = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} = \begin{pmatrix} 0 & -2 \\ 2 & 0 \end{pmatrix}$$

In this case they're equal (scaling commutes with everything). But in general, order matters.

If $T$ has matrix $A$ and $S$ has matrix $B$, then the composition $S \circ T$ (apply $T$ first, then $S$) has matrix $BA$.
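A sketch showing both sides numerically: scaling commutes with the rotation, but a shear does not (the shear is an extra example chosen to break commutativity):

```python
import numpy as np

A = np.array([[0.0, -1.0], [1.0, 0.0]])   # 90-degree rotation
B = 2.0 * np.eye(2)                       # scaling by 2 (commutes with everything)
H = np.array([[1.0, 1.0], [0.0, 1.0]])    # horizontal shear

# Scaling commutes with the rotation...
assert np.allclose(B @ A, A @ B)
# ...but the shear does not: order matters in general
assert not np.allclose(H @ A, A @ H)

# Composition convention: (S o T)(v) = S(T(v)) has matrix B A
v = np.array([1.0, 2.0])
assert np.allclose((B @ A) @ v, B @ (A @ v))
```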
Summary
Fundamental definitions:
- A linear transformation preserves addition ($T(u + v) = T(u) + T(v)$) and scalar multiplication ($T(cv) = c\,T(v)$)
- Every linear transformation $T: \mathbb{R}^n \to \mathbb{R}^m$ has a unique matrix $A$ with columns $T(e_1), \dots, T(e_n)$
Kernel and Image:
- Kernel $\ker(T) = \{v : T(v) = 0\}$: vectors destroyed (mapped to zero)
- Image $\operatorname{im}(T) = \{T(v) : v \in V\}$: all possible outputs
The conservation law:
- Rank-Nullity Theorem: $\text{rank}(T) + \text{nullity}(T) = \dim(\text{domain})$
- Input dimensions are either preserved (image) or destroyed (kernel)—no dimensions created or lost
Injectivity and surjectivity:
- Injective (one-to-one) $\iff \ker(T) = \{0\} \iff$ rank = number of columns
- Surjective (onto) $\iff$ rank = number of rows
- Invertible $\iff$ both $\iff$ square matrix with full rank
Composition:
- $S \circ T$ has matrix $BA$ (order reversed: the last transformation applied goes first in the product)
Answering the Central Question: A function that preserves the structure of linear algebra (addition and scalar multiplication) is a linear transformation, and every such function between finite-dimensional spaces is uniquely represented by a matrix. Its kernel and image reveal exactly what information is preserved and what is destroyed, governed by the rank-nullity theorem: $\text{rank} + \text{nullity} = \dim(\text{domain})$.
Applications in Data Science and Machine Learning
Linear transformations appear throughout machine learning, often disguised as matrices or "layers."
Neural Network Layers
Each linear layer in a neural network computes $x \mapsto Wx + b$. Without the bias $b$, it's purely linear: $x \mapsto Wx$. The weight matrix $W \in \mathbb{R}^{m \times n}$ transforms $n$-dimensional inputs to $m$-dimensional outputs.
Layer Types by Shape:
| Layer Type | Shape | Behavior |
|---|---|---|
| Expansion ($m > n$) | Skinny $W$ | Embeds into higher-dim space; cannot reach all of $\mathbb{R}^m$ |
| Compression ($m < n$) | Fat $W$ | Dimensionality reduction; non-trivial nullspace guaranteed |
| Square ($m = n$) | Square $W$ | Can be invertible (if full rank) |
The linearity of $x \mapsto Wx$ has important consequences:
- Response to sums: The network's response to a sum of inputs equals the sum of responses. This property is exploited in understanding gradients and backpropagation
- Rank of $W$: Determines the "effective dimensionality" of the layer's output
- Kernel of $W$: Input directions that produce zero output (potential dead features)
- Bottlenecks: If $\operatorname{rank}(W) < n$, the layer compresses information
When we say a neural network layer is "a weight matrix ," we're using the Matrix Representation Theorem implicitly. Training the network means finding the right columns: where should each input feature direction map?
A deep neural network (without nonlinearities) computes $x \mapsto W_L \cdots W_2 W_1 x$. This is just one big linear transformation. Without nonlinear activation functions, depth adds no expressive power—the product of matrices is still a single matrix.
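The collapse of a deep linear network into a single matrix can be demonstrated directly; the layer shapes below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(1)

# Three linear "layers" with no nonlinearity: R^4 -> R^8 -> R^8 -> R^3
W1 = rng.standard_normal((8, 4))
W2 = rng.standard_normal((8, 8))
W3 = rng.standard_normal((3, 8))

def deep(x):
    return W3 @ (W2 @ (W1 @ x))

# The whole stack is one matrix: W = W3 W2 W1
W = W3 @ W2 @ W1
x = rng.standard_normal(4)
assert np.allclose(deep(x), W @ x)
assert W.shape == (3, 4)   # a single R^4 -> R^3 linear map, no extra expressivity
```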
Principal Component Analysis (PCA)
PCA projects high-dimensional data onto a lower-dimensional subspace via $z = W^\top x$, where the columns of $W \in \mathbb{R}^{d \times k}$ are the top $k$ principal components.
- Rank: $k$ (the number of components kept)
- Kernel: Dimension $d - k$ (the discarded directions)
- Rank-Nullity interpretation: The variance explained by the kept components plus the variance discarded equals total variance
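A sketch of linear PCA via the SVD of a centered data matrix, confirming the rank and nullity claims (synthetic random data; `W` holds the top-$k$ components as columns):

```python
import numpy as np

rng = np.random.default_rng(2)

# 200 samples in R^5, centered
X = rng.standard_normal((200, 5))
X = X - X.mean(axis=0)

k = 2
# Top-k principal components: leading right singular vectors of the data matrix
U, s, Vt = np.linalg.svd(X, full_matrices=False)
W = Vt[:k].T                    # shape (5, 2): columns are components

# Projection onto the k-dimensional principal subspace
P = W @ W.T
assert np.linalg.matrix_rank(P) == k    # rank = number of components kept
assert np.allclose(P @ P, P)            # projections are idempotent

# Nullity of the projection: d - k discarded directions
d = X.shape[1]
assert d - np.linalg.matrix_rank(P) == d - k
```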
Dimensionality Reduction and Bottleneck Layers
Any linear dimensionality reduction from $\mathbb{R}^d$ to $\mathbb{R}^k$ (where $k < d$) is a rank-$k$ transformation (assuming full rank). By Rank-Nullity:
- The kernel has dimension $d - k$ (information destroyed)
- The image has dimension $k$ (information preserved)
Autoencoders: The Bottleneck Architecture
Consider a linear autoencoder with $k < d$: Input ($\mathbb{R}^d$) → Encoder $E$ → Latent ($\mathbb{R}^k$) → Decoder $D$ → Output ($\mathbb{R}^d$).
| Component | Matrix Shape | Type | Role |
|---|---|---|---|
| Encoder $E$ | $k \times d$ | Fat matrix | Compresses high-dim input to low-dim code |
| Decoder $D$ | $d \times k$ | Skinny matrix | Embeds low-dim code back into high-dim space |
| Full system $DE$ | $d \times d$ | Square | Reconstruction (ideally close to identity on data manifold) |
Information flow:
- Encoder (Fat): Massive information destruction—the nullspace has dimension $d - k$. Many inputs map to the same code.
- Decoder (Skinny): Cannot reach all of $\mathbb{R}^d$—the output lives in a $k$-dimensional subspace.
- Bottleneck: The rank of the reconstruction $DE$ is bounded by $k$, regardless of $d$.
The bottleneck forces the network to learn a compressed representation. The nullspace of the encoder contains all variations the network "ignores." For a well-trained autoencoder on natural images, this nullspace should contain noise while the image's essential structure survives.
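The rank bound on a linear autoencoder is immediate to verify with random weight matrices (shapes chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(3)

d, k = 10, 3
E = rng.standard_normal((k, d))   # encoder: fat, R^d -> R^k
D = rng.standard_normal((d, k))   # decoder: skinny, R^k -> R^d

recon = D @ E                     # full system: R^d -> R^d

# The bottleneck caps the rank of the reconstruction at k
assert np.linalg.matrix_rank(recon) <= k
# Rank-Nullity for the encoder: random E has full rank k, so nullity = d - k
assert d - np.linalg.matrix_rank(E) == d - k
```

No choice of trained weights can beat this bound; only widening the latent dimension $k$ can.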
Feature Engineering
Transforming raw features to engineered features often involves linear transformations:
- Standardization: $x \mapsto (x - \mu)/\sigma$ (affine, nearly linear)
- Whitening: $x \mapsto \Sigma^{-1/2}(x - \mu)$ (linear after centering)
- Random projections: $x \mapsto Rx$ for a random matrix $R$
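A sketch of the first two transforms on synthetic data (the mixing matrix is an arbitrary choice); whitening here uses the symmetric inverse square root of the sample covariance:

```python
import numpy as np

rng = np.random.default_rng(4)
# Correlated synthetic data: mix independent features with an invertible matrix
X = rng.standard_normal((500, 3)) @ np.array([[2.0, 0.0, 0.0],
                                              [1.0, 1.0, 0.0],
                                              [0.0, 0.5, 3.0]])

# Standardization: affine map (x - mu) / sigma, applied per feature
mu, sigma = X.mean(axis=0), X.std(axis=0)
Z = (X - mu) / sigma
assert np.allclose(Z.mean(axis=0), 0, atol=1e-10)
assert np.allclose(Z.std(axis=0), 1)

# Whitening: linear after centering; fully decorrelates the features
cov = np.cov(X - mu, rowvar=False)
vals, vecs = np.linalg.eigh(cov)
W = vecs @ np.diag(vals ** -0.5) @ vecs.T      # cov^{-1/2}
Xw = (X - mu) @ W
assert np.allclose(np.cov(Xw, rowvar=False), np.eye(3), atol=1e-8)
```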
Guided Problems
Problem 1: Determining Injectivity from the Matrix
Consider the linear transformation $T: \mathbb{R}^3 \to \mathbb{R}^2$ given by the matrix:

$$A = \begin{pmatrix} 1 & 2 & 1 \\ 0 & 1 & 1 \end{pmatrix}$$
- What is the rank of $T$?
- What is the dimension of $\ker(T)$?
- Is $T$ injective? Is $T$ surjective?
💡 Solution
Hints:
- Row reduce to find the rank
- Use Rank-Nullity: nullity $= 3 -$ rank
- Injective requires $\ker(T) = \{0\}$
Solution:
1. **Rank:** Row reduce $A$: there are two pivots, so rank = 2.
2. **Nullity:** By Rank-Nullity, $\text{rank} + \text{nullity} = 3$, so nullity = 1.
3. **Injectivity/Surjectivity:**
   - $T$ is not injective because $\ker(T) \neq \{0\}$ (nullity = 1 > 0)
   - $T$ is surjective because rank = 2 = $\dim(\mathbb{R}^2)$ (the codomain)
Problem 2: Composition and Rank
Let $T$ be a linear transformation with rank 2, and let $S$ be a linear transformation with rank 3, such that the composition $S \circ T$ is defined.
- What are the possible values for $\operatorname{rank}(S \circ T)$?
- Give an example achieving the maximum rank.
- Give an example achieving the minimum rank.
💡 Solution
Hints:
- $\operatorname{rank}(S \circ T) \le \min(\operatorname{rank} S, \operatorname{rank} T)$
- The composition's image is contained in $S$'s image
- The composition's kernel contains $T$'s kernel
Solution:
1. **Possible values:** $\operatorname{rank}(S \circ T) \le \min(3, 2) = 2$, and $\operatorname{rank}(S \circ T) \ge 0$. So the rank can be 0, 1, or 2.
2. **Maximum rank = 2:** Let $T: \mathbb{R}^3 \to \mathbb{R}^3$, $T(x, y, z) = (x, y, 0)$ (projects onto the first two coordinates; rank 2) and $S: \mathbb{R}^3 \to \mathbb{R}^4$, $S(x, y, z) = (x, y, z, 0)$ (embeds $\mathbb{R}^3$ into $\mathbb{R}^4$; rank 3).
   Then $S(T(x, y, z)) = (x, y, 0, 0)$, which has rank 2.
3. **Minimum rank = 0:** Let $T: \mathbb{R}^5 \to \mathbb{R}^5$, $T(x) = (0, 0, 0, x_1, x_2)$ (rank 2; puts its output in the last two components) and $S: \mathbb{R}^5 \to \mathbb{R}^3$, $S(x) = (x_1, x_2, x_3)$ (rank 3; only uses the first three components).
   Then $S(T(x)) = 0$ for all $x$, giving rank 0.
Problem 3: Kernel and Image of a Projection
Let $T: \mathbb{R}^3 \to \mathbb{R}^3$ be the projection onto the plane $x + y + z = 0$.
- Find a basis for $\ker(T)$.
- Find a basis for $\operatorname{im}(T)$.
- Verify the Rank-Nullity Theorem.
- What is $T \circ T$? What does this tell you about projections?
💡 Solution
Hints:
- The kernel of a projection onto a plane is the line perpendicular to that plane
- The image of a projection onto a plane is the plane itself
- For projections, $T \circ T = T$ (applying twice is the same as once)
Solution:
1. **Kernel:** The kernel consists of vectors perpendicular to the plane $x + y + z = 0$. The normal to this plane is $n = (1, 1, 1)$.
   Basis for $\ker(T)$: $\{(1, 1, 1)\}$
2. **Image:** The image is the plane itself. Two independent vectors in this plane:
   Basis for $\operatorname{im}(T)$: $\{(1, -1, 0), (1, 0, -1)\}$
3. **Rank-Nullity:**
   - $\dim(\mathbb{R}^3) = 3$
   - nullity = 1, rank = 2
   - $1 + 2 = 3$ ✓
4. **$T \circ T = T$:** For any projection, $T(T(v)) = T(v)$. Once you're on the plane, projecting again does nothing. You stay where you are. This property ($T^2 = T$) is called idempotence and characterizes projections.
References
- MIT OpenCourseWare - 18.06SC Linear Algebra - Linear Transformations and Their Matrices
- Stanford EE263 - Introduction to Linear Dynamical Systems - Linear Algebra Review
- Strang, Gilbert - Introduction to Linear Algebra (Chapter 8)
- Mathematics LibreTexts - Kernel and Image of a Linear Transformation