
Matrix Inverse and Invertibility


The Central Question: Can We Undo a Transformation?

Consider a linear transformation that maps vectors from Rn\mathbb{R}^n to Rm\mathbb{R}^m. We encode this transformation as a matrix AA, and applying it to a vector xx gives us Ax=bAx = b. The fundamental question of invertibility is this: given the output bb, can we uniquely recover the input xx?

If the answer is "yes" for every possible bb, then the transformation preserves all information and can be undone. The matrix that undoes this transformation is called the inverse, denoted A1A^{-1}.


Definition and Intuition

Definition: Invertible Matrix

A square matrix AA (of size n×nn \times n) is invertible (also called non-singular or non-degenerate) if there exists a matrix A1A^{-1} such that:

AA1=A1A=IAA^{-1} = A^{-1}A = I

where II is the n×nn \times n identity matrix.

The inverse matrix A1A^{-1}, when it exists, is unique. Here's why: suppose both BB and CC are inverses of AA. Then:

B=BI=B(AC)=(BA)C=IC=CB = BI = B(AC) = (BA)C = IC = C

Geometric Intuition: Think of AA as a machine that transforms vectors. The inverse A1A^{-1} is the machine that runs the transformation backwards. If AA stretches space by a factor of 2, then A1A^{-1} compresses it by a factor of 2. If AA rotates space by 30°, then A1A^{-1} rotates it by −30°.

Pseudo-inverse

Only square matrices can be invertible in the classical sense. A tall matrix (m>nm > n) squeezes Rn\mathbb{R}^n into a subspace of Rm\mathbb{R}^m—you can't get back to the full Rn\mathbb{R}^n. A wide matrix (m<nm < n) collapses some directions—multiple inputs map to the same output, so you can't uniquely recover the input. For non-square matrices, we use the pseudo-inverse (Moore-Penrose inverse), which gives the "best" answer in a least-squares sense.
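
As a concrete illustration, here is a minimal sketch with NumPy (assuming `numpy` is available; the matrix and target vector match the tall-matrix example below). `np.linalg.pinv` computes the Moore-Penrose pseudo-inverse, which returns the least-squares best input rather than an exact preimage.

```python
import numpy as np

# Tall 3x2 matrix: embeds R^2 into the xy-plane of R^3 (not onto)
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])

A_pinv = np.linalg.pinv(A)       # Moore-Penrose pseudo-inverse, shape (2, 3)

b = np.array([1.0, 2.0, 5.0])    # not reachable: no x satisfies Ax = b exactly
x_hat = A_pinv @ b               # least-squares answer: [1, 2]

print(x_hat)                     # [1. 2.]
print(A @ x_hat)                 # [1. 2. 0.] -- the closest reachable point to b
```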

Example: Tall Matrix (3×23 \times 2) — Not onto, can't reach all outputs

A=[100100]A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}

This matrix takes 2D vectors and embeds them into 3D: [xy][xy0]\begin{bmatrix} x \\ y \end{bmatrix} \mapsto \begin{bmatrix} x \\ y \\ 0 \end{bmatrix}

The output is always in the xyxy-plane of R3\mathbb{R}^3. If someone gives you b=[125]\mathbf{b} = \begin{bmatrix} 1 \\ 2 \\ 5 \end{bmatrix}, there's no xR2\mathbf{x} \in \mathbb{R}^2 such that Ax=bA\mathbf{x} = \mathbf{b}—you simply can't reach vectors with non-zero zz-component. No inverse can exist because the transformation isn't onto.

Example: Wide Matrix (2×32 \times 3) — Not one-to-one, multiple inputs give same output

B=[101011]B = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix}

This matrix projects 3D vectors down to 2D. Consider these two different inputs:

B[110]=[11],B[001]=[11]B\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad B\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}

Both inputs map to [11]\begin{bmatrix} 1 \\ 1 \end{bmatrix}! In fact, [110]\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} and [001]\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} differ by [111]\begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}, which is in the nullspace. Given an output, you can't uniquely recover the input—the transformation isn't one-to-one. No inverse can exist.

Example: Fitting a Line to Noisy Data

Suppose we have 4 data points and want to fit a line y=mx+by = mx + b. This gives us 4 equations in 2 unknowns—an overdetermined system that generally has no exact solution. The pseudo-inverse finds the line that minimizes the sum of squared errors (residuals).

Pseudo-inverse least squares example

The left panel shows the data points (blue), the best-fit line (red), and the residuals (green dashed). The right panel shows the geometric interpretation: the target vector b\mathbf{b} lies outside the column space of AA, so we project onto the closest point b^=Ax^\hat{\mathbf{b}} = A\hat{\mathbf{x}}. The pseudo-inverse formula x^=(ATA)1ATb\hat{\mathbf{x}} = (A^TA)^{-1}A^T\mathbf{b} computes this projection directly.
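
A minimal sketch of this fit with NumPy (the four data points are made up for illustration; `np.linalg.lstsq` applies the pseudo-inverse solution internally):

```python
import numpy as np

# Hypothetical data points (x_i, y_i)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 1.9, 3.2, 3.8])

# Design matrix for y = m*x + b: a column of x values and a column of ones
A = np.column_stack([x, np.ones_like(x)])

# Overdetermined system (4 equations, 2 unknowns): solve in the least-squares sense
(m, b), residuals, rank, sing_vals = np.linalg.lstsq(A, y, rcond=None)

print(m, b)          # slope and intercept of the best-fit line
print(residuals)     # sum of squared residuals ||A @ [m, b] - y||^2
```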


Conditions for Invertibility

For a square n×nn \times n matrix AA, the following conditions are all equivalent. If any one is true, they are all true. If any one is false, they are all false.

Theorem: The Invertible Matrix Theorem

For an n×nn \times n matrix AA, the following statements are equivalent:

  1. AA is invertible
  2. AA has full rank: rank(A)=n\text{rank}(A) = n
  3. The columns of AA are linearly independent
  4. The rows of AA are linearly independent
  5. det(A)0\det(A) \neq 0
  6. The nullspace contains only the zero vector: N(A)={0}N(A) = \{\mathbf{0}\}
  7. The left nullspace contains only the zero vector: N(AT)={0}N(A^T) = \{\mathbf{0}\}
  8. The column space is all of Rn\mathbb{R}^n: C(A)=RnC(A) = \mathbb{R}^n
  9. The row space is all of Rn\mathbb{R}^n: C(AT)=RnC(A^T) = \mathbb{R}^n
  10. The equation Ax=bAx = b has a unique solution for every bRnb \in \mathbb{R}^n
  11. The equation Ax=0Ax = 0 has only the trivial solution x=0x = 0
  12. AA can be row-reduced to the identity matrix II
  13. AA is a product of elementary matrices
  14. All eigenvalues of AA are non-zero

Why?

  • Rank = n means all columns are independent (no redundancy), so they span all of Rn\mathbb{R}^n
  • If columns span Rn\mathbb{R}^n, every bb is reachable, so Ax=bAx = b always has a solution
  • If columns are independent, Ax=0Ax = 0 has only x=0x = 0, so N(A)={0}N(A) = \{0\}, meaning solutions are unique
  • If we can reach every bb with a unique xx, the transformation is one-to-one and onto—it's invertible
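
The equivalences in the theorem can be spot-checked numerically. A small sketch with NumPy (the 2×2 matrix is arbitrary):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [5.0, 3.0]])
n = A.shape[0]

print(np.linalg.matrix_rank(A) == n)                   # full rank
print(not np.isclose(np.linalg.det(A), 0.0))           # non-zero determinant
print(np.all(~np.isclose(np.linalg.eigvals(A), 0.0)))  # no zero eigenvalues

# Unique solution of Ax = b for an arbitrary right-hand side
b = np.array([1.0, 2.0])
x = np.linalg.solve(A, b)
print(np.allclose(A @ x, b))                           # True
```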

Invertibility and the Four Fundamental Subspaces

The four fundamental subspaces tell the complete story of when and why a matrix is invertible.

When AA is Invertible

For an invertible n×nn \times n matrix AA:

| Subspace | Dimension | Description |
|---|---|---|
| Column Space $C(A)$ | $n$ | $= \mathbb{R}^n$ (reaches everywhere) |
| Row Space $C(A^T)$ | $n$ | $= \mathbb{R}^n$ (uses all input directions) |
| Nullspace $N(A)$ | $0$ | $= \{\mathbf{0}\}$ (no information lost) |
| Left Nullspace $N(A^T)$ | $0$ | $= \{\mathbf{0}\}$ (no constraints on solvability) |

An invertible matrix "preserves information" completely. No dimensions are collapsed (nullspace is trivial), and every output is reachable (column space is full). The transformation is a perfect bijection between Rn\mathbb{R}^n and Rn\mathbb{R}^n.

When AA is Singular (Not Invertible)

When a square matrix is singular (not invertible), something has gone wrong:

| Subspace | What Happens | Consequence |
|---|---|---|
| Nullspace $N(A)$ | Contains non-zero vectors | Multiple inputs give the same output |
| Left Nullspace $N(A^T)$ | Contains non-zero vectors | Some $b$ have no solution |
| Column Space $C(A)$ | Smaller than $\mathbb{R}^n$ | Can't reach all outputs |
| Row Space $C(A^T)$ | Smaller than $\mathbb{R}^n$ | Some input directions are ignored |

Example: Consider the matrix A=[1224]A = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}

  • Column 2 = 2 × Column 1, so the columns are dependent
  • rank(A)=1<2\text{rank}(A) = 1 < 2
  • Nullspace: N(A)=span{[21]}N(A) = \text{span}\left\{\begin{bmatrix} -2 \\ 1 \end{bmatrix}\right\} (a line)
  • Column space: C(A)=span{[12]}C(A) = \text{span}\left\{\begin{bmatrix} 1 \\ 2 \end{bmatrix}\right\} (a line, not all of R2\mathbb{R}^2)

This matrix collapses the line [21]t\begin{bmatrix} -2 \\ 1 \end{bmatrix}t to zero. It's not one-to-one (multiple inputs give the same output), and it's not onto (can't reach most of R2\mathbb{R}^2).
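
A quick numerical confirmation of this example (NumPy sketch):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])

print(np.linalg.matrix_rank(A))           # 1: the columns are dependent
print(np.linalg.det(A))                   # determinant is zero

print(A @ np.array([-2.0, 1.0]))          # [0. 0.]: a non-zero vector in the nullspace

try:
    np.linalg.inv(A)
except np.linalg.LinAlgError as err:
    print("inversion failed:", err)       # Singular matrix
```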

Non-Square Matrices: Structural Constraints

For an n×mn \times m matrix AA (maps RmRn\mathbb{R}^m \to \mathbb{R}^n), the shape alone determines certain limitations:

| Shape | Rank Bound | Column Space | Nullspace |
|---|---|---|---|
| Tall ($n > m$) | $\text{rank}(A) \leq m < n$ | $C(A) \subsetneq \mathbb{R}^n$ (never full) | Can be $\{\mathbf{0}\}$ if full column rank |
| Wide ($n < m$) | $\text{rank}(A) \leq n < m$ | Can equal $\mathbb{R}^n$ if full row rank | $\dim N(A) \geq m - n > 0$ (always non-trivial) |

Tall matrices have more equations than unknowns. The column space lives in $\mathbb{R}^n$ but has dimension at most $m < n$, so it cannot fill $\mathbb{R}^n$. Most target vectors $\mathbf{b}$ have no solution—the system is overdetermined.

Wide matrices have more unknowns than equations. The nullspace has dimension at least $m - n > 0$, so some non-zero vectors are always mapped to zero. Multiple inputs produce the same output—the system is underdetermined.

Only square matrices can potentially be both one-to-one (trivial nullspace) and onto (full column space) simultaneously.


Computing the Inverse

There are several methods to compute the inverse of a matrix. Each has its place depending on the context.

Method 1: Gauss-Jordan Elimination

The most general method works for any invertible matrix. The idea: apply to the identity matrix $I$ the same row operations that reduce $A$ to $I$; the result is $A^{-1}$.

Procedure: Form the augmented matrix [AI][A \mid I] and row-reduce to [IA1][I \mid A^{-1}].

Example: Find the inverse of A=[2153]A = \begin{bmatrix} 2 & 1 \\ 5 & 3 \end{bmatrix}

[21105301]\left[\begin{array}{cc|cc} 2 & 1 & 1 & 0 \\ 5 & 3 & 0 & 1 \end{array}\right]

Step 1: R112R1R_1 \leftarrow \frac{1}{2}R_1: [1121205301]\left[\begin{array}{cc|cc} 1 & \frac{1}{2} & \frac{1}{2} & 0 \\ 5 & 3 & 0 & 1 \end{array}\right]

Step 2: R2R25R1R_2 \leftarrow R_2 - 5R_1: [112120012521]\left[\begin{array}{cc|cc} 1 & \frac{1}{2} & \frac{1}{2} & 0 \\ 0 & \frac{1}{2} & -\frac{5}{2} & 1 \end{array}\right]

Step 3: R22R2R_2 \leftarrow 2R_2: [1121200152]\left[\begin{array}{cc|cc} 1 & \frac{1}{2} & \frac{1}{2} & 0 \\ 0 & 1 & -5 & 2 \end{array}\right]

Step 4: R1R112R2R_1 \leftarrow R_1 - \frac{1}{2}R_2: [10310152]\left[\begin{array}{cc|cc} 1 & 0 & 3 & -1 \\ 0 & 1 & -5 & 2 \end{array}\right]

Therefore: A1=[3152]A^{-1} = \begin{bmatrix} 3 & -1 \\ -5 & 2 \end{bmatrix}

Verification: AA1=[2153][3152]=[1001]AA^{-1} = \begin{bmatrix} 2 & 1 \\ 5 & 3 \end{bmatrix}\begin{bmatrix} 3 & -1 \\ -5 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
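
The same procedure can be written as a short routine. A minimal Gauss-Jordan sketch in NumPy with partial pivoting (illustrative, not production code); applied to the matrix above it reproduces the inverse found by hand:

```python
import numpy as np

def gauss_jordan_inverse(A):
    """Row-reduce the augmented matrix [A | I] to [I | A^{-1}]."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    aug = np.hstack([A, np.eye(n)])               # form [A | I]

    for col in range(n):
        # Partial pivoting: move the largest entry in this column up to the pivot row
        pivot = col + np.argmax(np.abs(aug[col:, col]))
        if np.isclose(aug[pivot, col], 0.0):
            raise ValueError("matrix is singular")
        aug[[col, pivot]] = aug[[pivot, col]]

        aug[col] /= aug[col, col]                 # scale the pivot row so the pivot is 1
        for row in range(n):                      # clear the rest of the column
            if row != col:
                aug[row] -= aug[row, col] * aug[col]

    return aug[:, n:]                             # the right half is now A^{-1}

A = np.array([[2.0, 1.0],
              [5.0, 3.0]])
print(gauss_jordan_inverse(A))                    # [[ 3. -1.] [-5.  2.]], matching the hand computation
```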

Method 2: The 2×2 Formula

For 2×22 \times 2 matrices, there's a closed-form formula:

Formula: 2×2 Inverse

If A=[abcd]A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} and det(A)=adbc0\det(A) = ad - bc \neq 0, then:

A1=1adbc[dbca]A^{-1} = \frac{1}{ad - bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}

Method 3: The Adjugate Formula (General Case)

For any n×nn \times n matrix:

A1=1det(A)adj(A)A^{-1} = \frac{1}{\det(A)} \text{adj}(A)

where adj(A)\text{adj}(A) is the adjugate matrix (transpose of the matrix of cofactors). This formula is theoretically elegant but computationally expensive for large matrices.

Method 4: LU Decomposition

For numerical computation, factoring A=LUA = LU (lower × upper triangular) allows efficient solving of Ax=bAx = b without explicitly computing A1A^{-1}. This is the standard approach in numerical libraries.
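
A sketch of that workflow using SciPy's LU routines (assuming `scipy` is installed); the factorization is computed once and reused, and the inverse is never formed explicitly:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[2.0, 1.0],
              [5.0, 3.0]])
b = np.array([3.0, 8.0])

lu, piv = lu_factor(A)          # one-time factorization: PA = LU
x = lu_solve((lu, piv), b)      # cheap solve, reusable for many right-hand sides

print(x)                        # [1. 1.]
print(np.allclose(A @ x, b))    # True
```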


Properties of the Inverse

Theorem: Properties of Matrix Inverse

If AA and BB are invertible n×nn \times n matrices, then:

  1. Inverse of inverse: (A1)1=A(A^{-1})^{-1} = A
  2. Inverse of product: (AB)1=B1A1(AB)^{-1} = B^{-1}A^{-1} (note the reversed order!)
  3. Inverse of transpose: (AT)1=(A1)T(A^T)^{-1} = (A^{-1})^T
  4. Inverse of scalar multiple: (cA)1=1cA1(cA)^{-1} = \frac{1}{c}A^{-1} for c0c \neq 0
  5. Determinant of inverse: det(A1)=1det(A)\det(A^{-1}) = \frac{1}{\det(A)}

Why does the product reverse? Think of putting on socks and shoes: if AA = "put on socks" and BB = "put on shoes", then (AB)1(AB)^{-1} must first undo BB (take off shoes), then undo AA (take off socks). Hence (AB)1=B1A1(AB)^{-1} = B^{-1}A^{-1}.
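
These identities are easy to sanity-check numerically (a quick NumPy sketch with two arbitrary invertible matrices):

```python
import numpy as np

A = np.array([[2.0, 1.0], [5.0, 3.0]])
B = np.array([[1.0, 2.0], [3.0, 7.0]])
inv, det = np.linalg.inv, np.linalg.det

print(np.allclose(inv(A @ B), inv(B) @ inv(A)))   # product rule, reversed order
print(np.allclose(inv(A.T), inv(A).T))            # transpose rule
print(np.isclose(det(inv(A)), 1.0 / det(A)))      # determinant rule
```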


The Determinant Test

The determinant provides a single-number test for invertibility:

The Determinant Rule

A square matrix AA is invertible if and only if det(A)0\det(A) \neq 0.

The determinant measures the signed volume scaling factor of the transformation. If det(A)=0\det(A) = 0, the transformation collapses some dimension—the volume becomes zero. A collapsed dimension can't be recovered, so the matrix isn't invertible.

Determinant as volume scaling

The unit square (left) transforms to a parallelogram with area =det(A)= |\det(A)|. An invertible matrix (middle) scales area but keeps it non-zero. A singular matrix (right) collapses the square to a line—area becomes zero, and the original shape cannot be recovered.

Examples:

  • det[2153]=65=10\det\begin{bmatrix} 2 & 1 \\ 5 & 3 \end{bmatrix} = 6 - 5 = 1 \neq 0 → Invertible
  • det[1224]=44=0\det\begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix} = 4 - 4 = 0 → Singular

Applications in Data Science and Machine Learning

1. Linear Regression (Ordinary Least Squares)

The normal equations for OLS are:

(XTX)β=XTy(X^T X)\beta = X^T y

where XX is the m×nm \times n design matrix (mm samples, nn features), yRmy \in \mathbb{R}^m is the target vector, and βRn\beta \in \mathbb{R}^n are the coefficients we want to find.

The Gram Matrix G=XTXG = X^T X must be invertible to get a unique solution:

β=(XTX)1XTy\beta = (X^T X)^{-1} X^T y

When is XTXX^TX invertible?

  • The columns of XX must be linearly independent
  • There must be no perfect multicollinearity (no feature is a linear combination of others)
  • You need at least as many samples as features (mnm \geq n)

What goes wrong when XTXX^TX is singular?

When XTXX^TX is singular, N(XTX)=N(X){0}N(X^TX) = N(X) \neq \{\mathbf{0}\}, meaning some non-zero vector v\mathbf{v} satisfies Xv=0X\mathbf{v} = \mathbf{0}.

Why N(XTX)=N(X)N(X^TX) = N(X)?
  • N(X)N(XTX)N(X) \subseteq N(X^TX): If Xv=0X\mathbf{v} = \mathbf{0}, then XTXv=XT0=0X^TX\mathbf{v} = X^T\mathbf{0} = \mathbf{0}.
  • N(XTX)N(X)N(X^TX) \subseteq N(X): If XTXv=0X^TX\mathbf{v} = \mathbf{0}, multiply both sides by vT\mathbf{v}^T: vTXTXv=Xv2=0\mathbf{v}^TX^TX\mathbf{v} = \|X\mathbf{v}\|^2 = 0. A vector with zero norm must be 0\mathbf{0}, so Xv=0X\mathbf{v} = \mathbf{0}.
  1. Non-unique coefficients: If β\beta^* is a solution, then β+tv\beta^* + t\mathbf{v} is also a solution for any scalar tt, because X(β+tv)=Xβ+tXv=XβX(\beta^* + t\mathbf{v}) = X\beta^* + tX\mathbf{v} = X\beta^*. The solution space is an affine subspace, not a single point.

  2. Interpretability breaks down: Suppose features x2=2x1x_2 = 2x_1 (perfect multicollinearity). Then β1=5,β2=0\beta_1 = 5, \beta_2 = 0 and β1=1,β2=2\beta_1 = 1, \beta_2 = 2 produce identical predictions. You cannot attribute effect to individual features.

  3. Numerical instability: Even if XTXX^TX is technically invertible but nearly singular (high condition number), floating-point errors get amplified. A condition number of 10810^8 with double precision (16\sim 16 digits) leaves only 8\sim 8 reliable digits in β\beta.
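
The sketch below (NumPy, with made-up data) illustrates points 1 and 2 above: with independent features the normal equations give a unique solution, while an exactly duplicated feature makes the Gram matrix singular and the coefficients non-unique.

```python
import numpy as np

rng = np.random.default_rng(0)

# Well-behaved case: an intercept column plus one informative feature
X = np.column_stack([np.ones(50), rng.normal(size=50)])
y = 2.0 + 3.0 * X[:, 1] + 0.1 * rng.normal(size=50)

beta = np.linalg.solve(X.T @ X, X.T @ y)     # normal equations
print(beta)                                   # close to [2, 3]

# Perfect multicollinearity: a third feature that is exactly 2x the second
X_bad = np.column_stack([X, 2.0 * X[:, 1]])
G = X_bad.T @ X_bad
print(np.linalg.matrix_rank(G))               # 2 < 3: the Gram matrix is singular

# Two different coefficient vectors that produce identical predictions
beta_a = np.array([beta[0], beta[1], 0.0])
beta_b = np.array([beta[0], beta[1] - 2.0, 1.0])
print(np.allclose(X_bad @ beta_a, X_bad @ beta_b))   # True: weights are not identifiable
```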

2. Ridge Regression (Regularization)

When XTXX^TX is singular or nearly singular, we add a regularization term:

(XTX+λI)β=XTy(X^T X + \lambda I)\beta = X^T y

Why does this help?

Adding λI\lambda I eliminates the nullspace. Suppose vN(XTX+λI)\mathbf{v} \in N(X^TX + \lambda I), meaning (XTX+λI)v=0(X^TX + \lambda I)\mathbf{v} = \mathbf{0}. Then:

XTXv=λvX^TX\mathbf{v} = -\lambda\mathbf{v}

Taking the inner product with v\mathbf{v}:

$$\mathbf{v}^T X^T X \mathbf{v} = -\lambda\,\mathbf{v}^T\mathbf{v} \quad\Longrightarrow\quad \|X\mathbf{v}\|^2 = -\lambda\,\|\mathbf{v}\|^2$$

The left side is 0\geq 0. The right side is 0\leq 0 when λ>0\lambda > 0 (and strictly negative if v0\mathbf{v} \neq \mathbf{0}). The only solution is v=0\mathbf{v} = \mathbf{0}. Therefore N(XTX+λI)={0}N(X^TX + \lambda I) = \{\mathbf{0}\}, so the matrix is invertible.

Eigenvalue perspective

The "flat direction" v\mathbf{v} is an eigenvector of XTXX^TX with a small eigenvalue λi0\lambda_i \approx 0. Eigenvalues measure curvature: XTXv=λivX^TX\mathbf{v} = \lambda_i\mathbf{v} means moving along v\mathbf{v} changes the loss by only λiv2\lambda_i\|\mathbf{v}\|^2—nearly zero if λi0\lambda_i \approx 0.

If XTXX^TX has eigenvalues λ1,λ2,,λn\lambda_1, \lambda_2, \ldots, \lambda_n, then XTX+λIX^TX + \lambda I has eigenvalues λ1+λ,λ2+λ,,λn+λ\lambda_1 + \lambda, \lambda_2 + \lambda, \ldots, \lambda_n + \lambda. The flat directions (small λi\lambda_i) become steep (now λi+λ\lambda_i + \lambda). If some λi=0\lambda_i = 0 (singular), adding λ>0\lambda > 0 shifts all eigenvalues positive, making the matrix invertible.

Trade-off: Regularization introduces bias (the solution is no longer exactly optimal for the training data) but reduces variance (the solution is more stable—small changes in data don't cause large swings in β\beta).

When XTXX^TX is nearly singular, there exists a direction v\mathbf{v} where XTXv0X^TX\mathbf{v} \approx \mathbf{0}. The solution can "slide" along v\mathbf{v} with almost no change in the loss Xβy2\|X\beta - y\|^2. Small noise in yy can push β\beta far along this flat direction.

Adding λI\lambda I penalizes movement in all directions equally. Now sliding along v\mathbf{v} costs λv2\lambda\|\mathbf{v}\|^2, anchoring the solution near the origin. The flatter the original direction, the more regularization helps.
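
Continuing the multicollinear example from the least-squares section, a short sketch (NumPy, made-up data, arbitrary choice of λ) shows that adding λI restores invertibility while the fit stays sensible:

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
X = np.column_stack([np.ones(50), x1, 2.0 * x1])   # third feature duplicates the second
y = 2.0 + 3.0 * x1 + 0.1 * rng.normal(size=50)

G = X.T @ X
print(np.linalg.matrix_rank(G))                    # 2: X^T X alone is singular

lam = 0.5                                          # arbitrary regularization strength
beta_ridge = np.linalg.solve(G + lam * np.eye(3), X.T @ y)

print(beta_ridge)                                  # unique, finite coefficients
print(np.linalg.norm(X @ beta_ridge - y))          # residual stays small despite the redundancy
```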

3. Solving Linear Systems

Whenever we need to solve Ax=bAx = b:

  • Invertible AA: Unique solution x=A1bx = A^{-1}b
  • Singular AA: Either no solution or infinitely many solutions

In practice, we rarely compute A1A^{-1} explicitly. Instead, we use matrix factorizations (LU, QR, Cholesky) that solve Ax=bAx = b more efficiently and stably.


Near-Singularity and Condition Number

In numerical computing, a matrix can be "technically invertible" but still cause problems. This happens when the matrix is ill-conditioned—close to being singular.

Definition: Condition Number

The condition number of a matrix AA is:

κ(A)=AA1\kappa(A) = \|A\| \cdot \|A^{-1}\|

or equivalently (for the 2-norm):

κ(A)=σmaxσmin\kappa(A) = \frac{\sigma_{\max}}{\sigma_{\min}}

where σmax\sigma_{\max} and σmin\sigma_{\min} are the largest and smallest singular values of AA.

Interpretation:

  • κ(A)1\kappa(A) \approx 1: Well-conditioned; small errors in bb cause small errors in xx
  • κ(A)1\kappa(A) \gg 1: Ill-conditioned; small errors in bb can cause large errors in xx
  • κ(A)=\kappa(A) = \infty: Singular matrix

Example: Calculating κ\kappa

For A=[4000.01]A = \begin{bmatrix} 4 & 0 \\ 0 & 0.01 \end{bmatrix} (diagonal matrix):

  • Singular values = absolute values of diagonal entries: σmax=4\sigma_{\max} = 4, σmin=0.01\sigma_{\min} = 0.01
  • κ(A)=40.01=400\kappa(A) = \frac{4}{0.01} = 400

For A=[1111.001]A = \begin{bmatrix} 1 & 1 \\ 1 & 1.001 \end{bmatrix} (nearly singular):

  • Compute $A^TA = \begin{bmatrix} 2 & 2.001 \\ 2.001 & 2.002001 \end{bmatrix}$
  • Eigenvalues of $A^TA$: $\lambda_1 \approx 4.002$, $\lambda_2 \approx 2.5 \times 10^{-7}$ (their product equals $\det(A)^2 = 10^{-6}$)
  • Singular values: $\sigma_{\max} = \sqrt{\lambda_1} \approx 2.0005$, $\sigma_{\min} = \sqrt{\lambda_2} \approx 0.0005$
  • $\kappa(A) = \frac{\sigma_{\max}}{\sigma_{\min}} \approx 4000$

The tiny $\sigma_{\min}$ signals a nearly flat direction: small changes in $\mathbf{b}$ along this direction get amplified by about $1/\sigma_{\min} \approx 2000$.

Rule of thumb: If $\kappa(A) = 10^k$ and you're working with $d$ digits of precision, expect only about $d - k$ reliable digits in the computed solution of $Ax = b$.

Practical advice:

  • Check the condition number before inverting matrices
  • Prefer factorization methods (LU, QR) over explicit inversion
  • Use regularization for ill-conditioned problems
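
A small sketch of these ideas with NumPy, using the nearly singular matrix from the example above (the perturbation size is arbitrary):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.001]])
b = np.array([2.0, 2.001])

print(np.linalg.cond(A))                 # about 4e3: ill-conditioned

x = np.linalg.solve(A, b)                # exact solution is [1, 1]

b_noisy = b + np.array([0.0, 1e-4])      # a tiny change in b ...
x_noisy = np.linalg.solve(A, b_noisy)

print(x)                                 # [1. 1.]
print(x_noisy)                           # ... moves the solution to about [0.9, 1.1]
```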

Guided Problems

Problem 1: When Does Adding a Row Make a Matrix Invertible?

Consider the 2×32 \times 3 matrix:

A=[123456]A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}

  1. Is AA invertible? Explain why or why not using the shape of the matrix.

  2. Suppose we add a third row to create a 3×33 \times 3 matrix BB. Can you choose a third row such that BB is invertible? If yes, give an example. If no, explain why.

  3. Now consider removing a column from AA to get a 2×22 \times 2 matrix CC. Which column(s) can you remove to make CC invertible?

💡 Solution

Part 1: AA is not invertible because it's not square. A 2×32 \times 3 matrix maps R3R2\mathbb{R}^3 \to \mathbb{R}^2. It has a non-trivial nullspace (dimension 32=1\geq 3 - 2 = 1), so multiple inputs map to the same output. No inverse can exist.

Part 2: First, check that rows 1 and 2 are linearly independent: Row 2 is not a scalar multiple of Row 1 (the ratios $4/1$, $5/2$, $6/3$ are not all equal), so the two rows are independent.

For BB to be invertible, the third row must not be in the span of the first two rows.

Example: r3=[001]\mathbf{r}_3 = \begin{bmatrix} 0 & 0 & 1 \end{bmatrix} works.

B=[123456001]B = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 0 & 0 & 1 \end{bmatrix}

Check: det(B)=1(5160)2(4160)+3(4050)=58+0=30\det(B) = 1(5 \cdot 1 - 6 \cdot 0) - 2(4 \cdot 1 - 6 \cdot 0) + 3(4 \cdot 0 - 5 \cdot 0) = 5 - 8 + 0 = -3 \neq 0. ✓

Part 3: Removing column 3 gives C=[1245]C = \begin{bmatrix} 1 & 2 \\ 4 & 5 \end{bmatrix}, with det(C)=58=30\det(C) = 5 - 8 = -3 \neq 0. Invertible. ✓

Removing column 2 gives C=[1346]C = \begin{bmatrix} 1 & 3 \\ 4 & 6 \end{bmatrix}, with det(C)=612=60\det(C) = 6 - 12 = -6 \neq 0. Invertible. ✓

Removing column 1 gives C=[2356]C = \begin{bmatrix} 2 & 3 \\ 5 & 6 \end{bmatrix}, with det(C)=1215=30\det(C) = 12 - 15 = -3 \neq 0. Invertible. ✓

All three choices work because no two columns of AA are parallel.


Problem 2: Symmetric Matrix and Diagonal Perturbation

Let SS be a 4×44 \times 4 symmetric matrix defined as follows:

S=[1100110000200003]S = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{bmatrix}
  1. Find the rank of SS and a basis for its Nullspace, N(S)N(S). Is SS invertible?

  2. Let λ\lambda be a positive scalar (e.g., λ=0.5\lambda = 0.5). We construct a new matrix PP by adding λ\lambda to the diagonal of SS:

    P=S+λIP = S + \lambda I

    Find the rank of this new matrix PP. Is PP invertible?

💡 Solution

Part 1: Analyzing Matrix SS

To find the rank and nullspace, we perform Gaussian elimination to reach Row Echelon Form.

$$\begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{bmatrix} \xrightarrow{R_2 - R_1} \begin{bmatrix} \mathbf{1} & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & \mathbf{2} & 0 \\ 0 & 0 & 0 & \mathbf{3} \end{bmatrix}$$

Rearranging the zero row to the bottom:

U=[1100002000030000]U = \begin{bmatrix} \mathbf{1} & 1 & 0 & 0 \\ 0 & 0 & \mathbf{2} & 0 \\ 0 & 0 & 0 & \mathbf{3} \\ 0 & 0 & 0 & 0 \end{bmatrix}
  • Rank: There are 3 pivots (in columns 1, 3, and 4). Therefore, rank(S)=3\text{rank}(S) = 3.
  • Invertibility: Since the rank (3) is less than the dimension (4), the matrix is singular (not invertible).
  • Nullspace Basis: There is one free variable corresponding to column 2 (x2x_2). The equation from row 1 is: x1+x2=0    x1=x2x_1 + x_2 = 0 \implies x_1 = -x_2. The other rows imply x3=0x_3 = 0 and x4=0x_4 = 0. Setting free variable x2=1x_2 = 1: Bnull={[1100]}\mathcal{B}_{null} = \left\{ \begin{bmatrix} -1 \\ 1 \\ 0 \\ 0 \end{bmatrix} \right\}

Part 2: Analyzing the Perturbed Matrix P=S+λIP = S + \lambda I

We add λ\lambda (where λ>0\lambda > 0) to the diagonal elements of SS.

P=[1+λ10011+λ00002+λ00003+λ]P = \begin{bmatrix} 1+\lambda & 1 & 0 & 0 \\ 1 & 1+\lambda & 0 & 0 \\ 0 & 0 & 2+\lambda & 0 \\ 0 & 0 & 0 & 3+\lambda \end{bmatrix}

To check the rank/invertibility, we can check the determinant. Since the matrix is block diagonal, the determinant is the product of the determinants of the blocks.

Block 1 (Top-Left 2×22\times2):

$$\det\begin{bmatrix} 1+\lambda & 1 \\ 1 & 1+\lambda \end{bmatrix} = (1+\lambda)(1+\lambda) - (1)(1) = (1 + 2\lambda + \lambda^2) - 1 = 2\lambda + \lambda^2$$

Block 2 (Bottom-Right Diagonal): The determinants are simply the diagonal entries: (2+λ)(2+\lambda) and (3+λ)(3+\lambda).

Total Determinant:

det(P)=(2λ+λ2)(2+λ)(3+λ)\det(P) = (2\lambda + \lambda^2) \cdot (2+\lambda) \cdot (3+\lambda)

Conclusion: Since λ\lambda is positive (λ>0\lambda > 0), every term in that product is positive. Therefore, det(P)0\det(P) \neq 0.

  • Invertibility: The matrix PP is invertible.
  • Rank: Since it is invertible, it must have full rank. rank(P)=4\text{rank}(P) = 4.

Connection Summary: Even though the original matrix SS lost information (rank 3, non-invertible), adding a small scalar matrix λI\lambda I restored the rank to 4, making the system solvable for a unique solution. This is exactly what Ridge Regression does!
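
A quick numerical check of both parts (NumPy sketch, using λ = 0.5 as in the problem statement):

```python
import numpy as np

S = np.array([[1.0, 1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 2.0, 0.0],
              [0.0, 0.0, 0.0, 3.0]])

print(np.linalg.matrix_rank(S))               # 3: singular
print(S @ np.array([-1.0, 1.0, 0.0, 0.0]))    # [0. 0. 0. 0.]: the nullspace basis vector

lam = 0.5
P = S + lam * np.eye(4)
print(np.linalg.matrix_rank(P))               # 4: full rank, invertible
print(np.linalg.det(P))                       # (2*0.5 + 0.5**2) * 2.5 * 3.5 = 10.9375
```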


Problem 3: Linear Dependence and Gram Matrix

Let AA be a 4×34 \times 3 matrix defined as:

A=[123134145156]A = \begin{bmatrix} 1 & 2 & 3 \\ 1 & 3 & 4 \\ 1 & 4 & 5 \\ 1 & 5 & 6 \end{bmatrix}
  1. Determine if the columns of AA are linearly independent. If they are dependent, find a non-zero vector x\mathbf{x} such that Ax=0A\mathbf{x} = \mathbf{0}.

  2. Calculate the symmetric matrix G=ATAG = A^T A.

  3. Without performing full Gaussian elimination on GG, determine the rank of GG and whether GG is invertible. Explain your reasoning based on the result from Part 1.

💡 Solution

Part 1: Linear Independence and Nullspace

We examine the columns of AA:

c3=c1+c2c_3 = c_1 + c_2

Since c3\mathbf{c}_3 can be written as a linear combination of c1\mathbf{c}_1 and c2\mathbf{c}_2, the columns are linearly dependent.

To find the vector x\mathbf{x} (which is in the Nullspace N(A)N(A)), we rewrite the dependency equation:

1c1+1c21c3=01\cdot\mathbf{c}_1 + 1\cdot\mathbf{c}_2 - 1\cdot\mathbf{c}_3 = \mathbf{0}

This gives us the coefficients for our vector x\mathbf{x}:

x=[111]\mathbf{x} = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}

Part 2: Calculating G=ATAG = A^T A

$$G = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 3 & 4 & 5 \\ 3 & 4 & 5 & 6 \end{bmatrix} \begin{bmatrix} 1 & 2 & 3 \\ 1 & 3 & 4 \\ 1 & 4 & 5 \\ 1 & 5 & 6 \end{bmatrix} = \begin{bmatrix} 4 & 14 & 18 \\ 14 & 54 & 68 \\ 18 & 68 & 86 \end{bmatrix}$$

Part 3: Rank and Invertibility of GG

Key Fact: rank(ATA)=rank(A)\text{rank}(A^T A) = \text{rank}(A) (see Vector Spaces Deep Dive for proof)

  1. Rank of A: Since column 3 is dependent on columns 1 and 2 (and columns 1 and 2 are independent of each other), rank(A)=2\text{rank}(A) = 2.
  2. Rank of G: Therefore, rank(G)=rank(ATA)=rank(A)=2\text{rank}(G) = \text{rank}(A^T A) = \text{rank}(A) = 2.
  3. Invertibility: The matrix GG is a 3×33 \times 3 matrix. For a 3×33 \times 3 matrix to be invertible, it must have full rank (rank = 3).
    • Since rank(G)=2<3\text{rank}(G) = 2 < 3, GG is not invertible (it is singular).
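
A numerical check of Parts 1-3 (NumPy sketch):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [1.0, 3.0, 4.0],
              [1.0, 4.0, 5.0],
              [1.0, 5.0, 6.0]])

print(A @ np.array([1.0, 1.0, -1.0]))   # [0. 0. 0. 0.]: x lies in the nullspace

G = A.T @ A
print(G)                                 # matches the hand computation above
print(np.linalg.matrix_rank(A))          # 2
print(np.linalg.matrix_rank(G))          # 2 < 3: G is singular, so (A^T A)^{-1} does not exist
```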

Mathematical Connection to Data Science

This problem illustrates Perfect Multicollinearity.

  1. The Matrix AA (Feature Matrix): Imagine AA represents your data. Column 1 is a bias term, Column 2 is "Feature A", and Column 3 is "Feature B". The math shows that Feature B is exactly Feature A plus the bias. They contain redundant information.
  2. The Vector x\mathbf{x} (Non-uniqueness): The existence of a non-zero vector x\mathbf{x} in the nullspace means there are infinite ways to combine the features to get the same result. The weights are not unique.
  3. The Matrix GG (Hessian/Gram Matrix): In optimization (OLS), we must invert G=ATAG = A^T A. Because the columns were dependent, GG became singular. You literally cannot calculate (ATA)1(A^T A)^{-1}, causing the algorithm to crash or output "NaN".

Problem 4: True or False — Invertibility Concepts

For each statement, determine if it is True or False. Justify your answer.

  1. If AA and BB are both invertible n×nn \times n matrices, then A+BA + B is also invertible.

  2. If A2A^2 is invertible, then AA is invertible.

  3. If AB=IAB = I, then BA=IBA = I.

  4. If AA is invertible and Ax=AyA\mathbf{x} = A\mathbf{y}, then x=y\mathbf{x} = \mathbf{y}.

💡 Solution

1. False.

Counterexample: Let A=IA = I and B=IB = -I. Both are invertible, but A+B=I+(I)=0A + B = I + (-I) = 0, which is not invertible.

2. True.

If A2A^2 is invertible, then det(A2)0\det(A^2) \neq 0. Since det(A2)=(det(A))2\det(A^2) = (\det(A))^2, we have (det(A))20(\det(A))^2 \neq 0, which means det(A)0\det(A) \neq 0. Therefore AA is invertible.

3. True (when AA and BB are square matrices of the same size).

If AB=IAB = I, then BB is a right inverse of AA. For square matrices, a right inverse is also a left inverse.

Proof: From AB=IAB = I, we have det(A)det(B)=1\det(A)\det(B) = 1, so det(A)0\det(A) \neq 0 and det(B)0\det(B) \neq 0. Thus both AA and BB are invertible. Then B=IB=(A1A)B=A1(AB)=A1I=A1B = IB = (A^{-1}A)B = A^{-1}(AB) = A^{-1}I = A^{-1}. So BA=A1A=IBA = A^{-1}A = I.

4. True.

If Ax=AyA\mathbf{x} = A\mathbf{y}, then A(xy)=0A(\mathbf{x} - \mathbf{y}) = \mathbf{0}. Since AA is invertible, N(A)={0}N(A) = \{\mathbf{0}\}, so xy=0\mathbf{x} - \mathbf{y} = \mathbf{0}, meaning x=y\mathbf{x} = \mathbf{y}.

Alternatively: multiply both sides by A1A^{-1}: A1Ax=A1AyA^{-1}A\mathbf{x} = A^{-1}A\mathbf{y}, giving x=y\mathbf{x} = \mathbf{y}.


Summary

Invertibility fundamentals:

  • A square matrix AA is invertible if AA1=A1A=IAA^{-1} = A^{-1}A = I; the inverse is unique when it exists
  • Non-square matrices cannot be invertible: tall matrices aren't onto, wide matrices aren't one-to-one
  • For non-square systems, the pseudo-inverse provides the least-squares best answer

Equivalent conditions (Invertible Matrix Theorem):

  • Full rank (rank(A)=n\text{rank}(A) = n)
  • Linearly independent columns (and rows)
  • Trivial nullspace: N(A)={0}N(A) = \{\mathbf{0}\}
  • Non-zero determinant: det(A)0\det(A) \neq 0
  • Column space spans Rn\mathbb{R}^n: C(A)=RnC(A) = \mathbb{R}^n

Four fundamental subspaces view:

  • Invertible: all four subspaces have "perfect" dimensions (column/row space = Rn\mathbb{R}^n, nullspaces = {0}\{\mathbf{0}\})
  • Singular: some dimension collapses (non-trivial nullspace), some outputs unreachable (column space Rn\subsetneq \mathbb{R}^n)

Computing and testing:

  • Gauss-Jordan elimination: row-reduce [AI][A \mid I] to [IA1][I \mid A^{-1}]
  • The 2×22 \times 2 formula: A1=1adbc[dbca]A^{-1} = \frac{1}{ad-bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}
  • Determinant test: det(A)=0\det(A) = 0 iff the transformation collapses volume to zero

Applications:

  • Linear regression requires (XTX)1(X^TX)^{-1}; singular XTXX^TX means non-unique coefficients
  • Ridge regression adds λI\lambda I to eliminate the nullspace, guaranteeing invertibility
  • Condition number κ=σmax/σmin\kappa = \sigma_{\max}/\sigma_{\min} measures numerical stability; high κ\kappa amplifies errors
