
Determinants


The Central Question: What Single Number Captures a Matrix's Essence?

We want a single number that tells us whether a matrix is invertible, how it scales volume, and what its eigenvalues multiply to. That number is the determinant.

Consider these scenarios:

  1. A linear system $Ax = b$ has a unique solution if and only if $\det(A) \neq 0$.
  2. A neural network layer's weight matrix expands or compresses the space of activations by a factor of $|\det(W)|$.
  3. A normalizing flow requires the Jacobian determinant of the mapping to correctly transform probability densities.

The determinant encodes invertibility, volume scaling, and orientation into one number. It connects algebra (is the matrix singular?) to geometry (how does the transformation change space?) to probability (how do densities transform?).


Definition and the Three Properties

Definition: The Determinant

The determinant is the unique function $\det: \mathbb{R}^{n \times n} \to \mathbb{R}$ satisfying:

  1. $\det(I) = 1$
  2. Exchanging two rows reverses the sign of $\det$
  3. The determinant is linear in each row separately:
    • $\det\begin{bmatrix} ta \\ b \end{bmatrix} = t \cdot \det\begin{bmatrix} a \\ b \end{bmatrix}$
    • $\det\begin{bmatrix} a + a' \\ b \end{bmatrix} = \det\begin{bmatrix} a \\ b \end{bmatrix} + \det\begin{bmatrix} a' \\ b \end{bmatrix}$

where $a, a', b$ represent rows.

Property 3 is not saying $\det(A + B) = \det(A) + \det(B)$. That is false. The linearity is in one row at a time, holding all other rows fixed. This is called multilinearity.

From these three properties alone, every other determinant fact follows.

The $2 \times 2$ Formula

For a general $2 \times 2$ matrix:

$$\det\begin{bmatrix} a & b \\ c & d \end{bmatrix} = ad - bc$$

Derivation from the three properties

Write each row as a combination of standard basis rows:

$$\det\begin{bmatrix} a & b \\ c & d \end{bmatrix} = \det\begin{bmatrix} a & 0 \\ c & d \end{bmatrix} + \det\begin{bmatrix} 0 & b \\ c & d \end{bmatrix}$$

Expand the second row of each term the same way:

$$= \det\begin{bmatrix} a & 0 \\ c & 0 \end{bmatrix} + \det\begin{bmatrix} a & 0 \\ 0 & d \end{bmatrix} + \det\begin{bmatrix} 0 & b \\ c & 0 \end{bmatrix} + \det\begin{bmatrix} 0 & b \\ 0 & d \end{bmatrix}$$

The first and last terms have proportional rows (their nonzero entries lie in the same column), so their determinants are zero: factor out the scalar by Property 3, and a matrix with two equal rows must change sign under its own row swap by Property 2, forcing $\det = 0$. The remaining two terms:

$$= ad \cdot \det\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + bc \cdot \det\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = ad \cdot 1 + bc \cdot (-1) = ad - bc$$

The sign flip in the second term comes from Property 2: swapping the two rows of $I$ reverses the sign.

The $3 \times 3$ Formula and Cofactor Expansion

For $3 \times 3$, the cofactor expansion along the first row gives:

$$\det\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} = a_{11}\det\begin{bmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{bmatrix} - a_{12}\det\begin{bmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{bmatrix} + a_{13}\det\begin{bmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix}$$

The signs alternate: $+, -, +, -, \ldots$ following the checkerboard pattern $(-1)^{i+j}$.

Example. Compute $\det\begin{bmatrix} 2 & 1 & 3 \\ 0 & 4 & 5 \\ 1 & 0 & 2 \end{bmatrix}$:

$$= 2\det\begin{bmatrix} 4 & 5 \\ 0 & 2 \end{bmatrix} - 1\det\begin{bmatrix} 0 & 5 \\ 1 & 2 \end{bmatrix} + 3\det\begin{bmatrix} 0 & 4 \\ 1 & 0 \end{bmatrix}$$

$$= 2(8 - 0) - 1(0 - 5) + 3(0 - 4) = 16 + 5 - 12 = 9$$
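The worked example is easy to check numerically. A minimal sketch with NumPy (which computes determinants via LU factorization, not cofactor expansion):

```python
import numpy as np

# the 3x3 matrix from the worked example
A = np.array([[2, 1, 3],
              [0, 4, 5],
              [1, 0, 2]])

det_A = np.linalg.det(A)   # floating-point result, close to the exact 9
print(round(det_A))        # 9
```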

Computing via Elimination

Cofactor expansion costs $O(n!)$ operations. For practical computation, elimination is far better.

Row-reduce $A$ to upper triangular form $U$, tracking row swaps. Elimination subtracts multiples of one row from another, which leaves the determinant unchanged (by linearity in each row, plus the fact that a matrix with a repeated row has determinant zero), and each row swap flips the sign (Property 2):

$$\det(A) = (-1)^s \cdot \prod_{i=1}^n u_{ii}$$

where $s$ is the number of row swaps and $u_{ii}$ are the pivots (diagonal entries of $U$).

Example. For $A = \begin{bmatrix} 2 & 1 & 3 \\ 0 & 4 & 5 \\ 1 & 0 & 2 \end{bmatrix}$, swap $R_1 \leftrightarrow R_3$ then eliminate:

$$\xrightarrow{R_1 \leftrightarrow R_3} \begin{bmatrix} 1 & 0 & 2 \\ 0 & 4 & 5 \\ 2 & 1 & 3 \end{bmatrix} \xrightarrow{R_3 - 2R_1} \begin{bmatrix} 1 & 0 & 2 \\ 0 & 4 & 5 \\ 0 & 1 & -1 \end{bmatrix} \xrightarrow{R_3 - \frac{1}{4}R_2} \begin{bmatrix} 1 & 0 & 2 \\ 0 & 4 & 5 \\ 0 & 0 & -\frac{9}{4} \end{bmatrix}$$

One row swap ($s = 1$), pivots $1, 4, -\frac{9}{4}$:

$$\det(A) = (-1)^1 \cdot 1 \cdot 4 \cdot \left(-\frac{9}{4}\right) = (-1)(-9) = 9 \checkmark$$

This costs about $\frac{2}{3}n^3$ operations, the same as LU factorization (see Matrix Operations: LU Decomposition).

Practical Insight

In practice, nobody computes determinants by cofactor expansion for $n > 3$. Use LU factorization: $\det(A) = (-1)^s \prod(\text{pivots})$.
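The elimination recipe can be sketched in a few lines. This version adds partial pivoting for numerical stability, an assumption beyond the hand computation above:

```python
import numpy as np

def det_via_elimination(A):
    """Determinant via Gaussian elimination with partial pivoting:
    det(A) = (-1)^s * product of pivots."""
    U = np.array(A, dtype=float)
    n = U.shape[0]
    sign = 1.0
    for k in range(n):
        p = k + np.argmax(np.abs(U[k:, k]))  # partial pivoting for stability
        if U[p, k] == 0.0:
            return 0.0                       # no nonzero pivot: singular
        if p != k:
            U[[k, p]] = U[[p, k]]            # row swap flips the sign
            sign = -sign
        # subtract multiples of row k from the rows below it
        U[k+1:, k:] -= np.outer(U[k+1:, k] / U[k, k], U[k, k:])
    return sign * np.prod(np.diag(U))

A = [[2, 1, 3], [0, 4, 5], [1, 0, 2]]
print(det_via_elimination(A))  # 9.0, matching the worked example
```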


Properties of the Determinant

The three defining properties generate a rich collection of consequences.

Theorem: Properties of Determinants

For $n \times n$ matrices $A$ and $B$, and scalar $c$:

| Property | Formula |
| --- | --- |
| Multiplicative | $\det(AB) = \det(A)\det(B)$ |
| Transpose | $\det(A^T) = \det(A)$ |
| Inverse | $\det(A^{-1}) = 1/\det(A)$ |
| Scalar multiple | $\det(cA) = c^n \det(A)$ |
| Triangular | $\det$ = product of diagonal entries |
| Singular | $A$ is singular $\Leftrightarrow \det(A) = 0$ |

The multiplicative property is the most important and the most surprising. The determinant is not additive: $\det(A + B) \neq \det(A) + \det(B)$ in general. But it turns products into products.

Example. Let $A = \begin{bmatrix} 1 & 2 \\ 0 & 3 \end{bmatrix}$ and $B = \begin{bmatrix} 2 & 0 \\ 1 & 4 \end{bmatrix}$.

$$\det(A) = 3, \quad \det(B) = 8, \quad AB = \begin{bmatrix} 4 & 8 \\ 3 & 12 \end{bmatrix}, \quad \det(AB) = 48 - 24 = 24 = 3 \cdot 8 \checkmark$$
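A quick numerical check of both claims (multiplicative, not additive), using NumPy:

```python
import numpy as np

A = np.array([[1, 2], [0, 3]])
B = np.array([[2, 0], [1, 4]])

# products multiply: det(AB) = det(A) det(B)
print(np.linalg.det(A @ B))                 # ~24.0
print(np.linalg.det(A) * np.linalg.det(B))  # ~24.0

# sums do not add: det(A + B) != det(A) + det(B)
print(np.linalg.det(A + B))                 # ~19.0 (A + B = [[3,2],[1,7]])
print(np.linalg.det(A) + np.linalg.det(B))  # ~11.0
```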

Why $\det(AB) = \det(A)\det(B)$

If $B$ is singular, then $AB$ is also singular (if $Bx = 0$ for some $x \neq 0$, then $ABx = 0$), so both sides are zero.

If $B$ is invertible, define $f(A) = \det(AB)/\det(B)$ and check the three properties as a function of the rows of $A$:

  • $f(I) = \det(B)/\det(B) = 1$ ✓
  • Row $i$ of $AB$ is (row $i$ of $A$) times $B$, so swapping two rows of $A$ swaps the same two rows of $AB$, and the sign flips ✓
  • For the same reason, linearity in each row of $A$ carries through to $AB$ ✓

Since the determinant is the unique function satisfying the three properties, $f(A) = \det(A)$. Multiplying both sides by $\det(B)$ gives $\det(AB) = \det(A)\det(B)$.

The scalar multiple rule $\det(cA) = c^n\det(A)$ is a common trap. Multiplying every row by $c$ applies Property 3 once per row, so the factor is $c^n$, not $c$.

Example. $\det(2I_3) = 2^3 \det(I_3) = 8$, not 2.
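The trap is easy to demonstrate numerically:

```python
import numpy as np

I3 = np.eye(3)
print(np.linalg.det(2 * I3))   # 8.0, i.e. 2**3, not 2
print(np.linalg.det(5 * I3))   # 125.0, i.e. 5**3
```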

The singularity test follows from elimination: if $A$ is singular, at least one pivot is zero, making $\det(A) = 0$.


Geometric Interpretation

Let two column vectors $a_1 = \begin{bmatrix} 3 \\ 0 \end{bmatrix}$, $a_2 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$, so:

$$A = \begin{bmatrix} 3 & 1 \\ 0 & 2 \end{bmatrix}, \quad \det(A) = 6$$

Geometrically, the columns of $A$ span a parallelogram in $\mathbb{R}^2$. The base is $\|a_1\| = 3$ and the height (the perpendicular distance from $a_2$ to the line through $a_1$) is 2. The area is $3 \times 2 = 6 = |\det(A)|$.
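The same area can be confirmed numerically; a minimal check:

```python
import numpy as np

a1 = np.array([3.0, 0.0])
a2 = np.array([1.0, 2.0])
A = np.column_stack([a1, a2])   # columns span the parallelogram

print(abs(np.linalg.det(A)))    # 6.0 = base 3 times height 2
```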

This generalizes to any dimension:

Theorem: Determinant as Volume

For an $n \times n$ matrix $A$ with columns $a_1, \ldots, a_n$:

$$|\det(A)| = \text{volume of the parallelepiped spanned by } a_1, \ldots, a_n$$

Three consequences follow immediately:

1. Zero determinant means collapse.

If the columns are linearly dependent, the parallelepiped collapses to a lower dimension and has zero volume. This is why $\det(A) = 0$ characterizes singular matrices.

2. The sign encodes orientation.

In $\mathbb{R}^2$, $\det(A) > 0$ means the columns $a_1, a_2$ form a counterclockwise (right-handed) pair; $\det(A) < 0$ means clockwise (left-handed). Swapping the two columns (or, by $\det(A^T) = \det(A)$, two rows) reverses orientation, consistent with Property 2.

3. Orthogonal matrices preserve volume.

If $Q$ is orthogonal, $|\det(Q)| = 1$. Its columns are orthonormal, so they span a unit cube with volume 1.
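Both consequences show up with a rotation matrix, which is orthogonal (a quick sketch):

```python
import numpy as np

theta = 0.7
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # rotation: orthogonal

print(np.linalg.det(Q))   # ~1.0: volume preserved, orientation kept

R = Q[:, ::-1].copy()     # swap the two columns: a reflection
print(np.linalg.det(R))   # ~-1.0: volume preserved, orientation reversed
```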

The Jacobian Determinant

When a differentiable map $f: \mathbb{R}^n \to \mathbb{R}^n$ transforms a small region around a point $x$, the local volume scaling factor is $|\det(J_f(x))|$, where $J_f$ is the Jacobian matrix of partial derivatives:

$$(J_f)_{ij} = \frac{\partial f_i}{\partial x_j}$$

For a linear transformation $f(x) = Ax$, the Jacobian is $A$ itself, so the scaling factor is $|\det(A)|$ everywhere. For nonlinear maps, the Jacobian determinant varies from point to point.

Example. The polar-to-Cartesian transformation $f(r, \theta) = (r\cos\theta, r\sin\theta)$ has Jacobian:

$$J_f = \begin{bmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{bmatrix}, \quad \det(J_f) = r\cos^2\theta + r\sin^2\theta = r$$

The familiar $dA = r\,dr\,d\theta$ in polar integration comes directly from $|\det(J_f)| = r$.
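A numerical spot-check that the polar Jacobian determinant equals $r$ (the sample point is arbitrary):

```python
import numpy as np

def J_polar(r, theta):
    """Jacobian of f(r, theta) = (r cos theta, r sin theta)."""
    return np.array([[np.cos(theta), -r * np.sin(theta)],
                     [np.sin(theta),  r * np.cos(theta)]])

r, theta = 2.5, 0.9
print(np.linalg.det(J_polar(r, theta)))  # ~2.5, i.e. det = r
```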


Cofactors and the Adjugate

The cofactor expansion from the $3 \times 3$ case generalizes to any size.

Definition: Cofactor

The $(i,j)$ cofactor of $A$ is:

$$C_{ij} = (-1)^{i+j} M_{ij}$$

where $M_{ij}$ is the $(i,j)$ minor: the determinant of the $(n-1) \times (n-1)$ matrix obtained by deleting row $i$ and column $j$.

The determinant expands along any row $i$ or any column $j$:

$$\det(A) = \sum_{j=1}^n a_{ij} C_{ij} \quad \text{(expansion along row } i\text{)}$$

$$\det(A) = \sum_{i=1}^n a_{ij} C_{ij} \quad \text{(expansion along column } j\text{)}$$

Practical Insight

Expand along the row or column with the most zeros. Each zero entry eliminates an entire minor computation.

The Adjugate and the Cofactor Inverse Formula

The adjugate (or classical adjoint) of $A$ is the transpose of the cofactor matrix:

$$\text{adj}(A) = [C_{ij}]^T$$

It provides an explicit formula for the inverse:

Theorem: Cofactor Formula for the Inverse

If $\det(A) \neq 0$:

$$A^{-1} = \frac{1}{\det(A)} \text{adj}(A)$$

Example. For $A = \begin{bmatrix} 2 & 1 \\ 5 & 3 \end{bmatrix}$, $\det(A) = 1$. The cofactors are:

$$C_{11} = 3, \quad C_{12} = -5, \quad C_{21} = -1, \quad C_{22} = 2$$

$$A^{-1} = \frac{1}{1}\begin{bmatrix} 3 & -1 \\ -5 & 2 \end{bmatrix} = \begin{bmatrix} 3 & -1 \\ -5 & 2 \end{bmatrix}$$

For $2 \times 2$ matrices this gives the well-known formula: swap the diagonal, negate the off-diagonal, divide by $\det$. For larger matrices, the cofactor formula costs $O(n \cdot n!)$, making it useless for computation. Its value is theoretical: it proves the inverse exists when $\det(A) \neq 0$ and provides explicit formulas.
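The $2 \times 2$ cofactor formula written out as code, checked against the example above (a sketch, not how inverses are computed in practice):

```python
import numpy as np

def inverse_2x2_adjugate(A):
    """A^{-1} = adj(A) / det(A) for a 2x2 matrix:
    swap the diagonal, negate the off-diagonal, divide by det."""
    (a, b), (c, d) = A
    det = a * d - b * c
    adj = np.array([[d, -b],
                    [-c, a]])
    return adj / det

A = np.array([[2, 1],
              [5, 3]])
print(inverse_2x2_adjugate(A))   # equals [[3, -1], [-5, 2]]
```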

See Matrix Inverse for practical methods of computing $A^{-1}$.

Cramer's Rule

Cramer's rule gives each component of $x = A^{-1}b$ as a ratio of determinants:

$$x_j = \frac{\det(B_j)}{\det(A)}$$

where $B_j$ is $A$ with column $j$ replaced by $b$.

Example. Solve $\begin{bmatrix} 2 & 1 \\ 5 & 3 \end{bmatrix} x = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$:

$$x_1 = \frac{\det\begin{bmatrix} 1 & 1 \\ 2 & 3 \end{bmatrix}}{\det\begin{bmatrix} 2 & 1 \\ 5 & 3 \end{bmatrix}} = \frac{3 - 2}{6 - 5} = 1, \quad x_2 = \frac{\det\begin{bmatrix} 2 & 1 \\ 5 & 2 \end{bmatrix}}{1} = \frac{4 - 5}{1} = -1$$

Cramer's rule requires computing $n+1$ determinants, each costing $O(n^3)$. Total cost: $O(n^4)$, far worse than the $O(n^3)$ of Gaussian elimination. Like the cofactor inverse formula, Cramer's rule is a theoretical tool, not a computational one.
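Cramer's rule is short to implement, which makes the cost comparison concrete; a sketch:

```python
import numpy as np

def cramer_solve(A, b):
    """Solve Ax = b by Cramer's rule: x_j = det(B_j) / det(A).
    Theoretical tool only: O(n^4) vs O(n^3) for elimination."""
    A = np.asarray(A, dtype=float)
    det_A = np.linalg.det(A)
    x = np.empty(A.shape[0])
    for j in range(A.shape[0]):
        B_j = A.copy()
        B_j[:, j] = b              # replace column j with b
        x[j] = np.linalg.det(B_j) / det_A
    return x

print(cramer_solve([[2, 1], [5, 3]], [1, 2]))  # matches the worked example
```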


The Characteristic Polynomial

The determinant connects directly to eigenvalues through the characteristic polynomial.

Definition: Characteristic Polynomial

The characteristic polynomial of an $n \times n$ matrix $A$ is:

$$p(\lambda) = \det(A - \lambda I)$$

The eigenvalues of $A$ are the roots of $p(\lambda) = 0$.

For the $2 \times 2$ case:

$$\det\begin{bmatrix} a - \lambda & b \\ c & d - \lambda \end{bmatrix} = \lambda^2 - (a+d)\lambda + (ad - bc) = \lambda^2 - \text{tr}(A)\lambda + \det(A)$$

This gives two elegant relationships:

Theorem: Determinant and Trace from Eigenvalues

For an $n \times n$ matrix $A$ with eigenvalues $\lambda_1, \ldots, \lambda_n$:

$$\det(A) = \prod_{i=1}^n \lambda_i \qquad \text{tr}(A) = \sum_{i=1}^n \lambda_i$$

The determinant is the product of all eigenvalues. The trace is their sum. The determinant equals $p(0)$ (the constant term of the characteristic polynomial), and the trace relates to the coefficient of $\lambda^{n-1}$ (with sign $(-1)^{n-1}$).

Example. The matrix $A = \begin{bmatrix} 4 & 2 \\ 1 & 3 \end{bmatrix}$ has $\text{tr}(A) = 7$ and $\det(A) = 10$.

Characteristic polynomial: $\lambda^2 - 7\lambda + 10 = (\lambda - 5)(\lambda - 2)$.

Eigenvalues: $\lambda_1 = 5$, $\lambda_2 = 2$. Check: $5 \cdot 2 = 10 = \det(A)$ and $5 + 2 = 7 = \text{tr}(A)$.
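These identities are easy to confirm numerically:

```python
import numpy as np

A = np.array([[4, 2], [1, 3]])
lam = np.linalg.eigvals(A)

print(np.sort(lam.real))                  # ~[2. 5.]
print(lam.prod().real, np.linalg.det(A))  # both ~10
print(lam.sum().real, np.trace(A))        # both 7
```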

This means that $\det(A) = 0$ if and only if at least one eigenvalue is zero, which is equivalent to $A$ being singular. The chain of equivalences grows:

$$A \text{ is singular} \iff \det(A) = 0 \iff \text{some } \lambda_i = 0 \iff \text{some pivot is zero} \iff \text{columns are dependent}$$

For deeper treatment of eigenvalues, see Eigenvalues and Eigenvectors.


Summary

  • The determinant is uniquely defined by three properties: $\det(I) = 1$, row swaps flip the sign, and the determinant is linear in each row separately
  • For $2 \times 2$: $\det\begin{bmatrix} a & b \\ c & d \end{bmatrix} = ad - bc$. For larger matrices, use elimination: $\det(A) = (-1)^s \prod(\text{pivots})$
  • The determinant is multiplicative ($\det(AB) = \det(A)\det(B)$) but not additive
  • $|\det(A)|$ equals the volume of the parallelepiped spanned by the columns. $\det(A) = 0$ means the columns are dependent and the volume collapses
  • The sign of the determinant encodes orientation (preserved or reversed)
  • The Jacobian determinant $|\det(J_f)|$ measures the local volume scaling of nonlinear transformations
  • The cofactor inverse $A^{-1} = \frac{1}{\det(A)}\text{adj}(A)$ and Cramer's rule are elegant but computationally impractical
  • The characteristic polynomial $\det(A - \lambda I) = 0$ defines eigenvalues. The determinant equals the product of all eigenvalues, and the trace equals their sum

Answering the Central Question: The determinant is the single number that captures invertibility ($\det(A) \neq 0$ iff invertible), volume scaling ($|\det(A)|$ is the factor by which the transformation stretches space), and eigenvalue information ($\det(A) = \prod \lambda_i$). It bridges algebra, geometry, and probability in one formula.


Applications in Data Science and Machine Learning

The determinant appears throughout machine learning whenever probability densities are transformed, covariance matrices are evaluated, or volume changes matter.

Multivariate Gaussian Log-Likelihood

The density of a multivariate Gaussian $\mathcal{N}(\mu, \Sigma)$ is:

$$p(x) = \frac{1}{(2\pi)^{n/2}\det(\Sigma)^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^T\Sigma^{-1}(x - \mu)\right)$$

The log-likelihood involves $\log\det(\Sigma)$. Computing this naively is expensive ($O(n^3)$ for the determinant) and numerically unstable (the determinant can overflow or underflow for large $n$).

The standard approach uses Cholesky decomposition. Since $\Sigma$ is symmetric positive definite, factor $\Sigma = LL^T$. Then:

$$\log\det(\Sigma) = \log\det(LL^T) = \log\left((\det L)^2\right) = 2\log\det(L) = 2\sum_{i=1}^n \log L_{ii}$$

Since $L$ is triangular, its determinant is just the product of its diagonal entries (which are positive for a Cholesky factor). Working in log-space avoids overflow entirely.
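A sketch of the Cholesky route, compared against NumPy's `slogdet`; the covariance here is a synthetic SPD matrix built for the check:

```python
import numpy as np

def logdet_cholesky(Sigma):
    """log det(Sigma) for symmetric positive definite Sigma:
    Sigma = L L^T, so log det = 2 * sum(log L_ii)."""
    L = np.linalg.cholesky(Sigma)
    return 2.0 * np.sum(np.log(np.diag(L)))

rng = np.random.default_rng(0)
M = rng.standard_normal((50, 50))
Sigma = M @ M.T + 50 * np.eye(50)   # synthetic SPD covariance

# stays in log-space throughout; np.linalg.slogdet is the library equivalent
print(np.isclose(logdet_cholesky(Sigma), np.linalg.slogdet(Sigma)[1]))
```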

Normalizing Flows

A normalizing flow transforms a simple base distribution $p_z(z)$ (e.g., a standard Gaussian) into a complex distribution $p_x(x)$ through an invertible mapping $x = f(z)$.

The change-of-variables formula requires the Jacobian determinant:

$$p_x(x) = p_z(f^{-1}(x)) \cdot \left|\det\left(\frac{\partial f^{-1}}{\partial x}\right)\right|$$

Without the $|\det(J)|$ factor, the transformed density would not integrate to 1. The Jacobian determinant corrects for the volume change: where the map stretches space, density decreases; where it compresses, density increases.

Computing $\det(J)$ for a general $n \times n$ Jacobian costs $O(n^3)$, which is prohibitive for high-dimensional data. Modern flow architectures (RealNVP, MAF, Glow) design transformations whose Jacobians are triangular, reducing the cost to $O(n)$ since the determinant of a triangular matrix is the product of its diagonal entries.
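The triangular-Jacobian shortcut in code, with a random lower-triangular matrix standing in for a flow layer's Jacobian (an illustrative assumption):

```python
import numpy as np

def logabsdet_triangular(J):
    """log|det J| for a triangular Jacobian: O(n), read off the diagonal."""
    return np.sum(np.log(np.abs(np.diag(J))))

rng = np.random.default_rng(1)
J = np.tril(rng.standard_normal((6, 6)))   # lower-triangular stand-in

# agrees with the general O(n^3) computation
print(np.isclose(logabsdet_triangular(J), np.linalg.slogdet(J)[1]))
```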

Bayesian Model Comparison

In Bayesian inference, the marginal likelihood (model evidence) involves integrating over parameters:

$$p(D \mid \mathcal{M}) = \int p(D \mid \theta)\, p(\theta) \, d\theta$$

The Laplace approximation to this integral introduces $\det(H)$, where $H$ is the Hessian of the negative log-posterior at the mode $\hat{\theta}$:

$$\log p(D \mid \mathcal{M}) \approx \log p(D \mid \hat{\theta}) + \log p(\hat{\theta}) - \frac{1}{2}\log\det(H) + \frac{n}{2}\log(2\pi)$$

The $-\frac{1}{2}\log\det(H)$ term acts as an Occam penalty: $\det(H)^{-1/2}$ is proportional to the volume of the posterior peak, and every finely tuned parameter contributes a large eigenvalue to $H$, so models with many tightly constrained parameters have larger $\det(H)$ and pay a larger penalty than models that explain the data with fewer constrained parameters.

Fisher Information and Experimental Design

The Fisher information matrix $\mathcal{I}(\theta)$ measures how much information the data carries about the parameters $\theta$. Its determinant, known as the D-optimality criterion, quantifies the volume of the confidence ellipsoid for parameter estimation:

$$\text{Volume} \propto \frac{1}{\sqrt{\det(\mathcal{I}(\theta))}}$$

Larger $\det(\mathcal{I})$ means a smaller confidence region, so each parameter is better determined. In optimal experimental design, one chooses experiments to maximize $\det(\mathcal{I})$, minimizing the volume of uncertainty in parameter space.


Guided Problems

Problem 1: Properties in Action

Let $A$ be a $4 \times 4$ matrix with $\det(A) = 6$.

  1. What is $\det(2A)$?
  2. What is $\det(A^{-1})$?
  3. What is $\det(A^3)$?
  4. If $B$ is obtained by swapping two rows of $A$, what is $\det(B)$?
💡 Solution
  1. $\det(2A) = 2^4 \det(A) = 16 \cdot 6 = 96$. (Scalar multiple rule with $n = 4$.)

  2. $\det(A^{-1}) = 1/\det(A) = 1/6$.

  3. $\det(A^3) = \det(A \cdot A \cdot A) = (\det A)^3 = 216$. (Multiplicative property applied twice.)

  4. $\det(B) = -\det(A) = -6$. (Row exchange reverses the sign.)


Problem 2: Determinant and Eigenvalues

The matrix $A = \begin{bmatrix} 5 & 2 \\ 2 & 2 \end{bmatrix}$ has eigenvalues $\lambda_1 = 6$ and $\lambda_2 = 1$.

  1. Verify that $\det(A) = \lambda_1 \lambda_2$ and $\text{tr}(A) = \lambda_1 + \lambda_2$.
  2. Without computing the eigenvalues directly, determine $\det(A^2)$ and the eigenvalues of $A^2$.
  3. If one eigenvalue of $B$ is zero and $B$ is $3 \times 3$, what can you say about $\det(B)$?
💡 Solution
  1. $\det(A) = 5 \cdot 2 - 2 \cdot 2 = 6 = 6 \cdot 1$ ✓. $\text{tr}(A) = 5 + 2 = 7 = 6 + 1$ ✓.

  2. $\det(A^2) = (\det A)^2 = 36$. The eigenvalues of $A^2$ are $\lambda_1^2 = 36$ and $\lambda_2^2 = 1$ (since if $Ax = \lambda x$ then $A^2 x = \lambda^2 x$). Check: $36 \cdot 1 = 36$ ✓.

  3. $\det(B) = \lambda_1 \lambda_2 \lambda_3 = 0$ (since one factor is zero). The matrix $B$ is singular, regardless of the other two eigenvalues.


Problem 3: Volume and Linear Dependence

Consider the matrix $A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}$.

  1. Compute $\det(A)$ using elimination or cofactor expansion.
  2. What does the result tell you geometrically about the three column vectors?
  3. Verify by showing an explicit linear dependence among the columns.
💡 Solution
  1. Cofactor expansion along row 1: $\det(A) = 1(45 - 48) - 2(36 - 42) + 3(32 - 35) = 1(-3) - 2(-6) + 3(-3) = -3 + 12 - 9 = 0$

  2. $\det(A) = 0$ means the parallelepiped spanned by the three columns has zero volume. The three columns lie in a plane (a 2-dimensional subspace of $\mathbb{R}^3$), not spanning all of $\mathbb{R}^3$. The matrix is singular.

  3. Column 3 $= 2 \cdot$ Column 2 $-$ Column 1: $2\begin{bmatrix}2\\5\\8\end{bmatrix} - \begin{bmatrix}1\\4\\7\end{bmatrix} = \begin{bmatrix}3\\6\\9\end{bmatrix} \checkmark$


Problem 4: The Jacobian in Practice

Let $f: \mathbb{R}^2 \to \mathbb{R}^2$ be defined by $f(u, v) = (u^2 - v^2, \; 2uv)$.

  1. Compute the Jacobian matrix $J_f$.
  2. Compute $\det(J_f)$.
  3. At the point $(u, v) = (1, 0)$, how does $f$ scale areas locally?
  4. At the point $(u, v) = (3, 4)$, how does $f$ scale areas locally?
💡 Solution
  1. $J_f = \begin{bmatrix} \partial f_1/\partial u & \partial f_1/\partial v \\ \partial f_2/\partial u & \partial f_2/\partial v \end{bmatrix} = \begin{bmatrix} 2u & -2v \\ 2v & 2u \end{bmatrix}$

  2. $\det(J_f) = (2u)(2u) - (-2v)(2v) = 4u^2 + 4v^2 = 4(u^2 + v^2)$

  3. At $(1, 0)$: $|\det(J_f)| = 4(1 + 0) = 4$. The map stretches areas by a factor of 4.

  4. At $(3, 4)$: $|\det(J_f)| = 4(9 + 16) = 100$. The map stretches areas by a factor of 100.

Note: this function is $f(z) = z^2$ in complex notation, where $z = u + iv$. The area scaling $4|z|^2 = |f'(z)|^2$ is the squared modulus of the complex derivative $f'(z) = 2z$.

