
Singular Value Decomposition

The Central Question: What Is the Best Low-Rank Approximation of Any Matrix?

Eigendecomposition requires square matrices. But data matrices are typically rectangular ($m$ samples by $n$ features). The SVD works for any matrix, decomposing it as $A = U\Sigma V^T$: rotate, stretch along axes, rotate again. The Eckart-Young theorem proves that truncating the smallest singular values gives the optimal low-rank approximation, which is the mathematical foundation of PCA, latent semantic analysis, and data compression.

Topics to Cover

Motivation: Beyond Eigenvalues

  • Eigenvalues require square matrices; SVD works for any $m \times n$ matrix
  • Every matrix has an SVD (existence theorem)
  • SVD reveals the "true geometry" of a linear transformation

The SVD: $A = U\Sigma V^T$

  • $V$: orthonormal basis for the row space (and nullspace)
  • $U$: orthonormal basis for the column space (and left nullspace)
  • $\Sigma$: diagonal matrix of singular values $\sigma_1 \geq \sigma_2 \geq \cdots \geq 0$
  • Connection to the four fundamental subspaces (cross-reference to Vector Spaces)

Where Singular Values Come From

  • $\sigma_i = \sqrt{\lambda_i(A^TA)}$: singular values are square roots of eigenvalues of $A^TA$
  • $V$ = eigenvectors of $A^TA$, $U$ = eigenvectors of $AA^T$
  • Derivation from the eigendecompositions of $A^TA$ and $AA^T$
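The relationship $\sigma_i = \sqrt{\lambda_i(A^TA)}$ can be verified numerically in a few lines (a minimal NumPy sketch; the random matrix here is just an arbitrary example):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))

# Singular values straight from the SVD (returned in descending order)
s = np.linalg.svd(A, compute_uv=False)

# Eigenvalues of the symmetric matrix A^T A, sorted descending
lam = np.linalg.eigvalsh(A.T @ A)[::-1]

# sigma_i = sqrt(lambda_i(A^T A))
assert np.allclose(s, np.sqrt(lam))
```

Since $A^TA$ is symmetric positive semidefinite, its eigenvalues are nonnegative, so the square roots are always well defined.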

Geometric Interpretation

  • Any linear map = rotate ($V^T$) → stretch ($\Sigma$) → rotate ($U$)
  • Mapping the unit sphere to an ellipsoid: singular values = semi-axis lengths
  • Rank = number of nonzero singular values
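The sphere-to-ellipsoid picture can be checked directly: map many points on the unit circle through a $2 \times 2$ matrix and compare the longest and shortest image vectors to the singular values (a sketch with an arbitrary example matrix):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
sigma = np.linalg.svd(A, compute_uv=False)

# Sample the unit circle densely and map it through A
theta = np.linspace(0, 2 * np.pi, 2000)
circle = np.vstack([np.cos(theta), np.sin(theta)])   # shape (2, 2000)
image = A @ circle
lengths = np.linalg.norm(image, axis=0)

# Semi-axes of the resulting ellipse match sigma_1 and sigma_2
assert np.isclose(lengths.max(), sigma[0], atol=1e-3)
assert np.isclose(lengths.min(), sigma[1], atol=1e-3)
```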

Reduced vs Full SVD

  • Full SVD: $U$ is $m \times m$, $\Sigma$ is $m \times n$, $V$ is $n \times n$
  • Reduced (compact) SVD: keep only the $r$ nonzero singular values
  • Truncated SVD: keep only the top $k < r$ singular values
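In NumPy the full/reduced distinction is the `full_matrices` flag of `np.linalg.svd`; both variants reconstruct $A$ exactly (a quick sketch on an arbitrary $6 \times 4$ example):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))

# Full SVD: U is m x m, V^T is n x n; s holds the min(m, n) singular values
U, s, Vt = np.linalg.svd(A, full_matrices=True)
assert U.shape == (6, 6) and Vt.shape == (4, 4) and s.shape == (4,)

# Reduced SVD: U shrinks to m x min(m, n)
Ur, sr, Vtr = np.linalg.svd(A, full_matrices=False)
assert Ur.shape == (6, 4) and Vtr.shape == (4, 4)

# Both reconstruct A (Ur * sr scales column j of Ur by sigma_j)
assert np.allclose(Ur * sr @ Vtr, A)
```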

Low-Rank Approximation

  • Eckart-Young theorem: the best rank-$k$ approximation to $A$ (in Frobenius or spectral norm) is $A_k = \sum_{i=1}^k \sigma_i u_i v_i^T$
  • Error = $\sigma_{k+1}$ (spectral norm) or $\sqrt{\sigma_{k+1}^2 + \cdots + \sigma_r^2}$ (Frobenius)
  • The "elbow" in the singular value spectrum: choosing $k$
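Both error formulas from the Eckart-Young theorem can be confirmed numerically (a sketch using an arbitrary random matrix and $k = 3$):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 6))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 3
# Best rank-k approximation: keep only the top k singular triplets
A_k = U[:, :k] * s[:k] @ Vt[:k, :]

# Spectral-norm error is sigma_{k+1} (index k, since s is 0-indexed);
# Frobenius error is the root-sum-square of the discarded singular values
assert np.isclose(np.linalg.norm(A - A_k, 2), s[k])
assert np.isclose(np.linalg.norm(A - A_k, 'fro'), np.sqrt(np.sum(s[k:] ** 2)))
```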

Pseudoinverse and Least Squares

  • Moore-Penrose pseudoinverse: $A^+ = V\Sigma^+U^T$
  • $\Sigma^+$: invert the nonzero singular values, transpose
  • Minimum-norm least-squares solution: $x^+ = A^+b$
  • Connection to the complete solution theory (cross-reference to Vector Spaces)
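Building $A^+ = V\Sigma^+U^T$ by hand and comparing it to NumPy's `pinv` and `lstsq` ties these ideas together (a sketch on an arbitrary tall matrix, i.e. an overdetermined least-squares problem):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3))   # tall: more equations than unknowns
b = rng.standard_normal(5)

# Build A+ = V Sigma+ U^T from the reduced SVD
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_pinv = (Vt.T / s) @ U.T          # divide column i of V by sigma_i

# Matches NumPy's Moore-Penrose pseudoinverse
assert np.allclose(A_pinv, np.linalg.pinv(A))

# x+ = A+ b is the least-squares solution (minimum-norm if A is rank-deficient)
x = A_pinv @ b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x, x_lstsq)
```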

Condition Number via SVD

  • $\kappa(A) = \sigma_1 / \sigma_r$: ratio of largest to smallest singular value
  • Large condition number = near-singular, numerically unstable
  • Cross-reference to Matrix Inverse condition number section
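The ratio $\sigma_1/\sigma_r$ is exactly what `np.linalg.cond` computes by default (a sketch using the Hilbert matrix, a standard example of severe ill-conditioning):

```python
import numpy as np

# The n x n Hilbert matrix H[i, j] = 1 / (i + j + 1) is famously ill-conditioned
n = 6
H = 1.0 / (np.arange(n)[:, None] + np.arange(n)[None, :] + 1)

s = np.linalg.svd(H, compute_uv=False)
kappa = s[0] / s[-1]

# np.linalg.cond defaults to the 2-norm, i.e. the same sigma_1 / sigma_r ratio
assert np.isclose(kappa, np.linalg.cond(H))
print(f"condition number of 6x6 Hilbert matrix: {kappa:.3e}")
```

Even at $n = 6$ the condition number is already above $10^6$, so solving $Hx = b$ in floating point loses roughly that many digits of accuracy.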

Summary

Answering the Central Question: The best rank-$k$ approximation to any matrix $A$ is obtained by keeping only the $k$ largest singular values in the SVD: $A_k = \sum_{i=1}^k \sigma_i u_i v_i^T$. The Eckart-Young theorem guarantees this is optimal in both the Frobenius and spectral norms. The singular values quantify how much "energy" each component carries, the condition number $\kappa = \sigma_1/\sigma_r$ measures numerical sensitivity, and the pseudoinverse $A^+ = V\Sigma^+ U^T$ extends the inverse to non-square and singular matrices.

Applications in Data Science and Machine Learning

  • PCA: SVD of the centered data matrix directly gives principal components (no need to form $X^TX$)
  • Latent Semantic Analysis (LSA): truncated SVD of term-document matrix discovers topics
  • Recommender systems: low-rank matrix factorization via SVD (Netflix problem)
  • Image compression: each image channel as a matrix, truncated SVD keeps dominant structure
  • Numerical rank and noise: singular values separate signal from noise; effective rank = number of singular values above a threshold
  • Data whitening: $\Sigma^{-1}U^TX$ (with $U$, $\Sigma$ from the SVD of the centered data matrix $X$) decorrelates and normalizes features
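The PCA application above can be sketched end to end: SVD the centered data matrix, read principal directions from the rows of $V^T$, and watch the "elbow" appear in the spectrum (a sketch on synthetic data with a planted rank-2 structure; the sizes and noise level are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
# 200 samples, 5 features, with most variance in 2 latent directions
X = rng.standard_normal((200, 2)) @ rng.standard_normal((2, 5))
X += 0.05 * rng.standard_normal((200, 5))   # small isotropic noise

Xc = X - X.mean(axis=0)                     # center each feature (column)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Rows of Vt are principal directions; variance explained by component i
# is sigma_i^2 / (n - 1) -- no need to form the covariance X^T X explicitly
explained_var = s ** 2 / (len(X) - 1)
scores = Xc @ Vt[:2].T                      # project onto the top 2 components

# The planted rank-2 structure dominates the spectrum: a clear "elbow" at k = 2
assert explained_var[:2].sum() / explained_var.sum() > 0.9
assert scores.shape == (200, 2)
```

Working from the SVD of $X_c$ rather than the eigendecomposition of $X_c^TX_c$ is also numerically safer, since forming $X_c^TX_c$ squares the condition number.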

Guided Problems

References

  • Strang, Introduction to Linear Algebra, Chapter 6 (6.3–6.4)
  • Strang, Linear Algebra and Its Applications, Chapter 6