Singular Value Decomposition
The Central Question: What Is the Best Low-Rank Approximation of Any Matrix?
Eigendecomposition requires square matrices. But data matrices are typically rectangular ($n$ samples by $d$ features). The SVD works for any matrix, decomposing it as $A = U\Sigma V^T$: rotate, stretch along axes, rotate again. The Eckart-Young theorem proves that truncating the smallest singular values gives the optimal low-rank approximation, which is the mathematical foundation of PCA, latent semantic analysis, and data compression.
Topics to Cover
Motivation: Beyond Eigenvalues
- Eigenvalues require square matrices; SVD works for any matrix
- Every matrix has an SVD (existence theorem)
- SVD reveals the "true geometry" of a linear transformation
The SVD: $A = U\Sigma V^T$
- $V$: orthonormal basis for the row space (and nullspace)
- $U$: orthonormal basis for the column space (and left nullspace)
- $\Sigma$: diagonal matrix of singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge 0$
- Connection to the four fundamental subspaces (cross-reference to Vector Spaces)
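The factorization above can be checked directly in NumPy; a minimal sketch (the matrix here is made up for illustration):

```python
import numpy as np

# A rectangular matrix: 4 samples by 3 features (hypothetical data).
A = np.array([[3.0, 1.0, 2.0],
              [1.0, 0.0, 4.0],
              [2.0, 5.0, 1.0],
              [0.0, 2.0, 2.0]])

# Full SVD: A = U @ Sigma @ V.T, with orthogonal U and V.
U, s, Vt = np.linalg.svd(A)          # s holds singular values, descending
Sigma = np.zeros(A.shape)
np.fill_diagonal(Sigma, s)

# The factors reconstruct A, and the columns of U and V are orthonormal.
assert np.allclose(U @ Sigma @ Vt, A)
assert np.allclose(U.T @ U, np.eye(4))
assert np.allclose(Vt @ Vt.T, np.eye(3))
```

Note that `np.linalg.svd` returns $V^T$ (as `Vt`), not $V$, and returns the singular values as a 1-D array rather than the full $m \times n$ matrix $\Sigma$.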
Where Singular Values Come From
- $\sigma_i = \sqrt{\lambda_i}$: singular values are square roots of eigenvalues of $A^TA$
- $V$ = eigenvectors of $A^TA$, $U$ = eigenvectors of $AA^T$
- Derivation from the eigendecomposition of $A^TA$ and $AA^T$
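The eigenvalue connection can be verified numerically; a quick sketch on a random matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))      # arbitrary rectangular matrix

# Singular values of A vs. eigenvalues of A^T A.
s = np.linalg.svd(A, compute_uv=False)
eigvals = np.linalg.eigvalsh(A.T @ A)[::-1]   # eigvalsh returns ascending order

# sigma_i = sqrt(lambda_i(A^T A))
assert np.allclose(s, np.sqrt(eigvals))
```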
Geometric Interpretation
- Any linear map = rotate ($V^T$) → stretch ($\Sigma$) → rotate ($U$)
- Mapping the unit sphere to an ellipsoid: singular values = semi-axis lengths
- Rank = number of nonzero singular values
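The rank claim is easy to test: build a matrix as a sum of two outer products, so its rank is known in advance, and count the nonzero singular values.

```python
import numpy as np

# Build a rank-2 matrix as a sum of two outer products (rank known by construction).
u1, u2 = np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 1.0])
v1, v2 = np.array([1.0, 2.0, 0.0, 1.0]), np.array([3.0, 0.0, 1.0, 0.0])
A = np.outer(u1, v1) + np.outer(u2, v2)       # 3x4, rank 2

s = np.linalg.svd(A, compute_uv=False)

# Exactly two singular values are (numerically) nonzero.
assert np.sum(s > 1e-10) == 2
assert np.linalg.matrix_rank(A) == 2
```

In floating point, "nonzero" means "above a tolerance"; `np.linalg.matrix_rank` applies such a threshold internally.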
Reduced vs Full SVD
- Full SVD: $U$ is $m \times m$, $\Sigma$ is $m \times n$, $V$ is $n \times n$
- Reduced (compact) SVD: keep only the $r$ nonzero singular values
- Truncated SVD: keep only the top $k$ singular values
Low-Rank Approximation
- Eckart-Young theorem: the best rank-$k$ approximation to $A$ (in Frobenius or spectral norm) is $A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T$
- Error = $\sigma_{k+1}$ (spectral norm) or $\sqrt{\sum_{i>k} \sigma_i^2}$ (Frobenius)
- The "elbow" in the singular value spectrum: choosing $k$
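Both error formulas can be checked numerically; a sketch on a random matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 6))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 3
A_k = U[:, :k] * s[:k] @ Vt[:k, :]           # truncated SVD: best rank-k approx

# Eckart-Young error formulas (s is 0-indexed, so sigma_{k+1} is s[k]):
assert np.isclose(np.linalg.norm(A - A_k, 2), s[k])                    # spectral
assert np.isclose(np.linalg.norm(A - A_k, 'fro'),
                  np.sqrt(np.sum(s[k:] ** 2)))                         # Frobenius
```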
Pseudoinverse and Least Squares
- Moore-Penrose pseudoinverse: $A^+ = V \Sigma^+ U^T$
- $\Sigma^+$: invert nonzero singular values, transpose
- Minimum-norm least-squares solution: $\hat{x} = A^+ b$
- Connection to the complete solution theory (cross-reference to Vector Spaces)
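A sketch of building the pseudoinverse from the SVD and comparing it against NumPy's built-ins (the matrix here has full column rank, so every singular value is safely invertible):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3))      # full column rank with probability 1
b = rng.standard_normal(5)

# Pseudoinverse via SVD: A+ = V Sigma+ U^T.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T

assert np.allclose(A_pinv, np.linalg.pinv(A))

# x_hat = A+ b is the least-squares solution (minimum-norm if A is rank-deficient).
x_hat = A_pinv @ b
x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]
assert np.allclose(x_hat, x_lstsq)
```

For a rank-deficient matrix, `1.0 / s` would divide by (near-)zero; the rule is to invert only the singular values above a tolerance and zero out the rest, which is what `np.linalg.pinv`'s `rcond` cutoff does.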
Condition Number via SVD
- $\kappa(A) = \sigma_{\max} / \sigma_{\min}$: ratio of largest to smallest singular value
- Large condition number = near-singular, numerically unstable
- Cross-reference to Matrix Inverse condition number section
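The definition agrees with NumPy's 2-norm condition number; a sketch on a nearly singular matrix:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])       # rows nearly parallel: close to singular

s = np.linalg.svd(A, compute_uv=False)
kappa = s[0] / s[-1]                # sigma_max / sigma_min

# np.linalg.cond defaults to the 2-norm, i.e. exactly this ratio.
assert np.isclose(kappa, np.linalg.cond(A))
assert kappa > 1e3                  # large condition number: numerically fragile
```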
Summary
Answering the Central Question: The best rank-$k$ approximation to any matrix is obtained by keeping only the $k$ largest singular values in the SVD: $A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T$. The Eckart-Young theorem guarantees this is optimal in both the Frobenius and spectral norms. The singular values quantify how much "energy" each component carries, the condition number measures numerical sensitivity, and the pseudoinverse extends the inverse to non-square and singular matrices.
Applications in Data Science and Machine Learning
- PCA: SVD of the centered data matrix directly gives principal components (no need to form the covariance matrix $X^TX$)
- Latent Semantic Analysis (LSA): truncated SVD of term-document matrix discovers topics
- Recommender systems: low-rank matrix factorization via SVD (Netflix problem)
- Image compression: each image channel as a matrix, truncated SVD keeps dominant structure
- Numerical rank and noise: singular values separate signal from noise; effective rank = number of singular values above a threshold
- Data whitening: decorrelates and normalizes features
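As one concrete application, the PCA bullet above can be sketched in NumPy: the SVD of the centered data gives the principal directions and variances without ever forming the covariance matrix (the data here is synthetic, with deliberately unequal variances along three directions):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((100, 3)) * np.array([2.0, 1.0, 0.1])   # synthetic data

# PCA via SVD: center the data, then the rows of Vt are the principal directions.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Variance explained by each component: sigma_i^2 / (n - 1).
explained_var = s ** 2 / (len(X) - 1)

# Same variances as the eigendecomposition of the covariance matrix.
cov = Xc.T @ Xc / (len(X) - 1)
eigvals = np.linalg.eigvalsh(cov)[::-1]
assert np.allclose(explained_var, eigvals)
```

Working from the SVD of $X$ rather than the eigendecomposition of $X^TX$ is also numerically preferable: squaring the matrix squares its condition number.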
Guided Problems
References
- Strang, Introduction to Linear Algebra, Chapter 6 (6.3–6.4)
- Strang, Linear Algebra and Its Applications, Chapter 6