Least Squares and QR Decomposition
The Central Question: What Is the Best Approximate Solution When $Ax = b$ Has No Exact Answer?
When there are more equations than unknowns, $Ax = b$ typically has no solution. The least squares approach finds the $\hat{x}$ that minimizes $\|Ax - b\|^2$, making $A\hat{x}$ the projection of $b$ onto the column space of $A$. The normal equations, QR decomposition, and Gram-Schmidt process provide three ways to compute this, with different tradeoffs between speed and numerical stability.
Topics to Cover
The Least Squares Problem
- When $Ax = b$ has no solution ($m > n$: more equations than unknowns): minimize $\|Ax - b\|^2$
- Geometric view: find the point in $C(A)$ closest to $b$ (i.e., project $b$ onto the column space)
- Normal equations: $A^T A \hat{x} = A^T b$
- The solution $\hat{x} = (A^T A)^{-1} A^T b$ when $A$ has independent columns
- Cross-reference to Projections
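The normal equations above can be sketched in a few lines of NumPy. This is an illustrative example with made-up data; the key check is that the residual $r = b - A\hat{x}$ comes out orthogonal to every column of $A$:

```python
import numpy as np

# Overdetermined system: 4 equations, 2 unknowns (made-up data).
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([1.0, 2.0, 2.0, 4.0])

# Normal equations: solve A^T A x_hat = A^T b.
x_hat = np.linalg.solve(A.T @ A, A.T @ b)

# The residual b - A x_hat is orthogonal to the column space of A,
# so A^T r should be (numerically) zero.
r = b - A @ x_hat
```

Note that solving the normal equations directly is fine for small, well-conditioned problems; the QR section below explains why it is avoided in production solvers.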
Linear Regression as Least Squares
- Fitting $\hat{y} = X\beta$: minimize $\|y - X\beta\|^2$
- Design matrix $X$, coefficient vector $\beta$
- The hat matrix $H = X(X^T X)^{-1} X^T$: maps $y$ to $\hat{y} = Hy$
- Residual properties: $X^T r = 0$, and $\sum_i r_i = 0$ when an intercept is included
- Weighted least squares: minimize $(y - X\beta)^T W (y - X\beta)$ for non-uniform noise
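A minimal sketch of simple linear regression as least squares, using made-up data. It builds the design matrix with an intercept column, forms the hat matrix, and checks the residual properties listed above:

```python
import numpy as np

# Made-up data for a line fit y ≈ b0 + b1*x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

# Design matrix: a column of ones (intercept) plus the feature column.
X = np.column_stack([np.ones_like(x), x])
beta = np.linalg.solve(X.T @ X, X.T @ y)

# Hat matrix H maps observations y to fitted values y_hat = H y.
H = X @ np.linalg.solve(X.T @ X, X.T)
y_hat = H @ y
r = y - y_hat

# With an intercept, residuals sum to zero and are orthogonal to X.
```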
Gram-Schmidt Process
- Input: linearly independent vectors $a_1, \dots, a_n$
- Output: orthonormal vectors $q_1, \dots, q_n$ spanning the same space
- Algorithm: subtract projections onto previous vectors, then normalize
- Step-by-step example
- Classical vs Modified Gram-Schmidt: same math, different numerical stability
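A sketch of the modified variant (subtract each new direction from the remaining columns immediately, which is the numerically preferable ordering). The function name and test matrix are illustrative:

```python
import numpy as np

def modified_gram_schmidt(A):
    """Orthonormalize the columns of A; return Q, R with A = Q R.

    Assumes the columns of A are linearly independent.
    """
    A = np.array(A, dtype=float)
    m, n = A.shape
    Q = A.copy()
    R = np.zeros((n, n))
    for j in range(n):
        # Normalize the j-th column to get q_j.
        R[j, j] = np.linalg.norm(Q[:, j])
        Q[:, j] /= R[j, j]
        # Immediately remove the q_j component from all later columns
        # (the "modified" ordering: project against already-orthogonalized
        # vectors one at a time, which limits error accumulation).
        for k in range(j + 1, n):
            R[j, k] = Q[:, j] @ Q[:, k]
            Q[:, k] -= R[j, k] * Q[:, j]
    return Q, R

A = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])
Q, R = modified_gram_schmidt(A)
```

Classical Gram-Schmidt would instead project each original column against all previous $q$'s at once; the two agree in exact arithmetic but differ in floating point.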
QR Decomposition
- $A = QR$: $Q$ has orthonormal columns, $R$ is upper triangular
- Gram-Schmidt produces $Q$ and $R$ simultaneously
- Solving least squares via QR: $R\hat{x} = Q^T b$ (triangular solve, no $(A^T A)^{-1}$ needed)
- Why QR is more stable than normal equations:
  - Normal equations square the condition number: $\kappa(A^T A) = \kappa(A)^2$
  - QR works with $A$ directly
- Householder reflections (brief): the numerically preferred way to compute QR
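The QR route can be sketched with NumPy's built-in factorization (which uses Householder reflections internally). Made-up data again; note that $A^T A$ is never formed:

```python
import numpy as np

# Small overdetermined system (made-up data).
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
b = np.array([1.0, 2.0, 2.0, 4.0])

# Reduced QR: Q is 4x2 with orthonormal columns, R is 2x2 upper triangular.
Q, R = np.linalg.qr(A)

# Least squares via the triangular system R x_hat = Q^T b.
# A^T A is never formed, so the condition number is not squared.
x_hat = np.linalg.solve(R, Q.T @ b)
```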
Regularized Least Squares
- When $A^T A$ is singular or near-singular (rank-deficient or ill-conditioned $A$)
- Ridge regression: minimize $\|Ax - b\|^2 + \lambda \|x\|^2$
- Solution: $\hat{x} = (A^T A + \lambda I)^{-1} A^T b$
- Geometric view: adding $\lambda I$ shrinks the error ellipsoid and rounds the bowl
- Cross-reference to Ridge Regression
- Connection to SVD truncation: ridge suppresses small singular values smoothly
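A sketch of the ridge solution on a deliberately ill-conditioned problem. The function, data, and $\lambda$ value are illustrative; the point is that regularization shrinks the wildly large ordinary least squares coefficients:

```python
import numpy as np

def ridge(A, b, lam):
    """Ridge solution x_hat = (A^T A + lam*I)^(-1) A^T b (illustrative sketch)."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)

# Nearly collinear columns make A^T A badly conditioned (made-up example).
A = np.array([[1.0, 1.000],
              [1.0, 1.001],
              [1.0, 0.999]])
b = np.array([1.0, 2.0, 0.0])

x_ols = np.linalg.lstsq(A, b, rcond=None)[0]   # huge, unstable coefficients
x_ridge = ridge(A, b, lam=0.1)                 # shrunk toward zero
```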
Summary
Answering the Central Question: The best approximate solution is $\hat{x} = (A^T A)^{-1} A^T b$, the least squares solution that minimizes $\|Ax - b\|^2$. Geometrically, $A\hat{x}$ is the projection of $b$ onto $C(A)$. The normal equations give this directly but square the condition number. QR decomposition ($A = QR$, then $R\hat{x} = Q^T b$) avoids forming $A^T A$ and is numerically stable, making it the default in practice.
Applications in Data Science and Machine Learning
- Ordinary Least Squares (OLS): the foundation of linear regression
- Polynomial and basis-function regression: same framework, different design matrix
- QR in practice: `scipy.linalg.qr` exposes the factorization directly, and library solvers such as `numpy.linalg.lstsq` rely on stable LAPACK routines (SVD-based) rather than the normal equations
- Stable training: QR-based solvers avoid catastrophic cancellation in ill-conditioned feature matrices
- Orthogonal features: after Gram-Schmidt, each feature's contribution is independent — interpretable coefficients
- Recursive least squares: updating the QR factorization as new data arrives (online learning)
Guided Problems
References
- Strang, Introduction to Linear Algebra, Chapter 4 (4.3–4.4)