Orthogonality and Projections

The Central Question: What Is the Closest Point in a Subspace?

Given a vector $b$ that does not lie in a subspace $S$, what point in $S$ is nearest to $b$? The answer is the orthogonal projection: drop a perpendicular from $b$ onto $S$. This geometric idea underlies least squares regression (project $b$ onto the column space of $A$), Gram-Schmidt orthogonalization, and the orthogonal complement relationships among the four fundamental subspaces.

Topics to Cover

Orthogonal Vectors and Subspaces

  • Definition: $v \cdot w = 0$
  • Orthogonal complements: $V^{\perp}$ = all vectors perpendicular to $V$
  • Orthogonality of the four fundamental subspaces:
    • Row space $\perp$ Nullspace (both in $\mathbb{R}^n$)
    • Column space $\perp$ Left nullspace (both in $\mathbb{R}^m$)
  • Why this orthogonality is the geometric backbone of $Ax = b$
  • Cross-reference to The Four Fundamental Subspaces
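The row space/nullspace orthogonality above is easy to check numerically. A minimal sketch (the matrix and nullspace vector are made up for illustration):

```python
import numpy as np

# A small rank-2 matrix; its nullspace is one-dimensional.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

# n = (1, -2, 1) satisfies A n = 0, so n lies in the nullspace.
n = np.array([1.0, -2.0, 1.0])
assert np.allclose(A @ n, 0)

# Every row of A (hence every row-space vector) is perpendicular to n.
assert np.isclose(A[0] @ n, 0.0)
assert np.isclose(A[1] @ n, 0.0)
```

Since the rows span the row space, checking each row against $n$ verifies the whole orthogonality claim for this example.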

Orthogonal Bases and Orthonormal Bases

  • Orthogonal basis: mutually perpendicular, any length
  • Orthonormal basis: mutually perpendicular, unit length
  • Why orthonormal bases are computationally ideal: $c_i = q_i^T b$ (no system to solve)
  • Orthogonal matrices: $Q^TQ = I$, $Q^{-1} = Q^T$
  • Cross-reference to Special Matrices: Orthogonal
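The "no system to solve" point can be demonstrated directly: with orthonormal columns, expansion coefficients are plain dot products. A sketch using an assumed full-rank matrix and NumPy's QR:

```python
import numpy as np

# Any full-column-rank matrix gives orthonormal columns via QR.
A = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])
Q, R = np.linalg.qr(A)          # Q has orthonormal columns

assert np.allclose(Q.T @ Q, np.eye(2))   # Q^T Q = I

# For b in the column space, the coefficients are c_i = q_i^T b:
b = A @ np.array([2.0, 3.0])    # guaranteed to lie in C(A) = C(Q)
c = Q.T @ b                     # dot products only, no linear solve
assert np.allclose(Q @ c, b)    # b is exactly reconstructed
```

With a non-orthonormal basis the same expansion would require solving a linear system; orthonormality reduces it to inner products.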

Projection onto a Line

  • Projection of $b$ onto line through $a$: $p = \frac{a^Tb}{a^Ta}a$
  • Projection matrix: $P = \frac{aa^T}{a^Ta}$
  • Error $e = b - p$ is perpendicular to $a$ (the key idea)
  • Geometric picture: dropping a perpendicular
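The line-projection formula can be verified in a few lines; the vectors here are arbitrary examples:

```python
import numpy as np

a = np.array([1.0, 2.0, 2.0])
b = np.array([3.0, 0.0, 3.0])

# p = (a^T b / a^T a) a
p = (a @ b) / (a @ a) * a

# The rank-one projection matrix P = a a^T / a^T a gives the same point.
P = np.outer(a, a) / (a @ a)
assert np.allclose(P @ b, p)

# The error e = b - p is perpendicular to a (the defining property).
e = b - p
assert np.isclose(e @ a, 0.0)
```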

Projection onto a Subspace

  • Projection of $b$ onto column space of $A$: $p = A\hat{x}$
  • Derivation from $e \perp C(A)$, i.e., $A^T(b - A\hat{x}) = 0$
  • Normal equations: $A^TA\hat{x} = A^Tb$
  • Projection matrix: $P = A(A^TA)^{-1}A^T$
  • Properties: $P^2 = P$ (idempotent), $P^T = P$ (symmetric)
  • $(I - P)$ projects onto the orthogonal complement
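The whole chain above (normal equations, projection matrix, idempotence, symmetry, orthogonal error) can be checked on a small example; the matrix and vector are illustrative choices, not from the source:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

# Normal equations: A^T A xhat = A^T b
xhat = np.linalg.solve(A.T @ A, A.T @ b)
p = A @ xhat                               # projection of b onto C(A)

# Projection matrix P = A (A^T A)^{-1} A^T (solve instead of explicit inverse)
P = A @ np.linalg.solve(A.T @ A, A.T)
assert np.allclose(P @ b, p)
assert np.allclose(P @ P, P)               # P^2 = P (idempotent)
assert np.allclose(P.T, P)                 # P^T = P (symmetric)

# The error lies in the orthogonal complement of C(A): A^T e = 0
e = b - p
assert np.allclose(A.T @ e, 0)
```

Forming $P$ explicitly is for illustration; in practice one solves the normal equations (or uses QR) rather than building the $m \times m$ projection matrix.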

Summary

Answering the Central Question: The closest point in a subspace $S$ to a vector $b$ is the orthogonal projection $p = Pb$, where $P = A(A^TA)^{-1}A^T$ is the projection matrix. The error $e = b - p$ is perpendicular to $S$, making $\|b - p\|$ the minimum possible distance. This is equivalent to solving the normal equations $A^TA\hat{x} = A^Tb$, which is exactly the least squares problem.
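The minimality claim can be probed empirically: no other point $Ax$ in the subspace is closer to $b$ than the projection $p$. A randomized sketch (the data is generated, not from the source):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 2))    # assumed full column rank
b = rng.standard_normal(5)

# Projection of b onto C(A)
P = A @ np.linalg.solve(A.T @ A, A.T)
p = P @ b
best = np.linalg.norm(b - p)

# Sample other points A x in the subspace: none beats the projection.
for _ in range(100):
    x = rng.standard_normal(2)
    assert np.linalg.norm(b - A @ x) >= best - 1e-12
```

This is not a proof, of course; the proof is the Pythagorean argument via $e \perp S$, but the experiment makes the claim concrete.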

Applications in Data Science and Machine Learning

  • Linear regression as projection: $\hat{y} = Pb$ projects the response vector onto the column space of the feature matrix
  • Residuals: $e = (I - P)b$ lives in the left nullspace — the part of $b$ unexplained by features
  • Dimensionality reduction: projection onto top-$k$ principal subspace
  • Signal vs noise decomposition: projecting data onto signal subspace, residual = noise
  • Feature orthogonalization: removing collinearity by projecting out shared directions
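The first two bullets can be seen in a toy regression. In statistics, $P$ here is called the hat matrix; the design matrix and responses below are made-up data:

```python
import numpy as np

# Design matrix X with an intercept column, plus a response vector y.
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([1.0, 2.0, 2.0, 4.0])

# Hat matrix P projects y onto C(X); fitted values yhat = P y.
P = X @ np.linalg.solve(X.T @ X, X.T)
yhat = P @ y
e = (np.eye(4) - P) @ y            # residuals

# Residuals lie in the left nullspace of X: X^T e = 0, so they are
# orthogonal to every feature column; with an intercept they sum to zero.
assert np.allclose(X.T @ e, 0)
assert np.isclose(e.sum(), 0.0)
```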

Guided Problems

References

  • Strang, Introduction to Linear Algebra, Chapter 4 (4.1–4.2)