The Multivariate Gaussian
The Central Question: Why Is the Gaussian Distribution So Central to Machine Learning?
The multivariate Gaussian (multivariate normal, MVN) is the most important distribution in machine learning. Its closed-form conditioning and marginalization formulas make it analytically tractable. The central limit theorem makes it a natural model for aggregated noise. And many ML algorithms (GP regression, Kalman filters, LDA) are fundamentally Gaussian.
Consider these scenarios:
- Gaussian process regression places a multivariate Gaussian prior over function values. Conditioning on observed data gives a posterior that is also Gaussian, with closed-form mean and variance.
- The Kalman filter models state evolution and observations as linear-Gaussian. All updates are applications of the MVN conditioning formula.
- Linear discriminant analysis assumes each class has a Gaussian distribution, and the decision boundary comes from comparing Gaussian densities.
The MVN is the workhorse distribution of probabilistic ML.
Topics to Cover
MVN Definition and Parameterization
- $\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ with density $p(\mathbf{x}) = (2\pi)^{-d/2}\,|\boldsymbol{\Sigma}|^{-1/2}\exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^\top\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right)$
- Mean vector $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$
- Precision matrix $\boldsymbol{\Lambda} = \boldsymbol{\Sigma}^{-1}$ and information form
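As a quick sanity check on the definition, the log-density can be evaluated directly from $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$. A minimal NumPy sketch (function name and example values are illustrative, not from the notes):

```python
import numpy as np

def mvn_logpdf(x, mu, Sigma):
    """Log-density of N(mu, Sigma) at x, via a linear solve (no explicit inverse)."""
    d = len(mu)
    diff = x - mu
    maha = diff @ np.linalg.solve(Sigma, diff)       # (x-mu)^T Sigma^{-1} (x-mu)
    _, logdet = np.linalg.slogdet(Sigma)             # log |Sigma|, numerically stable
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

mu = np.zeros(2)
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([1.0, -1.0])
print(mvn_logpdf(x, mu, Sigma))
```

Using `solve` and `slogdet` rather than `inv` and `det` avoids forming $\boldsymbol{\Sigma}^{-1}$ explicitly, which matters in higher dimensions.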
Conditioning and Marginalization Formulas
- Partition: $\mathbf{x} = \begin{pmatrix}\mathbf{x}_1\\ \mathbf{x}_2\end{pmatrix}$, $\boldsymbol{\mu} = \begin{pmatrix}\boldsymbol{\mu}_1\\ \boldsymbol{\mu}_2\end{pmatrix}$, $\boldsymbol{\Sigma} = \begin{pmatrix}\boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12}\\ \boldsymbol{\Sigma}_{21} & \boldsymbol{\Sigma}_{22}\end{pmatrix}$
- Marginal: $\mathbf{x}_1 \sim \mathcal{N}(\boldsymbol{\mu}_1, \boldsymbol{\Sigma}_{11})$
- Conditional: $\mathbf{x}_1 \mid \mathbf{x}_2 \sim \mathcal{N}\big(\boldsymbol{\mu}_1 + \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}(\mathbf{x}_2 - \boldsymbol{\mu}_2),\ \boldsymbol{\Sigma}_{11} - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}\big)$
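The partitioned formulas translate directly into a few lines of NumPy. A sketch with illustrative numbers (the 3D example and block split are assumptions):

```python
import numpy as np

# Joint Gaussian over (x1, x2); x1 = first coordinate, x2 = the remaining two
mu = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[2.0, 0.6, 0.3],
                  [0.6, 1.5, 0.4],
                  [0.3, 0.4, 1.0]])
i1, i2 = [0], [1, 2]

mu1, mu2 = mu[i1], mu[i2]
S11 = Sigma[np.ix_(i1, i1)]
S12 = Sigma[np.ix_(i1, i2)]
S21 = Sigma[np.ix_(i2, i1)]
S22 = Sigma[np.ix_(i2, i2)]

x2 = np.array([0.5, 0.0])   # observed value of x2

# Conditional: x1 | x2 ~ N(mu1 + S12 S22^{-1} (x2 - mu2), S11 - S12 S22^{-1} S21)
cond_mean = mu1 + S12 @ np.linalg.solve(S22, x2 - mu2)
cond_cov = S11 - S12 @ np.linalg.solve(S22, S21)
print(cond_mean, cond_cov)
```

Note that the conditional covariance (a Schur complement) is never larger than the marginal covariance $\boldsymbol{\Sigma}_{11}$: observing $\mathbf{x}_2$ can only reduce uncertainty about $\mathbf{x}_1$.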
Mahalanobis Distance
- $d_{\boldsymbol{\Sigma}}(\mathbf{x}, \boldsymbol{\mu}) = \sqrt{(\mathbf{x}-\boldsymbol{\mu})^\top\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})}$: the Gaussian's notion of "distance"
- Reduces to Euclidean distance when $\boldsymbol{\Sigma} = \mathbf{I}$
- Level curves of the Gaussian are ellipsoids defined by constant Mahalanobis distance
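A small sketch showing both properties: the distance recovers the Euclidean case for $\boldsymbol{\Sigma} = \mathbf{I}$, and the same point looks "closer" along a high-variance direction (values are illustrative):

```python
import numpy as np

def mahalanobis(x, mu, Sigma):
    """Mahalanobis distance sqrt((x-mu)^T Sigma^{-1} (x-mu))."""
    diff = x - mu
    return np.sqrt(diff @ np.linalg.solve(Sigma, diff))

mu = np.zeros(2)
x = np.array([3.0, 0.0])

# With Sigma = I, this is just the Euclidean norm of x - mu
print(mahalanobis(x, mu, np.eye(2)))   # 3.0

# Variance 9 along the first axis shrinks the distance by a factor of 3
Sigma = np.diag([9.0, 1.0])
print(mahalanobis(x, mu, Sigma))       # 1.0
```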
Affine Transformations
- If $\mathbf{x} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ and $\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{b}$, then $\mathbf{y} \sim \mathcal{N}(\mathbf{A}\boldsymbol{\mu} + \mathbf{b},\ \mathbf{A}\boldsymbol{\Sigma}\mathbf{A}^\top)$
- Whitening: choosing $\mathbf{A} = \boldsymbol{\Sigma}^{-1/2}$ to decorrelate
- Connection to spectral theory (eigendecomposition of $\boldsymbol{\Sigma}$)
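Both the affine rule and whitening can be checked empirically by sampling. A NumPy sketch (matrices, seed, and sample size are illustrative assumptions), with $\boldsymbol{\Sigma}^{-1/2}$ built from the eigendecomposition:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
X = rng.multivariate_normal(mu, Sigma, size=200_000)

# Affine rule: y = A x + b  =>  y ~ N(A mu + b, A Sigma A^T)
A = np.array([[1.0, 1.0],
              [0.0, 2.0]])
b = np.array([0.5, 0.0])
Y = X @ A.T + b
print(np.cov(Y.T))   # approximates A Sigma A^T

# Whitening: A = Sigma^{-1/2} via eigendecomposition gives identity covariance
w, V = np.linalg.eigh(Sigma)
W = V @ np.diag(w ** -0.5) @ V.T   # symmetric inverse square root of Sigma
Z = (X - mu) @ W.T
print(np.cov(Z.T))   # close to the 2x2 identity
```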
Summary
Answering the Central Question: The Gaussian is central because: (1) it is closed under marginalization, conditioning, and affine transformation, making it analytically tractable; (2) the central limit theorem makes it a natural model for averaged quantities; (3) it is the maximum entropy distribution for given mean and variance; (4) many ML algorithms (GPs, Kalman filters, LDA, factor analysis) are built on Gaussian assumptions. The key formulas are the conditioning formula $\mathbf{x}_1 \mid \mathbf{x}_2 \sim \mathcal{N}\big(\boldsymbol{\mu}_1 + \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}(\mathbf{x}_2 - \boldsymbol{\mu}_2),\ \boldsymbol{\Sigma}_{11} - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}\big)$ and the affine transformation $\mathbf{A}\mathbf{x} + \mathbf{b} \sim \mathcal{N}(\mathbf{A}\boldsymbol{\mu} + \mathbf{b},\ \mathbf{A}\boldsymbol{\Sigma}\mathbf{A}^\top)$.
Applications in Data Science and Machine Learning
- Gaussian process regression: The posterior over function values is computed using the MVN conditioning formula
- Kalman filter: State estimation via sequential application of MVN conditioning
- Linear discriminant analysis: Assumes class-conditional Gaussian distributions, derives linear decision boundaries
- Factor analysis and PCA: Model data as a low-rank Gaussian plus noise
- Variational autoencoders: The encoder outputs parameters of a Gaussian, and the reparameterization trick samples from it
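To make the GP-regression bullet concrete, here is a minimal sketch of the posterior at one test point, obtained purely from the MVN conditioning formula (the RBF kernel, hyperparameters, and data are illustrative assumptions, not a full GP implementation):

```python
import numpy as np

def rbf(a, b, lengthscale=1.0):
    """RBF kernel matrix between 1-D input arrays a and b."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale ** 2)

X_train = np.array([-1.0, 0.0, 1.0])
y_train = np.sin(X_train)
X_test = np.array([0.5])
noise = 1e-6   # tiny jitter for numerical stability

K = rbf(X_train, X_train) + noise * np.eye(len(X_train))   # Sigma_22
K_s = rbf(X_test, X_train)                                 # Sigma_12
K_ss = rbf(X_test, X_test)                                 # Sigma_11

# Conditioning formula with zero prior mean:
# mean = K_s K^{-1} y,  cov = K_ss - K_s K^{-1} K_s^T
post_mean = K_s @ np.linalg.solve(K, y_train)
post_cov = K_ss - K_s @ np.linalg.solve(K, K_s.T)
print(post_mean, post_cov)
```

The posterior mean interpolates the training data and the posterior variance shrinks near observed inputs, all without any optimization: it is one application of the conditioning formula.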
Guided Problems
References
- Bishop, Christopher. Pattern Recognition and Machine Learning, Section 2.3
- Murphy, Kevin. Machine Learning: A Probabilistic Perspective, Chapter 4
- Deisenroth, Faisal, and Ong. Mathematics for Machine Learning, Section 6.5
- Rasmussen and Williams. Gaussian Processes for Machine Learning, Chapter 2