Exponential Families
The Central Question: Is There a Unifying Framework for Common Distributions?
Bernoulli, Gaussian, Poisson, Exponential, Gamma, Beta, Categorical -- these distributions look different but share a common mathematical structure. The exponential family unifies them into a single framework with elegant properties: sufficient statistics, conjugate priors, and a direct connection to generalized linear models.
Consider these scenarios:
- The sufficient statistic for a Gaussian sample is $(\sum_i x_i, \sum_i x_i^2)$. For any exponential family, the sufficient statistic has a fixed dimension regardless of sample size, enabling efficient data compression.
- Every exponential family has a natural conjugate prior, making Bayesian inference tractable. The prior is also an exponential family with the same sufficient statistics.
- Generalized linear models (logistic regression, Poisson regression, linear regression) each correspond to choosing a different exponential family for the response distribution.
Exponential families are the theoretical backbone of classical statistical ML.
Topics to Cover
Exponential Family Canonical Form
- Natural parameter $\eta$, sufficient statistic $T(x)$, log-partition function $A(\eta)$, base measure $h(x)$
- Writing Bernoulli, Gaussian, Poisson, and others in this form
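As a numerical sanity check on the canonical form, the snippet below verifies that the Bernoulli pmf matches $h(x)\exp(\eta\, T(x) - A(\eta))$ with $T(x) = x$, $h(x) = 1$, $\eta = \log\frac{p}{1-p}$, and $A(\eta) = \log(1 + e^{\eta})$ (a minimal sketch; the function names are illustrative):

```python
import math

# Verify that the Bernoulli pmf equals its exponential-family canonical form
# p(x) = h(x) * exp(eta * T(x) - A(eta)) with h(x) = 1 and T(x) = x.
def bernoulli_pmf(x, p):
    return p**x * (1 - p)**(1 - x)

def bernoulli_canonical(x, eta):
    A = math.log(1 + math.exp(eta))       # log-partition function A(eta)
    return math.exp(eta * x - A)

p = 0.3
eta = math.log(p / (1 - p))               # natural parameter (the logit)
for x in (0, 1):
    assert abs(bernoulli_pmf(x, p) - bernoulli_canonical(x, eta)) < 1e-12
```

The same pattern works for the Poisson and Gaussian cases; only $T(x)$, $h(x)$, and $A(\eta)$ change.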
Natural Parameters and Sufficient Statistics
- Natural parameters as the "canonical" parameterization
- Sufficient statistics: $T(x)$ captures all information about $\theta$ in the data
- Factorization theorem: $T(x)$ is sufficient iff the likelihood factors through it
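The compression property above can be made concrete for the Gaussian: a single streaming pass that keeps only $(n, \sum_i x_i, \sum_i x_i^2)$ recovers the maximum-likelihood estimates exactly, no matter how many points arrive (a sketch with made-up toy data):

```python
# Streaming sufficient statistics for a Gaussian: only (n, sum x, sum x^2)
# are needed to recover the MLE; the raw data can be discarded.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

n, s1, s2 = 0, 0.0, 0.0
for x in data:                # single pass over the data
    n += 1
    s1 += x                   # sufficient statistic sum_i x_i
    s2 += x * x               # sufficient statistic sum_i x_i^2

mean = s1 / n                 # MLE of mu
var = s2 / n - mean**2        # MLE of sigma^2 (the biased/ML version)
print(mean, var)              # 5.0 4.0
```

This is why exponential families support exact online and distributed learning: sufficient statistics from separate data shards simply add.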
Conjugate Priors for Exponential Families
- For any exponential family likelihood, the conjugate prior has the form $p(\eta \mid \tau, \nu) \propto \exp\!\big(\eta^\top \tau - \nu\, A(\eta)\big)$
- The posterior updates simply by adding observed sufficient statistics
- Examples: Beta-Bernoulli, Gamma-Poisson, Normal-Normal
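The "add the sufficient statistics" update rule is easiest to see in the Beta-Bernoulli case: the posterior hyperparameters are the prior's plus the observed counts of successes and failures (a minimal sketch; the helper name and toy data are illustrative):

```python
# Beta-Bernoulli conjugacy: the posterior hyperparameters are the prior's
# plus the observed sufficient statistics (number of 1s, number of 0s).
def update_beta(alpha, beta, observations):
    heads = sum(observations)             # sufficient statistic: count of 1s
    tails = len(observations) - heads     # count of 0s
    return alpha + heads, beta + tails

alpha, beta = 2.0, 2.0                    # Beta(2, 2) prior
obs = [1, 0, 1, 1, 1, 0]                  # 4 successes, 2 failures
alpha_post, beta_post = update_beta(alpha, beta, obs)
print(alpha_post, beta_post)              # 6.0 4.0
post_mean = alpha_post / (alpha_post + beta_post)   # posterior mean = 0.6
```

Gamma-Poisson and Normal-Normal updates have exactly the same shape, with their own sufficient statistics added to the prior hyperparameters.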
Connection to Generalized Linear Models
- GLMs: $g(\mathbb{E}[y \mid x]) = w^\top x$, where $g$ is the link function
- Each GLM corresponds to a choice of exponential family for $y \mid x$
- Canonical link: when $\eta = w^\top x$ (the natural parameter equals the linear predictor)
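With the canonical link, the log-likelihood gradient of a GLM collapses to the simple form $\sum_i (y_i - \mu_i)\, x_i$. A from-scratch logistic-regression sketch (pure Python, toy data, illustrative names) shows this for the Bernoulli case:

```python
import math

# Logistic regression as a GLM: Bernoulli response with the canonical (logit)
# link, so eta = w . x and E[y|x] = sigmoid(w . x). The canonical link gives
# the gradient sum_i (y_i - mu_i) x_i.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def grad_step(w, X, y, lr=0.1):
    g = [0.0] * len(w)
    for xi, yi in zip(X, y):
        mu = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
        for j, xj in enumerate(xi):
            g[j] += (yi - mu) * xj        # canonical-link gradient
    return [wj + lr * gj for wj, gj in zip(w, g)]

X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]  # bias + one feature
y = [0, 0, 1, 1]
w = [0.0, 0.0]
for _ in range(200):                      # gradient ascent on log-likelihood
    w = grad_step(w, X, y)
```

Swapping the Bernoulli/sigmoid pair for Poisson/exp or Gaussian/identity yields Poisson and linear regression with the same gradient structure.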
Summary
Answering the Central Question: The exponential family unifies the most common distributions under a single framework. Its key properties are: (1) sufficient statistics compress data without losing information about $\theta$; (2) the log-partition function generates all moments via differentiation ($\mathbb{E}[T(x)] = \nabla A(\eta)$, $\mathrm{Cov}[T(x)] = \nabla^2 A(\eta)$); (3) conjugate priors exist automatically; (4) GLMs connect exponential families to linear models via link functions.
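The moment-generating property of $A(\eta)$ can be checked numerically. For a Poisson with rate $\lambda$, the natural parameter is $\eta = \log\lambda$ and $A(\eta) = e^{\eta}$, so both $A'(\eta)$ and $A''(\eta)$ should recover $\lambda$ (the Poisson mean and variance); a small finite-difference sketch:

```python
import math

# Poisson log-partition function: A(eta) = exp(eta) = lam, with eta = log(lam).
# Its first derivative is the mean and its second derivative is the variance,
# both equal to lam for a Poisson.
def A(eta):
    return math.exp(eta)

lam = 3.5
eta = math.log(lam)
h = 1e-5
dA = (A(eta + h) - A(eta - h)) / (2 * h)              # numerical A'(eta)
d2A = (A(eta + h) - 2 * A(eta) + A(eta - h)) / h**2   # numerical A''(eta)

assert abs(dA - lam) < 1e-4    # E[X]  = A'(eta)  = lam
assert abs(d2A - lam) < 1e-4   # Var[X] = A''(eta) = lam
```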
Applications in Data Science and Machine Learning
- Generalized linear models: Logistic regression (Bernoulli), Poisson regression (Poisson), and linear regression (Gaussian) are all GLMs based on exponential families
- Sufficient statistics: Enable efficient learning without storing all data points
- Variational inference: The exponential family structure enables mean-field variational inference with closed-form coordinate updates
- Natural gradient descent: The Fisher information of exponential families has a simple form, enabling efficient natural gradient computation
- Boltzmann machines and energy-based models: Follow exponential family structure with learned sufficient statistics
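The natural-gradient bullet above has a one-line justification in the natural parameterization: the Fisher information is exactly $\nabla^2 A(\eta)$, so a natural-gradient step just rescales the ordinary gradient by its inverse. A toy Bernoulli maximum-likelihood sketch (the setup and data are illustrative, not from the text):

```python
import math

# For an exponential family in natural parameters, the Fisher information is
# A''(eta), so a natural-gradient step is grad / A''(eta) (Newton-like here).
def A(eta):                       # Bernoulli log-partition function
    return math.log(1 + math.exp(eta))

def dA(eta):                      # A'(eta) = sigmoid(eta) = E[x]
    return 1 / (1 + math.exp(-eta))

def d2A(eta):                     # A''(eta) = Var[x] = Fisher information
    s = dA(eta)
    return s * (1 - s)

data = [1, 1, 0, 1, 1, 0, 1, 1]   # 6 ones out of 8 -> MLE p = 0.75
xbar = sum(data) / len(data)
eta = 0.0
for _ in range(20):
    grad = xbar - dA(eta)         # gradient of the average log-likelihood
    eta += grad / d2A(eta)        # natural-gradient step

assert abs(dA(eta) - xbar) < 1e-8 # converges to the MLE mean parameter
```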
Guided Problems
References
- Bishop, Christopher - Pattern Recognition and Machine Learning, Chapter 2.4
- Murphy, Kevin - Machine Learning: A Probabilistic Perspective, Chapter 9
- Wainwright and Jordan - Graphical Models, Exponential Families, and Variational Inference
- Deisenroth, Faisal, and Ong - Mathematics for Machine Learning, Chapter 6.6