Probability & Statistics for Machine Learning
Introduction
This probability and statistics course is structured around the concepts most essential for machine learning, drawing from Harvard Stat 110, CMU 36-700, Bishop's Pattern Recognition and Machine Learning, and Murphy's Machine Learning: A Probabilistic Perspective. The material is organized into three sections, progressing from foundational probability, through multivariate distributions and estimation, to advanced topics that underpin modern ML.
The Three Sections
Section 1: Probability Foundations
Core probability theory: axioms, random variables, distributions, and moments.
Topics:
- Probability axioms, conditional probability, Bayes' theorem (see the sketch after this list), independence
- Random variables: discrete and continuous, PMF, PDF, CDF
- Common distributions: Bernoulli, Binomial, Poisson, Gaussian, Exponential, Beta, and more
- Expectation, variance, covariance, correlation, and moments
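As a quick illustration of Bayes' theorem from the list above, here is a minimal sketch in plain Python. The disease-testing numbers (1% prevalence, 95% sensitivity, 90% specificity) are made up for illustration:

```python
# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
# All numbers below are illustrative assumptions, not real test statistics.
prior = 0.01            # P(disease)
sensitivity = 0.95      # P(positive | disease)
specificity = 0.90      # P(negative | no disease)

# Law of total probability: P(positive) over both disease states.
p_positive = sensitivity * prior + (1 - specificity) * (1 - prior)

# Posterior probability of disease given a positive test.
posterior = sensitivity * prior / p_positive
print(f"P(disease | positive) = {posterior:.3f}")  # ~0.088
```

Despite the accurate-sounding test, the posterior is under 9%: with a rare condition, false positives from the large healthy population dominate. This base-rate effect is exactly the kind of reasoning Bayes' theorem makes precise.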
Section 2: Multivariate Distributions and Estimation
Joint distributions, the multivariate Gaussian, and parameter estimation.
Topics:
- Joint, marginal, and conditional distributions; sum and product rules
- The multivariate Gaussian: conditioning (see the sketch after this list), marginalization, Mahalanobis distance
- Maximum likelihood estimation, MAP, Fisher information, Cramér-Rao bound
- Bayesian inference: priors, posteriors, conjugacy, posterior predictive distributions
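To make Gaussian conditioning concrete, here is a minimal NumPy sketch for a bivariate Gaussian with made-up parameters. It applies the standard formulas for a partitioned Gaussian: the conditional mean is mu1 + Sigma12 Sigma22^-1 (x2 - mu2) and the conditional covariance is Sigma11 - Sigma12 Sigma22^-1 Sigma21:

```python
import numpy as np

# Bivariate Gaussian with illustrative (made-up) parameters.
mu = np.array([0.0, 1.0])          # (mu1, mu2)
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])     # joint covariance

mu1, mu2 = mu[0], mu[1]
S11, S12 = Sigma[0, 0], Sigma[0, 1]
S21, S22 = Sigma[1, 0], Sigma[1, 1]

# Condition on an observed value of x2.
x2 = 2.0
cond_mean = mu1 + S12 / S22 * (x2 - mu2)   # mu1 + Sigma12 Sigma22^-1 (x2 - mu2)
cond_var = S11 - S12 / S22 * S21           # Sigma11 - Sigma12 Sigma22^-1 Sigma21

print(f"x1 | x2={x2}: mean={cond_mean:.3f}, var={cond_var:.3f}")
# mean = 0 + 0.8 * (2 - 1) = 0.8;  var = 2 - 0.64 = 1.36
```

Note that the conditional variance shrinks relative to the marginal variance of x1 and does not depend on the observed x2, a distinctive property of the Gaussian that underlies Gaussian-process regression.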
Section 3: Advanced Topics for ML
Exponential families, information theory, and concentration inequalities.
Topics:
- Exponential family distributions, natural parameters, sufficient statistics, connection to GLMs
- Entropy, cross-entropy, KL divergence, mutual information, connections to loss functions (see the sketch after this list)
- Law of large numbers, CLT, Hoeffding/Chernoff bounds, PAC learning connections
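As a small numerical check of the entropy identities above, the following NumPy sketch (the distributions p and q are made up for illustration) verifies that cross-entropy decomposes as H(p, q) = H(p) + KL(p || q):

```python
import numpy as np

# Two made-up discrete distributions over three outcomes.
p = np.array([0.5, 0.3, 0.2])   # "true" data distribution
q = np.array([0.4, 0.4, 0.2])   # model distribution

entropy = -np.sum(p * np.log(p))         # H(p)
cross_entropy = -np.sum(p * np.log(q))   # H(p, q)
kl = np.sum(p * np.log(p / q))           # KL(p || q)

# H(p, q) = H(p) + KL(p || q): since H(p) is fixed by the data,
# minimizing cross-entropy loss in q is equivalent to minimizing KL.
print(f"H(p)        = {entropy:.4f}")
print(f"H(p, q)     = {cross_entropy:.4f}")
print(f"H(p) + KL   = {entropy + kl:.4f}")   # matches H(p, q)
```

This identity is the reason cross-entropy is the default classification loss: training a model to minimize it drives the model distribution q toward the data distribution p in the KL sense.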