Probability & Statistics for Machine Learning

Introduction

This probability and statistics course is structured around the concepts most essential for machine learning, drawing on Harvard Stat 110, CMU 36-700, Bishop's Pattern Recognition and Machine Learning, and Murphy's Machine Learning: A Probabilistic Perspective. The material is organized into three sections, progressing from foundational probability, through multivariate distributions and estimation, to advanced topics that underpin modern ML.

The Three Sections

Section 1: Probability Foundations

Core probability theory: axioms, random variables, distributions, and moments.

Topics:

  • Probability axioms, conditional probability, Bayes' theorem, independence
  • Random variables: discrete and continuous, PMF, PDF, CDF
  • Common distributions: Bernoulli, Binomial, Poisson, Gaussian, Exponential, Beta, and more
  • Expectation, variance, covariance, correlation, and moments
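To make the first bullet concrete, here is a minimal sketch of Bayes' theorem applied to a diagnostic test. The numbers (1% prevalence, 95% sensitivity, 5% false-positive rate) are hypothetical, chosen only to illustrate the computation:

```python
# Hypothetical diagnostic-test numbers, for illustration only.
prior = 0.01            # P(disease): prevalence
sensitivity = 0.95      # P(positive | disease)
false_positive = 0.05   # P(positive | no disease)

# P(positive) via the law of total probability
evidence = sensitivity * prior + false_positive * (1 - prior)

# P(disease | positive) via Bayes' theorem
posterior = sensitivity * prior / evidence

print(f"P(disease | positive test) = {posterior:.3f}")  # roughly 0.161
```

Even with a fairly accurate test, the posterior stays low because the prior (prevalence) is small, a classic illustration of why the prior term in Bayes' theorem matters.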

Section 2: Multivariate Distributions and Estimation

Joint distributions, the multivariate Gaussian, and parameter estimation.

Topics:

  • Joint, marginal, and conditional distributions; sum and product rules
  • The multivariate Gaussian: conditioning, marginalization, Mahalanobis distance
  • Maximum likelihood estimation, MAP, Fisher information, Cramér-Rao bound
  • Bayesian inference: priors, posteriors, conjugacy, posterior predictive distributions
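The estimation and conjugacy bullets can be sketched together with a Bernoulli coin-flip example. The data and the Beta(2, 2) prior below are hypothetical, used only to show how the MLE and the conjugate posterior mean differ:

```python
# Hypothetical coin-flip data (1 = heads), for illustration only.
flips = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
n, k = len(flips), sum(flips)

# Maximum likelihood estimate of the heads probability: k / n
mle = k / n

# Conjugate Bayesian update: with a Beta(a, b) prior on the heads
# probability, the posterior is Beta(a + k, b + n - k), whose mean
# is (a + k) / (a + b + n).
a, b = 2.0, 2.0  # a mildly informative prior
posterior_mean = (a + k) / (a + b + n)

print(f"MLE = {mle:.2f}, posterior mean = {posterior_mean:.3f}")
```

Note how the posterior mean is pulled from the MLE toward the prior mean of 0.5; as n grows, the two estimates converge.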

Section 3: Advanced Topics for ML

Exponential families, information theory, and concentration inequalities.

Topics:

  • Exponential family distributions, natural parameters, sufficient statistics, connection to GLMs
  • Entropy, cross-entropy, KL divergence, mutual information, connections to loss functions
  • Law of large numbers, CLT, Hoeffding/Chernoff bounds, PAC learning connections
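The information-theory bullet can be illustrated with a minimal sketch, assuming two small made-up discrete distributions, verifying the identity that links cross-entropy loss to KL divergence:

```python
import math

# Two hypothetical discrete distributions over three outcomes.
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

# Entropy H(p), cross-entropy H(p, q), and KL divergence D_KL(p || q), in nats.
entropy = -sum(pi * math.log(pi) for pi in p)
cross_entropy = -sum(pi * math.log(qi) for pi, qi in zip(p, q))
kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# The identity D_KL(p || q) = H(p, q) - H(p) is why minimizing
# cross-entropy loss is equivalent to minimizing KL divergence to
# the data distribution (the H(p) term is constant in q).
assert abs(kl - (cross_entropy - entropy)) < 1e-12
print(f"H(p) = {entropy:.4f}, H(p, q) = {cross_entropy:.4f}, KL = {kl:.4f}")
```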