
Integration for ML


The Central Question: How Do We Compute the Sums and Averages That Probability Requires?

Probability and statistics are built on integration: expectations are integrals, marginalizations require integrating out variables, and normalizing constants ensure densities integrate to one. Many of these integrals have no closed form, making approximation methods essential.

Consider these scenarios:

  1. The posterior in Bayesian inference is $p(\theta \mid D) = \frac{p(D \mid \theta)\,p(\theta)}{p(D)}$. The denominator $p(D) = \int p(D \mid \theta)\,p(\theta)\,d\theta$ is often an intractable integral.
  2. Computing $E[f(X)]$ for a complex distribution requires $\int f(x)\,p(x)\,dx$. Monte Carlo approximation replaces this with a sample average (see the sketch after this list).
  3. In normalizing flows, the change-of-variables formula transforms integrals via the Jacobian determinant, connecting integration to linear algebra.
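
A minimal sketch of scenario 2, using the illustrative choices $X \sim N(0, 1)$ and $f(x) = x^2$ so that the exact answer $E[X^2] = \mathrm{Var}(X) = 1$ is known:

```python
# Monte Carlo approximation of E[f(X)]: average f over samples from p.
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    return x ** 2

n = 100_000
samples = rng.normal(loc=0.0, scale=1.0, size=n)  # x_i ~ p = N(0, 1)
estimate = f(samples).mean()                      # (1/N) * sum_i f(x_i)

print(f"Monte Carlo estimate: {estimate:.4f} (exact: 1.0)")
```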

Integration is the bridge between probability models and computable quantities.


Topics to Cover

Computing Expectations

  • $E[f(X)] = \int f(x)\,p(x)\,dx$ (continuous) or $\sum_x f(x)\,p(x)$ (discrete)
  • Linearity of expectation
  • Expectations of common distributions
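
For one-dimensional densities, expectations can also be checked by numerical quadrature. A sketch using the illustrative case $X \sim N(0, 1)$ and $f(x) = x^2 + 3x$, where linearity of expectation predicts $E[X^2] + 3E[X] = 1 + 0 = 1$:

```python
# Compute E[f(X)] = ∫ f(x) p(x) dx by numerical quadrature.
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

f = lambda x: x**2 + 3*x
integrand = lambda x: f(x) * norm.pdf(x)  # f(x) p(x) with p = N(0, 1)

value, _ = quad(integrand, -np.inf, np.inf)
print(f"E[X^2 + 3X] = {value:.4f} (linearity predicts 1.0)")
```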

Marginalization

  • $p(x) = \int p(x, y)\,dy$: integrating out variables
  • Applications in latent variable models
  • Connection to the sum rule of probability
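
A sketch of marginalization by numerical integration, assuming an illustrative correlated bivariate Gaussian whose $x$-marginal is known analytically to be $N(0, 1)$:

```python
# Marginalize y out of a bivariate Gaussian p(x, y) on a grid and compare
# with the known analytic marginal p(x) = N(0, 1).
import numpy as np
from scipy.integrate import trapezoid
from scipy.stats import multivariate_normal, norm

rho = 0.7  # illustrative correlation
joint = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])

x = np.linspace(-6, 6, 241)
y = np.linspace(-6, 6, 241)
X, Y = np.meshgrid(x, y, indexing="ij")
pxy = joint.pdf(np.dstack([X, Y]))   # joint density evaluated on the grid

p_x = trapezoid(pxy, y, axis=1)      # p(x) = ∫ p(x, y) dy, trapezoid rule
print(f"max |numeric - analytic|: {np.max(np.abs(p_x - norm.pdf(x))):.2e}")
```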

Normalizing Constants

  • $p(x) = \frac{1}{Z}\tilde{p}(x)$ where $Z = \int \tilde{p}(x)\,dx$
  • Partition functions in exponential families and energy-based models
  • When $Z$ is tractable vs. intractable
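
A sketch of numerical normalization, using the illustrative unnormalized density $\tilde{p}(x) = e^{-x^4}$, whose normalizing constant is not an elementary expression:

```python
# Normalize p~(x) = exp(-x^4): compute Z by quadrature, then verify that
# p(x) = p~(x) / Z integrates to one.
import numpy as np
from scipy.integrate import quad

p_tilde = lambda x: np.exp(-x**4)

Z, _ = quad(p_tilde, -np.inf, np.inf)
total, _ = quad(lambda x: p_tilde(x) / Z, -np.inf, np.inf)
print(f"Z = {Z:.6f}, normalized density integrates to {total:.6f}")
```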

Monte Carlo Integration

  • Basic Monte Carlo: $\hat{I} = \frac{1}{N}\sum_{i=1}^N f(x_i)$ where $x_i \sim p$
  • Convergence rate: $O(1/\sqrt{N})$ regardless of dimension
  • Importance sampling: $E_p[f] = E_q\left[\frac{f(x)\,p(x)}{q(x)}\right]$
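
A sketch contrasting basic Monte Carlo with importance sampling on the tail probability $I = P(X > 4)$ for $X \sim N(0, 1)$; the proposal $q = N(4, 1)$ is an illustrative choice that concentrates samples where the indicator is nonzero:

```python
# Basic Monte Carlo vs. importance sampling for a rare-event integral.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 10_000

# Basic Monte Carlo: f(x) = 1{x > 4}, x_i ~ p = N(0, 1)
x_p = rng.normal(0, 1, n)
basic = np.mean(x_p > 4)

# Importance sampling: E_p[f] = E_q[f(x) p(x) / q(x)], x_i ~ q = N(4, 1)
x_q = rng.normal(4, 1, n)
weights = norm.pdf(x_q, 0, 1) / norm.pdf(x_q, 4, 1)
is_est = np.mean((x_q > 4) * weights)

print(f"basic MC: {basic:.2e}, importance sampling: {is_est:.2e}, "
      f"exact: {norm.sf(4):.2e}")
```

With $10^4$ draws, basic Monte Carlo typically sees no samples beyond 4 and returns 0, while the importance-sampling estimate lands close to the exact $P(X > 4) \approx 3.2 \times 10^{-5}$.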

Change of Variables with the Jacobian

  • $\int_A f(x)\,dx = \int_{g^{-1}(A)} f(g(u))\,|\det(J_g(u))|\,du$
  • Connection to determinants
  • Application to probability density transformation
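
A 1D sketch of the density transformation, assuming the illustrative map $x = g(u) = e^u$ with $U \sim N(0, 1)$; the change-of-variables formula gives $p_X(x) = p_U(\ln x)\,|\tfrac{d}{dx}\ln x| = p_U(\ln x)/x$, which is the standard log-normal density:

```python
# Verify the change-of-variables density against SciPy's log-normal.
import numpy as np
from scipy.stats import norm, lognorm

x = np.linspace(0.05, 5, 100)
p_x = norm.pdf(np.log(x)) / x   # p_U(g^{-1}(x)) |det J_{g^{-1}}(x)|
print(f"max deviation: {np.max(np.abs(p_x - lognorm.pdf(x, s=1.0))):.2e}")
```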

Summary

Answering the Central Question: Integration is how we compute expectations ($E[f(X)] = \int f(x)\,p(x)\,dx$), marginalize out variables ($p(x) = \int p(x, y)\,dy$), and normalize densities ($Z = \int \tilde{p}(x)\,dx$). When these integrals are intractable, Monte Carlo methods approximate them with sample averages that converge at $O(1/\sqrt{N})$. The change-of-variables formula with the Jacobian determinant allows us to transform integrals between coordinate systems, which is fundamental to normalizing flows and density estimation.


Applications in Data Science and Machine Learning

  • Bayesian inference: Computing posterior distributions requires marginalizing over parameters (intractable integrals motivate MCMC and variational inference)
  • Expectation-maximization (EM): The E-step computes expected sufficient statistics, requiring integration over latent variables
  • Normalizing flows: The change-of-variables formula with Jacobian determinants enables tractable density estimation
  • Monte Carlo methods: MCMC, importance sampling, and particle filters approximate intractable integrals
  • Variational inference: The ELBO is an expectation that lower-bounds the log-evidence
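
To make the last bullet concrete, here is a hedged sketch of a Monte Carlo ELBO estimate for a conjugate toy model ($p(z) = N(0, 1)$, $p(x \mid z) = N(z, 1)$, variational family $q(z) = N(\mu, s^2)$ with illustrative values of $\mu$ and $s$); the log-evidence has the closed form $\log p(x)$ with $p(x) = N(0, 2)$ here, so the lower bound can be verified directly:

```python
# Monte Carlo estimate of the ELBO = E_q[log p(x, z) - log q(z)].
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x_obs, mu, s = 1.5, 0.5, 0.9    # observed datum and variational parameters

z = rng.normal(mu, s, size=100_000)                 # z_i ~ q
log_joint = norm.logpdf(z, 0, 1) + norm.logpdf(x_obs, z, 1)
elbo = np.mean(log_joint - norm.logpdf(z, mu, s))

print(f"ELBO = {elbo:.4f} <= log p(x) = "
      f"{norm.logpdf(x_obs, 0, np.sqrt(2)):.4f}")
```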

Guided Problems


References

  1. Deisenroth, Faisal, and Ong - Mathematics for Machine Learning, Chapter 6.3
  2. Bishop, Christopher - Pattern Recognition and Machine Learning, Chapter 11 (Sampling Methods)
  3. Murphy, Kevin - Machine Learning: A Probabilistic Perspective, Chapter 24 (Monte Carlo Methods)
  4. Papamakarios et al. - Normalizing Flows for Probabilistic Modeling and Inference (2021)