Section 2: Problems

University-level exam questions for Matrix Calculus and Automatic Differentiation.

Matrix Calculus

Problem 1.1

Compute $\frac{\partial}{\partial X} \text{tr}(AXB)$ where $A$, $X$, $B$ are matrices of compatible dimensions.

Difficulty: Medium
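Not part of the exam statement, but a quick finite-difference sanity check (a sketch, assuming NumPy) against one candidate answer, $(BA)^T = A^T B^T$:

```python
import numpy as np

rng = np.random.default_rng(0)
p, m, n = 3, 4, 5
A = rng.standard_normal((p, m))
X = rng.standard_normal((m, n))
B = rng.standard_normal((n, p))

def f(X):
    return np.trace(A @ X @ B)

# Central finite differences, entry by entry
eps = 1e-6
G = np.zeros_like(X)
for i in range(m):
    for j in range(n):
        E = np.zeros_like(X)
        E[i, j] = eps
        G[i, j] = (f(X + E) - f(X - E)) / (2 * eps)

candidate = A.T @ B.T  # = (BA)^T, the claimed gradient
print(np.allclose(G, candidate, atol=1e-5))
```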

Problem 1.2

Show that $\frac{\partial}{\partial X} \log\det(X) = X^{-T}$ for a positive definite matrix $X$.

Difficulty: Hard
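A numerical check (a sketch, assuming NumPy) of the claimed identity on a random positive definite matrix, treating every entry of $X$ as an independent variable:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
M = rng.standard_normal((n, n))
X = M @ M.T + n * np.eye(n)  # positive definite by construction

def f(X):
    # slogdet is numerically safer than log(det(X))
    return np.linalg.slogdet(X)[1]

eps = 1e-6
G = np.zeros_like(X)
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n))
        E[i, j] = eps
        G[i, j] = (f(X + E) - f(X - E)) / (2 * eps)

print(np.allclose(G, np.linalg.inv(X).T, atol=1e-5))
```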

Problem 1.3

Derive $\partial(X^{-1}) = -X^{-1} (\partial X) X^{-1}$ using the identity $XX^{-1} = I$.

Difficulty: Medium
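A sketch of the intended derivation, differentiating both sides of $XX^{-1} = I$ with the product rule:

```latex
\partial(XX^{-1}) = (\partial X)\,X^{-1} + X\,\partial(X^{-1}) = \partial I = 0
\quad\Longrightarrow\quad
\partial(X^{-1}) = -X^{-1}\,(\partial X)\,X^{-1}.
```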

Problem 1.4

For the linear regression loss $L = \|y - X\beta\|^2$, compute $\frac{\partial L}{\partial \beta}$ and $\frac{\partial^2 L}{\partial \beta^2}$ using matrix calculus.

Difficulty: Medium
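A finite-difference check (a sketch, assuming NumPy) against the candidate answers $\nabla_\beta L = 2X^T(X\beta - y)$ and $\nabla^2_\beta L = 2X^T X$:

```python
import numpy as np

rng = np.random.default_rng(2)
n_samples, n_features = 20, 3
X = rng.standard_normal((n_samples, n_features))
y = rng.standard_normal(n_samples)
beta = rng.standard_normal(n_features)

def L(b):
    r = y - X @ b
    return r @ r

grad = 2 * X.T @ (X @ beta - y)  # candidate gradient
hess = 2 * X.T @ X               # candidate Hessian (constant in beta)

# Central finite differences of L along each coordinate
eps = 1e-6
g_fd = np.array([(L(beta + eps * e) - L(beta - eps * e)) / (2 * eps)
                 for e in np.eye(n_features)])
print(np.allclose(grad, g_fd, atol=1e-4))
```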

Automatic Differentiation

Problem 2.1

Draw the computational graph for $f(x_1, x_2) = \ln(x_1) + x_1 x_2 - \sin(x_2)$ and compute the gradient using reverse-mode AD.

Difficulty: Medium
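A hand-unrolled reverse-mode sketch (plain Python, not a general AD system) of the forward and backward passes for this graph:

```python
import math

def f_and_grad(x1, x2):
    # Forward pass: evaluate and record intermediates
    v1 = math.log(x1)
    v2 = x1 * x2
    v3 = math.sin(x2)
    f = v1 + v2 - v3
    # Reverse pass: propagate adjoints from the output back to the inputs
    v1_bar, v2_bar, v3_bar = 1.0, 1.0, -1.0
    x1_bar = v1_bar * (1.0 / x1) + v2_bar * x2       # 1/x1 + x2
    x2_bar = v2_bar * x1 + v3_bar * math.cos(x2)     # x1 - cos(x2)
    return f, (x1_bar, x2_bar)

val, (g1, g2) = f_and_grad(2.0, 5.0)
print(g1, g2)  # g1 = 1/2 + 5 = 5.5, g2 = 2 - cos(5)
```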

Problem 2.2

Explain why forward-mode AD is efficient for $f: \mathbb{R} \to \mathbb{R}^m$ and reverse-mode AD is efficient for $f: \mathbb{R}^n \to \mathbb{R}$. What are the computational costs of each?

Difficulty: Medium

Problem 2.3

Implement dual numbers for forward-mode AD and verify on $f(x) = x^2 + 2x + 1$ that $f'(3) = 8$.

Difficulty: Medium
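A minimal sketch of one possible implementation (only the `+` and `*` overloads needed for this polynomial; a full solution would cover more operations):

```python
class Dual:
    """Dual number a + b*eps with eps**2 = 0; b carries the derivative."""
    def __init__(self, a, b=0.0):
        self.a, self.b = a, b

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.a + other.a, self.b + other.b)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (a + b eps)(c + d eps) = ac + (ad + bc) eps
        return Dual(self.a * other.a, self.a * other.b + self.b * other.a)
    __rmul__ = __mul__

def f(x):
    return x * x + 2 * x + 1

d = f(Dual(3.0, 1.0))  # seed derivative 1 at x = 3
print(d.a, d.b)  # value f(3) = 16, derivative f'(3) = 8
```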

Challenge Problems

Problem 3.1

Derive the backpropagation equations for a two-layer neural network with ReLU activations and cross-entropy loss, identifying each step as a VJP computation.

Difficulty: Very Hard

Problem 3.2

Prove that the memory cost of reverse-mode AD is proportional to the number of operations in the computational graph.

Difficulty: Hard


Solutions

Solutions are available in the implementation file with verification code.