
Matrix Operations


The Central Question: How Do We Compute With Matrices Efficiently?

In practice, we need to compute with matrices: multiply them, solve systems, and decompose them into simpler pieces.

Naive approaches are expensive. Solving $Ax = b$ by computing $A^{-1}$ costs $O(n^3)$ operations and is numerically unstable.

Instead, we factorize $A$ into structured components (triangular, orthogonal, diagonal) that are cheap to work with. This is the core idea behind every numerical linear algebra algorithm used in machine learning.


Matrix Multiplication

Four Ways to See $AB$

The same product $C = AB$ can be interpreted in four equivalent ways. Each reveals different structure.

1. Entry-level (Dot Product)

Each entry $C_{ij}$ is the dot product of row $i$ of $A$ with column $j$ of $B$:

$$C_{ij} = (\text{row } i \text{ of } A) \cdot (\text{col } j \text{ of } B)$$

2. Column Interpretation

Each column of $C$ is a linear combination of the columns of $A$, with coefficients from $B$:

$$\text{col}_j(C) = A \cdot \text{col}_j(B) = b_{1j} \, \text{col}_1(A) + b_{2j} \, \text{col}_2(A) + \cdots$$

This view explains $Ax = b$: the output is a combination of the columns of $A$ weighted by $x$.

3. Row Interpretation

Each row of $C$ is a linear combination of the rows of $B$, with coefficients from $A$:

$$\text{row}_i(C) = \text{row}_i(A) \cdot B = a_{i1} \, \text{row}_1(B) + a_{i2} \, \text{row}_2(B) + \cdots$$

This view explains why row operations on $A$ correspond to left-multiplication by elementary matrices.

4. Outer Product (Column × Row)

$C$ is a sum of rank-1 outer products:

$$AB = \sum_{k=1}^{n} \text{col}_k(A) \, \text{row}_k(B)$$

Each term $\text{col}_k(A) \, \text{row}_k(B)$ is an $m \times p$ matrix of rank 1. This decomposition is the basis of low-rank approximation and the SVD.

Example:

Let $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$, $B = \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix}$, so $C = AB = \begin{bmatrix} 19 & 22 \\ 43 & 50 \end{bmatrix}$:

1. Dot product: $C_{11} = \begin{bmatrix}1&2\end{bmatrix}\cdot\begin{bmatrix}5\\7\end{bmatrix} = 5+14 = 19$, and so on for each entry.

2. Column: Each column of $C$ is a combination of the columns of $A$:

$$\text{col}_1(C) = 5\begin{bmatrix}1\\3\end{bmatrix} + 7\begin{bmatrix}2\\4\end{bmatrix} = \begin{bmatrix}19\\43\end{bmatrix}, \quad \text{col}_2(C) = 6\begin{bmatrix}1\\3\end{bmatrix} + 8\begin{bmatrix}2\\4\end{bmatrix} = \begin{bmatrix}22\\50\end{bmatrix}$$

3. Row: Each row of $C$ is a combination of the rows of $B$:

$$\text{row}_1(C) = 1\begin{bmatrix}5&6\end{bmatrix} + 2\begin{bmatrix}7&8\end{bmatrix} = \begin{bmatrix}19&22\end{bmatrix}, \quad \text{row}_2(C) = 3\begin{bmatrix}5&6\end{bmatrix} + 4\begin{bmatrix}7&8\end{bmatrix} = \begin{bmatrix}43&50\end{bmatrix}$$

4. Outer product: Sum of rank-1 matrices:

$$AB = \begin{bmatrix} 1 \\ 3 \end{bmatrix}\begin{bmatrix} 5 & 6 \end{bmatrix} + \begin{bmatrix} 2 \\ 4 \end{bmatrix}\begin{bmatrix} 7 & 8 \end{bmatrix} = \begin{bmatrix} 5 & 6 \\ 15 & 18 \end{bmatrix} + \begin{bmatrix} 14 & 16 \\ 28 & 32 \end{bmatrix} = \begin{bmatrix} 19 & 22 \\ 43 & 50 \end{bmatrix}$$
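The four views can be checked numerically. A minimal NumPy sketch on the 2×2 example above (the variable names are mine, not from the text):

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])
C = A @ B

# 1. Entry-level: C[i, j] is a dot product of row i of A and column j of B.
entry = np.array([[A[i] @ B[:, j] for j in range(2)] for i in range(2)])

# 2. Column view: each column of C is A times the matching column of B.
cols = np.column_stack([A @ B[:, j] for j in range(2)])

# 3. Row view: each row of C is the matching row of A times B.
rows = np.vstack([A[i] @ B for i in range(2)])

# 4. Outer-product view: C is a sum of rank-1 terms col_k(A) row_k(B).
outer = sum(np.outer(A[:, k], B[k, :]) for k in range(2))

assert np.allclose(entry, C) and np.allclose(cols, C)
assert np.allclose(rows, C) and np.allclose(outer, C)
```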

Block Multiplication

Matrices partitioned into blocks multiply like scalar entries, as long as block dimensions are compatible:

$$\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix} = \begin{bmatrix} A_{11}B_{11} + A_{12}B_{21} & A_{11}B_{12} + A_{12}B_{22} \\ A_{21}B_{11} + A_{22}B_{21} & A_{21}B_{12} + A_{22}B_{22} \end{bmatrix}$$
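As a quick sanity check, here is a sketch that partitions two random 4×4 matrices into 2×2 blocks and verifies that blockwise multiplication agrees with the ordinary product:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

# Partition both matrices into 2x2 blocks.
A11, A12, A21, A22 = A[:2, :2], A[:2, 2:], A[2:, :2], A[2:, 2:]
B11, B12, B21, B22 = B[:2, :2], B[:2, 2:], B[2:, :2], B[2:, 2:]

# Multiply blockwise, exactly like scalar entries.
C_block = np.block([
    [A11 @ B11 + A12 @ B21, A11 @ B12 + A12 @ B22],
    [A21 @ B11 + A22 @ B21, A21 @ B12 + A22 @ B22],
])
assert np.allclose(C_block, A @ B)
```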

Properties

Theorem: Properties of Matrix Multiplication

For matrices of compatible sizes:

  1. Associativity: $(AB)C = A(BC)$
  2. Distributivity: $A(B + C) = AB + AC$
  3. Not commutative: $AB \neq BA$ in general
  4. Transpose reverses: $(AB)^T = B^T A^T$
  5. Identity: $AI = IA = A$

Non-commutativity matters. If $A$ is $2 \times 3$ and $B$ is $3 \times 2$, then $AB$ is $2 \times 2$ but $BA$ is $3 \times 3$: different sizes entirely. Even when both products are defined (square matrices), they are typically different.


LU Decomposition

Gaussian Elimination as Matrix Factorization

Gaussian elimination transforms $A$ into an upper triangular matrix $U$ by applying elementary row operations. Each row operation corresponds to left-multiplying by an elementary matrix $E_k$:

$$E_k \cdots E_2 E_1 A = U$$

Rearranging:

$$A = E_1^{-1} E_2^{-1} \cdots E_k^{-1} U = LU$$

The product of the inverse elementary matrices is a lower triangular matrix $L$.

Definition: LU Decomposition

An LU decomposition of a matrix $A$ is a factorization

$$A = LU$$

where $L$ is lower triangular (with 1s on the diagonal) and $U$ is upper triangular.

Example:

$$A = \begin{bmatrix} 2 & 1 & 1 \\ 4 & 3 & 3 \\ 8 & 7 & 9 \end{bmatrix}$$

Step 1: $R_2 \leftarrow R_2 - 2R_1$, $R_3 \leftarrow R_3 - 4R_1$:

$$\begin{bmatrix} 2 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 3 & 5 \end{bmatrix}$$

Step 2: $R_3 \leftarrow R_3 - 3R_2$:

$$U = \begin{bmatrix} 2 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 2 \end{bmatrix}$$

The multipliers (2, 4, 3) fill $L$ below the diagonal:

$$L = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 4 & 3 & 1 \end{bmatrix}$$

Verification:

$$LU = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 4 & 3 & 1 \end{bmatrix}\begin{bmatrix} 2 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 2 \end{bmatrix} = \begin{bmatrix} 2 & 1 & 1 \\ 4 & 3 & 3 \\ 8 & 7 & 9 \end{bmatrix} = A$$

Why Factor Instead of Invert?

Solving $Ax = b$ via $A^{-1}b$ costs $O(n^3)$ for the inverse plus $O(n^2)$ for the multiply. With LU:

  1. Factor once: $A = LU$ costs $O(\frac{2}{3}n^3)$
  2. Solve twice: $Ly = b$ (forward substitution, $O(n^2)$), then $Ux = y$ (back substitution, $O(n^2)$)

For multiple right-hand sides $b_1, b_2, \ldots, b_k$, the factorization is reused. Solving $k$ systems costs $O(\frac{2}{3}n^3 + 2kn^2)$ instead of $O(kn^3)$.
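In code, the factor-once-solve-many pattern looks like the following sketch using SciPy's LU routines (an assumed dependency) on the example matrix from above:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[2.0, 1.0, 1.0],
              [4.0, 3.0, 3.0],
              [8.0, 7.0, 9.0]])

# O(n^3) factorization, done exactly once.
lu, piv = lu_factor(A)

# Each additional right-hand side costs only O(n^2) triangular solves.
for b in (np.array([1.0, 2.0, 3.0]), np.array([0.0, 1.0, 0.0])):
    x = lu_solve((lu, piv), b)
    assert np.allclose(A @ x, b)
```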

Proof

LU factorization: $O(\frac{2}{3}n^3)$

Consider a $4 \times 4$ matrix. At each step of elimination, we pick a pivot and use it to zero out everything below it. The work happens in the submatrix marked $*$, the block below and to the right of the pivot:

Step $k=1$: The pivot is at position $(1,1)$. To zero out the 3 entries below it (rows 2, 3, 4), we subtract a multiple of row 1 from each row. But the subtraction updates every entry to the right of the pivot too: 3 entries per row. So the work is a $3 \times 3$ block: 9 multiply-subtracts.

$$\begin{bmatrix} \boxed{a_{11}} & a_{12} & a_{13} & a_{14} \\ 0 & * & * & * \\ 0 & * & * & * \\ 0 & * & * & * \end{bmatrix} \quad (n-1)^2 = 3^2 = 9 \text{ operations}$$

Step $k=2$: Pivot at $(2,2)$. Zero out 2 rows below, each with 2 entries to update. The work is a $2 \times 2$ block: 4 multiply-subtracts.

$$\begin{bmatrix} * & * & * & * \\ 0 & \boxed{a_{22}} & a_{23} & a_{24} \\ 0 & 0 & * & * \\ 0 & 0 & * & * \end{bmatrix} \quad (n-2)^2 = 2^2 = 4 \text{ operations}$$

Step $k=3$: Pivot at $(3,3)$. One row below, one entry: 1 multiply-subtract.

The pattern: at step $k$, the work is an $(n-k) \times (n-k)$ block, and each entry in this block requires 1 multiply and 1 subtract. Total:

$$\text{Multiplications} = \sum_{k=1}^{n-1}(n-k)^2 = \sum_{j=1}^{n-1}j^2 = \frac{(n-1)n(2n-1)}{6} \approx \frac{n^3}{3}$$

Each multiply is paired with a subtract, so the total operation count is $\sim \frac{2n^3}{3}$.

Forward/back substitution: $O(n^2)$

Solving $Ly = b$: the $k$-th unknown requires $(k-1)$ multiplications. Total: $\sum_{k=1}^{n}(k-1) = \frac{n(n-1)}{2} = O(n^2)$. The same count holds for $Ux = y$.
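The two $O(n^2)$ substitution loops are short enough to write out directly. A minimal sketch (function names are mine):

```python
import numpy as np

def forward_sub(L, b):
    """Solve L y = b for lower-triangular L: O(n^2)."""
    n = len(b)
    y = np.zeros(n)
    for k in range(n):
        # Row k uses the (k-1) already-computed unknowns.
        y[k] = (b[k] - L[k, :k] @ y[:k]) / L[k, k]
    return y

def back_sub(U, y):
    """Solve U x = y for upper-triangular U: O(n^2)."""
    n = len(y)
    x = np.zeros(n)
    for k in reversed(range(n)):
        x[k] = (y[k] - U[k, k+1:] @ x[k+1:]) / U[k, k]
    return x

# L and U from the worked LU example above.
L = np.array([[1.0, 0.0, 0.0], [2.0, 1.0, 0.0], [4.0, 3.0, 1.0]])
U = np.array([[2.0, 1.0, 1.0], [0.0, 1.0, 1.0], [0.0, 0.0, 2.0]])
b = np.array([4.0, 10.0, 24.0])

x = back_sub(U, forward_sub(L, b))
assert np.allclose(L @ U @ x, b)
```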

Computing $A^{-1}$: $O(n^3)$

$A^{-1}$ solves $AX = I$, i.e., $n$ separate systems $Ax_j = e_j$. The LU factorization costs $\frac{2}{3}n^3$ once; then each of the $n$ forward/back solves costs $O(n^2)$, giving an additional $n \cdot O(n^2) = O(n^3)$. Total: $\sim \frac{2}{3}n^3 + n^3 \approx \frac{5}{3}n^3$, roughly 2.5x the cost of a single LU solve.

Matrix-vector multiply $A^{-1}b$: $O(n^2)$

Each of the $n$ entries of the result requires a dot product of length $n$: $n \times n = n^2$ operations.

| Method  | One system | $k$ systems |
|---------|------------|-------------|
| Inverse | $O(n^3) + O(n^2)$ | $O(n^3) + O(kn^2)$ |
| LU      | $O(\frac{2}{3}n^3) + O(n^2)$ | $O(\frac{2}{3}n^3) + O(kn^2)$ |

The LU approach is also more numerically stable.

When Does LU Exist?

LU exists without row swaps when all leading principal minors of $A$ are nonzero (i.e., the top-left $k \times k$ submatrix is invertible for every $k$). When a zero pivot is encountered, row swaps are necessary.

PLU Decomposition

When row swaps are needed, we use a permutation matrix $P$:

Definition: PLU Decomposition

Every invertible matrix $A$ has a factorization

$$PA = LU$$

where $P$ is a permutation matrix, $L$ is lower triangular, and $U$ is upper triangular.

A permutation matrix $P$ is obtained by reordering the rows of the identity matrix. It satisfies $P^T P = I$ (i.e., $P^{-1} = P^T$).

Example: Swapping rows 1 and 2:

$$P = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

Partial pivoting: In practice, even when row swaps aren't strictly necessary, we swap to place the largest available pivot in position. This controls the growth of entries in $L$ and $U$, improving numerical stability.
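A short sketch using SciPy (an assumed dependency): a zero in the $(1,1)$ position forces a row swap, and `scipy.linalg.lu` returns the permutation explicitly. Note SciPy's convention is $A = PLU$ rather than $PA = LU$; the two differ only by transposing $P$.

```python
import numpy as np
from scipy.linalg import lu

# A zero (1,1) entry makes elimination impossible without a row swap.
A = np.array([[0.0, 1.0], [1.0, 2.0]])

P, L, U = lu(A)  # SciPy convention: A = P @ L @ U
assert np.allclose(P @ L @ U, A)
assert np.allclose(np.tril(L), L)   # L is lower triangular
assert np.allclose(np.triu(U), U)   # U is upper triangular
```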


Special Matrices

Certain matrix structures appear repeatedly in linear algebra and ML. Recognizing them unlocks computational shortcuts and theoretical guarantees.

Diagonal Matrices

$$D = \begin{bmatrix} d_1 & & \\ & d_2 & \\ & & d_3 \end{bmatrix}$$

| Operation | Formula | Cost |
|-----------|---------|------|
| Multiply $Dx$ | $(d_1 x_1, d_2 x_2, d_3 x_3)$ | $O(n)$ |
| Inverse $D^{-1}$ | $\text{diag}(1/d_1, 1/d_2, 1/d_3)$ | $O(n)$ |
| Powers $D^k$ | $\text{diag}(d_1^k, d_2^k, d_3^k)$ | $O(n)$ |
| Determinant | $d_1 d_2 d_3$ | $O(n)$ |
| Eigenvalues | $d_1, d_2, d_3$ | Already known |

Diagonal matrices are the "ideal" form. Many decompositions (eigendecomposition, SVD) aim to reduce a matrix to diagonal form.
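The $O(n)$ shortcuts in the table above amount to elementwise work on the diagonal, as this small NumPy sketch confirms:

```python
import numpy as np

d = np.array([2.0, 3.0, 4.0])
x = np.array([1.0, 1.0, 1.0])
D = np.diag(d)

# Every operation reduces to O(n) work on the diagonal entries.
assert np.allclose(D @ x, d * x)                                  # multiply
assert np.allclose(np.linalg.inv(D), np.diag(1 / d))              # inverse
assert np.allclose(np.linalg.matrix_power(D, 3), np.diag(d**3))   # powers
assert np.isclose(np.linalg.det(D), d.prod())                     # determinant
```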

Triangular Matrices

Lower triangular $L$: all entries above the diagonal are zero. Upper triangular $U$: all entries below the diagonal are zero.

$$L = \begin{bmatrix} l_{11} & 0 & 0 \\ l_{21} & l_{22} & 0 \\ l_{31} & l_{32} & l_{33} \end{bmatrix}, \quad U = \begin{bmatrix} u_{11} & u_{12} & u_{13} \\ 0 & u_{22} & u_{23} \\ 0 & 0 & u_{33} \end{bmatrix}$$

Key properties:

  • Determinant: Product of diagonal entries: $\det(L) = l_{11} l_{22} l_{33}$
  • Solving $Lx = b$: Forward substitution, $O(n^2)$
  • Solving $Ux = b$: Back substitution, $O(n^2)$
  • Product: The product of lower (upper) triangular matrices is lower (upper) triangular
  • Inverse: The inverse of a lower (upper) triangular matrix is lower (upper) triangular

Symmetric Matrices

Definition: Symmetric Matrix

A matrix $S$ is symmetric if $S = S^T$, meaning $S_{ij} = S_{ji}$ for all $i, j$.

Key properties:

  • Eigenvalues are real (even for matrices with complex entries)
  • Eigenvectors corresponding to distinct eigenvalues are orthogonal
  • Can be diagonalized as $S = Q\Lambda Q^T$ where $Q$ is orthogonal

Where they appear:

  • Gram matrix: $A^T A$ is always symmetric
  • Covariance matrix: $\Sigma = \frac{1}{n-1}X^T X$ (for centered data $X$)
  • Hessian: the second-derivative matrix in optimization
  • Kernel/similarity matrices: $K_{ij} = k(x_i, x_j)$

Theorem: Spectral Theorem for Symmetric Matrices

Every real symmetric matrix $S$ can be factored as

$$S = Q\Lambda Q^T$$

where $Q$ is orthogonal ($Q^T Q = I$) and $\Lambda$ is diagonal, containing the eigenvalues.
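A minimal sketch of the spectral theorem in action, using NumPy's symmetric eigensolver `np.linalg.eigh`:

```python
import numpy as np

S = np.array([[4.0, 2.0], [2.0, 3.0]])  # symmetric

# eigh is specialized for symmetric matrices: real eigenvalues,
# orthonormal eigenvectors.
lam, Q = np.linalg.eigh(S)

assert np.allclose(Q @ np.diag(lam) @ Q.T, S)   # S = Q Λ Q^T
assert np.allclose(Q.T @ Q, np.eye(2))          # Q is orthogonal
```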

Positive Definite Matrices

Definition: Positive Definite Matrix

A symmetric matrix $S$ is positive definite (PD) if for all nonzero vectors $x$:

$$x^T S x > 0$$

It is positive semi-definite (PSD) if $x^T S x \geq 0$ for all $x$.

Equivalent conditions for PD:

  • All eigenvalues are positive: $\lambda_i > 0$
  • All pivots are positive
  • All leading principal minors are positive
  • $S = R^T R$ for some matrix $R$ with independent columns

Where they appear:

  • $A^T A$ is PSD for any $A$, and PD when $A$ has independent columns
  • Covariance matrices are PSD
  • Hessians at local minima are PSD

Cholesky decomposition: Every PD matrix has a unique factorization $S = LL^T$ where $L$ is lower triangular with positive diagonal. This is the "square root" of a matrix, and it costs half the operations of LU.
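A quick sketch with NumPy's built-in Cholesky routine on a small SPD matrix:

```python
import numpy as np

S = np.array([[4.0, 2.0], [2.0, 3.0]])  # symmetric positive definite

L = np.linalg.cholesky(S)   # lower triangular, positive diagonal
assert np.allclose(L @ L.T, S)
assert np.all(np.diag(L) > 0)

# np.linalg.cholesky raises LinAlgError for a non-PD input, so it
# doubles as a cheap positive-definiteness test.
```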

Orthogonal Matrices

Definition: Orthogonal Matrix

A square matrix $Q$ is orthogonal if its columns are orthonormal:

$$Q^T Q = QQ^T = I$$

Equivalently, $Q^{-1} = Q^T$.

Key properties:

  • Preserves lengths: $\|Qx\| = \|x\|$
  • Preserves angles: $(Qx)^T(Qy) = x^T y$
  • Preserves determinant magnitude: $|\det(Q)| = 1$
  • Numerically stable: no error amplification when multiplying by $Q$

Examples:

  • Rotation: $R_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$
  • Reflection: $H = I - 2uu^T$ (Householder reflector, $\|u\| = 1$)
  • Permutation: any row-permuted identity matrix
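The length- and angle-preserving properties can be checked on a rotation matrix. A minimal sketch (the angle and vectors are arbitrary choices of mine):

```python
import numpy as np

theta = 0.7
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # rotation matrix

x = np.array([3.0, 4.0])
y = np.array([-1.0, 2.0])

assert np.allclose(Q.T @ Q, np.eye(2))                       # orthogonality
assert np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x))  # lengths kept
assert np.isclose((Q @ x) @ (Q @ y), x @ y)                  # angles kept
assert np.isclose(abs(np.linalg.det(Q)), 1.0)                # |det Q| = 1
```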

Where they appear:

  • QR decomposition: $A = QR$
  • SVD: $A = U\Sigma V^T$, where $U$ and $V$ are orthogonal
  • PCA: the principal component matrix is orthogonal

Identity and Zero Matrices

Identity $I$: the multiplicative identity. $AI = IA = A$. Diagonal with all 1s.

Zero matrix $0$: the additive identity. $A + 0 = A$. The unique matrix that maps every vector to zero.


Matrix Factorizations at a Glance

| Factorization | Form | Requires | Cost | Purpose |
|---------------|------|----------|------|---------|
| LU | $A = LU$ | Square, no zero pivots | $\frac{2}{3}n^3$ | Solve $Ax = b$ |
| PLU | $PA = LU$ | Square, invertible | $\frac{2}{3}n^3$ | Solve $Ax = b$ with pivoting |
| Cholesky | $A = LL^T$ | Symmetric positive definite | $\frac{1}{3}n^3$ | Solve SPD systems (2× faster) |
| QR | $A = QR$ | Any $m \times n$ ($m \geq n$) | $\frac{4}{3}n^3$ | Least squares, eigenvalues |
| Eigendecomposition | $A = Q\Lambda Q^T$ | Symmetric | Iterative | Spectral analysis, PCA |
| SVD | $A = U\Sigma V^T$ | Any $m \times n$ | Iterative | Low-rank approx, pseudoinverse |

Each factorization exploits different structure. The choice depends on the matrix properties and the problem at hand.


Summary

Matrix multiplication:

  • Four views: dot product (entry), column combination, row combination, outer product sum
  • Not commutative; transpose reverses order: $(AB)^T = B^T A^T$
  • Block multiplication works when block dimensions are compatible

LU decomposition:

  • Gaussian elimination encoded as $A = LU$ (lower times upper triangular)
  • Factor once ($O(n^3)$), solve many times ($O(n^2)$ each)
  • PLU with partial pivoting handles zero pivots and improves stability

Special matrices:

  • Diagonal: All operations in $O(n)$; the ideal target form
  • Triangular: $O(n^2)$ solves via forward/back substitution
  • Symmetric: Real eigenvalues, orthogonal eigenvectors; the spectral theorem gives $S = Q\Lambda Q^T$
  • Positive definite: All eigenvalues positive; Cholesky $S = LL^T$ at half the cost of LU
  • Orthogonal: $Q^{-1} = Q^T$; preserves lengths and angles; numerically ideal

Answering the Central Question: We compute with matrices efficiently by factoring them into structured pieces (triangular, orthogonal, diagonal) that are cheap to work with. LU decomposition is the workhorse for solving linear systems at $O(\frac{2}{3}n^3)$, Cholesky exploits symmetry and positive definiteness at half that cost, and QR provides numerical stability for least squares problems.


Applications in Data Science and Machine Learning

Solving the Normal Equations

Linear regression requires solving $(X^TX)\beta = X^Ty$. The Gram matrix $X^TX$ is symmetric and (when $X$ has independent columns) positive definite. Cholesky decomposition is the method of choice:

  1. Compute $G = X^TX$ and $c = X^Ty$
  2. Factor $G = LL^T$ (Cholesky)
  3. Solve $Ly = c$ (forward substitution)
  4. Solve $L^T\beta = y$ (back substitution)

This costs $\frac{1}{3}n^3 + n^2$, half the cost of LU, and exploits the symmetry and positive definiteness of $G$. See Matrix Inverse: Linear Regression for what to do when $G$ is singular.
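The four steps above can be sketched with SciPy's Cholesky solve helpers (the noiseless synthetic data is mine, chosen so the recovered coefficients can be checked exactly):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true                     # noiseless, so recovery is exact

G = X.T @ X                           # Gram matrix: symmetric PD here
c = X.T @ y

# cho_factor does G = L L^T; cho_solve runs the two triangular solves.
beta = cho_solve(cho_factor(G), c)

assert np.allclose(beta, beta_true)
```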

QR for Numerically Stable Least Squares

When $X^TX$ is ill-conditioned, Cholesky on the normal equations amplifies errors (the condition number squares: $\kappa(X^TX) = \kappa(X)^2$). The QR approach avoids forming $X^TX$ entirely:

  1. Factor $X = QR$ where $Q$ is orthogonal and $R$ is upper triangular
  2. The normal equations become $R\beta = Q^Ty$
  3. Solve by back substitution

Since $Q$ preserves lengths, the conditioning of the problem is not squared. QR (or the closely related SVD) underlies library least-squares routines such as numpy.linalg.lstsq.
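The three steps above, sketched in NumPy and checked against the library least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 3))
y = rng.standard_normal(50)

# Reduced QR: Q is 50x3 with orthonormal columns, R is 3x3 upper triangular.
Q, R = np.linalg.qr(X)
beta_qr = np.linalg.solve(R, Q.T @ y)   # R beta = Q^T y

# Should match the library least-squares solution.
beta_ref, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(beta_qr, beta_ref)
```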

Batch Linear Systems in Neural Networks

A forward pass through a linear layer computes $Y = XW^T + b$ for a batch of inputs $X \in \mathbb{R}^{B \times n}$. This is a single matrix multiplication: $O(Bnm)$ for output dimension $m$.

Modern deep learning relies on highly optimized BLAS (Basic Linear Algebra Subprograms) implementations that use:

  • Block multiplication for cache efficiency
  • GPU parallelism across the outer product structure
  • Mixed precision (float16 multiply, float32 accumulate) for speed
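The batched forward pass is one matmul; a minimal NumPy sketch (shapes and random weights are illustrative, not from the text):

```python
import numpy as np

B, n, m = 32, 64, 16
rng = np.random.default_rng(0)
X = rng.standard_normal((B, n))      # batch of B inputs
W = rng.standard_normal((m, n))      # weight matrix
b = rng.standard_normal(m)           # bias

# One matmul handles the whole batch: O(B n m) work.
Y = X @ W.T + b
assert Y.shape == (B, m)
assert np.allclose(Y[0], W @ X[0] + b)   # row 0 equals a single-input pass
```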

Cholesky in Gaussian Processes

Gaussian process inference requires computing $K^{-1}y$ and $\log\det(K)$ for a kernel matrix $K$. Since $K$ is symmetric positive definite:

  1. Factor $K = LL^T$ (Cholesky)
  2. Solve $K^{-1}y$ via two triangular solves: $O(n^2)$
  3. Compute $\log\det(K) = 2\sum_i \log L_{ii}$: $O(n)$

The Cholesky factor is the computational workhorse of GP inference.
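The steps above can be sketched with SciPy. The toy kernel matrix (a squared-exponential kernel on a 1-D grid with a small diagonal jitter) is an illustrative construction of mine, not from the text:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

# Toy SPD kernel matrix: squared-exponential kernel plus jitter.
x = np.linspace(0, 1, 20)
K = np.exp(-0.5 * (x[:, None] - x[None, :])**2 / 0.05**2) + 1e-6 * np.eye(20)
y = np.sin(2 * np.pi * x)

c, low = cho_factor(K)                   # K = L L^T
alpha = cho_solve((c, low), y)           # K^{-1} y via two triangular solves
logdet = 2 * np.sum(np.log(np.diag(c)))  # log det K = 2 * sum_i log L_ii

assert np.allclose(K @ alpha, y)
assert np.isclose(logdet, np.linalg.slogdet(K)[1])
```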


Guided Problems

Problem 1: Factorization Practice

Compute the LU decomposition of:

$$A = \begin{bmatrix} 1 & 2 & 1 \\ 2 & 6 & 1 \\ 1 & 1 & 4 \end{bmatrix}$$

  1. Find $L$ and $U$ such that $A = LU$.
  2. Use the factorization to solve $Ax = \begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix}$.
💡 Solution

Part 1:

Step 1: $R_2 \leftarrow R_2 - 2R_1$, $R_3 \leftarrow R_3 - R_1$:

$$\begin{bmatrix} 1 & 2 & 1 \\ 0 & 2 & -1 \\ 0 & -1 & 3 \end{bmatrix}$$

Multipliers: $l_{21} = 2$, $l_{31} = 1$.

Step 2: $R_3 \leftarrow R_3 + \frac{1}{2}R_2$:

$$U = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 2 & -1 \\ 0 & 0 & \frac{5}{2} \end{bmatrix}$$

Multiplier: $l_{32} = -\frac{1}{2}$.

$$L = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1 & -\frac{1}{2} & 1 \end{bmatrix}$$

Verification: $LU = A$. ✓

Part 2:

Solve $Ly = b$ (forward substitution) with $b = \begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix}$:

$$y_1 = 1, \quad y_2 = -1 - 2(1) = -3, \quad y_3 = 2 - 1(1) - \left(-\tfrac{1}{2}\right)(-3) = 2 - 1 - \tfrac{3}{2} = -\tfrac{1}{2}$$

$$y = \begin{bmatrix} 1 \\ -3 \\ -\frac{1}{2} \end{bmatrix}$$

Solve $Ux = y$ (back substitution):

$$x_3 = \frac{-1/2}{5/2} = -\frac{1}{5}, \quad x_2 = \frac{-3 - (-1)(-1/5)}{2} = \frac{-3 - 1/5}{2} = -\frac{8}{5}, \quad x_1 = 1 - 2\left(-\tfrac{8}{5}\right) - 1\left(-\tfrac{1}{5}\right) = 1 + \tfrac{16}{5} + \tfrac{1}{5} = \tfrac{22}{5}$$

$$x = \begin{bmatrix} \frac{22}{5} \\ -\frac{8}{5} \\ -\frac{1}{5} \end{bmatrix}$$


Problem 2: Cholesky Decomposition

Let $S = \begin{bmatrix} 4 & 2 \\ 2 & 3 \end{bmatrix}$.

  1. Verify that $S$ is symmetric and positive definite.
  2. Find the Cholesky factor $L$ such that $S = LL^T$.
  3. Use it to solve $Sx = \begin{bmatrix} 8 \\ 7 \end{bmatrix}$.
💡 Solution

Part 1:

Symmetric: $S = S^T$. ✓

Positive definite: $\text{tr}(S) = 7 > 0$ and $\det(S) = 12 - 4 = 8 > 0$. For a $2 \times 2$ symmetric matrix, a positive determinant means the eigenvalues share a sign and a positive trace makes that sign positive, so both eigenvalues are positive. ✓

Alternatively: the leading principal minors are $4 > 0$ and $\det(S) = 8 > 0$. ✓

Part 2:

$$S = LL^T = \begin{bmatrix} l_{11} & 0 \\ l_{21} & l_{22} \end{bmatrix}\begin{bmatrix} l_{11} & l_{21} \\ 0 & l_{22} \end{bmatrix}$$

From $(1,1)$: $l_{11}^2 = 4 \implies l_{11} = 2$

From $(2,1)$: $l_{21} l_{11} = 2 \implies l_{21} = 1$

From $(2,2)$: $l_{21}^2 + l_{22}^2 = 3 \implies l_{22} = \sqrt{2}$

$$L = \begin{bmatrix} 2 & 0 \\ 1 & \sqrt{2} \end{bmatrix}$$

Part 3:

Forward solve $Ly = \begin{bmatrix} 8 \\ 7 \end{bmatrix}$:

$$y_1 = 4, \quad y_2 = \frac{7 - 1(4)}{\sqrt{2}} = \frac{3}{\sqrt{2}}$$

Back solve $L^T x = y$:

$$x_2 = \frac{3/\sqrt{2}}{\sqrt{2}} = \frac{3}{2}, \quad x_1 = \frac{4 - 1\left(\frac{3}{2}\right)}{2} = \frac{5}{4}$$

$$x = \begin{bmatrix} \frac{5}{4} \\ \frac{3}{2} \end{bmatrix}$$


Problem 3: Properties of Special Matrices

For each statement, determine if it is True or False. Justify your answer.

  1. If $A$ is symmetric and $B$ is symmetric, then $AB$ is symmetric.

  2. If $Q$ is orthogonal, then $\det(Q) = \pm 1$.

  3. If $S$ is positive definite, then all diagonal entries of $S$ are positive.

  4. The product of two upper triangular matrices is upper triangular.

💡 Solution

1. False.

$(AB)^T = B^T A^T = BA$. This equals $AB$ only if $A$ and $B$ commute ($AB = BA$), which is not true in general.

Counterexample: $A = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}$, $B = \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix}$. Both are symmetric, but $AB = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} \neq (AB)^T$.

2. True.

From $Q^T Q = I$: $\det(Q^T)\det(Q) = 1$, so $(\det Q)^2 = 1$, giving $\det Q = \pm 1$.

3. True.

The diagonal entry $S_{ii} = e_i^T S e_i$. Since $e_i \neq 0$ and $S$ is positive definite, $e_i^T S e_i > 0$.

4. True.

If $U_1$ and $U_2$ are upper triangular, then $(U_1 U_2)_{ij} = \sum_k (U_1)_{ik}(U_2)_{kj}$. For $i > j$: $(U_1)_{ik} = 0$ when $k < i$ and $(U_2)_{kj} = 0$ when $k > j$. Since $i > j$, every term has either $k < i$ or $k > j$ (or both), so the sum is zero.


References

  1. Strang, Gilbert - Introduction to Linear Algebra (Chapters 1.4, 1.5, 1.7, 2.6)
  2. Trefethen, Lloyd N. and Bau, David - Numerical Linear Algebra (Chapters 20-23)
  3. Golub, Gene H. and Van Loan, Charles F. - Matrix Computations (Chapters 3-4)