# Lecture 11: Least Squares

## 1. Introduction

The linear least squares problem is among the most pervasively used problems in all of computational mathematics.

### Definition

Given a matrix $A \in \mathbb{R}^{m \times n}$ and a vector $b \in \mathbb{R}^m$, the linear least squares problem seeks to find $x \in \mathbb{R}^n$ such that:

$$
x = \arg \min_{x \in \mathbb{R}^n} ||Ax - b||_2
$$

where $||\cdot||_2$ is the Euclidean norm.

**Note:** The linear least squares problem always has a solution.

## 2. Ordinary Least Squares (OLS)

### 2.1. Normal Equations

**Theorem:** The vector $x \in \mathbb{R}^n$ is a solution to the linear least squares problem if and only if it satisfies the normal equations:

$$
A^T A x = A^T b
$$

**Proof:** Let $r(x) = Ax - b$. We want to minimize $||r(x)||_2^2 = r(x)^T r(x)$. Taking the gradient with respect to $x$ and setting it to zero, we get:

$$
\nabla_x \left( r(x)^T r(x) \right) = 2A^T(Ax - b) = 0,
$$

which simplifies to the normal equations. Since the objective is convex, this first-order condition is both necessary and sufficient.

### 2.2. Solving the Normal Equations

The normal equations can be solved by Cholesky decomposition if $A^T A$ is symmetric positive definite (SPD), which holds exactly when $A$ has full column rank (see Section 2.3).

**Algorithm:**

1. Compute $C = A^T A$.
2. Compute the Cholesky decomposition of $C$, i.e., $C = LL^T$.
3. Solve $Ly = A^T b$ for $y$ by forward substitution.
4. Solve $L^T x = y$ for $x$ by backward substitution.

### 2.3. Full Rank Assumption

If $A$ has full column rank (i.e., $\text{rank}(A) = n$), then $A^T A$ is SPD, and the solution to the normal equations is unique.

**Proof:** Suppose $A^T A x = 0$ for some $x \neq 0$. Then $x^T A^T A x = (Ax)^T (Ax) = ||Ax||_2^2 = 0$, which implies $Ax = 0$. Since $A$ has full column rank, the null space of $A$ contains only the zero vector. Therefore, $x = 0$, which is a contradiction. The same computation shows that $x^T A^T A x = ||Ax||_2^2 > 0$ for every $x \neq 0$, so $A^T A$ is SPD.

### 2.4. QR Decomposition

The QR decomposition is another method to solve the least squares problem.

**Theorem:** If $A \in \mathbb{R}^{m \times n}$ with $m \geq n$ has full column rank, then there exists an orthogonal matrix $Q \in \mathbb{R}^{m \times m}$ and an upper triangular matrix $R \in \mathbb{R}^{m \times n}$ such that:

$$
A = QR = Q \begin{bmatrix} R_1 \\ 0 \end{bmatrix}
$$

where $R_1 \in \mathbb{R}^{n \times n}$ is upper triangular and invertible.

**Solving Least Squares using QR:** Given $A = QR$, the least squares problem becomes:

$$
\min_{x \in \mathbb{R}^n} ||Ax - b||_2 = \min_{x \in \mathbb{R}^n} ||QRx - b||_2 = \min_{x \in \mathbb{R}^n} ||Q^T(QRx - b)||_2
$$

since $Q$ is orthogonal and therefore $||Q^T y||_2 = ||y||_2$ for every $y$. Hence the problem reduces to:

$$
\min_{x \in \mathbb{R}^n} ||Rx - Q^T b||_2 = \min_{x \in \mathbb{R}^n} \left|\left| \begin{bmatrix} R_1 \\ 0 \end{bmatrix} x - \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} \right|\right|_2
$$

where $Q^T b = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}$ with $b_1 \in \mathbb{R}^n$ and $b_2 \in \mathbb{R}^{m-n}$. Then $x$ can be found by solving:

$$
R_1 x = b_1
$$

using backward substitution. The minimum residual is $||b_2||_2$.
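Both solution strategies from Sections 2.2 and 2.4 are easy to try out numerically. The following NumPy/SciPy sketch is not part of the lecture notes; it uses a small random matrix (assumed, with high probability, to have full column rank) and checks both solvers against the library routine `numpy.linalg.lstsq`.

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(0)
m, n = 50, 4
A = rng.standard_normal((m, n))      # assumed to have full column rank
b = rng.standard_normal(m)

# --- Normal equations with Cholesky (Section 2.2) ---
C = A.T @ A                                      # C = A^T A, SPD when rank(A) = n
L = np.linalg.cholesky(C)                        # C = L L^T, with L lower triangular
y = solve_triangular(L, A.T @ b, lower=True)     # forward substitution: L y = A^T b
x_chol = solve_triangular(L.T, y, lower=False)   # backward substitution: L^T x = y

# --- QR decomposition (Section 2.4) ---
Q1, R1 = np.linalg.qr(A)                            # reduced QR: Q1 is m x n, R1 is n x n
x_qr = solve_triangular(R1, Q1.T @ b, lower=False)  # R_1 x = b_1, where b_1 = Q_1^T b

# Reference solution from the library routine.
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.allclose(x_chol, x_ref), np.allclose(x_qr, x_ref))  # expect: True True
```

The reduced QR returned by `numpy.linalg.qr` keeps only the first $n$ columns of $Q$, which is all that is needed to form $b_1 = Q_1^T b$; the minimum residual $||b_2||_2$ can still be obtained as $||Ax - b||_2$.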
## 3. Regularization

### 3.1. Tikhonov Regularization

Tikhonov regularization adds a penalty term to the least squares problem:

$$
\min_{x \in \mathbb{R}^n} ||Ax - b||_2^2 + \lambda ||x||_2^2
$$

where $\lambda > 0$ is a regularization parameter. The normal equations for the Tikhonov problem are:

$$
(A^T A + \lambda I)x = A^T b
$$

The matrix $A^T A + \lambda I$ is always SPD for $\lambda > 0$, so Cholesky decomposition can be used to solve the regularized problem.

### 3.2. Choice of $\lambda$

The choice of $\lambda$ depends on the problem. A larger $\lambda$ leads to a more regularized solution (i.e., smaller $||x||_2$), while a smaller $\lambda$ leads to a solution closer to the ordinary least squares solution. Cross-validation can be used to choose an appropriate value for $\lambda$.
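To make the trade-off concrete, the following sketch (again not from the lecture; the data and the grid of $\lambda$ values are made up for illustration) solves the Tikhonov normal equations $(A^T A + \lambda I)x = A^T b$ for several values of $\lambda$ and prints $||x||_2$ and the residual $||Ax - b||_2$. As $\lambda$ grows, $||x||_2$ shrinks while the residual grows.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 50, 4
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

def tikhonov(A, b, lam):
    """Solve (A^T A + lam * I) x = A^T b by Cholesky; the matrix is SPD for lam > 0."""
    C = A.T @ A + lam * np.eye(A.shape[1])
    L = np.linalg.cholesky(C)          # C = L L^T
    y = np.linalg.solve(L, A.T @ b)    # L y = A^T b
    return np.linalg.solve(L.T, y)     # L^T x = y

for lam in [0.0, 0.1, 1.0, 10.0, 100.0]:   # lam = 0 recovers ordinary least squares
    x = tikhonov(A, b, lam)
    print(f"lambda = {lam:6.1f}   ||x||_2 = {np.linalg.norm(x):.4f}   "
          f"||Ax - b||_2 = {np.linalg.norm(A @ x - b):.4f}")
```

In practice each candidate $\lambda$ would be scored on held-out data (cross-validation, as mentioned above) rather than inspected by eye.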