Numerical Linear Algebra
Summary
This document provides a foundational introduction to numerical linear algebra, covering topics such as linear algebraic equations, matrices, system solvability, and Gaussian elimination. It is suitable for undergraduate-level courses in mathematics, science, and engineering.
1. Linear Algebraic Equations and Matrices

Suppose that we have 3 equations and 3 unknowns as follows

$$a_{11} x_1 + a_{12} x_2 + a_{13} x_3 = b_1$$
$$a_{21} x_1 + a_{22} x_2 + a_{23} x_3 = b_2$$
$$a_{31} x_1 + a_{32} x_2 + a_{33} x_3 = b_3 \qquad (1.1)$$

This is called a linear system of equations and is one of the most widely studied and used tools in science and engineering. The system can be written compactly as

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}, \quad x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \quad b = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}$$

$$Ax = b$$

The general solution to the system involves inverting the matrix,

$$x = A^{-1} b$$

which is a very costly operation for a large matrix.

1.1. Solvability of Linear Systems

We can write any linear system as $Ax = b$, where $A \in \mathbb{R}^{m \times n}$, $x \in \mathbb{R}^n$ and $b \in \mathbb{R}^m$. The solvability of such a system falls into one of the three following categories:

a) The system may not admit any solution
b) The system may have a single unique solution
c) The system may have infinitely many solutions

To get an insight into the solvability of a system, we shall take a geometric perspective. We know that the matrix-vector product $Ax = b$ can be viewed as a linear combination of the columns of $A$ with weights from $x$. Thus $Ax = b$ is only solvable when $b \in \operatorname{col} A$. In a broad perspective, the shape of the matrix $A$ bears considerable information about the solvability of $Ax = b$.

First let us consider the case when the matrix $A$ is wide, i.e., $n > m$. Each column is a vector in $\mathbb{R}^m$. Since $n > m$, the $n$ columns of $A$ must be linearly dependent, which implies that there exist some weights $x_0 \neq 0$ satisfying $Ax_0 = 0$. If we can solve $Ax = b$ for some $x$, then $A(x + \alpha x_0) = Ax + \alpha A x_0 = b + 0 = b$ for any $\alpha \in \mathbb{R}$. In summary, no wide system admits a unique solution.

When $A$ is tall, i.e., $m > n$, its $n$ columns cannot possibly span the larger-dimensional $\mathbb{R}^m$. For this reason, there could be some $b_0 \notin \operatorname{col} A$, so by definition $Ax = b_0$ cannot be solved exactly for any $x$. In summary, for every tall matrix $A$, there exists a $b_0$ such that $Ax = b_0$ is not solvable.

In this lecture we shall consider linear systems which are square and thus admit only one unique solution (if it exists). We shall assume that the matrix $A$ is not singular, i.e., $A^{-1}$ exists, which implies $\det A \neq 0$. For the other cases, we need to use a technique known as least squares or the pseudoinverse $A^{+}$, which we shall cover in another lecture.

2. Gaussian Elimination (Naïve)

This method solves an $n \times n$ system of linear equations.

2.1. Forward Elimination of Unknowns

The first phase is designed to reduce the set of equations to an upper triangular system.

$$a_{11} x_1 + a_{12} x_2 + a_{13} x_3 + \cdots + a_{1n} x_n = b_1 \qquad (1.2)$$
$$a_{21} x_1 + a_{22} x_2 + a_{23} x_3 + \cdots + a_{2n} x_n = b_2 \qquad (1.3)$$
$$\vdots$$
$$a_{n1} x_1 + a_{n2} x_2 + a_{n3} x_3 + \cdots + a_{nn} x_n = b_n \qquad (1.4)$$

The initial step is to eliminate the first unknown, $x_1$, from the second through the $n$-th equations. To do this, multiply Eq. 1.2 by $a_{21}/a_{11}$ to give

$$a_{21} x_1 + \frac{a_{21}}{a_{11}} a_{12} x_2 + \cdots + \frac{a_{21}}{a_{11}} a_{1n} x_n = \frac{a_{21}}{a_{11}} b_1 \qquad (1.5)$$

Now this equation can be subtracted from Eq. 1.3 to give

$$\left(a_{22} - \frac{a_{21}}{a_{11}} a_{12}\right) x_2 + \cdots + \left(a_{2n} - \frac{a_{21}}{a_{11}} a_{1n}\right) x_n = b_2 - \frac{a_{21}}{a_{11}} b_1 \qquad (1.6)$$

or,

$$a'_{22} x_2 + \cdots + a'_{2n} x_n = b'_2$$

The procedure is then repeated for the remaining equations. The new system is then

$$a_{11} x_1 + a_{12} x_2 + a_{13} x_3 + \cdots + a_{1n} x_n = b_1 \qquad (1.7)$$
$$a'_{22} x_2 + a'_{23} x_3 + \cdots + a'_{2n} x_n = b'_2 \qquad (1.8)$$
$$a'_{32} x_2 + a'_{33} x_3 + \cdots + a'_{3n} x_n = b'_3 \qquad (1.9)$$
$$\vdots$$
$$a'_{n2} x_2 + a'_{n3} x_3 + \cdots + a'_{nn} x_n = b'_n \qquad (1.10)$$

For the foregoing steps, Eq. 1.2 is called the pivot equation and $a_{11}$ is called the pivot coefficient or element.
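As an illustration of this first pass, the following Python/NumPy sketch performs exactly the step of Eqs. 1.5-1.10: it removes $x_1$ from rows 2 through $n$, assuming the pivot $a_{11}$ is nonzero. The function name and the example numbers are ours, chosen only for illustration.

```python
import numpy as np

def eliminate_first_column(A, b):
    """One forward-elimination pass: remove x1 from rows 2..n (Eqs. 1.5-1.10).

    Assumes the pivot a11 is nonzero; A and b are modified in place.
    """
    n = len(b)
    for i in range(1, n):
        factor = A[i, 0] / A[0, 0]       # a_i1 / a_11
        A[i, :] -= factor * A[0, :]      # row_i <- row_i - factor * row_1
        b[i] -= factor * b[0]
    return A, b

# an arbitrary 3 x 3 example system
A = np.array([[3.0, -0.1, -0.2],
              [0.1,  7.0, -0.3],
              [0.3, -0.2, 10.0]])
b = np.array([7.85, -19.3, 71.4])
eliminate_first_column(A, b)
print(A)   # the first column below the pivot is now zero
```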
Now repeat the above to eliminate the second unknown from Eqs. 1.9 through 1.10:

$$a_{11} x_1 + a_{12} x_2 + a_{13} x_3 + \cdots + a_{1n} x_n = b_1$$
$$a'_{22} x_2 + a'_{23} x_3 + \cdots + a'_{2n} x_n = b'_2$$
$$a''_{33} x_3 + \cdots + a''_{3n} x_n = b''_3$$
$$\vdots$$
$$a''_{n3} x_3 + \cdots + a''_{nn} x_n = b''_n$$

The procedure can be continued using the remaining pivot equations. The final manipulation in the sequence is to use the $(n-1)$-th equation to eliminate the $x_{n-1}$ term from the $n$-th equation. At this point, the system will have been transformed to an upper triangular system

$$a_{11} x_1 + a_{12} x_2 + a_{13} x_3 + \cdots + a_{1n} x_n = b_1 \qquad (1.11)$$
$$a'_{22} x_2 + a'_{23} x_3 + \cdots + a'_{2n} x_n = b'_2 \qquad (1.12)$$
$$a''_{33} x_3 + \cdots + a''_{3n} x_n = b''_3 \qquad (1.13)$$
$$\vdots$$
$$a_{nn}^{(n-1)} x_n = b_n^{(n-1)} \qquad (1.14)$$

2.2. Backward Substitution

Equation 1.14 can now be solved for

$$x_n = \frac{b_n^{(n-1)}}{a_{nn}^{(n-1)}}$$

This result can then be back-substituted into the $(n-1)$-th equation to solve for $x_{n-1}$. The procedure, which is repeated to evaluate the remaining $x$'s, can be represented by the formula

$$x_i = \frac{b_i^{(i-1)} - \sum_{j=i+1}^{n} a_{ij}^{(i-1)} x_j}{a_{ii}^{(i-1)}} \quad \text{for } i = n-1, n-2, \ldots, 1 \qquad (1.15)$$

2.3. Pivoting

The primary reason that the foregoing technique is called "naive" is that during both the elimination and the back-substitution phases, it is possible that a division by zero can occur. The same problem may also occur if the pivot element is very close to zero. Therefore, before each row is normalized, it is advantageous to determine the coefficient with the largest absolute value in the column below the pivot element. The rows can then be switched so that the largest element is the pivot element. This is called partial pivoting.

2.4. Computing the Determinant

After the elimination process is done, the determinant of the matrix $A$ is simply the product of the pivots:

$$\det A = \prod_{i=1}^{n} a_{ii}^{(i-1)}$$

If pivoting is used, then

$$\det A = (-1)^p \prod_{i=1}^{n} a_{ii}^{(i-1)}$$

where $p$ represents the number of times that rows are swapped during pivoting.

3. LU Decomposition/Factorization

Although Gaussian elimination is a sound way to solve the system $Ax = b$, it becomes inefficient when solving equations with the same $A$ but different $b$. LU factorization methods separate the time-consuming elimination of the matrix $A$ from the manipulations of the right-hand side $b$. Thus, once $A$ has been "factored" or "decomposed," multiple right-hand-side vectors can be evaluated in an efficient manner.

3.1. Overview of the LU Factorization

Just as was the case with Gauss elimination, LU factorization requires pivoting to avoid division by zero. However, to simplify the following description, we will omit pivoting. In addition, the following explanation is limited to a set of three simultaneous equations; the results can be directly extended to $n$-dimensional systems. The equation $Ax = b$ can be rearranged to give

$$Ax - b = 0 \qquad (1.16)$$

Suppose that Eq. 1.16 can be represented as an upper triangular system:

$$\begin{bmatrix} u_{11} & u_{12} & u_{13} \\ 0 & u_{22} & u_{23} \\ 0 & 0 & u_{33} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} d_1 \\ d_2 \\ d_3 \end{bmatrix} \qquad (1.17)$$

Recognize that this is similar to the manipulation that occurs in the first step of Gauss elimination. That is, elimination is used to reduce the system to upper triangular form. Equation 1.17 can also be expressed as

$$Ux - d = 0 \qquad (1.18)$$

Now assume that there is a lower triangular matrix with 1's on the diagonal,

$$L = \begin{bmatrix} 1 & 0 & 0 \\ l_{21} & 1 & 0 \\ l_{31} & l_{32} & 1 \end{bmatrix}$$

which has the property that

$$LUx - Ld = Ax - b \qquad (1.19)$$

It follows that

$$LU = A \qquad (1.20)$$
$$Ld = b \qquad (1.21)$$

A two-step strategy can be summarized as follows:

1. LU factorization step: $A$ is factored into lower and upper triangular matrices.
2. Substitution step: $L$ and $U$ are used to determine a solution $x$ for a right-hand side $b$.
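The whole procedure of Sections 2.1-2.4 can be sketched compactly in Python/NumPy. The code below is our own illustrative implementation, not part of the original notes: it applies partial pivoting before each elimination step, back-substitutes with Eq. 1.15, and accumulates the determinant as the signed product of the pivots.

```python
import numpy as np

def gauss_solve(A, b):
    """Solve Ax = b by Gaussian elimination with partial pivoting.

    Returns the solution x and det(A) = (-1)^p * product of the pivots.
    """
    A = A.astype(float)
    b = b.astype(float)
    n = len(b)
    p = 0                                        # number of row swaps
    # forward elimination
    for k in range(n - 1):
        # partial pivoting: largest |a_ik| on or below the diagonal
        imax = k + np.argmax(np.abs(A[k:, k]))
        if imax != k:
            A[[k, imax]] = A[[imax, k]]
            b[[k, imax]] = b[[imax, k]]
            p += 1
        for i in range(k + 1, n):
            factor = A[i, k] / A[k, k]
            A[i, k:] -= factor * A[k, k:]
            b[i] -= factor * b[k]
    det = (-1) ** p * np.prod(np.diag(A))
    # back substitution (Eq. 1.15)
    x = np.zeros(n)
    x[-1] = b[-1] / A[-1, -1]
    for i in range(n - 2, -1, -1):
        x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x, det

A = np.array([[2., 1., -1.], [-3., -1., 2.], [-2., 1., 2.]])
b = np.array([8., -11., -3.])
x, det = gauss_solve(A, b)
print(x, det, np.allclose(A @ x, b))   # x should be [2, 3, -1]
```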
This step itself consists of two parts. First, Eq. 1.21 is used to generate an intermediate vector $d$ by forward substitution. Then, the result is substituted into Eq. 1.17, which can be solved by back substitution for $x$.

Just as partial pivoting was necessary for Gaussian elimination, LU decomposition may also require partial pivoting. This is done with the help of the permutation matrix $P$. A permutation matrix is an identity matrix $I_n$ with its $i$-th and $j$-th rows interchanged; left-multiplying a matrix $A$ by $P$ switches the corresponding rows, and right-multiplying switches the corresponding columns.

1. The elimination is represented as $PA = LU$
2. Solving for $d$ is then $Ld = Pb$
3. The solution for $x$ is simply $Ux = d$

Now we shall see how Gaussian elimination can be used to compute the LU factorization.

3.2. Gaussian Elimination for LU Factorization

Although it might appear at face value to be unrelated to LU factorization, Gauss elimination can be used to decompose $A$ into $L$ and $U$. This can be easily seen for $U$, which is a direct product of the forward elimination. Recall that the forward-elimination step is intended to reduce the original coefficient matrix $A$ to the form

$$U = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a'_{22} & a'_{23} \\ 0 & 0 & a''_{33} \end{bmatrix} \qquad (1.22)$$

which is in the desired upper triangular format. Though it might not be as apparent, the matrix $L$ is also produced during this step. This can be readily illustrated for a three-equation system,

$$\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} \qquad (1.23)$$

The first step in Gauss elimination is to multiply row 1 by the factor

$$f_{21} = \frac{a_{21}}{a_{11}}$$

and subtract the result from the second row to eliminate $a_{21}$. Similarly, row 1 is multiplied by

$$f_{31} = \frac{a_{31}}{a_{11}}$$

and the result subtracted from the third row to eliminate $a_{31}$. The final step is to multiply the modified second row by

$$f_{32} = \frac{a'_{32}}{a'_{22}}$$

and subtract the result from the third row to eliminate $a'_{32}$. Now we have $A \to LU$ where

$$U = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a'_{22} & a'_{23} \\ 0 & 0 & a''_{33} \end{bmatrix}$$

and

$$L = \begin{bmatrix} 1 & 0 & 0 \\ f_{21} & 1 & 0 \\ f_{31} & f_{32} & 1 \end{bmatrix}$$

4. Matrix Inverse

The inverse of a matrix $A$ is defined by

$$A A^{-1} = A^{-1} A = I_n$$

One of the most important applications of the LU decomposition is computing the inverse of a matrix $A$, and this can be done in a column-by-column fashion, i.e., we solve the system $Ax = b$ using the LU decomposition once for

$$b = \begin{bmatrix} 1 \\ 0 \\ \vdots \end{bmatrix}$$

then for

$$b = \begin{bmatrix} 0 \\ 1 \\ \vdots \end{bmatrix}$$

and so on; the resulting solution vectors are the columns of $A^{-1}$.

4.1. Error Analysis and System Condition

Aside from its engineering and scientific applications, the inverse also provides a means to discern whether systems are ill-conditioned. Two direct checks can be devised for this purpose:

1. Multiply the inverse by the original coefficient matrix and assess whether the result is close to the identity matrix. If not, it indicates ill-conditioning.
2. Invert the inverted matrix and assess whether the result is sufficiently close to the original coefficient matrix. If not, it again indicates that the system is ill-conditioned.

5. The Gauss-Seidel Iterative Method

Iterative methods approximate the solution of a linear system of equations by successive refinement. The Gauss-Seidel method is the most commonly used iterative method for solving linear algebraic equations. Assume that we are given a set of $n$ equations:

$$Ax = b$$

where $A \in \mathbb{R}^{n \times n}$ and $x, b \in \mathbb{R}^n$. For ease of illustration, we limit ourselves to $3 \times 3$ systems.
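A minimal sketch of Sections 3.2 and 4, in our own code (no pivoting, so it assumes the pivots never vanish): the factors $f_{ij}$ produced during forward elimination are stored in $L$, the eliminated matrix is $U$, and the pair is then reused for several right-hand sides, including the columns of the identity to build $A^{-1}$ column by column.

```python
import numpy as np

def lu_factor(A):
    """Doolittle LU factorization without pivoting: A = L @ U."""
    A = A.astype(float)
    n = A.shape[0]
    L = np.eye(n)
    U = A.copy()
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]      # the factor f_ik
            U[i, k:] -= L[i, k] * U[k, k:]
    return L, U

def lu_solve(L, U, b):
    """Forward substitution Ld = b, then back substitution Ux = d."""
    n = len(b)
    d = np.zeros(n)
    for i in range(n):
        d[i] = b[i] - L[i, :i] @ d[:i]       # diagonal of L is 1
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (d[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

A = np.array([[4., 3., 0.], [3., 4., -1.], [0., -1., 4.]])
L, U = lu_factor(A)
# one factorization, many right-hand sides
x = lu_solve(L, U, np.array([24., 30., -24.]))
# inverse, column by column: solve A x = e_i for each unit vector e_i
A_inv = np.column_stack([lu_solve(L, U, e) for e in np.eye(3)])
print(np.allclose(L @ U, A), np.allclose(A @ A_inv, np.eye(3)))
```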
Then the variables $x_1, x_2, x_3$ are solved for as

$$x_1^{(j)} = \frac{b_1 - a_{12} x_2^{(j-1)} - a_{13} x_3^{(j-1)}}{a_{11}} \qquad (1.24)$$

$$x_2^{(j)} = \frac{b_2 - a_{21} x_1^{(j)} - a_{23} x_3^{(j-1)}}{a_{22}} \qquad (1.25)$$

$$x_3^{(j)} = \frac{b_3 - a_{31} x_1^{(j)} - a_{32} x_2^{(j)}}{a_{33}} \qquad (1.26)$$

where $j$ and $j-1$ denote the present and previous iterations, and $x_1^{(0)}, x_2^{(0)}, x_3^{(0)}$ are initialized randomly. Convergence can be checked as

$$\varepsilon_{a,i} = \left| \frac{x_i^{(j)} - x_i^{(j-1)}}{x_i^{(j)}} \right| \times 100\% \le \varepsilon_s \qquad (1.27)$$

For the general case, and for computational convenience, we can represent the whole process using a matrix-vector approach, which is usually the preferred way. Let the matrix $A$ be written in the form

$$A = L + U$$

$$\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} = \begin{bmatrix} a_{11} & 0 & \cdots & 0 \\ a_{21} & a_{22} & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} + \begin{bmatrix} 0 & a_{12} & \cdots & a_{1n} \\ 0 & 0 & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix}$$

where $L$ is the lower triangular part of $A$ including the diagonal and $U$ is the strictly upper triangular part. The system $Ax = b$ can now be written as

$$Ax = b$$
$$(L + U)x = b$$
$$Lx + Ux = b$$
$$Lx = b - Ux$$

Then, for $x^{(0)}$ initialized,

$$x^{(k+1)} = L^{-1}\bigl(b - U x^{(k)}\bigr), \qquad k = 0, 1, 2, \ldots$$

There are many possible stopping criteria (checks for convergence), such as

$$\|x^{(k+1)} - x^{(k)}\| < \varepsilon$$
$$\|A x^{(k+1)} - b\| < \varepsilon$$

5.1. Convergence and Diagonal Dominance

The Gauss-Seidel iteration will converge if the matrix is diagonally dominant or if the matrix is positive definite. The definition of a positive definite matrix is

$$x^T A x > 0$$

for any vector $x \in \mathbb{R}^n \setminus \{0\}$. The convergence criterion based on diagonal dominance can be derived as follows. Suppose that we have two equations represented by two functions $u(x, y)$ and $v(x, y)$. Gauss-Seidel will converge if

$$\left|\frac{\partial u}{\partial x}\right| + \left|\frac{\partial u}{\partial y}\right| < 1 \qquad (1.28)$$

$$\left|\frac{\partial v}{\partial x}\right| + \left|\frac{\partial v}{\partial y}\right| < 1 \qquad (1.29)$$

Equations 1.24-1.26 can be expressed as follows for the case of two unknowns

$$u(x_1, x_2) = \frac{b_1}{a_{11}} - \frac{a_{12}}{a_{11}} x_2 \qquad (1.30)$$

$$v(x_1, x_2) = \frac{b_2}{a_{22}} - \frac{a_{21}}{a_{22}} x_1 \qquad (1.31)$$

The partial derivatives of these two functions are

$$\frac{\partial u}{\partial x_1} = 0, \quad \frac{\partial u}{\partial x_2} = -\frac{a_{12}}{a_{11}}, \quad \frac{\partial v}{\partial x_1} = -\frac{a_{21}}{a_{22}}, \quad \frac{\partial v}{\partial x_2} = 0$$

which can be substituted into Eqs. 1.28 and 1.29 to get

$$\left|\frac{a_{12}}{a_{11}}\right| < 1 \qquad (1.32)$$

$$\left|\frac{a_{21}}{a_{22}}\right| < 1 \qquad (1.33)$$

Rearranging yields

$$|a_{11}| > |a_{12}|$$

and

$$|a_{22}| > |a_{21}|$$

That is, the magnitude of the diagonal element in each row must exceed that of the off-diagonal elements. This condition can be extended to the case of $n$ equations and $n$ unknowns:

$$|a_{ii}| > \sum_{\substack{j=1 \\ j \neq i}}^{n} |a_{ij}|$$

6. The Matrix Eigen Analysis Problem

The following equation describes the problem that we are trying to solve

$$Av = \lambda v \qquad (1.35)$$

where $v$ is called an eigenvector of $A$ and $\lambda$ is the associated eigenvalue; in general $\lambda \in \mathbb{C}$. An $n \times n$ matrix has exactly $n$ eigenvalues (counted with multiplicity), each with an associated eigenvector. Any square matrix $A$ with $n$ linearly independent eigenvectors can be diagonalized as

$$A = Q \Lambda Q^{-1}$$

where $Q$ contains all the eigenvectors as its columns and $\Lambda = \operatorname{diag}\{\lambda_1, \lambda_2, \ldots, \lambda_n\}$. This decomposition gives a nice way to compute $A^p$ as

$$A^p = Q \Lambda^p Q^{-1}$$

where $\Lambda^p \triangleq \operatorname{diag}\{\lambda_1^p, \lambda_2^p, \ldots, \lambda_n^p\}$.

Previously we dealt with systems of the form $Av = b$, which are called nonhomogeneous systems because of the arbitrary vector $b$. On the other hand, the following system is called a homogeneous system

$$Av = 0, \quad v \neq 0$$

The engineering problems associated with eigen analysis are of the form

$$(A - \lambda I)v = 0, \quad v \neq 0 \qquad (1.36)$$

For nontrivial solutions to be possible, the determinant of the matrix must equal zero:

$$|A - \lambda I| = 0 \qquad (1.37)$$

Expanding the determinant yields a polynomial in $\lambda$ which is called the characteristic polynomial of $A$. The roots of this polynomial are the eigenvalues.
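The matrix form $x^{(k+1)} = L^{-1}(b - U x^{(k)})$ can be sketched as follows. This is our own illustrative code, assuming $A$ is diagonally dominant so that the iteration converges; instead of forming $L^{-1}$ explicitly, the lower triangular system is solved at each step.

```python
import numpy as np

def gauss_seidel(A, b, tol=1e-10, max_iter=500):
    """Gauss-Seidel in matrix form: x_{k+1} = L^{-1} (b - U x_k).

    L is the lower triangle of A including the diagonal, U the strict
    upper triangle, so A = L + U as in Section 5.  Assumes A is
    diagonally dominant (or positive definite) so the iteration converges.
    """
    L = np.tril(A)                 # lower triangular part, with diagonal
    U = A - L                      # strictly upper triangular part
    x = np.zeros_like(b, dtype=float)
    for k in range(max_iter):
        x_new = np.linalg.solve(L, b - U @ x)   # triangular solve
        if np.linalg.norm(x_new - x) < tol:     # stopping criterion
            return x_new, k + 1
        x = x_new
    return x, max_iter

# a diagonally dominant example system
A = np.array([[3., -0.1, -0.2], [0.1, 7., -0.3], [0.3, -0.2, 10.]])
b = np.array([7.85, -19.3, 71.4])
x, iters = gauss_seidel(A, b)
print(x, iters, np.allclose(A @ x, b))
```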
Let us understand this for the case of two equations

$$(a_{11} - \lambda)v_1 + a_{12} v_2 = 0$$
$$a_{21} v_1 + (a_{22} - \lambda)v_2 = 0$$

Expanding the determinant of the coefficient matrix gives

$$\begin{vmatrix} a_{11} - \lambda & a_{12} \\ a_{21} & a_{22} - \lambda \end{vmatrix} = \lambda^2 - (a_{11} + a_{22})\lambda + (a_{11} a_{22} - a_{12} a_{21})$$

which is the characteristic polynomial, and the quadratic formula can then be used to solve for the two eigenvalues $\lambda_1$ and $\lambda_2$:

$$\lambda_{1,2} = \frac{(a_{11} + a_{22}) \pm \sqrt{(a_{11} - a_{22})^2 + 4 a_{12} a_{21}}}{2}$$

Now substituting $\lambda_1$ into Eq. 1.36 gives the first eigenvector $v_1$, and $\lambda_2$ gives the second eigenvector $v_2$.

6.1. QR Decomposition and the Iteration

Given a square $n \times n$ real symmetric matrix $A$, we can compute all the eigenvalues of $A$ (which are real) by decomposing

$$A = QR$$

where $Q$ is an orthonormal matrix and $R$ is an upper triangular matrix. We shall discuss the decomposition without any proof. The projection of a vector $a$ on $u$ is defined as

$$\operatorname{Proj}_u a = \frac{\langle u, a \rangle}{\langle u, u \rangle} u$$

where $\langle x, y \rangle = x^T y$ is the dot product of $x$ and $y$. Then we can compute an orthonormal basis from the columns of $A$ as follows ($a_k$ are the columns of the matrix $A$):

$$u_1 = a_1, \quad \hat{u}_1 = \frac{u_1}{\|u_1\|}$$

$$u_k = a_k - \sum_{j=1}^{k-1} \operatorname{Proj}_{u_j} a_k, \quad \hat{u}_k = \frac{u_k}{\|u_k\|}, \quad k = 2, 3, \ldots$$

We can now express the columns of $A$ over our newly computed orthonormal basis as $A = QR$, where

$$Q = [\hat{u}_1, \hat{u}_2, \ldots, \hat{u}_n]$$

and

$$R = \begin{bmatrix} \langle \hat{u}_1, a_1 \rangle & \langle \hat{u}_1, a_2 \rangle & \cdots & \langle \hat{u}_1, a_n \rangle \\ 0 & \langle \hat{u}_2, a_2 \rangle & \cdots & \langle \hat{u}_2, a_n \rangle \\ 0 & 0 & \cdots & \langle \hat{u}_3, a_n \rangle \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \langle \hat{u}_n, a_n \rangle \end{bmatrix}$$

Now the eigenvalues and eigenvectors of a symmetric positive definite (SPD) matrix $A$ can be computed according to the following iterative scheme, called the QR iteration algorithm:

1. Begin with the initial matrix $A_0 = A$ and $V_0 = I$
2. Repeat for $k = 0, 1, 2, \ldots$ until $A_k$ becomes (numerically) upper triangular:
3. Factorize $A_k = Q_k R_k$
4. Compute $A_{k+1} = R_k Q_k$
5. $V_{k+1} = V_k Q_k$

After the algorithm converges, the eigenvalues are on the diagonal of $A_k$ and the eigenvectors are the columns of $V_k$.

6.2. The Singular Value Decomposition (SVD)

The singular value decomposition (SVD) of a matrix is a central matrix decomposition method in linear algebra. It has been referred to as the "fundamental theorem of linear algebra" because it can be applied to all matrices, not only to square matrices, and it always exists. Let $A \in \mathbb{R}^{m \times n}$ with rank $0 \le r \le \min(m, n)$. The rank $r$ of a matrix $A$ is defined as the number of linearly independent columns of $A$. A set of vectors $x_1, x_2, \ldots, x_n$ is linearly independent if

$$\sum_j \alpha_j x_j \neq 0 \quad \text{for every nonzero coefficient vector } \alpha \in \mathbb{R}^n,$$

i.e., $X\alpha \neq 0$ for $\alpha \neq 0$, where $X = [x_1, x_2, \ldots, x_n]$.

For example, if $x_1 = [4, 5, 2]^T$ and $x_2 = [-2, 3, 1]^T$, then for $\alpha_1 = 2, \alpha_2 = -1$,

$$x_3 = \alpha_1 x_1 - x_2 = [10, 7, 3]^T$$

Thus $x_3$ is linearly dependent on $x_1$ and $x_2$, satisfying the equation

$$x_3 - \alpha_1 x_1 + x_2 = 0$$

When a set of $n$ vectors in $\mathbb{R}^n$ is linearly independent, $b \in \operatorname{span}\{x_1, x_2, \ldots, x_n\}$ for any $b \in \mathbb{R}^n$. Geometrically this means that any vector $b$ can be written as

$$b = \sum_j \alpha_j x_j$$

This has a great impact on the solvability of a general linear system.

The SVD of a rectangular matrix $A$ is a decomposition of the form

$$A = U \Sigma V^T$$

where $U$ and $V$, with columns $u_i$ and $v_i$, are orthonormal. $\Sigma$ is a matrix with $\Sigma_{ii} = \sigma_i \ge 0$ and $\Sigma_{ij} = 0$ if $i \neq j$. The diagonal elements $\sigma_i$, $i = 1, 2, \ldots, r$ are called the singular values; $u_i$ and $v_i$ are called the left and right singular vectors, respectively. By convention $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r \ge 0$. If $m > n$ then $\Sigma$ has diagonal structure up to row $n$ and consists of $0^T$ row vectors from row $n+1$ to $m$. If $m < n$, then $\Sigma$ has diagonal structure up to column $m$ and columns consisting of $0$ from column $m+1$ to $n$.
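The Gram-Schmidt construction of $Q$ and $R$ and the QR iteration of Section 6.1 can be sketched as follows. This is our own illustrative code; for a real symmetric $A$ the iterates $A_k$ approach a diagonal matrix, so a fixed number of sweeps is used here instead of a formal convergence test.

```python
import numpy as np

def qr_gram_schmidt(A):
    """Classical Gram-Schmidt QR factorization: A = Q @ R."""
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for k in range(n):
        u = A[:, k].copy()
        for j in range(k):
            R[j, k] = Q[:, j] @ A[:, k]      # <u_hat_j, a_k>
            u -= R[j, k] * Q[:, j]           # subtract the projection
        R[k, k] = np.linalg.norm(u)
        Q[:, k] = u / R[k, k]
    return Q, R

def qr_iteration(A, sweeps=200):
    """QR iteration: eigenvalues on diag(A_k), eigenvectors in columns of V."""
    A_k = A.astype(float)
    V = np.eye(A.shape[0])
    for _ in range(sweeps):
        Q, R = qr_gram_schmidt(A_k)
        A_k = R @ Q
        V = V @ Q
    return np.diag(A_k), V

A = np.array([[4., 1., 1.], [1., 3., 0.], [1., 0., 2.]])   # symmetric
eigvals, eigvecs = qr_iteration(A)
print(np.allclose(np.sort(eigvals), np.sort(np.linalg.eigvalsh(A))))
```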
6.2.1. The Construction of the Singular Value Decomposition (SVD)

We know that any symmetric positive definite matrix $A$ can be decomposed as

$$A = Q \Lambda Q^T$$

To compute the SVD, we make use of the following two facts. Let $A \in \mathbb{R}^{m \times m}$ be a square matrix; then:

1. If $v$ is a unit-length eigenvector of $A^T A$ with non-zero eigenvalue $\lambda$, then $Av$ is an eigenvector of $AA^T$ with the same eigenvalue $\lambda$. Furthermore, the norm of $Av$ is $\sqrt{\lambda}$.
2. If $u$ is a unit-length eigenvector of $AA^T$ with non-zero eigenvalue $\lambda$, then $A^T u$ is an eigenvector of $A^T A$ with the same eigenvalue $\lambda$. Furthermore, the norm of $A^T u$ is $\sqrt{\lambda}$.

From the above two results, it can be stated that the matrices $A^T A$ and $AA^T$ have the same eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_m$. If the $m$ orthonormal eigenvectors of $A^T A$ are $v_1, v_2, \ldots, v_m$ and the $m$ orthonormal eigenvectors of $AA^T$ are $u_1, u_2, \ldots, u_m$, each having the same eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_m$, then the following identity holds

$$A v_i = \sqrt{\lambda_i}\, u_i, \quad i = 1, 2, \ldots, m$$

Theorem (Existence of SVD): Let the columns of the $m \times m$ matrix $V$ contain all the $m$ orthonormal eigenvectors of $A^T A$ and let $\Sigma$ be an $m \times m$ diagonal matrix with diagonal entries containing the square roots of the corresponding eigenvalues. By convention, the columns of $V$ and $\Sigma$ are ordered so that the singular values are in nonincreasing order. Then it is possible to find an $m \times m$ orthonormal matrix $U$ containing all the orthonormal eigenvectors of $AA^T$ such that

$$A = U \Sigma V^T$$

SVD of a Rectangular Matrix: Consider an $m \times n$ matrix $A$ with real entries. Such a matrix can always be factorized as

$$A = U \Sigma V^T$$

where $U$ is an $m \times m$ matrix with orthonormal columns containing the left singular vectors, $\Sigma$ is an $m \times n$ rectangular "diagonal" matrix with diagonal entries containing the nonnegative singular values in decreasing order, and $V$ is an $n \times n$ matrix with orthonormal columns containing the right singular vectors. The following are some important properties of the right and left singular vectors:

1. The $m$ columns of $U$ are referred to as the left singular vectors and correspond to the $m$ eigenvectors of the $m \times m$ matrix $AA^T$.
2. The $n$ columns of $V$, which are the right singular vectors, correspond to the $n$ eigenvectors of the $n \times n$ matrix $A^T A$.
3. The diagonal entries of the $m \times n$ matrix $\Sigma$ contain the singular values, which are the square roots of the $\min(m, n)$ largest eigenvalues of $A^T A$ or $AA^T$.
4. By convention, the columns of $U$ and $V$ and the entries of $\Sigma$ are ordered by decreasing singular values.

Computing the SVD of a Rectangular Matrix: The columns of $V$ are the eigenvectors of $A^T A$, so they can be computed using algorithms such as the QR iteration. Rewriting $A = U \Sigma V^T$ as $AV = U\Sigma$, the columns of $U$ corresponding to non-zero singular values in $\Sigma$ are the normalized columns of $AV$ (here the normalization factor is $1/\sqrt{\lambda_i}$). The remaining columns of $U$ satisfy $AA^T u_i = 0$ and can be computed using methods like the LU decomposition. The diagonal entries of the matrix $\Sigma$ are computed as

$$\Sigma_{ij} = \begin{cases} \sqrt{\lambda_i}, & i = j \\ 0, & \text{otherwise} \end{cases}$$

A small numerical sketch of this construction is given after the readings below.

Readings for this Chapter

Chapter 3, Section 3.1 of the book Numerical Algorithms.
Chapters 9, 10, and 11 of the book Numerical Methods for Engineers.
For Section 6 of this lecture note, see Chapters 4-6 of the book Numerical Algorithms.
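Returning to the construction in Section 6.2.1, the sketch below (our own code, written for a tall matrix with full column rank so that all singular values are nonzero) builds the SVD from the eigendecomposition of $A^T A$ and checks the singular values against NumPy's built-in SVD.

```python
import numpy as np

def svd_from_eig(A):
    """Reduced SVD of a tall, full-column-rank A, built as in Section 6.2.1.

    V and the lambdas come from the eigendecomposition of A^T A, the
    singular values are sqrt(lambda_i), and each column of U is
    A v_i / sigma_i.  (The extra m - n columns of the full U, which lie
    in the null space of A A^T, are not constructed here.)
    """
    lam, V = np.linalg.eigh(A.T @ A)     # symmetric eigensolver, ascending order
    order = np.argsort(lam)[::-1]        # reorder: nonincreasing singular values
    lam, V = lam[order], V[:, order]
    sigma = np.sqrt(lam)
    U = (A @ V) / sigma                  # u_i = A v_i / sqrt(lambda_i)
    return U, sigma, V

A = np.array([[3., 1.], [1., 3.], [1., 1.]])          # 3 x 2, rank 2
U, sigma, V = svd_from_eig(A)
print(np.allclose(U @ np.diag(sigma) @ V.T, A))       # reconstructs A
print(np.allclose(sigma, np.linalg.svd(A, compute_uv=False)))
```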