MT2501: Linear Mathematics — Lecture Notes, 2024–25
James Mitchell and Louis Theran
These are lecture notes from MT2501: Linear Mathematics, for the 2024–25 academic year. The notes cover concepts like matrices, vector operations, solving systems of linear equations, Gaussian elimination, and vector spaces. The authors are James Mitchell and Louis Theran.
MT2501: Linear Mathematics — Version 0.4.0
James Mitchell and Louis Theran
Semester 1, 2024–25

Contents

1 Matrices
  1.1 The definition
  1.2 Multiplying a matrix by a scalar
  1.3 What is a vector?
2 Binary operations on matrices
  2.1 Matrix addition
  2.2 Matrix multiplication
3 Other operations on matrices
  3.1 The transpose of a matrix
  3.2 Left and right invertible matrices
4 Systems of linear equations
  4.1 Definition and first examples
  4.2 Linear systems as matrix-vector and vector equations
5 Gaussian elimination
  5.1 Elementary row operations
  5.2 Gaussian elimination — definition
  5.3 Gaussian elimination — examples
  5.4 Gauss-Jordan elimination — reduction to RREF
6 Solving linear systems
  6.1 EROs and solutions to Ax = b
  6.2 Solving homogeneous linear systems with row reduction
  6.3 Solving inhomogeneous linear systems with row reduction
  6.4 Analysing solutions to a linear system — pivots
  6.5 General solutions
  6.6 Examples
7 Inverses of matrices
  7.1 Finding inverses using row-reduction
  7.2 An alternative approach
8 Vector Spaces
  8.1 Definition and examples
  8.2 Subspaces
A Fields
B Proofs

List of Figures

1.1 Examples of vectors in R² visualized as arrows
1.2 Example of adding two vectors in R²
1.3 Example of scaling a vector in R²
4.1 A plot of the system of linear equations in Example 4.1
4.2 A plot of the system of linear equations in Example 4.2
4.3 A plot of the system of linear equations in Example 4.3
4.4 A plot of the system of linear equations in Example 4.4

Change log

This year we decided to update the notes for MT2501, and will only be releasing them incrementally over the course of the semester. We will publish the relevant parts of the notes before the lectures covering the material. We will also update the version number of the notes every time we change them, and a list of the changes will be provided below.

v0.4.0 (14/10/2024) The following major changes were made in this version of the notes (there were also a number of other minor typos fixed):
– Chapter 8 about vector spaces has been added.
v0.3.0 (07/10/2024) The following major changes were made in this version of the notes (there were also a number of other minor typos fixed):
– the definition of Gauss-Jordan elimination (Section 5.4) was changed to be the same as the definition given and used in the actual lectures;
– Theorem 6.4 and Example 6.5 have been removed (they aren't required);
– Example 6.6 and Example 6.7 were improved with more explanation in the notes at the end;
– the recipe was added to Example 6.11 for reading off one solution to an inhomogeneous linear system from the augmented RREF matrix;
– in Section 6.4 corollaries were added, stating that for a matrix to be left-invertible the number of rows must be at least the number of columns, and similar statements for right-invertible and invertible matrices;
– Example 6.25 was added;
– the statement of Theorem 6.15 was corrected (in part (a) it used to say k > 0, when it should have said k ≥ 0);
– Appendix B with some information about proofs and logic has been added;
– Chapter 7 about inverses has been added.

v0.2.0 (26/09/2024) The following major changes were made in this version of the notes (there were also a number of other minor typos fixed):
– there was one question on Problem Sheet 2 that mysteriously disappeared, so there was a bit of confusion about the numbers of questions in some tutorials. This has been resolved;
– the theorem referred to in Question 7(b) on Problem Sheet 2 was Theorem 2.8 but should have been Theorem 2.10;
– the steps in the row reductions in Examples 5.12 and 5.15 were not given in the previous version; they are now;
– Chapter 6 (Solving linear systems) was added.

v0.1.0 (20/09/2024) The following major changes were made in this version of the notes (there were also a number of other minor typos fixed):
– we wrote diag(a11, ..., ann) to mean two different things in Section 1.1. This was fixed by only ever writing diag(A) to mean the diagonal entries of the matrix A;
– Theorem 2.10 had a typo: the last column of B in the second displayed equation was indexed k but should have been indexed by n;
– the definition of a left invertible matrix (Definition 3.10) was the same as the definition of a right invertible matrix;
– another example of a non-right invertible matrix was added to Section 3.2 immediately after Example 3.9;
– Chapter 5 (Gaussian elimination) was added.

Chapter 1. Matrices

1.1 The definition

Definition 1.1 (Matrix). A rectangular array A of mn entries with m rows and n columns is called an m × n matrix. It has the form
$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ a_{31} & a_{32} & \cdots & a_{3n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}.$$
The entry $a_{ij}$ in the ith row and jth column is called the (i, j)th entry of A. We sometimes write $A = [a_{ij}]_{m \times n}$ or simply $A = [a_{ij}]$.

Example 1.2. Here's an example of a 3 × 5 matrix:
$$A = \begin{bmatrix} -2 & 2 & 0 & -1 & 0 \\ 0 & 1 & -3 & 1 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{bmatrix}.$$
The entries in the first row of this matrix are $a_{11} = -2$, $a_{12} = 2$, $a_{13} = 0$, $a_{14} = -1$, and $a_{15} = 0$. The rows of A are
$$\begin{bmatrix} -2 & 2 & 0 & -1 & 0 \end{bmatrix}, \quad \begin{bmatrix} 0 & 1 & -3 & 1 & 0 \end{bmatrix}, \quad \begin{bmatrix} 0 & 0 & 0 & 1 & 0 \end{bmatrix},$$
and the columns of A are
$$\begin{bmatrix} -2 \\ 0 \\ 0 \end{bmatrix}, \quad \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}, \quad \begin{bmatrix} 0 \\ -3 \\ 0 \end{bmatrix}, \quad \begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix}, \quad \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}.$$

The entries of a matrix will always be assumed to come from a field (see Appendix A for the definition and more details). Throughout this course this field will be one of: the rational numbers Q = {p/q | p, q ∈ Z, q ≠ 0}; the real numbers R; or the complex numbers C = {a + bi | a, b ∈ R} (where i = √−1). When we don't want to specify which one, we will use F (for field). The elements of F will be called scalars (we explain why below).
Definition 1.3 (Column vectors). If n ∈ N is at least 1, then the set of n × 1 matrices with entries in F is denoted by Fⁿ. The elements v ∈ Fⁿ are called column vectors. We write
$$v = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix} \in F^n.$$

Example 1.4. The columns of the matrix A from Example 1.2 are column vectors:
$$\begin{bmatrix} -2 \\ 0 \\ 0 \end{bmatrix}, \quad \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}, \quad \begin{bmatrix} 0 \\ -3 \\ 0 \end{bmatrix}, \quad \begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix}, \quad \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}. \quad\square$$

Example 1.5. If F = R in Definition 1.3 and n = 3, then we write
$$\mathbb{R}^3 = \left\{ \begin{bmatrix} x \\ y \\ z \end{bmatrix} : x, y, z \in \mathbb{R} \right\}.$$

Vectors are just a special kind of matrix. Some other special kinds of matrices include the following.

Definition 1.6 (Square matrix). An m × n matrix A = [aij] is square if m = n. The main diagonal of A is diag(A) = (a11, a22, ..., ann).

Example 1.7. The matrix A from Example 1.2 is not a square matrix. The matrix
$$B = \begin{bmatrix} -1 & 0 & 1 \\ 2 & 2 & 1 \\ 4 & -1 & 2 \end{bmatrix}$$
is a square matrix, and diag(B) = (−1, 2, 2).

Definition 1.8 (Diagonal matrix). A square matrix such that every entry not on the main diagonal is 0 is called a diagonal matrix. In other words, every diagonal matrix is of the following form:
$$A = \begin{bmatrix} a_{11} & 0 & \cdots & 0 \\ 0 & a_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{bmatrix} \quad\text{and}\quad \operatorname{diag}(A) = (a_{11}, \ldots, a_{nn}).$$
Equivalently, a square matrix A is diagonal if and only if aij = 0 for all i ≠ j.

Example 1.9. The matrix B from Example 1.7 is not a diagonal matrix, because the entry b13 = 1 is not on the main diagonal and is not zero. The matrix
$$C = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & -1 \end{bmatrix}$$
is a diagonal matrix and diag(C) = (0, 0, −1).

Definition 1.10 (Upper triangular matrix). A square matrix such that every entry below the main diagonal is 0 is called an upper triangular matrix. In other words, every upper triangular matrix is of the following form:
$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ 0 & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{bmatrix} \quad\text{and}\quad \operatorname{diag}(A) = (a_{11}, \ldots, a_{nn}).$$
Equivalently, a square matrix A is upper triangular if and only if aij = 0 for all i > j.

Example 1.11. Every diagonal matrix is upper triangular. So, the matrix C from Example 1.9 is upper triangular too. Here's another example of an upper triangular matrix:
$$\begin{bmatrix} -2 & 2 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

Definition 1.12 (Identity matrix). The n × n identity matrix Iₙ is a diagonal matrix of 1's:
$$I_n = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} \quad\text{and}\quad \operatorname{diag}(I_n) = (1, 1, \ldots, 1).$$

Example 1.13. If n = 3, then
$$I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

Definition 1.14 (Zero matrix). The m × n matrix with every entry equal to 0 is called the zero matrix and denoted by 0m×n:
$$0_{m \times n} = \begin{bmatrix} 0 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix}.$$

Example 1.15.
$$0_{2 \times 4} = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}, \quad\text{and}\quad 0_{4 \times 2} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}.$$

1.2 Multiplying a matrix by a scalar

It is possible to combine a scalar and a matrix using multiplication.

Definition 1.16 (Scalar multiplication). For an m × n matrix A = [aij] and a scalar λ, the scalar multiple of A by λ is the matrix λA = [λaij].

Example 1.17. If A is the matrix from Example 1.2 and λ = √2, then
$$\lambda A = \sqrt{2}\begin{bmatrix} -2 & 2 & 0 & -1 & 0 \\ 0 & 1 & -3 & 1 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{bmatrix} = \begin{bmatrix} -2\sqrt{2} & 2\sqrt{2} & 0 & -\sqrt{2} & 0 \\ 0 & \sqrt{2} & -3\sqrt{2} & \sqrt{2} & 0 \\ 0 & 0 & 0 & \sqrt{2} & 0 \end{bmatrix}.$$
If λ = 0, then
$$\lambda A = 0\begin{bmatrix} -2 & 2 & 0 & -1 & 0 \\ 0 & 1 & -3 & 1 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{bmatrix} = 0_{3 \times 5}.$$
Here's another example:
$$-1\begin{bmatrix} 3 & 1 & -1 \\ -1 & 0 & 4 \end{bmatrix} = \begin{bmatrix} -3 & -1 & 1 \\ 1 & 0 & -4 \end{bmatrix}.$$

If λ = −1 in Definition 1.16 and A is a matrix, then we write −A = [−aij] instead of −1A. We also refer to −A as the negative of A.
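The entrywise definition of scalar multiplication translates directly into code. The following is a minimal sketch, not part of the original notes (the function name `scalar_multiply` is ours), representing a matrix as a Python list of rows:

```python
def scalar_multiply(lam, A):
    """Return the scalar multiple lam*A, where A is a matrix
    represented as a list of rows (Definition 1.16)."""
    return [[lam * entry for entry in row] for row in A]

A = [[3, 1, -1],
     [-1, 0, 4]]

# lam = -1 gives the negative -A, as in the last example above
print(scalar_multiply(-1, A))  # [[-3, -1, 1], [1, 0, -4]]
```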
Theorem 1.18. For any m × n matrices A and B over a field F and any scalars λ, µ ∈ F the following hold:
(a) λ(µA) = (λµ)A;
(b) 1A = A.

Example 1.19. Suppose that A and B are the following 2 × 3 matrices:
$$A = \begin{bmatrix} 1 & 0 & 0 \\ -1 & -3 & -4 \end{bmatrix}, \qquad B = \begin{bmatrix} 0 & 0 & 2 \\ 1 & -2 & 0 \end{bmatrix},$$
λ = −2, and µ = 5. Then
$$\lambda(\mu A) = -2\left(5\begin{bmatrix} 1 & 0 & 0 \\ -1 & -3 & -4 \end{bmatrix}\right) = -2\begin{bmatrix} 5 & 0 & 0 \\ -5 & -15 & -20 \end{bmatrix} = \begin{bmatrix} -10 & 0 & 0 \\ 10 & 30 & 40 \end{bmatrix}.$$
Also
$$(\lambda\mu)A = -10A = -10\begin{bmatrix} 1 & 0 & 0 \\ -1 & -3 & -4 \end{bmatrix} = \begin{bmatrix} -10 & 0 & 0 \\ 10 & 30 & 40 \end{bmatrix}$$
and so λ(µA) = (λµ)A. This is much as you'd expect; there are no "gotchas" here.

1.3 What is a vector?

In some sense a vector is an arrow pointing in space; see Fig. 1.1. Vectors (and matrices more generally) of equal dimensions have two essential features:
(a) you can add vectors together; see Fig. 1.2;
(b) you can multiply a vector by a scalar; see Fig. 1.3.

As you can see in Fig. 1.3, multiplying a vector by a scalar has the effect of scaling it. This is why scalars are called scalars: they scale. Multiplying by a negative scalar changes the direction of a vector (to the opposite), and then scales it.

[Figure 1.1: Examples of vectors in R² visualized as arrows; the vectors shown are (1, 2)ᵀ and (−2, 3)ᵀ.]
[Figure 1.2: Example of adding two vectors in R²: (1, 2)ᵀ + (−2, 3)ᵀ = (−1, 5)ᵀ.]
[Figure 1.3: Example of scaling a vector in R²: 2(1, 2)ᵀ = (2, 4)ᵀ.]

Chapter 2. Binary operations on matrices

In this section we define the two fundamental operations (i.e. ways of combining) matrices: addition and multiplication.

2.1 Matrix addition

Definition 2.1 (Sum). If A = [aij] and B = [bij] are m × n matrices over the same field F, then we define the sum A + B of A and B by
$$A + B = [a_{ij}] + [b_{ij}] = \begin{bmatrix} a_{11}+b_{11} & a_{12}+b_{12} & \cdots & a_{1n}+b_{1n} \\ a_{21}+b_{21} & a_{22}+b_{22} & \cdots & a_{2n}+b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1}+b_{m1} & a_{m2}+b_{m2} & \cdots & a_{mn}+b_{mn} \end{bmatrix} = [a_{ij} + b_{ij}].$$
(The sum is undefined when A and B have different shapes.)

Example 2.2.
$$\begin{bmatrix} -2 & 2 & 0 & -1 \\ 0 & 0 & 1 & -3 \end{bmatrix} + \begin{bmatrix} 1 & 3 & 2 & -2 \\ -1 & 0 & 1 & -3 \end{bmatrix} = \begin{bmatrix} -2+1 & 2+3 & 0+2 & -1+(-2) \\ 0+(-1) & 0+0 & 1+1 & -3+(-3) \end{bmatrix} = \begin{bmatrix} -1 & 5 & 2 & -3 \\ -1 & 0 & 2 & -6 \end{bmatrix}$$
$$\begin{bmatrix} 3 & 1 & -1 \\ -1 & 0 & 4 \end{bmatrix} + \begin{bmatrix} 4 & 3 & 1 \\ 0 & -5 & 2 \end{bmatrix} = \begin{bmatrix} 7 & 4 & 0 \\ -1 & -5 & 6 \end{bmatrix}.$$

Addition of matrices has many of the same properties as addition of scalars.

Theorem 2.3. Suppose that A, B, and C are m × n matrices over the same field F, and that λ, µ ∈ F. Then the following hold:
(a) λ(A + B) = λA + λB;
(b) (λ + µ)A = λA + µA;
(c) A + B = B + A (commutativity of addition);
(d) (A + B) + C = A + (B + C) (associativity of addition);
(e) A + 0m×n = 0m×n + A = A (additive identity);
(f) A + (−A) = 0m×n (additive inverses).

Proof. If you have looked at the definition of a field in Appendix A, then you'll notice that (c), (d), (e), and (f) are essentially the same as (A1) to (A4) (except of course here the summands are matrices and in Appendix A the summands are scalars). Here's a proof that (c) holds:
$$A + B = [a_{ij}] + [b_{ij}] = [a_{ij} + b_{ij}] = [b_{ij} + a_{ij}] = [b_{ij}] + [a_{ij}] = B + A,$$
where the outer equalities hold by the definition of the sum (Definition 2.1) and the middle one by axiom (A1), commutativity of addition in F. The proofs of the other parts of the theorem are similar. ∎

2.2 Matrix multiplication

The second operation we define in this section is matrix multiplication. This is a bit more complicated than you might expect if you only just learned the definition of matrix addition. There's a good reason for this, that we'll discuss later in the course.

Definition 2.4 (Matrix multiplication). Suppose A = [aij] is an m × n matrix and B = [bij] is an n × p matrix. The product AB = [cij] is the m × p matrix whose (i, j)th entry is
$$c_{ij} = \sum_{k=1}^{n} a_{ik}b_{kj} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj}.$$
If the number of columns of A is not equal to the number of rows of B, then the product AB is not defined. The values ai1, ai2, ..., ain are the entries in the ith row of A, and the values b1j, b2j, ..., bnj are the entries in the jth column of B.
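Definition 2.4 is easy to turn into a (naive, triple-loop) program. Here is a minimal sketch in the same spirit as the earlier one; `mat_mul` is our own name, not from the notes, and no attempt is made at efficiency:

```python
def mat_mul(A, B):
    """Product of an m x n matrix A and an n x p matrix B (lists of rows),
    computing c_ij = sum_k a_ik * b_kj as in Definition 2.4."""
    m, n, p = len(A), len(B), len(B[0])
    if len(A[0]) != n:
        raise ValueError("columns of A must equal rows of B")
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

A = [[0, -3], [-4, 0]]
B = [[-1, -2], [2, 1]]
print(mat_mul(A, B))  # [[-6, -3], [4, 8]] -- worked by hand in Example 2.5 below
```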
Here's a picture:
$$c_{ij} = \begin{bmatrix} a_{i1} & a_{i2} & \cdots & a_{in} \end{bmatrix} \begin{bmatrix} b_{1j} \\ b_{2j} \\ \vdots \\ b_{nj} \end{bmatrix}.$$

Here's an example of matrix multiplication with all the gory details.

Example 2.5. Suppose that
$$A = \begin{bmatrix} 0 & -3 \\ -4 & 0 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} -1 & -2 \\ 2 & 1 \end{bmatrix}.$$
We want to find
$$AB = [c_{ij}] = \begin{bmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{bmatrix}.$$
To find the entry c11 in the product AB, we "multiply" row 1 of A by column 1 of B:
$$c_{11} = (0 \times -1) + (-3 \times 2) = -6 \quad\text{and}\quad AB = \begin{bmatrix} -6 & c_{12} \\ c_{21} & c_{22} \end{bmatrix}.$$
To find c12, "multiply" row 1 of A by column 2 of B:
$$c_{12} = (0 \times -2) + (-3 \times 1) = -3 \quad\text{and}\quad AB = \begin{bmatrix} -6 & -3 \\ c_{21} & c_{22} \end{bmatrix}.$$
To find c21, "multiply" row 2 of A by column 1 of B:
$$c_{21} = (-4 \times -1) + (0 \times 2) = 4 \quad\text{and}\quad AB = \begin{bmatrix} -6 & -3 \\ 4 & c_{22} \end{bmatrix}.$$
Finally, to find c22, "multiply" row 2 of A by column 2 of B:
$$c_{22} = (-4 \times -2) + (0 \times 1) = 8 \quad\text{and}\quad AB = \begin{bmatrix} -6 & -3 \\ 4 & 8 \end{bmatrix}.$$

Here are some more examples with the details omitted.

Example 2.6.
$$\begin{bmatrix} 0 & 1 & 1 & 2 \\ 2 & -1 & 0 & 0 \\ -2 & 0 & -2 & 3 \\ -1 & 1 & 0 & -2 \end{bmatrix} \begin{bmatrix} 1 & 6 & -3 & 4 \\ 0 & 3 & 1 & -2 \\ 0 & -4 & 0 & -4 \\ 0 & 0 & 5 & 4 \end{bmatrix} = \begin{bmatrix} 0 & -1 & 11 & 2 \\ 2 & 9 & -7 & 10 \\ -2 & -4 & 21 & 12 \\ -1 & -3 & -6 & -14 \end{bmatrix}$$
$$\begin{bmatrix} 3 & 2 & 0 \\ 5 & 1 & 0 \end{bmatrix} \begin{bmatrix} -2 & 0 \\ 2 & 3 \\ -3 & 2 \end{bmatrix} = \begin{bmatrix} -2 & 6 \\ -8 & 3 \end{bmatrix}$$
$$\begin{bmatrix} 2 & 1 & 3 \\ 1 & 2 & -1 \\ 1 & 2 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix} = \begin{bmatrix} 7 \\ -3 \\ -1 \end{bmatrix}.$$
The product
$$\begin{bmatrix} 2 & 1 & 3 \\ -1 & 3 & 4 \\ 0 & 2 & 5 \end{bmatrix} \begin{bmatrix} 3 & 1 & -1 \\ -1 & 0 & 4 \end{bmatrix}$$
is not defined because the number of columns of the matrix on the left must be the same as the number of rows of the matrix on the right. In this example the left matrix has 3 columns and the right matrix has 2 rows.

There are other ways of thinking about and computing the product of two matrices.

Example 2.7 (Matrix-vector multiplication mixes columns). We can check using the definition that
$$\begin{bmatrix} 3 & 1 & -1 \\ -1 & 0 & 4 \end{bmatrix} \begin{bmatrix} 2 \\ -1 \\ 0 \end{bmatrix} = \begin{bmatrix} 5 \\ -2 \end{bmatrix}.$$
We also have
$$2\begin{bmatrix} 3 \\ -1 \end{bmatrix} + (-1)\begin{bmatrix} 1 \\ 0 \end{bmatrix} + 0\begin{bmatrix} -1 \\ 4 \end{bmatrix} = \begin{bmatrix} 5 \\ -2 \end{bmatrix}.$$
Roughly speaking, one interpretation of the above is that the result of multiplying a matrix by a vector is a weighted sum of the columns. This holds in general.

Theorem 2.8. Let A = [aij] be an m × n matrix with entries in the field F. Then
$$A\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = x_1\begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{bmatrix} + \cdots + x_n\begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{bmatrix}.$$

Example 2.9. Suppose that
$$A = \begin{bmatrix} 3 & 1 & -1 \\ -1 & 0 & 4 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 2 & 1 & 3 \\ -1 & 3 & 4 \\ 0 & 2 & 5 \end{bmatrix}.$$
Again from the definition:
$$AB = \begin{bmatrix} 3 & 1 & -1 \\ -1 & 0 & 4 \end{bmatrix} \begin{bmatrix} 2 & 1 & 3 \\ -1 & 3 & 4 \\ 0 & 2 & 5 \end{bmatrix} = \begin{bmatrix} 5 & 4 & 8 \\ -2 & 7 & 17 \end{bmatrix}.$$
Using Theorem 2.8 we can compute the product AB column-by-column in B. We computed the first column of AB in Example 2.7. The second column of AB is obtained by multiplying A by the second column of B:
$$\begin{bmatrix} 3 & 1 & -1 \\ -1 & 0 & 4 \end{bmatrix} \begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix} = 1\begin{bmatrix} 3 \\ -1 \end{bmatrix} + 3\begin{bmatrix} 1 \\ 0 \end{bmatrix} + 2\begin{bmatrix} -1 \\ 4 \end{bmatrix} = \begin{bmatrix} 4 \\ 7 \end{bmatrix}$$
and the third column is found in a similar way.

Theorem 2.10. Let A = [aij] be an m × p matrix and let B = [bij] be a p × n matrix over the field F. Then the kth column of the product AB = [cij] is the product of A with the kth column of B:
$$\begin{bmatrix} c_{1k} \\ c_{2k} \\ \vdots \\ c_{mk} \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1p} \\ a_{21} & a_{22} & \cdots & a_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mp} \end{bmatrix} \begin{bmatrix} b_{1k} \\ b_{2k} \\ \vdots \\ b_{pk} \end{bmatrix}.$$
If we denote the kth column of B by bₖ for every k, then
$$\mathbf{b}_k = \begin{bmatrix} b_{1k} \\ b_{2k} \\ \vdots \\ b_{pk} \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} \mathbf{b}_1 & \mathbf{b}_2 & \cdots & \mathbf{b}_n \end{bmatrix},$$
and so the equations above can be written more concisely as:
$$AB = A\begin{bmatrix} \mathbf{b}_1 & \mathbf{b}_2 & \cdots & \mathbf{b}_n \end{bmatrix} = \begin{bmatrix} A\mathbf{b}_1 & A\mathbf{b}_2 & \cdots & A\mathbf{b}_n \end{bmatrix}.$$

We will denote by Mm×n(F) the set of all m × n matrices over the field F.
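Theorem 2.10 is easy to check numerically. The following sketch is not part of the notes; it uses NumPy, whose `@` operator implements the product of Definition 2.4, and the matrices of Example 2.9:

```python
import numpy as np

A = np.array([[3, 1, -1],
              [-1, 0, 4]])
B = np.array([[2, 1, 3],
              [-1, 3, 4],
              [0, 2, 5]])

AB = A @ B
# Each column of AB equals A times the corresponding column of B (Theorem 2.10).
for k in range(B.shape[1]):
    assert np.array_equal(AB[:, k], A @ B[:, k])

print(AB)  # [[ 5  4  8] [-2  7 17]], as in Example 2.9
```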
Here are the key properties of matrix multiplication.

Theorem 2.11. The following hold:
(a) (AB)C = A(BC) for all A ∈ Mm×n(F), B ∈ Mn×p(F), C ∈ Mp×q(F) (associativity);
(b) A(B + C) = AB + AC for all A ∈ Mm×n(F), B, C ∈ Mn×p(F) (distributivity);
(c) (A + B)C = AC + BC for all A, B ∈ Mm×n(F), C ∈ Mn×p(F) (distributivity);
(d) A(λB) = (λA)B = λ(AB) for all A ∈ Mm×n(F), B ∈ Mn×p(F), λ ∈ F;
(e) ImA = AIn = A for all A ∈ Mm×n(F).

Example 2.12 (Matrix multiplication is not commutative). Multiplication of matrices is not commutative. For example, suppose that
$$A = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}.$$
Then
$$AB = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} \quad\text{and}\quad BA = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}.$$
Hence AB ≠ BA.

Example 2.12 also shows that the product of two non-zero matrices can be a zero matrix.

Chapter 3. Other operations on matrices

In Chapter 2 we showed how to combine two matrices into a new matrix, by addition or matrix multiplication. In this section we introduce a number of operations that can be applied to a single matrix to transform it into a new matrix.

3.1 The transpose of a matrix

The first such operation is the transpose, which is defined as follows.

Definition 3.1 (Transpose of a matrix). The transpose of an m × n matrix A = [aij], denoted by Aᵀ, is the n × m matrix whose (i, j)th entry is aji.

Example 3.2 (Transpose of a matrix). If A is the 2 × 3 matrix
$$A = \begin{bmatrix} -2 & 2 & 0 \\ -1 & 0 & 0 \end{bmatrix},$$
then the transpose Aᵀ of A is the 3 × 2 matrix
$$A^T = \begin{bmatrix} -2 & -1 \\ 2 & 0 \\ 0 & 0 \end{bmatrix}.$$
The rows of A,
$$\begin{bmatrix} -2 & 2 & 0 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} -1 & 0 & 0 \end{bmatrix},$$
become the columns of Aᵀ. Similarly, the columns of A become the rows of Aᵀ.

Theorem 3.3 (Reversal rule). If A ∈ Mm×n(F) and B ∈ Mn×p(F), then
$$(AB)^T = B^T A^T.$$
Proof. Let A = [aij]m×n, B = [bij]n×p, AB = [cij]m×p, (AB)ᵀ = [cji]p×m, and BᵀAᵀ = [dij]p×m. We must show that cji = dij for all i and j. By the definition of the transpose (Definition 3.1), Aᵀ = [a′ij]n×m and Bᵀ = [b′ij]p×n, where a′ij = aji and b′ij = bji for all values of i and j. By the definition of matrix multiplication (Definition 2.4) applied to AB and BᵀAᵀ:
$$c_{ij} = \sum_{k=1}^{n} a_{ik}b_{kj} \qquad (3.1)$$
$$d_{ij} = \sum_{k=1}^{n} b'_{ik}a'_{kj} = \sum_{k=1}^{n} b_{ki}a_{jk}. \qquad (3.2)$$
So, by swapping i and j in (3.1):
$$c_{ji} = \sum_{k=1}^{n} a_{jk}b_{ki}. \qquad (3.3)$$
Therefore
$$d_{ij} \overset{(3.2)}{=} \sum_{k=1}^{n} b_{ki}a_{jk} \overset{(M1)}{=} \sum_{k=1}^{n} a_{jk}b_{ki} \overset{(3.3)}{=} c_{ji},$$
as required. ∎

Another property of transposes is given in the next theorem.

Theorem 3.4. If A, B ∈ Mm×n(F), then (A + B)ᵀ = Aᵀ + Bᵀ.

We will meet various special families of matrices in this course, such as the following.

Definition 3.5. A square matrix A is
(a) symmetric if Aᵀ = A;
(b) skew-symmetric if Aᵀ = −A;
(c) idempotent if A² = A.

Question 3.1. For each of the above properties write down a matrix that has the property and a matrix that doesn't.

Question 3.2. Do we really need to assume that A is square in Definition 3.5?
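One way to experiment with Question 3.1 is to test the properties of Definition 3.5 numerically. A minimal sketch, not from the notes (the helper names and the sample matrices are ours):

```python
import numpy as np

def is_symmetric(A):       return np.array_equal(A.T, A)
def is_skew_symmetric(A):  return np.array_equal(A.T, -A)
def is_idempotent(A):      return np.array_equal(A @ A, A)

S = np.array([[1, 2], [2, 3]])    # S.T == S
K = np.array([[0, 1], [-1, 0]])   # K.T == -K
P = np.array([[1, 0], [0, 0]])    # P @ P == P

print(is_symmetric(S), is_skew_symmetric(K), is_idempotent(P))  # True True True
```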
3.2 Left and right invertible matrices

Definition 3.6 (Right invertible matrix). Suppose that A ∈ Mm,n(F). Then A is called right invertible if there exists B ∈ Mn,m(F) such that the product AB of A and B is the m × m identity matrix Im (Definition 1.12), that is, AB = Im. The matrix B is called a right inverse for A.

Example 3.7 (A matrix that is not right invertible). It is not the case that every matrix is right invertible. For example, the 2 × 3 zero matrix 02×3 has no right inverses because
$$0_{2\times3}A = 0_{2\times3}\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} = 0_{2\times2} \neq I_2$$
for every 3 × 2 matrix A = [aij].

Example 3.8 (Right invertible matrix). If
$$X = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & 1 \end{bmatrix} \quad\text{and}\quad Y = \begin{bmatrix} \tfrac{2}{3} & -1 \\ \tfrac{1}{3} & 0 \\ -\tfrac{1}{3} & 1 \end{bmatrix},$$
then
$$XY = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & 1 \end{bmatrix}\begin{bmatrix} \tfrac{2}{3} & -1 \\ \tfrac{1}{3} & 0 \\ -\tfrac{1}{3} & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.$$
Hence X is right invertible and Y is a right inverse for X.

Example 3.9 (Right invertible with multiple right inverses). Suppose that
$$X = \begin{bmatrix} 1 & 0 \end{bmatrix}.$$
Find all of the right inverses of X.
Solution: Suppose that
$$Y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$$
is a right inverse for X. Then
$$XY = \begin{bmatrix} 1 & 0 \end{bmatrix}\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} y_1 \end{bmatrix} = \begin{bmatrix} 1 \end{bmatrix} = I_1.$$
So, if y2 ∈ F is any scalar, then
$$Y = \begin{bmatrix} 1 \\ y_2 \end{bmatrix}$$
is a right inverse of X, and every right inverse of X is of this form.

Example. Show that
$$A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$$
is not right invertible.
Solution: Suppose that
$$B = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix}.$$
Then
$$AB = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix} = \begin{bmatrix} b_{11} & b_{12} \\ 0 & 0 \end{bmatrix} \neq \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I_2,$$
since the (2, 2) entry of AB is 0, not 1. Hence A is not right invertible.

At this stage in the course, given a matrix A it is not clear whether or not such a B exists, or how you'd go about finding B if it did exist. This is intentional, and we'll explore this more in Chapter 7.

Definition 3.10 (Left invertible matrix). Suppose that A ∈ Mm,n(F). Then A is called left invertible if there exists B ∈ Mn,m(F) such that the product BA of B and A is the n × n identity matrix In (Definition 1.12), that is, BA = In. The matrix B is called a left inverse for A.

Example 3.11 (Left invertible matrix). If X and Y are the matrices from Example 3.8, then we already showed in Example 3.8 that X is a left inverse for Y and so Y is left invertible. On the other hand, the matrix X is not left invertible, although why this is the case will only become clear later in the course.

If we combine right and left invertibility, we get the following definition.

Definition 3.12 (Invertible matrix). Let A be a matrix. Then A is invertible if it is both left and right invertible.

Theorem 3.13. If A ∈ Mm,n(F) is an invertible matrix, then its right and left inverses B ∈ Mn,m(F) and C ∈ Mn,m(F), respectively, are unique and B = C.
Proof. We start by showing that B = C. By the definition of right and left inverses (Definition 3.6 and Definition 3.10), AB = Im and CA = In. Then, using Theorem 2.11(e),
$$B = I_nB = (CA)B = C(AB) = CI_m = C.$$
For uniqueness, suppose that X ∈ Mn,m(F) is a right inverse for A. Then AX = Im = AB, and so CAX = CAB. But then
$$X = I_nX = CAX = CAB = I_nB = B$$
and hence B is unique. The proof that C is unique is similar. ∎

Theorem 3.13 can be reformulated as follows.

Corollary 3.14. If A ∈ Mm,n(F) is an invertible matrix, then there exists a unique matrix A⁻¹ ∈ Mn,m(F) such that AA⁻¹ = Im and A⁻¹A = In.

What is less clear at this point is the following, which we will prove later in the course.

Theorem 3.15 (Invertible matrices must be square). If A is an invertible matrix, then A is a square matrix and AA⁻¹ = I = A⁻¹A.

It is not the case that all square matrices have inverses.

Example 3.16 (Non-invertible matrix). Any non-square matrix is non-invertible by Theorem 3.15. If n ≥ 1, then the n × n zero matrix 0n×n is not right invertible (see Example 3.7). Hence by the definition of an invertible matrix (Definition 3.12) 0n×n is not an invertible matrix.
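The right inverse from Example 3.8 can also be checked numerically. A small sketch, not from the notes, using floating-point entries for Y:

```python
import numpy as np

X = np.array([[1, 2, 1],
              [0, 1, 1]])
Y = np.array([[ 2/3, -1],
              [ 1/3,  0],
              [-1/3,  1]])

print(np.allclose(X @ Y, np.eye(2)))  # True: Y is a right inverse for X
print(Y @ X)  # not the 3x3 identity, so Y is not a left inverse for X;
              # in fact X has no left inverse at all (see Example 3.11)
```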
Example 3.17 (Invertible matrix). The n × n identity matrix In is invertible for every n ≥ 1, because InIn = In (which makes (In)⁻¹ = In). If an angle θ is fixed, then the matrix
$$A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$
is invertible because
$$\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}\begin{bmatrix} \cos(-\theta) & -\sin(-\theta) \\ \sin(-\theta) & \cos(-\theta) \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}\begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} = \begin{bmatrix} \cos^2\theta + \sin^2\theta & \cos\theta\sin\theta - \sin\theta\cos\theta \\ \sin\theta\cos\theta - \cos\theta\sin\theta & \sin^2\theta + \cos^2\theta \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$
(remember that cos(−θ) = cos θ, sin(−θ) = −sin θ, and cos²θ + sin²θ = 1), and similarly,
$$\begin{bmatrix} \cos(-\theta) & -\sin(-\theta) \\ \sin(-\theta) & \cos(-\theta) \end{bmatrix}\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.$$
Alternatively, you might notice that A rotates R² around the origin by θ. Hence rotating by θ and then by −θ has the same effect as doing nothing (i.e. rotating by 0, or applying the identity matrix).

Theorem 3.18. If A and B are invertible n × n matrices, then A⁻¹ and AB are also invertible, with (AB)⁻¹ = B⁻¹A⁻¹ and (A⁻¹)⁻¹ = A.
Proof. To show that (AB)⁻¹ = B⁻¹A⁻¹, first note that
$$AB \cdot B^{-1}A^{-1} = A(BB^{-1})A^{-1} = AIA^{-1} = AA^{-1} = I$$
and similarly B⁻¹A⁻¹ · AB = I. Hence B⁻¹A⁻¹ is an inverse of AB, and since inverses are unique it follows that (AB)⁻¹ = B⁻¹A⁻¹. The proof that (A⁻¹)⁻¹ = A is similar. ∎

Corollary 3.19 (A product of invertible matrices is invertible). If A and B are invertible matrices, then AB is also invertible.
Proof. This follows immediately from Theorem 3.18 because we showed that the inverse of AB is B⁻¹A⁻¹! ∎

Chapter 4. Systems of linear equations

Now that we've mastered the basics of matrices, in this section we move onto another of the main topics of this course: linear equations. Linear equations also lend their name to the title of this module, "linear mathematics"!

4.1 Definition and first examples

Let's begin with some simple examples.

Example 4.1 (No solutions). Suppose that we want to solve:
$$\begin{aligned} x + y &= 2 \\ x + y &= -2. \end{aligned}$$
The aim when considering a system of linear equations is to find (one or all) values of x and y such that the equations hold, or to show that there are no solutions. This clearly has no solutions, since it's not possible for x + y to be equal to both 2 and −2; see Fig. 4.1. □

Example 4.2 (No solutions). Suppose that we want to solve:
$$\begin{aligned} x + y &= 2 \\ x - y &= 0 \\ y &= -2. \end{aligned}$$
Then the second equation implies that x = −2, and the first equation implies that x = 4, which is impossible. Hence this system also has no solutions. □

[Figure 4.1: A plot of the system of linear equations in Example 4.1 — two parallel lines.]
[Figure 4.2: A plot of the system of linear equations in Example 4.2.]

Example 4.3 (Unique solution). The following is a system of linear equations:
$$\begin{aligned} x + y &= 5 \\ \tfrac{1}{2}x - y &= -2. \end{aligned}$$
This example is easy to solve; rearranging the equations yields
$$y = -x + 5 \qquad\text{and}\qquad y = \tfrac{1}{2}x + 2.$$
See Fig. 4.3 for a plot. The fact that these equations describe lines is why they are called "linear". This implies that −x + 5 = ½x + 2 and so x = 2. It follows that y = −2 + 5 = 3, and so the (unique) solution is x = 2 and y = 3. This is also the point where the lines in Fig. 4.3 intersect (which, given that we have the picture already, would have been an easier way of reading off the solution). □

[Figure 4.3: A plot of the system of linear equations in Example 4.3.]

Example 4.4 (Infinitely many solutions). If we want to solve:
$$x - y = 2,$$
then there are infinitely many solutions: every point on the line shown in Fig. 4.4 is a solution. □

[Figure 4.4: A plot of the system of linear equations in Example 4.4.]

Definition 4.5 (System of linear equations). Suppose that F is a field. Then a system of linear equations (or linear system for short) with m equations and n unknowns is:
$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\ &\;\;\vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m \end{aligned}$$
where aij, bk ∈ F are fixed scalar coefficients and the xl are variables. A solution to a system of linear equations is an assignment of the variables x1, x2, ..., xn to scalar values in F such that the equations hold.
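For small systems like those above, a numerical solver gives a quick sanity check. A sketch using NumPy, not part of the notes (and assuming our reconstruction of Example 4.3's system); `np.linalg.solve` handles square systems with a unique solution:

```python
import numpy as np

# The system from Example 4.3:  x + y = 5,  (1/2)x - y = -2
A = np.array([[1.0,  1.0],
              [0.5, -1.0]])
b = np.array([5.0, -2.0])

print(np.linalg.solve(A, b))  # [2. 3.], i.e. x = 2 and y = 3
```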
Definition 4.6 (Homogeneous linear system). A linear system of the form:
$$\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= 0 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= 0 \\ &\;\;\vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= 0 \end{aligned}$$
is called homogeneous.

Given a system of linear equations the aim is to find (one or all) solutions, or show that there are none. When m and n are small, you can do this using the kind of ad hoc or geometric argument from Example 4.3. When m and n get larger this becomes more or less impossible (maybe impractical is a better word). So, we require a more systematic means for trying to solve larger systems of linear equations. Some systems are easier to solve than others.

Example 4.7. The following system of linear equations is not in a particularly easy-to-solve form:
$$\begin{aligned} x_1 - 2x_2 + 12x_3 &= 1 \\ x_1 + x_2 + 4x_3 &= -1 \\ -2x_1 - x_3 &= 1. \end{aligned}$$
The next system of linear equations is easier to solve:
$$\begin{aligned} x_1 - 2x_2 + 12x_3 &= 1 \\ 3x_2 - 16x_3 &= -2 \\ 5x_3 &= 1. \end{aligned}$$
You get the value of x3 immediately from the 3rd equation, then the value of x2 using the value of x3 and the 2nd equation, then the value of x1 using the 1st equation and the values of x2 and x3:
$$x_3 = \tfrac{1}{5}, \qquad x_2 = \tfrac{-2 + 16x_3}{3} = \tfrac{2}{5}, \qquad x_1 = 1 + 2x_2 - 12x_3 = -\tfrac{3}{5}.$$
(This process is called back substitution.) The next system of linear equations is even easier to solve:
$$x_1 = -\tfrac{3}{5}, \qquad x_2 = \tfrac{2}{5}, \qquad x_3 = \tfrac{1}{5}. \quad\square$$

The three systems of equations in Example 4.7 are all equivalent in the sense that x1, x2, x3 ∈ F is a solution to any of these systems if and only if it is a solution to all the other systems. We will discuss this more later in Corollary 6.2.

4.2 Linear systems as matrix-vector and vector equations

You might have noticed that we used the notation aij, bk, and xl in the definition of a linear system (Definition 4.5). We also used this notation extensively in Chapters 1 to 3. This is not a coincidence!

Example 4.8. We can write
$$\begin{aligned} x_1 - 2x_2 + 12x_3 &= 1 \\ x_1 + x_2 + 4x_3 &= -1 \\ -2x_1 - x_3 &= 1 \end{aligned} \qquad\text{as}\qquad \begin{bmatrix} 1 & -2 & 12 \\ 1 & 1 & 4 \\ -2 & 0 & -1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix},$$
and we are back to matrices again. □

Suppose that F is a field. Then the system of linear equations in Definition 4.5 can be written as
$$\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}.$$
If we set A = [aij] ∈ Mm,n(F), x = [xl] ∈ Fⁿ, and b = [bk] ∈ Fᵐ, then the system becomes the matrix-vector equation:
$$Ax = b.$$
We refer to the matrix A as the coefficient matrix of the linear system. If the columns of A are a1, a2, ..., an ∈ Fᵐ, then we can write Ax = b as the vector equation:
$$x_1\mathbf{a}_1 + x_2\mathbf{a}_2 + \cdots + x_n\mathbf{a}_n = \mathbf{b}$$
by Theorem 2.8.

Example 4.9. The three linear systems from Example 4.7 correspond to:
$$\begin{bmatrix} 1 & -2 & 12 \\ 1 & 1 & 4 \\ -2 & 0 & -1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}, \quad \begin{bmatrix} 1 & -2 & 12 \\ 0 & 3 & -16 \\ 0 & 0 & 5 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix}, \quad \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -\tfrac{3}{5} \\ \tfrac{2}{5} \\ \tfrac{1}{5} \end{bmatrix}.$$
Matrices of the second and third form have special names: row echelon form (REF), and reduced row echelon form (RREF). □

To define REF and RREF we require the following notion.

Definition 4.10 (Pivot). If A = [aij] is an m × n matrix with entries in a field F, then the pivot of a row in A is the first non-zero entry (if it exists) in that row, reading from left to right. The row containing a pivot aij,
$$\begin{bmatrix} a_{i1} & a_{i2} & \cdots & a_{in} \end{bmatrix},$$
is called the pivot row; and the column containing a pivot aij,
$$\begin{bmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{mj} \end{bmatrix},$$
is called the pivot column. A row with every entry equal to 0 has no pivot. Every non-zero row has a unique pivot.
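Finding the pivot of a row is a one-liner in code. A sketch (the function name is ours, not the notes'), run on the matrix of Example 4.11 below:

```python
def pivot_column(row):
    """Index of the pivot (first non-zero entry, left to right),
    or None for a zero row (Definition 4.10)."""
    return next((j for j, entry in enumerate(row) if entry != 0), None)

A = [[1, 2, 3, 4, 5],
     [0, 6, 7, 8, 9],
     [0, 0, 0, 10, 11],
     [0, 0, 0, 0, 12]]

print([pivot_column(r) for r in A])  # [0, 1, 3, 4] (0-indexed columns)
```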
Example 4.11 (Pivot, pivot row, pivot column). In the third row of the following matrix A = [aij] the pivot is a34 = 10:
$$A = \begin{bmatrix} 1 & 2 & 3 & 4 & 5 \\ 0 & 6 & 7 & 8 & 9 \\ 0 & 0 & 0 & 10 & 11 \\ 0 & 0 & 0 & 0 & 12 \end{bmatrix};$$
its pivot row is the third row, and its pivot column is the fourth column. □

Example 4.12. The pivots of the rows of the following matrices are, respectively: 1, 3, and 5; 1, 1, and 1; 1, 6, 10, and 12; and 1, 5, and 6 (the last row of the fourth matrix is zero and so has no pivot):
$$\begin{bmatrix} 1 & -2 & 12 \\ 0 & 3 & -16 \\ 0 & 0 & 5 \end{bmatrix}, \quad \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad \begin{bmatrix} 1 & 2 & 3 & 4 & 5 \\ 0 & 6 & 7 & 8 & 9 \\ 0 & 0 & 0 & 10 & 11 \\ 0 & 0 & 0 & 0 & 12 \end{bmatrix}, \quad\text{and}\quad \begin{bmatrix} 1 & 2 & 3 & 4 \\ 5 & 0 & 0 & 10 \\ 0 & 6 & 7 & 8 \\ 0 & 0 & 0 & 0 \end{bmatrix}. \quad\square$$

Definition 4.13 (Row echelon form). A matrix A is said to be in row echelon form (REF) if it has the following properties:
(a) if a row has a pivot, then the pivot in the next row (if any) is strictly to the right;
(b) zero rows, if any, occur as the last rows in the matrix. (This actually follows from part (a)!)

Example 4.14. The following matrices are in REF:
$$\begin{bmatrix} 1 & 2 & 3 & 4 & 5 \\ 0 & 6 & 7 & 8 & 9 \\ 0 & 0 & 0 & 10 & 11 \\ 0 & 0 & 0 & 0 & 12 \end{bmatrix}, \quad \begin{bmatrix} 1 & -2 & 12 \\ 0 & 3 & -16 \\ 0 & 0 & 5 \end{bmatrix}, \quad\text{and}\quad \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},$$
but the following are not in REF:
$$\begin{bmatrix} 1 & 2 & 3 & 4 \\ 5 & 0 & 0 & 10 \\ 0 & 6 & 7 & 8 \\ 0 & 0 & 0 & 0 \end{bmatrix}, \quad \begin{bmatrix} 0 & 0 & 5 \\ 1 & 0 & 0 \end{bmatrix}, \quad \begin{bmatrix} 1 & -2 & 12 \\ 0 & 3 & -16 \\ 0 & 1 & 0 \end{bmatrix}, \quad\text{and}\quad \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix},$$
since in each of these some row's pivot fails to be strictly to the right of the pivot in the row above it. □

Definition 4.15 (Reduced row echelon form). A matrix is said to be in reduced row echelon form (RREF) if it is in REF and
(a) every pivot is 1; and
(b) every entry in every pivot column, except the pivots themselves, is 0.

Example 4.16. The only RREF matrix in Example 4.14 is
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$
Here are some other matrices in RREF:
$$\begin{bmatrix} 1 & 0 & 2 & 0 & 3 \\ 0 & 1 & 4 & 0 & 0 \\ 0 & 0 & 0 & 1 & 5 \end{bmatrix}, \quad \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}, \quad \begin{bmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 2 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad \begin{bmatrix} 1 & 0 & 2 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}.$$
The following REF matrices are not in RREF: the first because not every pivot is 1, and the second and third because some pivot column contains a non-zero entry other than its pivot:
$$\begin{bmatrix} 1 & 2 & 3 & 4 & 5 \\ 0 & 6 & 7 & 8 & 9 \\ 0 & 0 & 0 & 10 & 11 \\ 0 & 0 & 0 & 0 & 12 \end{bmatrix}, \quad \begin{bmatrix} 1 & 2 & 3 & 4 & 5 \\ 0 & 1 & 7 & 8 & 9 \\ 0 & 0 & 0 & 1 & 11 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}, \quad \begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & 0 \end{bmatrix}.$$
Non-REF matrices can't be in RREF by definition. □

One of the main aims of this course is to relate properties of the matrix A to solutions of the equation Ax = b.

Chapter 5. Gaussian elimination

In this section we will provide an algorithm for solving (or showing there are no solutions to) arbitrary systems of linear equations. This algorithm is called Gaussian elimination. The input to the algorithm is an m × n matrix A and the output is a matrix in REF (or possibly a matrix in RREF, depending on when you stop).

5.1 Elementary row operations

In this section we describe the individual steps in Gaussian elimination as follows.

Definition 5.1 (Elementary row operations). Let A be an m × n matrix, and let r1, ..., rm be the rows of A. Elementary row operations (EROs) on A are the following:
(ERO1) interchanging two rows ri, rj (ri ↔ rj, i ≠ j);
(ERO2) multiplying a row by a non-zero scalar λ (ri → λri, λ ≠ 0);
(ERO3) adding to one row a non-zero scalar multiple of another row (ri → ri + λrj, i ≠ j).

Example 5.2. Suppose that
$$A = \begin{bmatrix} 0 & 0 & 2 & -1 & 0 \\ 2 & -2 & 0 & 3 & 0 \\ -1 & 1 & -3 & 0 & 1 \end{bmatrix}.$$
Then applying r1 ↔ r3 (ERO1) to A gives us the matrix:
$$\begin{bmatrix} -1 & 1 & -3 & 0 & 1 \\ 2 & -2 & 0 & 3 & 0 \\ 0 & 0 & 2 & -1 & 0 \end{bmatrix} \qquad r_1 \leftrightarrow r_3.$$
Applying r1 → 2r1 (ERO2) to the previous matrix gives us:
$$\begin{bmatrix} -2 & 2 & -6 & 0 & 2 \\ 2 & -2 & 0 & 3 & 0 \\ 0 & 0 & 2 & -1 & 0 \end{bmatrix} \qquad r_1 \to 2r_1,$$
and applying r2 → r2 + r1 (ERO3) gives:
$$\begin{bmatrix} -2 & 2 & -6 & 0 & 2 \\ 0 & 0 & -6 & 3 & 2 \\ 0 & 0 & 2 & -1 & 0 \end{bmatrix} \qquad r_2 \to r_2 + r_1.$$
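The three EROs are also easy to implement. Here is a sketch, not part of the notes, that modifies a list-of-rows matrix in place and replays Example 5.2:

```python
def swap(A, i, j):               # ERO1: r_i <-> r_j
    A[i], A[j] = A[j], A[i]

def scale(A, i, lam):            # ERO2: r_i -> lam * r_i (lam != 0)
    A[i] = [lam * x for x in A[i]]

def add_multiple(A, i, j, lam):  # ERO3: r_i -> r_i + lam * r_j (i != j)
    A[i] = [x + lam * y for x, y in zip(A[i], A[j])]

A = [[0, 0, 2, -1, 0],
     [2, -2, 0, 3, 0],
     [-1, 1, -3, 0, 1]]

swap(A, 0, 2)             # r1 <-> r3
scale(A, 0, 2)            # r1 -> 2 r1
add_multiple(A, 1, 0, 1)  # r2 -> r2 + r1
print(A)  # [[-2, 2, -6, 0, 2], [0, 0, -6, 3, 2], [0, 0, 2, -1, 0]]
```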
Definition 5.3 (Row-equivalent matrices). If A, B ∈ Mm,n(F), then A is row-equivalent to B if there is a sequence of EROs that transforms A into B. We write A ∼ B to indicate that A and B are row-equivalent.

Example 5.4 (Row equivalent matrices). All of the matrices from Example 5.2 are row-equivalent:
$$\begin{bmatrix} 0 & 0 & 2 & -1 & 0 \\ 2 & -2 & 0 & 3 & 0 \\ -1 & 1 & -3 & 0 & 1 \end{bmatrix} \sim \begin{bmatrix} -1 & 1 & -3 & 0 & 1 \\ 2 & -2 & 0 & 3 & 0 \\ 0 & 0 & 2 & -1 & 0 \end{bmatrix} \sim \begin{bmatrix} -2 & 2 & -6 & 0 & 2 \\ 2 & -2 & 0 & 3 & 0 \\ 0 & 0 & 2 & -1 & 0 \end{bmatrix} \sim \begin{bmatrix} -2 & 2 & -6 & 0 & 2 \\ 0 & 0 & -6 & 3 & 2 \\ 0 & 0 & 2 & -1 & 0 \end{bmatrix}.$$
What about
$$\begin{bmatrix} 0 & 2 & -1 \\ 0 & 2 & -2 \\ 0 & 3 & 0 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} -1 & 1 & -3 \\ 0 & 1 & 0 \\ -1 & 3 & 1 \end{bmatrix}?$$
We will see in Theorem 5.14 how to answer this question (or more generally how to show that two matrices are or are not row-equivalent). (It will turn out that these matrices are not row-equivalent; see Example 5.15.)

It turns out that matrices being row-equivalent can be reformulated as follows. We will not prove the next theorem at this point in the course, but by the end of the course you will be equipped to prove this theorem (and we might even prove it for you). You will also hopefully understand the true meaning of row-equivalence.

Theorem 5.5 (Reformulation of row-equivalence). Suppose that A and B are m × n matrices over the field F. Then A ∼ B if and only if there exists an m × m invertible matrix C such that CA = B.

Example 5.6. We showed in Example 5.2 that
$$A = \begin{bmatrix} 0 & 0 & 2 & -1 & 0 \\ 2 & -2 & 0 & 3 & 0 \\ -1 & 1 & -3 & 0 & 1 \end{bmatrix} \sim \begin{bmatrix} -1 & 1 & -3 & 0 & 1 \\ 2 & -2 & 0 & 3 & 0 \\ 0 & 0 & 2 & -1 & 0 \end{bmatrix} = B.$$
If we set
$$C = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix},$$
then
$$CA = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}\begin{bmatrix} 0 & 0 & 2 & -1 & 0 \\ 2 & -2 & 0 & 3 & 0 \\ -1 & 1 & -3 & 0 & 1 \end{bmatrix} = \begin{bmatrix} -1 & 1 & -3 & 0 & 1 \\ 2 & -2 & 0 & 3 & 0 \\ 0 & 0 & 2 & -1 & 0 \end{bmatrix} = B.$$
Also
$$CC = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}\begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = I_3,$$
and so C is invertible, and C⁻¹ = C.

5.2 Gaussian elimination — definition

The input to Gaussian elimination is an m × n matrix A. The output of Gaussian elimination is an m × n REF matrix B that is row-equivalent to A. Recall from Definition 4.10 that the first non-zero entry in any row (if it exists) is called the pivot; its row is called the pivot row; and its column the pivot column. Roughly speaking, Gaussian elimination consists of:
(1) Choose a row ri whose pivot α is as far left as possible (there might be multiple such rows; choose any one of them). Apply (ERO1) (r1 ↔ ri) to make the pivot row the first row in the matrix.
(2) Use the pivot α to transform every entry below the pivot in the pivot column to 0 (using (ERO3) repeatedly).
(3) Repeat steps (1) and (2) ignoring the first row.
This isn't super precise, and making it precise is unlikely to make it any clearer, so we'll do some examples instead. Gaussian elimination is sometimes called row reduction. We will use the terms "Gaussian elimination" and "row reduction" interchangeably in this course.

5.3 Gaussian elimination — examples

Example 5.7 (Gaussian elimination). Suppose that
$$A = \begin{bmatrix} 0 & 2 & 4 & -1 \\ 1 & 1 & 1 & 3 \\ 1 & 0 & -1 & 6 \end{bmatrix}.$$
Find an REF matrix that is row-equivalent to A.
Solution: We perform Gaussian elimination to find an REF matrix that is row-equivalent to A:
$$A = \begin{bmatrix} 0 & 2 & 4 & -1 \\ 1 & 1 & 1 & 3 \\ 1 & 0 & -1 & 6 \end{bmatrix} \sim \begin{bmatrix} 1 & 1 & 1 & 3 \\ 0 & 2 & 4 & -1 \\ 1 & 0 & -1 & 6 \end{bmatrix} \quad r_1 \leftrightarrow r_2$$
$$\sim \begin{bmatrix} 1 & 1 & 1 & 3 \\ 0 & 2 & 4 & -1 \\ 0 & -1 & -2 & 3 \end{bmatrix} \quad r_3 \to r_3 - r_1$$
$$\sim \begin{bmatrix} 1 & 1 & 1 & 3 \\ 0 & 2 & 4 & -1 \\ 0 & 0 & 0 & \tfrac{5}{2} \end{bmatrix} \quad r_3 \to r_3 + \tfrac{1}{2}r_2.$$
Hence one REF matrix row-equivalent to A is
$$\begin{bmatrix} 1 & 1 & 1 & 3 \\ 0 & 2 & 4 & -1 \\ 0 & 0 & 0 & \tfrac{5}{2} \end{bmatrix}.$$

For any given matrix A, the REF matrix obtained from A by Gaussian elimination is not unique: it depends on the particular sequence of EROs chosen.

Example 5.8 (REF is not unique). Let's apply a different sequence of EROs to the matrix A from Example 5.7. Again we perform Gaussian elimination to find an REF matrix that is row-equivalent to A:
$$A = \begin{bmatrix} 0 & 2 & 4 & -1 \\ 1 & 1 & 1 & 3 \\ 1 & 0 & -1 & 6 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & -1 & 6 \\ 1 & 1 & 1 & 3 \\ 0 & 2 & 4 & -1 \end{bmatrix} \quad r_1 \leftrightarrow r_3$$
$$\sim \begin{bmatrix} 1 & 0 & -1 & 6 \\ 0 & 1 & 2 & -3 \\ 0 & 2 & 4 & -1 \end{bmatrix} \quad r_2 \to r_2 - r_1$$
$$\sim \begin{bmatrix} 1 & 0 & -1 & 6 \\ 0 & 1 & 2 & -3 \\ 0 & 0 & 0 & 5 \end{bmatrix} \quad r_3 \to r_3 - 2r_2.$$
Hence A is row-equivalent to both of the REF matrices:
$$\begin{bmatrix} 1 & 1 & 1 & 3 \\ 0 & 2 & 4 & -1 \\ 0 & 0 & 0 & \tfrac{5}{2} \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 1 & 0 & -1 & 6 \\ 0 & 1 & 2 & -3 \\ 0 & 0 & 0 & 5 \end{bmatrix}.$$
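Steps (1)–(3) can be written as a short program. The sketch below is ours, not the notes' (it uses exact `Fraction` arithmetic to avoid rounding), and it reproduces the REF matrix of Example 5.7:

```python
from fractions import Fraction

def to_ref(A):
    """Return an REF matrix row-equivalent to A (a list of rows),
    following steps (1)-(3) of Section 5.2."""
    A = [[Fraction(x) for x in row] for row in A]
    m, n = len(A), len(A[0])
    top = 0
    for col in range(n):  # scanning left to right finds leftmost pivots first
        r = next((r for r in range(top, m) if A[r][col] != 0), None)
        if r is None:
            continue
        A[top], A[r] = A[r], A[top]                      # ERO1
        for below in range(top + 1, m):                  # clear below the pivot
            factor = A[below][col] / A[top][col]
            A[below] = [x - factor * y for x, y in zip(A[below], A[top])]  # ERO3
        top += 1
    return A

A = [[0, 2, 4, -1], [1, 1, 1, 3], [1, 0, -1, 6]]
for row in to_ref(A):
    print(row)  # [1, 1, 1, 3], [0, 2, 4, -1], [0, 0, 0, 5/2], as in Example 5.7
```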
5.4 Gauss-Jordan elimination — reduction to RREF

We saw in Section 5.2 that it is possible to transform every matrix A into a matrix in REF using Gaussian elimination. In this section, we show how to transform a matrix into RREF using a simple extension of Gaussian elimination called Gauss-Jordan elimination. The idea is to transform A into REF first, then to do the following:
(4) Start with the pivot in the last row (the one lowest down and most to the right in the matrix);
(5) Multiply this pivot α by 1/α using (ERO2) to transform it into 1;
(6) Use the pivot α = 1 in the last row to transform every entry above the pivot in the pivot column to 0 (using (ERO3) repeatedly);
(7) Repeat steps (4), (5), and (6) ignoring the last row.
Again, it isn't actually necessary to perform the steps exactly in the order we have defined above, but let's explore that later.

Example 5.9. Suppose that
$$A = \begin{bmatrix} 0 & 2 & 4 & -1 \\ 1 & 1 & 1 & 3 \\ 1 & 0 & -1 & 6 \end{bmatrix}.$$
Find an RREF matrix row-equivalent to A.
Solution: In Example 5.7 we showed that the matrix is row-equivalent to the REF matrix
$$\begin{bmatrix} 1 & 1 & 1 & 3 \\ 0 & 2 & 4 & -1 \\ 0 & 0 & 0 & \tfrac{5}{2} \end{bmatrix}.$$
We perform the steps above:
$$\begin{bmatrix} 1 & 1 & 1 & 3 \\ 0 & 2 & 4 & -1 \\ 0 & 0 & 0 & \tfrac{5}{2} \end{bmatrix} \sim \begin{bmatrix} 1 & 1 & 1 & 3 \\ 0 & 1 & 2 & -\tfrac{1}{2} \\ 0 & 0 & 0 & 1 \end{bmatrix} \quad \begin{matrix} r_2 \to \tfrac{1}{2}r_2 \text{ to get pivot } 1 \\ r_3 \to \tfrac{2}{5}r_3 \text{ to get pivot } 1 \end{matrix}$$
$$\sim \begin{bmatrix} 1 & 1 & 1 & 0 \\ 0 & 1 & 2 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \quad \begin{matrix} r_1 \to r_1 - 3r_3 \text{ to get 0 in entry } (1,4) \\ r_2 \to r_2 + \tfrac{1}{2}r_3 \text{ to get 0 in entry } (2,4) \end{matrix}$$
$$\sim \begin{bmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 2 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \quad r_1 \to r_1 - r_2 \text{ to get 0 in entry } (1,2).$$
Hence an RREF matrix row-equivalent to A is:
$$\begin{bmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 2 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$

Example 5.10. Let's find an RREF matrix row-equivalent to the same matrix A again, this time using the REF matrix we found for A in Example 5.8. Here we go:
$$\begin{bmatrix} 0 & 2 & 4 & -1 \\ 1 & 1 & 1 & 3 \\ 1 & 0 & -1 & 6 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & -1 & 6 \\ 0 & 1 & 2 & -3 \\ 0 & 0 & 0 & 5 \end{bmatrix}$$
$$\sim \begin{bmatrix} 1 & 0 & -1 & 6 \\ 0 & 1 & 2 & -3 \\ 0 & 0 & 0 & 1 \end{bmatrix} \quad r_3 \to \tfrac{1}{5}r_3 \text{ to get pivot } 1$$
$$\sim \begin{bmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 2 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \quad \begin{matrix} r_1 \to r_1 - 6r_3 \text{ to get 0 in entry } (1,4) \\ r_2 \to r_2 + 3r_3 \text{ to get 0 in entry } (2,4). \end{matrix}$$
Hence an RREF matrix row-equivalent to A is:
$$\begin{bmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 2 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$
You might notice that this is the exact same matrix that we got in Example 5.9. This is not a coincidence!

Theorem 5.11. If A is an m × n matrix with entries in a field F, then A is row-equivalent to a unique matrix in RREF; we denote this matrix by RREF(A).

Example 5.12 (RREF for square matrices). Suppose that
$$A = \begin{bmatrix} 1 & 4 & -1 \\ 2 & 0 & 0 \\ -1 & -1 & 0 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 0 & 0 & 2 \\ -1 & 0 & 2 \\ 2 & 0 & 3 \end{bmatrix}.$$
Find RREF(A) and RREF(B).
Solution: We perform Gauss-Jordan elimination for A:
$$A = \begin{bmatrix} 1 & 4 & -1 \\ 2 & 0 & 0 \\ -1 & -1 & 0 \end{bmatrix} \sim \begin{bmatrix} 1 & 4 & -1 \\ 0 & -8 & 2 \\ -1 & -1 & 0 \end{bmatrix} \quad r_2 \to r_2 - 2r_1$$
$$\sim \begin{bmatrix} 1 & 4 & -1 \\ 0 & -8 & 2 \\ 0 & 3 & -1 \end{bmatrix} \quad r_3 \to r_1 + r_3$$
$$\sim \begin{bmatrix} 1 & 4 & -1 \\ 0 & -8 & 2 \\ 0 & 0 & 2 \end{bmatrix} \quad r_3 \to -3r_2 - 8r_3$$
$$\sim \begin{bmatrix} 1 & 4 & -1 \\ 0 & -8 & 2 \\ 0 & 0 & 1 \end{bmatrix} \quad r_3 \to \tfrac{1}{2}r_3$$
$$\sim \begin{bmatrix} 1 & 4 & -1 \\ 0 & -8 & 0 \\ 0 & 0 & 1 \end{bmatrix} \quad r_2 \to r_2 - 2r_3$$
$$\sim \begin{bmatrix} 1 & 4 & 0 \\ 0 & -8 & 0 \\ 0 & 0 & 1 \end{bmatrix} \quad r_1 \to r_1 + r_3$$
$$\sim \begin{bmatrix} 1 & 4 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \quad r_2 \to -\tfrac{1}{8}r_2$$
$$\sim \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \quad r_1 \to r_1 - 4r_2$$
$$= I_3.$$
So RREF(A) equals the identity matrix I3. We also perform Gauss-Jordan elimination for B:
$$B = \begin{bmatrix} 0 & 0 & 2 \\ -1 & 0 & 2 \\ 2 & 0 & 3 \end{bmatrix} \sim \begin{bmatrix} -1 & 0 & 2 \\ 0 & 0 & 2 \\ 2 & 0 & 3 \end{bmatrix} \quad r_1 \leftrightarrow r_2$$
$$\sim \begin{bmatrix} -1 & 0 & 2 \\ 0 & 0 & 2 \\ 0 & 0 & -7 \end{bmatrix} \quad r_3 \to -2r_1 - r_3$$
$$\sim \begin{bmatrix} -1 & 0 & 2 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{bmatrix} \quad r_3 \to 7r_2 + 2r_3$$
$$\sim \begin{bmatrix} -1 & 0 & 2 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix} \quad r_2 \to \tfrac{1}{2}r_2$$
$$\sim \begin{bmatrix} -1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix} \quad r_1 \to r_1 - 2r_2$$
$$\sim \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix} \quad r_1 \to -r_1.$$
In this case, RREF(B) has an all-zero row.
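Steps (4)–(7) extend the earlier sketch from REF to RREF. This continues the hypothetical `to_ref` function defined after Example 5.8 and is again our own code, not the notes':

```python
def to_rref(A):
    """Return RREF(A), continuing from to_ref via steps (4)-(7) of Section 5.4."""
    A = to_ref(A)
    m, n = len(A), len(A[0])
    for i in reversed(range(m)):
        col = next((j for j in range(n) if A[i][j] != 0), None)
        if col is None:          # zero row: nothing to do
            continue
        pivot = A[i][col]
        A[i] = [x / pivot for x in A[i]]                 # ERO2: make the pivot 1
        for above in range(i):                           # clear above the pivot
            coeff = A[above][col]
            A[above] = [x - coeff * y for x, y in zip(A[above], A[i])]  # ERO3
    return A

A = [[0, 2, 4, -1], [1, 1, 1, 3], [1, 0, -1, 6]]
for row in to_rref(A):
    print(row)  # [1, 0, -1, 0], [0, 1, 2, 0], [0, 0, 0, 1], as in Example 5.9
```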
Example 5.12 generalises to RREF for arbitrary square matrices.

Corollary 5.13. If A is an n × n (square) matrix, then either RREF(A) has an all-zero row or RREF(A) equals the identity matrix In.
Proof. In any REF matrix (including any RREF matrix), every row either contains a pivot or is all zeros, and each column contains the pivot of at most one row. So if RREF(A) has no all-zero row, then each of its n rows contains a pivot, and hence each of its n columns contains exactly one pivot. Since the pivots move strictly to the right as we move down the rows, the pivot of row i lies in column i for every i. Every pivot of RREF(A) is 1 and every other entry of a pivot column is 0, so RREF(A) = In. ∎

We can now answer the question as to which matrices are row-equivalent.

Theorem 5.14. If A, B ∈ Mm,n(F), then A ∼ B if and only if RREF(A) = RREF(B).

Example 5.15 (Non-row-equivalent matrices). In Example 5.4 we asked if the matrices
$$A = \begin{bmatrix} 0 & 2 & -1 \\ 0 & 2 & -2 \\ 0 & 3 & 0 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} -1 & 1 & -3 \\ 0 & 1 & 0 \\ -1 & 3 & 1 \end{bmatrix}$$
are row-equivalent.
Solution: We compute RREF(A) and RREF(B) and compare the results:
$$A = \begin{bmatrix} 0 & 2 & -1 \\ 0 & 2 & -2 \\ 0 & 3 & 0 \end{bmatrix} \sim \begin{bmatrix} 0 & 2 & -1 \\ 0 & 0 & -2 \\ 0 & 3 & 0 \end{bmatrix} \quad r_2 \to 2r_2 - 2r_1$$
$$\sim \begin{bmatrix} 0 & 2 & -1 \\ 0 & 0 & -2 \\ 0 & 0 & 3 \end{bmatrix} \quad r_3 \to 2r_3 - 3r_1$$
$$\sim \begin{bmatrix} 0 & 2 & -1 \\ 0 & 0 & -2 \\ 0 & 0 & 0 \end{bmatrix} \quad r_3 \to -3r_2 - 2r_3$$
$$\sim \begin{bmatrix} 0 & 2 & -1 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix} \quad r_2 \to -\tfrac{1}{2}r_2$$
$$\sim \begin{bmatrix} 0 & 2 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix} \quad r_1 \to r_1 + r_2$$
$$\sim \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix} \quad r_1 \to \tfrac{1}{2}r_1$$
and
$$B = \begin{bmatrix} -1 & 1 & -3 \\ 0 & 1 & 0 \\ -1 & 3 & 1 \end{bmatrix} \sim \begin{bmatrix} -1 & 1 & -3 \\ 0 & 1 & 0 \\ 0 & -2 & -4 \end{bmatrix} \quad r_3 \to r_1 - r_3$$
$$\sim \begin{bmatrix} -1 & 1 & -3 \\ 0 & 1 & 0 \\ 0 & 0 & -4 \end{bmatrix} \quad r_3 \to 2r_2 + r_3$$
$$\sim \begin{bmatrix} -1 & 1 & -3 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \quad r_3 \to -\tfrac{1}{4}r_3$$
$$\sim \begin{bmatrix} -1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \quad r_1 \to r_1 + 3r_3$$
$$\sim \begin{bmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \quad r_1 \to r_1 - r_2$$
$$\sim \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \quad r_1 \to -r_1.$$
Hence RREF(A) ≠ RREF(B) and so, by Theorem 5.14, A ≁ B.

Chapter 6. Solving linear systems

In this chapter we will show how to use Gauss-Jordan elimination to solve linear systems.

6.1 EROs and solutions to Ax = b

In this section, we will show how transforming a matrix A using EROs impacts the solutions to the equation Ax = 0 (TL;DR: it doesn't). The situation for equations of the form Ax = b is more subtle. We start with a simple theorem showing that left multiplication by an invertible matrix preserves solutions to Ax = 0.

Theorem 6.1 (Left multiplication by an invertible matrix preserves solutions). Suppose that A ∈ Mm,n(F) and suppose that the matrix B ∈ Mm,m(F) is invertible. Then, for every x ∈ Fⁿ, BAx = 0 if and only if Ax = 0.
Proof. (⇒) Suppose that x ∈ Fⁿ. If BAx = 0, then since B is invertible, we get
$$Ax = I_mAx = B^{-1}BAx = B^{-1}0 = 0.$$
(⇐) If x ∈ Fⁿ and Ax = 0, then BAx = B0 = 0. ∎

Next, we show that EROs preserve solutions to equations of the form Ax = 0.

Corollary 6.2. Suppose that A, B ∈ Mm,n(F), that A ∼ B, and that x ∈ Fⁿ. Then Ax = 0 if and only if Bx = 0.
Proof. Since A ∼ B, by Theorem 5.5 there exists an invertible matrix C ∈ Mm,m(F) such that CA = B. It follows by Theorem 6.1 that Ax = 0 if and only if CAx = 0, that is, if and only if Bx = 0. ∎

Corollary 6.3. Suppose that A, B ∈ Mm,n(F), that A ∼ B, and that b ∈ Fᵐ. If x ∈ Fⁿ and C ∈ Mm,m(F) is an invertible matrix such that CA = B, then Ax = b if and only if Bx = Cb.
Proof. It follows from Theorem 5.5 that the matrix C in the statement exists because A ∼ B.
(⇒) Suppose that x ∈ Fⁿ is such that Ax = b. Then, multiplying both sides of Ax = b on the left by C, we get CAx = Cb. Hence Bx = CAx = Cb, as required.
(⇐) Note that since CA = B and C is invertible, it follows that A = C⁻¹B. Suppose that x ∈ Fⁿ is such that Bx = Cb. Then left multiplying both sides of the equality Bx = Cb by C⁻¹ yields C⁻¹Bx = C⁻¹Cb = Imb = b. Finally, since A = C⁻¹B, it follows that Ax = b. ∎

It is not immediately clear how to use the previous results to actually solve equations of the form Ax = 0 or Ax = b, or what the relationship to row reduction might be. That's explained by the examples in the following sections.

(Theorem 6.4 and Example 6.5 in an earlier version of the notes weren't actually needed, and so they have been removed. To preserve the later theorem and example numbers, there is a jump in the numbering from Corollary 6.3 to Example 6.6.)
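As a preview of the next section, here is a numerical check of Corollary 6.2, not part of the notes. The matrix is the one from Example 5.7 and its RREF is from Example 5.9; the particular solution is read off the RREF by a recipe explained in Section 6.2:

```python
import numpy as np

A = np.array([[0, 2, 4, -1],
              [1, 1, 1, 3],
              [1, 0, -1, 6]])
R = np.array([[1, 0, -1, 0],   # RREF(A), from Example 5.9
              [0, 1, 2, 0],
              [0, 0, 0, 1]])

# From R: x3 is free, x1 = x3, x2 = -2*x3, x4 = 0. Take x3 = 1:
x = np.array([1, -2, 1, 0])
print(R @ x)  # [0 0 0]
print(A @ x)  # [0 0 0] -- the same x solves Ax = 0, as Corollary 6.2 predicts
```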
6.2 Solving homogeneous linear systems with row reduction

We start with an example showing that solving homogeneous linear systems whose coefficient matrix is