Linear Algebra 2
Understanding linear maps

$$A = S \begin{pmatrix} \lambda & 1 & 0 & 0 \\ 0 & \lambda & 1 & 0 \\ 0 & 0 & \lambda & 0 \\ 0 & 0 & 0 & \mu \end{pmatrix} S^{-1}$$

Kathrin Hövelmanns
Last updated on November 8, 2024
Faculty of Mathematics and Computer Science

Welcome! Linear Algebra 1 introduced you to vector spaces and maps that respect (or 'play nicely with') their linear structures (hence called linear maps). LinA 2 takes a deeper dive concerning linear maps: We will discuss how to describe and deal with linear maps by representing them via matrices, and how to determine the nature of a linear map by analysing its matrix representation. (For instance, how to recognize that a matrix represents a reflection.) This involves techniques to switch between different vector bases, and to find a particularly useful basis (called an eigenvector basis) by analysing characteristic properties of the matrix (called eigenvalues). Eigenvectors allow us to describe linear maps in a simpler and more economical way. We will use this to study interesting special classes of maps like orthogonal maps (maps that 'preserve' shapes, like reflections and rotations) and symmetric maps (which can be used to analyse quadratic curves and surfaces like ellipses and hyperboloids).

How does this relate to other courses, and why do we care about linear maps? In general, eigenvectors/-values characterise transformations and hence play an important role in all areas that apply linear algebra (from geology to quantum mechanics). Here are some application examples:

Numerical methods. In many numerical methods, an important first step is to bring a matrix into a simpler form (like diagonal or triangular). This often uses the techniques developed in this course. You might encounter this in Introduction to Numerical Analysis or Numerical Linear Algebra.

Analysing quadratic surfaces. Imagine you are handed a convoluted quadratic equation of which you are told that it describes some geometric object (like a curve or a surface), and your goal is to determine the object's geometric properties. In Section 3.2, we will use the techniques developed in Chapter 2 to end up with a nice description of the object from which the most important geometric properties can be easily extracted.

Evolutions of physical systems - differential equations. The concept of eigenvalues was devised in the context of dealing with systems of linear differential equations. It hence comes as no surprise that Theory and Application of Ordinary Differential Equations draws heavily on the techniques and language developed in this course.

Quantum mechanics. In quantum mechanics, linear maps play a major role: physical quantities (like velocity and momentum) are described by linear maps on vector spaces of functions. In this context, eigenvalues/-vectors have a special physical interpretation.

Analysis 2, tensor calculus and differential geometry. In Analysis 2, linear maps appear as first-order approximations of differentiable multi-variable functions: Say f : Rn → Rn is such a function and p ∈ Rn is a vector. Then the matrix of this first-order approximation in p is the n × n matrix whose (i, j)-th element is ∂f_i/∂x_j. This is also useful to study curved spaces. (A small symbolic sketch of such a matrix follows after this list.)

A more general view - algebra. Vector spaces are only one example of sets with an additional mathematical structure, and there are many more examples (e.g., fields, rings, vector-space-like structures over rings, ...). Studying maps that play nicely with the given structure will prove very useful in these other examples as well. This theme will play a huge role in your courses on Algebra.
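For the Analysis 2 item above, the matrix of partial derivatives can be computed symbolically. The tiny sketch below (Python with SymPy; purely illustrative and not part of these notes — in particular, the map f is made up) shows what that matrix looks like for one concrete f : R2 → R2:

```python
import sympy as sp

x, y = sp.symbols('x y')

# A made-up differentiable map f : R^2 -> R^2, only to illustrate the matrix of partial derivatives.
f = sp.Matrix([x**2 * y, sp.sin(x) + y])

# Entry (i, j) of the matrix is the partial derivative of f_i with respect to the j-th variable.
J = f.jacobian([x, y])
print(J)                      # Matrix([[2*x*y, x**2], [cos(x), 1]])
print(J.subs({x: 0, y: 1}))   # the matrix of the first-order approximation in the point p = (0, 1)
```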
As I will try to improve my lecture notes with each year, any constructive feedback is more than welcome! :)

Contents

1 Setting the stage: fundamental definitions/concepts for linear maps  2
   What happens in Chapter 1?  2
   1.1 Recap: Linear maps (as seen in Linear Algebra 1), plus another nice example  3
       1.1.1 The new example for linear maps: the quotient map  8
   1.2 Recap: Connecting matrices and linear maps (as seen in Linear Algebra 1)  10
       1.2.1 Computing a matrix inverse via row reduction  13
   1.3 Representation matrices for finite-dimensional spaces ≠ Kn, using coordinates  15
       1.3.1 Coordinates  16
       1.3.2 Coordinate transformations from a matrix POV: The basis transition matrix  17
       1.3.3 Generalising the map-matrix connection for spaces that aren't Kn  20
       1.3.4 How do base changes affect matrices of linear maps?  22

2 'Simpler descriptions' for maps/matrices  25
   What happens in Chapter 2?  25
   2.1 Diagonalisation via eigenvalues  26
       2.1.1 Computing eigenvalues and -spaces  28
       2.1.2 Linear independence of eigenvectors  37
       2.1.3 Diagonalisation of a square matrix  38
   2.2 Using subspaces that are stable under the mapping (invariant subspaces)  39
       2.2.1 Nice results for combinations of invariant subspaces  44
       2.2.2 How to get invariant subspaces from non-real roots  45
   2.3 Alternative perspectives on diagonalisability: dimension criteria and a matrix POV  48
       2.3.1 'How many vectors do we get?' - eigenvalue multiplicity as a diagonalisability criterion  48
       2.3.2 A matrix view on diagonalisability  50
   2.4 The 'next best thing' to being diagonal - Jordan Normal Form  51
       2.4.1 Annihilating matrices using minimal polynomials  53
       2.4.2 Generalised eigenvectors and -spaces  56
       2.4.3 Finding a basis that creates the Jordan Normal Form (Jordan bases)  59
       2.4.4 Determining the JNF from the polynomials and the eigenspaces  64

3 Neat tricks for special linear maps: Orthogonal and symmetric maps  70
   What happens in Chapter 3?  70
   3.1 Orthogonal maps  71
       3.1.1 Orthogonal matrices and their connection with orthogonal maps, Rn case  75
       3.1.2 Orthogonal matrices and their connection with orthogonal maps, general case  77
       3.1.3 Classification of orthogonal maps  80
   3.2 Symmetric maps  90
       3.2.1 Symmetric matrices and their connection with symmetric maps  91
       3.2.2 We can always diagonalise symmetric matrices  92
       3.2.3 Application: quadratic forms and analysing curves  96

4 Another application: linear differential equations  104
   What happens in Chapter 4?  104
   4.1 A solving recipe for linear differential equations  105
   4.2 Linear differential equations solving recipe, step 1: using diagonalisation  106
   4.3 Linear differential equations solving recipe, step 2: tricks to find particular solutions  110
A Notation used in this course  114
   A.1 Table of frequently used notation  114
   A.2 Sets and maps  115
       A.2.1 Sets  115
       A.2.2 Maps  115

B Prerequisites: Vector spaces as seen in Linear Algebra 1  117
   B.1 Fields and complex numbers  117
   B.2 Vector spaces  120
       B.2.1 Inner product spaces  121
       B.2.2 Quotient spaces  123
   B.3 Matrices  123

C Additional helpers  125
   C.1 Identifying Greek letters  125
   C.2 Geometric shapes: ellipse, hyperbola, parabola  126
   C.3 Trigonometric formulas  127

Chapter 1
Setting the stage: fundamental definitions/concepts for linear maps

What happens in Chapter 1?

Note: This course assumes familiarity with the knowledge on linear maps that was introduced in Linear Algebra 1.

To jog your memory (and simplify referencing), this chapter starts by summarising the most important concepts related to linear maps in Section 1.1. Something that is very important (and useful!) is the close connection between matrices and linear maps, recapped in Section 1.2. If you're very comfortable with the material covered in Linear Algebra 1, you can skip sections 1.1 and 1.2 except for the following two parts: a new example (Section 1.1.1) that mixes linear maps with quotient spaces; and a new remark (remark 1.2.4) that translates our result on particular solutions for vector equations (Theorem 1.1.14) to the setting where we deal with systems of linear equations. After these recaps, we begin tackling new material in Section 1.3: we dig deeper into the connection between matrices and linear maps.

Learning Goals of Chapter 1: You can

- explain and work with the concept of a linear map between vector spaces
  - for concrete vector spaces like Rn, Cn, Kn as well as
  - for vector spaces that are handled as abstract objects;
- reason about the relation between linear maps and matrices;
- set up matrix representations of linear maps;
- analyse how the coordinates of a vector change when switching to another basis;
- perform basis transformations to switch between representations for different bases and compute matrices that represent such a basis transition.

Why do we care? The techniques developed in this chapter lay the groundwork for the rest of this course: In Chapter 2, our main goal will be to develop tricks to transform matrices into a 'nicer' (simpler) shape. On a high level, the main trick will be to find a certain basis for which the matrix transforms into something nice. This will use the observations we made in this chapter.

1.1. Recap: Linear maps (as seen in Linear Algebra 1), plus another nice example

In this recap section, we recall the most fundamental concept of this course, linear maps. Besides recapping the notion itself, this section covers:

- that linear maps can be composed, added, multiplied (with each other and with scalars), and sometimes also inverted;
- the null space N and the range R of a linear map;
- how N and R relate to injectivity and surjectivity;
- how to specify a linear map on a given basis; and
- (very important) the dimension theorem.
This section requires familiarity with the following concepts which you can look up in the ap- pendices when needed: fields (see Definition B.1.1), vector spaces over fields (see Definition B.2.1), bases (see Definition B.2.5), inner product spaces (see Definition B.2.7), matrix multiplication (see Definition B.3.3), and basic notions concerning maps like injectivity/surjectivity (see the table in Appendix A.2). Let V and W be K-vector spaces for some field K. (E.g., think of real vector spaces, meaning K = R, or complex vector spaces, meaning K = C.) A map A : V → W associates to each vector v in V exactly one vector A(v) (or short: A v) in W. Definition 1.1.1 (Linear map). Let V and W be two K-vector spaces. A map A : V → W is called K-linear (or simply linear) if for all vectors x, y ∈ V and all field elements λ (’scalars’) the following holds: i) A(0) = 0 0 is mapped to 0 ii) A(x + y) = A x + A y addition works linearly iii) A(λx) = λA x scalar multiplication works linearly We list some additional naming conventions: Isomorphism Linear map A : V → W that is bijective Endomorphism Linear map A : V → V that maps some vector space V into itself Automorphism Linear map from some vector space V into itself that is bijective (so it’s both an endomorphism and an isomorphism) 3 Equivalently, linearity can be defined through the following single requirement: A is linear if for all x, y ∈ V and all scalars α, β ∈ K, one has A(αx + βy) = αA x + βA y. This means that for linear maps, the image of a linear combination of two vectors is the same as the linear combination of their image vectors. In other words, we can switch the order of linearly combining and applying A. (We will use this in this course very, very often.) Example 1.1.2 (Orthogonal projections) Let V be a real inner product space, and let l = span〈a〉 be a line through the origin in V. The map that associates to each vector in V the orthogonal projection on l we will call P. If a has length one, then P is given by the formula: P v = (v, a)a. v Pv span〈a〉 O More generally, we saw that orthogonal projections on general linear subspaces of a real inner product space are linear. Example 1.1.3 (Trivial examples) For every vector space V , we have the so-called identity map (or just identity) I : V → V given by I v := v. It is very straightforward to verify that I is linear. When V and W are two vector spaces that are either both real or both complex, then the so-called zero map O : V → W given by O v := 0 is also linear. Example 1.1.4 (Multiplication with a matrix) Let A be a real m × n-matrix with entries in R. In this example, we treat elements from Rn and Rm as columns. We defined the map L A : Rn → Rm by L A (v) := Av , and saw that L A is linear. Note that the argument you saw in LinA I would have worked for any other base field K just the same, meaning if the matrix A has entries in an arbitrary field K instead of R, then the map v7→ Av Kn −−−−→ Km , is also linear. 4 Example 1.1.5 (Differentiation is a linear map.) The derivative D is defined by f 0 = D f. We saw that D behaves in a linear manner. " To call D a linear map, we still need to specify which vector space V we’re talking about. For D, we can choose V depending on context: for instance, D maps the space of polynomi- als into itself, and the space of differentiable functions on R into the space of all functions on R. 
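If you like to experiment, the linearity conditions are also easy to check numerically. The short sketch below (Python with NumPy; the code and names are purely illustrative and not part of the course material) does this for the orthogonal projection from example 1.1.2, on a randomly chosen line in R3:

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# A unit vector a spanning the line l = span<a> in R^3 (chosen at random, only for illustration).
a = rng.normal(size=3)
a = a / np.linalg.norm(a)

def P(v):
    """Orthogonal projection onto span<a>: P v = (v, a) a, as in example 1.1.2."""
    return np.dot(v, a) * a

x, y = rng.normal(size=3), rng.normal(size=3)
lam = 2.5

# i) 0 is mapped to 0, ii) additivity, iii) homogeneity -- up to floating-point error.
assert np.allclose(P(np.zeros(3)), np.zeros(3))
assert np.allclose(P(x + y), P(x) + P(y))
assert np.allclose(P(lam * x), lam * P(x))
print("P passed the linearity checks on these sample vectors")
```

Of course, checking a handful of sample vectors is no proof (the proof is the short computation with the inner product), but such sanity checks are a good habit once you start implementing linear maps.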
Now that we’ve recalled some examples, we recall another very useful observation: we learnt that switching the order of linearly combining and using A does work for any number of vectors: Theorem 1.1.6. A : V → W is linear if and only if à ! n n αi v i = αi A v i X X A (1.1) i =1 i =1 for all vectors v 1 ,... , v n in V and all scalars α1 ,... , αn ∈ K. We will also recall that compositions, inverses, sums and scalar multiples of linear maps are again linear maps. Definition/theorem 1.1.7 (Composition or product). Let A : V → W and B : U → V be linear maps. Take a vector u ∈ U. First applying B to u ∈ U , we get B u ∈ V. Now applying A to B u, we get A(B )u ∈ W. In a diagram: B A U −−−−→ V −−−−→ W (1.2) u −−−−→ B u −−−−→ A(B u) The composition A B : U → W is defined by (AB )u := A(B u) for all u ∈ U. It is itself linear. " Cave: The notation AB suggests that the composition can be understood as a ‘product’ of A and B. Remember, however, that this is not a product in the sense as we know it, e.g., from fields: Say U , the ‘target space’ of B , also is the ‘starting space’ of A, meaning we can indeed compose A and B into AB. This does not mean that BA exists - we have no guarantee that the ‘target space’ of A also is the ‘starting space’ of B. And even if BA also exists, it is not necessarily equal to AB. Definition/theorem 1.1.8 (Inverse map). If the linear map A : V → W is a bijection, then the inverse map w7→v s. th. Av=w A−1 : W −−−−−−−−−−−−→ V is also linear. For maps that map from some vector space into itself (endomorphisms), we can define their pow- ers: 5 Definition 1.1.9 (Powers of endomorphisms). For A : V → V , we define A2 := AA. More generally, for n = 2, 3,..., we define An := An−1 A. We take for A0 the identity map I. If A is invertible (so if the map has an inverse A−1 ), then we define for positive integers n the map A−n as the composite ¢n map A−1. ¡ Definition/theorem 1.1.10 (Sum, scalar multiple). Let A : V → W and B : V → W be two linear maps. Then the sum A + B : V → W is defined by (A + B )v := A v + B v. If α is a scalar, then the scalar multiple αA : V → W is defined by (αA)v := α(A v). Both constructions are linear. Since linear maps are also ordinary maps, we can talk about the image of a vector or a subset of the vector space (notation: A(D) if A is the linear map and D the subset), and about the (complete) inverse image of a subset. We recall two very important linear subspaces that are associated to linear maps: the first is a generalization of the solution space of a homogeneous system of linear equations, the second is a generalization of the column space of a matrix. Definition/theorem 1.1.11 (Null space and range). Let A : V → W be a linear map. We define N , the null space (or kernel) of A, by N (A) := {v ∈ V | A v = 0} , and R, the range of A, by R(A) := {A v ∈ W | v ∈ V }. The range can also be denoted by A(V ). If the context is clear, we will simply write N /R instead of N (A)/R(A). N is a linear subspace of V and R is a linear subspace of W. R 0 N V W Notice that the null space is precisely the inverse image A−1 {0} of the origin under A. 6 Example 1.1.12 (Multiplication with a matrix) We first revisit example 1.1.4: The null space of the linear map L A consists of the vectors v that satisfy L A (v) = 0, so all solutions of the homogeneous system Av = 0. The range of L A consists of all vectors of the form L A (v). If the columns of A are a 1 ,... , a n , then this is the set {x 1 a 1 + · · · + x n a n | x 1 ,... 
, x n arbitrary} , which is the column space of A. So null space and range generalise two notions from the matrix world. Example 1.1.13 (Orthogonal projections) We also revisit example 1.1.2, the orthogonal pro- jection P on a line ` = span〈a〉: Geometrically, we see that the range R(P ) (so the collection of vectors that occur as an image of P ) is ` itself. The null space N (P ) consists of all vectors that are mapped to 0, so all vectors in `⊥ , the orthogonal complement. We now recall how injectivity, surjectivity and the inverse image are connected to the null space and the range: Theorem 1.1.14. Consider a linear map A : V → W. 1. N = { 0 } ⇔ A is injective. 2. R = W ⇔ A is surjective. 3. Let b ∈ R. Then there is a vector p satisfying A p = b; we say that p is a particular solution of the vector equation A x = b. Together all solutions of the vector equation A x = b are given by the set p + N := {p + w | w ∈ N } , the coset. In particular, the equation A x = b has exactly one solution if N = {0}. In particular, Theorem 1.1.14 tells us how to we get all solutions of a vector equation A x = b: find a single particular solution, and then add to that all solutions of the corresponding homogeneous equation A x = 0. If we know how A acts on a basis (meaning we know the images for all basis vectors), it is possible to determine the image of any vector simply from the basis vector images, due to the linearity of A, We also learnt that any basis-images combination yields a unique linear map: Theorem 1.1.15. Let V and W be vector spaces, let {a 1 ,... , a n } be a basis for V , and let w 1 ,... , w n be an n–tuple of vectors in W. Then there exists a unique linear map A : V → W satisfying A a i = w i for i = 1,... , n.... and that the range of a map hence can also be written in terms of basis vector images, by using the linear span (which was recalled in Definition B.2.3): 7 Theorem 1.1.16. Consider a linear map A : V → W with V = span〈a 1 ,... , a n 〉. Then R = span〈A a 1 ,... , A a n 〉. While it is not easy to give a comparably simple characterisation of N , we have at least the follow- ing important result that says something about the dimension: Theorem 1.1.17 (Dimension Theorem). Let A : V → W be a linear map with dim(V ) < ∞. Then dim(V ) = dim(N ) + dim(R). = R = 0 N V W Example 1.1.18 (Orthogonal projections) Revisiting orthogonal projections P on a sub- space W of the inner product space V (example 1.1.2) once more: We can easily convince ourselves that the range of P equals W , while the null space consists of all vectors perpen- dicular (orthogonal) to W , so N = W ⊥. The dimension theorem above therefore implies that dimV = dimW + dimW ⊥. Lastly, we recall a nice criterion that tells us when a linear map with finite-dimensional starting space is invertible: Theorem 1.1.19. Let V be a vector space with dim(V ) < ∞, and let A : V → W be linear map. A has an inverse if and only if dim(V ) = dim(W ) and N = {0}. If we do not feel like checking the condition N = {0} (for whatever reason), we can alternatively replace this condition with the condition that R = W. 1.1.1 The new example for linear maps: the quotient map In LinA 1, you learned about quotient spaces modulo a subspace U (recalled in Appendix B.2.2), where two vectors v and w lie in the same residue class [v] if v − w ∈ U. We will now revisit the following example from LinA 1 and connect it to linear maps: Example 1.1.20 Let V = R 3 and let U = span〈(1, 0, 0)〉. 
Then any two vectors (x, a, b) and (y, a, b) are equivalent modulo U , since their difference (x−y, 0, 0) ∈ U. So the residue class is 8 [(x, a, b)] = [(y, a, b)] = [(0, a, b)], meaning V /U looks a bit like the linear subspace spanned by e 2 , e 3. This will be made precise in Linear Algebra 2 using linear maps. This section will make the meaning of the emphasised sentence in the example more precise by proving a fundamental theorem about linear maps (Theorem 1.1.21 below). Originally, Theo- rem 1.1.21 was devised by Emmy Noether, a last-century mathematician who made many impor- tant contributions to abstract algebra and mathematical physics, but wasn’t allowed to officially teach for a long time due to her gender. (She did it anyways, under David Hilbert’s name!) We quickly introduce a concept needed for the theorem: Since for any linear map A : V → W , the null space N is a linear subspace of V , we can also define the respective quotient space V /N. Theorem 1.1.21 (Noether’s fundamental theorem on homomorphisms, vector space edition). For any linear map A : V → W , there exists a linear bijection between its range R and the quotient space V /N , where N is the null space of A. In other words, R and V /N are isomorphic (or even shorter: R ∼ = V /N ). How does this relate to example 1.1.20? To make the connection, we use as A the projection unto the subspace span〈e 2 , e 3 〉: (x,a,b)7→(0,a,b) P : R3 −−−−−−−−−−−→ R3. As we reminded ourselves in example 1.1.18, the null space of a projection is the orthogonal com- plement of the subspace unto which we project. So N (P ) = span〈e 2 , e 3 〉⊥ = span〈(1, 0, 0)〉 = U , and the range of P is R(P ) = span〈e 2 , e 3 〉. Plugging this into Theorem 1.1.21, we get that span〈e 2 , e 3 〉 is equivalent to V /U , up to an isomorphism. To see that Theorem 1.1.21 is true and how the isomorphism looks, we’ll go through the proof: (I interspersed the proof with our example to illustrate the main ideas.) Proof of Theorem 1.1.21. For any linear map A with null space N , we can define a map π that maps vectors to their residue classes: v7→[v] π : V −−−−−→ V /N. (Applying this to example 1.1.20: π maps any vector (x, a, b) to [(0, a, b)] ∈ V /U.) To practice a bit with the involved concepts, your homework will include convincing yourself that the map π always is linear, surjective (for the definition, see the table in Appendix A.2.2); and that its null space is exactly the null space of A. We can also define a ‘quotient version’ of A, by that we mean a map A that maps residue classes to the image of an element of the residue class: [v]7→A v A : V /N −−−−−−→ W. (Applying this to example 1.1.20: P maps any residue class [(0, a, b)] to the projection P (0, a, b) = (0, a, b).) 9 But wait, is our mapping rule well-defined? There can be many different elements v 1 , v 2 ,... in the same residue class [v] – which image A v 1 , A v 2 ,... do we pick for A[v]? Luckily, the definition still makes sense because all elements sharing the same residue class actually are mapped to the same image: If v and w are in the same residue class, then v − w ∈ N , meaning A(v − w) = 0. Due to the linearity of A, we get A(v) = A(w). So A is a well-defined map from V /N to R(A) – we can restrict the ‘target space’ of A to the actual range rather than using the whole space W – that is naturally surjective due to the restriction. (For example 1.1.20, this means restricting the target space to the subspace span〈e 2 , e 3 〉.) 
We get the following diagram: A V R(A) π A V /N What’s missing to verify that A is an isomorphism? We still need to show that A always is linear and injective (the definition of injectivity is also recalled in Appendix A.2.2) , which you will do in your homework. So A is a linear bijection from V /N to R(A), which proves Theorem 1.1.21. −1 Given that A is a bijection, there also exists an inverse A : R(A) → V /N. But how does the inverse look, concretely? Let w ∈ R(A). Since w is in the range of A, there always exists a pre- −1 image v ∈ A−1 (w), so we can set A (w) := [v]. 1.2. Recap: Connecting matrices and linear maps (as seen in Linear Algebra 1) In this recap section, we recall the close connection between matrices and linear maps that are of the form Kn → Km , where K is an arbitrary field. (E.g., you can think of K = R, or of K = C, but K can actually be any set that satisfies the rules for fields.) This section covers: how every such matrix is connected to a linear map, and vice versa; how to compute the respective matrix for a given map; the connection with systems of linear equations; 10 that the connection between maps and matrices behaves natural: we can easily switch be- tween maps and matrices even when dealing, e.g., with compositions or inverses This section requires familiarity with matrices and matrix multiplication which you can revisit in Appendix B.3 when needed. In example 1.1.4, we saw that any matrix A defines a linear map v7→ Av A : Kn −−−−→ Km. In a bit more detail, we also showed in LinA I by substitution that the columns of the matrix are the images of the standard base vectors: Let e 1 ,... , e n be the standard basis of Kn , and let c 1 ,... , c n be the columns of the matrix A. Then Ae 1 = Ae 1 = c 1 , Ae 2 = c 2 ,... , Ae n = c n. We will now recall that vice versa, we can also associate to each linear map a matrix that ‘repre- sents’ the map, meaning we found a new (and very important) use for matrices: Definition/theorem 1.2.1. Every linear map A : Kn → Km is determined by an m × n-matrix A, called the matrix of the linear map A, whose columns are Ae 1 ,... , Ae n. The image of the vector v under A can be computed as the matrix product Av. Proof. Let’s take an arbitrary linear map A : Kn → Km , and an arbitrary vector v ∈ Kn. We can always write a v as a linear combination of the standard basis vectors: v = x 1 e 1 + · · · + x n e n. Then A v = x1 Ae 1 + · · · + xn Ae n due to the linearity of A. Collecting all image vectors Ae 1 ,... , Ae n as columns in an m ×n–matrix A, we obtain that A v = x 1 Ae 1 +· · ·+x n Ae n equals the matrix product Av. Remark 1.2.2 (How to compute the matrix if we only know the images on some basis?) Assume we know how A acts on a basis α = a 1 ,... , a n for Kn , meaning we are given a description of A that only tells us what A a 1 ,... , A a n are. As mentioned in Section 1.1, such a description is sufficient to uniquely determine A, but how do we determine the corresponding matrix? If α is the standard basis, we can simply write down the matrix using the previous Defini- tion/theorem 1.2.1. If α is not the standard basis, we learnt in LinA how we can apply row reduction techniques to find the image vectors of the standard basis, which leads us back to the first case, as in the example below. Example 1.2.3 The linear map A : R3 → R3 is given by A(− 1, 0, 1) = (− 4, 2, 4), A(1, 1, 0) = (1, − 1, − 1), A(0, 1, 2) = (− 5, 4, 4). 
We put this data in three (vector, image)-rows (for clarity, we visually separate the vector 11 and its image with a vertical bar):  ¯  −1 0 1 ¯¯ −4 2 4  1 1 0 ¯ 1 −1 −1 .  ¯  ¯ 0 1 2 ¯ −5 4 4 Performing row-reduction, we obtain the (row-reduced) normal form  ¯  1 0 0 ¯¯ 2 1 −3  0 1 0 ¯ −1 −2 2  ,  ¯  ¯ 0 0 1 ¯ −2 3 1 in which the left rows are the standard basis vectors e 1 , e 2 , e 3 and the right rows are their im- ages Ae 1 , Ae 2 , Ae 3. Using Definition/theorem 1.2.1 (and translating the rows on the right- hand side back into columns), we find that the matrix of A is   2 −1 −2 A =  1 −2 3 .   −3 2 1 It is easy to verify correctness of our computations by using this matrix to compute the images of the vectors (− 1, 0, 1), (1, 1, 0) and (0, 1, 2). Remark 1.2.4 (Connection with systems of linear equations) Consider the following system of lin- ear equations a 11 x 1 + a 12 x 2 + · · · + a 1n x n = b 1............ a m1 x 1 +a m2 x 2 + · · · +a mn x n =b m with m × n–coefficient matrix A. Then A determines a linear map A : Kn → Km , so systems of equations can be written as a vector equation Ax = b. Let’s look how we can use what we learnt so far to find all solutions of the equation A x = b (if they exist): Theorem 1.1.14 tells us that we find all solutions to A x = b by finding one particular solution p, and adding to it all vectors from the null space, meaning the solution set is of the form p + N := {p + w | w ∈ N }. But does such a particular solution p exist? To decide this, we first identify the range R of A, using Theorem 1.1.16: R = span〈Ae 1 ,... , Ae n 〉 , which is exactly the column space of the matrix A. The equation A x = b has a solution (at least one) if and only if b ∈ R, so if and only if b is contained in the column space of the matrix. 12 Now that we’ve found a criterion to determine whether particular solutions exist, it only remains to quickly identify the null space N : N consists of all vectors v with A v = 0, so all solutions of the homogeneous system Ax = 0. The rest of this section briefly recalls that matrices of compositions/sums/inverses can be found in a very natural way: e.g., the matrix of a sum A + B of linear maps A and B is exactly the sum of their two representing matrices A and B. This makes it very easy to switch between matrix-POVs and map-POVs! Theorem 1.2.5 (Matrix of sums and scalar multiples). Let A and B be two matrices of the same size, and let A and B be the corresponding linear maps. Then A+B is the matrix of the sum A + B. For every scalar α ∈ K, the matrix of the linear map αA is αA. Theorem 1.2.6 (Matrix of compositions). Let A and B be two matrices such that the composi- tion AB exists, and let A and B be the corresponding linear maps. Then AB is the matrix of the composition map AB. Theorem 1.2.7 (Matrix of the inverse). Let A : Kn → Kn be a linear map with matrix A. Then the map A has an inverse if and only if its matrix A has an inverse, in which case the matrix for the map’s inverse A−1 is exactly the inverse A −1 of the matrix A. Remark 1.2.8 (Applying polynomials to maps/matrices) We have seen that we can add and com- pose linear maps, and that switching between maps and their matrices preserves addition and multiplication. In Definition 1.1.9, we defined powers of linear maps that map from some space V into itself. 
If we have such a linear map A : V → V with matrix A, and if we have a polynomial p(X ) = a n X n + a n−1 X n−1 +· · ·+ a 1 X + a 0 whose scalars a n , · · · , a 0 are taken from V ’s scalar field, we can hence also define the linear map p(A) := a n An + a n−1 An−1 + · · · + a 1 A + a 0 · I , whose matrix is p(A) := a n A n + a n−1 A n−1 + · · · + a 1 A + a 0 · I. 1.2.1 Computing a matrix inverse via row reduction Lastly, we recall how our accumulated knowledge was used to close a gap that had remained so far: When matrix inverses were introduced, it was mentioned without proof that AB = I implies B A = I (for square matrices A and B ). We will now revisit how we used the connection between linear maps and matrices to prove that AB = I implies B A = I : Theorem 1.2.9. Let A and B be n × n-matrices. If AB = I , then B A = I. 13 Proof. Instead of proving the claim by dealing with matrices, we’ll look at it through a linear maps lens: Let A, B : Kn → Kn be the linear maps corresponding to A and B. (Like at the beginning of 1.2.) Now, Theorem 1.2.6 tells us that the matrix for the composition AB is the composition of the maps A and B (so the identity matrix), hence AB = I. If we could prove that also BA = I is true, then we’d already be done, as we could translate this equation back into matrix terms and get that B A = I as desired. How do we prove that BA = I is true? Since AB = I , we have in other words that AB v = v for all v ∈ Kn , meaning that A is the left-hand inverse of B. For bijective maps, the right-hand inverse and the left-hand inverse is the same, meaning that also BA = I , and we’re done. The connection between maps and matrices also explains why/how we can do row reduction to compute the inverse of square matrices (if it exists), since we can compute the matrix of a map’s inverse by working with base vector images (and doing row reduction). We saw the following two examples: Example 1.2.10 (Computing the inverse via row reduction) Consider the matrix   1 0 3 A= 0 1 2 .   4 −3 8 For the corresponding linear map A, we have Ae 1 = (1, 0, 4), Ae 2 = (0, 1, −3) and Ae 3 = (3, 2, 8). For the inverse map A−1 , it follows that A−1 (1, 0, 4) = e 1 , A−1 (0, 1, 3) = e 2 and A−1 (3, 2, 8) = e 3. (Because, e.g., A−1 (1, 0, 4) = A−1 Ae 1 = e 1.) We can now determine the matrix of A−1 by computing A−1 e 1 , A−1 e 2 and A−1 e 3 , since we know that these vectors constitute the column vectors of A−1. We get A−1 e 1 , A−1 e 2 and A−1 e 3 from A−1 (1, 0, 4),... , A−1 (3, 2, 8) exactly as in example 1.2.3:  ¯  1 0 4 ¯¯ 1 0 0  0 1 −3 ¯ 0 1 0 .  ¯  ¯ 3 2 8 ¯ 0 0 1 Row reduction yields the normal form  ¯  1 0 0 ¯ 7 4 −2 ¯  0 1 0 ¯ − 9 −2 3   ¯   ¯ 2 2  0 0 1 ¯ − 32 −1 12 ¯ Translating the rows on the right-hand side back into columns gives us the inverse: 14 −9 −3   1 A −1 =  8 −4 −2 . 2 −4 3 1 14 Example 1.2.11 (Showing that there is no inverse via row reduction) We consider a similar matrix that differs only in the last row:   1 0 3 A= 0 1 2 .   4 −3 6 For the inverse A−1 of the corresponding linear map A, (provided it exists), we must have A−1 (1, 0, 4) = e 1 , A−1 (0, 1, − 3) = e 2 , A−1 (3, 2, 6) = e 3 , yielding the system  ¯  1 0 4 ¯¯ 1 0 0  0 1 −3 ¯ 0 1 0 .  ¯  ¯ 3 2 6 ¯ 0 0 1 Partial row reduction gives ¯   1 0 4 ¯¯ 1 0 0  0 1 −3 ¯ 0 1 0 .  ¯  ¯ 0 0 0 ¯ −3 −2 1 This shows that the columns of A are dependent, and that there hence is no inverse matrix. 1.3. 
Representation matrices for finite-dimensional spaces 6= Kn , using coordinates What will we do? So far, we saw that linear maps from Kn → Km can be described using matrices. (K was an arbitrary field – e.g., you can think of K = R, or of K = C, but K can actually be any set that satisfies the rules for fields.) What helped us there was that these spaces have standard bases, which allows us to identify the base images with the column vectors of the matrix. What happens if we have an abstract K-vector space instead of Kn , meaning we might not have such a nice standard basis? The result of Section 1.3 is that we still can come up with matrices representing the map! Our main idea will be: Pick a basis and translate the whole setting back to the Kn → Km case by working with coordinates. In more detail, we’ll analyse how the coordinates of a vector change when switching to another basis; connect linear maps to matrices for arbitrary vector spaces (depending on the choice of ba- sis); use the results to analyse how these matrices transform when switching to another basis. Why do we care? As mentioned in the ‘Why do we care?’ paragraph at the beginning of this chapter, we will use these techniques in the later chapters a lot, essentially whenever we find and use ‘nicer’ bases to bring matrices into a ‘nicer’ (simpler) shape. 15 1.3.1 Coordinates We start by recalling coordinates from LinA I: Definition 1.3.1 (Coordinates). Let V be an n-dimensional vector space with basis α = {a 1 ,... , a n }. Every vector v ∈ V can be expressed as a linear combination of the basis vectors in exactly one way: v = x1 a 1 + x2 a 2 + · · · + xn a n. The numbers x 1 ,... , x n are the coordinates of v with respect to the basis α and (x 1 ,... , x n ) is the coordinate vector of v with respect to the basis α. " Clearly, the coordinates depend on the choice of the basis α. Example 1.3.2 (Coordinates for the real-polynomial vector space) Consider the vector space V of real polynomials of degree at most 2, and take the polynomial p := 1 + 2x + 3x 2 (which is a vector in V ). We pick the basis α := {1, x, x 2 }. The α-coordinates of p are 1, 2, 3, so the α-coordinate vector of p is (1, 2, 3). Convince yourself that the map associating to each vector v the corresponding coordinate vector is a bijection. We make an even stronger observation: The following theorem tells us that mapping vectors to their coordinate vectors yields an invertible linear map. Definition/theorem 1.3.3. Let V be an n-dimensional vector space with basis α = {a 1 ,... , a n }. We will denote the map sending each vector v to its coordinates with respect to basis α also by α. Then α is an invertible linear map from V to Kn. With this notation, α(v) is the coordinate vector of the vector v ∈ V with respect to the basis α. Proof of Theorem 1.3.3. As we learnt in LinAI, the coordinates of the sum v 1 + v 2 are the sum of the coordinates of v 1 and of v 2 , and the coordinates of αv are precisely α times the coordinates of v. What happens if we switch from one basis α to another basis β? Then any vector v ∈ V corre- sponds to two different sets of coordinates: α(v) with respect to the basis α, and β(v) with respect to the basis β. How are the α-coordinates and the β-coordinates of v related? Starting with the α-coordinates (viewed as a vector in Kn ), we first apply the map α−1 which gives us v ∈ V , and then we apply the map β which gives us β(v) ∈ Kn. Definition 1.3.4 (Coordinate transformation (map)). 
Let α and β be bases of an n-dimensional vector space V. The map β α−1 : Kn → Kn is called the coordinate transformation (map) from α to β. The coordinate transformation β α−1 is itself a linear map: We already know that α and β are linear, hence linearity of β α−1 follows directly from the linearity of compositions/inverses. Like for any linear map, we can hence associate β α−1 with a corresponding matrix. 16 V ‘abstract vectors’ level α β βα−1 coordinate level n n K K 1.3.2 Coordinate transformations from a matrix POV: The basis transition matrix Given that the coordinate transformation is a linear map from Kn to itself, it corresponds to an n × n–matrix. We will use this matrix a lot in the following chapters, therefore we fix notation: Definition 1.3.5 (Transition matrix). Let α and β be bases of an n-dimensional vector space V. We call the n × n–matrix associated to the linear map β α−1 the transition matrix from basis α to basis β, and denote it by β S α. The following theorem states that - as one might expect - multiplication with the matrix β S α translates α- into β-coordinates, and gives a direct description of how the matrix looks, entry- wise: Theorem 1.3.6. Let α and β be bases of an n-dimensional vector space V , and let β S α be the basis transition matrix, i.e., the matrix of β α−1. Let x := α(v) be the α–coordinate vector of a vector v ∈ V. Then the β–coordinate vector of v is equal to the product β S α x. Furthermore, the columns of matrix β S α are the β–coordinate vectors of the α–basis vectors. Proof. To show that the product β S α x yields the β–coordinate vector of v, we can simply use that ? applying the map and taking the matrix product is equivalent: β S α x = (βα−1 )(x) = (βα−1 )(α(v)) = β(α−1 (α(v))) = β(v), where ? used associativity of map compositions. To show that the columns of β S α indeed are the β–coordinate vectors of the α–basis vectors, we recall that the columns of a matrix are the images of the standard base vectors e 1 ,... , e n under the corresponding linear map. The i -th column of β S α hence is equal to β S α e i = (βα−1 )(e i ). Since α−1 (per definition) maps coordinate vectors back into the respective linear combination of the base vectors in α, we have that α−1 (e i ) = a i , where a i is the i -th vector in the base α. Plugging this in, we get that β(α−1 (e i )) = β(a i ), the β–coordinate vector of a i. Example 1.3.7 (Basis transition matrix for real-polynomial vector space) We again con- sider the vector space V of real polynomials of degree at most 2. We pick the two bases α := {x − 1, x 2 − 1, x 2 + 1} and β := {1, x, x 2 }. We can easily express the basis vectors of α in the basis vectors of β: x − 1 = (−1)·1 + 1·x + 0·x 2 x 2 − 1 = (−1)·1 + 0·x + 1·x 2 x2 + 1 = 1·1 + 0·x + 1·x 2. 17 We therefore know the β–coordinates of the vectors of α. We can use them to give the basis transition matrix, since its columns are the β–coordinate vectors of the α–basis vectors:   −1 −1 1 βSα =  1 0 0 .   0 1 1 It is of course also possible to switch from β–coordinates to α–coordinates, which is done with the matrix α S β. Matrix α S β is the inverse of β S α : when using α S β after β S α , we switch the coordinate system back to the original one, so effectively we didn’t do anything to the vector. Hence, we get the following statement about the product α S β β S α of the two matrices: αSβ βSα = I, thus αSβ = β S α −1. Example 1.3.8 (Example 1.3.7, continued) So the transition matrix α S β is the inverse of matrix β S α. 
Computing the inverse, we find   0 2 0 −1 1 αSβ = βSα =  −1 −1 1 .  2 1 1 1 The columns of this matrix should consist of the α–coordinates of the base vectors of β. Let’s quickly verify this: The first column vector is (0, − 12 , 12 ), translating this back via α−1 corresponds to vector 0 (x − 1) − 12 (x 2 − 1) + 12 (x 2 + 1) = 1. X In the same way, we can verify that the second column consists of the α–coordinates of x and that the third column consists of the α–coordinates of x 2. X Computing the α–coordinates of some vector: As an example, let’s pick the vector v := 2x 2 − 3x + 4. The β-coordinates of this vector are (4, −3, 2). We transform them to α–coordinates using the transition matrix α S β :        4 0 2 0 4 −3  −3  = 1  −1 −1 1   −3  =  1        αSβ .   2    2  3 2 1 1 1 2 2 We verify the result: 1 3 −3(x − 1) + (x 2 − 1) + (x 2 + 1) = 2x 2 − 3x + 4. X 2 2 If a third basis γ enters the picture, we could transform α– to β–coordinates, and these subse- quently to γ–coordinates. The following theorem tells us that doing the β-detour is unnecessary and we can directly switch from α to γ: 18 Theorem 1.3.9. Let α, β and γ be bases of an n-dimensional vector space V , with respective basis transition matrices β S α , γ S α and γ S β. Then γSβ βSα = γSα Proof. The product of the two matrices is the matrix of their composition (remember Theorem 1.2.6). For the composition under consideration, we have (γβ−1 )(βα−1 ) = γ(β−1 β)α−1 = γα−1. " It is important to distinguish between calculating with vectors (so elements of the vector space V ) and calculating with coordinates (so sequences of elements from Kn ). In example 1.3.7, the separation between the two concepts was clear: the vectors were polynomials and the coordi- nate vectors were sequences of real numbers. We need to be more careful with this distinction if our vector space is Kn itself, as a sequence could either represent a vector from the vector space Kn or a sequence of coordinates. We note that this strict separation is less crucial if we talk about coordinates w.r.t. the standard basis ε = {e 1 ,... , e n }: (1, 2, 3) = 1e 1 + 2e 2 + 3e 3. While having Kn as the vector space itself might make it more complicated to separate between vectors and their coordinate vectors, the following example shows that Kn allows for a nice short- cut when you need to compute basis transition matrices. Example 1.3.10 (Computing a basis transition matrix in R3 ) Let’s consider the following two bases: α = {(1, 0, 2), (−1, 1, 0), (0, −2, 1)} ; β = {(0, 1, 1), (1, 2, −1), (1, 0, 1)}. We are looking for the transition matrix that transforms α– into β–coordinates, so β S α. Method like in example 1.3.7. We’ll first solve the problem directly, by finding a way to write each of the α-vectors as a linear combination of the β-vectors. This means having to solve three systems of equations in three unknowns:  ¯   ¯  0 1 1 ¯ 1 −1 0 1 0 0 ¯ 12 1 − 12 ¯ ¯  1 2 0 ¯ 0 1 −2  −row reduction ¯ 1 0 − 34   ¯    −−−−−−−−→   0 1 0 ¯ −4 . ¯  ¯  ¯ ¯ 5 3 1 −1 1 ¯ 2 0 1 0 0 1 ¯ 4 −1 4 Now the last three columns are the β-coordinates of the three vectors of α, hence 2 4 −2   1 βSα = −1 0 −3 . 4 5 −4 3 Method with shortcut. Our alternative method uses that in Kn , the matrix for switching to the standard basis ε is super-easy to compute: We know the standard-basis coordinates of both bases. 
E.g., the first vector of α is (1, 0, 2) = 1e 1 + 0e 2 + 2e 3 , meaning the first column of ε S α must be (1,0,2), so the first base vector from α itself, and so on. So, without any 19 computational work we already know the transition matrices ε S α and ε S β : 1 −1 0 0 1 1     εSα =  0 1 −2  , ε S β =  1 2 0 . 2 0 1 1 −1 1 To be as lazy as possible on our way to finding the α–to–β transition matrix, we now use that we can do a detour via basis ε: β S α = β S ε ε S α (due to Theorem 1.3.9). We’ll also use that switching the direction of the base transition is the same as taking the inverse: β S ε = ε S β −1. Combining both observations, we get the identity β S α = ε S β −1 ε S α. So, we first determine the inverse of ε S β and find −2 2 2   −1 1 βSε = εSβ =  1 1 −1 . 4 3 −1 1 We can now multiply this with ε S α from the right and get −2 2 2 1 −1 0 2 4 −2      1 1 βSα = 1 1 −1   0 1 −2  =  −1 0 −3 . 4 4 3 −1 1 2 0 1 5 −4 3 1.3.3 Generalising the map-matrix connection for spaces that aren’t Kn So far, we saw that linear maps from A : Kn → Km can be described using matrices. We will now generalise the connection: we will replace Kn /Km with arbitrary finite-dimensional K-vector spaces V /W. Our main idea will be to translate the whole thing back to the Kn → Km case by picking a basis and working with coordinates. So, let V be a finite-dimensional vector space with basis α, let W be a finite-dimensional vector space with basis β, and let A : V → W be a linear map. Consider the following diagram: A V W α β Kn Km βAα−1 To every vector v ∈ V , there corresponds a unique coordinate vector α(v). To A v, the image of v under map A, there corresponds a unique coordinate vector β(A v). 20 This diagram might look complicated at a first glance, but in fact, I already tricked you into using the coordinate concept in the last chapter! We did this in examples 1.3.7 and 1.3.8, where the bases were polynomials instead of vectors in Rn. The composite linear map βAα−1 : Kn → Km maps the coordinate vector α(v) to the coordinate vector β(A v). Since βAα−1 is a linear map between Kn and Km , we already know how to represent it with a matrix (remember Definition/theorem 1.2.1). This gives us a representation matrix for the linear map A on the coordinate level: Definition 1.3.11 (Matrix of a linear map). Let V , W , α, β, A be as above. We denote the matrix of the linear map βAα−1 by β A α and call it the matrix of A w.r.t. the bases α and β. If V = W and β = α, then we simplify notation by denoting the corresponding matrix by A α ; we call it the matrix of A w.r.t. basis α. In other words: The matrix A α maps the α–coordinates of a vector v to the α–coordinates of A v. We will spend a lot of time with the special case V = W and β = α during this course, but let us first say a few things about the general case. How does the matrix look? According to Definition/theorem 1.2.1, the columns of β A α are (βAα−1 )(e i ) = β(A a i ) , i = 1,... , n , meaning the i –th column consists of the β–coordinates of the image A a i of the i –th base vector in α. This indeed gives us a description of A on the coordinate level: for example, to find the image of a vector v ∈ V , we can determine the coordinate vector α(v) of v; multiply α(v) with the representation matrix β A α , yielding the coordinate vector of A v; and translate the coordinate vector of A v back to the corresponding vector in W. 
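To see this recipe in action once with a computer algebra system, here is a small SymPy sketch (not part of the course material; the shift map S below is a made-up example, and the coordinate helper exploits that α is the monomial basis). It builds the matrix of S w.r.t. α = {1, x, x²} column by column from the images of the basis vectors, and then runs the three steps for a concrete vector:

```python
import sympy as sp

x = sp.symbols('x')
alpha = [sp.Integer(1), x, x**2]     # basis α = {1, x, x^2} of V = {polynomials of degree <= 2}

def S(p):
    """A hypothetical linear map, used only for illustration: (S p)(x) := p(x + 1)."""
    return sp.expand(p.subs(x, x + 1))

def coords(p):
    """α-coordinates of p. Since α is the monomial basis, these are just the coefficients."""
    cs = sp.Poly(p, x).all_coeffs()[::-1]          # [constant, x-coefficient, x^2-coefficient]
    return sp.Matrix(cs + [0] * (3 - len(cs)))     # pad with zeros if the degree is smaller

# The i-th column of the representation matrix consists of the α-coordinates of S(a_i).
S_alpha = sp.Matrix.hstack(*[coords(S(b)) for b in alpha])
print(S_alpha)                                     # Matrix([[1, 1, 1], [0, 1, 2], [0, 0, 1]])

# The three-step recipe for v = 2x^2 - 3x + 5:
v = 2*x**2 - 3*x + 5
image_coords = S_alpha * coords(v)                            # steps 1 and 2
image = sum(c * b for c, b in zip(image_coords, alpha))       # step 3: back to a polynomial
assert sp.simplify(image - S(v)) == 0                         # agrees with applying S directly
print(sp.expand(image))                                       # 2*x**2 + x + 4
```

The assert at the end is exactly the point of Definition 1.3.11: multiplying by the representation matrix on the coordinate level does the same as applying the map on the vector level.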
The last thing we’ll do before we look at some examples is to make sure that our new, more general view does not clash with our definition in the old, less general case where V = Kn and W = Km with their respective standard basis: Observation 1.3.12 (Connection with the matrix). If A : Kn → Km is a linear map, and α and β are the standard bases for these spaces, then β A α , the matrix of A w.r.t. these bases as defined in Definition 1.3.11, is just the matrix of A as defined in Section 1.2. In this case, the coordinate maps α and β both are the identity maps. Example 1.3.13 (Determining a vector image from the representation matrix) Say V is two- dimensional with basis α = {a, b}, and the linear map A : V → V has the matrix à ! 1 4 Aα = −2 3 w.r.t. α. The matrix tells us that A a = 1 · a − 2 · b (first column) and Ab = 4 · a + 3 · b (second 21 column). To compute the image of some vector v = λa + µb, we multiply A α with (λ, µ)> : à !à ! à ! 1 4 λ λ + 4µ = −2 3 µ −2λ + 3µ Hence the image of v is (λ + 4µ)a + (−2λ + 3µ)b. Example 1.3.14 (Representation matrix for differentiation.) Consider the vector space V of real polynomials of degree at most 2, and the linear map D : V → V defined by D p = p 0. We already saw that this is a linear map (remember example 1.1.5). Take for V the basis α = {1, x, x 2 }. The derivatives of the basis vectors are: D 1 = 0 = 0·1 + 0·x + 0·x 2 , D x = 1 = 1·1 + 0·x + 0·x 2 , D x 2 = 2x = 0·1 + 2·x + 0·x 2. The matrix D α is therefore   0 1 0 Dα =  0 0 2 .   0 0 0 To illustrate how we can now differentiate using the matrix D α , take the polynomial p(x) = 2x 2 − 3x + 5: The coordinate-vector of p w.r.t. α is (5, −3, 2) and        5 0 1 0 5 −3 D α  −3  =  0 0 2   −3  =  4 .        2 0 0 0 2 0 (−3, 4, 0) is the coordinate vector of 4x − 3 and this is indeed the derivative of p. 1.3.4 How do base changes affect matrices of linear maps? The rules for coordinate transformations allow us to switch between bases. The following obser- vation concerns how the different representation matrices are related to each other, and will be used in the following sections a lot: Theorem 1.3.15 (Effect of change of basis). Choose in a finite-dimensional space V two bases α and β, and suppose A : V → V is linear. Then Aβ = βSα Aα αSβ. In other words: To compute how the β–coordinates of a vector v are mapped to the β–coordinates of A v, we can first transform the β–coordinates of v into α–coordinates; 22 compute the α–coordinates of A v using the matrix A α (we’ll see in example 1.3.16 below that this can sometimes be easier than computing A β directly); and lastly, transform the result back to β–coordinates. Proof of Theorem 1.3.15. Per definition, A β is the matrix for the composed map βAβ−1 and A α is the matrix for the composed map αAα−1. We now use an ‘insert-1’-trick to relate the two com- posed maps to each other: We have (?) βAβ−1 = β(α−1 α)A(α−1 α)β−1 = (βα−1 )(αAα−1 )(αβ−1 ) , where (?) used that the factors α−1 and α cancel (so we inserted a ‘matrix-1’), and we afterwards reordered the brackets. We now get our claim by identifying βα−1 and αβ−1 with their matrices β S α and α S β and using that the matrix of a composition is the matrix product of its factors’ matrices (Theorem 1.2.6). 
Example 1.3.16 (Representation matrix for an orthogonal projection) We once more re- visit the projection example 1.1.2, but this time we make it less abstract by picking V := R2 with the standard inner product, and letting P be the orthogonal projection unto the line ` with equation 2x − 3y = 0. Our goal is to determine the matrix P ε of P w.r.t. the standard basis ε. Determining the co- ordinates of P e 1 and P e 2 directly, however, is a bit unwieldy. Therefore, we start by picking a basis α = {a 1 , a 2 } such that P has an easy description in terms of α-coordinates. We take a 1 = (3, 2) ∈ ` and a 2 = (2, −3) ⊥ ` and write the projection images in terms of the α-vectors: P a 1 = a 1 = 1 · a 1 + 0 · a 2 and P a 2 = 0 = 0 · a 1 + 0 · a 2. So the matrix P α is à ! 1 0 Pα =. 0 0 Second, we ‘translate P α into ε-coordinates’: Per definition, α maps vectors in V to their coordinate vectors w.r.t. α. In our special case where V = R2 , α is the map from R2 to R2 that simply translates ε-coordinates to α-coordinates (so, e.g., (3, 2) is mapped to (1, 0)), and we can identify its matrix with the basis transition matrix α S ε. Accordingly, the matrix for α−1 is ε S α. Like we already observed in example 1.3.10, computing these transition matrices is quite easy: à ! à ! 3 2 −1 1 3 2 εSα = and α S ε = ε S α =. 2 −3 13 2 −3 Using Theorem 1.3.15, we get à ! 1 9 6 Pε = εSα Pα αSε =. 13 6 4 Alternatively, the result could be obtained with the technique in remark 1.2.2: Put the infor- mation P a 1 = a 1 and P a 2 = 0 in ε-coordinates in the rows of a matrix à ¯ ! à ¯ ! 9 6 3 2 ¯¯ 3 2 row reduction 1 0 ¯¯ 13 13 ¯ −−−−−−−−−→ ¯ 6 4 2 −3 ¯ 0 0 0 1 ¯ 13 13 23 yielding µ ¶ 1 9 6 Pε =. 13 6 4 Example 1.3.17 (Representation matrix for differentiation.) We look back to example 1.3.14, where we gave the representation matrix D α w.r.t. basis α = {1, x, x 2 }. We will take a different basis, β = {x 2 − x, x 2 + 3, x 2 − 1}, and determine the matrix D β of D. We can now use: D β = βSα D α αSβ. Using x 2 − x = 0 · 1 − 1 · x + 1 · x 2 and so on, we get   0 3 −1 α S β =  −1 0 0 ,   1 1 1 so   0 −4 0 −1 1 βSα = αSβ =  1 1 1   4 −1 3 3 and   −8 −8 −8 1 D β = βSα D α αSβ =  1 2 2 .  4 7 6 6 As an exemplary sanity check, we look at p(x) = 2x 2 − 3x + 5: Since 2x 2 − 3x + 5 = 3(x 2 − x) + (x 2 + 3) − 2(x 2 − 1), p has β–coordinates (3, 1, −2), and        3 −8 −8 −8 3 −16  1   1 Dβ  1  =  1 2 2   1  =  1 .    4 4 −2 7 6 6 −2 15 These now should be the β–coordinates of the derivative of p(x): − 4(x 2 − x) + 14 (x 2 + 3) + 15 2 0 4 (x + 1) = 4x − 3 = p (x). X 24 Chapter 2 ’Simpler descriptions’ for maps/matrices What happens in Chapter 2? A major reason why linear maps between finite-dimensional vector spaces are well-understood is because they can be analysed using specific values, called eigenvalues. The theory of eigenvalues allows to describe such maps via matrices that have a pleasantly simple form. Eigenvalue theory was first developed to study the rotational motion of rigid bodies, but as it turns out, they have much wider range of applications: you can find them, e.g., in vibration analysis to detect equip- ment faults, stability analysis, atomic theory, quantum mechanics, and facial recognition. In Section 2.1, we introduce the theory of eigenvalues which is central to the course and will ac- company us throughout. 
We proceed by slightly generalising the theory in Section 2.2, where we look at subspaces that are mapped into itself (so-called ‘invariant’ subspaces) and see how they also help simplify matrices. Based on the knowledge we gathered in these two sections, we’ll de- velop some criteria in Section 2.3 that let us decide whether a quadratic matrix is diagonalisable. We finish this chapter with Section 2.4 that discusses the ‘next best thing’ to being diagonal, to deal with non-diagonalisable matrices. Learning Goals of Chapter 2: When we are finished with Chapter 2, you can work with eigenvalues: you can – restate the definition of eigenvalues/vectors/spaces; – state their fundamental properties; – demonstrate your understanding of these concepts, e.g., by giving (counter)exam- ples and deciding whether a concrete value/vector at hand is an eigenvalue/vector; – apply the discussed ‘algorithm’ to compute eigenvalues/vectors/spaces; work with invariant subspaces: you can – explain what invariant subspaces are; – how we can use them to bring matrices into a simpler form; and 25 – for linear maps A : V → V where V is real and the characteristic equation has a non–real root: how to get an invariant subspace from the root. decide if a quadratic matrix is diagonisable; deal with the Jordan Normal Form (JNF) of a matrix: you can – compute and work with the JNF of a matrix – reason about the JNF, using certain characteristic properties; – determine the relation between two matrices, using their JNFs; prove (simple) statements about all involved definitions/concepts. Why do we care? In conclusion, all sections help us bring matrices into simpler form. As men- tioned before, you might encounter the respective techniques throughout your studies, e.g., when studying numerical methods or differential equations. Chapters 3 and 4 serve as examples that illustrate how what we can use the techniques to simplify several mathematical problems. 2.1. Diagonalisation via eigenvalues What will we do? Let A : V → V be a linear map, mapping a finite-dimensional K- vector space V into itself. So far, we saw that for every choice of a basis α for V , the map A is determined by a matrix A α. We also saw the connection between the two matrices A α and A β for different bases α and β (in Theorem 1.3.15). In the two examples 1.3.14 and 1.3.16, we have already seen that the matrix A α sometimes has a pleasantly simple form for a suitably chosen basis α. It turns out that there exist tricks to find such simple forms systematically, and this section develops these tricks/techniques. In more detail, we’ll learn the most central concepts of the whole course, called eigenvalues, eigenvectors and eigenspaces; how these concepts help us bring maps into a ‘pleasantly simple’ form; and how to compute eigenvalues/vectors/spaces for any given linear map via an ‘algorithm’. Why do we care? In general, eigenvectors/-values characterise transformations and hence play an important role in all areas that apply linear algebra. Often, a system is represented by a linear transformation whose outputs are then re-purposed as new input. In this case, the largest eigen- value has particular importance because it governs the long-term behaviour of the system (when applying the transformation many times), and the associated eigenvector is the steady state of the system. We start by making precise what we mean by ‘simple form’: Definition 2.1.1. 
Diagonality of a matrix can also be expressed through how it acts on bases:

Theorem 2.1.2. Let A : V → V be a linear map and let α = {a_1, ..., a_n} be a basis for V. The matrix A_α has the diagonal form
\[
A_\alpha = \begin{pmatrix}
\lambda_1 & 0 & \cdots & 0 \\
0 & \lambda_2 & \ddots & \vdots \\
\vdots & \ddots & \ddots & 0 \\
0 & \cdots & 0 & \lambda_n
\end{pmatrix}
\]
if and only if A a_i = λ_i a_i for i = 1, ..., n.

Proof. We consider both directions:
⇒: If the matrix A_α has the diagonal form above, then multiplication by A_α maps the vector e_i to λ_i e_i. So the α-coordinates of a_i (namely e_i) are mapped to λ_i e_i (the α-coordinates of λ_i a_i). In other words, A maps a_i to λ_i a_i.
⇐: The argument is quite similar to the ⇒ direction, just the other way around: the i-th column of A_α is the image of a_i, in α-coordinates. If A maps a_i to λ_i a_i, then the i-th column of A_α is λ_i e_i.

We are now ready to give the central definition of this course: eigenvalues and eigenvectors.

Definition 2.1.3 (Eigenvector and eigenvalue). Let A : V → V be a linear map from a K-vector space V to itself. A vector v ≠ 0 is called an eigenvector of A with eigenvalue λ ∈ K if A v = λv. We denote the set of all eigenvalues of A by spec(A) and call it the spectrum of A.

The prefix ‘eigen’ is adopted from the German word for ‘characteristic’/‘own’. So ‘eigenvalues/-vectors’ roughly means ‘own values/vectors’. These names were chosen since eigenvectors are mapped onto a multiple of themselves (with the multiple being the eigenvalue), meaning they are mapped into their own linear span. You will re-encounter the spectrum in a more general setting, e.g., in the course on ordinary differential equations.

Example 2.1.4 (Example 1.3.16, cont'd). In example 1.3.16, we looked at the projection onto a line. The vectors a_1 and a_2 we defined there are eigenvectors: a_1 has eigenvalue 1 (since line elements are not changed at all by the projection), and a_2 has eigenvalue 0 (since the orthogonal complement is mapped to 0). The matrix w.r.t. the basis {a_1, a_2} is therefore
\[
\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix},
\]
a diagonal matrix with the eigenvalues on the diagonal. Notice the order of the two eigenvalues 1 and 0 on the diagonal: it corresponds to the order of the eigenvectors.

To get used to this definition, we now reformulate the diagonality criterion (Theorem 2.1.2) in eigenvalue/-vector terms:

Theorem 2.1.5. Let A : V → V be a linear map with representation matrix A_α for basis α. A_α is in diagonal form if and only if α is a basis of eigenvectors. In that case, the diagonal elements are the eigenvalues.

Example 2.1.6. Consider, in the Euclidean plane E^2, a rotation around the origin by 90°. No vector different from 0 is mapped to a multiple of itself. So this linear map has no eigenvectors (let alone a basis of such). There is hence no choice of basis for which the rotation matrix could possibly be diagonal.

We give another central definition of this course: eigenspaces, i.e., the space of eigenvectors for an eigenvalue.

Definition 2.1.7 (Eigenspace). Let A : V → V be a linear map from a K-vector space V to itself. For any scalar λ ∈ K, we denote E_λ := N(A − λI). Since null spaces are subspaces, E_λ is a subspace, called the eigenspace of A for eigenvalue λ.
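These notions are easy to try out on the matrices we already have. The sketch below (illustrative only; it assumes numpy) checks the eigenvector claims of example 2.1.4 on P_ε and confirms numerically that the 90° rotation from example 2.1.6 has no real eigenvalues.

```python
import numpy as np

P = np.array([[9, 6], [6, 4]]) / 13     # P_eps from example 1.3.16 (standard coordinates)
a1, a2 = np.array([3, 2]), np.array([2, -3])
print(np.allclose(P @ a1, 1 * a1))      # True: a1 is an eigenvector with eigenvalue 1
print(np.allclose(P @ a2, 0 * a2))      # True: a2 is an eigenvector with eigenvalue 0

R = np.array([[0, -1], [1, 0]])         # rotation by 90 degrees (example 2.1.6)
print(np.linalg.eigvals(R))             # approximately [i, -i]: no real eigenvalues at all
```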
What do eigenspaces have to do with eigenvectors? We will now see that the eigenspace for eigenvalue λ is exactly the space you get by collecting all eigenvectors for λ and additionally including 0 (otherwise, this wouldn't be a proper subspace): E_λ is the null space of the linear map A − λI. So a vector v lies in E_λ if and only if (A − λI)v = 0, which is equivalent to A v − λv = 0. Hence v ∈ E_λ if and only if A v = λv, so if and only if v is an eigenvector for eigenvalue λ or v = 0. (Note that we ruled out 0 as a possible eigenvector by definition, but any subspace needs to contain it.)

A few more remarks on the definition.
– E_λ is only interesting if λ is an eigenvalue. (Otherwise, E_λ only contains 0.) The other way around, λ is an eigenvalue if and only if E_λ contains a non-trivial vector, that is, a vector v ≠ 0 (so if and only if dim(E_λ) > 0). Some authors only use the word eigenspace if λ is an eigenvalue.
– We can also write the null space of A as an eigenspace: E_0 consists of the vectors that are mapped to 0 times themselves, so to 0. Hence E_0 is the null space of A.

2.1.1 Computing eigenvalues and -spaces

We will need to compute eigenvalues, eigenvectors and eigenspaces a lot during this course. So how do we do this? Assume you're given a map A, and you're tasked with the following two things: determine the eigenvalues, i.e., all values λ such that dim(E_λ) > 0; and find the eigenvectors for such a λ, i.e., the vectors in E_λ different from 0. The generic recipe we'll develop to do this is based on the following theorem:

Theorem 2.1.8. Let α = {a_1, ..., a_n} be a basis for V, and let
\[
A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & & \vdots \\
\vdots & & \ddots & \\
a_{n1} & \cdots & \cdots & a_{nn}
\end{pmatrix}
\]
be the matrix of a map A : V → V w.r.t. this basis. Then λ is an eigenvalue of the map A if and only if det(A − λI) = 0. In that case, the eigenvectors for eigenvalue λ are, in α-coordinates, the non-zero solutions of the system
\[
\begin{pmatrix}
a_{11} - \lambda & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} - \lambda & & \vdots \\
\vdots & & \ddots & \\
a_{n1} & \cdots & \cdots & a_{nn} - \lambda
\end{pmatrix}
\begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix}
=
\begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}.
\tag{2.1}
\]

Proof. We fix a value λ ∈ K. When looking for eigenvectors for λ, we are concerned with solutions to the system (A − λI)v = 0. The matrix of A − λI w.r.t. basis α is
\[
A - \lambda I = \begin{pmatrix}
a_{11} - \lambda & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} - \lambda & & \vdots \\
\vdots & & \ddots & \\
a_{n1} & \cdots & \cdots & a_{nn} - \lambda
\end{pmatrix}.
\]
Let v be a vector in V, written in α-coordinates as v = v_1 a_1 + ··· + v_n a_n. By definition, v is an eigenvector for eigenvalue λ if and only if v ≠ 0 and (A − λI)v = 0, so if and only if (v_1, ..., v_n) ≠ (0, ..., 0) and the system (2.1) holds. This means that there exist eigenvectors for λ if and only if the dimension of the solution space of the homogeneous system (A − λI)x = 0 is bigger than 0. This is the case if and only if the rank of A − λI is less than n, which happens if and only if det(A − λI) = 0. If λ is an eigenvalue, then the solutions of the system (2.1) give us the coordinate vectors of the elements of E_λ.
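Theorem 2.1.8 directly suggests a computation: form det(A − λI), find its roots, and then solve the homogeneous system (2.1) for each root. The sketch below carries this out for a small matrix of my own choosing (illustrative only; sympy assumed).

```python
import sympy as sp

lam = sp.symbols('lambda')
A = sp.Matrix([[2, 1], [1, 2]])                    # a small example matrix (not from the notes)

char_poly = (A - lam * sp.eye(2)).det()            # det(A - lambda*I)
eigenvalues = sp.solve(sp.Eq(char_poly, 0), lam)   # roots of the characteristic equation: [1, 3]

for ev in eigenvalues:
    # The non-zero solutions of (A - ev*I) v = 0 are exactly the eigenvectors for ev.
    basis = (A - ev * sp.eye(2)).nullspace()
    print(ev, [list(b) for b in basis])            # 1: span of (-1, 1);   3: span of (1, 1)
```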
Example 2.1.9 (Example 1.3.16, cont'd). In example 2.1.4, the continuation of 1.3.16, we saw that the vectors a_1 and a_2 are eigenvectors of the considered projection P, belonging to the eigenvalues 1 and 0, and that the matrix of P w.r.t. the basis α = {a_1, a_2} is
\[
P_\alpha = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}.
\]
We'll now use Theorem 2.1.8 to determine all eigenvalues of the projection and their respective eigenspaces. We first use the determinant criterion to find the eigenvalue candidates:
\[
\det(P_\alpha - \lambda I) = \det\begin{pmatrix} 1-\lambda & 0 \\ 0 & -\lambda \end{pmatrix} = -\lambda(1-\lambda),
\]
which is 0 if and only if λ is 0 or 1, meaning the only possible eigenvalues are the ones we already found. To determine E_1 in α-coordinates, we solve the corresponding homogeneous system, plugging in 1 for λ:
\[
\begin{pmatrix} 0 & 0 \\ 0 & -1 \end{pmatrix}\begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}
\quad\Longleftrightarrow\quad
\begin{pmatrix} 0 \\ -v_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.
\]
So the eigenvectors for eigenvalue 1 are the vectors of the form v = v_1·a_1 + 0·a_2 for arbitrary scalars v_1. We can put this more shortly (and get rid of the coordinate description): E_1 = span⟨a_1⟩. To determine E_0 in α-coordinates, we do essentially the same thing, plugging in 0 for λ:
\[
\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}
\quad\Longleftrightarrow\quad
\begin{pmatrix} v_1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.
\]
This yields v = 0·a_1 + v_2·a_2 for arbitrary scalars v_2, so E_0 = span⟨a_2⟩.

Example 2.1.10 (Example 2.1.9, generalised). If you feel nerdy, you can try to generalise the result of example 2.1.9 by proving the following claim: if P is a projection onto a line ℓ in R^2, then P has exactly the two eigenvalues 0 and 1, and the eigenspaces are E_1 = ℓ and E_0 = ℓ^⊥. (The general case works because you can always get a basis like in example 2.1.9 by picking one vector from ℓ and one from its orthogonal complement.)

We now translate this into a generic recipe to compute eigenspaces from a given eigenvalue, by means of an ‘algorithm’:

Algorithm 2.1.11 [Computing the eigenspace for a given eigenvalue, as a linear span]

Input: map A, eigenvalue λ, basis α
Output: eigenspace E_λ(A) in span⟨⟩ form

Step 1: Compute the matrix A_α of A w.r.t. α.
Step 2: Compute the space of solutions to the equation (A_α − λI)v = 0.
        // A solution to (A_α − λI)v = 0 gives the coordinates of a vector in E_λ.
        Row-reduce the matrix (A_α − λI) (if necessary).
        Solve (A_α − λI)v = 0.
        Find a span⟨w_1, ..., w_n⟩ description of the solutions.   // (like in ex. 2.1.9)
Step 3: if V = K^m and α is the standard basis then
            output ‘E_λ(A) = span⟨w_1, ..., w_n⟩’
Step 4: else   // {w_1, ..., w_n} are still coordinate vectors; translate back.
            for i = 1 to n:
                v_i := α^{-1}(w_i)   // translate w_i into a vector v_i ∈ V, e.g., using εSα
            output ‘E_λ(A) = span⟨v_1, ..., v_n⟩’
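For maps that are already given as a matrix in standard coordinates, Steps 2–3 of Algorithm 2.1.11 can be mirrored directly in a computer algebra system; the sketch below is my own translation of the recipe (sympy assumed), and Step 4 would additionally need the transition matrix εSα.

```python
import sympy as sp

def eigenspace_basis(A_alpha: sp.Matrix, eigenvalue) -> list:
    """Steps 2-3 of Algorithm 2.1.11: a spanning set for E_lambda, in alpha-coordinates."""
    n = A_alpha.rows
    # nullspace() row-reduces (A_alpha - lambda*I) and returns a basis of the solution
    # space of (A_alpha - lambda*I) v = 0, i.e. the vectors w_1, ..., w_k of Step 2.
    return (A_alpha - eigenvalue * sp.eye(n)).nullspace()

# The projection of example 2.1.9, w.r.t. the eigenvector basis alpha = {a1, a2}:
P_alpha = sp.Matrix([[1, 0], [0, 0]])
print(eigenspace_basis(P_alpha, 1))   # [Matrix([[1], [0]])]  ->  E_1 = span<a1>
print(eigenspace_basis(P_alpha, 0))   # [Matrix([[0], [1]])]  ->  E_0 = span<a2>
```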
In the example above, we computed the function det(A − λI) as −λ(1 − λ), and we used the central role played by det(A − λI) in determining possible eigenvalues. We will develop and use many nice results about this function; we hence give it a name:

Definition 2.1.12 (Characteristic polynomial). Let A : V → V be a linear map and let A_α be the matrix of A w.r.t. a basis α. We call the equation det(A_α − λI) = 0 the characteristic equation of A_α, and the left-hand side of this equation, det(A_α − λI), the characteristic polynomial of A_α. We will also call these objects the characteristic equation/polynomial of A, and denote the characteristic polynomial by χ_A.

You might ask yourself whether the term χ_A is well-defined (as the function might not only depend on A, but also on our choice of basis). The following theorem states that χ_A is indeed well-defined: it does not depend on the choice of basis.

Theorem 2.1.13. Let A : V → V be a linear map, let α and β be two bases for V, and let A_α / A_β be the matrix of A w.r.t. α / β. Then det(A_α − λI) = det(A_β − λI).

Proof. We prove the equation by using the ‘coordinate-switching relation’ between A_α and A_β, rewriting I, and using that the determinant is multiplicative:
\[
\begin{aligned}
\det(A_\beta - \lambda I)
&= \det({}_{\beta}S_{\alpha}\, A_\alpha\, {}_{\alpha}S_{\beta} - \lambda\, {}_{\beta}S_{\alpha}\, {}_{\alpha}S_{\beta}) \\
&= \det\big({}_{\beta}S_{\alpha}\, (A_\alpha - \lambda I)\, {}_{\alpha}S_{\beta}\big) \\
&= \det({}_{\beta}S_{\alpha})\,\det(A_\alpha - \lambda I)\,\det({}_{\alpha}S_{\beta}) \\
&= \det(A_\alpha - \lambda I)\,\det({}_{\beta}S_{\alpha}\, {}_{\alpha}S_{\beta}) \\
&= \det(A_\alpha - \lambda I)\,\det(I) = \det(A_\alpha - \lambda I).
\end{aligned}
\]

We now make some useful observations about how the characteristic polynomial looks:

Theorem 2.1.14. Let A : V → V be a linear map on a vector space V of dimension n, and let
\[
A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & & \vdots \\
\vdots & & \ddots & \\
a_{n1} & \cdots & \cdots & a_{nn}
\end{pmatrix}
\]
be the matrix of A w.r.t. some basis. Then the characteristic polynomial χ_A is a polynomial of degree (exactly) n, and of the following shape:
\[
\chi_A = (-1)^n \lambda^n + (-1)^{n-1}(a_{11} + a_{22} + \cdots + a_{nn})\lambda^{n-1} + \cdots + c_1\lambda + c_0
\]
for some coefficients c_0, c_1, ... ∈ K.

Proof. We know that the characteristic polynomial χ_A is the determinant
\[
\begin{vmatrix}
a_{11} - \lambda & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} - \lambda & & \vdots \\
\vdots & & \ddots & \\
a_{n1} & \cdots & \cdots & a_{nn} - \lambda
\end{vmatrix}.
\]
In the expansion of this determinant, the only term that involves all n diagonal entries is the product (a_11 − λ)(a_22 − λ)···(a_nn − λ); every other term contains at most n − 2 factors of the form a_ii − λ and therefore has degree at most n − 2 in λ. Expanding the diagonal product gives (−1)^n λ^n + (−1)^{n−1}(a_11 + a_22 + ··· + a_nn)λ^{n−1} plus terms of lower degree, which yields the claimed shape.
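As a quick computational cross-check of Theorems 2.1.13 and 2.1.14 (an illustration of my own, not from the notes; sympy assumed), the sketch below verifies on an arbitrary 3×3 example that the characteristic polynomial does not change under a change of basis and that its two leading coefficients are (−1)^n and (−1)^{n−1}(a_11 + ··· + a_nn).

```python
import sympy as sp

lam = sp.symbols('lambda')
n = 3
A = sp.Matrix([[2, 1, 0], [0, 3, 1], [1, 0, 1]])   # an arbitrary example matrix
S = sp.Matrix([[1, 1, 0], [0, 1, 1], [1, 0, 1]])   # an invertible change-of-basis matrix (det = 2)

chi_A = (A - lam * sp.eye(n)).det().expand()
chi_B = (S.inv() * A * S - lam * sp.eye(n)).det().expand()
print(sp.simplify(chi_A - chi_B) == 0)             # True: Theorem 2.1.13 (basis independence)

poly = sp.Poly(chi_A, lam)
print(poly.degree() == n)                                               # True: degree exactly n
print(poly.coeff_monomial(lam**n) == (-1)**n)                           # True: leading coefficient
print(poly.coeff_monomial(lam**(n - 1)) == (-1)**(n - 1) * A.trace())   # True: trace coefficient
```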
