6CS012 - Linear Algebra & Machine Learning Lecture - PDF
Document Details

Herald College Kathmandu
2025
Siman Giri
Summary
These lecture slides for the 6CS012 module in Artificial Intelligence and Machine Learning cover foundational math skills, specifically linear algebra and derivatives. Topics include vectors, matrices, and their application to machine learning. The module leader is Siman Giri.
Full Transcript
6CS012 – Artificial Intelligence and Machine Learning. Lecture – 01: Foundational Math Skills for AI and ML. A quick revision on Linear Algebra and Derivatives. Siman Giri {Module Leader – 6CS012}. 2/21/2025. Week 01 – A Review of Linear Algebra for Deep Learning.

Learning Outcomes!!!
Review and revise some fundamental concepts from mathematics – linear algebra and derivatives – that we will be using throughout the course.
Cautions!!! We will omit many important topics in linear algebra and matrix calculus which, we believe, are not essential for understanding deep learning.
In particular we will discuss:
- Why do we need linear algebra for machine/deep learning?
- {Almost} everything we need to know about vectors and matrices for machine/deep learning.
- A very big picture of the definition of the derivative and of matrix calculus.

A. Why do we need Linear Algebra for ML/DL? {Why study Vectors and Matrices?}

A.1 What is Linear Algebra?
Linear algebra is the branch of mathematics concerning linear equations such as a₁x₁ + … + aₙxₙ = b; linear maps such as (x₁, …, xₙ) ↦ a₁x₁ + ⋯ + aₙxₙ; and their representations in vector spaces and through matrices. – Wikipedia.
Linear algebra is a branch of mathematics that deals with vectors, vector spaces (also known as linear spaces), and linear transformations between these spaces. It involves operations on matrices and vectors, solving systems of linear equations, and understanding geometric concepts like lines, planes, and subspaces. – "chatgpt."

A.2 Why Linear Algebra for Machine Learning?
Representation of data: in machine learning, data is typically represented as vectors and matrices. For example, a dataset might be stored as a matrix where each row is a data point (a vector) and each column is a feature.

A.2.1 Why Linear Algebra for Machine Learning?
Efficient computing: matrix operations allow for efficient computations on large datasets. Libraries like NumPy, TensorFlow, and PyTorch leverage linear algebra for operations on large matrices and tensors {vectorization}, which makes machine learning models faster and more scalable.

A.2.2 Why Linear Algebra for Machine Learning?
Understanding {machine or deep learning} algorithms: training machine or deep learning models often involves solving systems of linear equations, and linear algebra provides the tools to solve these systems efficiently. Many machine learning algorithms are based on linear algebra concepts. For instance:
- Linear regression involves finding a line (or hyperplane) that best fits the data.
- Neural networks use matrix multiplication for forward and backward propagation.

B. Summary: Linear Algebra for Machine Learning.

1. Understanding Vectors and Matrices. {Basic Concepts, Definitions and Notations.}

1.1 What are Vectors?
Interpretation 1 – Point in space: e.g., in 2D we can visualize data points with respect to a coordinate origin.
Interpretation 2 – Direction in space: e.g., the vector v = (3, 2)ᵀ has a direction of 3 steps to the right and 2 steps up. The notation v⃗ is sometimes used to indicate that the vector has a direction. All vectors in the figure have the same direction.
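To make the two interpretations concrete, here is a minimal NumPy sketch (NumPy is one of the libraries these slides mention); the variable names and the use of the Euclidean norm are illustrative choices, not something prescribed by the slides.

```python
import numpy as np

# The vector v = (3, 2)^T from the slide.
v = np.array([3.0, 2.0])

# Interpretation 1: a point in 2D space, i.e. coordinates w.r.t. the origin.
print("as a point:", v)            # [3. 2.]

# Interpretation 2: a direction with a magnitude (length).
length = np.linalg.norm(v)         # Euclidean length sqrt(3^2 + 2^2)
direction = v / length             # unit vector pointing the same way
print("length:", length)           # ~3.606
print("unit direction:", direction)
```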
1.2 Vector: Formal Definition.
In linear algebra and applied mathematics, we define vectors within an n-dimensional vector space.
Vector space: if n is a positive integer, then an ordered n-tuple is a sequence of n real numbers (x₁, x₂, …, xₙ). The set of all ordered n-tuples is called n-space, or n-dimensional vector space, and is denoted by ℝⁿ.
Vectors in ℝⁿ: let ℝⁿ = {(x₁, …, xₙ) : xⱼ ∈ ℝ for j = 1, …, n}. Then x = (x₁, …, xₙ) is called a vector in the vector space ℝⁿ. The numbers xⱼ, j = 1, …, n, are called the components of x ∈ ℝⁿ.
Examples: b = (b₁, b₂, b₃) ∈ ℝ³; a = (a₁, a₂) ∈ ℝ².

1.3 Vectors in a Vector Space.
Vector space: a set V of n-dimensional vectors (with a corresponding set of scalars) such that the set of vectors is:
- "closed" under vector addition,
- "closed" under scalar multiplication,
- and the origin is defined and fixed {the 0 vector must exist}.
In other words:
- Addition of two vectors takes u, v ∈ ℝⁿ and produces a third vector u + v ∈ ℝⁿ (adding vectors gives another vector in the same set).
- Scalar multiplication takes a scalar c ∈ F and a vector v ∈ ℝⁿ and produces a new vector cv ∈ ℝⁿ (multiplying a vector by a scalar gives another vector in the same set).

1.4.1 Axioms of a Vector Space.
If V is a set of vectors satisfying the above definition of a vector space, then it satisfies the following axioms:
- Existence of an additive identity: any vector space V must have a zero vector.
- Existence of a negative vector: for any vector v in V, its negative −v must also be in V.
- Arithmetic/algebraic properties: we can perform valid mathematical operations. {Details in the course notes.}

1.5 Matrices: Introduction.
In general, a matrix is a rectangular array of numbers; the numbers in the array are called the entries of the matrix. An array of numbers is an "ordered collection of vectors". Like vectors, matrices are fundamental in machine learning/AI, as matrices are the way computers interact with data in practice.
Overview of the notation for discussing matrices:
- A matrix is represented with an italicized upper-case letter like "A". For two dimensions, we say the matrix A has m rows and n columns.
- Given a set C ⊆ ℝ, we let Cᵐˣⁿ denote the set of all matrices with m rows and n columns whose entries come from C.
- For a matrix A ∈ Cᵐˣⁿ, we let aᵢⱼ denote the entry at the i-th row and j-th column of A.
- For a matrix A ∈ Cᵐˣⁿ, we let aᵢ∗ denote the i-th row vector of A, and a∗ⱼ the j-th column vector of A.
Thus, a matrix Aₘₓₙ is defined by its entries aᵢⱼ for i = 1, …, m and j = 1, …, n.
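The indexing conventions above map directly onto array indexing in code. A minimal NumPy sketch (note that NumPy indices are 0-based, whereas the slides count rows and columns from 1):

```python
import numpy as np

# A 2 x 3 matrix A (2 rows, 3 columns).
A = np.array([[1, 2, 1],
              [0, 1, -1]])

print(A.shape)      # (2, 3)  -> m = 2 rows, n = 3 columns
print(A[0, 1])      # entry a_12 in the slides' 1-based notation -> 2
print(A[0, :])      # first row vector  a_1*  -> [ 1  2  1]
print(A[:, 2])      # third column vector a_*3 -> [ 1 -1]
```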
1.6 Special Matrices.
- Rectangular matrix: matrices are said to be rectangular when the number of rows is not equal to the number of columns, i.e. Aₘₓₙ with m ≠ n.
- Square matrix: matrices are said to be square when the number of rows equals the number of columns, i.e. Aₙₓₙ.
- Diagonal matrix: square matrices are said to be diagonal when each of the non-diagonal elements is zero, i.e. for D = (dᵢⱼ) we have dᵢⱼ = 0 for all i ≠ j.
- Upper triangular matrix: square matrices are said to be upper triangular when the elements below the main diagonal are zero, i.e. for D = (dᵢⱼ) we have dᵢⱼ = 0 for i > j.
- Lower triangular matrix: square matrices are said to be lower triangular when the elements above the main diagonal are zero, i.e. for D = (dᵢⱼ) we have dᵢⱼ = 0 for i < j.
- Identity matrix: a diagonal matrix is said to be the identity when the elements along its main diagonal are all equal to one.

1.6.1 Special Matrices.
- Symmetric matrix: square matrices are said to be symmetric when they are equal to their transpose, i.e. A = Aᵀ.
- Null or zero matrix: matrices are said to be null or zero matrices when all their elements are equal to zero; denoted 0ₘₓₙ.
- Scalar matrix: diagonal matrices are said to be scalar when all the elements along the main diagonal are equal, i.e. D = αI.
- Equal matrices: two matrices A = (aᵢⱼ) and B = (bᵢⱼ) are said to be equal if aᵢⱼ = bᵢⱼ for all i, j.

1.7 Interpretation of a Matrix: Collection of Vectors.
A matrix can be thought of as a set of vectors. For example, the matrix
A := [1 2 1; 0 1 −1]
can be thought of as two three-dimensional row vectors, a₁∗ := [1 2 1] and a₂∗ := [0 1 −1], or as three two-dimensional column vectors, a∗₁ := (1, 0)ᵀ, a∗₂ := (2, 1)ᵀ and a∗₃ := (1, −1)ᵀ.

1.7.1 Interpretation of a Matrix: As a Table of Data.
The simplest interpretation of a matrix is as a two-dimensional array of values. For example, a numerical dataset can be represented as a matrix, and the pixels of an image can be represented as a matrix: say we have an image of m × n pixels, and let X be the matrix representing this image, where xᵢ,ⱼ is the intensity of the pixel at row i and column j.

1.7.2 Interpretation of a Matrix: As a Function.
A matrix can also be viewed as a function that maps vectors in one vector space to vectors in another vector space. These matrix-defined functions are called linear transformations and are written as T(x) := Ax. A very simple visualization of such a function is matrix–vector multiplication.

Good to Know!!!
A tensor is a multidimensional array and a generalization of the concepts of a vector and a matrix. Tensors can have many axes; the figure shows a tensor with three axes.

Tensor → Example.
Tensors in deep learning are used to represent an image: image_shape := Height × Width × Color Channels (RGB).

2. The Geometry of Vectors. {Operations, Linear Dependence, and Basis.}

2.1 Understanding Dot Products.
Dot product: given two vectors u, v ∈ ℝⁿ, the quantity uᵀv, sometimes called the inner product or dot product of the vectors, is a real number given by uᵀv = Σᵢ₌₁ⁿ uᵢvᵢ.
Orthogonal vectors: a pair of vectors u and v are orthogonal if their dot product is zero, i.e. ⟨u, v⟩ = 0. The notation for a pair of orthogonal vectors is u ⊥ v {i.e. the vectors are perpendicular to each other}. In ℝⁿ this is equivalent to the pair of vectors forming a 90° angle.
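A minimal NumPy sketch of the dot product and the orthogonality check (the example vectors are illustrative, not taken from the slides):

```python
import numpy as np

u = np.array([1.0, 2.0])
v = np.array([3.0, 1.0])
w = np.array([-2.0, 1.0])   # chosen so that u . w = 0

# u^T v = sum_i u_i * v_i
print(u @ v)                 # 1*3 + 2*1 = 5.0
print(np.dot(u, w))          # 1*(-2) + 2*1 = 0.0 -> u and w are orthogonal (u ⊥ w)
```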
2.2 Linear Combinations of Vectors.
Idea: combining two or more vectors to form a new vector.
Definition: a vector v is a linear combination of a set of vectors v₁, v₂, …, vₙ if it can be expressed as v = c₁v₁ + c₂v₂ + ⋯ + cₙvₙ, where c₁, c₂, …, cₙ are scalars (coefficients) and v₁, v₂, …, vₙ are vectors in a vector space.
Example in ℝ²: let v₁ = (1, 2) and v₂ = (3, 1). If we take scalars c₁ = 2 and c₂ = −1, then their linear combination produces a new vector v in the same vector space: v = 2 × (1, 2) + (−1) × (3, 1) = (−1, 3). ∎

2.3 Span of a Set of Vectors.
Span is a consequence of linear combinations of vectors and can be thought of as a subset inside a vector space (also known as a vector subspace). A subspace 𝕊 of the real vector space ℝⁿ can be thought of as a flat surface (having no curvature) within ℝⁿ: it is a collection of vectors which satisfies the following (algebraic) conditions:
- The origin (the 0 vector) is contained in 𝕊.
- If vectors v₁ and v₂ are in 𝕊, then v₁ + v₂ ∈ 𝕊.
- If v₁ ∈ 𝕊 and α is a scalar, then αv₁ ∈ 𝕊.
The span of a set of vectors v₁, v₂, …, vₙ ∈ ℝⁿ is the set of all possible linear combinations of those vectors. Formally,
span{v₁, v₂, …, vₙ} = {c₁v₁ + c₂v₂ + ⋯ + cₙvₙ : c₁, c₂, …, cₙ ∈ ℝ},
where c₁, c₂, …, cₙ are scalar coefficients.

2.3.1 Geometric Interpretation of a Span.
{Figure slide.}

2.4 Linearly Independent and Dependent Vectors.
A set of vectors v₁, v₂, …, vₙ in a vector space ℝⁿ is:
- Linearly dependent if at least one vector can be written as a linear combination of the others. Mathematically, this means there exist scalars c₁, c₂, …, cₙ, not all zero, such that c₁v₁ + c₂v₂ + ⋯ + cₙvₙ = 0 (with at least one cᵢ ≠ 0).
- Linearly independent if the only possible solution of the above equation is c₁ = c₂ = ⋯ = cₙ = 0, i.e. no vector in the set can be written as a combination of the others.

3. Matrix Algebra. {Important Matrix Operations.}

3.1 Matrix Determinant.
The determinant of a matrix, denoted by det(A) or |A|, is a real-valued scalar encoding certain properties of the matrix. E.g., for a matrix of size 2 × 2:
det [a b; c d] = ad − bc.
For larger matrices the determinant is calculated as det(A) = Σⱼ aᵢⱼ (−1)ⁱ⁺ʲ det(A₍ᵢ,ⱼ₎), where A₍ᵢ,ⱼ₎ is a minor of the matrix (the submatrix obtained by deleting row i and column j).
Properties:
- det(AB) = det(BA)
- det(A⁻¹) = 1 / det(A)
- det(A) = 0 → A is singular, i.e. not invertible.
Fig: the determinant represents the area (or volume) of the parallelogram described by the vectors in the rows of the matrix.
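A short NumPy check of the 2 × 2 determinant formula and of the singularity property (the first matrix is an illustrative example; the second is the matrix B used in the rank example that follows):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
print(np.linalg.det(A))               # ad - bc = 1*4 - 2*3 = -2
print(1.0 / np.linalg.det(A))         # equals det(A^{-1}) ...
print(np.linalg.det(np.linalg.inv(A)))  # ... as this confirms

B = np.array([[2.0, -1.0],
              [4.0, -2.0]])           # columns are scalar multiples of each other
print(np.linalg.det(B))               # 0 -> B is singular (not invertible)
```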
3.2 Rank of a Matrix.
For an m × n matrix, the rank is the largest number of linearly independent rows or columns.
Example: for the matrix B := [2 −1; 4 −2], find the rank and interpret it.
Observation: the second column is a scalar multiple of the first, c₂ = −½ × c₁. Since one column can be expressed as a multiple of the other, there is only one independent column. Thus the rank of B is 1, meaning B can span only a 1-D subspace of the ℝ² vector space. Since the full rank of a 2 × 2 matrix in ℝ² is 2, B is considered rank-deficient.

3.3 Inverse of a Matrix.
The inverse of a square matrix A, denoted A⁻¹, is a matrix that satisfies AA⁻¹ = A⁻¹A = I, where I is the identity matrix.
Conditions for invertibility: a matrix A has an inverse if and only if:
- it is a square matrix (n × n),
- its determinant is nonzero, i.e. det(A) ≠ 0,
- its rank is full, meaning rank(A) = n.
If any of these conditions fail, the matrix is singular and does not have an inverse.

3.3.1 Finding the Inverse of a Matrix.
If the inverse exists, we can find it by:
Using row reduction: row reduction transforms a matrix into a simpler form (row echelon form – REF), or all the way to the identity matrix when finding the inverse. REF can be reached via the following valid row operations:
- swap two rows,
- multiply a row by a non-zero scalar,
- add or subtract a multiple of one row to/from another row.
It can be done using:
- Gaussian elimination: transform the matrix A into REF and then use back-substitution to solve for the inverse.
- Gauss–Jordan elimination: transform the matrix A into the identity matrix directly, with no need for back-substitution.
Using the adjoint (cofactor) formula: find the inverse of A using the adjoint (also called adjugate) of the matrix:
A⁻¹ = (1 / det(A)) × adj(A).
For a 2 × 2 matrix A = [a b; c d]:
A⁻¹ = (1 / (ad − bc)) × [d −b; −c a].

3.4 Systems of Linear Equations.
A system of linear equations is a collection of one or more linear equations that share a common set of variables. For example:
2x + y = 5
3x + 4y = 6
Types of systems:
- Consistent system: a system that has at least one solution. 1. Unique solution: the system has a single solution. 2. Infinite solutions: the system has infinitely many solutions.
- Inconsistent system: a system that has no solution.

3.4.1 Solving Systems of Linear Equations.
There are different techniques; our interest is the matrix method (aka the matrix inversion method). Any system of linear equations
a₁₁x₁ + a₁₂x₂ = b₁
a₂₁x₁ + a₂₂x₂ = b₂
can be represented in the form
[a₁₁ a₁₂; a₂₁ a₂₂] · (x₁, x₂)ᵀ = (b₁, b₂)ᵀ, i.e. Ax = b,
where:
- A is the matrix of coefficients with size m × n (m is the number of equations and n is the number of variables),
- x is a column vector of the unknown variables with size n × 1,
- b is a column vector of the constants with size m × 1.
The equation can be manipulated as follows:
A⁻¹Ax = A⁻¹b (multiplying both sides by A⁻¹)
Ix = A⁻¹b (I is the identity matrix)
x = A⁻¹b ∎ {you know how to find A⁻¹}
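A minimal sketch that solves the example system 2x + y = 5, 3x + 4y = 6 both by the matrix inversion method above and with np.linalg.solve (the numerically preferred route):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [3.0, 4.0]])
b = np.array([5.0, 6.0])

# Matrix inversion method: x = A^{-1} b
x_inv = np.linalg.inv(A) @ b
print(x_inv)                    # [ 2.8 -0.6]

# In practice, solve(A, b) is preferred over forming A^{-1} explicitly.
x = np.linalg.solve(A, b)
print(np.allclose(A @ x, b))    # True -> the solution satisfies Ax = b
```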
3.5 Matrix–Matrix Multiplication.
Matrix multiplication between A ∈ ℝᵐˣⁿ and B ∈ ℝⁿˣᵖ produces the resultant matrix C ∈ ℝᵐˣᵖ, where each entry cᵢⱼ is the dot product of the i-th row of A with the j-th column of B.

3.6 Matrix–Vector Multiplication.
Matrix–vector multiplication is an operation between a matrix and a vector that produces a new vector. It amounts to taking the dot product of each row of the matrix A with the vector x, the results forming the entries of the vector y = Ax. Equivalently, matrix–vector multiplication can be interpreted as taking a linear combination of the columns of the matrix A weighted by the elements of the vector x.
What can be the consequence of such an operation? Matrix–vector multiplication can result in a change in magnitude, a change in direction, or both, depending on the matrix involved.
Fig: how will my vector be transformed?

3.6.1 Geometric Interpretation of Matrix–Vector Multiplication.
Rotation matrix: a rotation matrix rotates a vector by a specified angle while preserving its magnitude.
Example: a 2D rotation matrix that rotates a vector by 90 degrees counterclockwise:
R = [0 −1; 1 0].
Effect: this matrix rotates the vector without changing its length.
Example calculation: given the vector v = (1, 0)ᵀ, Rv = (0, 1)ᵀ. The magnitude remains 1, but the direction changes from the x-axis to the y-axis.

3.6.2 Geometric Interpretation of Matrix–Vector Multiplication.
Scaling matrix: a scaling matrix increases or decreases the magnitude of a vector without changing its direction:
S = [k 0; 0 k],
where k is the scaling factor:
- if k > 1, the vector is stretched;
- if 0 < k < 1, the vector is compressed;
- if k < 0, the vector is flipped and scaled.
Example: given the vector v = (1, 2)ᵀ and the scaling matrices i) [2 0; 0 2] and ii) [−2 0; 0 −2], applying S to v gives:
i) [2 0; 0 2] · (1, 2)ᵀ = (2, 4)ᵀ
ii) [−2 0; 0 −2] · (1, 2)ᵀ = (−2, −4)ᵀ

4. Eigenvalue Problem. {aka Eigenvalue Decomposition.}

4.1 Eigenvectors and Eigenvalues.
An eigenvector of a square matrix A is a non-zero vector v that, when multiplied by A, results in a scalar multiple of itself. In other words, it is a vector that does not change direction when the linear transformation represented by A is applied to it; it only gets scaled by a certain factor, called the eigenvalue. Mathematically, for any matrix–vector pair for which
Av = λv
holds, the vector v is called an eigenvector and the scaling factor λ is called an eigenvalue.
Key points about eigenvectors:
- Non-zero: eigenvectors are always non-zero vectors, i.e. v ≠ 0.
- Scaling: the transformation A simply scales the eigenvector by the eigenvalue λ; it does not change the vector's direction.
- Multiple eigenvectors: for each eigenvalue there can be infinitely many eigenvectors, all scalar multiples of each other. They form a subspace (called the eigenspace) corresponding to that eigenvalue.

4.1.1 Identify the Eigenvector.
Consider the matrix A = [2 2; −4 8] and the vectors v = (1, 1)ᵀ and w = (2, 1)ᵀ. Which are eigenvectors?
For v, we check whether v is an eigenvector by calculating Av:
Av = [2 2; −4 8] · (1, 1)ᵀ = (4, 4)ᵀ = 4 · (1, 1)ᵀ = 4v.
So v is an eigenvector with eigenvalue λ = 4. What about w?

4.1.1 Identify the Eigenvector (continued).
For w, we check whether w is an eigenvector by calculating Aw:
Aw = [2 2; −4 8] · (2, 1)ᵀ = (6, 0)ᵀ ≠ λw.
So w is not an eigenvector: there does not exist a scalar λ for which Aw = λw holds.
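A minimal NumPy sketch of the same check, using the matrix and vectors from the slide:

```python
import numpy as np

A = np.array([[2.0, 2.0],
              [-4.0, 8.0]])
v = np.array([1.0, 1.0])
w = np.array([2.0, 1.0])

print(A @ v)    # [4. 4.] = 4 * v -> v is an eigenvector with eigenvalue 4
print(A @ w)    # [6. 0.] is not a scalar multiple of [2. 1.] -> w is not an eigenvector

# np.linalg.eig returns all eigenvalues and (column) eigenvectors of A.
eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)  # eigenvalues of A (4 and 6 for this matrix)
```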
4.2 The Eigenvalue Problem.
The eigenvalue problem is a fundamental concept in linear algebra and plays a critical role in fields such as machine learning, physics, and computer science. It involves finding scalar values (called eigenvalues) and corresponding non-zero vectors (called eigenvectors) for a given square matrix. Mathematically, given a square matrix A, the eigenvalue problem is to find scalars λ and eigenvectors v that satisfy the equation Av = λv.

4.3 Steps to Solve the Eigenvalue Problem.
- Write the characteristic equation: to find the eigenvalues, we rewrite the equation as (A − λI)v = 0, called the characteristic equation, where I is the identity matrix of the same size as A, λ is an eigenvalue and v is an eigenvector. Caution: the matrix A − λI must be singular, i.e. det(A − λI) = 0.
- Compute the characteristic polynomial: expand det(A − λI) = 0, which gives a polynomial equation in λ called the characteristic polynomial.
- Solve the characteristic polynomial: solve the polynomial equation to find the eigenvalues λ₁, λ₂, …, λₙ.
- Find the eigenvectors: for each eigenvalue λᵢ, substitute it back into the equation (A − λI)v = 0 and solve for the eigenvector v.

4.3.1 Example Problem.
{Worked example on the slides.}

4.4 Eigenvalue Decomposition.
Eigenvalue decomposition is a process in which a square matrix is factorized into its eigenvalues and eigenvectors. Specifically, a matrix A may be decomposed into a product of three matrices:
A = VΛV⁻¹,
where:
- A is the original matrix,
- V is the matrix whose columns are the eigenvectors of A,
- Λ is a diagonal matrix whose diagonal entries are the eigenvalues of A,
- V⁻¹ is the inverse of the matrix V.
One application of eigenvalue decomposition is Principal Component Analysis (PCA), used for dimensionality reduction. {In this workshop we will implement PCA with eigenvalue decomposition and try to compress an image.}
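A minimal sketch of the factorization A = VΛV⁻¹ in NumPy, reusing the matrix from section 4.1.1; rebuilding A from its eigenvectors and eigenvalues confirms the decomposition:

```python
import numpy as np

A = np.array([[2.0, 2.0],
              [-4.0, 8.0]])

eigvals, V = np.linalg.eig(A)   # V holds the eigenvectors as its columns
Lam = np.diag(eigvals)          # diagonal matrix of eigenvalues

# Reconstruct A = V * Lambda * V^{-1}
A_rebuilt = V @ Lam @ np.linalg.inv(V)
print(np.allclose(A, A_rebuilt))    # True
```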
5. Matrix and Derivative. {Finding the Slope for a Univariate Function.}

5.1 What is a Derivative?
The derivative of a function measures how the output value of the function changes as we make small adjustments to its input.
Notation: the derivative of a function f(x) is represented by (d/dx) f(x), or df(x)/dx, or f′(x).
If we have a function f(x), the derivative f′(x) at a point x tells us the rate of change of the function f at that point. This rate of change is crucial for optimization techniques, such as finding maxima or minima, which are frequently used in training machine learning models.

5.2 Derivatives: Scalar Functions.
Most popular: the derivative of a scalar function, i.e. scalar derivatives, f: ℝ → ℝ. A scalar function maps a real number x to another real number f(x), e.g. f(x) = x². Here x is a real number and f(x) is also a real number. We are interested in the rate at which f(x) changes as x changes.
The derivative is the heart of calculus, buried inside this definition:
f′(x) = lim_{h→0} [f(x + h) − f(x)] / h, when the limit exists.
This is popularly known as the "limit definition of the derivative" or "the derivative from first principles". But what does it mean?

5.2.1 Derivative from First Principles: Interpretation.
The derivative of a function is a measure of local slope.
1st example: the linear function y = f(x) = 2x. What about a non-linear function?

5.2.2 Derivative from First Principles: Interpretation.
2nd example: the non-linear function y = f(x) = x². The derivative of a function at a point is the slope of the tangent drawn to the curve at that point. The slope (derivative) of a linear function (a straight line) is constant at all points, but this is not the case for a non-linear function. The derivative also represents the instantaneous rate of change at a point on the function.
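A minimal numerical sketch of the limit definition: shrinking h in the difference quotient for f(x) = x² approaches the true derivative 2x (the step sizes chosen here are arbitrary illustrations):

```python
def f(x):
    return x ** 2

def difference_quotient(f, x, h):
    # (f(x + h) - f(x)) / h from the limit definition of the derivative
    return (f(x + h) - f(x)) / h

x = 3.0
for h in (1.0, 0.1, 0.001, 1e-6):
    print(h, difference_quotient(f, x, h))   # approaches f'(3) = 2*3 = 6 as h -> 0
```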
5.3 Some Common Rules for Determining Derivatives.
!!! Hands-on practice in the tutorial.

5.4 Derivatives of Some Common Functions.
{Table of standard derivatives on the slide.}

6. Matrix and Derivative. {Finding the Slope for a Multivariate Function.}

6.1 Derivative of a Multivariate Function.
(Scalar-valued) multivariate functions f: ℝⁿ → ℝ have the form f(x, y) = x²y.
Partial derivative: in mathematics, a partial derivative of a function of several variables is its derivative with respect to one of those variables, with the others held constant (as opposed to the total derivative, in which all variables are allowed to vary). The swirly-d symbol ∂, often called "del", is used to distinguish partial derivatives from ordinary single-variable (regular) derivatives. For example: f(x, y) = x²y. Partial derivatives are used in vector calculus and differential geometry.

6.2 {Some Popular} Nomenclature of Derivatives.
The derivative of a vector/matrix, a.k.a. matrix/vector calculus, is an extension of the ordinary scalar derivative to higher-dimensional settings. {Overview table of the extended derivative styles on the slide.}

6.3 Gradient.
Gradient: the gradient of a function of multiple variables is the vector of partial derivatives of the function with respect to each variable.
Scalar-by-vector {f: ℝⁿ → ℝ}: the derivative of a scalar function y with respect to a vector x = (x₁, x₂, …, xₙ)ᵀ ∈ ℝⁿ is written as
∇y = ∂y/∂x = (∂y/∂x₁, ∂y/∂x₂, …, ∂y/∂xₙ)ᵀ.
{Stack the partial derivatives against all the elements of the vector x.}
Scalar-by-matrix {f: ℝⁿˣᵐ → ℝ}: the derivative of a scalar function y with respect to an n × m matrix X is written as
∇y = ∂y/∂X, the n × m matrix whose (i, j) entry is ∂y/∂xᵢⱼ.
{Stack the partial derivatives against all the elements of the matrix X.}

6.4 Gradient: Geometric Interpretation.
6.4.1 Gradient: Geometric Interpretation.
{Figure slides.}

6.5 Gradient: Example 1.
For z = f(x, y) = 3x²y, find the gradient of z at (1, 1).
We know the gradient of z is ∇z = (∂f(x, y)/∂x, ∂f(x, y)/∂y).
Finding ∂f(x, y)/∂x, i.e. treating y as constant: ∂f/∂x = 6xy.
Finding ∂f(x, y)/∂y, i.e. treating x as constant: ∂f/∂y = 3x².
So ∇z = [6xy  3x²], and at (1, 1): ∇z = [6 × 1 × 1  3 × 1²] = [6  3].

6.6 Gradient of a Vector-Valued Function: the Jacobian.
Vector-by-vector {f: ℝⁿ → ℝᵐ}: the derivative of a vector function y = (y₁, y₂, …, yₘ)ᵀ ∈ ℝᵐ with respect to an input vector x = (x₁, x₂, …, xₙ)ᵀ ∈ ℝⁿ is written as Jy, called the Jacobian matrix: a matrix which contains all the partial derivatives ∂yᵢ/∂xⱼ of each output component with respect to each input variable, providing a full picture of how the vector-valued function changes as each input variable changes.

6.7 Derivative: Key Point.
The derivative of a univariate function is a scalar. The derivative of a multivariate function is organized and stored in a vector, the so-called gradient. We denote the derivative of a multivariable function f using the gradient symbol ∇ {read "del" or "nabla"}:
∇f = (∂f/∂x, ∂f/∂y, ∂f/∂z, …)ᵀ.
The gradient is simply a vector listing the derivatives of a function with respect to each argument of the function.

Plan
Tutorial – some hands-on exercises on vectors, matrices and gradients.
Workshop – implement PCA with eigenvalue decomposition for an image compression application.

The – End.