Module 11 (Elements of Matrix Calculus: Concept of Partial Derivatives)

Matrix Calculus

In this module, we cover the basic definitions of matrix calculus and how the chain rule of differentiation works in matrix calculus.

Notation: $x$ (or $y$) denotes a scalar, $\mathbf{x}$ (or $\mathbf{y}$) a vector, and $\mathbf{X}$ (or $\mathbf{Y}$) a matrix.

For a single-variable function $y(x)$, the only derivative of interest is $\frac{dy}{dx}$. For vector functions such as $\mathbf{y}(\mathbf{x})$, there are four cases, depending on whether the function (rows) and the variable (columns) are scalars or vectors:

$$
\begin{array}{c|cc}
 & \text{Scalar } x & \text{Vector } \mathbf{x} \\ \hline
\text{Scalar } y & \dfrac{dy}{dx} & \dfrac{dy}{d\mathbf{x}} \\[2ex]
\text{Vector } \mathbf{y} & \dfrac{d\mathbf{y}}{dx} & \dfrac{d\mathbf{y}}{d\mathbf{x}}
\end{array}
$$

A Brief Recap of Partial Derivatives

For a multi-variable function $y : \mathbb{R}^n \to \mathbb{R}$, the first-order partial derivative is

$$
\nabla_{x_i} y = y_{x_i} = \frac{\partial y}{\partial x_i}, \quad i = 1, 2, \dots, n.
$$

The second-order partial derivative is

$$
y_{x_i x_j} = \frac{\partial y_{x_i}}{\partial x_j} = \frac{\partial^2 y}{\partial x_i \, \partial x_j}, \quad i, j = 1, 2, \dots, n.
$$

Example: Consider $y(x_1, x_2, x_3) = x_1 x_2 + x_2^2 + 2x_3$. Then

$$
y_{x_2} = \frac{\partial y}{\partial x_2} = x_1 + 2x_2
\quad \text{and} \quad
y_{x_2 x_1} = \frac{\partial^2 y}{\partial x_2 \, \partial x_1} = 1.
$$

Module 11 (Derivative of a Scalar)

Derivative of a scalar with respect to a scalar: Let $x$ and $y$ be two scalars, where $y$ is a function of $x$. This is simply the single-variable case, and the derivative is $\frac{dy}{dx}$.

Example: Consider $y = x^3$. Then $\frac{dy}{dx} = 3x^2$.

Exercise: Find the first and second derivatives of the function $y = \sin(x) + e^x$.

Derivative of a scalar with respect to a vector: Let $\mathbf{x}$ be a vector of order $n$ and let $y$ be a scalar function of $\mathbf{x}$. That is,

$$
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \quad y = y(\mathbf{x}).
$$

The derivative is given by

$$
\frac{dy}{d\mathbf{x}} = \begin{bmatrix} \dfrac{\partial y}{\partial x_1} \\[1.5ex] \dfrac{\partial y}{\partial x_2} \\ \vdots \\ \dfrac{\partial y}{\partial x_n} \end{bmatrix}.
$$

It is also called the gradient of $y$ with respect to the vector variable $\mathbf{x}$, denoted by $\nabla y$.

The gradient is extremely important and widely used in machine learning. One of its most important properties is that the gradient of a function evaluated at a point is the direction to take in order to climb up the function the fastest from that point. Conversely, the direction exactly opposite to the gradient vector is the direction to take to climb down the function the fastest.
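The gradient property described above can be checked numerically. The following is a minimal sketch, not part of the original slides: it reuses the recap function $y = x_1 x_2 + x_2^2 + 2x_3$, approximates its gradient by central finite differences, and compares small steps along and against the gradient. The helper names (`numerical_gradient`, `step`), the evaluation point, and the step sizes are illustrative assumptions.

```python
import math


def y(x):
    """Recap example: y = x1*x2 + x2^2 + 2*x3."""
    x1, x2, x3 = x
    return x1 * x2 + x2 ** 2 + 2 * x3


def analytic_gradient(x):
    """Hand-derived gradient [dy/dx1, dy/dx2, dy/dx3]."""
    x1, x2, _ = x
    return [x2, x1 + 2 * x2, 2.0]


def numerical_gradient(f, x, h=1e-6):
    """Central-difference approximation of the gradient (hypothetical helper)."""
    grad = []
    for i in range(len(x)):
        forward, backward = list(x), list(x)
        forward[i] += h
        backward[i] -= h
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad


def step(x, direction, size):
    """Move a distance `size` from x along the normalized `direction`."""
    norm = math.sqrt(sum(d * d for d in direction))
    return [xi + size * d / norm for xi, d in zip(x, direction)]


point = [1.0, 2.0, 3.0]  # arbitrary evaluation point (assumption)
grad = numerical_gradient(y, point)

# Change in y after a small step along the gradient, against it,
# and along the first coordinate axis for comparison.
uphill = y(step(point, grad, 0.01)) - y(point)
downhill = y(step(point, [-g for g in grad], 0.01)) - y(point)
along_x1 = y(step(point, [1.0, 0.0, 0.0], 0.01)) - y(point)

print("numerical gradient:", grad)
print("analytic gradient: ", analytic_gradient(point))
print("change uphill / downhill / along x1:", uphill, downhill, along_x1)
```

For a step of the same length, the move along the gradient should raise $y$ more than the move along the $x_1$ axis, and the move against the gradient should lower it.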
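Many optimization techniques, such as Steepest Descent, Newton's Method, Conjugate Direction Methods, and Quasi-Newton Methods, are based on the gradient.

Example: Consider $y(\mathbf{x}) = x_1 x_2^2 - x_1^2 x_2$. Then

$$
\frac{dy}{d\mathbf{x}} = \begin{bmatrix} x_2^2 - 2x_1 x_2 & 2x_1 x_2 - x_1^2 \end{bmatrix}^{T}.
$$

Exercise: Evaluate $\nabla y$, where $y = \sin\left(e^{x_1 x_2}\right)$.

Exercise: Evaluate $\nabla y$, where $y = (x_1 + x_2)^2$.

Module 11 (Derivative of a Vector)

Derivative of a vector with respect to a scalar: Let $x$ be a scalar and $\mathbf{y}$ be a vector of order $m$, where each component of $\mathbf{y}$ is a function of $x$. That is,

$$
\mathbf{y} = \begin{bmatrix} y_1(x) \\ y_2(x) \\ \vdots \\ y_m(x) \end{bmatrix}.
$$

The derivative is given by the row vector

$$
\frac{d\mathbf{y}}{dx} = \begin{bmatrix} \dfrac{dy_1}{dx} & \dfrac{dy_2}{dx} & \cdots & \dfrac{dy_m}{dx} \end{bmatrix}.
$$

Example: Let $\mathbf{y} \in \mathbb{R}^2$ with $y_1(x) = \sin x$ and $y_2(x) = \cos x$. Then

$$
\frac{d\mathbf{y}}{dx} = \begin{bmatrix} \cos(x) & -\sin(x) \end{bmatrix}.
$$

Exercise: Check the relation

$$
\frac{d(\mathbf{y}_1 + \mathbf{y}_2)}{dx} = \frac{d\mathbf{y}_1}{dx} + \frac{d\mathbf{y}_2}{dx}
$$

by considering an example in $\mathbb{R}^3$.

Derivative of a vector with respect to a vector: Let $\mathbf{x}$ and $\mathbf{y}$ be vectors of order $n$ and $m$, respectively,

$$
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \quad
\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix},
$$

where each component $y_i$ is a function of the components of $\mathbf{x}$, that is, $\mathbf{y} = \mathbf{y}(\mathbf{x})$.

The derivative of the vector $\mathbf{y}$ with respect to $\mathbf{x}$ is the following $n \times m$ matrix, whose columns are the gradients of the components of $\mathbf{y}$:

$$
\frac{d\mathbf{y}}{d\mathbf{x}}
= \begin{bmatrix} \nabla y_1 & \nabla y_2 & \cdots & \nabla y_m \end{bmatrix}
= \begin{bmatrix}
\dfrac{\partial y_1}{\partial x_1} & \dfrac{\partial y_2}{\partial x_1} & \cdots & \dfrac{\partial y_m}{\partial x_1} \\[1.5ex]
\dfrac{\partial y_1}{\partial x_2} & \dfrac{\partial y_2}{\partial x_2} & \cdots & \dfrac{\partial y_m}{\partial x_2} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial y_1}{\partial x_n} & \dfrac{\partial y_2}{\partial x_n} & \cdots & \dfrac{\partial y_m}{\partial x_n}
\end{bmatrix}.
$$

Example: Given

$$
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \quad
\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}, \quad
y_1 = x_1^2 - x_2, \quad y_2 = x_3^2 + 3x_2,
$$

the derivative matrix $\frac{d\mathbf{y}}{d\mathbf{x}}$ is computed as follows:

$$
\frac{d\mathbf{y}}{d\mathbf{x}} = \begin{bmatrix} 2x_1 & 0 \\ -1 & 3 \\ 0 & 2x_3 \end{bmatrix}.
$$

Exercise 1: Consider $\mathbf{y} = \mathbf{A}\mathbf{x}$ for a constant matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$. Show that

$$
\frac{d\mathbf{y}}{d\mathbf{x}} = \mathbf{A}^{T}.
$$

Exercise 2: Show that

$$
\frac{d(\mathbf{x}^{T}\mathbf{x})}{d\mathbf{x}} = 2\mathbf{x}.
$$

Module 11 (Chain Rule for Vector Functions)

Let $\mathbf{x}$, $\mathbf{y}$ and $\mathbf{z}$ be vectors of order $n$, $r$ and $m$, respectively,

$$
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \quad
\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_r \end{bmatrix}, \quad
\mathbf{z} = \begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_m \end{bmatrix},
$$

where $\mathbf{z}$ is a function of $\mathbf{y}$, which is in turn a function of $\mathbf{x}$. We can write

$$
\left(\frac{d\mathbf{z}}{d\mathbf{x}}\right)^{T} =
\begin{bmatrix}
\dfrac{\partial z_1}{\partial x_1} & \dfrac{\partial z_1}{\partial x_2} & \cdots & \dfrac{\partial z_1}{\partial x_n} \\[1.5ex]
\dfrac{\partial z_2}{\partial x_1} & \dfrac{\partial z_2}{\partial x_2} & \cdots & \dfrac{\partial z_2}{\partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial z_m}{\partial x_1} & \dfrac{\partial z_m}{\partial x_2} & \cdots & \dfrac{\partial z_m}{\partial x_n}
\end{bmatrix}.
$$

Each entry of this matrix may be expanded as

$$
\frac{\partial z_i}{\partial x_j} = \sum_{q=1}^{r} \frac{\partial z_i}{\partial y_q} \frac{\partial y_q}{\partial x_j},
\quad i = 1, 2, \dots, m, \quad j = 1, 2, \dots, n.
$$

Then

$$
\begin{aligned}
\left(\frac{d\mathbf{z}}{d\mathbf{x}}\right)^{T}
&= \begin{bmatrix}
\sum_q \dfrac{\partial z_1}{\partial y_q}\dfrac{\partial y_q}{\partial x_1} & \sum_q \dfrac{\partial z_1}{\partial y_q}\dfrac{\partial y_q}{\partial x_2} & \cdots & \sum_q \dfrac{\partial z_1}{\partial y_q}\dfrac{\partial y_q}{\partial x_n} \\[1.5ex]
\sum_q \dfrac{\partial z_2}{\partial y_q}\dfrac{\partial y_q}{\partial x_1} & \sum_q \dfrac{\partial z_2}{\partial y_q}\dfrac{\partial y_q}{\partial x_2} & \cdots & \sum_q \dfrac{\partial z_2}{\partial y_q}\dfrac{\partial y_q}{\partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\sum_q \dfrac{\partial z_m}{\partial y_q}\dfrac{\partial y_q}{\partial x_1} & \sum_q \dfrac{\partial z_m}{\partial y_q}\dfrac{\partial y_q}{\partial x_2} & \cdots & \sum_q \dfrac{\partial z_m}{\partial y_q}\dfrac{\partial y_q}{\partial x_n}
\end{bmatrix} \\[2ex]
&= \begin{bmatrix}
\dfrac{\partial z_1}{\partial y_1} & \dfrac{\partial z_1}{\partial y_2} & \cdots & \dfrac{\partial z_1}{\partial y_r} \\[1.5ex]
\dfrac{\partial z_2}{\partial y_1} & \dfrac{\partial z_2}{\partial y_2} & \cdots & \dfrac{\partial z_2}{\partial y_r} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial z_m}{\partial y_1} & \dfrac{\partial z_m}{\partial y_2} & \cdots & \dfrac{\partial z_m}{\partial y_r}
\end{bmatrix}
\begin{bmatrix}
\dfrac{\partial y_1}{\partial x_1} & \dfrac{\partial y_1}{\partial x_2} & \cdots & \dfrac{\partial y_1}{\partial x_n} \\[1.5ex]
\dfrac{\partial y_2}{\partial x_1} & \dfrac{\partial y_2}{\partial x_2} & \cdots & \dfrac{\partial y_2}{\partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial y_r}{\partial x_1} & \dfrac{\partial y_r}{\partial x_2} & \cdots & \dfrac{\partial y_r}{\partial x_n}
\end{bmatrix} \\[2ex]
&= \left(\frac{d\mathbf{z}}{d\mathbf{y}}\right)^{T} \left(\frac{d\mathbf{y}}{d\mathbf{x}}\right)^{T}
= \left(\frac{d\mathbf{y}}{d\mathbf{x}} \frac{d\mathbf{z}}{d\mathbf{y}}\right)^{T}.
\end{aligned}
$$

On transposing both sides, we finally obtain

$$
\frac{d\mathbf{z}}{d\mathbf{x}} = \frac{d\mathbf{y}}{d\mathbf{x}} \frac{d\mathbf{z}}{d\mathbf{y}}.
$$

Note: This is the chain rule for vectors. Unlike the conventional chain rule of calculus, the chain of matrices builds toward the left.

Example: Let $\mathbf{x}$ and $\mathbf{y}$ be as in the previous example, and let $\mathbf{z}$ be a function of $\mathbf{y}$ defined as

$$
\mathbf{z} = \begin{bmatrix} z_1 \\ z_2 \\ z_3 \\ z_4 \end{bmatrix}, \quad
\begin{cases}
z_1 = y_1^2 - 2y_2 \\
z_2 = y_2^2 - y_1 \\
z_3 = y_1^2 + y_2^2 \\
z_4 = 2y_1 + y_2
\end{cases}
\quad \text{and} \quad
\begin{cases}
y_1 = x_1^2 - x_2 \\
y_2 = x_3^2 + 3x_2.
\end{cases}
$$

Then we have

$$
\frac{d\mathbf{z}}{d\mathbf{y}} =
\begin{bmatrix}
\dfrac{\partial z_1}{\partial y_1} & \dfrac{\partial z_2}{\partial y_1} & \dfrac{\partial z_3}{\partial y_1} & \dfrac{\partial z_4}{\partial y_1} \\[1.5ex]
\dfrac{\partial z_1}{\partial y_2} & \dfrac{\partial z_2}{\partial y_2} & \dfrac{\partial z_3}{\partial y_2} & \dfrac{\partial z_4}{\partial y_2}
\end{bmatrix}
= \begin{bmatrix}
2y_1 & -1 & 2y_1 & 2 \\
-2 & 2y_2 & 2y_2 & 1
\end{bmatrix}.
$$

Therefore,

$$
\begin{aligned}
\frac{d\mathbf{z}}{d\mathbf{x}} = \frac{d\mathbf{y}}{d\mathbf{x}} \frac{d\mathbf{z}}{d\mathbf{y}}
&= \begin{bmatrix} 2x_1 & 0 \\ -1 & 3 \\ 0 & 2x_3 \end{bmatrix}
\begin{bmatrix} 2y_1 & -1 & 2y_1 & 2 \\ -2 & 2y_2 & 2y_2 & 1 \end{bmatrix} \\[1ex]
&= \begin{bmatrix}
4x_1 y_1 & -2x_1 & 4x_1 y_1 & 4x_1 \\
-2y_1 - 6 & 1 + 6y_2 & -2y_1 + 6y_2 & 1 \\
-4x_3 & 4x_3 y_2 & 4x_3 y_2 & 2x_3
\end{bmatrix}.
\end{aligned}
$$

Exercise: Why can we write

$$
\frac{dh}{dx} = \frac{df}{dg}\frac{dg}{dx} = \frac{dg}{dx}\frac{df}{dg} \quad \text{for } h(x) = f(g(x))
$$

for single-variable functions, but not

$$
\frac{d\mathbf{z}}{d\mathbf{x}} = \frac{d\mathbf{y}}{d\mathbf{x}}\frac{d\mathbf{z}}{d\mathbf{y}} = \frac{d\mathbf{z}}{d\mathbf{y}}\frac{d\mathbf{y}}{d\mathbf{x}}
$$

for vectors? Check it by taking an appropriate example. (Hint: Matrix multiplication is not commutative.)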
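The worked chain-rule example can be verified numerically. The sketch below is not part of the original slides: it encodes the example's $\mathbf{y}(\mathbf{x})$ and $\mathbf{z}(\mathbf{y})$, forms the product $\frac{d\mathbf{y}}{d\mathbf{x}}\frac{d\mathbf{z}}{d\mathbf{y}}$ using the same layout as the text (rows indexed by the variable, columns by the function component), and compares it with a central-difference approximation of the composite function. The helper names (`matmul`, `numerical_derivative`) and the evaluation point are illustrative assumptions.

```python
def y_of_x(x):
    """y1 = x1^2 - x2, y2 = x3^2 + 3*x2."""
    x1, x2, x3 = x
    return [x1 ** 2 - x2, x3 ** 2 + 3 * x2]


def z_of_y(y):
    """z1..z4 as defined in the example."""
    y1, y2 = y
    return [y1 ** 2 - 2 * y2, y2 ** 2 - y1, y1 ** 2 + y2 ** 2, 2 * y1 + y2]


def dy_dx(x):
    """3 x 2 matrix: entry [j][i] is the partial of y_i with respect to x_j."""
    x1, _, x3 = x
    return [[2 * x1, 0.0], [-1.0, 3.0], [0.0, 2 * x3]]


def dz_dy(y):
    """2 x 4 matrix: entry [q][i] is the partial of z_i with respect to y_q."""
    y1, y2 = y
    return [[2 * y1, -1.0, 2 * y1, 2.0], [-2.0, 2 * y2, 2 * y2, 1.0]]


def matmul(a, b):
    """Plain-Python matrix product of nested lists (hypothetical helper)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def numerical_derivative(f, x, h=1e-6):
    """Central differences in the same layout: entry [j][i] = d f_i / d x_j."""
    rows = []
    for j in range(len(x)):
        forward, backward = list(x), list(x)
        forward[j] += h
        backward[j] -= h
        rows.append([(p - m) / (2 * h)
                     for p, m in zip(f(forward), f(backward))])
    return rows


x = [1.0, 2.0, 3.0]  # arbitrary evaluation point (assumption)

# Chain rule: dz/dx = (dy/dx)(dz/dy), a (3 x 2)(2 x 4) = 3 x 4 product.
chain = matmul(dy_dx(x), dz_dy(y_of_x(x)))
numeric = numerical_derivative(lambda v: z_of_y(y_of_x(v)), x)

max_error = max(abs(c - n)
                for chain_row, numeric_row in zip(chain, numeric)
                for c, n in zip(chain_row, numeric_row))
print("chain-rule result:", chain)
print("largest deviation from finite differences:", max_error)
```

In this example the reversed product $\frac{d\mathbf{z}}{d\mathbf{y}}\frac{d\mathbf{y}}{d\mathbf{x}}$ would multiply a $2 \times 4$ matrix by a $3 \times 2$ matrix, so it is not even defined, which is one way to see why the order of the factors matters.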
