Gradient Descent for SLR PDF

Document Details

Uploaded by InfallibleLawrencium3753

Northwestern University

Tags

machine learning, gradient descent, linear regression, optimization algorithms

Summary

This document provides a lecture summary on gradient descent for simple linear regression (SLR), featuring equations, visuals, and explanations from a Northwestern University class.

Full Transcript


Things to note: Call for volunteers for the final presentation; the instructions from Spring 24 are posted. Take the office hour poll.

Agenda for Section 1:
- Gradient Descent for SLR (today)
- Gradient Descent for MLR (Wednesday)
- Gradient Descent Implementation with Vectorization & SGDRegressor in Sklearn (next Monday)
- Multiclass Classification (next Wednesday)
- HW1: Implementing Gradient Descent for Logistic Regression

Motivation: models you have learned. Gradient descent is used as an optimization algorithm to find the optimal parameters for models. Among the ML models covered, linear regression, logistic regression, and neural nets are parametric, while KNN and tree-based models are nonparametric. A neural net (e.g., a NN with two layers) is parametric: linear regression plus an activation function.

Agenda for today. Lecture: cost function intuition using SLR, visualizing the cost function, gradient descent, gradient descent for SLR, the learning rate in GD, implementing gradient descent, and running gradient descent. After-class assignment: gradient descent implementation using for loops in SLR (fill in the code blanks).

Cost Function in ML

A cost function quantifies the error between the predicted values and the actual values. Gradient descent is used as an optimization algorithm to find the parameters (of linear regression, logistic regression, neural nets, etc.) that minimize it.

Cost Function Intuition

Using SLR as an illustration. The simple linear regression model is f_{w,b}(x) = wx + b; the simplified model is f_w(x) = wx. There are two parameters, w and b: w is the weight (the slope) and b is the bias (the intercept). The cost function is the mean squared error (MSE):

J(w, b) = \frac{1}{2m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right)^2

The goal is to minimize J(w, b) over w and b (for the simplified model, minimize J(w) over w).

[Figure: paired panels. Left: f_w(x) as a function of x for a fixed w, where the magenta line is the true underlying function and the x marks are the observations. Right: J(w) as a function of the parameter w.] When w = 1 the line passes through every observation, so

J(1) = \frac{1}{2m} \sum_{i=1}^{m} \left( w x^{(i)} - y^{(i)} \right)^2 = \frac{1}{2m} (0^2 + 0^2 + 0^2) = 0

Repeating the calculation for w = 0.5 and w = 0 gives progressively larger costs; plotting each (w, J(w)) pair traces out the bowl-shaped cost curve. (A worked computation appears at the end of this section.)

Visualizing the Cost Function

For linear regression with one variable, plot J(w) as a function of w. The goal of linear regression is to minimize J(w) in order to find the optimal w. More generally, the goal of a parametric model is to figure out the optimal weight and bias, and this is done by minimizing the cost function.

General SLR: the model is f_{w,b}(x) = wx + b, the parameters are w and b, the cost function is J(w, b), and the objective is \min_{w,b} J(w, b). Given w and b, we can calculate J(w, b). [Figure: housing data, price in $1000's versus size in feet², with the fitted line f_{w,b}(x) = 0.06x + 50.]

Cost function visualization in 3D: with two parameters, J(w, b) is a surface over the (w, b) plane, and we can read off values such as J(w = -10, b = -15).

Gradient Descent

Gradient descent is an optimization algorithm used to minimize the cost function in machine learning. We have some cost function J(w, b) and want \min_{w,b} J(w, b). Outline:
- Start with some w, b (e.g., set w = 0, b = 0).
- Keep changing w and b to reduce J(w, b).
- Continue until we settle at or near a minimum.
Every step taken in gradient descent moves closer to the global (or at least local) minimum!
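As a worked version of the cost panels above, here is a minimal Python sketch. The three-point dataset (1, 1), (2, 2), (3, 3) is an assumption, chosen because it is consistent with the slide's J(1) = 0 computation over three observations; substitute the actual data from the slides if it differs.

```python
# Minimal sketch: cost J(w) for the simplified model f_w(x) = w*x.
# The dataset below is an assumed toy dataset, not taken from the slides.

x = [1.0, 2.0, 3.0]  # inputs
y = [1.0, 2.0, 3.0]  # actual values; they lie exactly on y = x

def compute_cost_simplified(w, x, y):
    """MSE cost J(w) = (1/2m) * sum_i (w * x_i - y_i)^2."""
    m = len(x)
    return sum((w * xi - yi) ** 2 for xi, yi in zip(x, y)) / (2 * m)

for w in (1.0, 0.5, 0.0):
    print(f"J({w}) = {compute_cost_simplified(w, x, y):.3f}")

# On this assumed data: J(1.0) = 0.000, J(0.5) = 0.583, J(0.0) = 2.333,
# tracing out the bowl-shaped curve described above.
```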
[Figure: 3D surface of J(w, b) over the (w, b) plane.] How do we ensure that every step taken in gradient descent moves closer to the global (or at least local) minimum?

Gradient descent algorithm: repeat until convergence {

w := w - \alpha \frac{\partial}{\partial w} J(w, b)
b := b - \alpha \frac{\partial}{\partial b} J(w, b)

} where \alpha is the learning rate and the partial derivatives are the gradient terms. w and b must be updated simultaneously.

Correct (simultaneous update):
    tmp_w = w - alpha * dJ_dw(w, b)
    tmp_b = b - alpha * dJ_db(w, b)
    w = tmp_w
    b = tmp_b

Incorrect:
    tmp_w = w - alpha * dJ_dw(w, b)
    w = tmp_w
    tmp_b = b - alpha * dJ_db(w, b)   (w has already changed, so this gradient is evaluated at the wrong point)
    b = tmp_b

Learning Rate

The learning rate \alpha is a key parameter: it controls how big each step is toward the minimum of the cost function in the update w = w - \alpha \frac{d}{dw} J(w). If \alpha is too small, gradient descent may be slow. If \alpha is too large, gradient descent may overshoot and never reach the minimum, or fail to converge and even diverge. (A small experiment follows the implementation sketch at the end of this section.)

Gradient descent can reach a local minimum with a fixed learning rate: near a local minimum the derivative becomes smaller, so the update steps become smaller. It can therefore reach the minimum without decreasing the learning rate.

Gradient Descent for SLR

Training SLR brings together the SLR model f_{w,b}(x) = wx + b, the cost function, and the gradient descent algorithm. (Optional) Applying the chain rule to the MSE cost gives the gradients

\frac{\partial}{\partial w} J(w, b) = \frac{1}{m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right) x^{(i)}
\frac{\partial}{\partial b} J(w, b) = \frac{1}{m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right)

Gradient descent algorithm: repeat until convergence { update w and b simultaneously }.

The squared error cost is a convex function, so it has a single global minimum. At that minimum the derivative is zero, so the update becomes w = w - \alpha \cdot 0 = w and the parameters stop changing. A general cost surface J(w, b), by contrast, can have more than one local minimum.

Implementing Gradient Descent with Simple Linear Regression

Implement GD using for loops, with three step functions (see Quiz 1 for details; a sketch follows this section):
- compute_cost: in each step, compute the cost J(w, b) as (w, b) gets updated
- compute_gradient: return the total gradient, accumulated over all the examples
- gradient_descent: update (w, b), repeating until convergence

Running Gradient Descent

Training linear regression: [Figure: a sequence of paired panels showing, at successive gradient descent steps, the current fit f_{w,b}(x) over the housing data (price in $1000's versus size in feet²) alongside the current position on the contour plot of J(w, b); the fit improves as (w, b) moves toward the minimum.] The learning curve plots the cost against the iteration number and is used to monitor convergence.

Quiz 1: Implement gradient descent for SLR using a for loop. Fill in the code blanks.

Reference: https://www.deeplearning.ai/
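Quiz 1's fill-in-the-blank code is not reproduced in this document, so the following is only a minimal sketch of how the three step functions named above (compute_cost, compute_gradient, gradient_descent) could be written with for loops; the signatures, the fixed iteration count, and the list-based inputs are assumptions.

```python
# Minimal sketch of for-loop gradient descent for SLR; details are assumed,
# not taken from Quiz 1.

def compute_cost(x, y, w, b):
    """MSE cost J(w, b) = (1/2m) * sum_i (f_wb(x_i) - y_i)^2."""
    m = len(x)
    total = 0.0
    for i in range(m):
        err = (w * x[i] + b) - y[i]
        total += err ** 2
    return total / (2 * m)

def compute_gradient(x, y, w, b):
    """Gradients dJ/dw and dJ/db, accumulated over all m examples."""
    m = len(x)
    dj_dw = 0.0
    dj_db = 0.0
    for i in range(m):
        err = (w * x[i] + b) - y[i]
        dj_dw += err * x[i]   # (f_wb(x_i) - y_i) * x_i
        dj_db += err          # (f_wb(x_i) - y_i)
    return dj_dw / m, dj_db / m

def gradient_descent(x, y, w, b, alpha, num_iters):
    """Run num_iters update steps; record the cost for the learning curve."""
    cost_history = []
    for _ in range(num_iters):
        dj_dw, dj_db = compute_gradient(x, y, w, b)
        # Simultaneous update: both gradients were computed from the same
        # (w, b) before either parameter changed.
        w = w - alpha * dj_dw
        b = b - alpha * dj_db
        cost_history.append(compute_cost(x, y, w, b))
    return w, b, cost_history
```

On the assumed toy dataset from the earlier sketch, gradient_descent(x, y, 0.0, 0.0, alpha=0.1, num_iters=1000) should converge toward w ≈ 1, b ≈ 0, and plotting the returned cost history against the iteration number gives the learning curve mentioned above.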
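To make the learning-rate discussion concrete, here is a small experiment using the hypothetical gradient_descent sketch above; the three alpha values are illustrative choices for this particular toy dataset, not general recommendations.

```python
# Sketch: effect of the learning rate alpha (same assumed toy data as above,
# repeated so the snippet is self-contained).
x = [1.0, 2.0, 3.0]
y = [1.0, 2.0, 3.0]

for alpha in (0.001, 0.1, 0.5):
    w, b, history = gradient_descent(x, y, 0.0, 0.0, alpha, num_iters=100)
    print(f"alpha={alpha}: final cost {history[-1]:.4g}")

# Typical behavior on this data: alpha=0.001 shrinks the cost only slowly,
# alpha=0.1 converges, and alpha=0.5 overshoots so the cost grows each step.
```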
