Linear Regression (Least Square Error Fit)
TIET, Patiala

Linear Regression
▪ In machine learning and statistics, regression attempts to determine the strength and character of the relationship between one dependent variable (usually denoted by Y) and a series of other variables (known as independent variables).
▪ Mathematically, regression analysis uses an algorithm to learn the mapping function from the input variables to the output variable, i.e. $Y = f(X)$, where $Y$ is a continuous (real-valued) variable.
▪ The regression is called linear regression if the output (dependent) variable is a linear function of the input variables.

Regression Example
▪ House value prediction: the price variable (a continuous dependent output variable) depends upon various input (independent) variables such as plot size, number of bedrooms, covered area, granite flooring, distance from the city, age, upgraded kitchen, etc.

Simple Linear Regression (SLR)
▪ Simple linear regression is a linear regression model with a single explanatory variable.
▪ It concerns two-dimensional sample points with one independent variable and one dependent variable, and finds a linear function (a non-vertical straight line) that predicts the dependent variable values as accurately as possible as a function of the independent variable.
▪ The adjective "simple" refers to the fact that the outcome variable is related to a single predictor.
▪ For instance, in the house price prediction problem (with only one input variable, plot size), a linear regressor will fit a straight line with the x-axis representing plot size and the y-axis representing price.

Fitting the Straight Line for SLR
▪ The linear function that binds the input variable $x$ to the corresponding predicted value $\hat{y}$ is given by the equation of a straight line in slope-intercept form:
$$\hat{y} = \beta_0 + \beta_1 x$$
▪ $\beta_1$ is the slope of the line, i.e. it measures the change in the output variable $y$ for a unit change in the independent variable $x$.
▪ $\beta_0$ is the y-intercept, i.e. the value of $\hat{y}$ where the line crosses the y-axis (at $x = 0$).
▪ $\hat{y}$ is the predicted value of the output for a particular value of the input variable $x$.

Cost/Error Function for SLR
▪ The major goal of the SLR model is to fit the straight line that predicts output values as close as possible to the actual values.
▪ But in a real-world scenario there is always some error (regression residual) in the predicted values, i.e.
$$\text{actual value}_i = \text{predicted value}_i + \text{error}_i, \qquad y_i = \hat{y}_i + \epsilon_i, \qquad \text{Residual Error } \epsilon_i = y_i - \hat{y}_i$$
▪ This error may be positive or negative, since the line may predict values greater or smaller than the actual values, so we consider the square of each error value.
▪ The total error for all $n$ points in the dataset is given by
$$\text{Total Square Error} = \sum_{i=1}^{n} \epsilon_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$$
▪ The mean of the squared errors is called the cost or error function for simple linear regression, denoted $J(\beta_0, \beta_1)$ and given by
$$J(\beta_0, \beta_1) = \frac{1}{n} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$$
▪ There exist many methods to optimize (minimize) this cost/error function to find the line of best fit; a direct computation of the cost is sketched below, and the analytical (least square) minimization follows.
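To make the cost function concrete, here is a minimal Python sketch. It is not part of the original slides; the function name `slr_cost` and the toy data are illustrative assumptions. It simply evaluates $J(\beta_0, \beta_1)$ as the mean squared residual of a candidate line:

```python
# Minimal sketch of the SLR cost J(beta0, beta1); the name slr_cost and the
# toy data below are illustrative, not from the notes.

def slr_cost(beta0, beta1, x, y):
    """Mean squared error of the line yhat = beta0 + beta1*x over the data set."""
    residuals = [yi - (beta0 + beta1 * xi) for xi, yi in zip(x, y)]
    return sum(e * e for e in residuals) / len(x)

# Toy data that is roughly y = 2x: a slope of 2 gives a far lower cost than a slope of 1.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 4.0, 6.2, 7.9]
print(slr_cost(0.0, 1.0, x, y))   # poor fit, cost about 7.7
print(slr_cost(0.0, 2.0, x, y))   # near the best fit, cost about 0.015
```

The least square method described next finds the values of $\beta_0$ and $\beta_1$ that minimize this quantity analytically instead of by trial and error.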
Least Square Method for Line of Best Fit
▪ The least square method aims to find the values $\hat{\beta}_0$ and $\hat{\beta}_1$ of $\beta_0$ and $\beta_1$ for which the square error between the actual and the predicted values is least (hence the name least square error fit).
▪ The values $\hat{\beta}_0$ and $\hat{\beta}_1$ that minimize the square error function $J(\beta_0, \beta_1)$ are computed using the second derivative test as below:
1. Compute the partial derivatives of $J(\beta_0, \beta_1)$ w.r.t. $\beta_0$ and $\beta_1$, i.e. $\frac{\partial J(\beta_0, \beta_1)}{\partial \beta_0}$ and $\frac{\partial J(\beta_0, \beta_1)}{\partial \beta_1}$.
2. Find the values $\hat{\beta}_0$ and $\hat{\beta}_1$ for which $\frac{\partial J(\beta_0, \beta_1)}{\partial \beta_0} = 0$ and $\frac{\partial J(\beta_0, \beta_1)}{\partial \beta_1} = 0$.
3. Find the second partial derivatives $\frac{\partial^2 J(\beta_0, \beta_1)}{\partial \beta_0^2}$ and $\frac{\partial^2 J(\beta_0, \beta_1)}{\partial \beta_1^2}$, and verify that $J$ attains a minimum at $\hat{\beta}_0$ and $\hat{\beta}_1$.

Least Square Error Fit - Contd.
Here the derivation works with the total square error; since it differs from the mean square error only by the constant factor $1/n$, both are minimized by the same $\hat{\beta}_0$ and $\hat{\beta}_1$:
$$\text{Total Square Error} = J(\beta_0, \beta_1) = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$$

Step 1: Compute the partial derivatives of $J(\beta_0, \beta_1)$ w.r.t. $\beta_0$ and $\beta_1$:
$$\frac{\partial J(\beta_0, \beta_1)}{\partial \beta_0} = -2 \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)$$
$$\frac{\partial J(\beta_0, \beta_1)}{\partial \beta_1} = -2 \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)\, x_i = -2 \sum_{i=1}^{n} (x_i y_i - \beta_0 x_i - \beta_1 x_i^2)$$

Step 2: Find the values $\hat{\beta}_0$ and $\hat{\beta}_1$ for which both partial derivatives are zero:
$$\sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0 \quad (1) \qquad \text{and} \qquad \sum_{i=1}^{n} (x_i y_i - \hat{\beta}_0 x_i - \hat{\beta}_1 x_i^2) = 0 \quad (2)$$

From equation (1):
$$\sum_{i=1}^{n} y_i - n\hat{\beta}_0 - \hat{\beta}_1 \sum_{i=1}^{n} x_i = 0 \;\;\Rightarrow\;\; n\hat{\beta}_0 + \hat{\beta}_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i \quad (3)$$

From equation (2):
$$\sum_{i=1}^{n} x_i y_i - \hat{\beta}_0 \sum_{i=1}^{n} x_i - \hat{\beta}_1 \sum_{i=1}^{n} x_i^2 = 0 \;\;\Rightarrow\;\; \hat{\beta}_0 \sum_{i=1}^{n} x_i + \hat{\beta}_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i \quad (4)$$

Multiply equation (3) by $\sum_{i=1}^{n} x_i$ and equation (4) by $n$:
$$n\hat{\beta}_0 \sum_{i=1}^{n} x_i + \hat{\beta}_1 \Big(\sum_{i=1}^{n} x_i\Big)^2 = \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i \quad (5)$$
$$n\hat{\beta}_0 \sum_{i=1}^{n} x_i + n\hat{\beta}_1 \sum_{i=1}^{n} x_i^2 = n \sum_{i=1}^{n} x_i y_i \quad (6)$$

Subtracting equation (5) from (6), we get
$$\hat{\beta}_1 = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - \big(\sum_{i=1}^{n} x_i\big)^2}$$

From equation (3), $n\hat{\beta}_0 + \hat{\beta}_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$, so
$$\hat{\beta}_0 = \frac{1}{n} \sum_{i=1}^{n} y_i - \frac{1}{n} \hat{\beta}_1 \sum_{i=1}^{n} x_i = \bar{y} - \hat{\beta}_1 \bar{x}$$

Step 3: Find the second partial derivatives and verify the minimum at $\hat{\beta}_0$ and $\hat{\beta}_1$:
$$\frac{\partial^2 J(\beta_0, \beta_1)}{\partial \beta_0^2} = \frac{\partial}{\partial \beta_0}\Big(-2 \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)\Big) = 2n$$
$$\frac{\partial^2 J(\beta_0, \beta_1)}{\partial \beta_1^2} = \frac{\partial}{\partial \beta_1}\Big(-2 \sum_{i=1}^{n} (x_i y_i - \beta_0 x_i - \beta_1 x_i^2)\Big) = 2 \sum_{i=1}^{n} x_i^2$$
Both second derivatives are positive, and $J$ is a sum of squares of affine functions of $\beta_0$ and $\beta_1$ and hence convex, so the cost function attains its minimum value at $\hat{\beta}_0$ and $\hat{\beta}_1$.

Least Square Error Fit - Summary
▪ The linear function that binds the input variable $x$ to the corresponding predicted value $\hat{y}$ is the straight line (slope-intercept form) $\hat{y} = \beta_0 + \beta_1 x$.
▪ The square error in prediction is minimized when
$$\hat{\beta}_1 = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - \big(\sum_{i=1}^{n} x_i\big)^2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = r_{xy}\,\frac{\sigma_y}{\sigma_x}$$
and
$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$
where $r_{xy}$ is the sample correlation between $x$ and $y$, and $\sigma_x$, $\sigma_y$ are their sample standard deviations.

Least Square Error Fit - Example
The data set shown in the table gives the average mass of women as a function of their height in a sample of American women of age 30–39.
(a) Fit a straight line for average mass as a function of height using the least square error method.
(b) Predict the average mass of women whose height is 1.40 m.

The working table below also lists the columns $x_i^2$ and $x_i y_i$ needed for the formulas:

 i     | Height (m), x_i | Mass (kg), y_i | x_i^2   | x_i * y_i
 1     | 1.47            | 52.21          | 2.1609  | 76.7487
 2     | 1.50            | 53.12          | 2.2500  | 79.6800
 3     | 1.52            | 54.48          | 2.3104  | 82.8096
 4     | 1.55            | 55.84          | 2.4025  | 86.5520
 5     | 1.57            | 57.20          | 2.4649  | 89.8040
 6     | 1.60            | 58.57          | 2.5600  | 93.7120
 7     | 1.63            | 59.93          | 2.6569  | 97.6859
 8     | 1.65            | 61.29          | 2.7225  | 101.1285
 9     | 1.68            | 63.11          | 2.8224  | 106.0248
 10    | 1.70            | 64.47          | 2.8900  | 109.5990
 11    | 1.73            | 66.28          | 2.9929  | 114.6644
 12    | 1.75            | 68.10          | 3.0625  | 119.1750
 13    | 1.78            | 69.92          | 3.1684  | 124.4576
 14    | 1.80            | 72.19          | 3.2400  | 129.9420
 15    | 1.83            | 74.46          | 3.3489  | 136.2618
 Total | 24.76           | 931.17         | 41.0532 | 1548.2453

$$\hat{\beta}_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - (\sum x_i)^2} = \frac{15 \times 1548.2453 - 24.76 \times 931.17}{15 \times 41.0532 - 24.76^2} = \frac{167.9103}{2.7404} \approx 61.27$$
$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = \frac{931.17}{15} - 61.27 \times \frac{24.76}{15} \approx 62.08 - 101.14 = -39.06$$
Therefore the line of best fit is given by
$$\hat{y} = -39.06 + 61.27\,x$$
The predicted value of $y$ when $x = 1.40$ is
$$\hat{y} = -39.06 + 61.27 \times 1.40 \approx 46.72 \text{ kg}$$
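As a check on the worked example, the two closed-form formulas can be evaluated directly in Python. This is a minimal sketch, not part of the original slides; the function name `fit_slr` is an assumption made here:

```python
# Minimal sketch of the closed-form least-squares fit derived above, applied to
# the height/mass data of the worked example (the name fit_slr is illustrative).

def fit_slr(x, y):
    """Return (beta0_hat, beta1_hat) for the line of best fit yhat = beta0 + beta1*x."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    beta1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    beta0 = sy / n - beta1 * sx / n          # beta0_hat = ybar - beta1_hat * xbar
    return beta0, beta1

height = [1.47, 1.50, 1.52, 1.55, 1.57, 1.60, 1.63, 1.65,
          1.68, 1.70, 1.73, 1.75, 1.78, 1.80, 1.83]
mass = [52.21, 53.12, 54.48, 55.84, 57.20, 58.57, 59.93, 61.29,
        63.11, 64.47, 66.28, 68.10, 69.92, 72.19, 74.46]

b0, b1 = fit_slr(height, mass)
print(b0, b1)            # approx -39.06 and 61.27
print(b0 + b1 * 1.40)    # predicted mass at a height of 1.40 m, approx 46.7 kg
```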
Multiple Linear Regression (MLR)
▪ Multiple regression models describe how a single response variable $Y$ depends linearly on a number of predictor variables.
▪ Examples: The selling price of a house can depend on the desirability of the location, the number of bedrooms, the number of bathrooms, the year the house was built, the square footage of the lot, and a number of other factors. The height of a child can depend on the height of the mother, the height of the father, nutrition, and environmental factors.

Multiple Linear Regression Model
▪ A multiple linear regression model with $k$ independent predictor variables $x_1, x_2, \ldots, x_k$ predicts the output variable as
$$\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \cdots + \beta_k x_k$$
▪ There is always some error (regression residual) in predicting the values, i.e.
$$\text{actual value}_i = \text{predicted value}_i + \text{error}_i, \qquad y_i = \hat{y}_i + \epsilon_i$$
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \cdots + \beta_k x_{ik} + \epsilon_i$$
▪ The total error can be computed from all the observations in the dataset, $i = 1, 2, \ldots, n$:
$$\text{Total Error} = \sum_{i=1}^{n} \epsilon_i = \sum_{i=1}^{n} \Big(y_i - \beta_0 - \sum_{j=1}^{k} \beta_j x_{ij}\Big) \quad (7)$$

Multiple Linear Regression Model (matrix form)
▪ Equation (7), presented above, can be represented in matrix form as
$$\epsilon = y - X\beta$$
▪ where
$$\epsilon = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}, \qquad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \qquad \beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}, \qquad X = \begin{bmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{bmatrix}$$

Least Square Error Fit for MLR
▪ According to the least square error method, we have to find the value of the vector $\beta$ for which the total square error is minimum:
$$\text{Total Square Error} = J(\beta) = \sum_{i=1}^{n} \epsilon_i^2 = \epsilon^T \epsilon = (y - X\beta)^T (y - X\beta) = (y^T - \beta^T X^T)(y - X\beta)$$
$$J(\beta) = y^T y - \beta^T X^T y - y^T X\beta + \beta^T X^T X\beta = y^T y - 2 y^T X\beta + \beta^T X^T X\beta$$
(because $y^T X\beta$ and $\beta^T X^T y$ are $1 \times 1$ matrices and each is the transpose of the other, they are equal).
▪ The square error function is minimized using the second derivative test.

Step 1: Compute the partial derivative of $J(\beta)$ w.r.t. $\beta$:
$$\frac{\partial J(\beta)}{\partial \beta} = \frac{\partial \big(y^T y - 2 y^T X\beta + \beta^T X^T X\beta\big)}{\partial \beta} = \frac{\partial\, y^T y}{\partial \beta} - \frac{\partial\, 2 y^T X\beta}{\partial \beta} + \frac{\partial\, \beta^T X^T X\beta}{\partial \beta} = 0 - 2X^T y + 2X^T X\beta$$
using the identities $\frac{\partial (A\beta)}{\partial \beta} = A^T$ and $\frac{\partial (\beta^T A \beta)}{\partial \beta} = 2A\beta$ for a symmetric matrix $A$ (here $A = X^T X$).

Step 2: Compute $\hat{\beta}$ for which $\frac{\partial J(\beta)}{\partial \beta} = 0$:
$$-2X^T y + 2X^T X\hat{\beta} = 0 \;\;\Rightarrow\;\; X^T X\hat{\beta} = X^T y \;\;\Rightarrow\;\; \hat{\beta} = (X^T X)^{-1} X^T y$$

Step 3: Compute $\frac{\partial^2 J(\beta)}{\partial \beta^2}$ and verify that $J$ attains a minimum at $\hat{\beta}$:
$$\frac{\partial^2 J(\beta)}{\partial \beta^2} = \frac{\partial \big(-2X^T y + 2X^T X\beta\big)}{\partial \beta} = 2X^T X$$
which is positive semi-definite (and positive definite when $X$ has full column rank), so $J(\beta)$ is convex and is minimized at $\hat{\beta}$.
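The normal-equation solution $\hat{\beta} = (X^T X)^{-1} X^T y$ maps directly onto a few lines of numpy. The sketch below is not from the slides; the function name `fit_mlr` is an assumption, and it solves the system $X^T X \hat{\beta} = X^T y$ rather than forming the explicit inverse, which is mathematically equivalent but numerically better behaved:

```python
# Minimal numpy sketch of the normal-equation fit beta_hat = (X^T X)^{-1} X^T y.
# The name fit_mlr is illustrative, not from the notes.
import numpy as np

def fit_mlr(X_raw, y):
    """Return [beta0_hat, beta1_hat, ..., betak_hat] for the least-squares fit."""
    X_raw = np.asarray(X_raw, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(X_raw)), X_raw])   # prepend the column of 1s
    # Solve X^T X beta = X^T y instead of inverting X^T X explicitly.
    return np.linalg.solve(X.T @ X, X.T @ y)
```

In practice, `np.linalg.lstsq` (or a library routine such as scikit-learn's `LinearRegression`) computes the same least-squares solution with better numerical stability when $X^T X$ is ill-conditioned.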
Least Square Error Fit for MLR - Example: The Delivery Times Data
A soft drink bottler is analyzing the vending machine serving routes in his distribution system. He is interested in predicting the time required by the distribution driver to service the vending machines in an outlet. It has been suggested that the two most important variables influencing the delivery time ($y$, in minutes) are the number of cases of product stocked ($x_1$) and the distance walked by the driver ($x_2$, in feet). Three observations of delivery time, cases stocked, and distance walked have been recorded:

 Number of cases stocked (x1) | Distance walked by the driver (x2, feet) | Delivery time (y, min)
 7                            | 560                                      | 16.68
 3                            | 220                                      | 11.50
 3                            | 340                                      | 12.03

(a) Fit a multiple regression line using the least square error fit.
(b) Compute the delivery time when 4 cases are stocked and the distance walked by the driver is 80 feet.

Least Square Error Fit for MLR - Example Solution
▪ The multiple linear regression equation is $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2$, where $\hat{\beta} = \begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \hat{\beta}_2 \end{bmatrix}$ is the vector of regression coefficients of the line of best fit.
▪ We know $\hat{\beta} = (X^T X)^{-1} X^T y$, with
$$X = \begin{bmatrix} 1 & 7 & 560 \\ 1 & 3 & 220 \\ 1 & 3 & 340 \end{bmatrix}, \qquad X^T = \begin{bmatrix} 1 & 1 & 1 \\ 7 & 3 & 3 \\ 560 & 220 & 340 \end{bmatrix}, \qquad y = \begin{bmatrix} 16.68 \\ 11.50 \\ 12.03 \end{bmatrix}$$
$$X^T X = \begin{bmatrix} 3 & 13 & 1120 \\ 13 & 67 & 5600 \\ 1120 & 5600 & 477600 \end{bmatrix}, \qquad (X^T X)^{-1} = \begin{bmatrix} 799/288 & 79/288 & -7/720 \\ 79/288 & 223/288 & -7/720 \\ -7/720 & -7/720 & 1/7200 \end{bmatrix}$$
$$\hat{\beta} = (X^T X)^{-1} X^T y = \begin{bmatrix} 7.7696 \\ 0.9196 \\ 0.0044 \end{bmatrix}$$
▪ The line of best fit is
$$\hat{y} = 7.7696 + 0.9196\,x_1 + 0.0044\,x_2$$
▪ When $x_1 = 4$ and $x_2 = 80$:
$$\hat{y} = 7.7696 + 0.9196 \times 4 + 0.0044 \times 80 \approx 11.80 \text{ min}$$
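As a quick sanity check (not part of the original notes), the same coefficients and prediction can be reproduced with numpy:

```python
# Assumed verification of the delivery-time example; reproduces the coefficients above.
import numpy as np

X = np.array([[1, 7, 560],
              [1, 3, 220],
              [1, 3, 340]], dtype=float)
y = np.array([16.68, 11.50, 12.03])

beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y
print(beta_hat)                                # approx [7.7696, 0.9196, 0.0044]
print(beta_hat @ np.array([1.0, 4.0, 80.0]))   # approx 11.80 minutes
```

Note that with three observations and three coefficients the system is exactly determined, so the fitted plane passes through all three data points and the residuals are zero; with more observations than coefficients, the same normal equations give the least-squares compromise instead.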