Document Details

Uploaded by SportyDeciduousForest4462

University of Exeter

Dr Marcos Oliveira

Tags

linear regression, supervised machine learning, data analysis, learning from data

Summary

This document is a lecture from the University of Exeter on Learning from Data. The lecture covers measures of error, supervised machine learning objectives, and how to calculate regression coefficients using the linear least squares method. It provides examples and derivations related to linear regression.

Full Transcript

Learning from Data
Lecture 3
Dr Marcos Oliveira

Measures of error

Recap from Lecture 2

We determined the slope and the Y-intercept of the regression line: $\hat\beta_0 = 0.914$, $\hat\beta_1 = 1.16$.

[Figure: scatter plot of box office revenue (y) against movie budget (x) with the fitted regression line.]

[Figure: three scatter plots of box office against movie budget; the first shows the fitted line, and the other two panels, each titled "Another pair of $\hat\beta_0, \hat\beta_1$", show alternative candidate lines.]

The linear correlation coefficient is just the standardised slope of the simple linear regression line.

Using the fitted line $\hat y = \hat\beta_0 + \hat\beta_1 x$, we can compute a predicted revenue $\hat y$ for each budget, and the error of each prediction, i.e., the difference between the predicted value and the observed value:

Budget (x) | Revenue (y) | Predicted revenue ($\hat y$) | Error ($\hat y - y$)
1.2 | 3.2 | 2.306 | −0.894
0.8 | 2.6 | 1.84 | −0.76
3.6 | 1.5 | 5.09 | 3.59
2.2 | 2.8 | 3.47 | 0.67
4.3 | 8.9 | 5.9 | −3
0.5 | 1.1 | 1.494 | 0.394

How do we find the regression coefficients such that the prediction error is minimised?

Learning objectives

To describe the different measures of error.
To describe the objectives of supervised machine learning.
To show how to choose the regression coefficients in order to fit the data.

Outline

Introduction to the different measures of error: sum of squared errors, sum of squared residuals, total sum of squares.
Calculate the regression coefficients for a simple linear regression model using the linear least squares method.
Derive the ordinary least squares coefficients formula.
Introduction to the objectives of supervised machine learning: trade-off between a model's interpretation and prediction; modeling best practices.

Minimising the error function: Linear regression

For one observation, the error function is $\hat y_i - y_i$.

[Figure: scatter plot of Y against X with the fitted line $\hat y$, illustrating the error for a single observation.]

The signed errors in the table above (−0.894, −0.76, 3.59, 0.67, −3, 0.394) can be positive or negative, so they partly cancel when summed. Thus, we are interested in a positive version of the error function, which can tell us the error magnitude.
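As a concrete check, here is a minimal Python sketch (not from the slides; it assumes NumPy is available, and the data and coefficients are taken from the recap table above). It recomputes the error column and shows why the raw, signed errors are not a useful measure on their own:

import numpy as np

# Budget (x), revenue (y), and the fitted coefficients from the recap slide.
x = np.array([1.2, 0.8, 3.6, 2.2, 4.3, 0.5])
y = np.array([3.2, 2.6, 1.5, 2.8, 8.9, 1.1])
beta0, beta1 = 0.914, 1.16

y_hat = beta0 + beta1 * x    # predicted revenue
errors = y_hat - y           # signed errors, as in the table

print(errors)                # [-0.894 -0.76   3.59   0.67  -3.     0.394]
print(errors.sum())          # roughly 0: positive and negative errors cancel
print(np.abs(errors).sum())  # 9.308: a magnitude-based total error instead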
Minimising the error function: Linear regression (cont'd)

The Mean Squared Error (MSE) function is the sum of squared errors divided by the number of values:

$$\mathrm{MSE}(\beta_0, \beta_1) = \frac{1}{n}\sum_{i=1}^{n}(\hat y_i - y_i)^2, \qquad \min_{\beta_0,\,\beta_1}\ \mathrm{MSE}(\beta_0, \beta_1)$$

We try to optimise $\beta_0, \beta_1$ such that we minimise the distance between the predicted values and the observed values.

Linear regression: Python

Import the class containing the regression method:

from sklearn.linear_model import LinearRegression

Create an instance of the class:

LR = LinearRegression()

Fit the instance on the data and then predict the expected value:

LR = LR.fit(X_train, Y_train)
y_predict = LR.predict(X_test)

Linear least squares method

Given data points $(x_1, y_1), \ldots, (x_n, y_n)$, we want the "best" fit to the data. Let's use an example with four data points: (1, 6), (2, 5), (3, 7), (4, 10). We want to find a line that best fits these four points. In other words, we would like to find the regression coefficients $\beta_0$ and $\beta_1$ that approximately solve the following overdetermined* linear system of four equations and two unknowns:

$\beta_0 + 1\beta_1 = 6$
$\beta_0 + 2\beta_1 = 5$
$\beta_0 + 3\beta_1 = 7$
$\beta_0 + 4\beta_1 = 10$

* A system of equations is considered overdetermined if there are more equations than unknowns.

Linear least squares method (cont'd)

Epsilon ($\varepsilon_i$) represents the error, at each point, between the curve fit and the data: $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$. The linear least squares approach to solving the above problem is to try to make the sum of squares of these errors as small as possible, that is, we want to find the minimum of the function:

$$S(\beta_0, \beta_1) = [6 - (\beta_0 + 1\beta_1)]^2 + [5 - (\beta_0 + 2\beta_1)]^2 + [7 - (\beta_0 + 3\beta_1)]^2 + [10 - (\beta_0 + 4\beta_1)]^2$$

The minimum is determined by calculating the partial derivatives of $S(\beta_0, \beta_1)$ with respect to $\beta_0$ and $\beta_1$ and setting them to zero:

$$\frac{\partial S}{\partial \beta_0} = 8\beta_0 + 20\beta_1 - 56 = 0, \qquad \frac{\partial S}{\partial \beta_1} = 20\beta_0 + 60\beta_1 - 154 = 0$$

This results in a system of two equations and two unknowns, called the normal equations:

$4\beta_0 + 10\beta_1 = 28$  (Eq. 1)
$10\beta_0 + 30\beta_1 = 77$  (Eq. 2)

Linear least squares method (cont'd)

We can solve the equations using the elimination method.

Multiply Eq. (1) by 3: $12\beta_0 + 30\beta_1 = 84$.
Subtract Eq. (2) to eliminate $\beta_1$: $2\beta_0 = 7$, so $\beta_0 = 3.5$.
Substitute the value of $\beta_0$ into one of the equations to get $\beta_1$: $4(3.5) + 10\beta_1 = 28$, so $\beta_1 = 1.4$.

The best-fit line is therefore $\hat y = 3.5 + 1.4x$.
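To verify the worked example numerically, here is a minimal sketch (assuming NumPy; the variable names are illustrative). It builds and solves the normal equations, then cross-checks the answer by solving the original overdetermined system directly with np.linalg.lstsq:

import numpy as np

# The four data points from the example.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([6.0, 5.0, 7.0, 10.0])

# Normal equations: [[n, sum(x)], [sum(x), sum(x^2)]] @ [b0, b1] = [sum(y), sum(x*y)]
A = np.array([[len(x), x.sum()],
              [x.sum(), (x ** 2).sum()]])
b = np.array([y.sum(), (x * y).sum()])
beta0, beta1 = np.linalg.solve(A, b)
print(beta0, beta1)  # 3.5 1.4

# Cross-check: least-squares solution of the overdetermined system X @ beta = y.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)          # [3.5 1.4]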
Least squares coefficients formula

In general, we can use these handy equations.

Slope of the regression line:

$$\hat\beta_1 = \frac{\sum_{i=1}^{n}(x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{n}(x_i - \bar x)^2}$$

It is the ratio of two quantities: the sum of the deviation of x from its mean times the deviation of y from its mean, over all the observations, divided by the sum of the squared deviations of x from its mean over all observations.

Y-intercept of the regression line:

$$\hat\beta_0 = \bar y - \hat\beta_1 \bar x$$

Where do these formulas come from?

Deriving the least squares coefficients formula

Ordinary least squares: choose $\beta_0$ and $\beta_1$ to minimise the sum of squared errors (i.e., prediction mistakes) in the sample. We calculate the difference between the observed value and the predicted value. Then, we square those differences (i.e., errors) and add them up over all n observations in the sample:

$$\mathrm{SSE}(\beta_0, \beta_1) = \sum_{i=1}^{n}(y_i - \beta_0 - \beta_1 x_i)^2$$

We want to minimise the right-hand side of the above equation, so we need to choose the values of $\beta_0$ and $\beta_1$ that make the overall sum as small as possible. To find the minimum, we take the derivatives with respect to $\beta_0$ and $\beta_1$ and set them to 0.

Taking the derivative with respect to $\beta_0$ (Eq. 1):

$$\frac{\partial}{\partial \beta_0}\sum_{i=1}^{n}(y_i - \beta_0 - \beta_1 x_i)^2 = -2\sum_{i=1}^{n}(y_i - \beta_0 - \beta_1 x_i) = 0$$

The derivative of a sum is the sum of the derivatives. First, the power rule: take the 2 down and subtract 1 from the exponent. Second, apply the chain rule: the derivative of the expression inside the parentheses with respect to $\beta_0$ is $-1$, which comes out as a factor.

Taking the derivative with respect to $\beta_1$ (Eq. 2):

$$\frac{\partial}{\partial \beta_1}\sum_{i=1}^{n}(y_i - \beta_0 - \beta_1 x_i)^2 = -2\sum_{i=1}^{n}x_i(y_i - \beta_0 - \beta_1 x_i) = 0$$

Again the power rule, then the chain rule: the derivative of the expression inside the parentheses with respect to $\beta_1$ is $-x_i$, which comes out as a factor.

A reminder of some useful definitions

We note that $\sum_{i=1}^{n} x_i = n\bar x$, and similarly $\sum_{i=1}^{n} y_i = n\bar y$. Also:

$$\sum_{i=1}^{n}(x_i - \bar x)(y_i - \bar y) = \sum_{i=1}^{n}x_i y_i - n\bar x\bar y \qquad (1)$$

A reminder of some useful definitions (cont'd)

Exponent rule: $(a - b)^2 = a^2 - 2ab + b^2$. Applying it gives:

$$\sum_{i=1}^{n}(x_i - \bar x)^2 = \sum_{i=1}^{n}x_i^2 - n\bar x^2 \qquad (2)$$

Solving Eq. 1

$$-2\sum_{i=1}^{n}(y_i - \beta_0 - \beta_1 x_i) = 0 \;\Rightarrow\; \sum y_i - n\beta_0 - \beta_1\sum x_i = 0 \;\Rightarrow\; \hat\beta_0 = \bar y - \hat\beta_1\bar x$$

This implies that the ordinary least squares regression line passes through the means of x and y.

Solving Eq. 2

Divide both sides by $-2$:

$$\sum_{i=1}^{n}x_i(y_i - \beta_0 - \beta_1 x_i) = 0$$

Expand the brackets:

$$\sum x_i y_i - \beta_0\sum x_i - \beta_1\sum x_i^2 = 0$$

Substitute $\beta_0 = \bar y - \beta_1\bar x$ from Eq. 1 and $\sum x_i = n\bar x$, then expand the brackets again:

$$\sum x_i y_i - n\bar x\bar y + \beta_1 n\bar x^2 - \beta_1\sum x_i^2 = 0$$

From our definitions (1) and (2):

$$\hat\beta_1 = \frac{\sum x_i y_i - n\bar x\bar y}{\sum x_i^2 - n\bar x^2} = \frac{\sum_{i=1}^{n}(x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{n}(x_i - \bar x)^2}$$

This recovers the slope formula, and with it the Y-intercept $\hat\beta_0 = \bar y - \hat\beta_1\bar x$, stated above.

Understanding our model

[Figure: scatter plot of Y against X showing the fitted line $\hat y$ and the mean of Y as a horizontal line.]

Understanding our model: variations

$\sum_i (y_i - \hat y_i)^2$ is the unexplained variation: the part of the variation in y that the model does not capture.
$\sum_i (\hat y_i - \bar y)^2$ is the explained variation: the part captured by the regression line.

total variation = unexplained variation + explained variation

Sum of Squared Error, also called the sum of squared residuals: $\mathrm{SSE} = \sum_i (y_i - \hat y_i)^2$, the unexplained variation.
Explained sum of squares, due to the regression: $\sum_i (\hat y_i - \bar y)^2$, the explained variation.
Total Sum of Squares: $\mathrm{SST} = \sum_i (y_i - \bar y)^2$, the total variation, so that SST = unexplained variation + explained variation.
Coefficient of Determination: $R^2 = \dfrac{\text{explained variation}}{\text{total variation}} = 1 - \dfrac{\mathrm{SSE}}{\mathrm{SST}}$.
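Continuing with the four-point example (a minimal sketch, assuming NumPy; the variable names are illustrative), the variance decomposition and the coefficient of determination can be verified numerically:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([6.0, 5.0, 7.0, 10.0])
y_hat = 3.5 + 1.4 * x                   # fitted line from the worked example

sse = ((y - y_hat) ** 2).sum()          # unexplained variation
explained = ((y_hat - y.mean()) ** 2).sum()  # explained variation
sst = ((y - y.mean()) ** 2).sum()       # total variation

print(sse, explained, sst)              # 4.2 9.8 14.0  (sse + explained == sst)
print(1 - sse / sst)                    # R^2 = 0.7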
Using models

Using models: prediction

Prediction: the primary objective is to make the best prediction. Given a fitted model $y_p = f(x; \hat\beta)$, the prediction approach compares $y_p$ with $y$. The focus is on performance metrics, which measure the quality of the model's predictions and usually involve some measure of closeness between $y_p$ and $y$. Without focusing on interpretability, we risk having a black-box model.

Example prediction exercises:
x = customer purchase history, y = customer churn; focus on predicting customer churn.
x = financial information, y = flagged default/non-default; focus on predicting loan default.
x = purchase history, y = next purchase; focus on predicting the next purchase.

Example: Regression for prediction

The target is the number of car sharing memberships, and our features include the year and membership data. Suppose we fit our model based on data on car sharing memberships and obtain estimates of the parameters $\hat\beta$. Our primary aim may be prediction, in which case we are more focused on generating values for $y_p$ than on interpreting the parameters.

Example: Regression for prediction (cont'd)

We use the characteristics to predict car sharing memberships, focusing on how accurately we are able to predict. We use a line plot to visualise the predicted and observed values, without any focus on interpretability. The closer the predicted values are to the observed values, the more accurate the prediction; the further the predicted values are from the observed values, the less accurate the prediction.

Using models: interpretation

Interpretation: the primary objective is to train a model to find insights from the data. The interpretation approach uses the fitted parameters $\hat\beta$ to give us insight into a system. Common workflow:
Gather x, y.
Train the model by finding the $\hat\beta$ that gives the best prediction.
Focus on $\hat\beta$ (rather than $y_p$) to generate insights.

Example interpretation exercises:
x = customer demographics, y = sales data; examine $\hat\beta$ to understand loyalty by segment.
x = car safety features, y = traffic accidents; examine $\hat\beta$ to understand what makes cars safer.
x = marketing budget, y = movie revenue; examine $\hat\beta$ to understand marketing effectiveness.

Example: Regression for interpretation

The target is the price of housing, and our features include characteristics about the house and area. Suppose we fit our model based on data on housing sales and obtain estimates of the parameters $\hat\beta$. These parameters represent coefficients relating the features x to the expected target values. We can interpret our results to learn about feature importance.

Example: Regression for interpretation (cont'd)

Which features are most important? In this example: overall quality, living area, year built. Note: feature importance can be in a negative direction, e.g., crime rate, so we take the absolute value of each coefficient to figure out which features are important.

Modeling best practices

Establish the cost function we want to minimise: this gives us the method to compare the strength of one model against another.
Develop multiple models: different hyperparameters give different models; see which one gives the best prediction.
Compare the results and choose the best model according to our cost function.

Linear regression: The Syntax

Import the class containing the regression method:

from sklearn.linear_model import LinearRegression

Create an instance of the class:

LR = LinearRegression()

Fit the instance on the data and then predict the expected value:

LR = LR.fit(X_train, Y_train)
y_predict = LR.predict(X_test)

Lessons learned

We presented the different measures of error and described the linear least squares method within linear regression.
We showed how to calculate the regression coefficients using the linear least squares method with an example, and derived the ordinary least squares coefficients formula.
We presented the objectives of supervised learning, described the difference between interpretation and prediction, and covered best practices for modeling.
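As a closing sketch tying the pieces together (not from the slides; it assumes scikit-learn and NumPy are installed, and it reuses the budget/revenue data from the recap table), the syntax above can be run end to end to recover the lecture's coefficients:

import numpy as np
from sklearn.linear_model import LinearRegression

# Budget/revenue data from the recap table; X must be 2-D for scikit-learn.
X = np.array([[1.2], [0.8], [3.6], [2.2], [4.3], [0.5]])
y = np.array([3.2, 2.6, 1.5, 2.8, 8.9, 1.1])

LR = LinearRegression()
LR = LR.fit(X, y)

print(LR.intercept_, LR.coef_)  # about 0.917 and 1.159; the slides' 0.914 and 1.16 reflect rounding
print(LR.score(X, y))           # R^2 of the fit on this data
print(LR.predict(np.array([[2.0]])))  # predicted revenue for a budget of 2.0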
