Topic-3,4,5. Supervised Learning - Regression - clear.pptx
COE101 Introductory Artificial Intelligence, College of Engineering
Supervised Learning: Regression

Supervised Learning - Classical Machine Learning

What is Supervised Learning?
Definition: the machine has a "supervisor" or "teacher" who gives the machine all the answers; the teacher has already labeled the data into classes. The machine learns faster with a teacher, and this setting is more commonly used in real life. Example: a student is given 50 problems and their solutions. You choose the function that you think connects the input x to the output y in y = f(x), but it is incomplete, and you want the AI to complete it for you.

Types of Supervised Learning
- Regression: prediction of a specific point on a numeric axis.
- Classification: prediction of an object's category.

Regression vs. Classification

Supervised Learning - Regression
What is Regression?
Definition: find a function (e.g., a line) to model the data (go through it). In regression we predict a number instead of a category.
Examples (inputs → predicted quantity):
- Voltage → temperature
- Processes, memory → power consumption
- Protein structure → energy
- Robot arm controls → torque at effector
- Location, industry, past losses → premium

Types of Regression
- Linear regression: when we believe the function is a straight line. The basic idea in linear regression is to add up the effects of each of the feature variables to produce the predicted value.
- Nonlinear regression: when we believe the function is not a line; for example, the function is curved like a polynomial.

Linear Regression
What is Linear Regression - Simplified
Think of linear regression as a shopping bill. Suppose you go to the grocery store and buy 2.5 kg of potatoes, 1.0 kg of carrots, and two bottles of milk:
Shopping Bill = Amount of Potatoes (kg) × 2€ + Amount of Carrots (kg) × 4€ + Milk Bottles × 3€
Total = 2.5 × 2€ + 1.0 × 4€ + 2 × 3€ = 15€
In linear regression, the amounts of potatoes, carrots, and milk are the inputs in the data.
The output is the cost of your shopping, which clearly depends on both the price of each product and how much of it you buy.

Linear Regression - Simple vs. Multiple Regression
- Simple linear regression uses one independent variable to explain or predict the outcome of the dependent variable Y: Y = A + BX
- Multiple linear regression uses two or more independent variables to predict the outcome: Y = A + B X1 + C X2

Linear Regression Equation
Y = A + BX (compare y = b + mx)
- x = independent variable (known value)
- y = dependent variable (predicted value)
- A = y-axis intercept
- B = slope of the line

Linear Regression Example #1
Revenue = 20,000 + 3 × (Ad Spending), i.e. Y = 20,000 + 3X
- The coefficient A represents the total expected revenue when ad spending is zero.
- The coefficient B represents the average change in total revenue when ad spending is increased by one unit (e.g., one dollar).
- If B were negative, it would mean that more ad spending is associated with less revenue.
- If B were close to zero, it would mean that ad spending has little effect on revenue.
Depending on the value of B, a company may decide to either decrease or increase its ad spending.

Linear Regression Example #2
Blood Pressure = 120 + 1000 × (Dosage (mL))
- The coefficient A represents the expected blood pressure when the dosage is zero.
- The coefficient B represents the average change in blood pressure when the dosage is increased by one unit.
- If B is negative, an increase in dosage is associated with a decrease in blood pressure.
- If B is close to zero, an increase in dosage is associated with no change in blood pressure.
- If B is positive, an increase in dosage is associated with an increase in blood pressure.
Depending on the value of B, researchers may decide to change the dosage given to a patient.
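The simple and multiple equation forms above can be sketched in a few lines (a minimal illustration; the helper names are ours, not from the slides):

```python
def predict_simple(a, b, x):
    """Simple linear regression: y = A + B*x (A = intercept, B = slope)."""
    return a + b * x

def predict_multiple(a, coefficients, inputs):
    """Multiple linear regression: y = A + B*x1 + C*x2 + ..."""
    return a + sum(c * x for c, x in zip(coefficients, inputs))

# Shopping-bill view: the prices are the coefficients, the quantities are the inputs.
bill = predict_multiple(0.0, [2.0, 4.0, 3.0], [2.5, 1.0, 2])  # 15.0 euros

# Example #1: Revenue = 20,000 + 3 * (ad spending), here for 1,000 of ad spend.
revenue = predict_simple(20_000, 3, 1_000)  # 23,000
```

Note that the shopping bill is just a multiple regression whose intercept is zero: each coefficient is a price, and each input is a quantity.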
Linear Regression Example #3
Crop Yield = A + B × (amount of fertilizer, X1) + C × (amount of water, X2)
- The coefficient A represents the expected crop yield with no fertilizer or water.
- The coefficient B represents the average change in crop yield when fertilizer is increased by one unit, assuming the amount of water remains unchanged.
- The coefficient C represents the average change in crop yield when water is increased by one unit, assuming the amount of fertilizer remains unchanged.
Depending on the values of B and C, the scientists may change the amounts of fertilizer and water used to maximize the crop yield.

Linear Regression Example #4
Points Scored = A + B × (yoga sessions, X1) + C × (weightlifting sessions, X2)
- The coefficient A represents the expected points scored for a player who participates in zero yoga sessions and zero weightlifting sessions.
- The coefficient B represents the average change in points scored when weekly yoga sessions are increased by one, assuming the number of weekly weightlifting sessions remains unchanged.
- The coefficient C represents the average change in points scored when weekly weightlifting sessions are increased by one, assuming the number of weekly yoga sessions remains unchanged.
Depending on the values of B and C, the data scientists may recommend that a player participate in more or fewer weekly yoga and weightlifting sessions in order to maximize their points scored.

Evaluation - Regression Loss Functions
In machine learning, our main goal is to minimize the error, which is defined by the loss function. The loss functions are:
- Sum of Errors (SE)
- Sum of Absolute Errors (SAE)
- Sum of Squared Errors (SSE)
- Mean Squared Errors (MSE)
- Root Mean Squared Errors (RMSE)

Linear Regression - Sum of Errors (SE)
The error is the difference between the predicted value and the actual value.
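The five loss functions are easy to state in code. A sketch using the worked readings example (readings 40, 50 and 65 against true values 45, 50 and 60):

```python
def se(predicted, actual):
    """Sum of Errors: signed errors can cancel each other out."""
    return sum(p - a for p, a in zip(predicted, actual))

def sae(predicted, actual):
    """Sum of Absolute Errors."""
    return sum(abs(p - a) for p, a in zip(predicted, actual))

def sse(predicted, actual):
    """Sum of Squared Errors."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual))

def mse(predicted, actual):
    """Mean Squared Error: SSE averaged over the number of points."""
    return sse(predicted, actual) / len(actual)

def rmse(predicted, actual):
    """Root Mean Squared Error."""
    return mse(predicted, actual) ** 0.5

readings, truth = [40, 50, 65], [45, 50, 60]
# SE = 0 (misleading!), SAE = 10, SSE = 50, MSE = 50/3, RMSE = sqrt(50/3)
```

Running the functions on this example shows why SE is misleading: the -5 and +5 errors cancel to zero even though the predictions are clearly imperfect.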
Worked example: suppose a data point from the table is (X = 5, Y_actual = 7); the actual value is the "golden truth" that comes from the data. The line from the AI predicts Y_predicted = 2(5) + 10 = 20. The error is 20 − 7 = 13.

Example: the readings are 40, 50 and 65, where the true values are 45, 50 and 60 respectively.
SE = (40 − 45) + (50 − 50) + (65 − 60) = −5 + 0 + 5 = 0 (loss)
This misleads me into believing my AI is great when it is not.

Linear Regression - Sum of Absolute Errors (SAE)
Take the absolute values of the errors for all iterations:
SAE = |40 − 45| + |50 − 50| + |65 − 60| = 5 + 0 + 5 = 10

Linear Regression - Sum of Squared Errors (SSE)
Take the squares instead of the absolute values. The loss function now becomes:
SSE = (40 − 45)^2 + (50 − 50)^2 + (65 − 60)^2 = 25 + 0 + 25 = 50

Linear Regression - Mean Squared Error (MSE)
We take the average (mean) of the SSE, so the more data we have, the smaller the aggregated error MSE becomes:
MSE = (1/3)[(40 − 45)^2 + (50 − 50)^2 + (65 − 60)^2] = (25 + 0 + 25)/3 = 50/3 ≈ 16.67

Linear Regression - Root Mean Squared Error (RMSE)
We take the square root of the MSE, which gives the Root Mean Squared Error:
RMSE = sqrt((1/3)[(40 − 45)^2 + (50 − 50)^2 + (65 − 60)^2]) = sqrt(50/3) = sqrt(16.67) ≈ 4.08
Always use this one!

Linear Regression – 1. Representation
Example data: age (X) versus number of cats owned (Y). The model to fit is y = A + Bx.

# | X (Age) | Y (Cats)
1 | 25 | 2
2 | 30 | 2
3 | 19 | 1
4 | 5 | 1
5 | 80 | 5
6 | 70 | 6
7 | 65 | 4
8 | 28 | 2
9 | 42 | 3
10 | 39 | 3
11 | 12 | 2
12 | 55 | 4
13 | 13 | 1
14 | 45 | 2
15 | 22 | 1

Linear Regression – 2. Optimization
For each row we tabulate: (1) age, the independent variable x; (2) cats, the dependent variable y; (3) the product XY; (4) the square X^2; (5) the square Y^2; and (6) the total of the values in each column (Sum):

# | X (Age) | Y (Cats) | XY | X^2 | Y^2
1 | 25 | 2 | 50 | 625 | 4
2 | 30 | 2 | 60 | 900 | 4
3 | 19 | 1 | 19 | 361 | 1
4 | 5 | 1 | 5 | 25 | 1
5 | 80 | 5 | 400 | 6400 | 25
6 | 70 | 6 | 420 | 4900 | 36
7 | 65 | 4 | 260 | 4225 | 16
8 | 28 | 2 | 56 | 784 | 4
9 | 42 | 3 | 126 | 1764 | 9
10 | 39 | 3 | 117 | 1521 | 9
11 | 12 | 2 | 24 | 144 | 4
12 | 55 | 4 | 220 | 3025 | 16
13 | 13 | 1 | 13 | 169 | 1
14 | 45 | 2 | 90 | 2025 | 4
15 | 22 | 1 | 22 | 484 | 1
Sum | 550 | 39 | 1882 | 27352 | 135

Solving for the coefficients gives:
A = 0.29344962 (intercept)
B = 0.0629059 (slope)
y = A + Bx → y = 0.293 + 0.0629x

Linear Regression – 3. Evaluation
Using y = 0.293 + 0.0629x on the data points (first nine rows shown):

# | X (Age) | Y (Cats, actual) | y (predicted) | squared error
1 | 25 | 2 | 1.8655 | 0.01809
2 | 30 | 2 | 2.18 | 0.0324
3 | 19 | 1 | 1.4881 | 0.23824
4 | 5 | 1 | 0.6075 | 0.154056
5 | 80 | 5 | 5.325 | 0.105625
6 | 70 | 6 | 4.696 | 1.700416
7 | 65 | 4 | 4.3815 | 0.145542
8 | 28 | 2 | 2.0542 | 0.002938
9 | 42 | 3 | 2.9348 | 0.004251

MSE = 0.344435
RMSE = sqrt(0.344435) ≈ 0.59

Goodness of Fit: Overfitting, Underfitting, and Generalization

Overfitting
Which one is the best?
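Before comparing candidate fits, the three-step pipeline above (tabulate the sums, solve for A and B, evaluate with MSE/RMSE) can be reproduced in a short script. The closed-form formulas below are the standard least-squares normal equations, which the slides apply implicitly; `fit_line` is our own helper name:

```python
def fit_line(xs, ys):
    """Least-squares fit of y = A + B*x from the tabulated column sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)  # B
    intercept = (sum_y - slope * sum_x) / n                           # A
    return intercept, slope

ages = [25, 30, 19, 5, 80, 70, 65, 28, 42, 39, 12, 55, 13, 45, 22]
cats = [2, 2, 1, 1, 5, 6, 4, 2, 3, 3, 2, 4, 1, 2, 1]

a, b = fit_line(ages, cats)        # a ≈ 0.29345, b ≈ 0.06291
predictions = [a + b * x for x in ages]
mse = sum((p - y) ** 2 for p, y in zip(predictions, cats)) / len(cats)
rmse = mse ** 0.5                  # mse ≈ 0.3444, rmse ≈ 0.587
```

The recovered coefficients match the slide values A = 0.29344962 and B = 0.0629059, and the evaluation step reproduces MSE ≈ 0.344435.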
[Three "Age Versus Cat Ownership" scatter plots (Age (yrs) versus Number of Cats), each showing a different candidate model fitted to the same data.]

Model Prediction Error (MPE): the error between predicted and actual output on training samples.
Performance Variance (PV): the difference in model performance on training and testing samples.
Is it the one with the best fit to the data?

Good Fit / Underfitting / Overfitting
[The same three Age-versus-cat-ownership plots, labeled Good Fit, Underfitting, and Overfitting.]

Goodness of Fit
How well is it going to predict future data?
[The three Age-versus-cat-ownership plots repeated.]

Generalization
What is Generalization?
For any real-world problem with input and output data, we can map these inputs to the output using a function. The goal of a supervised machine learning model is to produce a model that understands the function between input and output for the training data, but also generalizes this function so that it works with new, unseen data with good accuracy.

Generalization Example

What is Model Prediction Error?
High model prediction error means the model has created a function that fails to capture the relationship between the input and output data. Low model prediction error means the model has created a function that has understood the relationship between the input and output data.

What is Performance Variance?
The variance of a machine learning model is the amount by which its performance varies in the future. Low performance variance means the performance of the model does not vary much across different data sets; high performance variance means the performance varies considerably across different data sets.

Good Fit Model
A well-trained model should have low variance and low error; this is also known as a good fit. A good-fit function is actually very close to the true function that generalizes the data distribution. This means a good-fit model should be generalized enough to work with any unseen data (low performance variance) and at the same time should produce low prediction error (low model prediction error). A good-fit model is what we try to achieve during the training phase.

Student-exam analogy:

Goodness of Fit | Model Prediction Error ("How badly did I score myself at home?") | Performance Variance ("How different is my own scoring at home from my teacher's in the exam?") | Notes
Good Fit (our target) | I solve problems and get the right answers; sometimes I make some errors. Low model prediction error. | I go to the exam. At home I believed I could score 80 from my own assessment, and I get 78 in the test. Low performance variance. | 
Overfitting | I memorized 10 problems and their solutions; I make 0 mistakes on any of these problems. Low model prediction error. | I go to the exam thinking I will get 100%. I got 60 because I saw problems I did not memorize. High performance variance. | Dangerous because it is undetectable (misleading; I can only discover this after the product is out).
Underfitting | I seem to have studied and developed some picture, but I still make a lot of mistakes. High model prediction error. | I go to the exam and give the same kind of performance I give at home (same mistakes). | Not a big deal, since it is detectable (I can discover this and solve it at home).

Overfitting & Underfitting
How to detect them? How to handle them?
Overfitting introduces the problem of high variance, and underfitting results in high model prediction error; both result in a bad model. We can identify these models during the training and testing phases themselves:
- If a model shows high accuracy during the training phase but fails to show similar accuracy during the testing phase, this indicates overfitting.
- If a model fails to show satisfactory accuracy during the training phase itself, the model is underfitting.

How to handle overfitting: increase the training data; reduce model complexity; remove noise from the data.
How to handle underfitting: increase the training data; increase the complexity of the model.

Overfitting vs. Underfitting
Example: a class consisting of 3 students and a professor.
[A series of "Overfitting vs. Underfitting Example" slides follows, ending with detection through visual inspection.]

Cross Validation for Regression
Techniques: the Test Set Method, Leave One Out Cross Validation, and K-Fold Cross Validation.

Cross Validation Methods - Test Set Method
1. Randomly choose 30% of the data to be in a test set.
2. The remainder is a training set.
3. Perform your regression on the training set.
4. Estimate your future performance with the test set.

Test Set Method – the Wrong Way – Why?
Here the test set is not chosen randomly: the last rows of the age/cats table (rows 12–15) are taken as the test set, and rows 1–11 as the training set. Performing the regression on the training rows gives:
A = 0.504657584, B = 0.061322329, so y = 0.505 + 0.0613x
Evaluating on the held-out rows:

# | X (Age) | Y (Cats) | y (predicted) | squared error
12 | 55 | 4 | 3.877386 | 0.015034
13 | 13 | 1 | 1.301848 | 0.091112
14 | 45 | 2 | 3.264162 | 1.598104

Test Set Method – the Right Way
1. Randomly choose 30% of the data to be in a test set.
2. The remainder is a training set.
3. Perform your regression on the training set.
4. Estimate your future performance with the test set.
After random shuffling, the training set is rows 2, 9, 4, 7, 6, 15, 5, 8, 13, 11, 3 and the test set is rows 1, 12, 10, 14.

Test Set Method Pros & Cons
Pros: very simple; you can then simply choose the method with the best test-set score.
Cons: wastes data (we get an estimate of the best method to apply to 30% less data); if we don't have much data, our test set might just be lucky or unlucky (the "test-set estimator of performance has high variance").

Cross Validation Methods - K-Fold Cross Validation
5-Fold Cross Validation:
1. Split the data into 5 groups.
2. For each unique group:
   a. Take the group as a hold-out (test) data set.
   b. Take the remaining groups as a training data set.
   c. Perform your regression on the training set and evaluate it on the test set.

On the age/cats data, each fold holds out three consecutive rows:

Fold #1 (test rows 1–3): A = 0.39759, B = 0.061405 → y = 0.39759 + 0.061405x
# | X (Age) | Y (Cats) | y (predicted) | squared error
1 | 25 | 2 | 1.932722 | 0.004526
2 | 30 | 2 | 2.239749 | 0.057479
3 | 19 | 1 | 1.564291 | 0.318424

Fold #2 (test rows 4–6): A = 0.41913, B = 0.055621 → y = 0.41913 + 0.055621x
# | X (Age) | Y (Cats) | y (predicted) | squared error
4 | 5 | 1 | 0.697237 | 0.091666
5 | 80 | 5 | 4.868839 | 0.017203
6 | 70 | 6 | 4.312626 | 2.847232

Fold #3 (test rows 7–9): A = 0.264577, B = 0.064639 → y = 0.264577 + 0.064639x
# | X (Age) | Y (Cats) | y (predicted) | squared error
7 | 65 | 4 | 4.466095 | 0.217244
8 | 28 | 2 | 2.074462 | 0.005545
9 | 42 | 3 | 2.979404 | 0.000424

Fold #4 (test rows 10–12): A = 0.060635, B = 0.065929 → y = 0.060635 + 0.065929x
# | X (Age) | Y (Cats) | y (predicted) | squared error
10 | 39 | 3 | 2.631858 | 0.135529
11 | 12 | 2 | 0.851781 | 1.318408
12 | 55 | 4 | 3.686718 | 0.098146

Fold #5 (test rows 13–15): A = 0.50274, B = 0.061632 → y = 0.50274 + 0.061632x
# | X (Age) | Y (Cats) | y (predicted) | squared error
13 | 13 | 1 | 1.303958 | 0.092391
14 | 45 | 2 | 3.276188 | 1.628655

Overall test MSE per fold: Fold 1 = 0.127, Fold 2 = 0.985, Fold 3 = 0.074, Fold 4 = 0.517, Fold 5 = 0.819.

Leave One Out Cross Validation: n-Fold CV
Each of the n data points is held out once: Model 1 (A1, B1) → performance RMSE 1; Model 2 (A2, B2) → performance RMSE 2; ...; Model 10 (A10, B10) → performance RMSE 10. Report the RMSE average and standard deviation.

LOOCV Activity
A bookstore needs to know how many books to buy for the new students based on the previously available data:

Students | Books
5 | 6
3 | 4
7 | 10
6 | 8
4 | 4

Perform LOOCV on the dataset shown here.
Find out A and B at each iteration:

Iteration 1 (hold out students = 5, books = 6): A = −1.5, B = 1.6 → y = −1.5 + 1.6x; predicted = 6.5, |error| = 0.5
Iteration 2 (hold out students = 3, books = 4): A = −4, B = 2 → y = −4 + 2x; predicted = 2, |error| = 2
Iteration 3 (hold out students = 7, books = 10): A = −0.8, B = 1.4 → y = −0.8 + 1.4x; predicted = 9, |error| = 1
Iteration 4 (hold out students = 6, books = 8): A = −1.6, B = 1.6 → y = −1.6 + 1.6x; predicted = 8, |error| = 0
Iteration 5 (hold out students = 4, books = 4): A = −0.8, B = 1.48 → y = −0.8 + 1.48x; predicted = 5.14, |error| = 1.14

RMSE mean = 0.928, RMSE std = 0.749.

Cross Validation Methods - LOOCV
For k = 1 to N:
1. Let (x_k, y_k) be the kth record.
2. Temporarily remove (x_k, y_k) from the dataset.
3. Train on the remaining N−1 data points.
4. Note your error.
When you've done all points, report the mean error.
Example results for three candidate models: LOOCV MSE = 2.12, LOOCV MSE = 0.962, LOOCV MSE = 3.33.

Cross Validation Methods - Test Set vs. LOOCV vs. K-Fold Cross Validation
Method | Disadvantages | Advantages | When to Use?
Test Set | Variance: unreliable estimate of future performance | Cheap | Lots of data available for testing
Leave One Out | Expensive | Doesn't waste data | Very limited data available
K-Fold | Less expensive than LOOCV | Doesn't waste data | Somewhere in between

Nonlinear Regression
Popular nonlinear regression models:
- Exponential model: y = a e^(bx)
- Power model: y = a x^b
- Saturation growth model: y = ax / (b + x)
- Polynomial model: y = a0 + a1 x + … + am x^m

Regression Models - Advantages & Disadvantages

Regression Model | Advantages | Disadvantages
Linear Regression | Works well irrespective of the dataset size; gives information about the relevance of features; simple to implement, and the output coefficients are easy to interpret | Relies on the assumptions of linear regression; susceptible to overfitting
Polynomial Regression | Works on any size of dataset; works very well on nonlinear problems | We need to choose the right polynomial degree for a good model prediction error / variance tradeoff

Breakout Session - Linear Regression Class Activity
Suppose that an extensive study is carried out, and it is found that in a particular country, the life expectancy (the average number of years that people live) among non-smoking women who don't eat any vegetables is 80 years. Suppose further that on average, men live 5 years less. Also take the numbers mentioned above: every cigarette per day reduces life expectancy by half a year, and a handful of veggies per day increases it by one year. Calculate the life expectancies for the following example cases.
For example, the first case is a male (subtract 5 years), smokes 8 cigarettes per day (subtract 8 × 0.5 = 4 years), and eats two handfuls of veggies per day (add 2 × 1 = 2 years), so the predicted life expectancy is 80 - 5 - 4 + 2 = 73 years.
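The activity's rules can be written as one small multiple-regression helper (a sketch; the function name and the string flag for gender are our own, not from the slides):

```python
def life_expectancy(gender, cigarettes_per_day, veggie_handfuls):
    """Life expectancy model from the class activity."""
    base = 80 if gender == "female" else 75   # men live 5 years less
    return base - 0.5 * cigarettes_per_day + 1 * veggie_handfuls

# First case: male, 8 cigarettes/day, 2 handfuls of veggies/day.
print(life_expectancy("male", 8, 2))   # 73.0
```

The gender flag simply selects between the two intercepts (80 for women, 75 for men); smoking and vegetables contribute their coefficients times the inputs, exactly as in the worked example.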
Male Life Expectancy = 75 − 0.5 × NoCig + 1 × HandfulVeggies
Female Life Expectancy = 80 − 0.5 × NoCig + 1 × HandfulVeggies

Your task: predict the correct value as an integer (whole number) for the missing sections A, B, and C.

Gender | Smoking (cigarettes per day) | Vegetables (handfuls per day) | Life expectancy (years)
male | 8 | 2 | 73
male | 0 | 6 | A (75 − 0 + 1×6 = 81)
female | 16 | 1 | B (80 − 0.5×16 + 1×1 = 73)
female | 0 | 4 | C (80 − 0 + 4×1 = 84)

Breakout Session - Linear Regression Activity
A bookstore needs to know how many books to buy for the new students based on the previously available data:
Students (x) | Books (y)
5 | 6
3 | 4
7 | 10
6 | 8
4 | 4
5 | 7

Tabulating x², xy, and the column sums (n = 6):

x | y | x² | xy
5 | 6 | 25 | 5×6 = 30
3 | 4 | 9 | 3×4 = 12
7 | 10 | 49 | 7×10 = 70
6 | 8 | 36 | 6×8 = 48
4 | 4 | 16 | 4×4 = 16
5 | 7 | 25 | 5×7 = 35
Sum | Σx = 30 | Σy = 39 | Σx² = 160 | Σxy = 211

The mean of x is 5 and the mean of y is 6.5. Solving gives the intercept A = −1.5 and the slope B = 1.6, so the fitted model is y = −1.5 + 1.6x.
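The activity's answer can be checked with the same closed-form least-squares recipe used earlier in the lecture (a sketch; `fit_line` is our own helper name):

```python
def fit_line(xs, ys):
    """Least-squares y = A + B*x from the column sums (n, Σx, Σy, Σxy, Σx²)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)  # slope
    a = (sum_y - b * sum_x) / n                                   # intercept
    return a, b

students = [5, 3, 7, 6, 4, 5]
books = [6, 4, 10, 8, 4, 7]
a, b = fit_line(students, books)   # a ≈ -1.5, b ≈ 1.6
```

With n = 6, Σx = 30, Σy = 39, Σxy = 211 and Σx² = 160, the formulas give B = (6·211 − 30·39)/(6·160 − 30²) = 96/60 = 1.6 and A = (39 − 1.6·30)/6 = −1.5, matching the tables above.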