
Document Details

Uploaded by MesmerizingGyrolite5380

Ajou University

So Yeon Kim

Tags

regression analysis, machine learning, statistics, data analysis

Summary

This presentation covers various aspects of regression analysis, including polynomial regression, linear and logistic regression, optimization techniques like gradient descent, and model evaluation metrics like MSE, RMSE, and MAE. It also discusses concepts like generalization and regularization. The presentation is likely part of a course on machine learning or statistics at Ajou University.

Full Transcript

Regression analysis
Medical Artificial Intelligence (의료 인공지능)
So Yeon Kim
*Slides adapted from Roger Grosse

Problem Setup
We want to predict a scalar t as a function of a scalar x.
We are given a dataset of pairs $\{(x^{(i)}, t^{(i)})\}_{i=1}^{N}$.
The $x^{(i)}$ are called inputs, and the $t^{(i)}$ are called targets.

Problem Setup
Model: y is a linear function of x: $y = wx + b$.
y is the prediction, w is the weight, b is the bias.
w and b are the parameters; settings of the parameters are called hypotheses.

Problem Setup
Loss function: squared error (says how bad the fit is): $\mathcal{L}(y, t) = \frac{1}{2}(y - t)^2$.
$y - t$ is the residual, and we want to make it small in magnitude.
The 1/2 factor is just to make the calculations convenient.

Problem Setup
Cost function: the loss function averaged over all training examples:
$\mathcal{J}(w, b) = \frac{1}{2N} \sum_{i=1}^{N} \bigl( w x^{(i)} + b - t^{(i)} \bigr)^2$

Problem Setup
Suppose we have multiple inputs $x_1, \ldots, x_D$. This is referred to as multivariable regression: $y = \sum_{j=1}^{D} w_j x_j + b$.

Optimization
We'd like to minimize the cost function. Solution: gradient descent!
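
To make the setup concrete, here is a minimal sketch (mine, not from the slides) of fitting $y = wx + b$ by gradient descent on the squared-error cost; the toy data, learning rate, and iteration count are illustrative choices.

```python
import numpy as np

# Toy data: t is roughly linear in x, plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=50)
t = 2.0 * x + 0.5 + rng.normal(scale=0.1, size=50)

w, b = 0.0, 0.0   # initial parameters
lr = 0.1          # learning rate (arbitrary choice)

for _ in range(500):
    y = w * x + b                 # predictions
    # Cost J = (1/2N) * sum (y - t)^2, so its gradients are:
    dw = np.mean((y - t) * x)     # dJ/dw
    db = np.mean(y - t)           # dJ/db
    w -= lr * dw
    b -= lr * db

print(w, b)  # should approach w ≈ 2.0, b ≈ 0.5
```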

Polynomial regression
Suppose we want to model the following data. Polynomial regression fits a polynomial in x: $y = w_0 + w_1 x + w_2 x^2 + \cdots + w_M x^M$.
[Figure slides: polynomial fits to the same data]
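
One way to implement this (my sketch; the toy data and the degree M are illustrative): build the design matrix of powers of x and solve the least-squares problem.

```python
import numpy as np

# Toy data: a noisy sine curve.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, size=20))
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=20)

M = 3  # degree of the polynomial (a hyperparameter)

# Design matrix with columns 1, x, x^2, ..., x^M.
X = np.vander(x, M + 1, increasing=True)

# Least-squares fit of the coefficients w_0, ..., w_M.
w, *_ = np.linalg.lstsq(X, t, rcond=None)

y = X @ w  # fitted values
```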

Linear classification?
Binary linear classification:
classification: predict a discrete-valued target
binary: predict a binary target
linear: the model is a linear function of x, thresholded at zero

Logistic Regression
We can't optimize classification accuracy directly with gradient descent because it's discontinuous. Instead, we typically define a continuous surrogate loss function which is easier to optimize. Logistic regression is a canonical example of this in the context of classification: the model outputs a continuous value, which you can think of as the probability of the example being positive.

Logistic Regression
The logistic function is a kind of sigmoidal, or S-shaped, function: $\sigma(z) = \frac{1}{1 + e^{-z}}$
A linear model with a logistic nonlinearity is known as log-linear: $y = \sigma(wx + b)$
$\sigma$ is called an activation function, and z is called the logit.

Logistic Regression
$P(Y = 1 \mid X) = \dfrac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}$
We need to determine the $\beta$ coefficients, as we did in linear regression. Multivariate logistic regression:
$P(X) = \dfrac{e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots}}{1 + e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots}}$
[Figure: sigmoid curve with the levels y = 0, y = 0.5, and y = 1 marked]
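
A minimal sketch of fitting $\beta_0$ and $\beta_1$ by gradient descent on the cross-entropy surrogate loss (my code, with illustrative toy data, not the slides' implementation):

```python
import numpy as np

def sigmoid(z):
    # Logistic function: maps the logit z to a probability in (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

# Toy binary data: class 1 tends to have larger x.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-1, 1, 50), rng.normal(1, 1, 50)])
t = np.concatenate([np.zeros(50), np.ones(50)])

b0, b1 = 0.0, 0.0   # beta_0 (intercept), beta_1 (slope)
lr = 0.1

for _ in range(1000):
    p = sigmoid(b0 + b1 * x)   # P(Y = 1 | X)
    # Gradient of the average cross-entropy loss: dL/dz = p - t.
    b0 -= lr * np.mean(p - t)
    b1 -= lr * np.mean((p - t) * x)

# Classify by thresholding the predicted probability at 0.5.
pred = (sigmoid(b0 + b1 * x) >= 0.5).astype(int)
print((pred == t).mean())  # training accuracy
```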

Generalization
Underfitting: the model is too simple and does not fit the data.
Overfitting: the model is too complex; it fits the training data perfectly but does not generalize.

Generalization
We would like our models to generalize to data they haven't seen before.
The degree of the polynomial is an example of a hyperparameter.
We can tune hyperparameters using a validation set.

L2 Regularization
Rather than restricting the size of the model, regularize it.
Observation: polynomials that overfit often have large coefficients.
We can encourage the weights to be small by choosing the L2 penalty as our regularizer: $\mathcal{R}(\mathbf{w}) = \frac{1}{2} \|\mathbf{w}\|^2$
The regularized cost function makes a tradeoff between fit to the data and the norm of the weights: $\mathcal{J}_{\text{reg}}(\mathbf{w}) = \mathcal{J}(\mathbf{w}) + \lambda \mathcal{R}(\mathbf{w})$
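
In code, L2-regularized (ridge) least squares even has a closed-form solution; this sketch (my own conventions, with an illustrative penalty strength) shows how the penalty shrinks the polynomial coefficients from the earlier example.

```python
import numpy as np

def ridge_fit(X, t, lam):
    # Minimize ||X w - t||^2 + lam * ||w||^2.
    # Closed form: w = (X^T X + lam * I)^{-1} X^T t.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ t)

# Degree-9 polynomial fit to the noisy sine data from before.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, size=20))
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=20)
X = np.vander(x, 10, increasing=True)

w_unreg = ridge_fit(X, t, 0.0)   # tends to have large coefficients (overfits)
w_reg = ridge_fit(X, t, 1e-3)    # smaller coefficients, smoother fit
```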

Summary
Choose a model and a loss function.
Formulate an optimization problem.
Solve the optimization problem using one of two strategies: direct solution (set derivatives to zero) or gradient descent.
Improve generalization by adding a regularizer.

Model evaluation
MSE (mean squared error) $= \frac{1}{n} \sum_i (y_i - \hat{y}_i)^2$
RMSE (root mean squared error) $= \sqrt{\mathrm{MSE}}$
MAE (mean absolute error) $= \frac{1}{n} \sum_i |y_i - \hat{y}_i|$
Use cross-validation to minimize the loss (will be covered in the next lecture).
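
These three metrics are direct translations of the formulas above (a sketch; y and y_hat are assumed to be NumPy arrays of targets and predictions):

```python
import numpy as np

def mse(y, y_hat):
    # Mean squared error: average squared residual.
    return np.mean((y - y_hat) ** 2)

def rmse(y, y_hat):
    # Root mean squared error: MSE on the original scale of y.
    return np.sqrt(mse(y, y_hat))

def mae(y, y_hat):
    # Mean absolute error: average absolute residual.
    return np.mean(np.abs(y - y_hat))
```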

Goodness-of-fit
$R^2$ (the coefficient of determination) assesses the goodness-of-fit of a regression model.
0%: a model that does not explain any of the variation in the response variable around its mean.
100%: a model that explains all of the variation in the response variable around its mean.
[Figure: two fits compared, $R^2 = 0.95$ vs. $R^2 = 0.5$]

Goodness-of-fit
The p-value is a measure of evidence against the null hypothesis that the regression coefficient is zero. A p-value below 0.05 indicates that the regression coefficient is statistically significant (evidence to reject the null hypothesis).

          Coefficient   p-value
$x_1$        -1.8       0.0001
$x_2$         4.6       0.7382
$x_3$         2.5       0.0011
$x_4$        -6.4       4.2e-6
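
$R^2$ follows directly from its definition as one minus the ratio of residual to total variation (a sketch; NumPy array inputs assumed):

```python
import numpy as np

def r_squared(y, y_hat):
    # R^2 = 1 - SS_res / SS_tot: the fraction of the variation in y
    # (around its mean) that the model explains.
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot
```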

Thank You!
Q&A