Linear Regression PDF
Uploaded by QuickerEucalyptus
Summary
This document explains linear regression, a statistical method for modeling the relationship between a dependent variable and one or more independent variables. It covers different aspects of linear regression, including how it works, the calculation of r-squared, and an alternative technique: gradient descent.
Full Transcript
Linear Regression

Fit a line to a data set of observations
Use this line to predict unobserved values

I don’t know why they call it “regression.” It’s really misleading. You can use it to predict points in the future, the past, whatever. In fact time usually has nothing to do with it.

Linear Regression: How does it work?

Usually using “least squares”
Minimizes the squared error between each point and the line
Remember the slope-intercept equation of a line? y = mx + b
The slope is the correlation between the two variables times the standard deviation in Y, all divided by the standard deviation in X.
▫ Neat how standard deviation has some real mathematical meaning, eh?
The intercept is the mean of Y minus the slope times the mean of X.

Linear Regression: How does it work?

Least squares minimizes the sum of squared errors.
This is the same as maximizing the likelihood of the observed data (assuming normally distributed errors) if you start thinking of the problem in terms of probabilities and probability distribution functions.
This is sometimes called “maximum likelihood estimation”

More than one way to do it

Gradient Descent is an alternate method to least squares.
Basically iterates to find the line that best follows the contours defined by the data.
Can make sense when dealing with 3D data
Easy to try in Python and just compare the results to least squares

Measuring error with r-squared

How do we measure how well our line fits our data?
R-squared (aka coefficient of determination) measures the fraction of the total variation in Y that is captured by the model

Computing r-squared

$$R^2 = 1.0 - \frac{\text{sum of squared errors}}{\text{sum of squared variation from mean}}$$

Interpreting r-squared

Ranges from 0 to 1
0 is bad (none of the variance is captured), 1 is good (all of the variance is captured).

Let’s play with an example.
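
One way to try the ideas from these slides is the minimal sketch below, using only the Python standard library. It computes the slope as correlation times the standard deviation of Y over the standard deviation of X, takes the intercept as the mean of Y minus the slope times the mean of X, scores the fit with r-squared, and then compares the result to a simple gradient descent loop. The function names (fit_least_squares, r_squared, fit_gradient_descent), the synthetic data, the learning rate, and the step count are all assumptions made for illustration and do not come from the original slides.

```python
import random
import statistics


def fit_least_squares(xs, ys):
    """Closed-form fit: slope = r * std(Y) / std(X), intercept = mean(Y) - slope * mean(X)."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    std_x, std_y = statistics.pstdev(xs), statistics.pstdev(ys)
    # Pearson correlation: mean product of deviations, scaled by both standard deviations.
    covariance = statistics.fmean(
        [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]
    )
    correlation = covariance / (std_x * std_y)
    slope = correlation * std_y / std_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """1.0 minus (sum of squared errors) / (sum of squared variation from the mean)."""
    mean_y = statistics.fmean(ys)
    sum_squared_errors = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    sum_squared_variation = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - sum_squared_errors / sum_squared_variation


def fit_gradient_descent(xs, ys, learning_rate=0.01, steps=5000):
    """Iteratively step slope and intercept downhill on the mean squared error.

    learning_rate and steps are illustrative values chosen for this data range.
    """
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        grad_slope = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_intercept = 2.0 * sum(errors) / n
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


# Synthetic data (an assumption for this sketch): a noisy line y = 2.5x + 4.
rng = random.Random(0)
xs = [rng.uniform(0.0, 10.0) for _ in range(100)]
ys = [2.5 * x + 4.0 + rng.gauss(0.0, 1.0) for x in xs]

m_ls, b_ls = fit_least_squares(xs, ys)
m_gd, b_gd = fit_gradient_descent(xs, ys)
r2 = r_squared(xs, ys, m_ls, b_ls)

print("least squares:    slope", m_ls, "intercept", b_ls)
print("gradient descent: slope", m_gd, "intercept", b_gd)
print("r-squared:", r2)
```

On well-behaved data like this, the two methods should land on nearly the same line. Gradient descent is sensitive to the learning rate and the scale of X, so those values may need tuning on other data.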