Linear Models – Ridge Regression (RR)
Summary
This document provides an overview of ridge regression, a regularization technique used in linear regression. It explains how ridge regression can be used to reduce errors caused by overfitting on training data and how it specifically corrects for multicollinearity in regression analysis. The document also examines the causes of multicollinearity and methods for detecting it.
Full Transcript
Linear Models – Ridge Regression (RR)

What is Ridge Regression?
Ridge regression, also known as L2 regularization, is one of several types of regularization for linear regression models. Regularization is a statistical method for reducing errors caused by overfitting on training data, and ridge regression specifically corrects for multicollinearity in regression analysis. Ridge regression may also be applied to logistic regression.

Multiple-Variable Linear Regression
A standard, multiple-variable linear regression equation is:

Y = b0 + b1*x1 + b2*x2 + b3*x3 + … + bn*xn

where Y is the predicted value (dependent variable), each xi is a predictor (independent variable), each bi is the regression coefficient attached to that predictor, and b0 is the value of the dependent variable when all predictors equal zero (also called the y-intercept).

What is Multicollinearity?
Multicollinearity can be defined as the presence of high correlations between two or more independent variables (predictors); it is a phenomenon in which the independent variables are correlated with one another. The correlation between two variables can be positive (changes in one variable correspond to changes in the other in the same direction), negative (changes in one variable correspond to changes in the opposite direction), or absent. When more than one predictor in a regression analysis is highly correlated with the others, the predictors are termed multicollinear. For example, a higher education level is generally associated with a higher income, so one variable can easily be predicted from the other, and keeping both of these variables in an analysis may cause problems.

Understanding Multicollinearity
A multicollinear regression model is problematic because it becomes impossible to separate the effects of the individual independent variables on the dependent variable. Consider the linear equation above: the coefficient b1 represents the increase in Y for a unit increase in x1 while x2 remains constant. However, if x1 and x2 are highly correlated, changes in x1 also affect x2, and we cannot distinguish their effects on Y independently. Multicollinearity may not cost us predictive accuracy, but it does cost us reliability when estimating the effects of individual features, which is a problem when interpreting the model.

Causes of Multicollinearity
The following problems may contribute to multicollinearity:
1. Data multicollinearity may result from problems with the dataset at the time of its creation, such as poorly designed experiments, purely observational data, or an inability to manipulate the data.
2. Multicollinearity may also occur when new variables are created that depend on other variables.
3. Insufficient data can, in some cases, also cause multicollinearity problems.

Detecting Multicollinearity
Multicollinearity can be detected by various methods; the most common is the VIF (Variance Inflation Factor). The VIF measures the strength of the correlation between the independent variables. It is computed by taking one variable and regressing it against every other predictor; the R² of that regression tells us how well the independent variable is explained by the other independent variables. A high R² means that the variable is highly correlated with the other variables. This is captured by the VIF:

VIF = 1 / (1 − R²)

So the closer the R² value is to 1, the higher the VIF and the higher the multicollinearity associated with that particular independent variable.
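To make the VIF calculation concrete, here is a minimal sketch in Python using statsmodels; the synthetic data frame, its column names, and the collinearity pattern are assumptions made purely for illustration.

import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

# Hypothetical predictors: x2 is built to be almost a multiple of x1,
# so those two columns are highly collinear.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 2.0 * x1 + rng.normal(scale=0.1, size=200)
x3 = rng.normal(size=200)
X = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})

# Each VIF comes from regressing one predictor on all the others;
# add_constant gives those auxiliary regressions an intercept.
X_const = add_constant(X)
vif = pd.Series(
    [variance_inflation_factor(X_const.values, i) for i in range(1, X_const.shape[1])],
    index=X.columns,
)
print(vif)  # x1 and x2 should show very large VIFs; x3 should stay close to 1

A common rule of thumb treats VIF values above roughly 5 to 10 as a sign of problematic multicollinearity.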
Dealing with Multicollinearity
Method #1: Drop one of the correlated features. This helps bring down the multicollinearity between correlated features. Dropping variables should be an iterative process, starting with the variable that has the largest VIF value, because its trend is largely captured by the other variables. If you do this, you will notice that the VIF values of the other variables are reduced as well, although to varying extents.
Method #2: Increase the sample size.
Method #3: Deploy a different model.
Method #4: Apply a ridge regression model to address the multicollinearity.

What is Ridge Regression?
Ridge regression is useful for problems where you have fewer than about one hundred thousand samples, or where you have more parameters than samples. It is used to solve overfitting and multicollinearity problems. Ridge regression is a type of linear regression in which a small amount of bias is introduced so that we can get better long-term predictions.

Ridge regression is a regularization technique used to reduce the complexity of the model; it is also called L2 regularization. In this technique, the cost function is altered by adding a penalty term to it. The amount of bias added to the model is called the ridge regression penalty, and it is calculated by multiplying lambda (λ) by the squared weight of each individual feature. The cost function for ridge regression is:

Cost = Σ (i = 1 to N) (yi − yi')² + λ Σ (j = 1 to M) bj²

where yi is the observed value, yi' is the predicted value, and the bj are the model coefficients. The penalty term regularizes the coefficients of the model, so ridge regression shrinks the magnitudes of the coefficients, which decreases the complexity of the model. As the equation shows, when λ tends to zero the expression becomes the cost function of ordinary linear regression, so for very small values of λ the model resembles the plain linear regression model. A general linear or polynomial regression will fail when there is high collinearity between the independent variables, and ridge regression can be used to solve such problems.

Example of Ridge Regression (a sketch is given below)

Questions
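As a concrete illustration of the model described above, here is a minimal sketch of ridge regression with scikit-learn's Ridge estimator. The synthetic collinear dataset and the alpha value (alpha plays the role of λ in the cost function above) are assumptions chosen purely for illustration.

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic data with two highly collinear predictors (x2 is roughly 2 * x1).
rng = np.random.default_rng(42)
n = 200
x1 = rng.normal(size=n)
x2 = 2.0 * x1 + rng.normal(scale=0.05, size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 3.0 * x1 + 1.5 * x3 + rng.normal(scale=0.5, size=n)

# Ordinary least squares: the coefficients on x1 and x2 can be large and unstable
# because the two columns carry almost the same information.
ols = LinearRegression().fit(X, y)
print("OLS coefficients:  ", ols.coef_)

# Ridge regression: the L2 penalty shrinks the coefficients toward zero,
# stabilizing the estimates for the collinear pair.
ridge = Ridge(alpha=1.0).fit(X, y)
print("Ridge coefficients:", ridge.coef_)

In practice, λ (the alpha parameter here) is usually chosen by cross-validation, for example with scikit-learn's RidgeCV.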