Regularization
Summary
This document explains regularization techniques in machine learning, focusing on L1 and L2 regularization. It describes how these methods prevent overfitting by adding penalty terms to the objective function, and how different regularization approaches impact model performance and feature selection.
Regularization is a technique used in machine learning to prevent overfitting and improve the generalization performance of a model. It involves adding a penalty term to the objective function that the model aims to minimize during training. The purpose of regularization is to discourage the model from fitting the training data too closely, which can lead to poor performance on new, unseen data.

L2 Regularization

A linear regression that uses the L2 regularization technique is called ridge regression. In ridge regression, a regularization term is added to the cost function of the linear regression, which keeps the magnitude of the model's weights (coefficients) as small as possible. The L2 regularization technique tries to keep the model's weights close to zero, but not exactly zero, so that each feature has a low impact on the output while the model's accuracy remains as high as possible:

$$J(\mathbf{w}) = \mathrm{MSE}(\mathbf{w}) + \lambda \sum_{j=1}^{n} w_j^2$$

where λ controls the strength of regularization and the w_j are the model's weights (coefficients). Increasing λ makes the model flatter and pushes it toward underfitting; decreasing λ lets the model overfit more, and with λ = 0 the regularization term is eliminated entirely. L2 regularization penalizes large coefficients and discourages the model from relying too much on any single feature. It generally does not lead to sparsity in the model coefficients.
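To make this concrete, here is a minimal sketch of ridge regression with scikit-learn on an invented synthetic dataset (the feature count, coefficients, and λ values are illustrative assumptions, not from the original text). Note that scikit-learn names the regularization strength `alpha`, which plays the role of λ above up to a scaling convention.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data: 100 samples, 5 features, but only the first two actually
# influence the target (illustrative assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
true_w = np.array([3.0, -2.0, 0.0, 0.0, 0.0])
y = X @ true_w + rng.normal(scale=0.5, size=100)

# Increasing the penalty shrinks every coefficient toward zero,
# but none of them becomes exactly zero.
for lam in [0.01, 1.0, 100.0]:
    model = Ridge(alpha=lam).fit(X, y)
    print(f"lambda={lam:>6}: {np.round(model.coef_, 3)}")
```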
L1 Regularization

Least Absolute Shrinkage and Selection Operator (lasso) regression is an alternative to ridge for regularizing linear regression. Lasso regression also adds a penalty term to the cost function, but a slightly different one, called L1 regularization:

$$J(\mathbf{w}) = \mathrm{MSE}(\mathbf{w}) + \lambda \sum_{j=1}^{n} |w_j|$$

where λ controls the strength of regularization and the w_j are the model's weights (coefficients). L1 regularization drives some coefficients to exactly zero, meaning the model will ignore those features. Ignoring the least important features helps emphasize the model's essential features, so lasso regression automatically performs feature selection by eliminating the least important ones. L1 regularization tends to produce sparse models because the penalty term encourages some of the model coefficients to be exactly zero. This sparsity property makes L1 regularization useful for feature selection, as irrelevant features may receive zero coefficients.

Elastic Net Regularization

Elastic Net is a combination of L1 and L2 regularization, using both the absolute values and the squared values of the coefficients. It introduces two regularization parameters, α and λ, to control the trade-off between L1 and L2 regularization: λ sets the overall strength of the penalty, and α is a mixing parameter that determines the balance between L1 and L2 regularization. A common formulation (constant factors vary by convention) is

$$J(\mathbf{w}) = \mathrm{MSE}(\mathbf{w}) + \lambda \left( \alpha \sum_{j=1}^{n} |w_j| + (1 - \alpha) \sum_{j=1}^{n} w_j^2 \right)$$

so that α = 1 recovers pure lasso and α = 0 recovers pure ridge. A code sketch appears at the end of this document.

Key Difference between Ridge Regression and Lasso Regression

Ridge regression is mostly used to reduce overfitting in the model, and it keeps all the features present in the model; it reduces the complexity of the model by shrinking the coefficients. Lasso regression helps to reduce overfitting in the model and also performs feature selection.
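To make this difference concrete, here is a minimal sketch that fits both models to the same invented synthetic data used above (the dataset and penalty values are illustrative assumptions): ridge shrinks the irrelevant coefficients without removing them, while lasso sets them to exactly zero.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Same illustrative setup: only the first two of five features matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, -2.0, 0.0, 0.0, 0.0]) + rng.normal(scale=0.5, size=100)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)

print("ridge:", np.round(ridge.coef_, 3))  # all five coefficients nonzero, just shrunk
print("lasso:", np.round(lasso.coef_, 3))  # irrelevant coefficients are exactly 0.0
```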
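Finally, the elastic net sketch promised above, again on the same invented data. Watch the naming clash: scikit-learn's ElasticNet calls the overall strength λ `alpha`, and the mixing parameter α `l1_ratio`.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, -2.0, 0.0, 0.0, 0.0]) + rng.normal(scale=0.5, size=100)

# l1_ratio near 1 behaves like lasso (sparse coefficients);
# l1_ratio near 0 behaves like ridge (small but nonzero coefficients).
for mix in [0.1, 0.5, 0.9]:
    enet = ElasticNet(alpha=0.5, l1_ratio=mix).fit(X, y)
    print(f"l1_ratio={mix}: {np.round(enet.coef_, 3)}")
```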