Week 12 COE305 Machine Learning PDF
İstinye Üniversitesi
Femilda Josephin
Summary
This document summarizes ensemble methods in machine learning: bagging, boosting (GBM, XGBoost, LightGBM, CatBoost), and stacking. It also explains explainable AI (XAI) and SHAP values, which are commonly used for model interpretation.
Full Transcript
WEEK 12 COE305 MACHINE LEARNING BY FEMILDA JOSEPHIN

ENSEMBLE LEARNING
+ Ensemble learning is a method of making predictions by combining the strengths of two or more models.
+ The final ensemble is more robust than the individual models that constitute it.
+ An ensemble reduces the variance of the prediction errors.

Types of Ensemble Learning: Bagging
+ "Bagging" is short for "bootstrap aggregating" and is one of the earliest ensemble methods proposed.
+ Subsamples are drawn from the dataset with replacement; this is called "bootstrap sampling."
+ The models are trained in parallel, one per bootstrap sample, and their predictions are aggregated (averaged or voted on).
+ The main aim of bagging is to improve the overall performance of the ensemble predictions, chiefly by reducing variance.
+ It combines homogeneous weak learners.

Types of Ensemble Learning: Boosting
+ Instead of parallel processing, the dataset is processed sequentially.
+ The first classifier is fed the entire dataset, and its predictions are analyzed.
+ The instances that Classifier-1 predicts incorrectly are fed to the second classifier, and so on.
+ The main aim of boosting is to improve the overall performance of the ensemble decision.

Types of Ensemble Learning: Stacking
+ The stacking ensemble method also involves creating bootstrapped data subsets, as in bagging, for training multiple models.
+ The outputs of all these models are used as input to another classifier, called the meta-classifier, which makes the final prediction.

Why should we consider using an ensemble?
+ Performance: An ensemble can make better predictions and achieve better performance than any single contributing model.
+ Robustness: An ensemble reduces the spread or dispersion of the predictions and of model performance.
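The bagging idea described in this part of the transcript (bootstrap samples, homogeneous weak learners trained independently, aggregated predictions) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the slides: the toy data, the one-split regression "stump" used as the weak learner, and the function names `bootstrap_sample`, `fit_stump`, `bagging_fit`, and `bagging_predict`.

```python
import random
from statistics import mean


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows uniformly at random, with replacement."""
    return [rng.choice(rows) for _ in rows]


def fit_stump(rows):
    """Weak learner (illustrative): one threshold on x, predicting the mean y on each side."""
    xs = sorted({x for x, _ in rows})
    if len(xs) < 2:
        # Degenerate sample with a single distinct x: predict a constant.
        const = mean(y for _, y in rows)
        return lambda x: const
    best = None
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in rows if x <= t]
        right = [y for x, y in rows if x > t]
        lm, rm = mean(left), mean(right)
        # Sum of squared errors of this split.
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def bagging_fit(rows, n_models=25, seed=0):
    """Train one stump per bootstrap sample; each fit is independent of the others."""
    rng = random.Random(seed)
    return [fit_stump(bootstrap_sample(rows, rng)) for _ in range(n_models)]


def bagging_predict(models, x):
    """Aggregate by averaging the individual predictions."""
    return mean(m(x) for m in models)


# Hypothetical toy data: (x, y) pairs with a jump between x=3 and x=4.
data = [(1, 10.0), (2, 12.0), (3, 11.0), (4, 30.0), (5, 32.0), (6, 31.0)]
models = bagging_fit(data)
low = bagging_predict(models, 1.5)
high = bagging_predict(models, 5.5)
print(low, high)
```

Because each stump sees a different bootstrap sample, the individual predictions differ slightly; averaging them is what smooths out that spread. For classification, the averaging step would typically be replaced by a majority vote.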
Algorithms based on Bagging and Boosting

Bagging algorithms:
+ Bagging meta-estimator
+ Random forest

Boosting algorithms:
+ AdaBoost
+ GBM
+ XGBM (XGBoost)
+ LightGBM
+ CatBoost

Gradient Boosting Machine (GBM)
+ A Gradient Boosting Machine, or GBM, combines the predictions from multiple decision trees to generate the final predictions.
+ All the weak learners in a gradient boosting machine are decision trees.
+ If we are using the same algorithm, then how is using a hundred decision trees better than using a single decision tree?
+ The nodes in every decision tree take a different subset of features for selecting the best split.
+ Additionally, each new tree considers the errors or mistakes made by the previous trees.
+ So, every successive decision tree is built on the errors of the previous trees.

GBM Example, Step 1: Make an initial prediction
+ Gradient boosting is an algorithm that gradually increases its accuracy.
+ To start the process, we need an initial guess or prediction.
+ For a regression target with squared-error loss, the initial guess is the average of the target.
+ Here the initial prediction is the average: 156 dollars. Hold it in memory as we continue.

GBM Example, Step 2: Calculate the pseudo-residuals
+ The next step is to find the difference between each observed value and the initial prediction: Observed – 156.

GBM Example, Step 3: Build a weak learner
+ Next, build a decision tree (weak learner) that predicts the residuals using the three features (age, category, purchase weight).
+ After the tree is fit to the data, we make a prediction for each row in the data.
+ For the first row, adding the tree's predicted residual to the initial prediction gives a perfect prediction (123.45), but such a fit might be heavily overfitting the training data.
+ So, to mitigate this problem, gradient boosting has a parameter called the learning rate.
+ The learning rate in gradient boosting is simply a multiplier between 0 and 1 that scales the prediction of each weak learner.
+ When we add an arbitrary learning rate of 0.1 into the mix, the prediction becomes 152.75, not the perfect 123.45.
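The first-row arithmetic above can be checked with a short sketch. It uses only the numbers quoted in the example (initial prediction 156, observed value 123.45, learning rate 0.1) and assumes, as the example implies, that the tree predicts this row's residual exactly. The function name `boosted_prediction` is hypothetical.

```python
def boosted_prediction(previous_prediction, predicted_residual, learning_rate):
    """One boosting update: move the prediction by a scaled residual estimate."""
    return previous_prediction + learning_rate * predicted_residual


initial_prediction = 156.0  # average of the target, as quoted in the example
observed = 123.45           # first-row purchase amount, as quoted in the example
learning_rate = 0.1         # the "arbitrary" learning rate from the example

# Pseudo-residual: observed value minus the current prediction.
residual = observed - initial_prediction

# Assumption: the weak learner predicts this residual exactly for the first row.
unscaled = boosted_prediction(initial_prediction, residual, 1.0)
scaled = boosted_prediction(initial_prediction, residual, learning_rate)

print(f"residual = {residual:.2f}")
print(f"without learning rate: {unscaled:.3f}")
print(f"with learning rate 0.1: {scaled:.3f}")
```

With a learning rate of 1.0 the update jumps straight to the observed value, which is the overfitting risk the example describes. With 0.1 it moves only a tenth of the way, to about 152.745, which the example rounds to 152.75.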
+ Make the prediction for the second row in the same way, and continue for all rows until we have four predictions for four rows: 152.75, 146.08, 174.945, 150.2.
+ Next, find the new pseudo-residuals by subtracting the new predictions from the purchase amount.

GBM Example, Step 4: Iterate
+ Iterate on Step 3, i.e. build more weak learners.
+ Remember to keep adding the scaled residual predictions of each tree to the initial prediction to generate the next prediction.
+ For example, if 10 trees are built and the residuals of each tree are denoted as r_i (1