Week 12 COE305 Machine Learning PDF

Document Details


Uploaded by ToughSard1958

İstinye Üniversitesi

Femilda Josephin

Tags

machine learning, ensemble learning, boosting algorithms, machine learning algorithms

Summary

This document provides a summary of machine learning, focusing on ensemble methods such as bagging, boosting (GBM, XGBoost, LightGBM, CatBoost), and stacking. It also includes an explanation of explainable AI (XAI) and SHAP values, which are commonly used for model interpretation in machine learning.

Full Transcript


WEEK 12 COE305 MACHINE LEARNING BY FEMILDA JOSEPHIN

ENSEMBLE LEARNING
+ Ensemble learning is a method of performing predictions by fusing the salient properties of two or more models.
+ The final ensemble learning framework is more robust than the individual models that constitute the ensemble.
+ An ensemble reduces the variance in the prediction errors.

Types of Ensemble Learning: Bagging
+ The bagging ensemble technique takes its name from "bootstrap aggregating" and is one of the earliest ensemble methods proposed.
+ Subsamples of the dataset are created through "bootstrap sampling."
+ In the bagging mechanism, the models are trained in parallel streams of processing.
+ The main aim of the bagging method is to improve the overall performance of the ensemble predictions.
+ It is an ensemble of homogeneous weak learners.

Types of Ensemble Learning: Boosting
+ Instead of parallel processing of the data, sequential processing of the dataset occurs.
+ The first classifier is fed the entire dataset, and its predictions are analyzed.
+ The instances where Classifier-1 fails to produce correct predictions are fed to the second classifier, and so on.
+ The main aim of the boosting method is to improve the overall performance of the ensemble decision.

Types of Ensemble Learning: Stacking
+ The stacking ensemble method also involves creating bootstrapped data subsets, like the bagging mechanism, for training multiple models.
+ The outputs of all such models are used as input to another classifier, called the meta-classifier, which produces the final predictions.

Why should we consider using an ensemble?
+ Performance: an ensemble can make better predictions and achieve better performance than any single contributing model.
+ Robustness: an ensemble reduces the spread or dispersion of the predictions and of model performance.

Algorithms based on Bagging and Boosting
Bagging algorithms:
+ Bagging meta-estimator
+ Random forest
Boosting algorithms:
+ AdaBoost
+ GBM
+ XGBM (XGBoost)
+ LightGBM
+ CatBoost
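As a rough illustration of the bagging, boosting, and stacking families listed above, the following Python sketch trains one model of each kind and compares test accuracy. It assumes scikit-learn is installed; the synthetic dataset and model settings are illustrative placeholders, not part of the lecture material.

# Minimal sketch, assuming scikit-learn; dataset and settings are illustrative only.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import (
    BaggingClassifier,            # bagging: bootstrap-sampled trees trained in parallel
    RandomForestClassifier,       # bagging variant with random feature subsets per split
    AdaBoostClassifier,           # boosting: sequential focus on misclassified instances
    GradientBoostingClassifier,   # boosting: each tree fits the previous trees' errors
    StackingClassifier,           # stacking: a meta-classifier combines base-model outputs
)
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "bagging": BaggingClassifier(n_estimators=100, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "adaboost": AdaBoostClassifier(n_estimators=100, random_state=0),
    "gradient boosting": GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, random_state=0),
    "stacking": StackingClassifier(
        estimators=[("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
                    ("gb", GradientBoostingClassifier(random_state=0))],
        final_estimator=LogisticRegression(),  # the meta-classifier
    ),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: test accuracy = {model.score(X_test, y_test):.3f}")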
Gradient Boosting Machine (GBM)
+ A Gradient Boosting Machine, or GBM, combines the predictions from multiple decision trees to generate the final predictions.
+ All the weak learners in a gradient boosting machine are decision trees.
+ If we are using the same algorithm, then how is using a hundred decision trees better than using a single decision tree?
+ The nodes in every decision tree take a different subset of features for selecting the best split.
+ Additionally, each new tree considers the errors or mistakes made by the previous trees, so every successive decision tree is built on the errors of the previous trees.

GBM Example
Step 1: Make an initial prediction
+ Gradient boosting is an algorithm that gradually improves its accuracy.
+ To start the process, we need an initial guess or prediction.
+ The initial guess is always the average of the target.
+ Here the initial prediction is the average purchase amount, 156 dollars. Hold it in memory as we continue.

Step 2: Calculate the pseudo-residuals
+ The next step is to find the difference between each observed value and the initial prediction: Observed − 156.

Step 3: Build a weak learner
+ Next, build a decision tree (weak learner) that predicts the residuals using the three features (age, category, purchase weight).
+ After the tree is fit to the data, we make a prediction for each row in the data.
+ For the first row the tree's prediction is perfect, but a tree that fits the residuals exactly is likely to be heavily overfitting the training data.
+ So, to mitigate this problem, gradient boosting has a parameter called the learning rate.
+ The learning rate in gradient boosting is simply a multiplier between 0 and 1 that scales the prediction of each weak learner.
+ When we add an arbitrary learning rate of 0.1 into the mix, the prediction for the first row becomes 152.75 rather than the perfect 123.45.
+ We then make the prediction for the second row and continue in this fashion for all rows until we have four predictions for the four rows: 152.75, 146.08, 174.945, 150.2.
+ Next, we find the new pseudo-residuals by subtracting the new predictions from the purchase amounts.

Step 4: Iterate
+ Iterate on Step 3, i.e. build more weak learners, each one fitted to the pseudo-residuals left after the previous trees.
+ Remember to keep adding the learning-rate-scaled residual predictions of each tree to the initial prediction to generate the next prediction.
+ For example, if 10 trees are built and the residual predictions of tree i are denoted r_i (1 ≤ i ≤ 10), the final prediction is the initial prediction plus the learning rate times the sum r_1 + r_2 + ... + r_10.
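To make Steps 1 through 4 concrete, here is a minimal from-scratch sketch of the procedure described above. It assumes numpy and scikit-learn's DecisionTreeRegressor; the four-row feature matrix and purchase amounts are made-up placeholders, not the actual table from the slides.

# Minimal from-scratch GBM sketch; the data below are hypothetical, not the slide's table.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.array([[25, 0, 1.2], [40, 1, 0.8], [31, 2, 2.5], [55, 1, 1.0]])  # age, category, purchase weight
y = np.array([123.45, 210.0, 130.0, 160.0])                             # observed purchase amounts

learning_rate = 0.1
n_trees = 10

prediction = np.full_like(y, y.mean())   # Step 1: initial prediction = average of the target
trees = []

for _ in range(n_trees):                 # Step 4: iterate, building one weak learner per round
    residuals = y - prediction           # Step 2: pseudo-residuals = observed - current prediction
    tree = DecisionTreeRegressor(max_depth=2, random_state=0)
    tree.fit(X, residuals)               # Step 3: the weak learner fits the residuals
    prediction = prediction + learning_rate * tree.predict(X)  # scale each tree by the learning rate
    trees.append(tree)

# Final prediction: initial average + learning rate * sum of all trees' residual predictions
def predict(X_new):
    return y.mean() + learning_rate * sum(t.predict(X_new) for t in trees)

print(predict(X))

Running the loop for more rounds (or with a smaller learning rate and more trees) moves the predictions toward the observed values gradually, which is the behavior the example above describes.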
