Ensemble Learning Methods

Questions and Answers

Which of the following statements accurately describes the core principle behind ensemble learning?

  • Ensemble learning primarily aims to reduce the computational resources required for model training.
  • Ensemble learning involves strategically combining multiple individual models to make a final prediction. (correct)
  • Ensemble learning focuses on training a single, highly complex model to capture all data variations.
  • Ensemble learning seeks to identify the single best model from a pool of pre-trained models.

Which voting method involves averaging each model's prediction to get a continuous final prediction value?

  • Majority Wins
  • Soft Voting
  • Hard Voting (correct)
  • Weighted Averaging

In ensemble learning, what is the primary purpose of bias-variance decomposition?

  • To isolate and quantify sources of error to improve model accuracy. (correct)
  • To increase the model complexity.
  • To reduce the amount of training data required.
  • To simplify the model for increased interpretability.

What is the main goal of ensemble methods like Bagging?

  • To reduce variance by creating multiple models from different subsets of the training data. (correct)

In the context of the bias-variance tradeoff, what does 'bias' refer to?

  • The systematic error in a model's predictions, indicating a consistent deviation from the true values. (correct)

Which of the following statements best characterizes 'variance' in the context of the bias-variance tradeoff?

  • The stability of a model's predictions when faced with different training datasets. (correct)

A team is using an ensemble method with three classifiers. Given each classifier has an accuracy rate of 0.7, what is the probability the ensemble classifier makes a correct prediction, assuming classification follows the majority among three classifiers?

  • 0.784 (correct)

An ensemble model is constructed by averaging the predictions of 'n' models, each having a variance of $σ^2$. If the pairwise correlation between the models is 0, what is the variance of the ensemble as a function of 'n' and $σ^2$?

  • $σ^2/n$ (correct)

Which of the following is NOT a main camp of ensemble learning methods?

  • Clustering (correct)

How do classifiers learn in the Bagging method?

  • Classifiers are learned independently, in a parallel way. (correct)

What is the main difference between Bagging and Boosting?

  • Bagging trains classifiers in parallel, while Boosting trains them sequentially. (correct)

In Bagging, what is the purpose of creating k bootstrap samples from the original dataset?

  • To create different versions of the training dataset to train k different models. (correct)

In bootstrap sampling for Bagging, given a dataset D containing m training examples, what is the approximate probability that a sample is not selected in a new bootstrap sample D' of size m?

  • Approximately 36.8% (correct)

In Bagging, how are the predictions of individual classifiers combined to classify a new instance?

  • By classifier vote with equal weights. (correct)

What is a key characteristic of the base learners used in Bagging?

  • They are usually weaker learners of the same type. (correct)

How does Random Forest differ from Bagging?

  • Random Forest selects among a subset of all features for each tree, while Bagging uses all features. (correct)

What does 'Out-of-Bag Evaluation' refer to in the context of Random Forests?

  • A method for evaluating the model's performance on unseen data by using instances not sampled during bagging. (correct)

In the scikit-learn implementation of Random Forest, what does the feature_importances_ attribute provide?

  • A measure of the contribution of each feature to the model's predictive power. (correct)

Which of the following is a limitation of Bagging?

  • Bagging assumes an equal importance for each training example during bootstrap sampling. (correct)

What is the primary strategy for improving the efficiency of Bagging?

  • Use a better sampling strategy and a better combination strategy. (correct)

What characterizes how Adaptive Boosting (AdaBoost) identifies the shortcomings of existing weak classifiers?

  • AdaBoost identifies shortcomings by focusing on high-weight data points. (correct)

In Adaptive Boosting (AdaBoost), when an instance is wrongly classified, the algorithm aims to:

  • Increase the weight of that instance. (correct)

What does the misclassification rate (epsilon) in AdaBoost indicate?

  • The proportion of misclassified instances, weighted by their respective weights. (correct)

In AdaBoost, if a classifier has a higher accuracy, what adjustment is typically made?

  • The more accurate classifier is assigned a higher weight in the ensemble's combined prediction. (correct)

In AdaBoost, how and why must the 'instance weights' be updated?

  • The weights are updated to focus on instances that are difficult to classify. (correct)

In gradient boosting, how are the shortcomings of existing weak classifiers identified?

  • By analyzing the gradients of the loss function. (correct)

Which of the following is correct with respect to regression?

  • Regression fits a model F(x) to minimize square loss. (correct)

How does Gradient Boosting leverage gradient descent?

  • To find the minimum of the loss function. (correct)

What does the algorithm summary consist of?

  • All of the above (correct)

What is true about stacking?

  • A technique that uses predictions from multiple models to build a new model. (correct)

In stacking, what data is used to train the level 2 models?

  • The predictions from pre-trained models. (correct)

Which of the following does stacking involve?

  • Stacking uses cross-validation to generate predictions from base models. (correct)

In stacking, after base models generate predictions using cross-validation, what is the next step?

  • Fit the base models on the whole training data. (correct)

You're building an ensemble model to predict housing prices. Which of the following scenarios would likely benefit the MOST from using a stacking ensemble method?

  • The dataset contains a mix of numerical, categorical, and textual features. (correct)

You are tasked with using an ensemble model to classify images. After implementing an AdaBoost model, you observe that it performs poorly. Which of the following is the MOST likely reason for this poor performance?

  • Inadequate tuning of the base classifiers caused the model to classify poorly. (correct)

A data scientist is building an ensemble model and observes the following problem. Despite high accuracy on the training set, the ensemble exhibits significantly lower accuracy on the validation set. Which of the following is the MOST appropriate to address this issue?

  • Reduce the number of base learners in the ensemble. (correct)

You are using Random Forest to build a model. Upon inspecting the feature_importances_ attribute, you notice that only a few features have high importance scores, while most others have very low scores. What course of action should you take?

  • Use feature selection to focus on the key features. (correct)

Flashcards

Ensemble Learning

Learning multiple models and combining them for better accuracy.

Hard Voting

Each classifier votes, and the majority wins. For regression, predictions are averaged.

Soft Voting

Classifiers provide a probability distribution, weighted by importance and summed up.

Reduce Bias

Technique that reduces bias in models.

Reduce Variance

Technique that reduces variance in models.

Bias

A systematic deviation from the truth.

Variance

A model's sensitivity to the peculiarities of the training data.

Error

Encompasses all deviations from the truth, including both systematic and random errors.

Bagging

Ensemble method that trains classifiers independently, in parallel, and averages their predictions.

Bootstrap Aggregating

Randomly sampling data with replacement

Boosting

Ensemble method that incrementally adds classifiers sequentially in an adaptive way.

Stacking

Ensemble method combining predictions from previous models.

Bootstrap Samples

Creating k bootstrap samples from the original data.

Classifying Instances

Classifying a new instance by classifier vote with equal weights.

Random Forest

Ensemble method with a collection of trees.

Decision Trees

Recursively partition samples based on Gini coefficient or information gain.

Variation

Variation is introduced through training subsamples and random feature selection.

Feature Splitting

The best split is determined by searching within a subset of randomly selected features.

Original Samples

ExtraTrees uses the whole original sample rather than a bootstrap sample.

Value Splitting

Randomize the split value on the randomly selected attribute.

Out-of-Bag Evaluation

Instances not sampled during bagging are used to evaluate each predictor.

Feature Importance

A weighted average over all trees of how much splits on a feature reduce impurity.

Boosting Model

Additive model that compensates for the shortcomings of existing weak classifiers.

Adaptive Boosting

Identifying 'shortcomings' using high-weight data points.

Gradient Boosting

Identifying 'shortcomings' using gradients.

Instance Weights

Misclassified instances have their weights increased.

Weight Instances

Train a new base learner using the weighted instances and obtain its hypothesis.

Hard to classify

Focus on the examples that are difficult to classify.

Regression Tree

Fit a regression tree to data.

Loss Function

The gradient boosting algorithm summary applies to a general loss function.

Stacking

Uses predictions from multiple models to build a new model

Meta-Learner

A model that takes the predictions from multiple base models as input and produces the final prediction.

Study Notes

Ensemble Learning Overview

  • Ensemble methods involve training multiple models and combining them to improve accuracy
  • Combining the models involves hard or soft voting

Hard Voting

  • Individual classifiers vote for a class, with the majority winning
  • For regression, each model's prediction is averaged for the final prediction

Soft Voting

  • Classifiers provide a probability distribution over possible classes or a numerical prediction with certainty
  • Predictions are weighted based on the classifier's importance, then summed or averaged
  • The target label with the highest sum of weighted probabilities wins the vote
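
The two voting schemes map onto scikit-learn's VotingClassifier. A minimal sketch, assuming a synthetic dataset and illustrative base classifiers (none of which are specified in the lesson):

```python
# Hard vs. soft voting with scikit-learn's VotingClassifier (illustrative sketch).
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
base = [("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(max_depth=3)),
        ("nb", GaussianNB())]

hard = VotingClassifier(estimators=base, voting="hard").fit(X, y)  # majority class vote
soft = VotingClassifier(estimators=base, voting="soft").fit(X, y)  # averaged class probabilities
print(hard.predict(X[:5]), soft.predict(X[:5]))
```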

Why Use Ensemble Methods

  • To reduce bias and variance
  • Bias is a systematic deviation from the truth
  • Variance is a model's sensitivity to peculiarities in the training data
  • The bias-variance tradeoff aims to minimize both systematic and random errors
  • Overall, it is desired to reduce bias and variance

Reducing Bias

  • Using multiple classifiers can increase accuracy
  • Consider a scenario with 3 independent classifiers, each having a 60% accuracy rate (a = 0.6)
  • An ensemble classifier, which follows the majority among the three, can achieve higher accuracy
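
A quick check of the 3-classifier example: the majority vote is correct whenever at least two of the three independent classifiers are correct.

```python
# Majority vote of 3 independent classifiers, each with accuracy a = 0.6.
from math import comb

a = 0.6
# correct if exactly 2 or all 3 classifiers are correct
p_ensemble = comb(3, 2) * a**2 * (1 - a) + a**3
print(p_ensemble)  # 0.648 > 0.6, so the majority vote beats any single classifier
```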

Reducing Variance

  • Ensemble learning can reduce variance using multiple models
  • If we have n models, each with the same variance σ², the variance of their average depends on whether the models are correlated (see the formula below)
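
A standard derivation (assuming each model has variance $σ^2$ and a common pairwise correlation $ρ$, an assumption beyond what the bullet states) gives:

$$\operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} h_i\right) = \rho\,\sigma^2 + \frac{1-\rho}{n}\,\sigma^2$$

With uncorrelated models ($ρ = 0$) this reduces to $σ^2/n$, matching the quiz answer above; with perfectly correlated models ($ρ = 1$) averaging gives no variance reduction at all.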

Three Main Ensemble Learning Methods

  • Bagging involves bootstrap aggregating, with classifiers trained independently in parallel ("averaging" classifiers)
  • Boosting involves incrementally "adding" classifiers, with classifiers learned sequentially in an adaptive way
  • Stacking combines predictions from previous models

Bagging (Bootstrap Aggregating)

  • Creates k bootstrap samples (D1, D2,..., Dk)
  • Trains distinct base classifiers (hᵢ) on each bootstrap sample Di, which are usually weaker learners of the same type
  • Classifies new instances by averaging the classifier votes
  • Introduces variation by using training subsamples and feature selection for each classifier
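
A minimal bagging sketch with scikit-learn's BaggingClassifier; the base learner, dataset, and number of estimators are illustrative choices, not taken from the lesson:

```python
# Bagging: k decision trees trained on k bootstrap samples, combined by equal-weight vote.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
bag = BaggingClassifier(
    DecisionTreeClassifier(),  # base classifiers h_i, weak learners of the same type
    n_estimators=10,           # k bootstrap samples / models
    bootstrap=True,            # sample with replacement
    random_state=0,
).fit(X, y)
print(bag.predict(X[:5]))      # new instances classified by classifier vote
```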

Bootstrap Sampling

  • Given a set D containing m training examples, create Di by drawing m examples at random with replacement from D
  • Approximately 63% of samples are distinct in each bootstrap
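
A quick simulation of the ≈63% figure (the exact limit is $1 - 1/e ≈ 0.632$); the sample size here is just a demo value:

```python
# Fraction of distinct examples in one bootstrap sample of size m.
import random

m = 100_000
sample = [random.randrange(m) for _ in range(m)]  # draw m indices with replacement
print(len(set(sample)) / m)                       # ≈ 0.632, i.e. 1 - 1/e
```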

Random Forest

  • Composed of a collection of trees
  • Decision trees recursively partition samples based on Gini coefficient or information gain until a criterion is met
  • Random Forest is composed of a collection of random trees
  • There is variation on training subsamples and selecting among a subset of all features for each tree

Random Forest and Tree Bagging: Feature Selection

  • Tree bagging draws bootstrap samples and searches for the best split over all features
  • Random Forest searches for the best split within a subset of randomly selected features, typically √n_features of them
  • ExtraTrees (Extremely Randomized Trees) uses the whole original sample and randomizes the split value (see the sketch below)
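
A sketch contrasting the three settings in scikit-learn; the dataset and hyperparameters are illustrative, and tree bagging is emulated here by a forest that searches all features at every split:

```python
# Tree bagging vs. Random Forest vs. ExtraTrees, differing in how splits are chosen.
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=16, random_state=0)

# Tree bagging: every split searches over all features
bagged_trees = RandomForestClassifier(n_estimators=100, max_features=None, random_state=0)

# Random Forest: each split searches a random subset of sqrt(n_features) features
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)

# ExtraTrees: whole original sample (no bootstrap by default), randomized split values
extra = ExtraTreesClassifier(n_estimators=100, random_state=0)

for model in (bagged_trees, forest, extra):
    print(type(model).__name__, model.fit(X, y).score(X, y))
```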

Scikit-learn Support

  • Out-of-Bag (OOB) evaluation uses the roughly 37% of training instances not sampled for each predictor to evaluate it
  • Feature importance is a weighted average across all trees of how much splits on each feature reduce impurity
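
Both features are exposed directly on the scikit-learn estimator; a minimal sketch on illustrative data:

```python
# Out-of-Bag evaluation and impurity-based feature importances (illustrative data).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=10, n_informative=4, random_state=0)
rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0).fit(X, y)

print(rf.oob_score_)            # accuracy measured on instances each tree never saw (~37% per tree)
print(rf.feature_importances_)  # average impurity reduction attributed to each feature
```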

Bagging Limitations

  • Inefficient bootstrap sampling: every example has an equal chance of being sampled, with no distinction between simple and difficult examples
  • Inefficient Model Combination: Constant weight for each classifier, with no distinction between accurate or inaccurate classifiers

Improving Bagging Efficiency

  • Better sampling strategy: Focus on difficult-to-classify examples which leads to Boosting
  • Better combination strategy: Accurate model should be assigned larger weights, so use another machine learning model to combine the results which leads to Stacking

Boosting

  • Additive model that compensates shortcomings of classifiers
  • Adaptive Boosting (Adaboost) weights the data
  • Gradient Boosting (Gradient Descent + Boosting) identifies shortcomings by gradients

Adaptive Boosting: Example

  • Adaptively adjusts weights based on classification performance
  • Instances classified incorrectly have increased weights
  • Instances classified correctly have decreased weights

AdaBoost Algorithm

  • Given training instances xᵢ ∈ X with labels yᵢ ∈ {−1, +1}
  • All instances receive an equal initial weight
  • Iteratively train a new base learner on the weighted instance set and compute its weighted misclassification rate εₜ
  • Update the instance weights Dₜ₊₁(i) for the next iteration, depending on each instance's importance (see the update rules below)
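
For reference, the standard AdaBoost update rules, which the notation $ε_t$ and $D_{t+1}(i)$ above appears to follow (the textbook formulation, not verbatim from the slides):

$$\epsilon_t = \sum_{i} D_t(i)\,\mathbf{1}\bigl[h_t(x_i) \neq y_i\bigr], \qquad \alpha_t = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}$$

$$D_{t+1}(i) = \frac{D_t(i)\,\exp\bigl(-\alpha_t\,y_i\,h_t(x_i)\bigr)}{Z_t}, \qquad H(x) = \operatorname{sign}\Bigl(\sum_t \alpha_t\,h_t(x)\Bigr)$$

Here $Z_t$ normalizes the weights. Misclassified instances ($y_i h_t(x_i) = -1$) have their weights increased, and more accurate classifiers (smaller $\epsilon_t$) receive larger voting weights $\alpha_t$, consistent with the quiz answers above.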

Gradient Boosting: Intuition

  • It consists of an error-correction approach, fixes mistakes from previous classifiers

Gradient Boosting

  • In the regression setting, a model F(x) is fit to minimize the square loss
  • Additional models are created to further reduce the square loss, moving the ensemble closer to the true model

Gradient Boosting Algorithm Summary

  • Uses the additive model $F = \sum_t \rho_t h_t$
  • Compensates for the shortcomings of existing models via the update $F \leftarrow F + \rho h$, where $\rho$ is the learning rate
  • Identifies shortcomings with the negative gradients of the loss function (see the sketch below)
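
A minimal sketch for square loss, where the negative gradient is simply the residual $y - F(x)$; the toy data, tree depth, and learning rate are assumptions for illustration:

```python
# Gradient boosting for square loss: each new tree fits the negative gradient (the residuals).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

F = np.zeros_like(y)   # current additive model F(x), started at zero
rho = 0.1              # learning rate
for _ in range(100):
    residuals = y - F                                 # negative gradient of 1/2 * (y - F)^2
    h = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    F += rho * h.predict(X)                           # F <- F + rho * h

print(np.mean((y - F) ** 2))  # training square loss shrinks as weak learners are added
```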

Stacking

  • Stacking is an ensemble learning technique using predictions from many models to build a new one
  • Training can be broken down into two stages:
  1. Split the data into training and test sets
  2. Split the training data into folds for K-fold cross-validation; the base models' out-of-fold predictions become the training data for the level-2 model (see the sketch below)
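
A minimal stacking sketch with scikit-learn, which builds the level-2 training data from cross-validated predictions of the base models; the base models and meta-learner here are illustrative choices:

```python
# Stacking: a logistic-regression meta-learner trained on base-model predictions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)
base = [("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("svc", SVC(probability=True, random_state=0))]

stack = StackingClassifier(
    estimators=base,
    final_estimator=LogisticRegression(),  # level-2 model (meta-learner)
    cv=5,                                  # out-of-fold predictions form its training data
).fit(X, y)
print(stack.predict(X[:5]))
```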

Midterm Exam

  • The midterm exam is April 3, with a mix of T/F and Multiple Choice questions
  • Primarily closed book format
