Decision Tree Pruning

10 Questions

What strategy is proposed to improve the classification tree?

Grow a very large tree and then prune it back to obtain a subtree

What is the purpose of cost complexity pruning?

To select a small set of subtrees for consideration

What is the role of the tuning parameter α in cost complexity pruning?

To index a sequence of trees

How is the value of α chosen in the cost complexity pruning algorithm?

By using k-fold cross-validation to minimize the average error

What is the purpose of step 3 in the cost complexity pruning algorithm?

To evaluate the mean squared prediction error on the data in the left-out fold

What is the result of applying the cost complexity pruning to a large tree?

A sequence of best subtrees as a function of α

What is the goal of the cost complexity pruning algorithm?

To find the subtree with the lowest cross-validated prediction error

What is the relationship between α and the subtree T?

For each value of α there is a subtree T of T0 that minimizes the training error plus α times the number of terminal nodes

What is the advantage of using cost complexity pruning over considering every possible subtree?

It allows for a more efficient search of the subtree space

What is the role of k-fold cross-validation in the cost complexity pruning algorithm?

To choose the value of α

Study Notes

Tree Pruning

  • The tuning parameter α controls the trade-off between subtree complexity and its fit to the training data.
  • When α = 0, the subtree T will simply equal T0.
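The trade-off above can be sketched as a penalized cost. The helper below is a hypothetical illustration (not from any library): for a subtree T, the criterion is the training RSS summed over its terminal nodes plus α times the number of terminal nodes |T|.

```python
# Hypothetical sketch of the cost-complexity criterion RSS(T) + alpha * |T|.
# rss_per_leaf holds the training RSS of each terminal node of a subtree T.

def penalized_cost(rss_per_leaf, alpha):
    """Training RSS of the subtree plus alpha times its number of leaves."""
    return sum(rss_per_leaf) + alpha * len(rss_per_leaf)

# With alpha = 0 the criterion is just the training RSS, so the full tree T0
# minimizes it; increasing alpha penalizes leaves and favors smaller subtrees.
```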

Cost Complexity Pruning

  • Cost complexity pruning is a method to select a small set of subtrees for consideration.
  • It is also known as weakest link pruning.
  • The algorithm for cost complexity pruning involves:
    • Growing a large tree using recursive binary splitting and stopping according to the stopping condition.
    • Applying cost complexity pruning to obtain a sequence of best subtrees as a function of α.
    • Using k-fold cross-validation to choose α.
    • Returning the subtree that corresponds to the chosen value of α.
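The four steps above can be sketched with scikit-learn, which implements cost complexity pruning via `cost_complexity_pruning_path` and the `ccp_alpha` parameter. The toy dataset from `make_classification` is an assumption made only to keep the example self-contained.

```python
import numpy as np
from sklearn.datasets import make_classification  # toy data, for illustration only
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)

# Steps 1-2: grow a large tree and obtain the sequence of effective alphas,
# each corresponding to a best subtree.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)

# Step 3: use k-fold cross-validation (here k = 5) to score each alpha.
scores = [
    cross_val_score(DecisionTreeClassifier(ccp_alpha=a, random_state=0), X, y, cv=5).mean()
    for a in path.ccp_alphas
]

# Step 4: return the pruned tree corresponding to the chosen alpha.
best_alpha = path.ccp_alphas[int(np.argmax(scores))]
pruned = DecisionTreeClassifier(ccp_alpha=best_alpha, random_state=0).fit(X, y)
```

Refitting on the full training set with the chosen `ccp_alpha` gives the subtree returned in step 4.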

Classification Trees

  • A classification tree is used to predict a qualitative response.
  • The classification error rate is the fraction of training observations in a region that do not belong to the most common class.
  • The classification error rate is not sufficiently sensitive for tree-growing, and alternative measures are preferable.

Gini Index

  • The Gini index is a measure of total variance across K classes.
  • It takes on a small value if all the p_mk's are close to zero or one.
  • The Gini index is a measure of node purity, with a small value indicating that a node contains predominantly observations from a single class.
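As a minimal sketch (the function name is ours, not a library's), the Gini index of a node with class proportions p_mk is G = Σ_k p_mk(1 − p_mk):

```python
# Gini index of a single node: G = sum_k p_mk * (1 - p_mk),
# where proportions are the fractions of the node's observations in each class.

def gini_index(proportions):
    return sum(p * (1 - p) for p in proportions)

# A pure node (all observations in one class) gives G = 0;
# an even two-class split gives G = 0.5, the binary maximum.
```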

Entropy

  • An alternative to the Gini index is entropy, given by the formula -∑p_mk log p_mk.
  • The entropy takes on a value near zero if the p_mk's are all near zero or near one.
  • The Gini index and entropy are quite similar numerically.
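Entropy can be sketched the same way. This uses the natural log (base 2 is also common and only rescales the value), with the standard convention that 0·log 0 = 0:

```python
from math import log

# Entropy of a single node: D = -sum_k p_mk * log(p_mk),
# skipping zero proportions (the 0 * log 0 = 0 convention).

def entropy(proportions):
    return -sum(p * log(p) for p in proportions if p > 0)

# Like the Gini index, entropy is zero for a pure node and
# largest for an even split, so the two measures behave similarly.
```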

Tree Pruning Strategy

  • A better strategy is to grow a very large tree T0 and then prune it back to obtain a subtree.
  • This approach is preferable to stopping the splitting early, because a seemingly worthless split early on may be followed by a very good split later.

Quiz about decision tree pruning, including the tuning parameter alpha and its effects on subtree complexity and fit to training data.
