Questions and Answers
Which of the following is the primary function of non-leaf nodes in a decision tree?
What is the process for classifying a new instance using a decision tree?
What characteristics make decision trees valuable in modern analytics?
In a decision tree, what do the leaves represent?
When classifying an instance with a decision tree, what action is taken if a leaf node is encountered?
Why is using an attribute like PESEL (or SSN) problematic for decision tree learning, despite yielding maximal information gain?
What is a major drawback of using information gain as the sole criterion for selecting attributes in decision tree learning?
What is the purpose of using gain ratio instead of information gain in decision tree learning?
The gain ratio is calculated by dividing the information gain Hgain(T) by what?
What problem can arise when using gain ratio, and what is a common workaround?
According to the content, how does one mitigate the side effect produced as a problem of gain ratio?
Besides using gain ratio, what is another solution mentioned in the content to address the issue of tests with many outcomes?
What does the content suggest about complex or 'hacky' solutions often found in decision tree learning?
Which of the following joint probability distributions satisfies the conditions where A and Y are statistically independent, B and Y are also statistically independent, but A and B together determine the value of Y?
Why might early stopping criteria in decision tree learning sometimes be ineffective?
What is the primary purpose of pruning in decision tree construction?
How does the pruning strategy typically compare to the test selection criterion in terms of its impact on the quality of a decision tree?
During the pruning process, what label is assigned to a leaf node that replaces a pruned subtree?
In the iterative pruning algorithm described, what is the order in which nodes are considered for pruning?
Why is the error rate on the training set not a good estimate for pruning a decision tree?
Which of the following statements is true regarding estimating classification error during pruning?
What is the first step in the 1-SE rule for cost-complexity pruning?
What logical step follows after selecting the tree with the smallest validation error?
How does the 1-SE rule determine which tree to prune?
What is a significant advantage of pruning a decision tree?
Which method is NOT mentioned as a technique for handling missing values in decision trees?
What is the primary function of surrogate splits in decision tree algorithms?
When considering numerical attributes in decision trees, how is a splitting point determined?
What is a common issue that arises with missing values in datasets when applying machine learning algorithms?
What is the purpose of surrogate splits in decision trees?
What splitting criterion does the CART algorithm use?
Which algorithm is considered historically the first in decision trees?
What stop criterion does the CHAID algorithm use?
Which of the following statements about C4.5 is correct?
What technique is used in CART to handle missing values?
Which development allows decision trees to accommodate numerical target variables?
Which of the following is a characteristic of the ID3 algorithm?
Which statement regarding surrogate splits is true?
What is the main disadvantage of the CHAID algorithm?
Which of the following is NOT one of the primary approaches to decision tree pruning discussed?
In validation error based pruning, what is the primary purpose of the validation set?
During validation error based pruning, the algorithm iterates through nodes in which order?
What is the pruning criterion in validation error based pruning?
What is a significant disadvantage of validation error based pruning, especially for smaller datasets?
In the context of pruning based on confidence intervals, what does 'nE' represent?
According to Quinlan's method from 1987, what statistical distribution is used to model classification errors in leaves?
When using normal approximation for confidence intervals in pruning, what is a limitation?
Flashcards
Decision Trees
A machine learning algorithm that uses a tree-like model for decision making.
Leaf Node
A terminal node in a decision tree representing a decision outcome.
Non-Leaf Node
A node in a decision tree that contains a test or condition.
Classification Process
Implementation Versatility
Information Gain
PESEL as a Test
H(Y | PESEL)
Gain Ratio
Problem with Information Gain
H(T) Low Problem
Average Information Gain
Binary Splits Solution
Joint Probability Distribution
Statistical Independence
Pruning Decision Trees
Majority Class
Pruning Algorithm Steps
Classification Error Estimation
Early Stopping Criteria
Node N in Pruning
Validation Set
Pruning Due to Error
Advantages of Validation Pruning
Disadvantage of Validation Pruning
Quinlan's Confidence Interval Method
Exact Binomial Interval
Normal Approximation
Cost-complexity pruning
1-SE rule
Numerical attributes in trees
Potential splitting points
Missing values
Weighting (Quinlan)
Surrogate splits
Missing Values in Decision Trees
Weighted Voting
ID3 Algorithm
CHAID Algorithm
CART Algorithm
C4.5 Algorithm
Regression Trees
Model Trees
Missing Values Handling
Study Notes
Decision Trees - A Classic ML Algorithm
- Decision trees are a classic machine learning algorithm.
- They were one of the first machine learning algorithms developed in the 1970s.
- They are still used today.
- Newer methods are often more accurate.
- Many software packages include implementations of decision trees.
Decision Trees - What are They?
- Decision trees use a tree-like structure to classify instances.
- Leaves contain decisions.
- Non-leaf nodes contain tests.
- Classification process begins at the root.
- If the node is a leaf, the decision is returned.
- If it's a test node, evaluate the test and move down the corresponding branch.
- The process repeats recursively until a leaf node is reached.
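A minimal sketch of this classification walk in Python, assuming a hypothetical tree representation in which a leaf stores a class label and an internal node stores the tested attribute plus one branch per outcome:

```python
class Leaf:
    """Terminal node: holds the decision (class label)."""
    def __init__(self, label):
        self.label = label

class TestNode:
    """Non-leaf node: holds a test on one attribute and one branch per outcome."""
    def __init__(self, attribute, branches):
        self.attribute = attribute   # name of the attribute tested at this node
        self.branches = branches     # dict: attribute value -> child node

def classify(node, instance):
    """Start at the root; at a leaf return the decision, otherwise evaluate the
    test, follow the matching branch, and repeat recursively."""
    if isinstance(node, Leaf):
        return node.label
    return classify(node.branches[instance[node.attribute]], instance)

# Tiny example tree: a single test on "outlook".
tree = TestNode("outlook", {"sunny": Leaf("no"), "rainy": Leaf("yes")})
print(classify(tree, {"outlook": "rainy"}))   # -> "yes"
```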
Decision Trees - Background
- Decision trees are decent predictors.
- Good implementations are efficient and can handle various data types.
- An area of active research in the 1980s and 1990s.
- Many research papers were produced.
- Modern methods often combine several decision trees creating ensemble methods.
- Bagging, boosting and random forests are some ensemble methods.
- Methods are used for explainable machine learning.
Learning Decision Trees - High Level Algorithm
- If all records in a dataset have the same class, return a leaf node with that class.
- Choose a test for the root of the tree.
- Split the dataset into subsets based on the test's outcomes.
- Recursively build subtrees for each outcome until all records are classified.
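A sketch of this recursive procedure for categorical attributes, reusing the hypothetical Leaf and TestNode classes from the earlier example; choose_test stands for whichever selection criterion is used (information gain, gain ratio, Gini gain, ...):

```python
from collections import Counter

def build_tree(records, attributes, choose_test):
    """records: list of (features_dict, label) pairs; attributes: names still available;
    choose_test(records, attribute) -> score of splitting on that attribute."""
    labels = [y for _, y in records]
    # Stop: all records share one class (or no tests remain) -> leaf with the majority class.
    if len(set(labels)) == 1 or not attributes:
        return Leaf(Counter(labels).most_common(1)[0][0])
    # Choose the test for this node.
    best = max(attributes, key=lambda a: choose_test(records, a))
    remaining = [a for a in attributes if a != best]
    # Split the dataset by the outcomes of the chosen test and recurse on each subset.
    branches = {}
    for value in {x[best] for x, _ in records}:
        subset = [(x, y) for x, y in records if x[best] == value]
        branches[value] = build_tree(subset, remaining, choose_test)
    return TestNode(best, branches)
```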
How Decision Trees Partition the Sample Space
- Tests in decision trees involve single variables.
- They divide the sample space into (hyper)rectangles.
- The divisions are based on the values of single variables.
Choosing a Test in the Root of the Tree
- Many methods exist for selecting the best root test.
- A popular approach is to pick a test which provides the most information.
- A test which provides the largest decrease in uncertainty about the target variable is also useful.
Basics of Information Theory
- Shannon's Information Theory defines the concept of entropy.
- A random variable X has distribution P(X) = (p1, ..., pk).
- Entropy of X is measured as H(X) = −Σᵢ pᵢ log pᵢ, where the sum runs over i = 1, …, k.
- Entropy measures the amount of information or uncertainty about a variable.
- This notion is an important concept.
Properties of Entropy
- 0 ≤ H(X) ≤ log k
- H(X) = 0 if one probability is 1 and all others are 0 (minimal uncertainty).
- H(X) = log k if all probabilities are equal (maximal uncertainty).
- Interpretation: average number of bits learned, given the outcome of a variable X.
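A small sketch of the entropy estimate computed from a sample of outcomes (base-2 logarithm, so the result is in bits), illustrating the minimal and maximal cases above:

```python
import math
from collections import Counter

def entropy(values):
    """H(X) = -sum_i p_i * log2(p_i), with probabilities estimated from the sample."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(entropy(["a", "a", "a", "a"]))   # 0.0   -> one outcome is certain, minimal uncertainty
print(entropy(["a", "b", "a", "b"]))   # 1.0   -> two equally likely outcomes, log2(2)
print(entropy(["a", "a", "a", "b"]))   # ~0.81
```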
Joint Entropy
- Joint entropy of two variables X and Y is the entropy of their joint distribution P(X, Y).
- H(X, Y) = − Σx Σy P(x, y) log P(x, y).
- H(X, Y) ≤ H(X) + H(Y).
- H(X, Y) = H(X) + H(Y) if and only if X and Y are independent.
Mutual Information
- Mutual information of X and Y is the difference between the sum of their individual entropies and their joint entropy:
- I(X, Y) = H(X) + H(Y) – H(X, Y).
- Mutual information measures the amount of information X and Y share.
- I(X, Y) = 0 if X and Y are independent.
- I(X, X) = H(X).
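Using the entropy function sketched above, mutual information can be estimated directly from paired samples of X and Y; the joint entropy is simply the entropy of the pairs:

```python
def mutual_information(xs, ys):
    """I(X, Y) = H(X) + H(Y) - H(X, Y), estimated from paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

x = ["a", "a", "b", "b"]
print(mutual_information(x, x))                      # 1.0 -> I(X, X) = H(X)
print(mutual_information(x, ["c", "d", "c", "d"]))   # 0.0 -> independent in this sample
```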
Conditional Entropy
- Conditional entropy of X given Y = y is
- H(X|Y = y) = − Σx P(x|Y = y) log P(x|Y = y).
- The conditional entropy H(X|Y) is the average of the conditional entropies H(X|Y = y), each weighted by the probability of the corresponding value of Y:
- H(X|Y) = Σy P(Y = y)H(X|Y = y).
- If X ⊥ Y, then H(X|Y) = H(X).
Conditional Entropy – Properties
- If X ⊥ Y, then H(X|Y) = H(X) and H(Y|X) = H(Y).
- H(X|Y) ≤ H(X).
- H(X|Y) = H(X, Y) − H(Y).
Back to Decision Trees - Test Selection Criteria
- Select a test which offers the most information about Y.
- The aim is a test with the largest decrease in uncertainty about Y.
Information Gain
- Class entropy before the split: H(Y).
- Class entropy after the split on a test T is H(Y|T).
- Information gain is measured by Hgain(T) = H(Y) − H(Y|T).
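Continuing the sketch from above, H(Y|T) is the class entropy within each outcome of the test, weighted by the outcome's frequency, and the gain is the resulting drop from H(Y):

```python
def conditional_entropy(test_values, labels):
    """H(Y | T): class entropy within each test outcome, weighted by P(T = t)."""
    n = len(labels)
    total = 0.0
    for outcome in set(test_values):
        subset = [y for t, y in zip(test_values, labels) if t == outcome]
        total += (len(subset) / n) * entropy(subset)
    return total

def information_gain(test_values, labels):
    """H_gain(T) = H(Y) - H(Y | T)."""
    return entropy(labels) - conditional_entropy(test_values, labels)

# The test separates the classes perfectly, so the gain equals H(Y) = 1 bit.
print(information_gain(["sunny", "sunny", "rainy", "rainy"],
                       ["yes", "yes", "no", "no"]))   # 1.0
```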
Information Gain – Properties
- Information gain is non-negative.
- Information gain of 0 means Y is independent of T.
- Information gain is maximal if Y is a function of T (no more uncertainty).
Problems with Information Gain
- Information gain has a bias toward features with many values.
- A feature with unique values for each record could result in a high information gain but not be very useful.
Tests with Many Outcomes – Solution 1: Gain Ratio
- Gain ratio is a method proposed by Quinlan to address the bias of information gain toward attributes with many outcomes.
- Hratio(T) is calculated as Hgain(T)/H(T).
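A sketch on top of the functions above; an identifier-like attribute (such as PESEL) has a very large split entropy H(T), which is exactly what the denominator penalizes:

```python
def gain_ratio(test_values, labels):
    """H_ratio(T) = H_gain(T) / H(T); a test with a single outcome gets ratio 0."""
    split_info = entropy(test_values)            # H(T): entropy of the test's outcomes
    if split_info == 0.0:
        return 0.0
    return information_gain(test_values, labels) / split_info
```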
Gain Ratio – A Problem
- Gain ratio has a side effect of its own: a test with very low entropy H(T) can receive a high ratio even though it provides little information about the class.
- A common workaround is to compute the average information gain over all candidate tests and consider the gain ratio only for tests whose gain is above that average.
Tests with Many Outcomes – Solution 2: Use Only Binary Splits
- The CART algorithm uses only binary tests.
- Numerical attributes are split using tests of the form X < a.
- Categorical attributes use tests of the form X ∈ B, where B ⊆ Dom(X).
Other Splitting Criteria - The Gini Index
- The Gini index is an alternative to entropy and similar.
- Gini(X) = 1 − Σᵢ pᵢ², where the pᵢ are the probabilities of each category.
- Gini(X) = 0 means minimal uncertainty; the maximum value, 1 − 1/k, is reached when all k categories are equally probable (a small example is sketched below).
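A small sketch of the Gini estimate, mirroring the entropy example above:

```python
def gini(values):
    """Gini(X) = 1 - sum_i p_i^2, with probabilities estimated from the sample."""
    counts = Counter(values)
    n = len(values)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

print(gini(["a", "a", "a", "a"]))   # 0.0  -> minimal uncertainty
print(gini(["a", "b"]))             # 0.5  -> maximum for k = 2 classes (1 - 1/2)
```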
Other Splitting Criteria - The χ2 Test
- The chi-square test assesses independence between the target variable and a given test.
- A lower p-value indicates a more significant relationship between the test and the target variable.
- This is the splitting criterion used in the CHAID algorithm.
Other Splitting Criteria
- Other splitting criteria exist in the literature.
- Many work equally well in practice.
Overfitting
- Overfitting occurs when a model adapts too closely to a specific dataset and fails to generalize to new data.
- A decision tree is likely to overfit if it is too complex for the data.
Statistical Issues in Decision Tree Learning
- As the decision tree gets deeper, the training sample size decreases.
- The decisions at deeper levels may be based on random factors, rather than true relationships between attributes.
- This leads to overfitting, because the model may be too influenced by the particularities of the training dataset.
Early Stopping Criteria
- Early stopping means stopping the tree construction early, to prevent overfitting.
- Criteria for stopping include: the tree height exceeding a user-specified value, the number of training cases falling below some cutoff, the information gain falling below a given threshold, or the absence of a statistically significant correlation between any test and the class.
Pruning Decision Trees
- Pruning removes nodes or parts of the decision tree deemed unnecessary, so that the model does not overfit.
- Many approaches exist, including ones that use different error estimates or statistical confidence intervals for a node, subtree, or leaf.
Pruning - Overview
- Pruning subtrees from the decision tree, by labeling leaves with the majority class, can result in smaller and statistically sound trees.
Validation Error Based Pruning
- Split the training data into training and validation sets.
- The validation set is used to estimate the classification error of each subtree and of the leaf that would replace it.
- The algorithm iterates over nodes level by level.
- A subtree is pruned (replaced by a majority-class leaf) whenever doing so does not increase the validation error (a bottom-up variant is sketched below).
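A hedged sketch of one common variant, reduced-error pruning, which works bottom-up and reuses the classify helper and node classes from the earlier examples; the exact traversal order and error estimate may differ from the procedure in the slides:

```python
def reduced_error_prune(node, train, validation):
    """Replace a subtree by a majority-class leaf whenever that does not
    increase the classification error on the validation set."""
    if isinstance(node, Leaf):
        return node
    # Prune the children first (bottom-up), passing each branch its own data subsets.
    for value, child in list(node.branches.items()):
        t_sub = [(x, y) for x, y in train if x[node.attribute] == value]
        v_sub = [(x, y) for x, y in validation if x[node.attribute] == value]
        node.branches[value] = reduced_error_prune(child, t_sub, v_sub)
    # Candidate leaf: the majority class of the training cases reaching this node.
    majority = Counter(y for _, y in train).most_common(1)[0][0]
    err_subtree = sum(classify(node, x) != y for x, y in validation)
    err_leaf = sum(majority != y for _, y in validation)
    return Leaf(majority) if err_leaf <= err_subtree else node
```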
Pruning Based on Confidence Intervals
- Uses the original training data to estimate confidence intervals around errors.
- The upper end of the confidence interval serves as a pessimistic error estimate, and the standard pruning algorithm is applied using this estimate in place of the raw training error.
Cost-Complexity Pruning
- Cost-complexity pruning builds a full tree that is then pruned back in stages with increasing complexity penalties or costs.
- The goal is to find a sequence of optimal pruned trees.
- The optimal subtree is selected to balance cost and complexity in the entire dataset.
Cost-Complexity Pruning – The 1-SE Rule
- From the nested sequence of pruned trees (indexed by the complexity-penalty parameter alpha), first find the tree with the smallest validation error; then select the smallest tree whose error is within one standard error (SE) of that minimum (see the sketch below).
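A sketch of the selection step, assuming each tree in the nested pruning sequence is summarized by its size, its validation (or cross-validation) error, and the standard error of that error estimate:

```python
def one_se_rule(candidates):
    """candidates: list of (num_leaves, validation_error, standard_error) tuples."""
    # Step 1: find the smallest validation error in the sequence and its SE.
    best_err, best_se = min((err, se) for _, err, se in candidates)
    # Step 2: among all trees within one SE of that minimum, take the smallest tree.
    within_one_se = [c for c in candidates if c[1] <= best_err + best_se]
    return min(within_one_se, key=lambda c: c[0])

# Hypothetical pruning sequence: (size, validation error, SE)
sequence = [(15, 0.20, 0.02), (9, 0.19, 0.02), (5, 0.20, 0.02), (2, 0.30, 0.03)]
print(one_se_rule(sequence))   # -> (5, 0.20, 0.02): smallest tree within 0.19 + 0.02
```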
Cost-Complexity Pruning – Example
- Decision tree pruning can often improve the accuracy of a model, particularly when overfitting occurs (i.e., the tree is very accurate on the training set, but its accuracy drops on an independent test set).
Other Aspects of Decision Tree Learning
- Numerical attributes and missing values are additional aspects of how decision trees are constructed, described below.
Numerical Attributes
- Selecting the best split point for a numerical attribute involves sorting X values and selecting midpoints between consecutive different values as potential split points in the tree.
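A sketch of the split-point search for one numerical attribute, reusing the information_gain function from above (any other criterion could be plugged in):

```python
def candidate_split_points(x_values):
    """Midpoints between consecutive distinct values of the sorted attribute."""
    xs = sorted(set(x_values))
    return [(a + b) / 2 for a, b in zip(xs, xs[1:])]

def best_numeric_split(x_values, labels):
    """Threshold a maximizing the gain of the binary test X < a."""
    return max(candidate_split_points(x_values),
               key=lambda a: information_gain([x < a for x in x_values], labels))

x = [1.0, 2.0, 3.0, 8.0, 9.0]
y = ["no", "no", "no", "yes", "yes"]
print(candidate_split_points(x))   # [1.5, 2.5, 5.5, 8.5]
print(best_numeric_split(x, y))    # 5.5 -> separates the classes perfectly
```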
Missing Values
- Data often contains missing values, a problem for many algorithms.
- Techniques such as weighting and surrogate splits are used in decision trees to compensate for missing values.
Missing Values - Weighting
- Records with missing values in the test attribute are ignored when computing the splitting criterion.
- When classifying a new record that lacks a value for the tested attribute, it is sent down all branches of the tree, weighted according to the distribution of outcomes observed in the training set (see the sketch below).
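A hedged sketch of this classification-time weighting, assuming each test node additionally stores a hypothetical fraction mapping (the share of training cases that followed each branch) and reusing the node classes from the earlier examples:

```python
def classify_with_weights(node, instance, weight=1.0, votes=None):
    """Send an instance with a missing test attribute down every branch,
    weighted by that branch's share of training cases; return weighted class votes."""
    votes = {} if votes is None else votes
    if isinstance(node, Leaf):
        votes[node.label] = votes.get(node.label, 0.0) + weight
        return votes
    value = instance.get(node.attribute)              # None means "missing"
    if value is not None:
        return classify_with_weights(node.branches[value], instance, weight, votes)
    for v, child in node.branches.items():
        # node.fraction[v] is an assumed field: P(branch v) estimated on the training set.
        classify_with_weights(child, instance, weight * node.fraction[v], votes)
    return votes

tree = TestNode("outlook", {"sunny": Leaf("no"), "rainy": Leaf("yes")})
tree.fraction = {"sunny": 0.4, "rainy": 0.6}
print(classify_with_weights(tree, {}))   # {'no': 0.4, 'yes': 0.6}
```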
Missing Values - Surrogate Splits
- When a test node's attribute is missing, a similar test that is strongly correlated with it (a surrogate split) can be used in its place.
- Surrogate splits also aid interpretability, since they show which attributes carry similar information about the split.
Concrete Algorithms – ID3
- ID3, developed by Quinlan, is one of the most important early decision tree algorithms to review.
- Its splitting criterion is based on information gain.
- It applies no pruning or early stopping to prevent overfitting, which leads to complex trees.
Concrete Algorithms - CHAID
- CHAID (Chi-square Automatic Interaction Detector) was developed by Kass (1980).
- It is used in some business software due to its suitability for analyzing data in situations where there is little prior knowledge to start with.
Concrete Algorithms - CART
- CART (Classification and Regression Trees) was developed by Breiman et al. in 1984.
- This popular algorithm uses the Gini gain splitting criterion and includes pruning procedures for better generalization, which help prevent overfitting.
Concrete Algorithms - C4.5
- C4.5 was another important development by Quinlan in 1986.
- Probably the most popular decision tree algorithm, it is based on the gain ratio and confidence-interval-based pruning.
- Its full details were not published until Quinlan's 1993 book, C4.5: Programs for Machine Learning.
Decision Trees – Further Developments
- Regression trees: deal with real numbers instead of categorical classes.
- More complex leaves: Model trees, which replace leaves with regression models or other models (e.g., a Naive Bayes classifier).
- More complex splits: allow splits that involve several attributes at once or non-linear combinations of them (e.g., conjunctions of variables, not just one variable at a time).
- Unbiased splits: use statistics-based methods to improve test selection, instead of directly using information gain
Regression Trees
- Decision trees can handle numerical target variables.
- Breiman et al. use variance (variance reduction) as the splitting criterion within regression trees.
- Alternative methods of representing data points in leaves are also used within the algorithm
More Complex Leaves - Model Trees
- In model trees, each leaf node contains some class/regression model.
- Examples include naive Bayes classifiers or logistic regression models; such leaves let a smaller tree represent the data well.
More Complex Splits
- Allows splitting on multiple variables at once.
- This may include linear or non-linear decisions, as long as they provide better results when predicting the target attribute.
Unbiased Decision Trees
- Splitting criteria are biased in favor of tests with many outcomes.
- Some criteria are computationally easier to use.
Unbiased Decision Trees - A Better Approach
- Newer unbiased approaches are based on permutation tests, as presented in a 2006 publication by Hothorn, Hornik and Zeileis and implemented in R's party package.
Literature
- Some critical references concerning Decision Trees for review and further reading, as found in the slides.
Description
This quiz explores the fundamental concepts of decision trees, including their structure, the significance of non-leaf nodes, and classification processes. It also addresses challenges and criteria in decision tree learning, such as information gain and gain ratio, providing insights into their application in modern analytics.