Decision Trees: Types and Examples

Questions and Answers

What does each node represent in a decision tree?

  • An outcome or end result
  • A statistical probability
  • A possible decision
  • A feature (attribute) (correct)

Which of the following best describes a decision tree?

  • A diagram or chart used to determine a course of action or show a statistical probability (correct)
  • A method of organizing data in a hierarchical, non-linear way
  • A detailed report showing all past decisions made by an organization
  • A complex algorithm used to determine a specific calculation

The process of dividing a node into sub-nodes is known as pruning.

False (B)

What does the term 'splitting' refer to in the context of decision trees?

Dividing a node into two or more sub-nodes (A)

In decision tree terminology, what is the 'root node'?

The starting point representing the entire population or sample (C)

A node that does not split into further sub-nodes is called a ______ or Terminal node.

leaf

What is the process of removing sub-nodes from a decision node called?

Pruning (C)

Pruning is the opposite of splitting.

True (A)

Which statement is true regarding parent and child nodes?

A parent node is divided into sub-nodes called child nodes. (A)

What is the main goal of the ID3 algorithm?

To construct decision trees (C)

Which of the following algorithms is an extension of the D3 algorithm?

ID3 (B)

Which algorithm performs multi-level splits when computing classification trees?

CHAID (A)

Which of the following is a successor to ID3?

C4.5 (B)

What is the purpose of calculating entropy in the context of the ID3 algorithm?

To measure the amount of uncertainty or randomness in data (C)

In the ID3 algorithm, what does Information Gain indicate?

The reduction in uncertainty after splitting on an attribute (B)

Iterative Dichotomiser 3 is a step in creating a decision tree.

True (A)

When analyzing a dataset to create a decision tree, which attribute are you trying to find when picking the attribute for a node?

The highest gain attribute

What kind of approach does the ID3 algorithm follow?

Greedy (A)

What is the formula for entropy?

$Entropy = - \sum_{i=1}^{n} p_{i} \log_{2}(p_{i})$ (B)

Match the following terms with their correct descriptions:

  • Root Node = Represents the entire population or sample.
  • Decision Node = A sub-node that splits into further sub-nodes.
  • Leaf/Terminal Node = A node that does not split.
  • Parent Node = A node that is divided into sub-nodes (child nodes).

Which of these data mining algorithms uses a probabilistic machine learning algorithm?

Naive Bayes (C)

What is a Naive Bayes Classifier based on?

Bayes' Theorem (B)

What assumption regarding predictors does a Naive Bayes classifier make?

They are independent. (A)

Naive Bayes is called naive because it assumes features in a dataset are always dependent.

False (B)

A fruit is red, round, and about 3 inches in diameter. In reality, color and dimension are dependent variables, but if they are treated as contributing independently to the probability that this fruit is an apple, what type of algorithm is being used?

Naive Bayes

In machine learning, one may want to use a ______ classifier in situations such as spam filtering.

Naive Bayes

Which of the following best describes the application of Bayes' Theorem?

Predicting a class given a set of features using probability (B)

In the context of neural networks, what is an 'activation function'?

A function that determines the output of a node based on its input (A)

In a neural network, where is the actual processing done?

Hidden layers (A)

In a typical neural network, how are layers organized?

Interconnected (C)

Layers in neural networks are made up of interconnected '______' which contain an 'activation function'.

nodes

An input layer has weighted connections to output layers.

False (B)

What is the key characteristic of a deep neural network?

It has more than one hidden layer. (A)

For binary classification, how many neurons does a neural network output layer contain?

One

If there are n features, how many neurons does the input layer contain?

n + 1 (D)

Match each description of a neural network to its name.

  • Multilayer Perceptron = Used when data is not linearly separable.
  • Modular = Built from foundation blocks that function independently.
  • Feed-forward = Data moves forward only until it reaches the output node.
  • Recurrent = Can help with text-to-speech transformation.

What is the purpose of the radial basis function in RBF networks?

Enabling reasonable interpolation while fitting data (A)

Which type of neural network is known for retaining information in next layers?

Recurrent (A)

What is a key feature of modular neural networks?

They perform tasks independent of each other. (D)

Why would someone use a convolutional neural network?

It is an advanced version of the Multilayer Perceptron. (A)

Which of the following applications is commonly associated with neural networks?

Text-to-speech conversion (D)

Flashcards

What is a Decision Tree?

A diagram or chart used to determine a course of action or show a statistical probability.

Node (in a decision tree)

Represents a feature or attribute in a decision tree.

Branch (in a decision tree)

Represents a possible decision or reaction in a decision tree.

Leaf (in a decision tree)

The farthest branches on the tree, representing an outcome or the end results.

Classification Trees

Decision trees with Yes/No or categorical outcomes.

Regression Trees

Decision trees with continuous data types as outcomes.

Root Node

Represents the entire population or sample in a decision tree.

Splitting (in a decision tree)

The process of dividing a node into two or more sub-nodes.

Decision Node

A node that splits into further sub-nodes.

Leaf / Terminal Node

Nodes that do not split any further.

Pruning (in a decision tree)

Removing sub-nodes from a decision node. Opposite of splitting.

Branch / Sub-Tree

A subsection of the entire tree.

Parent Node

A node divided into sub-nodes.

Child Node

Sub-nodes of a parent node.

ID3 Algorithm

Extension of D3 algorithm.

C4.5 Algorithm

Successor of ID3 algorithm.

CART Algorithm

Classification and Regression Tree.

CHAID Algorithm

Stands for Chi-square Automatic Interaction Detection.

MARS Algorithm

Stands for Multivariate Adaptive Regression Splines.

What is ID3 Algorithm?

Stands for Iterative Dichotomiser 3.

Entropy

The amount of uncertainty or randomness in data.

Information Gain IG(A)

How much uncertainty in S was reduced after splitting set S on attribute A.

Naïve Bayes Algorithm

Naïve Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature.

Why is Naive Bayes called naive?

The assumption that all features of a dataset are independent.

Bayes Equation

Finds the probability of Class A given Features 1 and 2.

Basics of Neural Networks

Networks are typically organized in layers.

Input Layer

The layer that presents the patterns to the network.

Hidden Layer

Layers where the actual processing is done.

Output Layer

The layer that gives the answer as output.

What is Input Layer?

This layer contains one neuron per input feature, plus one for the bias.

What is Hidden Layer?

The hidden layer is the intermediate layer between the input and output layers.

Convolutional

A network that is an advanced version of the Multilayer Perceptron.

Recurrent

Networks where the output of a particular layer is saved and is put back into the input again.

Modular

A network built from independently functioning modules; modularity is its basic foundation block.

Study Notes

Decision Tree Overview

  • A decision tree is a diagram or chart used to determine a course of action or show statistical probability.
  • Each node represents a feature or attribute.
  • Each branch represents a possible decision, rule, or reaction.
  • Leaves represent outcomes or end results; they are the farthest branches of the tree.

Types of Decision Trees

  • Classification trees have 'yes' or 'no' types of outcomes, such as fit or unfit, and use categorical decision variables.
  • Regression trees use continuous data types for the decision or outcome variable, such as a number like 123.

Decision Tree Sample Problem

  • A decision tree can be used to predict whether a person is fit based on their age, eating habits and physical activity, using a binary tree of yes/no questions.
  • In another sample problem, one can predict whether a customer will pay their renewal premium with an insurance company.
  • Customer income is a significant variable here, and a decision tree can predict customer income based on occupation, product and other variables.
  • In that case the tree predicts values for a continuous variable.

Important Decision Tree Terminology

  • The root node represents the entire population or sample that is further divided into two or more homogeneous sets.
  • Splitting is the process of dividing a node into two or more sub-nodes.
  • A decision node is a sub-node that splits into further sub-nodes.
  • A leaf/terminal node does not split.
  • Pruning is removing sub-nodes from a decision node, essentially the opposite of splitting.
  • A branch/sub-tree is a subsection of the entire tree.
  • A parent node is divided into sub-nodes, and those sub-nodes are the children of the parent node.
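
The terminology above maps naturally onto a small data structure. The sketch below is one hypothetical way to model it in Python; the `Node` class and its `is_leaf`, `split` and `prune` methods are illustrative names chosen here, not part of the lesson or of any library.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One node in a decision tree (hypothetical illustration)."""
    label: str
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        # A leaf/terminal node has no sub-nodes.
        return not self.children

    def split(self, *labels: str) -> List["Node"]:
        # Splitting divides this node into two or more sub-nodes (children).
        if len(labels) < 2:
            raise ValueError("a split needs at least two sub-nodes")
        self.children = [Node(name) for name in labels]
        return self.children

    def prune(self) -> None:
        # Pruning removes the sub-nodes, so this node becomes a leaf again.
        self.children = []


root = Node("entire sample")                       # root node
young, old = root.split("age < 30", "age >= 30")   # root is now a parent node
young.split("eats pizza", "does not eat pizza")    # 'young' is now a decision node
```

Here `old` is a leaf because it never splits, and calling `young.prune()` would undo the second split, which is the sense in which pruning is the opposite of splitting.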

Decision Tree Algorithms

  • ID3 (Iterative Dichotomiser 3) is described as an extension of the D3 decision tree algorithm.
  • C4.5 is a successor of the ID3 decision tree algorithm.
  • CART is the Classification And Regression Tree algorithm.
  • CHAID is the Chi-square Automatic Interaction Detection algorithm, and it performs multi-level splits when computing classification trees.
  • MARS is the Multivariate Adaptive Regression Splines algorithm.

Decision Tree with ID3 Algorithm

  • The Iterative Dichotomiser 3, known as the ID3 algorithm, is one of the best-known algorithms for creating decision trees.
  • J. Ross Quinlan developed the ID3 algorithm.
  • It is a core algorithm for building decision trees, and it is a supervised learning algorithm used for classification problems.
  • It follows a greedy approach: for each node it selects the attribute that yields the maximum Information Gain (IG), which is the same as the minimum Entropy (H) after the split.
  • Entropy, also called Shannon Entropy, is denoted by H(S) for a finite set S, and is the measure of the amount of uncertainty or randomness in data.
  • Information Gain IG(A) tells how much the uncertainty in S is reduced after splitting set S on attribute A.

Steps to create a Decision Tree

  • Calculate the entropy of the dataset (the amount of uncertainty, based on the number of positive and negative examples).
  • Compute the entropy of each attribute.
  • Calculate the average information (the weighted average entropy after splitting on an attribute).
  • Calculate the information gain (the difference in entropy before and after splitting the dataset on attribute A).
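
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the lesson's own code: the function names, the dictionary-based row format and the four-row toy dataset are all assumptions made for the example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy H(S): sum of -p_i * log2(p_i) over the class labels."""
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the whole set minus the average (weighted) entropy after the split."""
    before = entropy([row[target] for row in rows])
    average_info = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        average_info += len(subset) / len(rows) * entropy(subset)
    return before - average_info


# Hypothetical toy data: 2 positive and 2 negative examples.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]

print(entropy([r["play"] for r in rows]))         # 1.0: maximum uncertainty for two classes
print(information_gain(rows, "outlook", "play"))  # 1.0: outlook separates the classes perfectly
print(information_gain(rows, "windy", "play"))    # 0.0: windy removes no uncertainty
```

In this toy case `outlook` would be chosen over `windy` because it has the higher gain.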

Decision Tree creation: Root Node

  • Creating a root node means choosing the attribute that best classifies the training data at the root of the tree.
  • Count the number of positive and negative examples.
  • Compute the entropy of the dataset, Entropy(S), from those counts.
  • For the current attribute, determine the entropy for each of its values.
  • Calculate the average information entropy for the current attribute.
  • Calculate the information gain for the current attribute, then repeat to get the gain for each attribute.
  • Pick the highest gain attribute for the node.

Repeat Algorithm

  • After picking the highest gain attribute, repeat the same procedure for each sub-tree until the desired tree is obtained (the last node on every branch should be a leaf node).
  • In the example dataset, attributes such as Outlook (Sunny or Rainy) are split this way until every subset of the data is included in the tree.
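
The "pick the highest gain attribute, then repeat on each sub-tree" loop can be written as a short recursive function. The version below is a simplified sketch of the ID3 idea under stated assumptions: it handles only categorical attributes, does no pruning, and uses a hypothetical six-row dataset loosely modelled on the Outlook example. The names `id3`, `gain` and `entropy` are chosen here for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())


def gain(rows, attribute, target):
    # Entropy before the split minus the weighted entropy after it.
    after = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        after += len(subset) / len(rows) * entropy(subset)
    return entropy([r[target] for r in rows]) - after


def id3(rows, attributes, target):
    labels = [r[target] for r in rows]
    # Stop with a leaf when the node is pure or no attributes remain.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Greedy step: pick the highest gain attribute for this node.
    best = max(attributes, key=lambda a: gain(rows, a, target))
    remaining = [a for a in attributes if a != best]
    # Repeat the same procedure on the sub-tree for each value of that attribute.
    branches = {}
    for value in {r[best] for r in rows}:
        subset = [r for r in rows if r[best] == value]
        branches[value] = id3(subset, remaining, target)
    return {best: branches}


# Hypothetical toy data (not the lesson's dataset).
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]
tree = id3(rows, ["outlook", "windy"], "play")
print(tree)
```

The result is a nested dictionary: `outlook` is chosen at the root, the `sunny` branch ends immediately in a `yes` leaf, and the `rainy` branch splits once more on `windy`.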

Naive Bayes Algorithm

  • Naive Bayes is a supervised learning algorithm.
  • This class of algorithms assumes independence among predictors.

Algorithm Characteristics

  • The assumption is that all features of a dataset are independent.
  • A fruit may be considered an apple if it is red, round, and about 3 inches in diameter; even if these features depend on each other, each one is treated as contributing independently to the probability that the fruit is an apple.
  • Thomas Bayes was an English statistician, and Bayes' Theorem was named after him.
  • Bayes' Theorem allows users to predict the class given a set of features using probability.
  • The simplified equation for classification finds the probability of Class A given Features 1 and 2.
  • If Features 1 and 2 are seen, then the equation determines the probability that the data is Class A.
  • The numerator is the probability of Feature 1 given Class A, multiplied by the probability of Feature 2 given Class A, multiplied by the probability of Class A.
  • The denominator is the probability of Feature 1 multiplied by the probability of Feature 2.
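
Written out, the numerator and denominator described above give the following equation. The symbols $A$, $F_1$ and $F_2$ are shorthand introduced here for Class A, Feature 1 and Feature 2.

$$P(A \mid F_1, F_2) = \frac{P(F_1 \mid A)\, P(F_2 \mid A)\, P(A)}{P(F_1)\, P(F_2)}$$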

Naive Bayes - Example

  • The class of an unknown long, sweet and yellow fruit is worked out in a short series of steps.
  • In the example, a training dataset of 1,000 fruits contains three classes (Banana, Orange and Other) and three features (Long, Sweet and Yellow).
  • The dataset breaks down as follows: out of 500 bananas, 400 are long, 350 are sweet and 450 are yellow. Out of 300 oranges, none are long, 150 are sweet and 300 are yellow. Out of the remaining 200 fruit, 100 are long, 150 are sweet and 50 are yellow.

Naive Bayes Algorithm Steps

  1. Recognize the probability to compute. The probability of the class Banana given the features Long, Sweet and Yellow is written P(Banana | Long, Sweet, Yellow).
  2. Plug in the known data. The numerator uses P(Long | Banana) = 400/500 = 0.8, P(Sweet | Banana) = 350/500 = 0.7, P(Yellow | Banana) = 450/500 = 0.9, and P(Banana) = 500/1000 = 0.5.
  3. Multiply everything together as in the equation: 0.8 x 0.7 x 0.9 x 0.5 = 0.252.
  4. Ignore the denominator, because it is the same for every class.
  5. Do a similar calculation for the other classes, again using only the numerator.
    • P(Orange | Long, Sweet, Yellow) = 0
    • P(Other | Long, Sweet, Yellow) = 0.01875
    • Naive Bayes would classify this long, sweet and yellow fruit as a banana, because 0.252 is greater than 0.01875.
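
The same arithmetic can be reproduced in Python. This is a minimal sketch that uses the counts from the fruit example; the `counts` table layout and the `score` function name are assumptions made for illustration, and as in step 4 only the numerator is computed.

```python
# Counts from the 1,000-fruit training set in the example above.
counts = {
    "Banana": {"total": 500, "Long": 400, "Sweet": 350, "Yellow": 450},
    "Orange": {"total": 300, "Long": 0, "Sweet": 150, "Yellow": 300},
    "Other": {"total": 200, "Long": 100, "Sweet": 150, "Yellow": 50},
}
n_fruits = sum(c["total"] for c in counts.values())


def score(fruit, features):
    """Numerator of the Naive Bayes equation for one class."""
    c = counts[fruit]
    result = c["total"] / n_fruits          # prior, e.g. P(Banana) = 0.5
    for feature in features:
        result *= c[feature] / c["total"]   # likelihood, e.g. P(Long | Banana) = 0.8
    return result


scores = {fruit: score(fruit, ["Long", "Sweet", "Yellow"]) for fruit in counts}
best = max(scores, key=scores.get)          # choose the highest score
print(scores)
print(best)   # Banana
```

Up to floating-point rounding, the scores match the hand calculation: about 0.252 for Banana, 0 for Orange and about 0.01875 for Other.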

Why Use Naive Bayes

  • Naive Bayes involves only simple arithmetic: tallying counts, multiplying and dividing.
  • Once the frequency tables are calculated, classifying an unknown fruit involves calculating the probabilities for all the classes.
  • Then simply choose the highest probability.
  • Naive Bayes can be effective for spam filtering despite its simplicity.

Neural Network Overview

  • A neural network is a computing system made of simple, highly interconnected processing elements.
  • These elements process information through their dynamic state response to external inputs.
  • Neural networks are organized in layers, and layers are made up of interconnected nodes.
  • Nodes contain an activation function.
  • Patterns are presented to the network via the input layer.
  • The input layer sends a signal to one or more hidden layers.
  • Actual processing using weighted connections occurs in the hidden layers, which then connect to an output layer.
  • The answer is output as shown in the image.

Three Layers of Neural Networks

  • Neural networks have input, hidden and output layers.
  • The input layer contains the neurons for the input of features.
  • One bias is added to the input layer in addition to the features.
  • The input layer therefore has N+1 neurons, where N is the number of features.
  • The hidden layers are the intermediate layers between the input and output layers.
  • There can be any number of hidden layers.
  • If there is more than one hidden layer, the network is called a deep neural network.
  • The neurons in the hidden layer get input from the input layer.
  • They provide an output to the output layer.
  • The number of neurons in the output layer depends on the number of output classes.
  • For a multi-class classification problem, the output layer has as many neurons as there are classes.
  • For binary classification, the output layer has one neuron.
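
To make the three layers concrete, here is a minimal forward pass for a binary classifier in plain Python. It is only a sketch under assumptions: the weights and biases are made-up numbers, the sigmoid is just one possible activation function, and the bias is written as a separate term per neuron, which is equivalent to the extra "+1" input described above.

```python
import math


def sigmoid(x):
    # One common activation function; it squashes any input into (0, 1).
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    # Each neuron: weighted sum of its inputs plus a bias, passed through the activation.
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


features = [0.5, -1.0, 2.0]        # N = 3 input features

# Hypothetical parameters: 2 hidden neurons, each with 3 weights and a bias.
hidden_weights = [[0.1, 0.4, -0.2], [-0.3, 0.2, 0.5]]
hidden_biases = [0.0, 0.1]
# One output neuron, as used for binary classification.
output_weights = [[0.7, -0.6]]
output_biases = [0.05]

hidden = layer(features, hidden_weights, hidden_biases)
output = layer(hidden, output_weights, output_biases)
print(output)   # a list with a single value between 0 and 1
```

Adding more entries like `hidden_weights` between the input and the output would give more than one hidden layer, which is what makes a network "deep".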

Types of Neural Networks

  • Feed-Forward
  • Radial Basis Function (RBF)
  • Multilayer Perceptron
  • Convolutional
  • Recurrent
  • Modular

Types of Neural Networks Defined

  • Feed-forward networks are neural networks in which data only moves forward to the output node, with no feedback.
  • Radial Basis Function (RBF) networks measure the distance of data points with respect to a center.
  • Multilayer Perceptron networks have more than 2 layers, with at least one hidden layer, and are used where the data is not linearly separable.
  • Convolutional Neural Networks are an advanced version of the Multilayer Perceptron.
  • They contain one or more convolutional layers whose filters produce activations that indicate the location and strength of a detected feature.
  • In a Recurrent Neural Network, the output of a particular layer is saved and fed back into the input; this is especially useful for text-to-speech conversion.
  • In a Modular Neural Network, different independently functioning networks carry out sub-tasks without interacting with each other.
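
As an illustration of the "distance from a center" idea behind RBF networks, the sketch below uses a Gaussian radial basis function. The Gaussian form and the `gamma` width parameter are assumptions made here; the lesson does not say which radial function is used.

```python
import math


def gaussian_rbf(point, center, gamma=1.0):
    # Squared Euclidean distance between the data point and the center.
    squared_distance = sum((p - c) ** 2 for p, c in zip(point, center))
    # The response is highest at the center and decays as the distance grows.
    return math.exp(-gamma * squared_distance)


center = [0.0, 0.0]
print(gaussian_rbf([0.0, 0.0], center))   # 1.0 at the center
print(gaussian_rbf([1.0, 0.0], center))   # smaller, one unit away
print(gaussian_rbf([3.0, 4.0], center))   # close to 0, five units away
```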

Applications of Neural Networks

  • For images, neural networks are used for character recognition, image classification or labeling, object detection and image generation.
  • For natural language, neural networks are used for text classification and categorization, language generation, and document summarization.
  • For signals, neural networks are used for speech recognition.
  • They are also applied in aerospace, automotive, military, electronics, financial, industrial, medical, telecommunications, transportation, software, time series prediction, signal processing, control, and anomaly detection.
