Questions and Answers
What does each node represent in a decision tree?
- An outcome or end result
- A statistical probability
- A possible decision
- A feature (attribute) (correct)
Which of the following best describes a decision tree?
- A diagram or chart used to determine a course of action or show a statistical probability (correct)
- A method of organizing data in a hierarchical, non-linear way
- A detailed report showing all past decisions made by an organization
- A complex algorithm used to determine a specific calculation
The process of dividing a node into sub-nodes is known as pruning.
False (B)
What does the term 'splitting' refer to in the context of decision trees?
In decision tree terminology, what is the 'root node'?
A node that does not split into further sub-nodes is called a ______ or Terminal node.
What is the process of removing sub-nodes from a decision node called?
Pruning is the opposite of splitting.
Which statement is true regarding Parent and Child Nodes:
What is the main goal of the ID3 algorithm?
Which of the following algorithms is an extension of the D3 algorithm?
Which algorithm performs multi-level splits when computing classification trees?
Which of the following is a successor to ID3?
What is the purpose of calculating entropy in the context of the ID3 algorithm?
In the ID3 algorithm, what does Information Gain indicate?
Iterative Dichotomiser 3 is a step in creating a decision tree.
In dataset analysis for creating decision trees, what is the value that you are trying to achieve when picking the attribute?
What kind of approach does the ID3 algorithm follow?
What is the formula for entropy?
Match the following terms with their correct descriptions:
Which of these data mining algorithms uses a probabilistic machine learning algorithm?
What is a Naive Bayes Classifier based on?
What assumption regarding predictors does a Naive Bayes classifier make?
Naive Bayes is called naive because it assumes features in a dataset are always dependent.
A fruit is red, round, and about 3 inches in diameter. In reality, color and dimension are dependent variables, but if they independently contribute to the probability that this fruit is an apple, what type of algorithm is being used?
In machine learning one may want to use a ______ in situations like with spam filtering.
Which of the following best describes the application of Bayes' Theorem?
In the context of neural networks, what is an 'activation function'?
In a neural network, where is the actual processing done?
In a typical neural network, how are layers organized?
Layers in neural networks are made up of interconnected '______' which contain an 'activation function'.
An input layer has weighted connections to output layers.
What is the key characteristic of a deep neural network?
For binary classification, how many neurons does a neural network output layer contain?
If there are n features, how many neurons does the input layer contain?
Match the following description of neural networks to its name.
What is the purpose of the radial basis function in RBF networks?
Which type of neural network is known for retaining information in next layers?
What is a key feature of modular neural networks?
Why would someone use a Convolutional neural network?
Which of the following applications is commonly associated with neural networks?
Flashcards
What is a Decision Tree?
A diagram or chart used to determine a course of action or show a statistical probability.
Node (in a decision tree)
Represents a feature or attribute in a decision tree.
Branch (in a decision tree)
Represents a possible decision or reaction in a decision tree.
Leaf (in a decision tree)
Classification Trees
Regression Trees
Root Node
Splitting (in a decision tree)
Decision Node
Leaf / Terminal Node
Pruning (in a decision tree)
Branch / Sub-Tree
Parent Node
Child Node
ID3 Algorithm
C4.5 Algorithm
CART Algorithm
CHAID Algorithm
MARS Algorithm
What is ID3 Algorithm?
Entropy
Information Gain IG(A)
Naïve Bayes Algorithm
Why is Naive called Naive?
Bayes Equation
Basics of Neural Networks
Input Layer
Hidden Layer
Output Layer
What is Input Layer?
What is Hidden Layer?
Convolutional
Recurrent
Modular
Study Notes
Decision Tree Overview
- A decision tree is a diagram or chart used to determine a course of action or show statistical probability.
- Each node represents a feature or attribute.
- Each branch represents a possible decision, rule, or reaction.
- A leaf represents an outcome or end result; the leaves are the outermost points of the tree.
Types of Decision Trees
- Classification trees have categorical outcomes, such as yes/no or fit/unfit; the decision variable is categorical.
- Regression trees use continuous data types for the decision or outcome variable, such as a number like 123.
Decision Tree Sample Problem
- A decision tree can be used to predict whether a person is fit based on their age, eating habits and physical activity, by asking a sequence of yes/no questions (a binary tree).
- In another sample problem, one can predict whether a customer will pay their renewal premium with an insurance company.
- Customer income is a significant variable there, and a decision tree can predict customer income based on occupation, product and other variables.
- In that case the tree predicts a value for a continuous variable.
Important Decision Tree Terminology
- The root node represents the entire population or sample that is further divided into two or more homogeneous sets.
- Splitting is the process of dividing a node into two or more sub-nodes.
- A decision node is a sub-node that splits into further sub-nodes.
- A leaf/terminal node does not split.
- Pruning is removing sub-nodes from a decision node, essentially the opposite of splitting.
- A branch/sub-tree is a subsection of the entire tree.
- A parent node is a node that is divided into sub-nodes; those sub-nodes are its child nodes.
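The terminology above can be made concrete with a small data structure. This is only an illustrative sketch: the `Node` class, its field names, and the `exercises` feature are hypothetical and do not come from any particular library.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """One node of a decision tree (hypothetical structure for illustration)."""
    feature: Optional[str] = None   # attribute tested at this node; None for a leaf
    outcome: Optional[str] = None   # end result stored at a leaf
    children: dict = field(default_factory=dict)  # branch value -> child Node

    def is_leaf(self) -> bool:
        # A leaf/terminal node does not split, so it has no children.
        return not self.children

    def split(self, feature: str, outcomes: dict) -> None:
        # Splitting: divide this node into sub-nodes, one per branch value.
        self.feature = feature
        self.outcome = None
        self.children = {value: Node(outcome=o) for value, o in outcomes.items()}

    def prune(self, outcome: str) -> None:
        # Pruning: remove the sub-nodes, turning this node back into a leaf.
        self.children = {}
        self.feature = None
        self.outcome = outcome


root = Node()  # root node: represents the whole sample
root.split("exercises", {"yes": "fit", "no": "unfit"})  # root is now a parent node
```

After the split, `root` is the parent and the two nodes in `root.children` are its child (leaf) nodes; calling `root.prune("fit")` would undo the split.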
Decision Tree Algorithms
- ID3 is an extension of the D3 decision tree algorithm.
- C4.5 is a successor of the ID3 decision tree algorithm.
- CART is the Classification And Regression Tree algorithm.
- CHAID is the Chi-square Automatic Interaction Detection algorithm; it performs multi-level splits when computing classification trees.
- MARS is the Multivariate Adaptive Regression Splines algorithm.
Decision Tree with ID3 Algorithm
- Iterative Dichotomiser 3, known as the ID3 algorithm, is a classic algorithm for creating decision trees.
- J. Ross Quinlan developed the ID3 algorithm.
- It is a core algorithm for building decision trees, and it is a supervised learning algorithm used for classification problems.
- It follows a greedy approach: at each node it selects the attribute that yields the maximum Information Gain (IG), which is the same as the minimum Entropy (H) remaining after the split.
- Entropy, also called Shannon Entropy, is denoted by H(S) for a finite set S, and is the measure of the amount of uncertainty or randomness in data.
- Information Gain IG(A) tells how much the uncertainty in S is reduced after splitting set S on attribute A.
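For reference, the two quantities above are usually written as follows. The symbols C (the set of classes), p(c) (the proportion of S belonging to class c) and S_v (the subset of S where attribute A takes value v) are notation introduced here, not taken from the notes.

```latex
H(S) = -\sum_{c \in C} p(c) \log_2 p(c)

IG(A) = H(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|} \, H(S_v)
```

The sum subtracted in the second formula is the weighted average entropy of the subsets produced by the split.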
Steps to create a Decision Tree
- Calculate the entropy of the dataset (the amount of uncertainty in it, based on the number of positive and negative examples).
- For each attribute, compute the entropy of each of its values.
- Calculate the average information entropy of the attribute (the weighted average over its values).
- Calculate the information gain (the difference in entropy before and after splitting the dataset on attribute A).
Decision Tree creation: Root Node
- Creating a Root Node means choosing the attribute that best classifies the training data at the root of the tree.
- Count the number of positive and negative examples in the dataset.
- Compute the entropy of the dataset, Entropy(S), from those counts.
- For the current attribute, determine the entropy for each of its values.
- Calculate the Average Information Entropy for the current attribute.
- Calculate the Information Gain for the current attribute, then repeat the calculation for every other attribute.
- Pick the Highest Gain Attribute for the node
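The root-node procedure can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names and the toy `rows` data (with `outlook`, `windy` and `play` columns) are assumptions made for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(S), in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the whole set minus the average information entropy after the split."""
    base = entropy([row[target] for row in rows])
    groups = {}
    for row in rows:
        # Collect the target labels for each value of the attribute.
        groups.setdefault(row[attribute], []).append(row[target])
    average = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - average


def best_attribute(rows, attributes, target):
    """Pick the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Hypothetical toy dataset: 2 positive and 2 negative examples.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]
root_attribute = best_attribute(rows, ["outlook", "windy"], "play")
```

In this toy data `outlook` separates the classes perfectly while `windy` tells us nothing, so `outlook` is chosen for the root node.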
Repeat Algorithm
- After picking the highest gain attribute, repeat the same procedure for each sub-tree until the tree is complete, that is, until every path ends in a leaf node.
- For example, in a dataset with an Outlook attribute, the subsets where Outlook is Sunny or Rainy are each split again in this way until every subset of the data is covered by the tree.
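The repetition described above is naturally written as a recursive function. The sketch below is one possible way to do it under simplifying assumptions: the tree is returned as nested dictionaries, ties and missing values are not handled specially, and the toy dataset is made up for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain(rows, attribute, target):
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder


def id3(rows, attributes, target):
    labels = [row[target] for row in rows]
    # Leaf node: all examples agree, or there is nothing left to split on.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Greedy step: pick the highest-gain attribute for this node.
    best = max(attributes, key=lambda a: gain(rows, a, target))
    remaining = [a for a in attributes if a != best]
    tree = {best: {}}
    # Repeat the same procedure for the sub-tree under each value of the attribute.
    for value in sorted({row[best] for row in rows}):
        subset = [row for row in rows if row[best] == value]
        tree[best][value] = id3(subset, remaining, target)
    return tree


# Hypothetical toy dataset.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
]
tree = id3(rows, ["outlook", "windy"], "play")
```

Here the Rainy subset is already pure and becomes a leaf, while the Sunny subset is split once more on `windy`.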
Naive Bayes Algorithm
- Naive Bayes is a supervised learning algorithm.
- It is based on Bayes' Theorem, with an assumption of independence among predictors.
Algorithm Characteristics
- The assumption is that all features of a dataset are independent of one another.
- For example, a fruit may be considered an apple if it is red, round, and about 3 inches in diameter; even if these features depend on each other, each one is treated as contributing independently to the probability that the fruit is an apple.
- Thomas Bayes was an English statistician, and Bayes' Theorem was named after him
- Bayes' Theorem allows the class to be predicted from a given set of features using probability.
- The simplified equation for classification finds the probability of Class A given Features 1 and 2.
- If Features 1 and 2 are seen, then the equation determines the probability that the data is Class A.
- The numerator is the probability of Feature 1 given Class A, multiplied by the probability of Feature 2 given Class A, multiplied by the probability of Class A.
- The denominator is the probability of Feature 1 multiplied by the probability of Feature 2.
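Written out, the simplified equation described in the last few bullets looks like this, where F_1 and F_2 are shorthand (introduced here) for Feature 1 and Feature 2:

```latex
P(A \mid F_1, F_2) = \frac{P(F_1 \mid A) \, P(F_2 \mid A) \, P(A)}{P(F_1) \, P(F_2)}
```

Both the product in the numerator and the product in the denominator rely on the naive assumption that the features are independent.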
Naive Bayes - Example
- The most probable class of an unknown long, sweet and yellow fruit can be worked out in 4 steps.
- The training dataset contains 1,000 fruits in three classes (Banana, Orange and Other) with three features (Long, Sweet and Yellow).
- An example dataset breakdown: out of 500 bananas, 400 are long, 350 are sweet and 450 are yellow. Out of 300 oranges, none are long, 150 are sweet and 300 are yellow. Out of the remaining 200 fruit, 100 are long, 150 are sweet and 50 are yellow.
Naive Bayes Algorithm Steps
- Recognize the probability to be calculated. The probability of the class Banana given the features Long, Sweet and Yellow is written P(Banana | Long, Sweet, Yellow).
- Plug in the known data. The numerator terms are P(Long | Banana) = 400/500 = 0.8, P(Sweet | Banana) = 350/500 = 0.7, P(Yellow | Banana) = 450/500 = 0.9, and P(Banana) = 500/1000 = 0.5.
- Multiply everything together as in the equation: 0.8 x 0.7 x 0.9 x 0.5 = 0.252.
- Ignore the denominator, since it is the same for every class.
- Do a similar calculation for the other classes.
- P(Orange | Long, Sweet, Yellow) = 0
- P(Other | Long, Sweet, Yellow) = 0.01875
- Naive Bayes would classify this long, sweet and yellow fruit as a banana, because 0.252 is greater than 0.01875
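The arithmetic in this example can be checked with a short script. It is a sketch that uses the counts from the fruit dataset above; the dictionary layout and the `score` function name are choices made for illustration. As in the steps above, the denominator is ignored, so the values are scores for comparison rather than normalized probabilities.

```python
# Counts from the example: per class, the total and how many fruits have each feature.
counts = {
    "Banana": {"total": 500, "Long": 400, "Sweet": 350, "Yellow": 450},
    "Orange": {"total": 300, "Long": 0, "Sweet": 150, "Yellow": 300},
    "Other": {"total": 200, "Long": 100, "Sweet": 150, "Yellow": 50},
}
n_fruits = sum(c["total"] for c in counts.values())  # 1,000 fruits in total


def score(cls, features):
    """Numerator of the naive Bayes equation: P(class) times each P(feature | class)."""
    c = counts[cls]
    result = c["total"] / n_fruits          # prior, e.g. P(Banana) = 0.5
    for feature in features:
        result *= c[feature] / c["total"]   # e.g. P(Long | Banana) = 0.8
    return result


scores = {cls: score(cls, ["Long", "Sweet", "Yellow"]) for cls in counts}
prediction = max(scores, key=scores.get)    # class with the highest score
```

The scores come out at roughly 0.252 for Banana, 0 for Orange and 0.01875 for Other, matching the hand calculation.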
Why Use Naive Bayes
- Naive Bayes involves only simple arithmetic: tallying counts, multiplying and dividing.
- Once the frequency tables are calculated, classifying an unknown fruit involves calculating the probabilities for all the classes.
- Then simply choose the highest probability.
- Naive Bayes can be effective for spam filtering despite its simplicity.
Neural Network Overview
- A neural network is a computing system made of simple, highly interconnected processing elements.
- These elements process information through their dynamic state response to external inputs.
- Neural networks are organized in layers, and layers are made up of interconnected nodes
- Nodes contain an activation function.
- Patterns are presented to the network via the input layer.
- The input layer sends a signal to one or more hidden layers
- The actual processing is done in the hidden layers through weighted connections; the hidden layers then connect to an output layer.
- The answer is output as shown in the image.
Three Layers of Neural Networks
- There are Input, Hidden & Output Layers in neural networks
- The input layer contains the neurons for the input of features.
- One bias is added to the input layer in addition to the features.
- The input layer therefore has N+1 neurons, where N is the number of features.
- The hidden layers are the intermediate layers between the input and output layers.
- There can be any number of hidden layers.
- A network with more than one hidden layer is called a deep neural network.
- The neurons in a hidden layer get their input from the input layer.
- They provide an output to the output layer.
- The number of neurons in the output layer depends on the number of output classes.
- For a multi-class classification problem, the output layer contains as many neurons as there are classes.
- For binary classification, the output layer contains one neuron.
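A forward pass through the three kinds of layers can be sketched as follows. This is a toy illustration under stated assumptions: the sigmoid activation, the random weights, the layer sizes and the function names are all chosen for the example and are not prescribed by the notes.

```python
import math
import random


def sigmoid(x):
    """Activation function (sigmoid is an assumption; others are possible)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights):
    """Each neuron takes a weighted sum of its inputs and applies the activation."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]


def forward(features, hidden_weights, output_weights):
    inputs = features + [1.0]               # N features plus one bias: N+1 input neurons
    hidden = layer(inputs, hidden_weights)  # processing via weighted connections
    return layer(hidden, output_weights)    # output layer gives the answer


random.seed(0)
n_features, n_hidden = 3, 4
# One row of N+1 weights per hidden neuron.
hidden_weights = [[random.uniform(-1, 1) for _ in range(n_features + 1)]
                  for _ in range(n_hidden)]
# A single output neuron, as used for binary classification.
output_weights = [[random.uniform(-1, 1) for _ in range(n_hidden)]]

output = forward([0.5, -1.2, 3.0], hidden_weights, output_weights)
```

With one sigmoid output neuron, the single value in `output` lies between 0 and 1 and can be read as a score for the positive class; a multi-class network would have one row in `output_weights` per class.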
Types of Neural Networks
- Feed-Forward
- Radial Basis Function (RBF)
- Multilayer Perceptron
- Convolutional
- Recurrent
- Modular
Types of Neural Networks Defined
- In feed-forward networks, data moves only forward to the output node, with no feedback loops.
- Radial Basis Function (RBF) networks classify data points based on their distance from a center point.
- Multilayer Perceptron networks have more than 2 layers, with at least one hidden layer, and are used where the data is not linearly separable.
- Convolutional Neural Networks are an advanced version of the Multilayer Perceptron.
- They contain one or more convolutional layers, which apply filters to the input to produce activations that indicate the location and strength of a detected feature.
- In a Recurrent Neural Network, the output of a particular layer is saved and fed back into the input; this is used, for example, in text-to-speech conversion.
- In a Modular Neural Network, different networks function independently and carry out sub-tasks without interacting with each other.
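To illustrate the RBF idea from the list above, here is a minimal sketch of a radial basis function. The Gaussian form and the `gamma` width parameter are assumptions made for this example; the notes only say that the function depends on the distance from a center.

```python
import math


def rbf(point, center, gamma=1.0):
    """Gaussian radial basis function: the value depends only on the distance to the center."""
    squared_distance = sum((p - c) ** 2 for p, c in zip(point, center))
    return math.exp(-gamma * squared_distance)


center = [0.0, 0.0]
near = rbf([1.0, 0.0], center)  # point close to the center: larger activation
far = rbf([3.0, 0.0], center)   # point far from the center: activation near zero
```

Points at the same distance from the center get the same value regardless of direction, which is what "radial" refers to.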
Applications of Neural Networks
- With images, neural networks are used for character recognition, image classification or labeling, object detection and image generation.
- With natural language text, they are used for text classification and categorization, language generation and document summarization.
- With signals, applications include speech recognition.
- They are also applied in Aerospace, Automotive, Military, Electronics, Financial, Industrial, Medical, Telecommunications, Transportation, Software, Time Series Prediction, Signal Processing, Control, and Anomaly Detection.