Questions and Answers
Decision tree algorithms fall under which category of machine learning?
- Unsupervised learning
- Supervised learning (correct)
- Semi-supervised learning
- Reinforcement learning
Decision tree algorithms can only be applied to classification problems and not regression problems.
False
Which of the following best describes decision trees?
- A white box containing a set of rules. (correct)
- A black box containing a set of rules.
- A support vector machine.
- A neural network with multiple hidden layers.
In a decision tree, the ______ of the tree test the attributes.
In the context of decision trees, what do the 'leaves' of the tree represent?
Describe the difference between binary classification and multi-class classification in the context of decision trees.
In decision trees, there is only one possible correct decision tree for a given dataset.
Which of the following is the primary goal when constructing a decision tree?
What does minimizing the expected number of tests help to achieve in the context of decision trees?
The measure of ______ is used to minimize the expected number of tests when classifying in decision trees.
What is the purpose of calculating information gain?
Define 'entropy' in the context of decision trees and information theory.
A higher entropy value signifies less uncertainty in a dataset.
What does a low information gain suggest about an attribute?
The attribute with the ______ information gain is chosen as the splitting attribute.
Match the following terms with their descriptions in the context of decision trees:
What is the first step in constructing a decision tree?
Once a decision tree model is developed, it cannot be used for prediction on new data.
During the data preparation phase for a decision tree, what is a typical task done?
Name two algorithms used to construct decision trees.
Flashcards
Supervised Algorithms
Algorithms that learn from labeled training data, used for classification or regression.
Regression Problem
Predicting a continuous value, like temperature or credit amount.
Classification Problem
Predicting a category, like whether an email is spam or not.
Decision Trees
Nodes
Leaves
Binary Classification
Multi-class Classification
White Box
Dataset
Data Division
Random Attribute Selection
Algorithmic Attribute Selection
ID3
Data Pure
Data non Pure
Best Attribute
Entropy
Information Gain
Tests Minimization
Study Notes
- This document is a course about decision trees in Artificial Intelligence.
- It is for Licence SSD & MID level.
- The course is prepared and taught by Pr. Sanae KHALI ISSA.
Introduction
- Decision tree algorithms are an example of supervised machine learning algorithms.
- They are applied to classification or regression problems, depending on the type of variables.
- Continuous variables are for regression problems.
- Categorical variables are for classification problems.
- Examples of regression problems include predicting sales, the amount of a credit, and the temperature.
- Examples of classification problems include predicting whether a person is diabetic, whether an email is spam, whom a citizen will vote for, and which club a person supports.
- Decision tree algorithms are popular for classification problems.
- An example shows how to construct a knowledge base to predict the state of a new patient (sick or healthy) based on symptoms.
- A table shows the patient's temperature, sore throat, and whether they are sick.
- A decision tree shows how to classify patients based on these attributes.
Algorithm Principle
- Decision trees are classifiers for entities represented in an attribute/value format.
- The nodes of the tree test the attributes.
- There is a branch for each value of the tested attribute.
- The leaves indicate the classes.
- Binary classification has two classes.
- Multi-class classification has multiple classes.
- A decision tree is a white box containing a set of rules.
- Based on the values of an attribute, you can know which class an element belongs to.
- The classes C1, C2, C3, C7, C8, and C9 are shown, with V1, V2 being the values of attribute A1 and V'1, V'2, V'3 being the values of attribute A2.
- Several rules express the class-determination logic, e.g. if (A1 = V1) and (A2 = V'1) then C1, and so on (see the sketch below).
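To make the white-box view concrete, here is a minimal sketch of such rules written as plain code; the class assignments other than C1 are placeholders, since only the first rule is spelled out in the notes:

```python
# A decision tree read as explicit rules: each root-to-leaf path is one rule.
# A1 and A2 are generic attributes; V1, V2 and V'1 are their values; C1..C4 are classes.
def classify(a1, a2):
    if a1 == "V1":
        if a2 == "V'1":
            return "C1"   # rule from the notes: if (A1 = V1) and (A2 = V'1) then C1
        return "C2"       # placeholder class for the remaining values of A2
    # placeholder branch for A1 = V2
    if a2 == "V'1":
        return "C3"
    return "C4"

print(classify("V1", "V'1"))   # -> C1
```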
Decision Tree Example
- Attributes or variables are features used to make decisions.
- A table shows attributes such as the player's rating (Bon, Moyen, Mauvais) and the recorded result (Gagné, Nul).
- A decision tree illustrates how to classify players based on their rating in order to predict the result.
Decision Tree Construction steps
- Step 1: Find a dataset and prepare the data for division into training and test sets.
- Step 2: Create a model by defining the general structure of the tree, specifying the root, branches, and leaves.
- Step 3: Apply the developed model to make predictions (a minimal sketch of these three steps is given below).
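As a hedged illustration of the three steps, here is a minimal scikit-learn sketch; it assumes scikit-learn is installed and the attributes are already numerically encoded, and X and y are placeholder data rather than a dataset from the course:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Step 1: prepare the data and divide it into training and test sets.
# X and y are placeholders: rows of already-encoded attribute values and their classes.
X = np.array([[1, 0], [1, 1], [0, 0], [0, 1]])
y = np.array([1, 1, 0, 0])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Step 2: create the model, i.e. let the algorithm build the root, branches and leaves.
model = DecisionTreeClassifier(criterion="entropy")  # entropy is the impurity measure used here
model.fit(X_train, y_train)

# Step 3: apply the developed model to make predictions on new data.
print(model.predict(X_test))
```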
Decision Tree Construction
- To construct a decision tree, you should decide how to select the attribute that represents the root and the order of other attributes.
- Method 1 is random selection of the attributes.
- Method 2 is selecting attributes by applying a precise algorithm, such as ID3 (Iterative Dichotomiser 3), C4.5, C5 (successors of ID3), or CART (Classification And Regression Tree).
Attributes Selection
- The algorithm for constructing the tree is DT(T, E).
- If all the examples in E belong to the same class Ci (the data is pure), assign the label Ci to the current node.
- Otherwise (the data is not pure), select the best attribute A with values v1, v2, ..., vn.
- Partition E according to v1, ..., vn into E1, ..., En.
- For j = 1 to n, call DT(Tj, Ej).
- With A = {v1, v2, ..., vn} and E = E1 ∪ ... ∪ En (a recursive sketch of this procedure is shown below).
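Here is a minimal Python sketch of this recursive procedure; it assumes each example is a dict of attribute values plus a "class" key, and `best_attribute` is a caller-supplied function standing in for the selection step discussed in the following sections:

```python
from collections import Counter

def build_tree(examples, attributes, best_attribute):
    """Recursive sketch of DT(T, E); returns either a class label or a nested dict."""
    classes = [e["class"] for e in examples]
    if len(set(classes)) == 1:                 # data pure: all examples in the same class Ci
        return classes[0]                      # label the current node with Ci
    if not attributes:                         # no attribute left: fall back to the majority class
        return Counter(classes).most_common(1)[0][0]
    a = best_attribute(examples, attributes)   # data not pure: select the best attribute A
    tree = {a: {}}
    for v in {e[a] for e in examples}:         # partition E according to the values v1..vn of A
        subset = [e for e in examples if e[a] == v]
        remaining = [x for x in attributes if x != a]
        tree[a][v] = build_tree(subset, remaining, best_attribute)
    return tree
```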
Attributes Selection Example
- In this example, a dataset is used to predict whether a tennis match will take place, depending on the following attributes: Sky, Temperature, Humidity, and Wind.
- The values of the target attribute Jouer are either NON or OUI.
- E = {e1, e2, e3, e4, e5, e6, e7, e8, e9, e10, e11, e12, e13, e14}
- The attributes are Sky, Temperature, Humidity, Wind
- One approach to selecting attributes is random selection
- Splitting on Sky creates three subsets, with branches for Soleil, Couvert, and Pluie.
- Humidity and Wind are then used to divide the subsets further (a small partitioning sketch is given below).
- A short exercise: construct a decision tree for the same problem, this time choosing the attribute Température as the root.
- The goal is to find the most representative attribute.
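Here is a small sketch of the partitioning step, assuming examples are stored as dicts; the three rows shown are placeholders in the spirit of the tennis data, not the actual 14 examples:

```python
# Partition the example set E into subsets according to the values of one attribute.
def partition(examples, attribute):
    subsets = {}
    for e in examples:
        subsets.setdefault(e[attribute], []).append(e)
    return subsets

# Placeholder rows, not the full course dataset.
E = [
    {"Sky": "Soleil",  "Humidity": "High",   "Jouer": "NON"},
    {"Sky": "Couvert", "Humidity": "High",   "Jouer": "OUI"},
    {"Sky": "Pluie",   "Humidity": "Normal", "Jouer": "OUI"},
]
print(partition(E, "Sky"))   # one subset per branch: Soleil, Couvert, Pluie
```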
Attributes Selection Solution
- The "Température" tree is shown.
- The branches are "Frais", "Bon", and "Chaud".
- Several correct decision trees can be constructed for the same problem; some are simple and others complicated.
- Objective: construct a simple tree.
- The attribute to select is the one that leads to a simple tree.
- Method: minimize the expected number of tests needed to classify a new object, using an impurity measure.
Impurity Measure Definition (Entropy)
- Boltzmann's entropy in thermodynamics and Shannon's generalized entropy are discussed.
- Shannon proposed it in 1949 to measure the entropy of discrete probability distributions.
- It expresses the quantity of information, that is, the number of bits needed to specify the distribution.
- The formula for the information entropy is: I = -Σ Pi × log2(Pi)
- Here Pi is the probability of class Ci in the dataset.
- Example of entropy calculation, using the weather data and the Jouer attribute from the previous example.
- n: total number of examples | n1: number of examples in the NON class | P1: probability of the NON class | n2: number of examples in the OUI class | P2: probability of the OUI class.
- Equation: I = -Σ Pi × log2(Pi), where the sum runs over the k classes.
- Result: I = 0.940 bits
Purity Measurement Solution
- The dataset consists of 5 triangles and 9 squares.
- Compute the entropy:
- I = -Σ Pi × log2(Pi)
- I = -P(triangle) × log2(P(triangle)) - P(square) × log2(P(square))
- I = -5/14 × log2(5/14) - 9/14 × log2(9/14)
- I = 0.940 bits (this computation is checked in the short sketch below).
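As a quick check, the same computation in a short Python sketch (assuming only the standard library):

```python
import math

def entropy(class_counts):
    """I = -sum(P_i * log2(P_i)) over the classes, with P_i = count_i / total."""
    total = sum(class_counts)
    return -sum((c / total) * math.log2(c / total) for c in class_counts if c > 0)

# 5 triangles and 9 squares, as in the example above.
print(round(entropy([5, 9]), 3))   # -> 0.94, i.e. roughly 0.940 bits
```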
Entropy Gain Computation
- Entropy gain associated with an attribute:
- The best attribute is the one that maximizes the entropy gain
- To compute the entropy gain associated with the attribute A, the equation is Gain(S, A) = I(S) - Ires(A)
- with Ires(A) = Σ P(v) × I(v), where v ranges over the values of A.
- Legend: S: the dataset | I(S): entropy of the dataset | A: attribute name | Ires(A): the residual entropy of A | P(v): proportion of examples in the dataset with value v | v: a value of the attribute A | I(v): entropy of the subset of examples with A = v (a sketch of this computation is given below).
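Here is a minimal sketch of this computation in Python, building on the entropy definition above; the small set of examples at the end is hypothetical and only illustrates the call (it is not the course dataset):

```python
import math

def entropy_of(labels):
    """I(S) = -sum(P_i * log2(P_i)) over the class labels in S."""
    total = len(labels)
    counts = {c: labels.count(c) for c in set(labels)}
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def information_gain(examples, attribute, target="class"):
    """Gain(S, A) = I(S) - Ires(A), with Ires(A) = sum over v of P(v) * I(S_v)."""
    labels = [e[target] for e in examples]
    i_s = entropy_of(labels)
    residual = 0.0
    for v in {e[attribute] for e in examples}:
        subset = [e[target] for e in examples if e[attribute] == v]
        residual += (len(subset) / len(examples)) * entropy_of(subset)
    return i_s - residual

# Hypothetical toy examples, just to show the call.
S = [
    {"Color": "red",   "class": "triangle"},
    {"Color": "red",   "class": "square"},
    {"Color": "green", "class": "square"},
    {"Color": "green", "class": "square"},
]
print(information_gain(S, "Color"))   # entropy gain of splitting on Color
```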
Entropy Gain Computation: the Color Attribute
- S = {1, 2, ..., 14} | I(S) = 0,940 bits (already computed)
- The attribute Color:
- v1, v2, v3 are the values of the attribute Color = {green, yellow, red}.
- P(v1) = P(green) = 6/14 | P(v2) = P(yellow) = 3/14 | P(v3) = P(red) = 5/14
- Ires (Color) = P(v1) x I(v1) + P(v2) x I(v2) + P(v3) x I(v3)
- Gain(S, Color) = 0.428
Example: Entropy Gain for the Outline Attribute
- Using S = {1, 2, ..., 14}, with I(S) = 0.940 bits already computed.
- For the attribute Outline, possible values are dashed, solid
- Given P(dashed) = 7/14 and P(solid) = 7/14
- Gain(S, Outline) = I(S) - Ires(Outline), with Ires(Outline) = P(dashed) × I(dashed) + P(solid) × I(solid)
- Gain(S, Outline) = 0.940 - 0.789 = 0.151 bits.
Example: Entropy Gain for the Dot Attribute
- S is the same dataset of 14 examples, with I(S) = 0.940 bits.
- The attribute Dot has two values, "yes" and "no".
- Then P(yes) = 6/14 and P(no) = 8/14, which are used to compute Gain(S, Dot).
Calculating Gain
- Comparing the results, the attribute Color (with a gain of 0.428) gives the best split of the examples.
- Color is therefore chosen as the root of the tree.
- Redo the same computation to partition the two subsets E2 and E3, computing:
- Gain(Outline) = ?
- Gain (Dot) = ?
Calculating Gain Solution
- If Color = red and Dot = yes, then Shape = triangle; otherwise Shape = square.
- For the subset E3, the gain computation finds that the attribute Dot has the maximal gain, Gain(E3, Dot) = 0.0971.
- If Color = green and Outline = dashed, then Shape = triangle; if Color = green and Outline = solid, then Shape = square.
- The final step repeats the gain computation until the remaining subsets are pure and the tree is complete.
Final Algorithm Summary
- The final decision tree has the following structure:
- Root node Color splits into red, yellow, and green.
- The tree leads to a complete classification of the shapes (a small rule-style sketch of the tree is given below).
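As a closing illustration, the final tree can be read as a small rule function; the red and green branches follow the rules stated in the solution above, while the yellow branch is not detailed in the notes and is left as a placeholder:

```python
def classify_shape(color, dot, outline):
    """Sketch of the final tree: the root tests Color, then Dot or Outline."""
    if color == "red":
        return "triangle" if dot == "yes" else "square"
    if color == "green":
        return "triangle" if outline == "dashed" else "square"
    return "unknown"   # placeholder: the yellow branch is not detailed in the notes

print(classify_shape("red", "yes", "solid"))    # -> triangle
print(classify_shape("green", "no", "dashed"))  # -> triangle
```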