Decision Tree PDF
Hamdan Bin Mohammed Smart University
Dr. Amin Mohamed Shayea
Summary
This document introduces decision tree algorithms with theory and tutorial examples. It explains the main algorithms, their advantages and disadvantages, and works through an example that builds a tree using the Gini index and information gain as splitting criteria.
Full Transcript
Artificial intelligence techniques: Theories and Tutorials

Decision Tree
The Decision Tree algorithm belongs to the family of supervised learning algorithms. Unlike many other supervised learning algorithms, the decision tree algorithm can be used for solving both regression and classification problems. The general motive of using a decision tree is to create a training model that can be used to predict the class or value of the target variable by learning decision rules inferred from prior (training) data. Decision trees are easy to understand compared with other classification algorithms. The decision tree algorithm tries to solve the problem by using a tree representation: each internal node of the tree corresponds to an attribute, and each leaf node corresponds to a class label.

◼ The most popular DT algorithms include:
◼ ID3: Iterative Dichotomiser 3
◼ C4.5, C5: Classifier Model
◼ CART: Classification And Regression Trees
◼ CHAID: Chi-squared Automatic Interaction Detection

A general algorithm for decision tree building (sketched in code after the next list):
1. Place the best attribute of the dataset at the root of the tree.
2. Split the training set into subsets. Subsets should be made in such a way that each subset contains data with the same value for an attribute.
3. Repeat steps 1 and 2 on each subset until you find leaf nodes in all the branches of the tree.

◼ DT algorithms mainly differ on:
1. Splitting criteria
◼ Which variable, what value, etc.
2. Stopping criteria
◼ When to stop building the tree
3. Pruning (generalization method)
◼ Pre-pruning versus post-pruning
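To make the three-step building procedure above concrete, here is a minimal sketch of a recursive tree builder in Python. It is not code from the slides: the names gini, best_split and build_tree are illustrative, the splitting criterion is the weighted Gini index described in the next section, the stopping criterion is a pure node, and there is no pruning step.

```python
# A minimal sketch (not the slides' own code) of the three-step procedure:
# pick the best attribute, split, and recurse. Names are illustrative.
from collections import Counter

def gini(labels):
    """Gini impurity of a set of class labels: 1 - sum(p_i^2)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(rows, labels):
    """Step 1: find the attribute/threshold with the lowest weighted Gini."""
    n = len(rows)
    best = None  # (weighted_gini, attribute_index, threshold)
    for attr in range(len(rows[0])):
        for threshold in {row[attr] for row in rows}:
            left = [lab for row, lab in zip(rows, labels) if row[attr] >= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[attr] < threshold]
            if not left or not right:          # split must actually separate the data
                continue
            weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
            if best is None or weighted < best[0]:
                best = (weighted, attr, threshold)
    return best

def build_tree(rows, labels):
    """Steps 2-3: split on the best attribute and recurse on each subset."""
    if len(set(labels)) == 1:                  # stopping criterion: pure node
        return {"leaf": labels[0]}
    split = best_split(rows, labels)
    if split is None:                          # no useful split: majority-class leaf
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    _, attr, threshold = split
    left = [i for i, row in enumerate(rows) if row[attr] >= threshold]
    right = [i for i, row in enumerate(rows) if row[attr] < threshold]
    return {"attribute": attr, "threshold": threshold,
            ">=": build_tree([rows[i] for i in left], [labels[i] for i in left]),
            "<": build_tree([rows[i] for i in right], [labels[i] for i in right])}

# Tiny usage example with one informative feature (column 0):
print(build_tree([[2.0, 1.0], [3.0, 1.0], [7.0, 1.0], [8.0, 1.0]],
                 ["Positive", "Positive", "Negative", "Negative"]))
```

A production implementation would add stopping rules such as a maximum tree depth or a minimum number of cases per leaf, plus a pruning step; those are exactly the points on which the algorithms listed above differ.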
◼ Splitting criteria
◼ Gini index determines the purity of a specific class as a result of a decision to branch along a particular attribute/value. Used in CART.
◼ Information gain uses entropy to measure the extent of uncertainty or randomness of a particular attribute/value split. Used in ID3, C4.5, C5.
◼ Chi-square statistics. Used in CHAID.

Example: Construct a Decision Tree by using "Gini Index" as a criterion
The training set has 16 cases, four numeric features (A, B, C, D) and a target E (Positive/Negative). The candidate splits are A >= 5, B >= 3.0, C >= 4.2 and D >= 1.4.

Case   A     B     C     D     E
1      4.8   3.4   1.9   0.2   Positive
2      5.0   3.0   1.6   0.2   Positive
3      5.0   3.4   1.6   0.4   Positive
4      5.2   3.5   1.5   0.2   Positive
5      5.2   3.4   1.4   0.2   Positive
6      4.7   3.2   1.6   0.2   Positive
7      4.8   3.1   1.6   0.2   Positive
8      5.4   3.4   1.5   0.4   Positive
9      7.0   3.2   4.7   1.4   Negative
10     6.4   3.2   4.5   1.5   Negative
11     6.9   3.1   4.9   1.5   Negative
12     5.5   2.3   4.0   1.3   Negative
13     6.5   2.8   4.6   1.5   Negative
14     5.7   2.8   4.5   1.3   Negative
15     6.3   3.3   4.7   1.6   Negative
16     4.9   2.4   3.3   1.0   Negative

The Gini index of a node is Gini = 1 - sum_i (p_i)^2, where p_i is the proportion of class i in the node. For attribute D with threshold 1.4:
Value >= 1.4: 5 cases (0 Positive, 5 Negative)
Gini(0,5) = 1 - ((0/5)^2 + (5/5)^2) = 0.0
Value < 1.4: 11 cases (8 Positive, 3 Negative)
Gini(8,3) = 1 - ((8/11)^2 + (3/11)^2) = 0.3966
Weighting and summing the Gini indices of the two branches:
gini(Target, D) = (5/16) * 0.0 + (11/16) * 0.3966 = 0.2727

Example: Construct a Decision Tree by using "information gain" as a criterion
The entropy of a node is E = -1 * sum_i p_i * log2(p_i), and the entropy of the target itself is Entropy(8,8) = 1.

Information gain for Var B
Var B has value >= 3 for 12 records out of 16 and 4 records with value < 3.
For Var B >= 3 & class == positive: 8/12
For Var B >= 3 & class == negative: 4/12
Entropy(8,4) = -1 * ( (8/12)*log2(8/12) + (4/12)*log2(4/12) ) = 0.9183
For Var B < 3 & class == positive: 0/4
For Var B < 3 & class == negative: 4/4
Entropy(0,4) = 0
Entropy(Target, B) = P(>=3) * E(8,4) + P(<3) * E(0,4) = (12/16) * 0.9183 + (4/16) * 0 = 0.6887
Information Gain(B) = E(Target) - E(Target, B) = 1 - 0.6887 = 0.3113

Information gain for Var C
Var C has value >= 4.2 for 6 records out of 16 and 10 records with value < 4.2.
For Var C >= 4.2 & class == positive: 0/6
For Var C >= 4.2 & class == negative: 6/6
Entropy(0,6) = 0
For Var C < 4.2 & class == positive: 8/10
For Var C < 4.2 & class == negative: 2/10
Entropy(8,2) = 0.72193
Entropy(Target, C) = P(>=4.2) * E(0,6) + P(<4.2) * E(8,2) = (6/16) * 0 + (10/16) * 0.72193 = 0.4512
Information Gain(C) = E(Target) - E(Target, C) = 1 - 0.4512 = 0.5488

Information gain for Var D
Var D has value >= 1.4 for 5 records out of 16 and 11 records with value < 1.4.
For Var D >= 1.4 & class == positive: 0/5
For Var D >= 1.4 & class == negative: 5/5
Entropy(0,5) = 0
For Var D < 1.4 & class == positive: 8/11
For Var D < 1.4 & class == negative: 3/11
Entropy(8,3) = -1 * ( (8/11)*log2(8/11) + (3/11)*log2(3/11) ) = 0.84532
Entropy(Target, D) = P(>=1.4) * E(0,5) + P(<1.4) * E(8,3) = (5/16) * 0 + (11/16) * 0.84532 = 0.5811575
Information Gain(D) = E(Target) - E(Target, D) = 1 - 0.5811575 = 0.4188

Class counts for each candidate split:
Split       Positive   Negative
A >= 5      5          7
A < 5       3          1
B >= 3.0    8          4
B < 3.0     0          4
C >= 4.2    0          6
C < 4.2     8          2
D >= 1.4    0          5
D < 1.4     8          3
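As a cross-check on the arithmetic in the worked example, the short Python sketch below recomputes the weighted Gini index and the information gain for all four candidate splits on the 16-case table. The data and thresholds are taken from the example; the names (data, thresholds, columns, gini, entropy) are illustrative, not from the slides.

```python
# Recomputes the example's numbers; the records and thresholds are copied
# from the table above, everything else is an illustrative sketch.
from math import log2

# The 16 records from the example: (A, B, C, D, class E)
data = [
    (4.8, 3.4, 1.9, 0.2, "Positive"), (5.0, 3.0, 1.6, 0.2, "Positive"),
    (5.0, 3.4, 1.6, 0.4, "Positive"), (5.2, 3.5, 1.5, 0.2, "Positive"),
    (5.2, 3.4, 1.4, 0.2, "Positive"), (4.7, 3.2, 1.6, 0.2, "Positive"),
    (4.8, 3.1, 1.6, 0.2, "Positive"), (5.4, 3.4, 1.5, 0.4, "Positive"),
    (7.0, 3.2, 4.7, 1.4, "Negative"), (6.4, 3.2, 4.5, 1.5, "Negative"),
    (6.9, 3.1, 4.9, 1.5, "Negative"), (5.5, 2.3, 4.0, 1.3, "Negative"),
    (6.5, 2.8, 4.6, 1.5, "Negative"), (5.7, 2.8, 4.5, 1.3, "Negative"),
    (6.3, 3.3, 4.7, 1.6, "Negative"), (4.9, 2.4, 3.3, 1.0, "Negative"),
]
thresholds = {"A": 5.0, "B": 3.0, "C": 4.2, "D": 1.4}   # candidate splits
columns = {"A": 0, "B": 1, "C": 2, "D": 3}

def gini(labels):
    """Gini impurity: 1 - sum of p_i^2 over the classes present."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def entropy(labels):
    """Entropy: -sum of p_i * log2(p_i) over the classes present."""
    n = len(labels)
    return -sum((labels.count(c) / n) * log2(labels.count(c) / n)
                for c in set(labels))

target = [row[4] for row in data]          # Entropy(8,8) = 1.0
for var, t in thresholds.items():
    col = columns[var]
    ge = [row[4] for row in data if row[col] >= t]   # branch: value >= threshold
    lt = [row[4] for row in data if row[col] < t]    # branch: value <  threshold
    weighted_gini = len(ge) / 16 * gini(ge) + len(lt) / 16 * gini(lt)
    weighted_entropy = len(ge) / 16 * entropy(ge) + len(lt) / 16 * entropy(lt)
    info_gain = entropy(target) - weighted_entropy
    print(f"{var} >= {t}: weighted Gini = {weighted_gini:.4f}, "
          f"information gain = {info_gain:.4f}")
```

The output matches the values derived above: splitting on D gives a weighted Gini index of 0.2727 and an information gain of 0.4188, while splitting on C yields the largest information gain, 0.5488 (and also the lowest weighted Gini index), so C comes out as the strongest first split under both criteria.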