CST8390: Business Intelligence & Data Analytics - Decision Trees PDF
Summary
This document from Algonquin College covers decision trees in data analytics, including algorithms like ID3 and CART. It explains entropy, information gain, and the application of decision trees for classification and regression, including concepts such as pruning and random forests, and provides further reading.
Full Transcript
CST8390 BUSINESS INTELLIGENCE & DATA ANALYTICS
Week 3 Classification – Decision Trees

Structure of Tree / Example
(Diagram slides showing the structure of a tree and an example decision tree.)

Decision Trees
A decision tree is a tree where:
- each node represents a feature (attribute)
- each branch represents a decision (rule)
- each leaf represents an outcome (a categorical or continuous value)

Decision Tree
- One of the most popular ML algorithms.
- A supervised learning method used for classification and regression.
- Decision trees built for a data set whose target column is a real number are called regression trees.
https://sefiks.com/2018/08/28/a-step-by-step-regression-decision-tree-example/

Decision Tree Algorithms
- ID3 (Iterative Dichotomiser 3): uses the entropy function and information gain as metrics.
- CART (Classification and Regression Trees): uses the Gini index as its metric.

Classification using the ID3 Algorithm
Weather Dataset: based on weather conditions, predict Yes or No for "Play".

Entropy
A measure of the amount of impurity or uncertainty in the dataset:

    H(S) = -\sum_{c \in C} p(c) \log_2 p(c)

where
- S is the current dataset for which entropy is being calculated,
- C is the set of classes in S (for example, C = {yes, no}),
- p(c) is the proportion of the number of elements in class c to the number of elements in S.

In ID3, entropy is calculated for each remaining attribute. The attribute with the smallest (weighted average) entropy after the split, i.e. the largest information gain, is used to split the set S on the current iteration.

If an event is highly predictable, it has low entropy (low uncertainty); random probabilities have higher entropy (higher uncertainty). Written out over the class probabilities:

    Entropy(p_1, p_2, ..., p_n) = -p_1 \log p_1 - p_2 \log p_2 - ... - p_n \log p_n

Information Gain
Information gain is the measure used to determine which feature should be used to split the data at each internal node of the decision tree.
Steps:
1. Calculate the entropy of the dataset before splitting it.
2. Calculate the average entropy of the dataset after splitting it.
3. Subtract the average entropy after splitting from the entropy before splitting.
4. The result is the information gain.

    Gain(S, A) = H(S) - \sum_{t \in T} p(t) \, H(t)

where
- H(S) is the entropy of set S,
- T is the set of subsets created by splitting S on attribute A,
- p(t) is the proportion of the number of elements in t to the number of elements in S,
- H(t) is the entropy of subset t.

Metrics for the Weather Dataset
Steps:
1. Compute the entropy for the dataset.
2. For every attribute:
   i.   calculate the entropy for all of its categorical values,
   ii.  take the (weighted) average for the current attribute,
   iii. calculate the gain for the current attribute.
3. Pick the attribute with the highest gain.
4. Repeat until we get the tree we desire.

Entropy for the Weather Dataset
Out of 14 instances, 9 are classified as Yes and 5 as No, so
H(S) = -(9/14) \log_2(9/14) - (5/14) \log_2(5/14) ≈ 0.940.

Entropy of the Outlook Feature of the Weather Dataset
Splitting on Outlook gives the subsets sunny (2 Yes / 3 No), overcast (4 Yes / 0 No) and rainy (3 Yes / 2 No), with a weighted average entropy of about 0.693.

Metrics Summary

Attribute      Average Entropy   Information Gain
Outlook        0.693             0.247
Temperature    0.911             0.029
Humidity       0.788             0.152
Windy          0.892             0.048

As Outlook has the highest information gain, our root node is Outlook.

Initial Tree for the Weather Dataset / Final Decision Tree
(Diagram slides: the initial split on Outlook and the fully grown tree.)
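To make the entropy and information-gain steps above concrete, here is a minimal Python sketch. It assumes the classic 14-instance Weka weather ("play tennis") data used throughout these slides; the helper names (entropy, average_entropy) are illustrative, not part of the course material. Running it reproduces the Metrics Summary values (the slides round the Outlook average, 0.6935, to 0.693).

```python
from collections import Counter
from math import log2

# The classic 14-instance weather ("play tennis") data, as shipped with Weka.
# Columns: outlook, temperature, humidity, windy, and the class label "play".
data = [
    ("sunny",    "hot",  "high",   False, "no"),
    ("sunny",    "hot",  "high",   True,  "no"),
    ("overcast", "hot",  "high",   False, "yes"),
    ("rainy",    "mild", "high",   False, "yes"),
    ("rainy",    "cool", "normal", False, "yes"),
    ("rainy",    "cool", "normal", True,  "no"),
    ("overcast", "cool", "normal", True,  "yes"),
    ("sunny",    "mild", "high",   False, "no"),
    ("sunny",    "cool", "normal", False, "yes"),
    ("rainy",    "mild", "normal", False, "yes"),
    ("sunny",    "mild", "normal", True,  "yes"),
    ("overcast", "mild", "high",   True,  "yes"),
    ("overcast", "hot",  "normal", False, "yes"),
    ("rainy",    "mild", "high",   True,  "no"),
]
features = ["outlook", "temperature", "humidity", "windy"]
labels = [row[-1] for row in data]

def entropy(values):
    """H(S) = -sum over classes c of p(c) * log2 p(c)."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())

def average_entropy(column, labels):
    """Weighted entropy of the subsets produced by splitting on one attribute (step 2.ii)."""
    total = len(labels)
    result = 0.0
    for value in set(column):
        subset = [lab for v, lab in zip(column, labels) if v == value]
        result += (len(subset) / total) * entropy(subset)
    return result

base = entropy(labels)
print(f"H(S) = {base:.3f}")                                  # H(S) = 0.940
for i, name in enumerate(features):
    avg = average_entropy([row[i] for row in data], labels)
    print(f"{name:12s} avg entropy {avg:.3f}  gain {base - avg:.3f}")
# outlook      avg entropy 0.694  gain 0.247   <- highest gain: Outlook becomes the root
# temperature  avg entropy 0.911  gain 0.029
# humidity     avg entropy 0.788  gain 0.152
# windy        avg entropy 0.892  gain 0.048
```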
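Building on the same data and helpers, the "repeat until we get the tree we desire" step can be sketched as a recursive ID3 procedure. The nested-dictionary tree representation is an illustrative choice, not from the slides; on this data it selects Outlook at the root and reproduces the final decision tree described above.

```python
def id3(rows, feature_indices, target_index=-1):
    """Grow an ID3 tree as nested dicts: {attribute name: {value: subtree}}."""
    labels = [row[target_index] for row in rows]
    if len(set(labels)) == 1:                  # pure node -> leaf with that class
        return labels[0]
    if not feature_indices:                    # no attributes left -> majority class
        return Counter(labels).most_common(1)[0][0]
    base = entropy(labels)
    # Pick the attribute with the highest information gain (step 3).
    best = max(feature_indices,
               key=lambda i: base - average_entropy([r[i] for r in rows], labels))
    tree = {features[best]: {}}
    for value in set(row[best] for row in rows):
        subset = [row for row in rows if row[best] == value]
        rest = [i for i in feature_indices if i != best]
        tree[features[best]][value] = id3(subset, rest, target_index)
    return tree

print(id3(data, [0, 1, 2, 3]))
# (key order may vary)
# {'outlook': {'overcast': 'yes',
#              'sunny':    {'humidity': {'high': 'no', 'normal': 'yes'}},
#              'rainy':    {'windy': {False: 'yes', True: 'no'}}}}
```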
Pruning
A technique for reducing the number of attributes used in the tree, which helps prevent decision trees from overfitting the training data.
Two types of pruning:
- Pre-pruning (forward pruning): we decide during the building process when to stop adding attributes (possibly based on their information gain).
- Post-pruning (backward pruning): waits until the full decision tree has been built and then prunes attributes.

Random Forest
Random forest is a supervised learning ensemble algorithm. Ensemble algorithms are those which combine more than one algorithm, of the same or different kind, for classifying objects. Random forest builds multiple decision trees and merges them together to get a more accurate and stable prediction.

Bagging Method
(A minimal bagging sketch follows the references below.)

Weka Demo

References
- Decision Tree: http://www.saedsayad.com/decision_tree.htm
- Covariance and correlation: http://www.dummies.com/education/math/business-statistics/how-to-measure-the-covariance-and-correlation-of-data-samples/
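To illustrate the bagging method named above, here is a minimal sketch that reuses data, features, entropy, average_entropy and id3 from the earlier sketches. Each tree is grown on a bootstrap sample (drawn with replacement) and predictions are combined by majority vote. This shows bagging only: a full random forest additionally restricts each split to a random subset of attributes, which is omitted here, and the fallback label and query row are illustrative choices.

```python
import random
from collections import Counter

def bagged_trees(rows, n_trees=25, seed=0):
    """Bagging: grow each tree on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in range(len(rows))]
        forest.append(id3(sample, [0, 1, 2, 3]))
    return forest

def classify(tree, row):
    """Walk a single tree down to a leaf label."""
    while isinstance(tree, dict):
        attribute, branches = next(iter(tree.items()))
        value = row[features.index(attribute)]
        if value not in branches:       # value never seen in this tree's sample
            return "yes"                # fall back to the data set's majority class
        tree = branches[value]
    return tree

def predict(forest, row):
    """Random-forest-style prediction: majority vote over all trees."""
    votes = Counter(classify(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]

forest = bagged_trees(data)
print(predict(forest, ("sunny", "cool", "high", True)))
# A single full tree classifies this row as "no"; most bagged trees agree.
```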