Butler University

Tags

data mining, data analytics, classification models, predictive models

Summary

This document introduces core concepts in data mining, including classification, prediction, and association rules. It covers data exploration, model designs, and popular data mining workflows. The course provides an overview for anyone interested in data analytics.

Full Transcript

MS365 Business Analytics
Data Mining

Core Ideas of Data Mining
– Classification
– Prediction
– Association Rules & Recommenders
– Visualization
– Data Exploration
– Data & Dimension Reduction

Classification
– A class is a category.
– A classification model creates categorical predictions.
– Examples: buy or don’t buy; spam or not spam; sick, well, or recovered; awake or asleep; eyes closed or eyes open.

Classification Model Designs

Model Type: Binary Classification
Description: Two classes with exclusive membership. An observation/prediction can only belong to one class.
Examples: Spam or Not Spam; Buy or Don’t Buy; Sick or Well

Model Type: Multi-Class Classification
Description: More than two classes with exclusive membership.
Examples: Buy, Sell, or Trade; Car, Pedestrian, or Bicycle

Model Type: Multi-Label Classification
Description: Multiple classes that do not have mutually exclusive membership.
Examples: Self-driving cars can identify a car, a walker, and a bicycle simultaneously.

Name that Model
– Is this a dog? Binary Classification: it’s a dog or not a dog.
– Is this a dog or a cat? Multi-Class Classification: the classes are Cat and Dog. With exactly these two exclusive classes the problem is, by the definitions above, binary; it is multi-class only when further classes (for example, "neither") are allowed.
– What is in this picture? Multi-Label Classification: there is a person and a dog.

Predictive Models
Similar to classification models; however, the output is a continuous numerical value.
– Amount of sale
– Amount to manufacture
– Amount to hire
– Amount to be paid

Association Rules & Recommenders

Data Exploration & Visualization
– Aimed at understanding the data
– Used for data cleaning and manipulation
– Can be used for hypothesis generation

Data Reduction & Dimension Reduction
– Data Reduction: reducing the number of cases in the training data, sometimes called clustering.
– Dimension Reduction: used to improve model performance by removing variables that lack strong predictive power.

Where to start?!?
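The three classification designs described in the slides differ mainly in the shape of the answer they return. The following is a minimal sketch of that difference, not a trained model: the scores are made-up numbers standing in for the output of some hypothetical classifier, and the function names and the 0.5 threshold are assumptions chosen for illustration.

```python
# Hypothetical class scores for one image, as some trained model might produce.
# The numbers are invented for illustration only.
scores = {"car": 0.91, "pedestrian": 0.72, "bicycle": 0.08}


def binary_prediction(score, threshold=0.5):
    """Binary: a single yes/no answer, e.g. 'is this a car?'."""
    return score >= threshold


def multiclass_prediction(class_scores):
    """Multi-class: exactly one class wins (exclusive membership)."""
    return max(class_scores, key=class_scores.get)


def multilabel_prediction(class_scores, threshold=0.5):
    """Multi-label: every class whose score clears the threshold is reported."""
    return {label for label, s in class_scores.items() if s >= threshold}


is_car = binary_prediction(scores["car"])
single_class = multiclass_prediction(scores)
all_labels = multilabel_prediction(scores)

print(is_car, single_class, sorted(all_labels))
```

The same scores give one Boolean, one class name, or a set of class names depending on the design. A predictive model, by contrast, would return a continuous number such as an amount of sale rather than any of these categorical answers.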
Popular Data Mining Workflows

SEMMA (SAS)
– Sample
– Explore
– Modify
– Model
– Assess

CRISP-DM (SPSS/IBM)
– Business Understanding
– Data Understanding
– Data Preparation
– Modeling
– Evaluation
– Deployment

Data Mining Techniques: Supervised Learning
– The goal of supervised learning is the prediction of a single target outcome variable.
– Can be used for classification or prediction models.
– Supervised learning requires training data.
– Training data includes source data and corresponding labels.
– Requires data partitioning to evaluate (validate) the created model.

Data Mining Techniques: Unsupervised Learning
– Models created when there is no known outcome variable to predict or classify.
– Unsupervised modeling explores and identifies naturally occurring patterns and classifications in data.

Key Takeaways
– What are the core ideas of data mining?
– What is the difference between classification and predictive modeling?
– What is the difference between multi-class, multi-label, and binary classification models?
– What is the difference between supervised and unsupervised learning?

How To Get Started?
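The supervised learning steps listed in the slides (labeled training data, data partitioning, validation of the created model) can be sketched with a small predictive model. This is an illustrative sketch only: the data set is invented, the 70/30 split and the seed are arbitrary assumptions, and ordinary least squares on one variable stands in for whatever modeling technique a real project would use.

```python
import random

# Made-up labeled data: (advertising spend, amount of sale).
# Both columns are hypothetical; the label follows an exact line on purpose.
data = [(x, 3.0 * x + 10.0) for x in range(20)]


def partition(rows, train_fraction=0.7, seed=42):
    """Shuffle the rows and split them into training and validation partitions."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = round(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


def fit_line(rows):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_absolute_error(rows, slope, intercept):
    """Average absolute gap between the labels and the model's predictions."""
    return sum(abs(y - (slope * x + intercept)) for x, y in rows) / len(rows)


train, valid = partition(data)            # data partitioning
slope, intercept = fit_line(train)        # model is fitted on training data only
validation_mae = mean_absolute_error(valid, slope, intercept)  # evaluated on held-out data

print(len(train), len(valid), round(slope, 3), round(intercept, 3))
```

The validation rows are never seen during fitting, which is why the error measured on them is a fairer estimate of how the model would behave on new data. Because the invented labels lie exactly on a line, the validation error here is essentially zero; real data would not be this clean.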