Data Mining & Analytics Chapter 1 PDF
Document Details
University of Alabama in Huntsville
Tommie Singleton
Summary
This document is an introductory chapter on data mining and analytics, outlining the fundamental concepts, aims, and benefits of data-driven decision making in business. It discusses data mining techniques and the importance of understanding data to gain insights for improvements and development of strategies.
Full Transcript
Tommie Singleton, Ph.D, CPA, CITP, CISA
(256) 762-5252 [email protected]
DATA MINING & ANALYTICS
CHAPTER 1

CHAPTER 1: DATA MINING CONCEPTS

AIMS OF BOOK/COURSE
- Built on expertise from business & industry (from an accountant’s perspective)
- Give tested guidance in developing workable solutions to typical business problems
- Provide solutions for common business problems adaptable to your area of interest
- Focus on practical solutions based on sound statistical and data mining theory
- The book uses typical marketing and sales problems to illustrate problems in all areas of business
  - The abundance of domain knowledge in this area
  - The ability of students to understand the what and why of solutions and to relate to them
  - The possibility of access to a sufficient volume of appropriate empirical data
- But it will provide suggestions on how those techniques transfer effectively to other sectors of business

- Data mining, like many other technical solutions, requires a modicum of knowledge from other domains
  - None more valuable than statistics
  - Multiple classes in stats might be a good foundation
  - An understanding of certain statistical tests and their constraints is needed
  - Error, deviations, mean/median/mode, ordinary least squares, regression analysis, etc.
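The statistical building blocks listed above (mean/median/mode, deviations, least squares) can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the `orders` and `months` values are invented, and `ols_fit` is a hypothetical helper written for this example, not something taken from the book.

```python
import statistics

# Hypothetical monthly order counts; illustrative numbers only.
months = [1, 2, 3, 4, 5, 6]
orders = [12, 15, 15, 18, 20, 22]

# Descriptive measures of central tendency and spread.
mean_orders = statistics.mean(orders)
median_orders = statistics.median(orders)
mode_orders = statistics.mode(orders)
stdev_orders = statistics.stdev(orders)  # sample standard deviation


def ols_fit(x, y):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    x_bar = statistics.mean(x)
    y_bar = statistics.mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


slope, intercept = ols_fit(months, orders)

# Errors (residuals): observed value minus the value predicted by the fitted line.
residuals = [yi - (slope * xi + intercept) for xi, yi in zip(months, orders)]

print(mean_orders, median_orders, mode_orders, round(stdev_orders, 2))
print(round(slope, 3), round(intercept, 3))
```

With an intercept in the model, the residuals of a least-squares fit sum to zero (up to rounding), which is a quick sanity check on any hand-written regression.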
Growth of Data in Business
[Diagram: typical processes of the recent past, with stages labeled Input Devices, Data-1, Information, Data-2, and Data Mining. Labels include accounting or financial systems such as an ERP system, server farms / virtual servers, data warehouse, OLAP, and strategic information. A note on the diagram reads: "Before this century, most businesses stopped here."]
[Figure: 27+ Big Data Statistics - How Big It Actually Is in 2022? (techjury.net)]
- The ability to extract useful, but usually hidden, knowledge from data is becoming increasingly important in today’s competitive world
- When data is used for various predictions (buyers’ choices, finances, sales, growth, behaviors), THAT is an advantage, possibly a competitive advantage

The Value of this Data
- Useful for decision making, especially when all the necessary facts or data cannot be collected or are unknown
- Can be made into intelligence (knowledge) using your OWN data!
- Can make unseen, hidden patterns and opportunities become seen, visible, and understood
- Shows WHY it matters and WHAT matters, and provides DIRECTION, STRATEGY, or SOLUTION
- Pricing / financial matters
- Competitive advantage embedded into business processes or services
- How customers and prospects are LIKELY to behave

BUT humans are no longer able to perform these objectives in their heads
Machine Learning + stats = Data Mining

Domain Knowledge (DK)
- All the ADDITIONAL information (the body of information) that we have about a scenario
- Example: gaps in the data (e.g., a missing field or null variable, a zero value where there should be a number, etc.)
  - Does that mean it was absent/wrong? Yes.
  - DK may be able to tell us that production was halted for a period because of an incident beyond our control (e.g., key machinery broken, power failure, etc.)
  - Then we can treat the gap not as a true zero, nor as missing through omission, but as zero for a specific reason (WHY)
- DK includes metadata (data about data; for example, email headers)
- Example: monitoring sales via quantities sold and sales price, and concluding we are not meeting our goals
- But metadata about staffing may help us understand WHY (e.g., it is NOT the pricing or quantities but insufficient staffing)

[Diagram: Data, Data Mining, Model, Analytics]

Results
- Modeling data analysis (could be a standard model, or a custom one)
- Example: scorecard for each customer (numerical worth of each customer)
- Recency / Frequency / Monetary Value methodology
  - Recency (bought last time 35 days ago; formula rates it as 10%, and as 1/3 of SCORE that is 3.3)
  - Frequency (bought on average twice a month; formula rates it as 60%, and as 1/3 of SCORE that is 20.0)
  - Monetary Value (buys highest priced items and averages $125/item; formula rates it as 100%, and as 1/3 of SCORE that is 33.3)
  - SCORE = 3.3 + 20.0 + 33.3 = 56.6

Associated Concepts
- CRM analysis is complementary to information on company reports and the Marketing Dashboard (MD)
- MD may contain a summary of customer purchases in different groupings and how they have changed from prior periods
- Numbers may be actual, predicted, or a combination of the two
- EXAMPLES: customer groups who buy best in summer, response rate of 20%
- KPIs: click rate, response rate, cost per order
- Analytics: general name for data analysis and decision making
- Descriptive Analytics: describing the features of data (descriptive stats)
- Predictive Analytics: modeling (e.g., linear regression (OLS), discriminant analysis, neural networks, ANOVA, etc.)
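The Recency / Frequency / Monetary Value scorecard described earlier can be sketched in Python. This is a minimal sketch under two assumptions: each of the three ratings carries an equal one-third share of a 100-point score, and the ratings arrive already computed as fractions between 0 and 1. The slides do not give the formulas that turn "35 days ago" or "twice a month" into a percentage, so `rfm_score` (a hypothetical name) takes the percentages as inputs rather than deriving them.

```python
def rfm_score(recency_pct, frequency_pct, monetary_pct, max_score=100.0):
    """Combine three ratings (each between 0 and 1) into one customer score.

    Assumption: every rating is weighted equally, one third of max_score.
    Returns the total score and the per-component points.
    """
    ratings = {
        "recency": recency_pct,
        "frequency": frequency_pct,
        "monetary": monetary_pct,
    }
    for name, pct in ratings.items():
        if not 0.0 <= pct <= 1.0:
            raise ValueError(f"{name} rating must be between 0 and 1, got {pct}")

    weight = max_score / 3  # about 33.3 points per component
    components = {name: pct * weight for name, pct in ratings.items()}
    return sum(components.values()), components


# Ratings from the scorecard example: 10% recency, 60% frequency, 100% monetary value.
score, parts = rfm_score(0.10, 0.60, 1.00)
print(round(score, 1), {name: round(points, 1) for name, points in parts.items()})
```

Keeping the per-component points alongside the total makes it easy to see which dimension is dragging a customer's score down.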
Global Appeal
- Companies are increasingly aware that their vast reserves of data contain a wealth of information
- Some have been exploiting this fact for years (Walmart, Amazon)
- Now healthcare, government, banks, and most other industries are either using data mining and analytics or working towards it

Data sets used in this book
- Mail Order Warehouse (familiarity)
  - Purchase details
  - Communication information
  - Demographics
  - 50k customers

Recipe Structure
- Industry: Sector or type of business
  - Mail order, publisher, online shop, supermarket
- Areas of Interest: Area of interest within the business
  - Marketing, sales, promotions, online, cost (a part, final product, shipping, etc.)
- Challenge: Key factor needed to reach the objective of the analysis
  - Defining the problem - solution
  - Right number of customers to optimize ROI
  - Life Cycle Cost
- Typical Application: More specific factors
- Necessary Data: All the data that is vital for effective analysis
  - Has a direct impact on solution/analysis
- Population: The parameters of data needed for quality stats
- Target Variable: The decisive variable of interest
  - Buying vs. not buying, quantity/dollars of sales, costs
- Input Data: Variables for analysis
  - Must Haves: Key variables needed
  - Nice to Haves: Other variables that could improve modeling but may be difficult to find or construct
- Data Mining Methods: Usually a choice between a few different applicable methods
- How To Do It: Data preparation to Implementation (results of analysis)
- Data Preparation: Specific features of preparing data for each recipe/project
- Business Issues: Strategy changes
  - Examples: sales channels, locations, products
- Transformation: When there is a need to transform one or more variables
  - Example: input variable to indicator variable, a value between 0 and 1.0 (normalize)
- Target Database: Creating the dataset necessary for adequate analysis
- Analytics: Partitioning to Validation below
- Partitioning the Data: Sample size, stratification, etc. necessary for stats or analysis; bundling data or segregating data
- Pre-Analytics: Tasks needed PRIOR to running analysis
  - Examples: screening out some variables, null/zero values, events/transactions with missing variables
- Model Building: Models are built by obtaining the best-fit approach, somewhat contingent on methodology/stats
- Evaluation: How well the analytics process performed in delivering value
  - Quality of the model’s usefulness
- Validation: Making sure the solution addresses the business problem in terms of value to the business
  - Sanity / face check: compare results with common viewpoints or common sense
  - How well the model fits the data (applying the model to different data subsets and comparing results)
- Implementation: Address the original objective and discuss how to implement the results
- Hints & Tips: Specific to each recipe/project
  - E.g., suggestions for future use of the model
- How to Sell to Management: Tables, plots, etc. (visual aids)

FILE: CUSTOMERS

VARIABLES:
CUSTOMER NAME      CNAME
CUSTOMER ADDRESS   CADDR
CUSTOMER CITY      CCITY
CUSTOMER STATE     CST
CUSTOMER ZIP       CZIP
CUSTOMER PHONE     CPHN

- Think of a file as an Excel spreadsheet …
- Each row contains one and only one set of values for all variables for a distinct customer
- Each column contains values for all customers for a specific characteristic, e.g., CPHN
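The row-and-column view of the CUSTOMERS file can be sketched in Python with the standard `csv` module. The variable names (CNAME, CADDR, CCITY, CST, CZIP, CPHN) come from the slide; the two sample customers and the in-memory `CUSTOMERS_CSV` text are invented for illustration and stand in for a real file.

```python
import csv
import io

# Hypothetical sample records; names, addresses, and phone numbers are invented.
CUSTOMERS_CSV = """CNAME,CADDR,CCITY,CST,CZIP,CPHN
Ann Lee,12 Oak St,Huntsville,AL,35801,256-555-0101
Bob Ray,9 Elm Ave,Madison,AL,35758,256-555-0102
"""

# Each row becomes a dict: one set of values for all variables of one customer.
rows = list(csv.DictReader(io.StringIO(CUSTOMERS_CSV)))
first_customer = rows[0]

# A column is the value of one variable across all customers, e.g. CPHN.
phone_column = [row["CPHN"] for row in rows]

print(first_customer["CNAME"], first_customer["CCITY"])
print(phone_column)
```

Reading rows as dictionaries keeps the variable names attached to the values, which mirrors the spreadsheet picture: one row per customer, one named column per characteristic.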