Business Analytics Overview PDF
Document Details
University of Manitoba
2020
Farhan Islam
Summary
This document provides an overview of business analytics, including definitions, types, and the CRISP-DM process. It also covers data preparation, modeling, and deployment examples.
Full Transcript
Business Analytics Overview
IDM 2020
Farhan Islam
umanitoba.ca/asper

Objective
- Business analytics definition and types
- Business analytics process (CRISP-DM)
- Data analytics techniques
- Challenges and limitations

Definition
Examples?
Business Analytics is the practice of using data to make informed business decisions and optimize performance.

Why is it important? Benefits?
- Data-driven decision making
- Competitive advantage
- Improved customer experience
- Cost reduction
- Employee productivity and engagement
- Forecasting and strategic planning

Types of Business Analytics
- Descriptive Analytics
- Diagnostic Analytics
- Predictive Analytics
- Prescriptive Analytics

Types of Business Analytics
- Descriptive Analytics: Descriptive analytics involves examining past data to understand what has happened. It summarizes historical data and provides insights into patterns, trends, and key metrics.
- Diagnostic Analytics: Diagnostic analytics goes beyond descriptive analytics by aiming to understand why something happened. It involves delving into the data to identify the root causes of a particular outcome.
- Predictive Analytics: Predictive analytics involves using historical data and statistical techniques to predict future events or outcomes. It utilizes patterns, decision trees, and relationships found in the data to estimate what might happen in the future.
- Prescriptive Analytics: Prescriptive analytics takes data analysis further by recommending specific actions to optimize outcomes based on predictive models and business objectives. It uses advanced algorithms, optimization techniques, and simulation to generate actionable insights.

CRISP-DM
CRoss-Industry Standard Process for Data Mining

Why Should There be a Standard Process?
The data mining process must be reliable and repeatable by people with little data mining background.
- Framework for recording experience
- Allows projects to be replicated
- Aid to project planning and management
- “Comfort factor” for new adopters
- Demonstrates maturity of Data Mining
- Reduces dependency on “stars”

CRISP-DM
- Data Mining methodology
- Process Model
- For anyone
- Provides a complete blueprint
- Life cycle: 6 phases

CRISP-DM (Phases)
- Business Understanding
  - Understanding project objectives and requirements
  - Data mining problem definition
- Data Understanding
  - Initial data collection and familiarization
  - Identify data quality issues
  - Initial, obvious results
- Data Preparation
  - Record and attribute selection
  - Data cleansing
- Modeling
  - Run the data mining tools
- Evaluation
  - Determine if results meet business objectives
  - Identify business issues that should have been addressed earlier
- Deployment
  - Put the resulting models into practice
  - Set up for repeated/continuous mining of the data

CRISP-DM (Phases and Tasks)
- Business Understanding: Determine Business Objectives; Assess Situation; Determine Data Mining Goals; Produce Project Plan
- Data Understanding: Collect Initial Data; Describe Data; Explore Data; Verify Data Quality
- Data Preparation: Select Data; Clean Data; Construct Data; Integrate Data; Format Data
- Modeling: Select Modeling Technique; Generate Test Design; Build Model; Assess Model
- Evaluation: Evaluate Results; Review Process; Determine Next Steps
- Deployment: Plan Deployment; Plan Monitoring & Maintenance; Produce Final Report; Review Project

CRISP-DM (Phase 1: Business Understanding)
Determine business objectives
- thoroughly understand, from a business perspective, what the client really wants to accomplish
- uncover important factors, at the beginning, that can influence the outcome of the project
- neglecting this step is to expend a great deal of effort producing the right answers to the wrong questions
Assess situation
- more detailed fact-finding about all of the resources, constraints, assumptions and other factors
  that should be considered
- flesh out the details

CRISP-DM (Phase 1: Business Understanding)
Determine data mining goals
- a business goal states objectives in business terminology
- a data mining goal states project objectives in technical terms
- ex) the business goal: “Increase catalog sales to existing customers.”
  a data mining goal: “Predict how many widgets a customer will buy, given their purchases over the past three years, demographic information (age, salary, city) and the price of the item.”
Produce project plan
- describe the intended plan for achieving the data mining goals and the business goals
- the plan should specify the anticipated set of steps to be performed during the rest of the project, including an initial selection of tools and techniques

CRISP-DM (Phase 2: Data Understanding)
- Explore the Data
- Verify the Quality: examine the quality of the data, addressing questions such as “Is the data complete?” and “Are there missing values in the data?”
- Find Outliers
Starts with an initial data collection and proceeds with activities in order to get familiar with the data, to identify data quality problems, to discover first insights into the data or to detect interesting subsets to form hypotheses for hidden information.

CRISP-DM (Phase 3: Data Preparation)
Data preparation: Covers all activities to construct the final dataset from the initial raw data. Data preparation tasks are likely to be performed multiple times and not in any prescribed order. Tasks include table, record and attribute selection as well as transformation and cleaning of data for modeling tools.
Usually takes over 90% of the time
- Collection
- Assessment
- Consolidation and Cleaning: table links, aggregation level, missing values, etc.
- Data selection: active role in ignoring non-contributory data? outliers? Use of samples, visualization tools
- Transformations: create new variables

CRISP-DM (Phase 3: Data Preparation)
Select data
- decide on the data to be used for analysis
- criteria include relevance to the data mining goals, quality and technical constraints such as limits on data volume or data types
- covers selection of attributes as well as selection of records in a table
Clean data
- raise the data quality to the level required by the selected analysis techniques
- may involve selection of clean subsets of the data, the insertion of suitable defaults or more ambitious techniques such as the estimation of missing data by modeling

CRISP-DM (Phase 3: Data Preparation)
Construct data
- constructive data preparation operations such as the production of derived attributes, entire new records or transformed values for existing attributes
Integrate data
- methods whereby information is combined from multiple tables or records to create new records or values
Format data
- formatting transformations refer to primarily syntactic modifications made to the data that do not change its meaning, but might be required by the modeling tool

CRISP-DM (Phase 4: Modeling)
- Select the modeling technique (based upon the data mining objective)
- Build model (parameter settings)
- Assess model (rank the models)
Various modeling techniques are selected and applied, and their parameters are calibrated to optimal values. Some techniques have specific requirements on the form of data. Therefore, stepping back to the data preparation phase is often necessary.
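
Because modeling often sends the analyst back to data preparation, the Phase 3 tasks (select, clean, construct, integrate, format) are easier to repeat when each one is a small function. The following is a minimal Python sketch under stated assumptions: the two tables, every field name, and the choice of the median as the default for missing values are hypothetical and do not come from the slides.

```python
from statistics import median

# Hypothetical raw tables; every field name and value is made up for illustration.
customers = [
    {"customer_id": 1, "age": 34, "salary": 52000, "city": " Winnipeg "},
    {"customer_id": 2, "age": None, "salary": 61000, "city": "brandon"},
    {"customer_id": 3, "age": 45, "salary": None, "city": "Winnipeg"},
]
purchases = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 35.5},
    {"customer_id": 2, "amount": 12.0},
]

def select_data(rows, attributes):
    """Select data: keep only the attributes relevant to the mining goal."""
    return [{a: r[a] for a in attributes} for r in rows]

def clean_data(rows, numeric_attributes):
    """Clean data: insert a default (here the median, an assumption) for missing values."""
    cleaned = [dict(r) for r in rows]
    for a in numeric_attributes:
        default = median(r[a] for r in rows if r[a] is not None)
        for r in cleaned:
            if r[a] is None:
                r[a] = default
    return cleaned

def integrate_data(rows, purchase_rows):
    """Integrate data: combine the purchases table into one total per customer."""
    totals = {}
    for p in purchase_rows:
        totals[p["customer_id"]] = totals.get(p["customer_id"], 0.0) + p["amount"]
    return [{**r, "total_spent": totals.get(r["customer_id"], 0.0)} for r in rows]

def construct_data(rows):
    """Construct data: derive a new attribute from existing ones."""
    return [
        {**r, "spend_per_1k_salary": r["total_spent"] / (r["salary"] / 1000)}
        for r in rows
    ]

def format_data(rows):
    """Format data: syntactic changes only (trim and normalize city names)."""
    return [{**r, "city": r["city"].strip().title()} for r in rows]

prepared = select_data(customers, ["customer_id", "age", "salary", "city"])
prepared = clean_data(prepared, ["age", "salary"])
prepared = integrate_data(prepared, purchases)
prepared = construct_data(prepared)
prepared = format_data(prepared)

for row in prepared:
    print(row)
```

Integration runs before construction here only because the derived attribute needs the purchase total; the slides note that these tasks follow no prescribed order.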

CRISP-DM (Phase 4: Modeling)
Select modeling technique
- select the actual modeling technique that is to be used, ex) decision tree, neural network
- if multiple techniques are applied, perform this task for each technique separately
Generate test design
- before actually building a model, generate a procedure or mechanism to test the model’s quality and validity
- ex) In classification, it is common to use error rates as quality measures for data mining models. Therefore, the dataset is typically separated into a train set and a test set; the model is built on the train set and its quality is estimated on the separate test set

CRISP-DM (Phase 4: Modeling)
Build model
- run the modeling tool on the prepared dataset to create one or more models
Assess model
- the data mining engineer interprets the models according to domain knowledge, the data mining success criteria and the desired test design
- judges the success of the application of modeling and discovery techniques more technically
- contacts business analysts and domain experts later in order to discuss the data mining results in the business context
- considers only models, whereas the evaluation phase also takes into account all other results that were produced in the course of the project

Digression - Machine Learning
- Computers can learn from data without being explicitly programmed.
- Artificial Neural Networks (ANN): networks that learn and are capable of performing tasks that are difficult with conventional computers.
  - Playing chess, recognizing patterns in faces and objects, and filtering spam e-mail
- Used for poorly structured problems (data is fuzzy and uncertainty is involved).
- Use patterns instead of the if-then-else rules used by expert systems
- Create a model based on input and output
- Applied in Bankruptcy Prediction, Credit Rating, and Target Marketing, etc.
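
The test design described under Phase 4 (split the data into train and test sets, build the model on the train set, estimate the error rate on the test set) can be sketched in Python as follows. This is an illustration under assumptions, not the course's method: the data is synthetic, the 70/30 split is an arbitrary choice, and the model is a one-split decision tree (a "stump") chosen only because it fits in a few lines.

```python
import random

def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of the rows and split them into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def fit_stump(train):
    """Build a one-split decision tree: predict True when x >= threshold.

    The threshold is the observed x value with the fewest training errors.
    """
    best_threshold, best_errors = None, len(train) + 1
    for threshold in sorted({x for x, _ in train}):
        errors = sum((x >= threshold) != label for x, label in train)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold

def error_rate(threshold, rows):
    """Fraction of rows where the stump's prediction disagrees with the label."""
    return sum((x >= threshold) != label for x, label in rows) / len(rows)

# Synthetic data (an assumption): one numeric attribute x, label True when x >= 50,
# with the label flipped for every multiple of 10 to simulate noise.
data = [(x, (x >= 50) != (x % 10 == 0)) for x in range(100)]

train, test = train_test_split(data)
stump = fit_stump(train)
print("threshold:", stump)
print("train error:", error_rate(stump, train))
print("test error:", error_rate(stump, test))
```

The error on the held-out test set is the number to report when ranking models, because the train error is measured on the same rows the threshold was tuned on.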

CRISP-DM (Phase 5: Evaluation)
- Evaluation of model: how well it performed on test data
- Methods and criteria: depend on model type
- Interpretation of model: important or not, easy or hard, depending on the algorithm
Thoroughly evaluate the model and review the steps executed to construct the model to be certain it properly achieves the business objectives. A key objective is to determine if there is some important business issue that has not been sufficiently considered. At the end of this phase, a decision on the use of the data mining results should be reached.

CRISP-DM (Phase 6: Deployment)
- Determine how the results need to be utilized
- Who needs to use them?
- How often do they need to be used?
- Deploy data mining results by scoring a database, utilizing results as business rules, or interactive scoring on-line
The knowledge gained will need to be organized and presented in a way that the customer can use it. However, depending on the requirements, the deployment phase can be as simple as generating a report or as complex as implementing a repeatable data mining process across the enterprise.
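
As one illustration of deploying results by scoring a database and turning the score into a business rule, the sketch below uses Python's built-in sqlite3 module with an in-memory database. Everything specific is an assumption made for the example: the customers table and its columns, the churn_score formula standing in for a real model, and the 0.5 contact threshold.

```python
import sqlite3

def churn_score(months_inactive, purchases_last_year):
    """Hypothetical stand-in for a model produced in the modeling phase."""
    raw = 0.15 * months_inactive - 0.05 * purchases_last_year
    return max(0.0, min(1.0, raw))

def score_database(conn, threshold=0.5):
    """Batch-score every customer and store the score plus a business-rule flag."""
    rows = conn.execute(
        "SELECT customer_id, months_inactive, purchases_last_year FROM customers"
    ).fetchall()
    for customer_id, months_inactive, purchases in rows:
        score = churn_score(months_inactive, purchases)
        # Business rule (assumed): contact the customer when the score reaches the threshold.
        conn.execute(
            "UPDATE customers SET score = ?, contact = ? WHERE customer_id = ?",
            (score, int(score >= threshold), customer_id),
        )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "customer_id INTEGER PRIMARY KEY, months_inactive INTEGER, "
    "purchases_last_year INTEGER, score REAL, contact INTEGER)"
)
conn.executemany(
    "INSERT INTO customers (customer_id, months_inactive, purchases_last_year) "
    "VALUES (?, ?, ?)",
    [(1, 0, 12), (2, 6, 1), (3, 3, 4)],
)

score_database(conn)
flagged = [
    row[0]
    for row in conn.execute(
        "SELECT customer_id FROM customers WHERE contact = 1 ORDER BY customer_id"
    )
]
print("customers to contact:", flagged)
```

Calling churn_score directly on a single customer's values corresponds to the interactive on-line scoring option, while score_database corresponds to the batch option that can be rerun on a schedule.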

CRISP-DM (Phase 6: Deployment)
Plan deployment
- to deploy the data mining result(s) into the business, take the evaluation results and determine a strategy for deployment
- document the procedure for later deployment
Plan monitoring and maintenance
- important if the data mining results become part of the day-to-day business and its environment
- helps to avoid unnecessarily long periods of incorrect usage of data mining results
- needs a detailed plan for the monitoring process
- takes into account the specific type of deployment

CRISP-DM (Phase 6: Deployment)
Produce final report
- the project leader and team write up a final report
- may be only a summary of the project and its experiences
- may be a final and comprehensive presentation of the data mining result(s)
Review project
- assess what went right and what went wrong, what was done well and what needs to be improved

Challenges & Limitations
- Data Quality and Availability
- Data Privacy (Surveillance Capitalism)
- Security
- Model Limitations
- Ethical Concerns (Bias and Fairness)
- …