Business Intelligence, Analytics, and Data Science: A Managerial Perspective - PDF
Ramesh Sharda, Dursun Delen, Efraim Turban
Summary
This PDF document is a chapter from a business intelligence textbook. It covers descriptive analytics, introducing various data types and the importance of data quality in business analytics. The text emphasizes the significance of data preprocessing and visualization.
Full Transcript
Business Intelligence, Analytics, and Data Science: A Managerial Perspective, Fourth Edition
Chapter 2: Descriptive Analytics I: Nature of Data, Statistical Modeling, and Visualization
Copyright © 2018 Pearson Education Ltd.

The Nature of Data
- Data: a collection of facts, usually obtained as the result of experiences, observations, or experiments
- Data may consist of numbers, words, images, …
- Data is the lowest level of abstraction; it is the source from which information and knowledge are derived
- Data quality and data integrity are critical to analytics
- Data integrity: accuracy, completeness, consistency, and validity of an organization's data

Metrics for Analytics Ready Data
- Data source reliability: "Do we have the right confidence and belief in this data source?"
- Data content accuracy: "Do we have the right data for the job/task?"
- Data accessibility: "Can we easily get to the data when we need to?"
- Data security and data privacy: "Is the data secured and only used by people who are authorized to use it?"
- Data richness: comprehensive, complete or near complete
- Data consistency: collected and combined accurately
- Data currency/data timeliness: up to date, recent/new
- Data validity: match between the actual and expected data
- Data relevancy: variables in the data are relevant to the study
- Data granularity: variables and data defined at the lowest level of detail

A Simple Taxonomy of Data
- Data (datum is the singular form): facts
- Structured data: data in a standardized format that has a well-defined structure, complies with a data model, follows a persistent order, and is easily accessed. Examples of structured data include names, dates, addresses, credit card numbers, stock information, geolocation, and more.
- Structured data is highly organized and easily processed by machines
- Unstructured data: any combination of textual, imagery, voice, and web content
- Semi-structured data: Extensible Markup Language (XML), HyperText Markup Language (HTML), JavaScript Object Notation (JSON), log files, etc.

A Simple Taxonomy of Data: Structured Data
1. Categorical. Categorical variables represent types of data that may be divided into groups. Examples of categorical variables are race, sex, age group, and educational level.
- I. Nominal. From the Latin "nomen" (meaning name). Nominal data are used to label variables without any quantitative value. Examples include male/female, hair color, nationalities, names of people, and marital status coded as 1 = single, 2 = married, 3 = divorced.
- II. Ordinal. Ordinal sounds like "order", and it is the order of the values that matters, not the size of the differences between them. An example is a Likert scale: very likely, likely, neutral, unlikely, very unlikely.
2. Numerical. Numerical data represent values that can be measured and put into a logical order. Examples of numerical data are height, weight, age, number of movies watched, IQ, etc.
- I. Interval. Interval data is concerned with both the order of the values and the differences between them. Examples include the classification of people into teenager, youth, middle age, 50 to 59, 50 and above, etc.
- II. Ratio. Ratio data tells us about the order of the values, the differences between them, and, because the scale has a true zero, their ratios. Examples include income, height, weight, annual sales, market share, product defect rates, time to repurchase, unemployment rate, and crime rate.
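The difference between nominal and ordinal data can be sketched in a few lines of Python. This is only an illustration: the names `LIKERT_ORDER`, `RANK`, `MARITAL_STATUS`, and `sort_responses` are hypothetical, and the Likert labels and marital-status codes are the ones used on the slides.

```python
# Ordinal: the labels have a meaningful order, so each label gets a rank.
# The list runs from least to most likely; the list position is the rank.
LIKERT_ORDER = ["very unlikely", "unlikely", "neutral", "likely", "very likely"]
RANK = {label: position for position, label in enumerate(LIKERT_ORDER)}

# Nominal: the numeric codes are labels only. They can be looked up,
# but sorting or averaging them has no meaning.
MARITAL_STATUS = {1: "single", 2: "married", 3: "divorced"}


def sort_responses(responses):
    """Sort Likert responses by their rank, not alphabetically."""
    return sorted(responses, key=RANK.__getitem__)


# Hypothetical survey answers, in the order they were collected.
responses = ["likely", "very unlikely", "neutral", "very likely", "unlikely"]
ordered = sort_responses(responses)
```

Sorting by rank preserves the order that ordinal data carries, whereas the marital-status codes support lookup only.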
The Art and Science of Data Preprocessing
- Real-world data is dirty, misaligned, overly complex, and inaccurate: not ready for analytics!
- Data preprocessing readies the data for analytics: data consolidation, data cleaning, data transformation, and data reduction
- It is an art: it develops and improves with experience

The Art and Science of Data Preprocessing: Data Reduction
1. Variables: dimensional reduction, variable selection
2. Cases/samples: sampling, balancing/stratification

Dimensionality reduction
Dimensionality reduction is a process and technique to reduce the number of dimensions, or features, in a data set. The goal is to decrease the data set's complexity by reducing the number of features while keeping the most important properties of the original data. For example, imagine a data set consisting of large, high-resolution images, each made up of millions of pixels. By applying a dimensionality reduction technique, you can reduce the number of features (pixels) to a smaller set of new features that capture the most important visual information.

Balancing / stratification
[Figure]

Discretization
Data discretization is the process of converting continuous data into a set of discrete intervals or categories. This technique can be used for data reduction, for simplification, or to make the data more suitable for analysis, and it is typically applied to very large data sets. For example, converting age into age groups (e.g., 0-20, 21-40, etc.) or income into income brackets (e.g., low, medium, high) are common forms of discretization used to simplify analysis or model building.
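The age-group example above can be sketched as follows. This is a minimal illustration, assuming integer ages and the hypothetical bands 0-20, 21-40, 41-60, and 61+; the names `AGE_UPPER_BOUNDS`, `AGE_LABELS`, and `discretize` are not from the slides.

```python
from bisect import bisect_left

# Hypothetical age bands. Each upper bound is inclusive, so 20 falls in
# "0-20" and 21 falls in "21-40". Anything above 60 falls in "61+".
AGE_UPPER_BOUNDS = [20, 40, 60]
AGE_LABELS = ["0-20", "21-40", "41-60", "61+"]


def discretize(value, upper_bounds, labels):
    """Map a continuous value to the label of the interval it falls in."""
    # bisect_left returns the index of the first bound that is >= value,
    # which is exactly the index of the band with an inclusive upper bound.
    return labels[bisect_left(upper_bounds, value)]


# Hypothetical ages, chosen to hit the band boundaries.
ages = [5, 20, 21, 40, 41, 75]
age_groups = [discretize(age, AGE_UPPER_BOUNDS, AGE_LABELS) for age in ages]
```

The same function works for income brackets by swapping in income thresholds and labels such as low, medium, and high.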
Data Normalization & Creating Attributes
When you normalize a data set, you reorganize it to remove any unstructured or redundant data, which enables a more logical means of storing that data. The main goal of data normalization is to achieve a standardized data format across your entire system.
Create attribute definitions to store more information about the objects in the module. For example, you might create a priority attribute so that you can set a priority level for each object in the module.

Statistical Modeling for Business Analytics
- Statistics: a collection of mathematical techniques to characterize and interpret data
- Descriptive statistics: describing the data (as it is)
- Inferential statistics: drawing inferences about the population based on sample data
- Descriptive analytics relies on descriptive statistics

Descriptive Statistics: Measures of Central Tendency
- Arithmetic mean: the sum of all observations divided by the number of observations
- Median: the middle value of the sorted observations
- Mode: the most frequent observation

Descriptive Statistics: Measures of Dispersion
- Dispersion: degree of variation in a given variable (also called variability, scatter, or spread)
- Range: max - min
- Variance
- Standard deviation
- Mean absolute deviation (MAD): average absolute deviation from the mean

Descriptive Statistics: Shape of a Distribution
- Histogram: frequency chart
- Skewness: measure of asymmetry
- Kurtosis: measure of how peaked or heavy-tailed the distribution is
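The measures of central tendency, dispersion, and skewness listed above can be computed with Python's standard `statistics` module. The sketch below is an illustration under stated assumptions: it uses the population formulas (dividing by n rather than n - 1), a moment-based definition of skewness, and a hypothetical sample; the function name `describe` is not from the slides.

```python
import statistics


def describe(values):
    """Return basic descriptive statistics for a list of numbers.

    Assumption: population formulas (divide by n) are used throughout.
    """
    n = len(values)
    mean = statistics.mean(values)
    std_dev = statistics.pstdev(values)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "variance": statistics.pvariance(values),
        "std_dev": std_dev,
        # Mean absolute deviation: average distance from the mean.
        "mad": sum(abs(v - mean) for v in values) / n,
        # Moment-based skewness: third central moment over std_dev cubed.
        # A positive value indicates a longer right tail.
        "skewness": (sum((v - mean) ** 3 for v in values) / n) / std_dev ** 3,
    }


# Hypothetical sample of eight observations.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
summary = describe(sample)
```

For this sample the mean is 5, the median is 4.5, the mode is 4, and the population standard deviation is 2. Statistical packages often default to sample formulas (dividing by n - 1), so their variance and standard deviation will differ slightly from these values.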
Regression Modeling for Inferential Statistics
- Regression is a part of inferential statistics
- It is the most widely known and used analytics technique in statistics
- It is used to characterize the relationship between explanatory (input) and response (output) variables
- It can be used for hypothesis testing (explanation) and forecasting (prediction)

Business Reporting Definitions and Concepts
- Report = Information → Decision
- Report: any communication artifact prepared to convey specific information
- A report can fulfill many functions: to ensure proper departmental functioning, to provide information, to provide the results of an analysis, to persuade others to act, to create an organizational memory, …

What is a Business Report?
- A written document that contains information regarding business matters
- Purpose: to improve managerial decisions
- Source: data from inside and outside the organization (via the use of ETL)
- Format: text + tables + graphs/charts
- Distribution: in print, email, portal/intranet
- Activities involved: data acquisition, information generation, decision making, process management

Business Reporting
[Figure: Business Functions, Data (Transactional Records), Data Repositories, Information (reporting), Decision Maker, Action (decision)]
Types of Business Reports
- Metric management reports: help manage business performance through metrics (service level agreements, SLAs, for externals; key performance indicators, KPIs, for internals); can be used as part of Six Sigma and/or TQM
- Dashboard-type reports: graphical presentation of several performance indicators on a single page using dials/gauges
- Balanced scorecard-type reports: a balanced scorecard (BSC) is defined as a management system that provides feedback on both internal business processes and external outcomes to continuously improve strategic performance and results; these reports include financial, customer, business process, and learning & growth indicators
Source: https://asq.org/quality-resources/balanced-scorecard

Data Visualization
- "The use of visual representations to explore, make sense of, and communicate data."
- Information = aggregation, summarization, and contextualization of data
- Related to information graphics, scientific visualization, and statistical graphics
- Often includes charts, graphs, illustrations, …

[Figure: chart titled "Midterm", plotting Marks (0 to 30) against No (1 to 30)]
A Brief History of Data Visualization
- Data visualization dates back to the second century AD (Anno Domini)
- Most developments have occurred in the last two and a half centuries
- Until recently it was not recognized as a discipline
- Today's most popular visual forms date back a few centuries

The Emergence of Data Visualization and Visual Analytics
- Emergence of new companies: Tableau, Spotfire, QlikView, …
- Increased focus by the big players:
  - MicroStrategy improved Visual Insight
  - SAP (Systems, Applications and Products in Data Processing) launched Visual Intelligence
  - SAS (Statistical Analysis System) launched Visual Analytics
  - Microsoft bolstered PowerPivot with Power View
  - IBM (International Business Machines) launched Cognos Insight
  - Oracle acquired Endeca

Visual Analytics
- A recently coined term: information visualization + predictive analytics
- Information visualization: descriptive, backward focused; "what happened" and "what is happening"
- Predictive analytics: predictive, future focused; "what will happen" and "why will it happen"
- There is a strong move toward visual analytics

Performance Dashboards
- Performance dashboards are commonly used in BPM software suites and BI platforms
- Dashboards provide visual displays of important information that is consolidated and arranged on a single screen, so that the information can be digested at a single glance and easily drilled into and explored further
Performance Dashboards: Dashboard Design
- The fundamental challenge of dashboard design is to display all the required information on a single screen, clearly and without distraction, in a manner that can be assimilated quickly
- Three layers of information: monitoring, analysis, management

Performance Dashboards: What to Look for in a Dashboard
- Uses visual components to highlight data and exceptions that require action
- Is transparent to the user, meaning that it requires minimal training and is extremely easy to use
- Combines data from a variety of systems into a single, summarized, unified view of the business
- Enables drill-down or drill-through to underlying data sources or reports
- Presents a dynamic, real-world view with timely data
- Requires little coding to implement, deploy, and maintain

Best Practices in Dashboard Design
- Benchmark KPIs with industry standards
- Wrap the metrics with contextual metadata
- Validate the design by a usability specialist
- Prioritize and rank alerts and exceptions
- Enrich the dashboard with business-user comments
- Present information in three different levels
- Pick the right visual constructs
- Provide for guided analytics