Introduction to Business Analytics Chapter 3 Slides - McGraw Hill PDF

Document Details

Uploaded by EnchantedOnyx3092

2021

Tags

business analytics descriptive statistics inferential statistics data analysis

Summary

This document is a set of slides from the McGraw Hill textbook, Introduction to Business Analytics. Chapter 3 covers basic statistics, including how to analyze data. It also includes topics such as descriptive and inferential statistics, sampling methods and hypothesis testing.

Full Transcript

Because learning changes everything. ®
Introduction to Business Analytics
© 2021 McGraw Hill. All rights reserved. Authorized only for instructor use in the classroom. No reproduction or further distribution permitted without the prior written consent of McGraw Hill.
© McGraw Hill LLC. All rights reserved. No reproduction or distribution without the prior written consent of McGraw Hill LLC.

Analyze the Data: Basic Statistics and Tools Required in Business Analytics
Chapter 3

3.1 Distinguish between populations and samples.
3.2 Describe sampling methods, data reduction, and types of bias in business analytics.
3.3 Understand the basic statistics used in business analytics.
3.4 Use software tools to create summary statistics.
3.5 Interpret summary statistics and visualize them with histograms and box plots.
3.6 Explain confidence intervals and hypothesis testing.
3.7 Understand which test statistic is appropriate for the data.
3.8 Understand correlation and regression analysis.

Defining Populations and Samples
LO 3.1

Population versus Sample
• Population: A group with something in common.
• Expensive/impossible to get all
• Parameter: characteristic of a population
• Example: survey all restaurants in the country

Population versus Sample
• Sample: A subset of a population.
• Representative
• Statistic: characteristic of a sample
• Used to make inferences (conclusions about the characteristics of a population)
• Example: survey of selected restaurants

Descriptive Statistics, Inferential Statistics, and Hypotheses
• Descriptive Statistics: Measures that describe a population/sample
• Inferential Statistics: Measures calculated only using a sample
• Hypothesis: proposed explanation made on the basis of limited evidence as a starting point for further investigation
  • Use inferential statistics

Progress Check 3.1
Q: What is the difference between a parameter and a statistic? Why might the values differ?

Sampling Methods, Data Reduction, and Bias
LO 3.2

Sampling Methods
1. Simple random sampling
2. Stratified random sampling
   • Includes all groups (strata)
3. Cluster sampling
   • Select few groups (clusters)
4. Convenience/non-probability sampling

Data Reduction
• Process of reducing the size of the data set to a more manageable and suitable size for a business analysis project.
• Focus on the most critical, interesting, or abnormal items
• Speeds up analysis and may reduce analysis cost
• Common method: Filtering

Filtering in Excel (Exhibits 3.1 & 3.2)

Bias in Business Analytics
• Prejudice in favor of or against a thing, person, or group.
• Intentional versus Unintentional
• During data collection, analysis, results
• Types:
  1. Nonresponse
  2. Selection
  3. Confirmation
  4. Outlier

Progress Check 3.2
Q: How can you minimize bias in your research?

Understanding Basic Statistics
LO 3.3

Probability Distributions
• Random Variable: quantifies the outcomes of random occurrences
• Data Distribution: shows all possible values for a variable and how often they (could) occur
• Probability Distribution: a statistical function that describes:
  • the possible values in a population and
  • the likelihood that any given observation (random variable) can take a given range or value

Probability Distributions
EXHIBIT 3.3 Example of a Probability Distribution

Types of Numerical Data
Discrete Data
• Whole-number (integer only)
• Finite set of values between any two observations
• Examples: inventory, vehicles, manufacturing plants
Continuous Data
• Any numerical value, not just whole numbers
• Infinite set of values between any two observations
• Examples: height, weight, currency

Measures of Central Tendency
Describe the center point of a data set.
• Mean: average = Sum/n
• Median: midpoint of the data distribution
• Mode: most common observation in a data set
• Kurtosis: distribution shape (i.e., data central or in tails)
• Symmetry: Mean = Median = Mode

Skewness
EXHIBIT 3.5 A Symmetrical Data Distribution
EXHIBIT 3.6 Right-Skewed and Left-Skewed Data Distributions

Measures of Dispersion
Describe how dispersed the data set is.
• Range: Maximum - Minimum
• Interquartile Range: 4 quartiles
• Variance: average of squared deviations from the mean
• Standard Deviation:
  • square root of the variance
  • same unit as data values

Continuous Probability Distributions
The Normal Distribution
EXHIBIT 3.7 The Normal Distribution

Standard Normal Distribution
• Special case of the normal distribution
• Theoretical distribution
• Used for comparisons
• Calculate probabilities of individual observations
• Mean = Median = Mode = 0
• Standard deviation = 1
• Z-score: number of standard deviations a data point is from the mean
  • z = (data point - mean) / standard deviation

Continuous Probability Distributions
The Uniform Distribution
EXHIBIT 3.8 Example Uniform Distribution

Progress Check 3.3
Q: If Data Set A has a higher standard deviation than Data Set B, what does that tell you?

Using Software Tools to Create Summary Statistics
LO 3.4
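
The measures of central tendency and dispersion from LO 3.3 can also be computed outside a spreadsheet. The following is a minimal Python sketch using only the standard library, not the textbook's method. The function names `summarize` and `z_score` and the `sales` values are hypothetical, and the sketch assumes the sample versions of variance and standard deviation (dividing by n - 1).

```python
import statistics


def summarize(values):
    """Return basic summary statistics for a list of numbers."""
    # quantiles(n=4) returns the three cut points Q1, Q2 (median), Q3
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        # Sample variance and standard deviation (divide by n - 1);
        # statistics.pvariance / pstdev give the population versions.
        "variance": statistics.variance(values),
        "stdev": statistics.stdev(values),
    }


def z_score(x, mean, stdev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / stdev


# Hypothetical daily sales figures, used only for illustration
sales = [12, 15, 15, 18, 20, 22, 38]
stats = summarize(sales)
largest_z = z_score(max(sales), stats["mean"], stats["stdev"])
```

In this made-up data set the mean (20) is above the median (18) because of the single large value, which is the pattern a right-skewed distribution produces.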

Calculating Summary Statistics
EXHIBIT 3.9 Calculating Summary Statistics with Excel, Power BI, and Tableau

Excel Data Analysis ToolPak - Descriptive Statistics
Data > Analysis > Data Analysis
EXHIBIT 3.12 Selecting Descriptive Statistics from the Excel Data Analysis ToolPak
EXHIBIT 3.13 Descriptive Statistics from the Excel Data Analysis ToolPak

Tableau Summary Statistics - Descriptive Statistics
EXHIBIT 3.17 Generating Summary Statistics in Tableau

Progress Check 3.4
Q: Which tool (Excel, Power BI, and/or Tableau) can create all of the statistics discussed?

Interpreting and Visualizing Statistics
LO 3.5

Frequency Distribution
• Numerical data
• Bins, classes, and intervals
  • Categories in numerical data
• Table that uses bins or categories to list the frequency of various outcomes in a sample.

Histogram
Visual representation of frequency distribution
EXHIBIT 3.19 Using Histograms to Visualize Data

Box Plot
EXHIBIT 3.20 Using Box Plots to Visualize Data

Progress Check 3.5
Q: How is a box plot different from a histogram?

Hypothesis Testing
LO 3.6

Confidence Interval
Point Estimate vs Confidence Interval
• Point Estimate: single value calculated from the sample used to estimate population parameter
  • Difficult to be accurate
• Confidence Interval: a range of numbers around the point estimate at a certain level of confidence
• Level of confidence: probability that the true value of the population parameter falls within a certain range
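
A confidence interval around a sample mean can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the textbook's procedure: the function name `mean_confidence_interval` and the `order_values` data are hypothetical, and the critical value comes from the standard normal distribution, which is a common approximation for larger samples (small samples usually call for a t critical value instead).

```python
import math
import statistics
from statistics import NormalDist


def mean_confidence_interval(sample, confidence=0.95):
    """Return (lower, upper) bounds around the sample mean."""
    n = len(sample)
    point_estimate = statistics.mean(sample)
    # Standard error of the mean: sample standard deviation / sqrt(n)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    # Two-sided critical value from the standard normal distribution
    # (assumption: normal approximation, not a t distribution)
    z_critical = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin_of_error = z_critical * standard_error
    return point_estimate - margin_of_error, point_estimate + margin_of_error


# Hypothetical order values, used only for illustration
order_values = [52, 48, 61, 55, 47, 58, 50, 53, 49, 57]
lower_95, upper_95 = mean_confidence_interval(order_values, 0.95)
lower_99, upper_99 = mean_confidence_interval(order_values, 0.99)
```

Raising the confidence level from 95% to 99% widens the interval, because a larger critical value multiplies the same standard error.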

Confidence Interval
Confidence Interval = Point estimate ± Margin of Error
Lower bound < population parameter < upper bound
• Function of desired confidence level plus standard error
• Error reflects inability to capture true population parameter

Hypothesis Testing
• Hypothesis: proposed explanation
• Hypothesis Test: used to determine if there are statistically significant differences between groups
  • Significant, Not Random (or by chance)
• Two-tailed: different?
• One-tailed: direction of difference?

Hypothesis Testing Steps
1. Determine Hypotheses
2. Set the Statistical Significance (Alpha, α)
3. Calculate the Test Statistic
4. Reject or Fail to Reject the Null Hypothesis Using p-value

Step 1: Determine Hypotheses
• Null Hypothesis (H0)
  • Base case being tested
  • No relationship
  • Not Reject/Reject
• Alternative Hypothesis (HA)
  • Tested against H0
  • Expected relationship
  • Assumed to be true if H0 is rejected

Step 1: Determine Hypotheses
• Two-Tailed Hypothesis Example
  • H0: Sales on Saturday are the same as sales on Sunday.
  • HA: Sales on Saturday are not the same as sales on Sunday.
• One-Tailed Hypothesis Example
  • H0: Sales on Saturday are less than or equal to sales on Sunday.
  • HA: Sales on Saturday are greater than sales on Sunday.

Step 2: Significance Level
How do we come to the correct conclusion?
• Significance Level (alpha, α)
  • How strong does the evidence need to be to reject H0?
  • Equal to the probability of a Type I error
EXHIBIT 3.22 Type I and Type II Errors

Step 3: Calculate the Test Statistic
• Use Excel to calculate (appropriate) t-test
• Data > Analysis > Data Analysis > t-test
EXHIBIT 3.24 Example of Output from a t-Test (from Excel Data Analysis ToolPak)

Step 4: Reject or Fail to Reject the Null Hypothesis Using the p-Value
• If p-value > alpha: Fail to reject the H0
  • Groups not significantly different
• If p-value ≤ alpha: Reject the H0
  • Groups are significantly different

Step 4: Reject or Fail to Reject the Null Hypothesis Using the p-Value
Example
• α = 0.05
• t Stat = 5.16
• p-value = 2.23E-06
• p-value ≤ alpha: Reject the H0
EXHIBIT 3.24 Example of Output from a t-Test (from Excel Data Analysis ToolPak)

Progress Check 3.6
Q: Which hypothesis statement, the null or the alternative, is the base case? Which reflects the analyst's belief?

t-Tests, ANOVA Tests, and Chi-Square Tests
LO 3.7
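
The four hypothesis-testing steps from LO 3.6 can be sketched in Python for two independent groups. This is an illustration, not a reproduction of the Excel ToolPak output: the function names and the Saturday/Sunday figures are hypothetical, the test statistic is the unequal-variance (Welch) form, and the p-value uses a standard normal approximation rather than the t distribution, so it is only a rough guide for small samples.

```python
import math
import statistics
from statistics import NormalDist


def welch_t_statistic(group_a, group_b):
    """Test statistic for the difference in means of two independent groups."""
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    standard_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean_a - mean_b) / standard_error


def approx_two_tailed_p(test_statistic):
    """Two-tailed p-value via the standard normal (an approximation)."""
    return 2 * (1 - NormalDist().cdf(abs(test_statistic)))


def decide(p_value, alpha=0.05):
    """Step 4: compare the p-value with the significance level."""
    return "Reject H0" if p_value <= alpha else "Fail to reject H0"


# Step 1: H0 says Saturday and Sunday sales are the same (two-tailed).
# Hypothetical sales figures, used only for illustration.
saturday = [210, 225, 240, 232, 218, 245, 229, 236]
sunday = [198, 205, 214, 202, 209, 220, 211, 207]

alpha = 0.05                                       # Step 2
t_stat = welch_t_statistic(saturday, sunday)       # Step 3
p_value = approx_two_tailed_p(t_stat)
decision = decide(p_value, alpha)                  # Step 4

# The slide's own example: alpha = 0.05 and p-value = 2.23E-06
slide_decision = decide(2.23e-06, 0.05)
```

Keeping the decision rule in its own function makes it easy to apply the same comparison to a p-value reported by another tool, such as the Excel output shown in Exhibit 3.24.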

t-Tests
Which test is appropriate when comparing two groups?
• Independent/Unrelated Groups
  • Called Two-Sample in Excel
• Related Groups:
  • Each observation in one group can be paired with an observation in another group
  • Pre-test versus Post-test scores
  • Called Paired t-test in Excel

Other Types of Tests
• ANOVA (Analysis of Variance) Test
  • Numerical data
  • Compares the means across three or more groups
  • Example: Are sales for three customer segments/cohorts different?
• Chi-Square Test
  • Categorical data
  • Tests for the difference between observed and expected categorical data (expected distribution shape)
  • Example: Is the number of customers evenly distributed across the days of the week?

Progress Check 3.7
Q: A retail store operates in four different cities. As an analyst, you want to determine if there is a significant difference in the average sales during the past quarter across these four stores. Which hypothesis test should you use, and why?

Introduction to Correlation and Regression Analysis
LO 3.8

Correlation
• Measures the linear association, or the relationship, between two variables by examining how the variables change with respect to each other.
• Variable: measurement that changes over time or between individuals/subjects
• Excel: Data > Analysis > Data Analysis > Correlation
  • Calculates a correlation coefficient

Correlation Coefficient
• Correlation Coefficient = 0 → no relationship
• Correlation Coefficient = -1 → perfectly negatively correlated
• Correlation Coefficient = 1 → perfectly positively correlated
Exhibit 3.33 Scatterplots to Illustrate Correlation

(Linear) Regression Analysis
y = mx + b
• y = dependent variable you are predicting
• x = independent variable that helps explain y
• m = slope of the line, steepness and direction
• b = intercept, where line starts at Y-axis
• Measures the relationship between one output variable (y) and one or more input variables (x)
• Predict dependent variable
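
The correlation coefficient and the slope and intercept in y = mx + b can be computed directly from their definitions. The sketch below is a minimal illustration using ordinary least squares for a single independent variable; the function names `pearson_r` and `fit_line` and the advertising and sales values are hypothetical and do not come from the textbook.

```python
import math
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fit_line(xs, ys):
    """Least-squares slope m and intercept b for y = m*x + b."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


# Hypothetical data: advertising spend (x) and sales (y)
advertising = [1, 2, 3, 4]
sales = [3, 5, 7, 9]

r = pearson_r(advertising, sales)      # perfectly positive relationship
m, b = fit_line(advertising, sales)    # slope 2, intercept 1
predicted = m * 5 + b                  # predict y for a new x value
```

Because the made-up points lie exactly on a line, the coefficient is 1; real data would normally give a value strictly between -1 and 1.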

(Linear) Regression Analysis
• Simple: one independent variable (x)
• Multiple: more than one independent variable (x)
• Line of best fit: regression model that best expresses the relationship between data points

(Linear) Regression Analysis
y = 0.001x + (−0.574)
Exhibit 3.37 Regression Analysis: Line of Best Fit

Progress Check 3.8
Q: Which hypothesis statement, the null or the alternative, is the base case? Which reflects the analyst's belief?

Labs Associated with Chapter 3
Lab #   Lab Name
3.1     Using Excel Functions to Calculate Descriptive Statistics to Gain Insights About the Distribution of a Sales Data Set
3.2     Excel/Tableau: Calculating Descriptive Statistics to Gain Insights About the Distribution of a Sales Data Set
3.3     Excel: Performing a t-test for Difference in Means to Determine If the Differences Between In-Person and Online Sales Are Statistically Significant
3.4     Excel: Performing an ANOVA Test for Difference in Means to Determine If There Are Significant Differences Between the Average 4-Year Degree Completion Rate/SAT Average for Public, Private, and For-Profit Colleges
3.5     Excel: Deriving Cost Drivers for Activity-Based Costing (Regression Analysis)