9-622-099
REV: JULY 20, 2022

IAVOR I. BOJINOV
MICHAEL PARZEN
PAUL J. HAMILTON

Statistical Inference

Broadly speaking, the field of statistics allows us to make more informed business decisions by providing tools to analyze data and model uncertainty. In the ideal world, we would make decisions by studying a population of interest and calculating summary statistics over all individuals in that population. However, this is often infeasible due to the time and expense of collecting data on the entire population. Instead, statistical inference allows us to draw conclusions based on only a randomly selected collection of individuals, which is called a sample. Confidence intervals allow us to make estimates about population quantities using only the sample data. If we have specific hypotheses about the population of interest, hypothesis testing allows us to rigorously test the validity of those hypotheses using the sample data.[a]

Samples & Populations

In statistics, we generally want to study a population. You can think of a population as an entire collection of persons, things, or objects under study. For example, a population could be all current MBA students in accredited USA universities. To study the larger population, we select a sample. The idea of sampling is to select a portion (or subset) of the larger population and study that portion (the sample) to gain information about the population. Data are the result of sampling from a population.

Because it takes a lot of time and money to examine an entire population, sampling is a very practical technique. If you wished to compute the overall grade point average at your school, it would make sense to select a sample of students who attend the school. The data collected from the sample would be the students' grade point averages. In presidential elections, opinion poll samples of 1,000 to 2,000 people are taken. The opinion poll is supposed to represent the views of the people in the entire country.
Manufacturers of canned carbonated drinks take samples to determine if the manufactured 16-ounce containers do indeed contain 16 ounces of the drink.

From the sample data, we can calculate a statistic. A statistic is a number that is a property of the sample. Common sample statistics include sample means, sample proportions, and sample variances. For example, if we consider one math class to be a sample of the population of all math classes, then the average number of points earned by students in that one math class at the end of the term is an example of a statistic.

[a] For a browser-based version of this note with integrated R code, see the Statistical Inference chapter under the Inference module of dsm.business. Sections marked with (§) contain additional content not covered in this note.

Professor Iavor I. Bojinov, Senior Lecturer Michael Parzen, and Research Associate Paul J. Hamilton prepared this note as the basis for class discussion.

Copyright © 2022 President and Fellows of Harvard College. To order copies or request permission to reproduce materials, call 1-800-545-7685, write Harvard Business School Publishing, Boston, MA 02163, or go to www.hbsp.harvard.edu. This publication may not be digitized, photocopied, or otherwise reproduced, posted, or transmitted, without the permission of Harvard Business School.

This document is authorized for use only in Prof. M P Ram Mohan & Prof. Viswanath Pingali's Senior Management Programme (SMP-BL13) 2024 at Indian Institute of Management - Ahmedabad from Apr 2024 to Oct 2024.

The statistic can also be used as an estimate of a population parameter. A parameter is a number that is a property of the population. Since we considered all math classes to be the population, the average number of points earned per student over all the math classes is an example of a parameter. One of the main concerns in the field of statistics is how accurately a statistic estimates a parameter.
The accuracy depends on how well the sample represents the population. The sample must contain the characteristics of the population in order to be a representative sample.

As an example, suppose that we wanted to investigate whether smoking during pregnancy leads to lower birth weight of babies. To do so, we would compare a random sample of weights of newborn babies whose mothers smoke with a random sample of weights of newborn babies of non-smoking mothers. By analyzing the sample data, we would hope to be able to draw conclusions about the effects on birth weight of smoking during pregnancy for all babies (i.e., the population). The process of using a random sample to draw conclusions about a population is called statistical inference.

If we do not have a random sample, then sampling bias can invalidate our statistical results. For example, the birth weights of twins are generally lower than the weights of babies born alone. If all the non-smoking mothers in the sample were giving birth to twins, and all the smoking mothers were giving birth to single babies, then the conclusions we draw about the effects of smoking in pregnancy will not necessarily be correct, as they are affected by sampling bias.

Confidence Intervals

As described in the previous section, we often want to use sample data to estimate population quantities. Due to the randomness inherent to sampling, an observed sample statistic is almost certainly not equal to the true population parameter. To quantify the variability surrounding the sample statistic, we can compute a confidence interval, which provides a lower and upper bound for where we think the true population value lies. Note that unless we take a sample that consists of the entire population (often called a census), we will never know the true population parameter with absolute certainty.
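A short simulation can make this sampling variability concrete. The sketch below is illustrative only: the population is synthetic, and the names `population` and `sample_counts` are assumptions introduced here rather than anything from the note. It draws many random samples of one hundred from a population in which exactly 30% of individuals have some trait, and checks how often a sample reproduces that 30% exactly.

```r
# Hypothetical population of 100,000 individuals; 30% have the trait (coded 1).
set.seed(1)
population <- rep(c(1, 0), times = c(30000, 70000))

# Draw 1,000 random samples of size 100 and count the 1s in each sample.
sample_counts <- replicate(1000, sum(sample(population, size = 100)))

# Sample proportions scatter around the population proportion of 0.30.
range(sample_counts / 100)

# Share of samples whose proportion is exactly 0.30 (a minority of them).
mean(sample_counts == 30)
```

Because we built the population ourselves, the parameter is known here; in practice it is not, which is why an interval estimate is useful.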
Confidence Intervals for Proportions

Suppose that in a survey of one hundred adult cell phone users, 30% switched carriers in the past two years. Based on this sample statistic (which we denote $\hat{p}$), what can we conclude about the proportion of all adult cell phone users who switched carriers (which we denote $p$)? Our sample estimate of $\hat{p} = 30\%$ is based on a random sample of one hundred users and not the entire population, so we cannot conclude that the true population parameter is $p = 30\%$. Instead, we must calculate a confidence interval to understand the range of plausible values for the population proportion $p$.

For proportions, we generally use the following formula to calculate a confidence interval from our sample estimate:

$$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$

where $\hat{p}$ is our estimate from the sample, $n$ is the number of observations in the sample, and $z^*$ is a constant that determines our desired level of confidence in the interval (we typically use a value of 1.96, which corresponds to a 95% level of confidence).

Now imagine we wanted to construct a 95% confidence interval based on our sample of one hundred cell phone users. Applying the above formula:

$$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = 0.30 \pm 1.96 \sqrt{\frac{0.30(0.70)}{100}} = [0.2102,\ 0.3898]$$

We interpret this confidence interval as follows: "we are 95% confident that the true population proportion of adult cell phone users who switched carriers in the past two years is between 21.02% and 38.98%."

Note that we can adjust our desired level of confidence by changing the value of $z^*$. For example, imagine we only desired 80% confidence in our interval estimate. In this case, we would apply the same formula but use a $z^*$ value of 1.28.
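The normal-approximation formula above can be wrapped in a small R helper so the arithmetic is easy to check. This is a minimal sketch, not code from the note: the function name `prop_ci` and its argument names are hypothetical, and `qnorm()` (from base R's stats package) is used to obtain $z^*$ from the confidence level instead of a rounded table value.

```r
# Normal-approximation confidence interval for a proportion:
#   p_hat +/- z* x sqrt(p_hat (1 - p_hat) / n)
# prop_ci() is a hypothetical helper written for this illustration.
prop_ci <- function(p_hat, n, conf_level = 0.95) {
  z_star <- qnorm(1 - (1 - conf_level) / 2)   # about 1.96 when conf_level = 0.95
  margin <- z_star * sqrt(p_hat * (1 - p_hat) / n)
  c(lower = p_hat - margin, upper = p_hat + margin)
}

# Survey of 100 cell phone users, 30% of whom switched carriers.
ci_95 <- prop_ci(p_hat = 0.30, n = 100)
round(ci_95, 4)
```

Passing a different `conf_level` (for example 0.80) changes only $z^*$, so the interval narrows or widens around the same point estimate.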
Applying the formula with $z^* = 1.28$:

$$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = 0.30 \pm 1.28 \sqrt{\frac{0.30(0.70)}{100}} = [0.2413,\ 0.3587]$$

We interpret this confidence interval as follows: "we are 80% confident that the true population proportion of adult cell phone users who switched carriers in the past two years is between 24.13% and 35.87%." In this case, relative to the 95% confidence interval we are slightly less confident that the interval contains the true population proportion, but the interval is narrower. This implies that as we decrease our level of confidence, our interval becomes more precise (and vice versa). The table below shows the appropriate value of $z^*$ for several common confidence levels.

Table 1 Value of $z^*$ by confidence level

| Confidence Level | $z^*$ |
|---|---|
| 80% | 1.28 |
| 90% | 1.645 |
| 95% | 1.96 |
| 98% | 2.33 |
| 99% | 2.58 |

Source: Casewriter.

In R, we can calculate a confidence interval for proportions with the binom.test() command, which takes the number of successes (x) and the number of observations (n) as its first two arguments. Applying this to our sample of cell phone users, we can run binom.test() with x equal to thirty and n equal to one hundred, i.e., binom.test(x = 30, n = 100).

For now, all we care about in this output is the line that reads "95 percent confidence interval:". We interpret this output as follows: we are 95% confident that the true population proportion of adult cell phone users who switched carriers in the past two years is between 21.24% and 39.98%. (binom.test() computes an exact interval rather than applying the normal-approximation formula above, so its bounds can differ slightly from a hand calculation.)

Confidence Interval for Means

Now suppose that in our sample of one hundred adult cell phone users, we also asked them to rate their satisfaction with their most recent service provider on a continuous scale from one to ten. In this sample, the average rating was 6.72 with a standard deviation of 1.72.
Based on this sample statistic (which we denote $\bar{x}$), what can we conclude about the average level of satisfaction among the entire population of adult cell phone users (which we denote $\mu$)? Our sample estimate of $\bar{x} = 6.72$ is based on a random sample of one hundred users and not the entire population, so we cannot conclude that the true population parameter is $\mu = 6.72$. Instead, we must calculate a confidence interval to understand the range of plausible values for the population mean $\mu$.

When working with means, we use a different formula to calculate the confidence interval:

$$\bar{x} \pm t^* \frac{s}{\sqrt{n}}, \qquad df = n - 1$$

where $\bar{x}$ is our estimate from the sample, $n$ is the number of observations in the sample, $s$ is the standard deviation from the sample, $df$ is the degrees of freedom (described below), and $t^*$ is a constant that determines our desired level of confidence in the interval.

The standard deviation from the sample ($s$) is calculated using the following formula:

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$

Unlike the $z^*$ from the previous section, the value $t^*$ depends on both our desired level of confidence and something called the degrees of freedom of the test. In the case of confidence intervals for means, the degrees of freedom is simply the sample size ($n$) minus one. To determine the appropriate value of $t^*$ for our 95% confidence interval, we therefore need to look up the value of $t^*$ that corresponds to a confidence level of 0.95 with 99 degrees of freedom. This can be done by searching the web for a calculator that returns the appropriate value of $t^*$ based on our degrees of freedom and confidence level; in this case, the value is approximately 1.98 (also see the table in Appendix A).
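As an alternative to a web calculator or a printed table, the same critical values can be looked up directly in R with the quantile functions `qnorm()` and `qt()` from the stats package. The sketch below is one way to do this; the variable names are chosen here for illustration. A two-sided interval at confidence level $c$ leaves $(1 - c)/2$ in each tail, so we ask for the $1 - (1 - c)/2$ quantile.

```r
# Critical values for a two-sided 95% interval.
conf_level <- 0.95
tail_area  <- (1 - conf_level) / 2           # 0.025 in each tail

z_star <- qnorm(1 - tail_area)               # standard normal critical value
t_star <- qt(1 - tail_area, df = 100 - 1)    # t critical value with n - 1 = 99 df

round(c(z_star = z_star, t_star = t_star), 3)
```

With 99 degrees of freedom the $t^*$ value is only slightly larger than $z^*$; the gap grows as the sample size shrinks.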
Note that in practice, one would often use statistical software (such as the R or Python programming languages, SPSS, Excel, etc.) to calculate confidence intervals instead of calculating them by hand. These packages determine the appropriate value of $t^*$ for you behind the scenes, so you will rarely need to look it up yourself.

Now let's apply the above formulas to calculate a 95% confidence interval (note there may be some error due to rounding):

$$\bar{x} \pm t^* \frac{s}{\sqrt{n}} = 6.72 \pm 1.98\,\frac{1.72}{\sqrt{100}} = [6.38,\ 7.06]$$

We interpret this confidence interval as follows: "we are 95% confident that the true population average satisfaction among all adult cell phone users is between 6.38 and 7.06."

Suppose we have a data frame in R called employees with information on 1,000 employees from a software company. Imagine that based on this sample, we would like to estimate the true average salary of all employees at the company. For means, we use the t.test() function to calculate the 95% confidence interval, applying it to the salaries in our sample of employees.

As with the output for proportions, we can (for now) ignore everything besides the line that reads "95 percent confidence interval:". We interpret this output as follows: we are 95% confident that the true population average salary at the company is between $153,931.5 and $159,040.5.

We could also calculate confidence intervals for specific subgroups. Let's use the employees data to calculate the salary confidence intervals for men and women separately.

Note that the 95% confidence intervals do not overlap, which suggests that on average, women at the company are paid less than men. To investigate this further, we need a formal method to compare groups statistically.
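Both routes to a confidence interval for a mean described above, the hand formula and `t.test()`, can be sketched in R. This is an illustration under stated assumptions: part (1) uses only the summary statistics quoted for the satisfaction survey, while part (2) uses simulated ratings (the vector `ratings` is hypothetical, since the note's raw data are not available here), so its bounds will not match any output in the note.

```r
# (1) Interval from the summary statistics quoted in the text.
x_bar <- 6.72   # sample mean
s     <- 1.72   # sample standard deviation
n     <- 100    # sample size
t_star <- qt(0.975, df = n - 1)
ci_summary <- x_bar + c(-1, 1) * t_star * s / sqrt(n)
round(ci_summary, 2)

# (2) Interval from raw data with t.test(). The ratings are simulated
# (hypothetical), so these bounds differ from the interval in part (1).
set.seed(42)
ratings <- rnorm(100, mean = 6.72, sd = 1.72)
ci_raw <- t.test(ratings, conf.level = 0.95)$conf.int
ci_raw
```

Extracting `$conf.int` from the result returns just the two bounds, which is convenient when the interval feeds into further calculations.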
Formal methods for comparing groups are covered in the next section.

Hypothesis Testing

An important component of inference is hypothesis testing, which allows us to analyze the evidence provided by the sample to assess claims about the population.

A one-sample hypothesis test compares a single population parameter to a specified value. For example, you may wonder whether your local barista pours a full twelve ounces of coffee in each cup. By drawing a sample of the cups poured by your barista, you could use a one-sample hypothesis test to answer this question.

A two-sample hypothesis test assesses the equality of parameters from two different populations. For example, you may wonder whether the barista near your home and the barista near your work pour similar amounts on average, or if one pours more than the other. By drawing a sample of the cups poured by each barista, you could use a two-sample hypothesis test to answer this question.

Formulating Hypotheses

Hypothesis testing consists of setting up two hypotheses, called the null hypothesis ($H_o$) and the alternative hypothesis ($H_a$). It is important to note that these hypotheses must be formulated before any data are observed.

In the one-sample case, the null hypothesis specifies a particular value of the population parameter of interest. In the one-sample barista example above, the null hypothesis would be that on average, the barista pours twelve ounces of coffee. In the two-sample case, the null hypothesis specifies that the two parameters of interest are equal. In the two-sample barista example above, the null hypothesis would be that on average, the two baristas pour the same amount of coffee.
The alternative hypothesis in the one-sample case expresses how the population parameter may differ from the value specified in the null. In some cases, we simply hope to test whether the population parameter is not equal to the value specified in the null; this is referred to as a two-sided test because we will reject the null hypothesis if the finding from the sample is sufficiently greater than or less than the specified value. In other cases, we hope to test whether the population parameter is greater than (or less than) the value specified in the null; these are referred to as right-sided (and left-sided) tests, respectively. For example, in the one-sample barista example the alternative is that the true average amount poured is not equal to twelve ounces, so we would run a two-sided test. If we wanted to specifically test whether our barista pours more than twelve ounces, we would run a right-sided test.

For the two-sample case, a two-sided alternative hypothesis would state that the two population parameters are not equal. The right-sided alternative hypothesis would state that the population one parameter is greater than the population two parameter, and the left-sided alternative hypothesis would state the opposite. In the two-sample two-sided barista example, the alternative is that the two baristas do not pour the same amount, on average. If we wanted to test whether the home barista poured more than the work barista, we would run a right-sided test (or a left-sided test, depending on which barista we arbitrarily assign as 1).

The Logic of Hypothesis Testing

In a hypothesis test, we start by assuming the null hypothesis is true. We then gather our evidence (data from a sample). Based on the evidence we can draw only one of two inferences:

- reject the null hypothesis $H_o$, or
- fail to reject the null hypothesis $H_o$

If the data indicate we should "reject $H_o$," we can say that it is likely that $H_a$ is true based on the observed data. If instead the data indicate we should "fail to reject $H_o$," we conclude that our sample did not provide sufficient evidence to support $H_a$. Note that based on sample data, we can never accept the null hypothesis. We can only conclude that we have insufficient evidence to reject it. This distinction is subtle but important.

The language of hypothesis tests is a bit arcane, so it can be useful to look at a concrete analog. In the U.S. jury system, the defendant is assumed innocent unless proven otherwise. That is, the null hypothesis is that the defendant is innocent. Based on trial evidence, a jury can only reject the null hypothesis that the defendant is innocent (that is, find that the defendant is guilty), or fail to reject the null hypothesis that the defendant is innocent (that is, the defendant is acquitted). The jury cannot conclude that the defendant is innocent (that is, that the null hypothesis is true), only that there is insufficient evidence to demonstrate that the defendant is guilty.

Formulating the null and alternative hypotheses is a challenging part of hypothesis testing. One begins by identifying an assertion about a population parameter and then translating the assertion into symbols. We give some examples of this process below.

Problem: Suppose that an airline company claims that the average weight of checked baggage is less than 15 pounds. To support the claim, the airline company conducts a random sample of 150 passengers and finds that the average weight of checked baggage is 14.2 pounds, with a standard deviation of 6.5 pounds. Do these data indicate that the average weight of checked baggage is less than 15 pounds?
State the null and alternative hypotheses for this problem. Note that $\mu$ is the symbol for the population mean.

Solution: The first sentence contains an assertion about the population parameter: "the average weight of checked baggage is less than 15 pounds." Because this is the assertion we wish to support with evidence, we write the alternative hypothesis ($H_a$) as $\mu < 15$. This then implies that our null ($H_o$) is $\mu = 15$.

Problem: Consumer Reports wants to compare the average lifetime for two brands of incandescent light bulbs. Specifically, it would like to test whether there is a difference between the average lifetime of bulbs made by each of the two companies. State the null and alternative hypotheses for this problem.

Solution: In symbols, let $\mu_1$ represent the average lifetime of bulbs of Company 1 and $\mu_2$ represent the average lifetime of bulbs of Company 2. Consumer Reports wonders whether there is evidence to suggest that the mean lifetime is different for the two companies, so the alternative hypothesis ($H_a$) would be that $\mu_1 \neq \mu_2$. Our null ($H_o$) is then $\mu_1 = \mu_2$.

The P-Value

The question remains of how to decide, based on our sample data, whether to reject or fail to reject the null hypothesis. This is done using probability theory by calculating what is called a probability value, or p-value for short. The p-value is always between 0 and 1 and indicates how consistent our observed sample is with the given null hypothesis. The higher the p-value, the more consistent our sample is with $H_o$; the lower the p-value, the more consistent our sample is with $H_a$. (Technically, the p-value tells us, if the null hypothesis is true, what the likelihood is that we would obtain a sample that is "as extreme" as the sample we gathered.
Thus, a high p-value indicates that it is quite likely we would obtain a sample as extreme as ours if the null hypothesis were true, and a low p-value indicates that it is unlikely we would obtain a sample like ours if the null hypothesis were true.)

We designate a threshold value for a p-value called a significance level, typically denoted $\alpha$. The convention is to set $\alpha = 0.05$, but more generally, the choice of $\alpha$ depends upon the problem context (e.g., a test comparing a new drug to an existing drug might use a very small $\alpha$, whereas a test of a minor change might use a larger $\alpha$). Calculating the p-value is an involved mathematical exercise; for our purposes, we will simply read it from the R output. We formally use the p-value to interpret the test results as follows:

- If p-value $\leq \alpha$, we reject the null hypothesis and say our result is statistically significant.
- If p-value $> \alpha$, we fail to reject the null hypothesis and say our result is not statistically significant.

In what sense are we using the word significant? Webster's Dictionary gives two interpretations of significance: "(1) having or signifying meaning; or (2) important or momentous." In statistical work, significance does not necessarily imply momentous importance. For us, "significant" at the $\alpha$ level has a special meaning: $\alpha$ is the likelihood (or "risk") that we reject the null hypothesis when it is in fact true.

Some P-Value Cautions

The American Statistical Association issued an advisory article in 2019 urging caution in how p-values are used.[1] In fact, many users of statistics interpret p-values incorrectly. The p-value is not the probability that the null hypothesis is true. That would actually be a very useful value to have, but unfortunately, we usually don't have the ability to find it.
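To see what a p-value is a probability of, it can help to compute one from summary statistics. The sketch below revisits the airline baggage problem (150 passengers, sample mean 14.2 pounds, standard deviation 6.5 pounds, $H_a: \mu < 15$) using a one-sample t-test worked by hand. Treating this problem with a t-test is an assumption made here for illustration, since the note only asks for its hypotheses, and all variable names are hypothetical.

```r
# Summary statistics from the airline baggage problem.
n     <- 150
x_bar <- 14.2
s     <- 6.5
mu_0  <- 15      # value specified by the null hypothesis

# Test statistic: how many standard errors the sample mean lies from mu_0.
t_stat <- (x_bar - mu_0) / (s / sqrt(n))

# Left-sided p-value: probability of a t statistic this low or lower,
# computed under the assumption that the null hypothesis is true.
p_value <- pt(t_stat, df = n - 1)

alpha <- 0.05
decision <- if (p_value <= alpha) "reject H0" else "fail to reject H0"

round(c(t_stat = t_stat, p_value = p_value), 4)
decision
```

Under these assumptions the p-value comes out somewhat above 0.05, so at $\alpha = 0.05$ this sample alone would not be enough to reject the null. Note that the calculation starts from "the null is true" and asks about the data; it never produces the probability that the null itself is true.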
The p-value is a conditional probability that says "assuming the null hypothesis is true, how likely is it that we would draw a sample as unusual (i.e., "extreme") as ours?" It does not say, "given our data, what's the chance our null hypothesis is true," which is a source of confusion for many people.

The safest way to think about a p-value is as a measure of consistency. Given my observed sample data, is it consistent with my null hypothesis view of the world? If not, then I will reject that null view of the world and conclude that the alternative view is likely the correct one.

Type I and Type II Errors

The point of a hypothesis test is to make the correct decision about $H_o$. Unfortunately, hypothesis testing is not a simple matter of being right or wrong. Hypothesis testing is based on sample data and probability, so there is always a chance that an error has been made. In fact, there are two primary errors one can make:

- A Type I error is made if we reject $H_o$ when in fact $H_o$ is true.
- A Type II error is made if we fail to reject $H_o$ when in fact $H_a$ is true.

The hypothesis test is calibrated so that the probability of making a Type I error equals $\alpha$. If we choose a significance level ($\alpha$) of 0.05, this means there is a 5% chance that our hypothesis test will mistakenly reject $H_o$, given that $H_o$ is actually true. The probability of making a Type II error is denoted $\beta$. For a fixed sample size, the probability of making a Type I error ($\alpha$) and the probability of making a Type II error ($\beta$) are inversely related; as $\alpha$ is increased, $\beta$ is decreased, and vice versa. Therefore, $\alpha$ cannot be arbitrarily small, since $\beta$ likely will then become large.
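Both claims in the paragraph above can be checked numerically. The sketch below is a simulation under assumed settings (samples of 50 from a normal population with standard deviation 2, and a hypothetical true shift of 0.5 for the power calculation); none of these numbers come from the note. It first estimates the Type I error rate when the null is true, then uses `power.t.test()` from the stats package to show how $\beta$ changes with $\alpha$ at a fixed sample size.

```r
set.seed(123)

# Simulate 2,000 studies in which the null hypothesis (mu = 5) is TRUE,
# and record the p-value of a one-sample t-test in each.
p_values <- replicate(2000, t.test(rnorm(50, mean = 5, sd = 2), mu = 5)$p.value)

# Share of studies that (wrongly) reject at alpha = 0.05: the Type I error rate.
type1_rate <- mean(p_values <= 0.05)

# For a fixed sample size, beta = 1 - power. Compute it for three alphas,
# assuming a true mean that differs from the null value by 0.5.
beta_for <- function(alpha) {
  1 - power.t.test(n = 50, delta = 0.5, sd = 2, sig.level = alpha,
                   type = "one.sample")$power
}
betas <- sapply(c(0.10, 0.05, 0.01), beta_for)

round(type1_rate, 3)
round(betas, 3)   # beta grows as alpha shrinks from 0.10 to 0.01
```

The simulated rejection rate should land close to 0.05, and the three $\beta$ values increase as $\alpha$ is tightened, which is the trade-off described above.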
As we can see, the process of hypothesis testing allows you to control the risk of a Type I error because you set the value for $\alpha$. However, (ordinarily) you do not have the same control over $\beta$, the probability of failing to reject a null hypothesis that is actually false. For this reason, it is best to avoid conclusions that would expose us to a Type II error. Therefore, rather than "accepting" $H_o$ when the sample data fail to provide sufficient evidence to overturn $H_o$, we instead say we "fail to reject" $H_o$.

Choosing the Appropriate Test

The process we have described so far is common to all forms of hypothesis testing. We always start by defining null and alternative hypotheses, then calculate a p-value from those hypotheses using sample data. However, the statistical test we use to calculate the p-value depends on the type of data we are working with. The appropriate test depends on:

- Whether you are conducting a one-sample, two-sample, or more-than-two-sample test, and
- Whether you are comparing means or proportions.

One-Sample Hypothesis Testing

We use a one-sample hypothesis test when we want to compare a population parameter to a specified value. The following are all examples of scenarios where a one-sample hypothesis test would be appropriate:

- An automobile manufacturer received a shipment of light bulbs from a supplier, and would like to verify that less than 2% of the bulbs are defective. Is the true proportion ($p$) less than 0.02?
- We would like to determine whether the majority of the electorate supports the Democratic candidate for president. Does the true proportion ($p$) exceed 0.50?
- A food processing plant received a truckload of chickens from a local farmer and needs to verify that the average chicken weighs at least two pounds. Does the true mean ($\mu$) exceed 2 lbs?
- A software company wants to determine whether its users interact with the homepage for at least ten seconds, on average. Does the true mean ($\mu$) exceed 10 seconds?
The first two examples concern questions about population proportions ($p$), whereas the last two concern population means ($\mu$).

Testing Means

To illustrate a one-sample test of means, let's return to our employees data set. Suppose that the HR department of the company is thinking about re-calibrating the employee performance scale, which is currently measured from one to ten. If the scale were calibrated properly the average score would be around five, but the team suspects there might be some "rating inflation" occurring. To investigate this, they would like to test whether the average employee Rating is different from five. Under this scenario, the null and alternative hypotheses are:

- $H_o$: The true average rating of all employees at the company is five. In mathematical notation, we write this as $\mu = 5$.
- $H_a$: The true average rating of all employees at the company is not equal to five. In mathematical notation, we write this as $\mu \neq 5$.

Recall that employees is a data frame with a random sample of 1,000 employees from the company. Using this sample data, we can apply t.test() to calculate the appropriate p-value for the hypothesis test. Our null hypothesis states that $\mu$ equals five, so we set the mu parameter equal to five in the function call.

This output provides several key pieces of information:

- The mean employee rating in the sample ($\bar{x}$) is 6.993.
- The alternative hypothesis for the test ($H_a$) is that the "true mean is not equal to 5".
- The p-value for the test is quite small (less than 0.00000000000000022).

Because the p-value is so small, we reject the null hypothesis ($H_o$) and conclude it is likely that the average employee rating at the company is not equal to five. Note that we do not say that we "accept the alternative hypothesis ($H_a$)."

Testing Proportions

Now imagine that we would like to determine whether the majority of the electorate in the United States supports the Democratic candidate for president. Under this scenario, the null ($H_o$) and alternative ($H_a$) hypotheses are:

- $H_o$: The true population proportion of voters who support the Democratic candidate equals fifty percent. In mathematical notation, we write this as $p = 0.50$.
- $H_a$: The true population proportion of voters who support the Democratic candidate exceeds fifty percent. In mathematical notation, we write this as $p > 0.50$.

Note that now we are testing the alternative that the value of interest is greater than our null value (0.50), so we need to conduct a right-sided test. To study this question, we randomly poll one thousand U.S. voters, and 540 say that they support the Democratic candidate.

To conduct a one-sample hypothesis test of proportions, we use a procedure called the one-sample test of proportions. We can use binom.test() in R to calculate the appropriate p-value from this sample data, passing the number of supporters (x = 540), the sample size (n = 1000), and the null value (p = 0.5). Because our alternative hypothesis states that the population proportion is greater than the value specified in the null, we are conducting a right-sided test and must set the alternative parameter equal to "greater".

This output provides several key pieces of information:

- The proportion of voters who support the Democratic candidate in the sample ($\hat{p}$) is 0.54.
- The alternative hypothesis for the test ($H_a$) is that the "true probability of success is greater than 0.5".
- The p-value for the test is relatively small (0.006222).

Because the p-value is so small, we reject the null hypothesis ($H_o$) and conclude it is likely that the majority of the electorate prefers the Democratic candidate.

Two-Sample Hypothesis Testing

Although analyzing one sample of data is useful for problems like gauging public opinion or testing the stability of a manufacturing process, there are more advanced analyses which involve comparing the responses of two or more groups. This can be in the form of comparing means or comparing proportions.

Testing Means

Many business applications involve a comparison of two population means. For instance, a company may want to know if a new logo produces more sales than the previous logo, or a consumer group may want to test whether two major brands of food freezers differ in the average amount of electricity they use. In this section, we extend our knowledge of hypothesis testing on one population mean to comparing two population means.

To use these tests, you need to have a sample from each of the two populations of interest. For the tests to be valid, the samples must be randomly selected. They can be either independent or dependent. This is an important distinction because it determines which statistical method is used and how one controls for sources of variation.

Independent samples are selected from each population separately. If we selected a random sample of customers of one domestic gas supplier and a random sample of customers from a rival gas supplier, the samples would be independent.

Dependent samples consist of matched or paired values that are inherently related to each other.
If we selected a sample of athletes and compared their pulse rates before and after an exercise routine, the samples would be paired, or dependent, because we drew the two samples of observations from the same set of athletes. This allows us to control for the variability between athletes and focus on the pulse rate difference in each individual due to the treatment (i.e., the exercise routine). The choice of independent or dependent samples depends on the context of the test.

The independent samples t-test is used to compare the means of two independent samples. It can be used to test whether:

- Biology graduates have a different average annual income than chemistry graduates.
- Length of life, on average, is shorter for never-married persons than for people who are or have been married.
- The mean years of schooling of Republicans is different from the mean years of schooling of Democrats.
- Men average more hours of sleep per night than women.
- The PE (price to earnings) ratio for tech stocks is on average higher than for financial services stocks.

When performing two-sample tests of means, the null hypothesis is always that the population means of the two groups are the same. Formally, if we denote by 𝜇1 the population mean of group 1 and by 𝜇2 the population mean of group 2, our null hypothesis is 𝐻𝑜: 𝜇1 = 𝜇2. There are three possible alternative hypotheses one can test, as listed in the table below.

Table 2 Alternative hypotheses for two-sample t-test

Alternative Hypothesis | Terminology
𝐻𝑎: 𝜇1 − 𝜇2 < 0 | Left-sided
𝐻𝑎: 𝜇1 − 𝜇2 > 0 | Right-sided
𝐻𝑎: 𝜇1 − 𝜇2 ≠ 0 | Two-sided

Source: Casewriter.
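As a minimal sketch of how the three alternatives in Table 2 map onto the alternative argument of R's t.test(), the example below runs all three tests on simulated data. The vectors group1 and group2 and their parameters are hypothetical and are not taken from this note.

```r
# Simulated samples for illustration only; these values are not from the note.
set.seed(1)
group1 <- rnorm(50, mean = 10, sd = 2)  # hypothetical sample from population 1
group2 <- rnorm(50, mean = 11, sd = 2)  # hypothetical sample from population 2

# Each call tests Ho: mu1 - mu2 = 0 against one of the alternatives in Table 2.
left  <- t.test(group1, group2, alternative = "less")       # Ha: mu1 - mu2 < 0
right <- t.test(group1, group2, alternative = "greater")    # Ha: mu1 - mu2 > 0
two   <- t.test(group1, group2, alternative = "two.sided")  # Ha: mu1 - mu2 != 0

# Collect the three p-values side by side.
p_values <- c(left = left$p.value, right = right$p.value, two_sided = two$p.value)
print(p_values)
```

For a t-test, the left-sided and right-sided p-values sum to one, and the two-sided p-value is twice the smaller of the two, so the choice of alternative should be made before looking at the data.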
As an example, the General Social Survey (GSS) has been tracking American attitudes on a wide variety of topics.² The GSS is one of the most frequently used sources of information about American society. The surveys are now conducted every other year and measure hundreds of variables on thousands of observations. The table below shows the first few rows of a recent GSS data set.

Table 3 Example of GSS data

Id | Employer | Hours | Income | Years Employed
1 | 2 | 42 | 45,000 | 3
2 | 2 | 40 | 82,500 | 10
3 | 2 | 35 | 16,250 | 2
4 | 1 | 43 | 32,500 | 1
5 | 2 | 56 | 93,000 | 7

Source: Casewriter.

These variables are defined in the table below.

Table 4 Variable definitions of GSS data

Variable | Definition
Id | The unique ID of each observation
Employer | 1 = Government, 2 = Private Employer
Hours | Hours worked per week
Income | Yearly income from primary job
Years Employed | How many years the employee has been with their current employer

Source: Casewriter.

To conduct a two-sample hypothesis test of means, we use a procedure called the two-sample t-test. Imagine we wanted to test whether there is sufficient evidence to conclude that people who currently work for the government (group 1) have been with their employer shorter or longer on average than those currently working in the private sector (group 2). Our hypotheses for this test would be:

- 𝐻𝑜: On average, government workers have been with their employer for the same amount of time as private sector workers. In mathematical notation, we write this as 𝜇1 = 𝜇2.
- 𝐻𝑎: On average, government workers have not been with their employer for the same amount of time as private sector workers. In mathematical notation, we write this as 𝜇1 ≠ 𝜇2.
Fortunately, we can use the same t.test() function that we saw in Testing Means to conduct a two-sample test of means. For a two-sample test, the syntax is slightly different: we pass a formula of the form df$y ~ df$x (see Appendix A). Applying this to the gss data, the output provides several key pieces of information:

- The average years employed of government employees in the sample (𝑥̅1) is 11.11, and the average years employed of private-sector employees in the sample (𝑥̅2) is 7.90.
- The alternative hypothesis for the test (𝐻𝑎) is that the “true difference in means is not equal to 0” (i.e., 𝜇1 − 𝜇2 ≠ 0). This means we are conducting a two-sided test.
- The p-value for the test is quite small (0.0002462).

In this case, the small p-value indicates that there is strong evidence to reject the null hypothesis of equal means. We may conclude that in 2008, it is very likely that on average, government employees had been at their current jobs for a different length of time than private sector employees.

For another example, suppose we want to test if people who work for the government earn more than those who have private-sector jobs. Because we are testing whether the mean in group 1 (government workers) is greater than the mean in group 2 (private-sector workers), our hypotheses are:

- 𝐻𝑜: On average, government workers earn the same as those in the private sector. In mathematical notation, we write this as 𝜇1 = 𝜇2.
- 𝐻𝑎: On average, government workers earn more than those in the private sector. In mathematical notation, we write this as 𝜇1 > 𝜇2, or 𝜇1 − 𝜇2 > 0.

Because we are conducting a right-sided test, we set the alternative parameter to "greater".
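The right-sided call described above can be sketched as follows. This is only an illustration under stated assumptions: gss_sim is a simulated stand-in for the GSS data, and its column names (employer, income) and all of its values are hypothetical.

```r
# Simulated stand-in for the GSS data; column names and values are hypothetical.
set.seed(42)
gss_sim <- data.frame(
  employer = factor(rep(c(1, 2), times = c(40, 160))),  # 1 = Government, 2 = Private
  income   = c(rnorm(40, mean = 45000, sd = 20000),
               rnorm(160, mean = 41000, sd = 25000))
)

# The formula income ~ employer splits income by employer. Group 1 is the first
# factor level (government), so "greater" tests Ha: mu1 - mu2 > 0.
# By default t.test() runs the Welch test, which does not assume equal variances.
result <- t.test(income ~ employer, data = gss_sim, alternative = "greater")

print(result$estimate)  # sample means of the two groups
print(result$p.value)   # one-sided p-value
```

Because the data here are simulated, the numbers printed will differ from the GSS results reported in this note.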
The output of this test on the gss data provides several key pieces of information:

- The average income of government employees in the sample (𝑥̅1) is $44,621.83, and the average income of private-sector employees in the sample (𝑥̅2) is $40,847.81.
- The alternative hypothesis for the test (𝐻𝑎) is that the “true difference in means is greater than 0” (i.e., 𝜇1 − 𝜇2 > 0). This means we are conducting a right-sided test.
- The p-value for the test is just larger than 0.05 (0.06765).

In this case, using a threshold of 0.05, we would fail to reject the null hypothesis that the true difference in means is equal to zero, because the p-value of the test is greater than our threshold. This means that our sample provided insufficient evidence to show that government workers earned more than private sector workers.

Testing Proportions

Besides comparing two population means, one might be interested in comparing two population proportions. For example, a political candidate might want to estimate the difference in the proportions of voters in two districts who favor her candidacy. In this section, we look at how to do hypothesis testing on proportions from two independent samples.

Similar in spirit to the method for comparing two means, we have two population proportions, which we denote 𝑝1 and 𝑝2. The null hypothesis is 𝐻𝑜: 𝑝1 = 𝑝2. As with tests of population means, there are three alternative hypotheses, shown in the table below.

Table 5 Alternative hypotheses for two-sample test of proportions

Alternative Hypothesis | Terminology
𝐻𝑎: 𝑝1 − 𝑝2 < 0 | Left-sided
𝐻𝑎: 𝑝1 − 𝑝2 > 0 | Right-sided
𝐻𝑎: 𝑝1 − 𝑝2 ≠ 0 | Two-sided

Source: Casewriter.
As an example, suppose that Professor Parzen and Professor Bojinov were both given a section of entering MBA students for a statistics boot camp before fall classes started. After the boot camp ended, a survey was given to all the participants. Of the 75 who had Professor Bojinov as an instructor, 45 said they were satisfied, whereas 48 of the 90 who had Professor Parzen were satisfied. Is there a statistically significant difference in the percentage of students who were satisfied between the two instructors? To test this, our null and alternative hypotheses would be:

- 𝐻𝑜: There is no difference in the proportion of satisfied students in the two classes. In mathematical notation, we write this as 𝑝1 − 𝑝2 = 0.
- 𝐻𝑎: There is a difference in the proportion of satisfied students in the two classes. In mathematical notation, we write this as 𝑝1 − 𝑝2 ≠ 0.

To conduct a two-sample hypothesis test of proportions, we use a procedure called the two-sample test of proportions. We can use prop.test() in R to calculate the appropriate p-value from this sample data, supplying the number of satisfied students in each section (45 and 48) and the number of students surveyed in each section (75 and 90).

Because the p-value is greater than 0.05, we fail to reject the null hypothesis and cannot conclude there is a difference between the proportion of satisfied students in the two classes.

Note that, as with all the other tests shown in this note, we can conduct a one-sided two-sample test of proportions if our alternative hypothesis indicates the direction of the difference between the two population proportions.
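The boot camp comparison above can be sketched in R as follows. The counts come from the example in this note; the variable names and the choice of Professor Bojinov's section as group 1 are assumptions made for illustration.

```r
# Counts from the boot camp example; treating Bojinov's section as group 1
# is an assumption made for this sketch.
satisfied <- c(bojinov = 45, parzen = 48)  # students who said they were satisfied
surveyed  <- c(bojinov = 75, parzen = 90)  # students surveyed in each section

# Two-sided test of Ho: p1 - p2 = 0. prop.test() applies a continuity
# correction by default (correct = TRUE).
result <- prop.test(x = satisfied, n = surveyed, alternative = "two.sided")

print(result$estimate)  # sample proportions for the two sections
print(result$p.value)   # compare against the 0.05 threshold
```

Setting alternative to "less" or "greater" instead would run the corresponding one-sided test from Table 5.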
Appendix A

Section | Command | Explanation
Hypothesis Testing | t.test(x, mu=a, alt="two.sided") | One-sample, two-sided t-test where a is the mean under the null and x is a vector with the sample values
Hypothesis Testing | t.test(x, mu=a, alt="less") | One-sample, one-sided t-test where a is the mean under the null and x is a vector with the sample values
Hypothesis Testing | t.test(x, mu=a, alt="greater") | One-sample, one-sided t-test where a is the mean under the null and x is a vector with the sample values
Hypothesis Testing | t.test(df$y ~ df$x, alt="two.sided") | Two-sample, two-sided t-test
Hypothesis Testing | t.test(df$y ~ df$x, alt="less") | Two-sample, one-sided t-test
Hypothesis Testing | t.test(df$y ~ df$x, alt="greater") | Two-sample, one-sided t-test

Source: Casewriter.

Endnotes

1 Ronald L. Wasserstein et al., “Moving to a World Beyond ‘p < 0.05’,” The American Statistician, 2019.
2 NORC at the University of Chicago, The General Social Survey (GSS), http://gss.norc.org/.