Statistics: Hypothesis Testing with Means and Proportions (PDF)

Book Title: eTextbook: Statistics: A Tool for Social Research and Data Analysis, Fifth Canadian Edition

Chapter 10. Hypothesis Testing with Means and Proportions: The One-Sample Case

Learning Objectives

By the end of this chapter, you will be able to

1. Explain the logic of hypothesis testing as applied to the one-sample case
2. Identify and cite examples of situations in which one-sample tests of hypotheses are appropriate
3. Test the significance of single-sample means and proportions using the five-step model and correctly interpret the results
4. Explain the difference between one- and two-tailed tests and specify when each is appropriate
5. Conduct a single-sample hypothesis test using a confidence interval

10.1. Introduction

Chapter 7 introduced the techniques of hypothesis testing (significance testing) for nominal and ordinal bivariate relationships. Chapters 8 and 9 then showed how to use measures of association to determine the strength, and for ordinal variables also the direction, of these relationships. This chapter introduces the techniques for hypothesis testing of a single sample mean (when you have an interval-ratio variable) or sample proportion (nominal or ordinal variable). It also lays the groundwork for Chapters 11 and 12, in which we look at how to test a hypothesis about a relationship between a nominal or ordinal independent variable and an interval-ratio dependent variable.

Single-sample hypothesis tests can be used in situations such as the following:

1. A researcher has selected a sample of 789 older adults who live in a particular province. The researcher also has information on the percentage of the entire population of the province that was victimized by crime during the past year. Are older adults, as represented by this sample, more or less likely to be victimized than the population in general?
2. Are the GPAs of university student athletes different from the GPAs of the student body as a whole? To investigate, the academic records of a random sample of 105 student athletes from a large university are compared with the overall GPA of all students.
3. The Law School Admission Test (LSAT) is a standardized test required for admission to most Canadian and American law schools. The LSAT assesses reading and verbal reasoning skills and, along with GPA, is considered a critical factor in determining admission to law school. Companies offer an LSAT preparation (i.e., training) course, for a fee, with the claim that graduates of their course on average obtain higher scores on the LSAT than the general population of LSAT test takers. To test this claim, you randomly sample 100 graduates of LSAT training courses and find that, on average, those in the sample have higher LSAT scores than the population of LSAT writers as a whole. Do training-course graduates score higher than LSAT writers in general?

In each of these situations, we have randomly selected samples (of older adults, student athletes, or graduates of an LSAT training course) that we want to compare to a population (the entire province, student body, or community of LSAT writers). As with all situations in which we use hypothesis testing, we are not interested in the sample per se but in the larger group from which it was selected (all older adults in the province, all student athletes on this campus, or all graduates of an LSAT training course). Specifically, we want to know if the groups represented by the samples are different from the populations on a specific trait or variable (victimization rates, GPAs, or LSAT scores).

10.2. Hypothesis Testing with the One-Sample Case

Let us use our third research situation above as an example of hypothesis testing in the one-sample case. The main question here is, “Do graduates of an LSAT preparation course have an average LSAT score that is higher than others who write the LSAT?” In other words, the researcher wants to compare the LSAT scores of all graduates of an LSAT training course with the LSAT scores of all test takers (the entire population of LSAT test takers). If they had complete information for both of these groups (all graduates of an LSAT preparation course and all LSAT writers), they could answer the question easily and completely. The problem is that the researcher does not have the time and/or money to gather information on the thousands of people who have graduated from an LSAT preparation course. So instead, the researcher draws a random sample, following the rule of EPSEM, of 100 graduates from records provided by the companies offering an LSAT training course. Information on LSAT scores for the sample of LSAT preparation-course graduates and the entire population of LSAT test takers in 2019–2020 is as follows:

Entire Population of LSAT Test Takers    Sample of Graduates from LSAT Preparation Course (fictitious data)
μ = 152                                  $\bar{X} = 156$
σ = 10                                   n = 100

Information from the Law School Admission Council, the organization that administers the LSAT, shows that the population of test takers has a mean LSAT score of 152. At 156, the average LSAT score for the sample is higher than the average score for the entire population. (To put this information in context, there are about 100 multiple choice questions on the LSAT. One point is given for each question answered correctly.
Total test scores are then converted to an LSAT score ranging from 120, the lowest possible score, to 180, the highest possible score.) Although it is tempting, we cannot draw any conclusions yet because we are working with a random sample of the population we are interested in, not the population itself (all graduates of an LSAT preparation course).

Figure 10.1 should clarify these relationships. The entire population of LSAT test takers is symbolized by the largest circle because it is the largest group. The population of graduates of an LSAT training course is also symbolized by a large circle because it is a sizable group, although only a fraction of the population of LSAT writers as a whole. The random sample of 100 graduates of an LSAT training course, the smallest of the three groups, is symbolized by the smallest circle.

Figure 10.1 A Test of Hypothesis for Single Sample Means

The arrows between the circles show how they are connected in this research situation. The researcher wants to know if the average LSAT score of all graduates of a preparation course is the same as or different from the average LSAT score of the entire population of LSAT writers. Instead of comparing all graduates (a group that is too large to gather information on) with the entire population of LSAT writers, the researcher compares the LSAT scores of a random sample of graduates with those of the entire population.

We observe that the mean of the sample is higher than the mean of the population (156 vs. 152). This suggests that graduates of a preparation course do better on the LSAT. However, the graduates are represented by a random sample, and we know that even the most carefully chosen sample may, on rare occasion, be unrepresentative. Does the difference between the sample mean and the population mean reflect a real difference between all graduates of the preparation course and the entire population of LSAT writers? Or was this difference caused by mere random chance?
This is the question that a test of hypothesis is designed to answer. In other words, there are two possible explanations for the difference, and we will consider them one at a time.

The first explanation, which we will label explanation A, is that the difference between the population mean of 152 and the sample mean of 156 reflects a real difference in LSAT scores between all graduates and the entire population. The difference is statistically significant in the sense that it is very unlikely to have occurred by random chance alone. If explanation A is true, then the population of all graduates of the preparation course is different from the entire population of LSAT writers. The sample did not come from a population with a mean LSAT score of 152.

The second explanation, or explanation B, is that the observed difference between sample and population means was caused by mere random chance. There is no important difference between graduates and the population of LSAT writers as a whole, and the difference between the sample mean of 156 and the population mean of 152 is trivial and due to random chance. If explanation B is true, the population of LSAT preparation course graduates is just like everyone else and has a mean LSAT score of 152.

Which explanation is correct? As long as we are working with a sample rather than the entire group, we cannot know the answer to this question for sure. However, we can set up a decision-making procedure so conservative that one of the two explanations can be chosen, with the knowledge that the probability of choosing the incorrect explanation is very low. This decision-making process, in broad outline, begins with the assumption that explanation B is correct.
Symbolically, the assumption that the mean LSAT score for all graduates of a preparation course is the same as the mean LSAT score for the population of LSAT writers as a whole can be stated as

$$\mu = 152$$

Remember that this μ refers to the mean for all LSAT preparation course graduates, not just the 100 in the sample. This assumption, μ = 152, can be tested statistically. If explanation B (the population of graduates of the preparation course is not different from the population of LSAT writers as a whole and has a μ of 152) is true, then the probability of getting the observed sample outcome ($\bar{X} = 156$) can be found. Let us add an objective decision rule in advance. If the probability of getting the observed difference is less than 0.05 (5 out of 100, or 1 in 20), we will reject explanation B. If this explanation were true, a difference of this size (152 vs. 156) would then be a very rare event, and in hypothesis testing, as we saw in Chapter 7, we always bet against rare events.

How can we estimate the probability of the observed sample outcome ($\bar{X} = 156$) if explanation B is correct? This value can be determined by using our knowledge of the sampling distribution of all possible sample outcomes. Looking back at the information we have and applying the Central Limit Theorem (see Chapter 5), we can assume that the sampling distribution is normal in shape, has a mean of 152 (because $\mu_{\bar{X}} = \mu$), and has a standard deviation of $10/\sqrt{100}$ (because $\sigma_{\bar{X}} = \sigma/\sqrt{n}$). We also know that the standard normal distribution can be interpreted as a distribution of probabilities (see Chapter 4) and that the particular sample outcome noted above ($\bar{X} = 156$) is one of thousands of possible sample outcomes. The sampling distribution, with the sample outcome noted, is depicted in Figure 10.2.
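
The question of how probable the observed sample mean would be under explanation B can also be explored numerically. The following is a minimal sketch, not part of the textbook's procedure: it assumes Python's standard-library `statistics.NormalDist` as a stand-in for the normal curve table, and the variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from the LSAT example in the text.
mu = 152.0      # population mean assumed under explanation B
sigma = 10.0    # population standard deviation
n = 100         # sample size
x_bar = 156.0   # observed sample mean

# Standard deviation of the sampling distribution (standard error of the mean).
standard_error = sigma / sqrt(n)

# Sampling distribution of sample means if explanation B is true.
sampling_dist = NormalDist(mu=mu, sigma=standard_error)

# Probability of a sample mean at least this far from mu, in either direction.
distance = abs(x_bar - mu)
p_two_sided = 2 * (1 - sampling_dist.cdf(mu + distance))

print("standard error:", standard_error)
print("probability under explanation B:", p_two_sided)
print("less than 0.05?", p_two_sided < 0.05)
```

The probability is doubled because the decision rule in the text counts unusually low sample means as well as unusually high ones.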

Figure 10.2 The Sampling Distribution of All Possible Sample Means

Using our knowledge of the standardized normal (Z) distribution, we can add further useful information to this sampling distribution of sample means. Specifically, with Z scores, we can depict the decision rule stated previously: any sample outcome with probability less than 0.05 (assuming that explanation B is true) will cause us to reject explanation B. The probability of 0.05 can be translated into an area and divided equally into the upper and lower tails of the sampling distribution. Using Appendix A, we find that the Z score equivalent of this area is ±1.96. The areas and Z scores are depicted in Figure 10.3.

Figure 10.3 The Sampling Distribution of All Possible Sample Means, with Rejection Areas Shaded

The decision rule can now be rephrased. Any sample outcome falling in the shaded areas depicted in Figure 10.3 by definition has a probability of occurrence of less than 0.05. Such an outcome would be a rare event and would cause us to reject explanation B.

All that remains is to translate our sample outcome to a Z score so that we can see where it falls on the curve. To do this, we use the standard formula for locating any particular raw score under a normal distribution. When we use known or empirical distributions, this formula (as described in Chapter 4) is expressed as

$$Z = \frac{X_i - \bar{X}}{s}$$

Or, to find the equivalent Z score for any raw score, subtract the mean of the distribution from the raw score and divide by the standard deviation of the distribution.
Because we are now concerned with the sampling distribution of all sample means rather than an empirical distribution, the symbols in the formula will change, but the form remains the same:

Formula 10.1

$$Z(\text{obtained}) = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$

So, to find the equivalent Z score for any sample mean, subtract the mean of the sampling distribution, which is equal to the population mean or μ, from the sample mean and divide by the standard deviation of the sampling distribution. (Recall from Chapter 5 that the standard deviation of the sampling distribution of sample means, called the standard error of the mean, is equal to the population standard deviation divided by the square root of n.)

Continuing with the LSAT example, we can now find the Z score equivalent of the sample mean as

$$Z = \frac{156 - 152}{10/\sqrt{100}} = \frac{4}{1} = +4.00$$

In Figure 10.4, this Z score of +4.00 is noted on the distribution of all possible sample means, and we see that the sample outcome does fall in the shaded area. If explanation B is true, this particular sample outcome has a probability of occurrence of less than 0.05. The sample outcome ($\bar{X} = 156$ or Z = +4.00) would therefore be rare if explanation B were true, and the researcher may reject explanation B. If explanation B were true, this sample outcome would be extremely unlikely. The sample of 100 graduates of an LSAT preparation course comes from a population that is significantly different from the population of LSAT writers as a whole on LSAT scoring. Or, to put it another way, the sample does not come from a population that has a mean LSAT score of 152.

Figure 10.4 The Sampling Distribution of Sample Means with the Sample Outcome ($\bar{X} = 156$) Noted in Z Scores, with Rejection Areas Shaded

Remember that our decisions in significance testing are based on information gathered from random samples. On rare occasions, a sample may not be representative of the population from which it was selected.
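
The Z (obtained) calculation for the LSAT example can be sketched in a few lines of Python. This is only an illustration of Formula 10.1 under the values given in the text; the function name `z_obtained` and the variable names are hypothetical, not part of the textbook.

```python
from math import sqrt


def z_obtained(sample_mean, pop_mean, pop_sd, n):
    """Formula 10.1: (sample mean - population mean) / (sigma / sqrt(n))."""
    standard_error = pop_sd / sqrt(n)  # standard error of the mean
    return (sample_mean - pop_mean) / standard_error


# LSAT example from the text: X-bar = 156, mu = 152, sigma = 10, n = 100.
z = z_obtained(sample_mean=156, pop_mean=152, pop_sd=10, n=100)

# Two-tailed decision rule at alpha = 0.05: reject if Z falls beyond +/-1.96.
Z_CRITICAL = 1.96
reject_explanation_b = abs(z) > Z_CRITICAL

print(z, reject_explanation_b)
```

Comparing the absolute value of Z with 1.96 covers both shaded tails at once, which matches the two-sided decision rule used in this example.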
The decision-making process outlined above has a very high probability of resulting in correct decisions, but as long as we must work with samples rather than populations, we face an element of risk. That is, the decision to reject explanation B might be incorrect if this sample happens to be one of the few that is unrepresentative of the population of graduates of the LSAT training course. Fortunately, as we have already seen (in Chapter 7), one important strength of hypothesis testing is that the probability of making an incorrect decision can be estimated. In the example at hand, explanation B was rejected and the probability of this decision being incorrect is 0.05, the decision rule established at the beginning of the process. To say that the probability of rejecting explanation B incorrectly is 0.05 means that, if we repeated this same test an infinite number of times, we would incorrectly reject explanation B only 5 times out of every 100.

We are now ready to put our LSAT example into the five-step model of hypothesis testing, introduced in Chapter 7, as follows:

Step 1. Make Assumptions and Meet Test Requirements. Three criteria have to be satisfied when conducting a test of hypothesis with a single sample mean. First, all tests of hypothesis must be based on a random sample that has been selected according to the rules of EPSEM. Second, to justify computation of a mean, the variable being tested must be interval-ratio in level of measurement. Finally, we must assume that the sampling distribution of all possible sample means is normal in shape so that we may use the standardized normal (Z) distribution to find areas under the sampling distribution. We can be sure that this assumption is satisfied by either using a large sample (and applying the Central Limit Theorem) or assuming that the population is normally distributed.

Model:
Random sampling
Level of measurement is interval-ratio
Sampling distribution is normal

Step 2. State the Null Hypothesis.
The null hypothesis is represented by explanation B in our LSAT example and is always a statement of “no difference.” Thus, in the single-sample case, the null hypothesis states that the sample comes from a population with a certain characteristic. In our example, the null hypothesis is that the population of LSAT preparation course graduates is “no different” from the population of LSAT writers as a whole, that their average LSAT score is also 152, and that the difference between 152 and the sample mean of 156 is caused by random chance. Symbolically, the null hypothesis is stated as

H0: μ = 152

where μ refers to the mean of the population of graduates of an LSAT preparation course. As we saw in Chapter 7, the null hypothesis is always the central element in any test of hypothesis because the entire process is aimed at rejecting or failing to reject the H0.

By contrast, in our example, the research hypothesis (H1) simply asserts that the population the sample was selected from does not have a certain characteristic or, in terms of our example, has a mean that is not equal to a specific value:

(H1: μ ≠ 152) where ≠ means “not equal to”

Symbolically, this statement asserts that the sample does not come from a population with a mean of 152, or that the population of graduates of an LSAT preparation course is different from the population of LSAT writers as a whole. Even though the research hypothesis has no formal standing or role in the hypothesis-testing process, it will help us to choose between one-tailed and two-tailed tests, as we shall see in Section 10.3.

Step 3. Select the Sampling Distribution and Establish the Critical Region. By assuming that the null hypothesis is true, we can attach values to the mean and standard deviation of the sampling distribution and thus measure the probability of any specific sample outcome.
With the large sample size (n = 100) used in our LSAT example, we can use the sampling distribution described by the standard normal (Z) curve as summarized in Appendix A. Recall that the critical region (or region of rejection) consists of the areas under the sampling distribution that include unlikely sample outcomes, and we must define, prior to the test of hypothesis, what we mean by “unlikely.” The critical region allows us to specify in advance those sample outcomes that are so unlikely that they will lead us to reject the H0. In our example, this area corresponds to a Z score of ±1.96, called Z (critical), which was graphically displayed in Figure 10.4. The shaded area is the critical region. Any sample outcome for which the Z score equivalent falls in this area (i.e., below −1.96 or above +1.96) causes us to reject the null hypothesis. This gives us an alpha level (i.e., the size of the critical region) of 0.05. In abbreviated form, all of the decisions made in this step are noted below. The critical region is noted by the Z scores that mark its beginnings:

Sampling distribution = Z distribution
α = 0.05
Z (critical) = ±1.96

(For practice in finding Z (critical) scores, see Problem 10.1a.)

Step 4. Compute the Test Statistic. To evaluate the probability of any given sample outcome, we must convert the sample value to a Z score. Solving the equation for Z score equivalents gives us the test statistic, referred to as Z (obtained) in order to differentiate the test statistic from the critical region. In our example, we found a Z (obtained) of +4.00. (For practice in computing obtained Z scores for means, see Problems 10.2, 10.4, and 10.6.)

Step 5. Make a Decision and Interpret the Results of the Test. We are now ready to compare the test (obtained) statistic with the critical statistic in order to determine whether or not the test statistic falls in the critical region.
Recall that if the test statistic does fall in the critical region, our decision is to reject the null hypothesis. If the test statistic does not fall in the critical region, we fail to reject the null hypothesis. In our example, the two values are

Z (critical) = ±1.96
Z (obtained) = +4.00

Because Z (obtained) falls in the critical region (see Figure 10.4), our decision is to reject the null hypothesis that there is no difference between LSAT preparation course graduates and LSAT writers as a whole. When we reject this null hypothesis, we are saying that graduates do not have a mean LSAT score of 152 and that there is a difference between them and the population of LSAT writers as a whole. In other words, the difference between the sample mean of 156 and the mean of 152 for the entire population of LSAT writers is statistically significant or unlikely to be caused by random chance alone.

Photo caption: Research indicates that graduates of LSAT preparation courses have higher scores on the LSAT exam than the average tester.

One Step at a Time
Completing Step 4 of the Five-Step Model: Compute Z (Obtained)

Use these procedures if the population standard deviation (σ) is known and the sample size is large (n = 100 or more) or the population is normally distributed; otherwise, see Section 10.4, “The Student’s t Distribution and the One-Sample Case.”

To Compute the Test Statistic Using Formula 10.1
1: Find the square root of n.
2: Divide the population standard deviation by the value you found in step 1.
3: Subtract the population mean (μ) from the sample mean ($\bar{X}$).
4: Divide the value you found in step 3 by the value you found in step 2. This value is Z (obtained).

10.3. One-Tailed and Two-Tailed Tests of Hypothesis

The five-step model for hypothesis testing is fairly rigid, and the researcher has little room for making choices. Nonetheless, the researcher must still make another crucial decision when conducting a significance test in the one-sample case. Specifically, they must decide between a one-tailed and a two-tailed test. This decision concerns the placement of alpha in the left tail only, the right tail only, or divided equally between the left and right tails. This decision did not arise when we examined the chi square test because the right-skewed nature of the chi square distribution necessarily placed all the alpha in the right tail only. When we are using the symmetrical normal distribution (or the Student’s t distribution), we must also address this new consideration.

The choice between a one- and a two-tailed test is based on the researcher’s expectations about the population from which the sample was selected. These expectations are reflected in the research hypothesis (H1), which is contradictory to the null hypothesis and usually states what the researcher believes to be “the truth.” In most situations, the researcher will wish to support the research hypothesis by rejecting the null hypothesis.

One Step at a Time
Completing Step 5 of the Five-Step Model: Make a Decision and Interpret the Results of the Test

1: Compare Z (obtained) to Z (critical). If Z (obtained) is in the critical region, reject the null hypothesis. If Z (obtained) is not in the critical region, fail to reject the null hypothesis.
2: Interpret your decision in terms of the original question.
For example, our conclusion for the example problem is “Graduates of an LSAT preparation course do significantly better on the LSAT test than LSAT writers as a whole.”

The format for the research hypothesis can take either of two forms, depending on the relationship between what the null hypothesis states and what the researcher believes to be the truth. The null hypothesis states that the population has a specific characteristic. In the example that has served us so far in this chapter, the null hypothesis states that the “population of graduates of an LSAT preparation course has the same mean LSAT score (152) as the entire population of LSAT writers.” The researcher might believe that the population of graduates actually scores lower on the LSAT (the population mean is less than the value stated in the null hypothesis), or higher on the LSAT (the population mean is greater than the value stated in the null hypothesis), or they might be unsure about the direction of the difference.

If the researcher is unsure about the direction, the research hypothesis states only that the population mean is not equal to the value stated in the null hypothesis. The research hypothesis stated in Section 10.2 (μ ≠ 152) is in this format. This is called a two-tailed test of significance because it means that the researcher is equally concerned with the possibility that the true population value is greater than the value specified in the null hypothesis and the possibility that the true population value is less than the value specified in the null hypothesis.

In other situations, the researcher might be concerned only with differences in a specific direction. If the direction of the difference can be predicted, or if the researcher is concerned only with differences in one direction, a one-tailed test can be used. A one-tailed test may take one of two forms, depending on the researcher’s expectations about the direction of the difference.
If the researcher believes that the true population value is greater than the value specified in the null hypothesis, the research hypothesis will reflect that belief. In our example, if we predict that graduates of an LSAT preparation course have higher LSAT scores than the entire population of LSAT writers (or an average LSAT score greater than 152), our research hypothesis is as follows:

(H1: μ > 152) where > signifies “greater than”

In this situation, the null hypothesis is broadened to cover all values less than or equal to 152, which gives us the following revised null hypothesis:

(H0: μ ≤ 152) where ≤ signifies “less than or equal to”

Notice that the null hypothesis must always include an equal sign, even when a directional symbol is also included. This is one of the features that distinguishes it from the research hypothesis, whose possible directional signs include only <, >, or ≠, but never =.

On the other hand, if we predict that graduates have lower LSAT scores than the entire population of LSAT writers (or an average LSAT score less than 152), our research hypothesis is as follows:

(H1: μ < 152) where < signifies “less than”

In this situation, the null hypothesis is broadened to cover all values greater than or equal to 152, which gives us the following revised null hypothesis:

(H0: μ ≥ 152) where ≥ signifies “greater than or equal to”

One-tailed tests are often appropriate when programs designed to solve a problem or improve a situation are being evaluated. If an LSAT training course results in lower LSAT scoring, for example, the course will be considered a failure. In a situation like this, the researcher may well focus only on outcomes that indicate that the program is a success (i.e., when graduates of an LSAT preparation course have higher LSAT scores) and conduct a one-tailed test with a research hypothesis in the form H1: μ > 152. As another example, consider the evaluation of a program designed to increase youth employment.
The evaluators would be concerned only with outcomes that show an increase in the youth employment rate. If the rate shows no change or if youth employment decreases, the program is a failure. Thus, the evaluators could legitimately use a one-tailed test that stated that youth employment rates for graduates of the program are greater than (>) rates of employment among all youth. In terms of the five-step model, the choice of a one-tailed or a two-tailed test determines what we do with the critical region under the sampling distribution in Step 3. This decision shows how important it is to state the null and research hypotheses carefully and precisely! In a two-tailed test, we split the critical region equally into the upper and lower tails of the sampling distribution. In a one-tailed test, we place the entire critical region in one tail of the sampling distribution. If we believe that the population characteristic is greater than the value stated in the null hypothesis (if the H1 includes the > symbol), we place the entire critical region in the upper tail. If we believe that the characteristic is less than the value stated in the null hypothesis (if the H1 includes the < symbol), the entire critical region goes in the lower tail. For example, in a two-tailed test with alpha equal to 0.05, the critical region begins at Z (critical) = ±1.96. In a one-tailed test at the same alpha level, the Z (critical) is +1.65 if the upper tail is specified and −1.65 if the lower tail is specified. Table 10.1 summarizes the procedures to follow in terms of the nature of the research hypothesis. The difference in placing the critical region is graphically summarized in Figure 10.5, and the critical Z scores for the most common alpha levels are given in Table 10.2 for both one- and two-tailed tests. 
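
The relationship between alpha, the number of tails, and Z (critical) can be checked with a short Python sketch. It assumes the standard library's `statistics.NormalDist` in place of the normal curve table; the helper name `z_critical` is hypothetical. Because the code returns unrounded values, a few results may differ in the second decimal place from rounded table entries (for example, about 1.645 where a table shows 1.65).

```python
from statistics import NormalDist


def z_critical(alpha, tails="two"):
    """Return the Z score that marks the start of the critical region.

    tails="two"   -> positive value; the region lies beyond +/- this value
    tails="upper" -> region lies above the returned (positive) value
    tails="lower" -> region lies below the returned (negative) value
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tails == "two":
        return std_normal.inv_cdf(1 - alpha / 2)  # alpha split across both tails
    if tails == "upper":
        return std_normal.inv_cdf(1 - alpha)      # all of alpha in the upper tail
    if tails == "lower":
        return std_normal.inv_cdf(alpha)          # all of alpha in the lower tail
    raise ValueError("tails must be 'two', 'upper', or 'lower'")


for alpha in (0.10, 0.05, 0.01, 0.001):
    print(
        alpha,
        round(z_critical(alpha, "two"), 3),
        round(z_critical(alpha, "upper"), 3),
        round(z_critical(alpha, "lower"), 3),
    )
```

At any given alpha, the one-tailed value is smaller in magnitude than the two-tailed value because the whole of alpha sits in a single tail.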

Table 10.1 One-Tailed versus Two-Tailed Tests, α = 0.05

If the Research Hypothesis Uses   The Test Is   And Is Concerned with   Z (critical)
≠                                 Two-tailed    Both tails              ±1.96
>                                 One-tailed    Upper tail              +1.65
<                                 One-tailed    Lower tail              −1.65

Figure 10.5 Establishing the Critical Region, One-Tailed Tests versus Two-Tailed Tests, with Rejection Region for α = 0.05 Shaded

Table 10.2 Finding Critical Z Scores for One- and Two-Tailed Tests (Single Sample Means)

Alpha   Two-Tailed Value   One-Tailed Value (Upper Tail)   One-Tailed Value (Lower Tail)
0.10    ±1.65              +1.29                           −1.29
0.05    ±1.96              +1.65                           −1.65
0.01    ±2.58              +2.33                           −2.33
0.001   ±3.29              +3.10                           −3.10

Note that the critical Z values for one-tailed tests are always closer to the mean of the sampling distribution. Thus, a one-tailed test is more likely to reject the H0 without changing the alpha level (assuming that we have specified the correct tail). One-tailed tests should be used whenever (1) the direction of the difference can be confidently predicted, or (2) the researcher is concerned only with differences in one tail of the sampling distribution.

An example will clarify how the one-tailed test is used. The Community Well-Being (CWB) index is a tool that was developed to measure socioeconomic well-being in Canadian communities. The index combines several key indicators of socioeconomic well-being (income, education, housing, and labour force activity) from Canadian Census data into a composite number ranging from 0 to 100, with higher scores indicating greater community well-being. Looking at First Nations communities, for example, the average CWB score has increased steadily over the last few decades. In 1981, the average score for all First Nations communities was 45.0 points. By 2016, the average score (N = 623) was 58.4, with a standard deviation of 10.3 points. Based on this trajectory, it can be assumed that socioeconomic development of First Nations communities has continued to improve since 2016.
To test this assumption, let’s imagine that we selected a random sample of 100 First Nations communities in 2021 and found an average CWB score of 63.1:

All First Nations | Sample of First Nations
μ = 58.4 | X̄ = 63.1
σ = 10.3 | n = 100

We will use the five-step model to test the H0 of no difference between the level of socioeconomic well-being, as measured by the CWB, in First Nations communities in 2021 and in 2016.

Step 1. Make Assumptions and Meet Test Requirements. Because we are using a mean to summarize the sample outcome, we must have interval-ratio-level data. It is common practice to treat a composite index, such as the CWB, as an interval-ratio variable. With a sample size of 100, the Central Limit Theorem applies, and we can assume that the sampling distribution is normal in shape.

Model:
Random sampling
Level of measurement is interval-ratio
Sampling distribution is normal

Step 2. State the Null Hypothesis. The null hypothesis states that there is no difference in the average CWB score for First Nations communities in 2021 when compared to 2016. The research hypothesis (H1) is also stated at this point. Because we have predicted a direction for the difference (average CWB score for First Nations communities has improved since 2016), a one-tailed test is justified. The two hypotheses may be stated as

H0: μ ≤ 58.4
(H1: μ > 58.4)

Step 3. Select the Sampling Distribution and Establish the Critical Region. We will use the standardized normal distribution (see Appendix A) to find areas under the sampling distribution. If alpha is set at 0.05, the critical region begins at the Z score of +1.65. That is, we have predicted that CWB scores for First Nations communities, on average, have increased in 2021 in comparison to 2016 and that this sample comes from a population that has a mean greater than 58.4, so we are concerned only with sample outcomes in the upper tail of the sampling distribution.
If the average CWB score for First Nations communities in 2021 is the same as the average CWB score for First Nations communities in 2016 (if the H0 is true), or the average CWB score in 2021 is less than the average CWB score in 2016 (and comes from a population with a mean less than 58.4), the research hypothesis is not supported. These decisions may be summarized as follows:

Sampling distribution = Z distribution
α = 0.05
Z (critical) = +1.65

Step 4. Compute the Test Statistic.

\[ Z(\text{obtained}) = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{63.1 - 58.4}{10.3/\sqrt{100}} = \frac{4.7}{1.03} = +4.56 \]

Step 5. Make a Decision and Interpret the Results of the Test. Comparing the Z (obtained) with the Z (critical):

Z (critical) = +1.65
Z (obtained) = +4.56

We see that the test statistic falls in the critical region. This outcome is depicted graphically in Figure 10.6. We reject the null hypothesis because, if the H0 were true, a difference of this size would be very unlikely. There is a significant difference between the average CWB score for First Nations communities in 2021 and in 2016. Because the null hypothesis has been rejected, the research hypothesis (average CWB score for First Nations communities has improved between 2016 and 2021) is supported.

Figure 10.6 Z (obtained) versus Z (critical) for the One-Tailed Test, with Rejection Region for α = 0.05 Shaded

Now that we have completed a one-tailed example, let’s consider the implications of one-tailed and two-tailed decisions for the risk of a Type I error (i.e., rejecting a true null hypothesis) and a Type II error (i.e., retaining a false null hypothesis). Which decision about the placement of alpha, one-tailed or two-tailed, do you think most reduces the risk of a Type I error, the type of error that scientists usually prefer to minimize? If you said “two-tailed,” you are correct! Two-tailed tests are sometimes regarded as more conservative because the division of alpha equally between the two tails increases the magnitude of the critical statistic.
This means that you will need a test statistic with an even larger magnitude than you would have needed if you had placed all the alpha in one tail only. Compare, for the sake of illustration, the following alpha levels and values for Z (critical) for two-tailed tests. As you may recall, this information was also presented in Table 6.1.

If Alpha Equals | The Two-Tailed Critical Region Begins at Z (critical) Equal to
0.10 | ±1.65
0.05 | ±1.96
0.01 | ±2.58
0.001 | ±3.29

For instance, with an obtained Z score of +4.56 for the previous example, we rejected the null hypothesis that the average CWB score for First Nations communities is the same in 2021 as in 2016 (or that the level of community well-being in 2021 is less than the level in 2016) in a one-tailed test at the 0.05 alpha level, where the critical region begins at +1.65. Clearly, the obtained Z score of +4.56 is quite a bit larger than the critical Z score of +1.65. If we had conducted a two-tailed test instead, also with α = 0.05, the critical Z score would have been ±1.96. We would still have been able to reject the null hypothesis, since +4.56 is larger than +1.96; however, the difference between the critical and obtained values of Z is smaller.

Suppose we had had an obtained Z score of +1.75 instead. In this case, we would only have been able to reject the null hypothesis in the one-tailed case, since +1.75 is larger than +1.65, but we would not have been able to reject it if the test had been two-tailed, since +1.75 is not larger than +1.96. The choice of one-tailed or two-tailed can thus make a difference in our decision to reject the null hypothesis, and it can likewise affect our risk of making a Type I error. (For practice in dealing with tests of significance for means that may call for one-tailed tests, see Problems 10.2b, 10.3, 10.6, 10.8, and 10.17.)

10.4. The Student’s t Distribution and the One-Sample Case

To this point, we have only considered hypothesis tests involving single sample means where the value of the population standard deviation (σ) is known. Needless to say, the value of σ is unknown in most research situations. When σ is unknown (and the sample size is large, with 100 or more cases, or the population from which the sample is taken is normally distributed), we can use the sample standard deviation (s) with the Student’s t distribution to find areas under the sampling distribution and establish the critical region. As we saw in Chapter 6, the shape of the t distribution varies as a function of sample size or, more specifically, as a function of degrees of freedom (df). Degrees of freedom are equal to n − 1 in the case of a single sample mean, just as when we constructed a confidence interval.

The t distribution is summarized in Appendix B. To find a t (critical) score, we follow three steps. First, compute the degrees of freedom. Second, choose between a one-tailed and a two-tailed test. If the test is one-tailed, use the top row, labelled “Level of Significance for One-Tailed Test”; if the test is two-tailed, use the bottom row, labelled “Level of Significance for Two-Tailed Test.” Third, select the desired alpha level. The entries in the table are the t scores, marking the beginnings of the critical regions.

It is important to remember, as discussed in Chapter 6, that Appendix B presents a limited number of t (critical) scores for degrees of freedom over 30. For instance, if the degrees of freedom for a specific problem are 165 and alpha equals 0.05, two-tailed, we have a choice between a t score of ±1.980 (df = 120) and a t score of ±1.960 (df = ∞, infinity). In situations such as these, take the larger table value as t (±1.980).
This will make it slightly more difficult to reject the null hypothesis and hence is a more conservative course of action.

In terms of the five-step model, the changes required by using t scores occur mostly in Steps 3 and 4. In Step 3, the sampling distribution is the t distribution, and degrees of freedom must be computed before locating the critical region. In Step 4, a slightly different formula for computing the test statistic, t (obtained), is used. Compared with the formula for Z (obtained), s replaces σ and n − 1 replaces n. (Recall from Chapter 6 that n − 1, rather than n, is used to correct for the bias in estimating σ with s.) Specifically, we have the following formula:

Formula 10.2

\[ t(\text{obtained}) = \frac{\bar{X} - \mu}{s/\sqrt{n - 1}} \]

To demonstrate the use of the t test, let’s work through the following problem. If you own a large business in British Columbia and aim for high productivity, you may have to consider the average amount of time your employees spend getting to work. If their commute is longer than average for the province, you might be sacrificing valuable time by not letting them work from home more often. But what is the average commute for your workers? Is it different from the average in British Columbia? To find out, you gather a random sample of 30 of your employees and find that it takes them an average of 28.6 minutes to commute to work with a standard deviation of 3.7 minutes. Using data from Statistics Canada, you find that the average commute time for all British Columbians is 25.9 minutes. You also assume that commute times are normally distributed in the population (all commuters in British Columbia). The sample mean clearly differs from the provincial average, but we need the five-step model to determine whether the difference is statistically significant at the 0.01 alpha level (that is, whether we can be 99% confident that it is real). In this case, the population standard deviation is unknown, and we will use the t distribution. Sample data are organized in the table below.
All Commuters in BC | Employees at Business
μ = 25.9 (= μ_X̄) | X̄ = 28.6
σ = ? | s = 3.7
 | n = 30

Step 1. Make Assumptions and Meet Test Requirements.

Model:
Random sampling
Level of measurement is interval-ratio
Sampling distribution is normal (n ≥ 100 or normally distributed population)

Step 2. State the Null Hypothesis. The null hypothesis states that the average commute time for your employees is the same as for all of British Columbia. In symbols:

H0: μ = 25.9

Remember, the μ above refers to the population mean. Our research hypothesis is that your employees’ average commute time differs from that of the overall population of commuters in British Columbia. Note that it does not specify a direction; it only asks if your employees’ commute time is “different from” (not higher or lower than) the provincial average. This suggests a two-tailed test:

(H1: μ ≠ 25.9)

Step 3. Select the Sampling Distribution and Establish the Critical Region. Because σ is unknown, we use the t distribution (see Appendix B) to find the critical region. We set alpha at 0.01:

Sampling distribution = t distribution
α = 0.01, two-tailed test
df = (n − 1) = 29
t (critical) = ±2.756

Step 4. Compute the Test Statistic.

\[ t(\text{obtained}) = \frac{\bar{X} - \mu}{s/\sqrt{n - 1}} = \frac{28.6 - 25.9}{3.7/\sqrt{29}} = \frac{2.70}{0.69} = +3.91 \]

Step 5. Make a Decision and Interpret the Results of the Test. With alpha set at 0.01, the critical region begins at t (critical) = ±2.756. With an obtained t score of 3.91, the null hypothesis is rejected. The difference between the commuting time for your employees and all of those in the province of British Columbia is statistically significant. The difference is so large that we may conclude that it did not occur by random chance. If the null hypothesis were actually true, the probability of rejecting it by mistake (a Type I error) would be 0.01. The test statistic and critical regions are displayed in Figure 10.7.
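The arithmetic in Steps 4 and 5 of the commuting example can be checked with a short script. This is a minimal sketch, assuming only the Python standard library; the function name `t_obtained` is illustrative, and the critical value is simply copied from the example rather than computed. Without rounding the denominator to 0.69, the statistic comes out at about 3.93 instead of 3.91, which does not change the decision.

```python
import math


def t_obtained(sample_mean, pop_mean, sample_sd, n):
    """One-sample t statistic following Formula 10.2 (denominator uses n - 1)."""
    standard_error = sample_sd / math.sqrt(n - 1)
    return (sample_mean - pop_mean) / standard_error


# Values from the commuting example in the text.
t = t_obtained(sample_mean=28.6, pop_mean=25.9, sample_sd=3.7, n=30)

# Critical value quoted in the example (df = 29, alpha = 0.01, two-tailed).
t_critical = 2.756

# Two-tailed decision rule: reject H0 when |t| exceeds the critical value.
reject_h0 = abs(t) > t_critical
print(f"t (obtained) = {t:.2f}; reject H0: {reject_h0}")
```

Note that Formula 10.2 divides by the square root of n − 1 because s here is the standard deviation computed with n in its own denominator; statistical software that reports s computed with n − 1 would divide by the square root of n instead.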
Figure 10.7 Sampling Distribution Showing t (obtained) versus t (critical) for the Two-Tailed Test, with Rejection Region for α = 0.01 (df = 29) Shaded

To summarize, when testing single sample means, we must make a choice regarding the theoretical distribution we will use to establish the critical region. The choice is straightforward. If the population standard deviation (σ) is known and the sample size is large (n = 100 or more cases) or the population the sample is taken from is normally distributed, the Z distribution (Appendix A) is used. If σ is unknown, the t distribution (Appendix B) is used. These decisions are summarized in Table 10.3. (For practice in using the t distribution in a test of hypothesis, see Problems 10.3, 10.5, 10.7 to 10.10, 10.15 e and f, and 10.17.)

Table 10.3 Choosing a Sampling Distribution When Testing Single Sample Means for Significance

If the Population Standard Deviation (σ) Is | Sampling Distribution
known, and n ≥ 100 or population normally distributed | Z distribution
unknown, and n ≥ 100 or population normally distributed | t distribution

One Step at a Time
Completing Step 4 of the Five-Step Model: Compute t (obtained)

Follow these procedures when using the Student’s t distribution when the population standard deviation is unknown and the sample size is large or the population is normally distributed.

To Compute the Test Statistic Using Formula 10.2
1: Find the square root of n − 1.
2: Divide the sample standard deviation (s) by the value you found in step 1.
3: Subtract the population mean (μ) from the sample mean (X̄).
4: Divide the value you found in step 3 by the value you found in step 2. This value is t (obtained).

Applying Statistics 10.1.
Testing a Sample Mean for Significance

Despite many characteristics favourable to labour market success, such as better than average education and health, studies show that recent immigrants to Canada (those who have been in Canada for less than 10 years) face many obstacles and challenges in the labour market. We know that, based on data from the 2016 Canadian Census, the mean employment (i.e., paid employment and self-employment) income of the Canadian population of full-time workers was $59,824. A random sample (n = 23,771) from these data also shows that recent immigrants employed full-time had an average employment income of $44,437 with a standard deviation of $55,044 in 2016. Is employment income of recent immigrants significantly different from that of the population of full-time workers as a whole? We will use the five-step model to organize the decision-making process.

Step 1. Make Assumptions and Meet Test Requirements.

Model:
Random sampling
Level of measurement is interval-ratio
Sampling distribution is normal

From the information given (this is a large sample with n > 100, and income is an interval-ratio variable), we can conclude that the model assumptions and test requirements are satisfied.

Step 2. State the Null Hypothesis. The null hypothesis says that the average income of all recent immigrants to Canada is equal to the national average. In symbols:

H0: μ = 59,824

The question does not specify a direction; it only asks whether the incomes of recent immigrants are “different from” (not higher or lower than) the national average. This suggests a two-tailed test:

H1: μ ≠ 59,824

Step 3. Select the Sampling Distribution and Establish the Critical Region.

Sampling distribution = t distribution
df = n − 1 = 23,770
α = 0.05, two-tailed test
t (critical) = ±1.980

Step 4. Compute the Test Statistic.
The necessary information for conducting a test of the null hypothesis is

Recent Immigrants | Nation
X̄ = 44,437 | μ = 59,824
s = 55,044 |
n = 23,771 |

The test statistic, t (obtained), is

\[ t(\text{obtained}) = \frac{\bar{X} - \mu}{s/\sqrt{n - 1}} = \frac{44{,}437 - 59{,}824}{55{,}044/\sqrt{23{,}771 - 1}} = \frac{-15{,}387}{357.03} = -43.10 \]

Step 5. Make a Decision and Interpret the Results of the Test. With alpha set at 0.05 (two-tailed test), the critical region begins at t (critical) = ±1.980. With an obtained t score of −43.10, the null hypothesis is rejected. This means that the difference between the employment incomes of recent immigrants to Canada and the employment incomes of Canadians as a whole is statistically significant. The difference is so large that we may conclude that it did not occur by random chance. If the null hypothesis were actually true, the probability of rejecting it by mistake (a Type I error) would be 0.05.

Source: Data from Statistics Canada. 2016 Census of Population.

One Step at a Time
Completing Step 5 of the Five-Step Model: Make a Decision and Interpret the Results of the Test

1: Compare the t (obtained) to the t (critical). If the t (obtained) is in the critical region, reject the null hypothesis. If the t (obtained) is not in the critical region, fail to reject the null hypothesis.
2: Interpret your decision in terms of the original question. For example, our conclusion for the income example above is, “There is a significant difference between the incomes of recent immigrants and the national average.”

10.5. Tests of Hypotheses for Single Sample Proportions (Large Samples)

In many cases, the sample variables we are interested in are not measured in a way that justifies the assumption of the interval-ratio level of measurement.
One alternative in this situation is to use a sample proportion (Ps) rather than a sample mean as the test statistic. As we shall see, the overall procedures for testing single sample proportions are the same as those for testing means. The central question is still, “Does the population the sample was drawn from have a certain characteristic?” We still conduct the test based on the assumption that the null hypothesis is true, and we still evaluate the probability of the obtained sample outcome against a sampling distribution of all possible sample outcomes. Our decision at the end of the test is also the same. If the obtained test statistic falls in the critical region (i.e., is unlikely, given the assumption that the H0 is true), we reject the H0.

Having stressed the continuity in procedures and logic, we must point out the important differences as well. These differences are best related in terms of the five-step model for hypothesis testing. In Step 1, when working with sample proportions, we assume that the variable is measured at the nominal or ordinal level of measurement. In Step 2, the symbols used to state the null hypothesis are different even though the null hypothesis is still a statement of “no difference,” “greater than or equal to,” or “less than or equal to.”

In Step 3, we use only the standardized normal curve (the Z distribution) to find areas under the sampling distribution and locate the critical region. This is appropriate as long as the sample size is large (we do not consider small-sample hypothesis tests for proportions in this textbook). Recall from Chapter 5 that, for proportions, a large sample means more than simply n ≥ 100. The sample size is considered large, and the sampling distribution of proportions is approximately normal, if both nPμ and n(1 − Pμ) are at least 15, where Pμ is the value in the null hypothesis.
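The large-sample condition just stated is easy to check before running a test. The sketch below is a minimal Python illustration; the function name `is_large_sample` and the input values (n = 122 with Pμ = 0.39, and n = 30 with Pμ = 0.10) are chosen here for illustration only.

```python
def is_large_sample(n, p_null, minimum=15):
    """Check that both n*P and n*(1 - P) reach the minimum count.

    p_null is the proportion stated in the null hypothesis.
    Hypothetical helper written for illustration only.
    """
    expected_with_trait = n * p_null
    expected_without_trait = n * (1 - p_null)
    return expected_with_trait >= minimum and expected_without_trait >= minimum


# 122 * 0.39 and 122 * 0.61 are both well above 15.
print(is_large_sample(122, 0.39))

# 30 * 0.10 is only about 3, so the normal approximation is not justified.
print(is_large_sample(30, 0.10))
```

When the check fails, the Z-based procedure described in this section should not be used.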
In Step 4, computing the test statistic, the form of the formula remains the same. That is, the test statistic, Z (obtained), equals the sample statistic minus the mean of the sampling distribution, divided by the standard deviation of the sampling distribution. However, the symbols change because we are basing the tests on sample proportions. The formula can be stated as

Formula 10.3

\[ Z(\text{obtained}) = \frac{P_s - P_\mu}{\sqrt{P_\mu(1 - P_\mu)/n}} \]

Step 5 is exactly the same as before. If the test statistic, Z (obtained), falls in the critical region, as marked by Z (critical), reject the H0; if Z (obtained) does not fall in the critical region, then fail to reject the H0.

An example should clarify these procedures. A random sample of 122 households in a low-income neighbourhood reveals that 53 (or a proportion of 0.43) of the households are headed by females. In the city as a whole, the proportion of female-headed households is 0.39. Are households in the low-income neighbourhood significantly different from the city as a whole in terms of this characteristic? For this example, let us set alpha at 0.10 (the 90% level of confidence).

Step 1. Make Assumptions and Meet Test Requirements.

Model:
Random sampling
Level of measurement is nominal or ordinal
Sampling distribution is normal in shape

We can assume that the sampling distribution is approximately normal because both nPμ and n(1 − Pμ) are ≥ 15: 122(0.39) = 47.58 and 122(0.61) = 74.42, respectively.

Step 2. State the Null Hypothesis. The research question, as stated above, asks only if the sample proportion is different from the population proportion. Because no direction is predicted for the difference, a two-tailed test will be used:

H0: Pμ = 0.39
(H1: Pμ ≠ 0.39)

Step 3. Select the Sampling Distribution and Establish the Critical Region.
Sampling distribution = Z distribution
α = 0.10, two-tailed test
Z (critical) = ±1.65

Photo (Marina Andrejchenko/Shutterstock.com): Research indicates that female-headed households are not disproportionately represented in low-income neighbourhoods.

Step 4. Compute the Test Statistic.

\[ Z(\text{obtained}) = \frac{P_s - P_\mu}{\sqrt{P_\mu(1 - P_\mu)/n}} = \frac{0.43 - 0.39}{\sqrt{0.39(0.61)/122}} = \frac{0.04}{0.044} = +0.91 \]

Step 5. Make a Decision and Interpret the Results of the Test. The test statistic, Z (obtained), does not fall in the critical region. Therefore, we fail to reject the H0. There is no statistically significant difference between the low-income community and the city as a whole in terms of the proportion of households headed by females. Figure 10.8 displays the sampling distribution, the critical region, and the Z (obtained). (For practice in tests of significance using sample proportions, see Problems 10.11 to 10.14, 10.15 a, b, c, and d, and 10.16.)

Figure 10.8 Sampling Distribution Showing Z (obtained) versus Z (critical) for the Two-Tailed Test, with Critical Region for Alpha = 0.10 Shaded

Applying Statistics 10.2. Testing a Sample Proportion for Significance

It was pointed out in Applying Statistics 10.1 that immigrants arriving in Canada in recent years tend to be well educated. In a random sample from the 2016 Canadian Census, 49% of 40,141 recent immigrants (those who have been in Canada for less than 10 years) aged 25 or older had a university education (bachelor’s degree or higher) in 2016. The percentage of the Canadian population aged 25 or older with a university degree or higher was 26% in that year. Are recent immigrants significantly more likely to have a university education than the population as a whole?

Step 1. Make Assumptions and Meet Test Requirements.

Model:
Random sampling
Level of measurement is nominal or ordinal
Sampling distribution is normal

This is a large sample (that is, nPμ and n(1 − Pμ) are both ≥ 15), so we can assume a normal sampling distribution.
The variable “has a university degree” is nominal in level of measurement.

Step 2. State the Null Hypothesis. The null hypothesis says that recent immigrants are no more likely than the nation as a whole to hold a university degree:

H0: Pμ ≤ 0.26

The original question (“Are recent immigrants more likely to have a university education”) suggests a one-tailed research hypothesis:

(H1: Pμ > 0.26)

The research hypothesis says that we will be concerned only with outcomes in which recent immigrants are more likely to hold a degree, that is, with sample outcomes in the upper tail of the sampling distribution.

Step 3. Select the Sampling Distribution and Establish the Critical Region.

Sampling distribution = Z distribution
α = 0.05
Z (critical) = +1.65

Step 4. Compute the Test Statistic. The information necessary for a test of the null hypothesis, expressed in the form of proportions, is

Recent Immigrants | Nation
Ps = 0.49 | Pμ = 0.26
n = 40,141 |

The test statistic Z (obtained) is

\[ Z(\text{obtained}) = \frac{P_s - P_\mu}{\sqrt{P_\mu(1 - P_\mu)/n}} = \frac{0.49 - 0.26}{\sqrt{0.26(1 - 0.26)/40{,}141}} = \frac{0.23}{0.0022} = +104.55 \]

Step 5. Make a Decision and Interpret the Results of the Test. With alpha set at 0.05, one-tailed, the critical region begins at Z (critical) = +1.65. With an obtained Z score of +104.55, the null hypothesis is rejected. The difference between recent immigrants and Canadians as a whole is statistically significant and in the predicted direction. Recent immigrants to Canada are significantly more likely to have a university degree.

Source: Data from Statistics Canada. 2016 Census of Population.

One Step at a Time
Completing Step 4 of the Five-Step Model: Compute Z (obtained)

To Compute the Test Statistic Using Formula 10.3
1: Start with the denominator of Formula 10.3, and substitute in the value for Pμ. This value is given in the statement of the problem.
2: Subtract the value of Pμ from 1.
3: Multiply the value you found in step 2 by the value of Pμ.
4: Divide the value you found in step 3 by n.
5: Take the square root of the value you found in step 4.
6: Subtract Pμ from Ps.
7: Divide the value you found in step 6 by the value you found in step 5. This value is Z (obtained).

10.6. Hypothesis Testing Using Confidence Intervals

Hypothesis testing and interval estimation (see Chapter 6) are the two main applications of inferential statistics. While the objective of each technique is different (testing a claim about a population parameter versus estimating a population parameter, respectively), they are in reality just different ways of expressing the same information. This is especially easy to see when we compare the process of forming confidence intervals to estimate population means or proportions with the process of hypothesis testing with means or proportions.

Specifically, if an interval estimate (i.e., confidence interval) does not contain the value of the parameter specified by the null hypothesis, then a hypothesis test will reject the null hypothesis, and vice versa. That is, if the value of H0 is contained within the confidence interval at a given alpha level, then H0 is not rejected at that level. On the other hand, if the value of H0 is not contained within the confidence interval at a given alpha level, then H0 is rejected at that level. So, if a 99% confidence interval does not contain the value of the parameter given by the null hypothesis, then the null hypothesis is rejected at the 0.01 level; if a 95% confidence interval does not contain it, then the null hypothesis is rejected at the 0.05 level; and so on.
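The correspondence between a confidence interval and a two-tailed test can be illustrated numerically. The sketch below is a minimal Python example, assuming only the standard library; the function names are hypothetical, and the inputs reuse the CWB figures from earlier in the chapter (sample mean 63.1, σ = 10.3, n = 100, null value 58.4), treated here as a two-tailed test purely for the sake of illustration.

```python
import math
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Confidence interval around a sample mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin


def two_tailed_z_rejects(sample_mean, null_value, sigma, n, alpha=0.05):
    """Two-tailed Z test: True when H0 (mu = null_value) is rejected."""
    z_obtained = (sample_mean - null_value) / (sigma / math.sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z_obtained) > z_crit


null_value = 58.4
low, high = z_confidence_interval(63.1, 10.3, 100, confidence=0.95)

# The two routes should agree: H0 is rejected exactly when the
# null value lies outside the interval.
outside_interval = not (low <= null_value <= high)
rejected = two_tailed_z_rejects(63.1, null_value, 10.3, 100, alpha=0.05)

print(f"95% c.i.: {low:.2f} to {high:.2f}")
print(f"null value outside interval: {outside_interval}; H0 rejected: {rejected}")
```

The agreement holds because both procedures use the same standard error and the same critical Z score; the interval check and the test are the same comparison written in two ways.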
The ability to use confidence intervals to test hypotheses applies to all situations, including means and proportions, as well as to the one-sample case covered in this chapter and the two-sample case to be discussed in Chapter 11.

One Step at a Time
Completing Step 5 of the Five-Step Model: Make a Decision and Interpret the Results of the Test

1: Compare your Z (obtained) to your Z (critical). If the Z (obtained) is in the critical region, reject the null hypothesis. If the Z (obtained) is not in the critical region, fail to reject the null hypothesis.
2: Interpret the decision in terms of the original question. For example, our conclusion for the household example in Section 10.5 is “There is no significant difference between the low-income community and the city as a whole in the proportion of households that are headed by females.”

To see how the confidence interval for the sample mean corresponds to the hypothesis test for the sample mean, let us take another look at the LSAT example in Section 10.2. Recall that the null hypothesis states that the population of graduates of an LSAT preparation course is just like everyone else and has a mean LSAT score of 152; that is, there is no difference in LSAT scoring between the graduates and LSAT writers as a whole, or

H0: μ = 152
(H1: μ ≠ 152)

For a two-tailed test (the research hypothesis does not predict a direction for the difference) with alpha set at 0.05, the critical region begins at Z (critical) = ±1.96, or

Sampling distribution = Z distribution
α = 0.05, two-tailed test
Z (critical) = ±1.96

We obtained a Z score of +4.00, or

\[ Z(\text{obtained}) = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{156 - 152}{10/\sqrt{100}} = \frac{4}{1} = +4.00 \]

We therefore reject the null hypothesis, H0, because the test statistic, Z (obtained), falls in the critical region, as marked by Z (critical).

Next, let us construct a 95% confidence interval around the sample mean using Formula 6.1 (see Chapter 6):

\[ \text{c.i.} = \bar{X} \pm Z\left(\frac{\sigma}{\sqrt{n}}\right) = 156 \pm 1.96\left(\frac{10}{\sqrt{100}}\right) = 156 \pm 1.96(1.00) = 156 \pm 1.96 \]

Based on this result, we estimate that the population mean is greater than or equal to 154.04 and less than or equal to 157.96 at the 95% level of confidence.

The relationship between the confidence interval and the hypothesis test can now be seen. Because this interval does not include the value of the parameter specified by the null hypothesis, 152, we reject the null hypothesis. The difference between graduates of an LSAT training course and the entire population of LSAT writers is statistically significant at the 0.05 level. This is precisely the decision we made above using the formal hypothesis-testing approach. In sum, it will always be true that if the confidence interval contains the value of the parameter specified by the null hypothesis, then the null hypothesis cannot be rejected at the stated alpha level. If it does not, then the null hypothesis can be rejected.

Summary

1. This chapter extended the examination of hypothesis testing that we began in Chapter 7. We saw how to test the null hypothesis of “no difference” for single sample means and proportions. In both cases, the central question is whether the population represented by the sample has a certain characteristic.
2. If we can predict a direction for the difference in stating the research hypothesis, a one-tailed test is called for. If we can’t predict a direction, a two-tailed test is appropriate.
3. The choice of a one-tailed test or a two-tailed test can affect the risk of making a Type I error. In general, a two-tailed test is more conservative and can lessen the risk of making a Type I error.
4.
When testing sample means, the Z distribution is used to find the critical region when the population standard deviation is known and the t distribution is used when it is unknown.
5. Sample proportions can also be tested for significance. Tests are conducted using the five-step model. Compared to the test for the sample mean, the major differences lie in the level-of-measurement assumption (Step 1), the statement of the null hypothesis (Step 2), and the computation of the test statistic (Step 4).
6. Hypothesis testing and interval estimation (see Chapter 6) are just different ways of expressing the same information. If the confidence interval contains the value of the parameter specified by the null hypothesis in a one-sample test, then the null hypothesis is not rejected at the stated alpha level. Alternatively, the null hypothesis is rejected if the confidence interval does not contain the value.

Summary of Formulas

Single sample means, large samples (or normally distributed population), and population standard deviation known:

\[ Z(\text{obtained}) = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \]

Single sample means, large samples (or normally distributed population), and population standard deviation unknown:

\[ t(\text{obtained}) = \frac{\bar{X} - \mu}{s/\sqrt{n - 1}} \]

Single sample proportions, large samples:

\[ Z(\text{obtained}) = \frac{P_s - P_\mu}{\sqrt{P_\mu(1 - P_\mu)/n}} \]

Glossary

One-tailed test
t (critical)
t (obtained)
Two-tailed test
Z (critical)
Z (obtained)

Multimedia Resources

Visit the companion website for the fifth Canadian edition of Statistics: A Tool for Social Research and Data Analysis to access a wide range of student resources: www.cengage.com/healey5ce.

Problems

10.1 a. For each situation, find Z (critical).

Alpha | Form | Z (Critical)
0.05 | One-tailed |
0.10 | Two-tailed |
0.06 | Two-tailed |
0.01 | One-tailed |
0.02 | Two-tailed |

b. For each situation, find the critical t score.

Alpha | Form | n | t (Critical)
0.10 | Two-tailed | 31 |
0.02 | Two-tailed | 24 |
0.01 | Two-tailed | 121 |
0.01 | One-tailed | 31 |
0.05 | One-tailed | 61 |

c.
Compute the appropriate test statistic (Z or t) for each situation:

1. μ = 2.40, $\bar{X}$ = 2.20, σ = 0.75, n = 200
2. μ = 17.10, $\bar{X}$ = 16.80, s = 0.90, n = 45
3. μ = 10.20, $\bar{X}$ = 9.40, s = 1.70, n = 150
4. Pμ = 0.57, Ps = 0.60, n = 117
5. Pμ = 0.32, Ps = 0.30, n = 322

10.2 SOC a. The student body at Algebra University attends an average of 3.3 parties per month, with a standard deviation of 0.53. A random sample of 117 sociology majors averages 3.8 parties per month. Are sociology majors significantly different from the student body as a whole? (HINT: The wording of the research question suggests a two-tailed test. This means that the alternative or research hypothesis in Step 2 is stated as H1: μ ≠ 3.3 and that the critical region is split between the upper and lower tails of the sampling distribution. See Table 10.2 for the value of Z (critical) for various alpha levels.)

b. What if the research question were changed to “Do sociology majors attend a significantly greater number of parties?” How would the test conducted in part a change? (HINT: This wording implies a one-tailed test of significance. How would the research hypothesis change? For the alpha you used in part a, what would the value of Z (critical) be?)

10.3 SW a. Nationally, social workers average 10.2 years of experience. In a random sample, 203 social workers in a city average only 8.7 years with a standard deviation of 0.52. Are social workers in this city significantly less experienced? (HINT: Note the wording of the research hypothesis. This situation may justify a one-tailed test of significance. If you chose a one-tailed test, what form would the research hypothesis take, and where would the critical region begin?)

b. The same sample of social workers reports an average annual salary of $65,782 with a standard deviation of $622. Is this figure significantly higher than the national average of $64,509? (HINT: The wording of the research hypothesis suggests a one-tailed test.
What form would the research hypothesis take, and where would the critical region begin?)

10.4 SOC Nationally, the average score on the GRE (Graduate Record Examinations) verbal test is 453 with a standard deviation of 95. A random sample of 152 first-year graduate students entering Algebra University shows a mean score of 502. Is there a significant difference?

10.5 SOC According to Statistics Canada, Canadians spend $755 per year on alcoholic beverages. As an employee of a research firm working for a local beer distributor, you have accessed data on a random sample of 570 London, Ontario, residents to see if their spending is higher than average, and perhaps a good area for your client to make a local investment. The average spending in the sample is $740, with a standard deviation of $202. Does this represent a significant difference?

10.6 SOC A sample of 105 part-time workers in the Roadster Division of the Toy Car Factory earns an average of $24,375 per year. The average salary for all part-time workers is $24,230, with a standard deviation of $523. Are workers in the Roadster Division overpaid? Conduct both one- and two-tailed tests.

10.7 SOC a. Nationally, the population as a whole watches an average of 6.2 hours of TV per day. A random sample of 1,017 older adults reports watching an average of 5.9 hours per day, with a standard deviation of 0.70. Is the difference significant?

b. The same sample of older adults reports that they belong to an average of 2.1 voluntary organizations and clubs, with a standard deviation of 0.50. Nationally, the average is 1.7. Is the difference significant?

10.8 SOC A school system has assigned several hundred students with learning disabilities to an alternative educational experience. To assess the program, a random sample of 35 has been selected for comparison with all students in the system.
(NOTE: For each variable below, the population of all students is normally distributed.)

a. In terms of GPA, did the program work?

Systemwide GPA: μ = 2.47
Program GPA: $\bar{X}$ = 2.55, s = 0.70, n = 35

b. In terms of absenteeism (number of days missed per year), what can be said about the success of the program?

Systemwide: μ = 6.13
Program: $\bar{X}$ = 4.78, s = 1.11, n = 35

c. In terms of standardized test scores in math and reading, was the program a success?

Math Test, Systemwide: μ = 103
Math Test, Program: $\bar{X}$ = 106, s = 2.0, n = 35

Reading Test, Systemwide: μ = 110
Reading Test, Program: $\bar{X}$ = 113, s = 2.0, n = 35

(HINT: Note the wording of the research questions. Is a one-tailed test justified? Is the program a success if the students in the program are no different from students systemwide? What if the program students were performing at lower levels? If a one-tailed test is used, what form should the research hypothesis take? Where will the critical region begin?)

10.9 SOC According to Statistics Canada, the mean age of mothers at first birth is 29.2 years. To determine how different urban working-class mothers are from this norm, you have gathered a random sample of 105 urban working-class mothers and found that their average age when their first child was born is 26, with a standard deviation of 2.50. Are working-class mothers in urban areas different from Canadian mothers in general? Conduct both one- and two-tailed tests.

10.10 PA Nationally, the per capita property tax is $130 per month (and normally distributed). A random sample of 36 western cities averages $98 per month with a standard deviation of $5. Is the difference significant? Summarize your conclusions in a sentence or two.

10.11 GER/CJ A survey shows that 10% of the population is victimized by property crime each year. A random sample of 527 older adults (65 years or more of age) shows a victimization rate of 14%. Are older adults more likely to be victimized?
Conduct a one-tailed test of significance.

10.12 CJ A random sample of 113 convicted male sex offenders in a provincial prison system completed a program designed to change their attitudes toward sex and violence before being released on parole. Fifty-eight eventually became repeat sex offenders. Is this recidivism rate significantly different from the rate for all offenders (57%) in that province? Summarize your conclusions in a sentence or two. (HINT: You must use the information given in the problem to compute a sample proportion. Remember to convert the population percentage to a proportion.)

10.13 PS In a recent election, 55% of the voters rejected a proposal to institute a new provincial lottery. In a random sample of 150 voters from rural communities, 49% rejected the proposal. Is the difference significant? Summarize your conclusions in a sentence or two.

10.14 CJ Provincially, the police clear by arrest 35% of the robberies and 42% of the aggravated assaults reported to them. A researcher takes a random sample of all the robberies (n = 207) and aggravated assaults (n = 178) reported to a city police department in one year and finds that 83 of the robberies and 80 of the assaults were cleared by arrest. Are the local arrest rates significantly different from the provincial rates? Write a sentence or two interpreting your decision.

10.15 SOC/SW A researcher has compiled a file of information on a random sample of 317 families in a city that has chronic, long-term patterns of child abuse. Below are reported some of the characteristics of the sample, along with values for the city as a whole. For each trait, test the null hypothesis of “no difference” and summarize your findings.

a. Mothers’ educational level (proportion completing high school):
City: Pμ = 0.63
Sample: Ps = 0.61

b. Family size (proportion of families with four or more children):
City: Pμ = 0.21
Sample: Ps = 0.26

c.
Mothers’ work status (proportion of mothers with jobs outside the home):
City: Pμ = 0.51
Sample: Ps = 0.27

d. Relations with relatives (proportion of families that have contact with relatives at least once a week):
City: Pμ = 0.82
Sample: Ps = 0.43

e. Fathers’ educational achievement (average years of formal schooling):
City: μ = 12.30
Sample: $\bar{X}$ = 12.50, s = 1.70

f. Fathers’ occupational stability (average years in present job):
City: μ = 5.20
Sample: $\bar{X}$ = 3.70, s = 0.50

10.16 SW You are the head of an agency seeking funding for a program to reduce unemployment among young people aged 18 to 25. Nationally, the unemployment rate for this group is 18%. A random sample of 323 people in this age group in your area reveals an unemployment rate of 21.7%. Is the difference significant? Can you demonstrate a need for the program? Should you use a one-tailed test in this situation? Why or why not? Explain the result of your test of significance as you would to a funding agency.

10.17 PA Management of a large factory has received a grievance about low wages from the union representing factory workers. Not having much time, the managers gather the records of a random sample of 27 factory workers and find that their average salary is $38,073, with a standard deviation of $575. If they know that salaries of factory workers (at the national level) are normally distributed with a mean of $38,202, how can they respond to the complaint? Should they use a one-tailed test in this situation? Why or why not? What would they say in a memo to the union to respond to the complaint?

10.18 The following essay questions review the basic principles and concepts of inferential statistics. The order of the questions roughly follows the five-step model.

a. Hypothesis testing or significance testing can be conducted only with a random sample. Why?
b.
Under what specific conditions can it be assumed that the sampling distribution is normal in shape?
c. Explain the role of the sampling distribution in a test of hypothesis.
d. The null hypothesis is an assumption about reality that makes it possible to test sample outcomes for their significance. Explain.
e. What is the critical region? How is the size of the critical region determined?
f. Describe a research situation in which a one-tailed test of hypothesis would be appropriate.
g. Thinking about the shape of the sampling distribution, why does use of the t distribution (as opposed to the Z distribution) make it more difficult to reject the null hypothesis?
h. What exactly can be concluded in the one-sample case when the test statistic falls in the critical region?

10.19 SOC A researcher is studying changes in the student body at their university and has selected a random sample of 163 first-year students. The table below compares their characteristics to those of the student body as a whole. Which differences are significant?

You Are the Researcher

Using SPSS to Conduct a One-Sample Test with the 2018 CCHS

The demonstration and exercise below use the shortened version of the 2018 CCHS data. Start SPSS for Windows, and open the CCHS_2018_Shortened.sav file.

SPSS DEMONSTRATION 10.1 Using the Select Cases Command to Conduct a One-Sample Test

In this demonstration, we’ll conduct a one-sample test to compare the number of hours worked per week of a sample of people with poor health, which we will define as those with fair or poor self-perceived health, to the typical number of hours worked of all Canadians, which we will assume is 40 hours per week. We predict that individuals in poor health will work fewer hours than the general population.
If we find that the average number of hours worked for the sample of people in poor health is significantly less than that of the general population, we can conclude that individuals in poor health in Canada tend to work fewer hours per week than the population as a whole.

First, we need to use the Select Cases command to select those with fair or poor health for the analysis. Click Data from the menu bar of the Data Editor window, and then click Select Cases. The Select Cases window appears and presents a number of different options. Click the button next to “If condition is satisfied,” and then click on the If button. The Select Cases: If dialog box will open, where you specify the cases to be included in the analysis. Find and highlight gen_005 (perceived health) from the variable list on the left side of the dialog box, and then click the arrow to move gen_005 into the text box. In this text box, type >= 4 immediately to the right of the variable name gen_005. This statement instructs SPSS to select any case with a value greater than or equal to 4 (i.e., any case with a value of 4 or 5). A value of 4 on gen_005 indicates a person with fair health and a value of 5 a person with poor health.

The expression gen_005 >= 4 should appear in the text box. Click Continue and then OK. The Select Cases command confines all subsequent analysis to this subset of cases, individuals with fair or poor health. This is easily verified, because the status bar at the bottom of the SPSS window displays the message “Filter On.” It is important to note that the unselected cases in the data file, while not included in the analysis, do remain in the data set.

Because the population standard deviation is unknown, we will use the One-Sample T Test procedure to test whether the mean number of hours worked per week for people with poor health differs from that of all Canadians. From the main menu bar, click Analyze, Compare Means, and then One-Sample T Test.
The One-Sample T Test dialog box will open with the usual list of variables on the left. Find and move the cursor over lbfdghpw (total usual hours worked per week), and click the top arrow in the middle of the window to move lbfdghpw to the Test Variable(s) box. Next, click the Test Value box and type 40. Click OK, and the following output will be produced. (NOTE: Do not forget to turn filtering off after finishing the One-Sample T Test procedure. To do this, return to the Select Cases dialog box, select the “All cases” button, and then click OK.)

In the first block of output (“One-Sample Statistics”) are some descriptive statistics. There are 73 people with poor health with a mean number of hours worked per week of 38.2, which is different from the mean number of hours worked of 40 for the general population. Is the difference in means significant? The results of the test for significance are reported in the next block (“One-Sample Test”) of output.

In the second output block, “One-Sample Test,” we are given the value of t (obtained) (−0.878) and the degrees of freedom (72) needed to test whether the difference between the sample mean of 38.2 hours and the population mean of 40 hours is statistically significant. To test for significance, we look up the t (critical) in the t table in Appendix B and then compare the t (obtained) value to the t (critical) value, as practised throughout this chapter.

Because we predicted a direction for this difference (individuals in poor health work fewer hours than the general population), a one-tailed tes
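
The comparison of t (obtained) with t (critical) described in this demonstration can also be sketched in a few lines of Python. This is only an illustration under stated assumptions: alpha = 0.05 is assumed because no alpha level is given in the passage, the three table rows are taken from a standard one-tailed t table rather than from Appendix B itself, the rule of using the closest smaller df when the exact df is not tabled is an assumed convention, and the function names are hypothetical.

```python
# Critical t values for a one-tailed test at alpha = 0.05, keyed by degrees
# of freedom. Rows come from a standard t table; alpha = 0.05 is an
# assumption, since the demonstration does not state an alpha level here.
ONE_TAILED_T_05 = {40: 1.684, 60: 1.671, 120: 1.658}


def critical_t_lower_tail(df, table=ONE_TAILED_T_05):
    """Return t (critical) for a lower-tail test.

    Assumption: when the exact df is not a row in the table, use the
    closest smaller df, which makes the test slightly more conservative.
    """
    usable_rows = [row for row in table if row <= df]
    if not usable_rows:
        raise ValueError("df is below the smallest row in this table")
    # The research hypothesis predicts a lower mean, so the critical
    # region lies in the lower tail and t (critical) is negative.
    return -table[max(usable_rows)]


def decide_lower_tail(t_obtained, t_critical):
    """Reject H0 only if t (obtained) falls in the lower-tail critical region."""
    if t_obtained <= t_critical:
        return "reject H0"
    return "fail to reject H0"


t_obtained = -0.878  # from the "One-Sample Test" output block
df = 72              # n - 1 = 73 - 1

t_critical = critical_t_lower_tail(df)
print(t_critical)                                 # -1.671
print(decide_lower_tail(t_obtained, t_critical))  # fail to reject H0
```

Under these assumptions, t (obtained) of −0.878 does not fall beyond t (critical) of −1.671, so the sketch returns "fail to reject H0". A different alpha level or a different table row would change t (critical), so the values above should be checked against Appendix B before being relied on.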
