Hypothesis Testing
Arijit Mitra
Summary
These lecture notes cover hypothesis testing, a core concept in statistics. The document details the process of formulating and testing hypotheses, differentiating between null and alternative hypotheses, and explains the concepts of Type I and Type II errors. The notes also provide examples illustrating different ways to apply hypothesis testing.
Hypothesis Testing: the Term (Inferential Statistics)
We draw inferences about the population (about its descriptive measures, i.e., the population parameters) based on the results that we obtain from the collected data.
At first, we develop something called a hypothesis, i.e., what we expect about the population (it may concern any parameter). We then collect the required data and compute the estimator (which we call the test statistic) from the dataset.
The gap between the statistic and the hypothesized parameter decides whether our hunch is correct or not (with a certain probability).
Unfortunately, the difference between the hypothesized population parameter and the actual statistic is more often neither so large that we automatically reject our hypothesis nor so small that we just as quickly accept it. So, in hypothesis testing, as in most significant real-life decisions, clear-cut solutions are the exception, not the rule.

Hypothesis Testing
Hypothesis testing can be used to determine whether a statement about the value of a population parameter should or should not be rejected.
The null hypothesis, denoted by H0, is a tentative assumption about a population parameter.
The alternative hypothesis, denoted by Ha, is the opposite of what is stated in the null hypothesis.
The hypothesis testing procedure uses data from a sample to test the two competing statements indicated by H0 and Ha.
The null and alternative hypotheses (H0 and Ha) must be mutually exclusive and collectively exhaustive.

Developing Null and Alternative Hypotheses
It is not always obvious how the null and alternative hypotheses should be formulated. Care must be taken to structure the hypotheses appropriately so that the test conclusion provides the information the researcher wants. The context of the situation is very important in determining how the hypotheses should be stated. In some cases it is easier to identify the alternative hypothesis first.
In other cases the null hypothesis is easier to identify first. Correct hypothesis formulation will take practice.

Developing Null and Alternative Hypotheses: Alternative Hypothesis as a Research Hypothesis
Many applications of hypothesis testing involve an attempt to gather evidence in support of a research hypothesis. In such cases, it is often best to begin with the alternative hypothesis and make it the conclusion that the researcher hopes to support. The conclusion that the research hypothesis is true is made if the sample data provide sufficient evidence to show that the null hypothesis can be rejected.
Example 1: A new teaching method is developed that is believed to be better than the current method.
Null hypothesis: The new method is no better than the old method.
Alternative hypothesis: The new teaching method is better.
Example 2: A new sales force bonus plan is developed in an attempt to increase sales.
Null hypothesis: The new bonus plan will not increase sales.
Alternative hypothesis: The new bonus plan will increase sales.
Example 3: A new drug is developed with the goal of lowering blood pressure more than the existing drug.
Null hypothesis: The new drug does not lower blood pressure more than the existing drug.
Alternative hypothesis: The new drug lowers blood pressure more than the existing drug.

Developing Null and Alternative Hypotheses: Null Hypothesis as an Assumption to be Challenged
We might begin with a belief or assumption that a statement about the value of a population parameter is true. We then use a hypothesis test to challenge the assumption and determine whether there is statistical evidence to conclude that the assumption is incorrect. In these situations, it is helpful to develop the null hypothesis first.
Example: The label on a soft drink bottle states that it contains 67.6 fluid ounces. Taking the label claim as the assumption to be challenged:
$H_0: \mu \ge 67.6$
$H_a: \mu < 67.6$

Summary of Forms for Null and Alternative Hypotheses
The equality part of the hypotheses always appears in the null hypothesis.
In general, a hypothesis test about the value of a population mean μ must take one of the following three forms (where μ0 is the hypothesized value of the population mean):
Lower-tailed: $H_0: \mu \ge \mu_0$, $H_a: \mu < \mu_0$
Upper-tailed: $H_0: \mu \le \mu_0$, $H_a: \mu > \mu_0$
Two-tailed: $H_0: \mu = \mu_0$, $H_a: \mu \ne \mu_0$

Type I and Type II Errors
Because hypothesis tests are based on sample data, we must allow for the possibility of errors.
A Type I error is rejecting H0 when it is true. The probability of making a Type I error when the null hypothesis is true as an equality is called the level of significance, α. Applications of hypothesis testing that only control for the Type I error are often called significance tests.
A Type II error is accepting H0 when it is false. It is difficult to control for the probability of making a Type II error, so statisticians avoid the risk of making a Type II error by concluding "do not reject H0" rather than "accept H0".

p-Value Approach to One-Tailed Hypothesis Testing
The p-value is a probability, computed using the test statistic, that measures the support (or lack of support) provided by the sample for the null hypothesis. If the p-value is less than or equal to the level of significance α, the value of the test statistic is in the rejection region.
Rejection rule: Reject H0 if the p-value ≤ α.

Suggested Guidelines for Interpreting p-Values
Less than 0.01: Overwhelming evidence to conclude Ha is true.
Between 0.01 and 0.05: Strong evidence to conclude Ha is true.
Between 0.05 and 0.10: Weak evidence to conclude Ha is true.
Greater than 0.10: Insufficient evidence to conclude Ha is true.

Figure: p-Value Approach (panels: Lower-Tailed Test About a Population Mean: σ Known; Upper-Tailed Test About a Population Mean: σ Known)

Critical Value Approach to One-Tailed Hypothesis Testing
For a test about a population mean with σ known, the test statistic is
$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$
and it has a standard normal probability distribution. We can use the standard normal probability distribution table to find the z-value with an area of α in the lower (or upper) tail of the distribution. The value of the test statistic that establishes the boundary of the rejection region is called the critical value for the test.
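As a minimal sketch, assuming Python 3.8+ (whose standard-library `statistics.NormalDist` stands in for the printed normal table), the one-tailed p-value and the critical value $z_\alpha$ described above can be computed directly. The function names are illustrative only, not part of any library.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def one_tailed_p_value(z: float, tail: str) -> float:
    """p-value of a z test statistic for a one-tailed test.

    tail="lower": area to the left of z  (Ha: mu < mu0)
    tail="upper": area to the right of z (Ha: mu > mu0)
    """
    if tail == "lower":
        return STD_NORMAL.cdf(z)
    if tail == "upper":
        return 1.0 - STD_NORMAL.cdf(z)
    raise ValueError("tail must be 'lower' or 'upper'")


def critical_z(alpha: float) -> float:
    """z_alpha: the z-value with an area of alpha in the upper tail."""
    return STD_NORMAL.inv_cdf(1.0 - alpha)


def reject_h0(p_value: float, alpha: float) -> bool:
    """p-value approach: reject H0 if p-value <= alpha."""
    return p_value <= alpha


print(round(critical_z(0.05), 3))  # 1.645
print(round(critical_z(0.01), 3))  # 2.326
```

For a lower-tailed test the critical value is simply `-critical_z(alpha)`. The table value 2.33 for α = 0.01 is the 2.326 above rounded to two decimals.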
The rejection rule is:
Lower tail: Reject H0 if $z \le -z_\alpha$
Upper tail: Reject H0 if $z \ge z_\alpha$

Figure: Critical Value Approach (panels: Lower-Tailed Test About a Population Mean: σ Known; Upper-Tailed Test About a Population Mean: σ Known)

Steps of Hypothesis Testing
Step 1. Develop the null and alternative hypotheses.
Step 2. Specify the level of significance α.
Step 3. Collect the sample data and compute the value of the test statistic.
p-Value Approach
Step 4. Use the value of the test statistic to compute the p-value.
Step 5. Reject H0 if p-value ≤ α.
Critical Value Approach
Step 4. Use the level of significance α to determine the critical value and the rejection rule.
Step 5. Use the value of the test statistic and the rejection rule to determine whether to reject H0.

Example: Metro EMS
The response times for a random sample of 40 medical emergencies were tabulated. The sample mean is 13.25 minutes. The population standard deviation is believed to be 3.2 minutes. The EMS director wants to perform a hypothesis test, with a 0.05 level of significance, to determine whether the service goal of 12 minutes or less is being achieved.
1. Develop the hypotheses.
$H_0: \mu \le 12$
$H_a: \mu > 12$
2. Specify the level of significance: α = 0.05.
3. Compute the value of the test statistic:
$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{13.25 - 12}{3.2/\sqrt{40}} = 2.47$
p-Value Approach
4. Compute the p-value. For z = 2.47, the cumulative probability is 0.9932, so p-value = 1 − 0.9932 = 0.0068.
5. Determine whether to reject H0. Because p-value = 0.0068 ≤ α = 0.05, we reject H0. There is sufficient statistical evidence to infer that Metro EMS is not meeting the response goal of 12 minutes.
Critical Value Approach
4. Determine the critical value and the rejection rule. For α = 0.05, $z_{0.05} = 1.645$; reject H0 if $z \ge 1.645$.
5. Determine whether to reject H0. Because 2.47 ≥ 1.645, we reject H0 and reach the same conclusion.

p-Value Approach to Two-Tailed Hypothesis Testing
Compute the p-value using the following three steps:
1. Compute the value of the test statistic z.
2. If z is in the upper tail (z > 0), compute the probability that z is greater than or equal to the value of the test statistic. If z is in the lower tail (z < 0), compute the probability that z is less than or equal to the value of the test statistic.
3. Double the tail area obtained in step 2 to obtain the p-value.
Rejection rule: Reject H0 if the p-value ≤ α.

Critical Value Approach to Two-Tailed Hypothesis Testing
The critical values occur in both the lower and upper tails of the standard normal curve. Use the standard normal probability distribution table to find $z_{\alpha/2}$ (the z-value with an area of α/2 in the upper tail of the distribution).
Rejection rule: Reject H0 if $z \le -z_{\alpha/2}$ or $z \ge z_{\alpha/2}$.

Ans: Develop the hypotheses (H0 and Ha).
$H_0: \mu = 0.04$ inch (the contractor's claim)
$H_a: \mu \ne 0.04$ inch (the claim is rejected)
This is a two-tailed hypothesis test for a single population mean. First calculate the standard error of the mean, then the z value:
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{0.004}{\sqrt{100}} = 0.0004$ inch
In other words, the problem is now: if the true mean is 0.04 inch and the standard deviation is 0.004 inch, what are the chances of getting a sample mean that differs from 0.04 inch by 0.0008 (= 0.0408 − 0.04) inch or more?
$z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{0.0408 - 0.04}{0.0004} = 2$; we need the probability that $|z| \ge 2$.
Going by the p-value approach, p = 0.0456, i.e., there is a 4.56% chance that the sample mean will differ from the population mean by 2 standard errors or more. With this low a chance, Parkhill could conclude that a population with a true mean of 0.04 inch would not be likely to produce a sample like this. The project supervisor would reject the aluminum company's statement about the mean thickness of the sheets.

Example: Glow Toothpaste
The production line for Glow toothpaste is designed to fill tubes with a mean weight of 6 oz. Periodically, a sample of 30 tubes is selected in order to check the filling process. Quality assurance procedures call for the continuation of the filling process if the sample results are consistent with the assumption that the mean filling weight for the population of toothpaste tubes is 6 oz.; otherwise the process will be adjusted.
Assume that a sample of 30 toothpaste tubes provides a sample mean of 6.1 oz. The population standard deviation is believed to be 0.2 oz. Perform a hypothesis test, at the 0.03 level of significance, to help determine whether the filling process should continue operating or be stopped and corrected.
1. Develop the hypotheses.
$H_0: \mu = 6$
$H_a: \mu \ne 6$
2. Specify the level of significance: α = 0.03.
3. Compute the value of the test statistic:
$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{6.1 - 6}{0.2/\sqrt{30}} = 2.74$
p-Value Approach
4. Compute the p-value. For z = 2.74, the cumulative probability is 0.9969, so p-value = 2(1 − 0.9969) = 0.0062.
5. Determine whether to reject H0. Because p-value = 0.0062 ≤ α = 0.03, we reject H0. There is sufficient statistical evidence to infer that the alternative hypothesis is true (i.e., the mean filling weight is not 6 ounces).
Critical Value Approach
4. Determine the critical value and the rejection rule. For α/2 = 0.015, $z_{0.015} = 2.17$; reject H0 if $z \le -2.17$ or $z \ge 2.17$.
5. Determine whether to reject H0. Because 2.74 ≥ 2.17, we reject H0. There is sufficient statistical evidence to infer that the alternative hypothesis is true (i.e., the mean filling weight is not 6 ounces).

Confidence Interval Approach to Two-Tailed Tests About a Population Mean
Select a simple random sample from the population and use the value of the sample mean x̄ to develop the confidence interval for the population mean μ. (Confidence intervals were covered in the previous class.) If the confidence interval contains the hypothesized value μ0, do not reject H0. Otherwise, reject H0. (Strictly, H0 should also be rejected if μ0 happens to be equal to one of the end points of the confidence interval.)
Example: the hypothesized value for the population mean is μ0 = 6. A sample of 30 tubes gives a sample mean of 6.1; the population standard deviation is known to be 0.2. The 97% confidence interval for μ is
$\bar{x} \pm z_{0.015}\frac{\sigma}{\sqrt{n}} = 6.1 \pm 2.17\frac{0.2}{\sqrt{30}} = 6.1 \pm 0.0792$, i.e., 6.0208 to 6.1792.
Because the hypothesized value for the population mean, μ0 = 6, is not in this interval, the hypothesis-testing conclusion is that the null hypothesis, H0: μ = 6, can be rejected.
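The Metro EMS and Glow Toothpaste calculations can be checked with a short script. This is a sketch under the assumption that Python 3.8+ is available; `statistics.NormalDist` replaces the normal table, and the helper names `z_statistic` and `z_interval` are hypothetical.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def z_statistic(x_bar, mu0, sigma, n):
    """Test statistic for a mean with sigma known: (x_bar - mu0) / (sigma / sqrt(n))."""
    return (x_bar - mu0) / (sigma / math.sqrt(n))


def z_interval(x_bar, sigma, n, alpha):
    """Confidence interval x_bar +/- z_{alpha/2} * sigma / sqrt(n)."""
    margin = Z.inv_cdf(1 - alpha / 2) * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin


# Metro EMS: H0: mu <= 12 vs Ha: mu > 12 (upper tail), alpha = 0.05
z_ems = z_statistic(13.25, 12, 3.2, 40)
p_ems = 1 - Z.cdf(z_ems)                  # upper-tail area
print(f"Metro EMS: z = {z_ems:.2f}, p-value = {p_ems:.4f}, reject H0: {p_ems <= 0.05}")

# Glow: H0: mu = 6 vs Ha: mu != 6 (two-tailed), alpha = 0.03
z_glow = z_statistic(6.1, 6, 0.2, 30)
p_glow = 2 * (1 - Z.cdf(abs(z_glow)))     # double the tail area
z_crit = Z.inv_cdf(1 - 0.03 / 2)          # z_{alpha/2} = z_0.015
lo_glow, hi_glow = z_interval(6.1, 0.2, 30, 0.03)
print(f"Glow: z = {z_glow:.2f}, p-value = {p_glow:.4f}, critical value = {z_crit:.2f}")
print(f"Glow 97% CI: {lo_glow:.4f} to {hi_glow:.4f}; contains 6: {lo_glow <= 6 <= hi_glow}")
```

Because the script does not round z to two decimals before finding the cumulative probability, its p-values can differ from the table-based values in the fourth decimal place (about 0.0067 rather than 0.0068 for Metro EMS); the decisions are unchanged.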
Tests About a Population Mean: σ Unknown
Test statistic:
$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$
This test statistic has a t distribution with n − 1 degrees of freedom.
Rejection rule, p-value approach: Reject H0 if p-value ≤ α.
Rejection rule, critical value approach:
Lower tail: Reject H0 if $t \le -t_\alpha$
Upper tail: Reject H0 if $t \ge t_\alpha$
Two-tailed: Reject H0 if $t \le -t_{\alpha/2}$ or $t \ge t_{\alpha/2}$

p-Values and the t Distribution
The format of the t distribution table provided in most statistics textbooks does not have sufficient detail to determine the exact p-value for a hypothesis test. However, we can still use the t distribution table to identify a range for the p-value. An advantage of computer software packages is that the computer output will provide the exact p-value for the t distribution.

One-Tailed Test About a Population Mean: σ Unknown
Example: Highway Patrol
A State Highway Patrol periodically samples vehicle speeds at various locations on a particular roadway. The sample of vehicle speeds is used to test the hypothesis H0: μ ≤ 65. The locations where H0 is rejected are deemed the best locations for radar traps. At Location F, a sample of 64 vehicles shows a mean speed of 66.2 mph with a standard deviation of 4.2 mph. Use α = 0.05 to test the hypothesis.
1. Develop the hypotheses.
$H_0: \mu \le 65$
$H_a: \mu > 65$
2. Specify the level of significance: α = 0.05.
3. Compute the value of the test statistic:
$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{66.2 - 65}{4.2/\sqrt{64}} = 2.286$
p-Value Approach
4. Compute the p-value. For t = 2.286 with 63 degrees of freedom, the t table shows that the p-value is between 0.01 and 0.025.
5. Determine whether to reject H0. Because p-value < α = 0.05, we reject H0.
Critical Value Approach
4. Determine the critical value and the rejection rule. For α = 0.05 and df = 63, $t_{0.05} = 1.669$; reject H0 if $t \ge 1.669$.
5. Determine whether to reject H0. Because 2.286 ≥ 1.669, we reject H0.
Under either approach, at the 0.05 level of significance the sample evidence indicates that the mean speed of vehicles at Location F is greater than 65 mph. Location F is a good candidate for a radar trap.

A Summary of Forms for Null and Alternative Hypotheses About a Population Proportion
The equality part of the hypotheses always appears in the null hypothesis.
In general, a hypothesis test about the value of a population proportion p must take one of the following three forms (where p0 is the hypothesized value of the population proportion):
Lower-tailed: $H_0: p \ge p_0$, $H_a: p < p_0$
Upper-tailed: $H_0: p \le p_0$, $H_a: p > p_0$
Two-tailed: $H_0: p = p_0$, $H_a: p \ne p_0$
Test statistic:
$z = \frac{\bar{p} - p_0}{\sigma_{\bar{p}}}$, where $\sigma_{\bar{p}} = \sqrt{\frac{p_0(1 - p_0)}{n}}$
Rejection rule, p-value approach: Reject H0 if p-value ≤ α.
Rejection rule, critical value approach:
Lower tail: Reject H0 if $z \le -z_\alpha$
Upper tail: Reject H0 if $z \ge z_\alpha$
Two-tailed: Reject H0 if $z \le -z_{\alpha/2}$ or $z \ge z_{\alpha/2}$

Two-Tailed Test About a Population Proportion
Example: National Safety Council (NSC)
For a Christmas and New Year's week, the National Safety Council estimated that 500 people would be killed and 25,000 injured on the nation's roads. The NSC claimed that 50% of the accidents would be caused by drunk driving. A sample of 120 accidents showed that 67 were caused by drunk driving. Use these data to test the NSC's claim with α = 0.05.
1. Develop the hypotheses.
$H_0: p = 0.5$
$H_a: p \ne 0.5$
2. Specify the level of significance: α = 0.05.
3. Compute the value of the test statistic:
$\bar{p} = 67/120 = 0.5583$, $\sigma_{\bar{p}} = \sqrt{\frac{0.5(1 - 0.5)}{120}} = 0.045644$
$z = \frac{0.5583 - 0.5}{0.045644} = 1.278$
p-Value Approach
4. Compute the p-value. For z = 1.28, the cumulative probability is 0.8997, so p-value = 2(1 − 0.8997) = 0.2006.
5. Determine whether to reject H0. Because p-value = 0.2006 > α = 0.05, we cannot reject H0. We do not have convincing evidence that the true proportion of accidents that would be caused by drunk driving is different from 50%.
Critical Value Approach
4. Determine the critical value and the rejection rule. For α/2 = 0.025, $z_{0.025} = 1.96$; reject H0 if $z \le -1.96$ or $z \ge 1.96$.
5. Determine whether to reject H0. Because 1.278 is neither less than or equal to −1.96 nor greater than or equal to 1.96, we cannot reject H0. We do not have convincing evidence that the true proportion of accidents that would be caused by drunk driving is different from 50%.

Hypothesis Testing: Inference About Means and Proportions with Two Populations

Estimating the Difference Between Two Population Means
Sampling distribution of $\bar{x}_1 - \bar{x}_2$
Mean (expected value): $E(\bar{x}_1 - \bar{x}_2) = \mu_1 - \mu_2$
Standard deviation (standard error): $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$

Interval Estimation of μ1 − μ2 when σ1 and σ2 are Known
Interval estimate: $\bar{x}_1 - \bar{x}_2 \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
With a hypothesized difference μ1 − μ2 = D0, the test statistic may be calculated in this case as:
$z = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$

Example: Par, Inc.
Par, Inc. is a manufacturer of golf equipment and has developed a new golf ball that has been designed to provide "extra distance." In a test of driving distance using a mechanical driving device, a sample of Par golf balls was compared with a sample of golf balls made by Rap, Ltd., a competitor. The sample statistics are given below.

                 Sample #1: Par, Inc.   Sample #2: Rap, Ltd.
Sample size      120 balls              80 balls
Sample mean      295 yards              278 yards

Based on data from previous driving distance tests, the two population standard deviations are known: σ1 = 15 yards and σ2 = 20 yards. Let us develop a 95% confidence interval estimate of the difference between the mean driving distances of the two brands of golf ball.

Interval Estimation of μ1 − μ2 when σ1 and σ2 are Known
$\bar{x}_1 - \bar{x}_2 \pm z_{0.025}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = (295 - 278) \pm 1.96\sqrt{\frac{15^2}{120} + \frac{20^2}{80}} = 17 \pm 5.14$
We are 95% confident that the difference between the mean driving distances of Par, Inc. balls and Rap, Ltd. balls is 11.86 to 22.14 yards.

Hypothesis Tests About μ1 − μ2 when σ1 and σ2 are Known
A hypothesis test about the value of the difference in two population means μ1 − μ2 must take one of the following three forms (where D0 is the hypothesized difference in the population means):
Lower-tailed: $H_0: \mu_1 - \mu_2 \ge D_0$, $H_a: \mu_1 - \mu_2 < D_0$
Upper-tailed: $H_0: \mu_1 - \mu_2 \le D_0$, $H_a: \mu_1 - \mu_2 > D_0$
Two-tailed: $H_0: \mu_1 - \mu_2 = D_0$, $H_a: \mu_1 - \mu_2 \ne D_0$
Test statistic:
$z = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$

Example: Par, Inc.
Can we conclude, using α = 0.05 and 0.01, that the mean driving distance of Par, Inc. golf balls is greater than the mean driving distance of Rap, Ltd. golf balls?
1. Develop the hypotheses.
$H_0: \mu_1 - \mu_2 \le 0$
$H_a: \mu_1 - \mu_2 > 0$
2. Specify the level of significance: α = 0.01. (Home Task: Check the critical value for α = 0.05.)
3. Compute the value of the test statistic:
$z = \frac{(295 - 278) - 0}{\sqrt{\frac{15^2}{120} + \frac{20^2}{80}}} = \frac{17}{2.62} = 6.49$
p-Value Approach
4. Compute the p-value. For z = 6.49, the p-value < 0.0001.
5. Determine whether to reject H0. Because p-value < 0.0001 ≤ α = 0.01, we reject H0. At the 0.01 level of significance, the sample evidence indicates the mean driving distance of Par, Inc. golf balls is greater than the mean driving distance of Rap, Ltd. golf balls.
Critical Value Approach
4. Determine the critical value and the rejection rule. For α = 0.01, $z_{0.01} = 2.33$; reject H0 if $z \ge 2.33$.
5. Determine whether to reject H0. Because 6.49 ≥ 2.33, we reject H0. The sample evidence indicates the mean driving distance of Par, Inc. golf balls is greater than the mean driving distance of Rap, Ltd. golf balls.

Interval Estimation of μ1 − μ2 when σ1 and σ2 are Unknown
Student t test (pooled t test; assuming equal variances):
Interval estimate: $\bar{x}_1 - \bar{x}_2 \pm t_{\alpha/2}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$, where $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$ and df = n1 + n2 − 2
Welch t test (assuming unequal variances):
Interval estimate: $\bar{x}_1 - \bar{x}_2 \pm t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$, with df given by the Welch formula below
Calculating the test statistic for the t tests (hypothesized difference μ1 − μ2 = D0):
Student t test (pooled; equal variances): $t = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
Welch t test (unequal variances): $t = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$

Example: Specific Motors
Specific Motors of Detroit has developed a new automobile known as the M car. 24 M cars and 28 J cars (from Japan) were road tested to compare miles-per-gallon (mpg) performance.

                    Sample #1: M cars   Sample #2: J cars
Sample size         24 cars             28 cars
Sample mean         29.8 mpg            27.3 mpg
Sample std. dev.    2.56 mpg            1.81 mpg

Let us develop a 90% confidence interval estimate of the difference between the mpg performances of the two models of automobile (assuming unequal variances). Let
μ1 = the mean miles per gallon for the population of M cars
μ2 = the mean miles per gallon for the population of J cars
The degrees of freedom for $t_{\alpha/2}$ for unequal variances are
$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2} = \frac{\left(\frac{2.56^2}{24} + \frac{1.81^2}{28}\right)^2}{\frac{1}{24 - 1}\left(\frac{2.56^2}{24}\right)^2 + \frac{1}{28 - 1}\left(\frac{1.81^2}{28}\right)^2} = 40.59$, which is rounded down to 40.
With α/2 = 0.05 and df = 40, $t_{\alpha/2} = 1.684$.

Interval Estimation of μ1 − μ2 when σ1 and σ2 are Unknown (Welch t test, i.e., assuming unequal variances)
$\bar{x}_1 - \bar{x}_2 \pm t_{0.05}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (29.8 - 27.3) \pm 1.684\sqrt{\frac{2.56^2}{24} + \frac{1.81^2}{28}}$
We are 90% confident that the difference between the miles-per-gallon performances of M cars and J cars is 1.449 to 3.551 mpg.
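The test statistics for the Highway Patrol, NSC, Par, Inc. and Specific Motors examples can be reproduced with the sketch below. It assumes Python 3.8+ and uses only the standard library; the helper names are hypothetical. Because the standard library has no t distribution, the t critical values (1.669 for 63 df and 1.684 for 40 df) are copied from the t tables quoted in the notes rather than computed.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def one_sample_t(x_bar, mu0, s, n):
    """t = (x_bar - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    return (x_bar - mu0) / (s / math.sqrt(n)), n - 1


def one_proportion_z(successes, n, p0):
    """z = (p_bar - p0) / sqrt(p0 (1 - p0) / n)."""
    p_bar = successes / n
    return (p_bar - p0) / math.sqrt(p0 * (1 - p0) / n)


def two_sample_z(x1, sigma1, n1, x2, sigma2, n2, d0=0.0):
    """z statistic and standard error for mu1 - mu2 with sigma1, sigma2 known."""
    se = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return ((x1 - x2) - d0) / se, se


def welch(x1, s1, n1, x2, s2, n2, d0=0.0):
    """Welch t statistic, standard error and (unrounded) degrees of freedom."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return ((x1 - x2) - d0) / se, se, df


# Highway Patrol: upper-tailed test, t_0.05 with 63 df = 1.669 (table value)
t_hp, df_hp = one_sample_t(66.2, 65, 4.2, 64)
print(f"Highway Patrol: t = {t_hp:.3f} (df = {df_hp}), reject H0: {t_hp >= 1.669}")

# NSC: two-tailed test of p = 0.5
z_nsc = one_proportion_z(67, 120, 0.5)
p_nsc = 2 * (1 - Z.cdf(abs(z_nsc)))
print(f"NSC: z = {z_nsc:.3f}, p-value = {p_nsc:.4f}")

# Par, Inc.: 95% interval, then upper-tailed test of mu1 - mu2 <= 0
z_par, se_par = two_sample_z(295, 15, 120, 278, 20, 80)
diff_par = 295 - 278
margin_par = Z.inv_cdf(0.975) * se_par
lo_par, hi_par = diff_par - margin_par, diff_par + margin_par
print(f"Par: 95% CI {lo_par:.2f} to {hi_par:.2f}, z = {z_par:.2f}")

# Specific Motors: Welch df and 90% interval, t_0.05 with 40 df = 1.684 (table value)
t_sm, se_sm, df_sm = welch(29.8, 2.56, 24, 27.3, 1.81, 28)
diff_sm = 29.8 - 27.3
lo_sm, hi_sm = diff_sm - 1.684 * se_sm, diff_sm + 1.684 * se_sm
print(f"Specific Motors: df = {df_sm:.2f}, t = {t_sm:.3f}, 90% CI {lo_sm:.3f} to {hi_sm:.3f}")
```

Unrounded arithmetic gives z ≈ 6.48 for Par, Inc. (the notes divide by the rounded standard error 2.62 and report 6.49), an NSC p-value of about 0.201 rather than 0.2006, and a Specific Motors interval that can differ from the notes in the third decimal; none of these changes a decision. If SciPy is available, `scipy.stats.t.ppf` could supply the t critical values instead of the table lookups.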
Home Task: Check the interval for the pooled (Student) t test, and also check the critical values for α = 0.05 and 0.01 for both t tests.

Hypothesis Tests About μ1 − μ2 when σ1 and σ2 are Unknown
A hypothesis test about the value of the difference in two population means μ1 − μ2 must take one of the following three forms (where D0 is the hypothesized difference in the population means):
Lower-tailed: $H_0: \mu_1 - \mu_2 \ge D_0$, $H_a: \mu_1 - \mu_2 < D_0$
Upper-tailed: $H_0: \mu_1 - \mu_2 \le D_0$, $H_a: \mu_1 - \mu_2 > D_0$
Two-tailed: $H_0: \mu_1 - \mu_2 = D_0$, $H_a: \mu_1 - \mu_2 \ne D_0$
Test statistic (Welch t test, unequal variances):
$t = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$

Example: Specific Motors
Can we conclude, using a 0.05 level of significance, that the miles-per-gallon (mpg) performance of M cars is greater than the miles-per-gallon performance of J cars?
1. Develop the hypotheses.
$H_0: \mu_1 - \mu_2 \le 0$
$H_a: \mu_1 - \mu_2 > 0$
2. Specify the level of significance: α = 0.05.
3. Compute the value of the test statistic:
$t = \frac{(29.8 - 27.3) - 0}{\sqrt{\frac{2.56^2}{24} + \frac{1.81^2}{28}}} = 4.003$
The degrees of freedom for $t_\alpha$ are computed with the same Welch formula as in the interval estimate: df = 40.59, rounded down to 40.
Critical Value Approach
4. Determine the critical value and the rejection rule. For α = 0.05 and df = 40, $t_{0.05} = 1.684$; reject H0 if $t \ge 1.684$.
5. Determine whether to reject H0. Because 4.003 ≥ 1.684, we reject H0.
p-Value Approach
4. Compute the p-value. For t = 4.003 and df = 40, the p-value < 0.005.
5. Determine whether to reject H0. Because p-value ≤ α = 0.05, we reject H0.
At the 0.05 level of significance, the sample evidence indicates that the miles-per-gallon (mpg) performance of M cars is greater than the miles-per-gallon performance of J cars. That is, M cars have a better mpg rating than J cars.

Inferences About the Difference Between Two Population Means: Matched Samples / Paired Samples
With a matched-sample design, each sampled item provides a pair of data values. This design often leads to a smaller sampling error than the independent-sample design because variation between sampled items is eliminated as a source of sampling error.
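With a matched-sample design the analysis works on the paired differences, so it reduces to a one-sample t test on those differences. A minimal sketch, assuming Python 3 and only the standard library; `paired_t` is a hypothetical helper that expects two equal-length lists of matched observations (for example, the UPX and INTEX delivery times for the same offices).

```python
import math
from statistics import mean, stdev


def paired_t(sample_1, sample_2, mu_d=0.0):
    """Paired (matched-sample) t statistic.

    Uses the differences d_i = sample_1[i] - sample_2[i]:
      d_bar = mean of the differences
      s_d   = sample standard deviation of the differences (divisor n - 1)
      t     = (d_bar - mu_d) / (s_d / sqrt(n)), with n - 1 degrees of freedom
    Returns (t, df, d_bar, s_d).
    """
    if len(sample_1) != len(sample_2):
        raise ValueError("matched samples must have the same length")
    diffs = [a - b for a, b in zip(sample_1, sample_2)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    d_bar = mean(diffs)
    s_d = stdev(diffs)
    t = (d_bar - mu_d) / (s_d / math.sqrt(n))
    return t, n - 1, d_bar, s_d
```

For a two-tailed test of H0: μd = 0, as in the Express Deliveries example, the returned t is compared with ±t_{α/2} for n − 1 degrees of freedom.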
Example: Express Deliveries
A Chicago-based firm has documents that must be quickly distributed to district offices throughout the U.S. The firm must decide between two delivery services, UPX (United Parcel Express) and INTEX (International Express), to transport its documents. In testing the delivery times of the two services, the firm sent two reports to a random sample of its district offices, with one report carried by UPX and the other report carried by INTEX. Do the data on the next slide indicate a difference in mean delivery times for the two services? Use a 0.05 level of significance.

Paired t test
The mean of the differences is $\bar{d} = \frac{\sum d_i}{n}$
The standard deviation of the differences is $s_d = \sqrt{\frac{\sum (d_i - \bar{d})^2}{n - 1}}$
The test statistic is $t = \frac{\bar{d} - \mu_d}{s_d/\sqrt{n}}$, with n − 1 degrees of freedom.

Solution of the problem with the paired t test
1. Develop the hypotheses.
$H_0: \mu_d = 0$
$H_a: \mu_d \ne 0$
2. Specify the level of significance: α = 0.05.
3. Compute the value of the test statistic from the differences: t = 2.94, with df = 9.
p-Value Approach
4. Compute the p-value. For t = 2.94 and df = 9, the p-value is between 0.01 and 0.02. (Note: this is a two-tailed test, so we doubled the upper-tail areas of 0.005 and 0.01.)
5. Determine whether to reject H0. Because p-value ≤ α = 0.05, we reject H0. At the 0.05 level of significance, the sample evidence indicates that there is a difference in mean delivery times for the two services.
Critical Value Approach
4. Determine the critical value and the rejection rule. For α = 0.05 and df = 9, $t_{0.025} = 2.262$; reject H0 if $t \le -2.262$ or $t \ge 2.262$.
5. Determine whether to reject H0. Because 2.94 ≥ 2.262, we reject H0 and reach the same conclusion.

Sampling Distribution of $\bar{p}_1 - \bar{p}_2$
Mean (expected value): $E(\bar{p}_1 - \bar{p}_2) = p_1 - p_2$
Standard deviation (standard error): $\sigma_{\bar{p}_1 - \bar{p}_2} = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$
where:
p1 = proportion for population 1
p2 = proportion for population 2
n1 = sample size from population 1
n2 = sample size from population 2
If the sample sizes are large, the sampling distribution of $\bar{p}_1 - \bar{p}_2$ can be approximated by a normal probability distribution.
The sample sizes are sufficiently large if all of these conditions are met:
$n_1 p_1 \ge 5$, $n_1(1 - p_1) \ge 5$, $n_2 p_2 \ge 5$, $n_2(1 - p_2) \ge 5$

Interval Estimation of p1 − p2
Interval estimate: $\bar{p}_1 - \bar{p}_2 \pm z_{\alpha/2}\sqrt{\frac{\bar{p}_1(1 - \bar{p}_1)}{n_1} + \frac{\bar{p}_2(1 - \bar{p}_2)}{n_2}}$

Test Statistic
Standard error of $\bar{p}_1 - \bar{p}_2$ when $p_1 = p_2 = p$: $\sigma_{\bar{p}_1 - \bar{p}_2} = \sqrt{p(1 - p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$
Pooled estimator of p when $p_1 = p_2 = p$: $\bar{p} = \frac{n_1\bar{p}_1 + n_2\bar{p}_2}{n_1 + n_2}$
Test statistic: $z = \frac{(\bar{p}_1 - \bar{p}_2) - (p_1 - p_2)}{\sqrt{\bar{p}(1 - \bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$

Example: Market Research Associates
Market Research Associates is conducting research to evaluate the effectiveness of a client's new advertising campaign. Before the new campaign began, a telephone survey of 150 households in the test market area showed 60 households "aware" of the client's product. The new campaign has been initiated with TV and newspaper advertisements running for three weeks. A survey conducted immediately after the new campaign showed 120 of 250 households "aware" of the client's product. Do the data support the position that the advertising campaign has provided an increased awareness of the client's product?
With population 1 = households after the campaign ($\bar{p}_1 = 120/250 = 0.48$) and population 2 = households before the campaign ($\bar{p}_2 = 60/150 = 0.40$), the 95% interval estimate is
$(0.48 - 0.40) \pm 1.96\sqrt{\frac{0.48(0.52)}{250} + \frac{0.40(0.60)}{150}} = 0.08 \pm 0.10$
Hence, the 95% confidence interval for the difference in before and after awareness of the product is −0.02 to 0.18.

Hypothesis Tests about p1 − p2
Hypotheses:
1. One-tailed, lower tail: $H_0: p_1 - p_2 \ge 0$, $H_a: p_1 - p_2 < 0$
2. One-tailed, upper tail: $H_0: p_1 - p_2 \le 0$, $H_a: p_1 - p_2 > 0$
3. Two-tailed: $H_0: p_1 - p_2 = 0$, $H_a: p_1 - p_2 \ne 0$
Note: We will focus on tests involving no difference between the two population proportions.

Example: Market Research Associates (Contd.)
Can we conclude, using a 0.05 level of significance, that the proportion of households aware of the client's product increased after the new advertising campaign?
1. Develop the hypotheses.
$H_0: p_1 - p_2 \le 0$
$H_a: p_1 - p_2 > 0$
2. Specify the level of significance: α = 0.05.
3. Compute the value of the test statistic:
$\bar{p} = \frac{250(0.48) + 150(0.40)}{250 + 150} = \frac{180}{400} = 0.45$
$z = \frac{0.48 - 0.40}{\sqrt{0.45(0.55)\left(\frac{1}{250} + \frac{1}{150}\right)}} = \frac{0.08}{0.0514} = 1.56$
p-Value Approach
4. Compute the p-value. For z = 1.56, the p-value = 0.0594.
5. Determine whether to reject H0. Because p-value > α = 0.05, we cannot reject H0. We cannot conclude that the proportion of households aware of the client's product increased after the new campaign.
Critical Value Approach
4. Determine the critical value and rejection rule. For α = 0.05, $z_{0.05} = 1.645$; reject H0 if $z \ge 1.645$.
5. Determine whether to reject H0. Because 1.56 < 1.645, we cannot reject H0. We cannot conclude that the proportion of households aware of the client's product increased after the new campaign.

Hypothesis Testing for Single and Two Population Variances

Inferences About a Population Variance
A variance can provide important decision-making information. Consider the production process of filling containers with a liquid detergent product. The mean filling weight is important, but so is the variance of the filling weights. By selecting a sample of containers, we can compute a sample variance for the amount of detergent placed in a container. If the sample variance is excessive, overfilling and underfilling may be occurring even though the mean is correct.

Chi-Square Distribution
The chi-square distribution is based on sampling from a normal population: for a sample of size n from a normal population, $\frac{(n - 1)s^2}{\sigma^2}$ follows a chi-square distribution with n − 1 degrees of freedom. We can use the chi-square distribution to develop interval estimates and conduct hypothesis tests about a population variance.

Figure: Examples of the sampling distribution of $(n - 1)s^2/\sigma^2$

Interval Estimation of σ²
$\frac{(n - 1)s^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n - 1)s^2}{\chi^2_{1 - \alpha/2}}$
where the χ² values are based on a chi-square distribution with n − 1 degrees of freedom and 1 − α is the confidence coefficient.
Taking the square root of the upper and lower limits of the variance interval provides the confidence interval for the population standard deviation.

Example: Buyer's Digest (A)
Buyer's Digest rates thermostats manufactured for home temperature control. In a recent test, 10 thermostats manufactured by ThermoRite were selected and placed in a test room that was maintained at a temperature of 68°F. We will use the 10 readings below to develop a 95% confidence interval estimate of the population variance.
Thermostat    1     2     3     4     5     6     7     8     9     10
Temperature   67.4  67.8  68.2  69.3  69.5  67.0  68.1  68.6  67.9  67.2

Selected Values from the Chi-Square Distribution Table (column headings: area in upper tail)
df    0.99    0.975   0.95    0.90    0.10     0.05     0.025    0.01
5     0.554   0.831   1.145   1.610    9.236   11.070   12.832   15.086
6     0.872   1.237   1.635   2.204   10.645   12.592   14.449   16.812
7     1.239   1.690   2.167   2.833   12.017   14.067   16.013   18.475
8     1.647   2.180   2.733   3.490   13.362   15.507   17.535   20.090
9     2.088   2.700   3.325   4.168   14.684   16.919   19.023   21.666
10    2.558   3.247   3.940   4.865   15.987   18.307   20.483   23.209

So, with n − 1 = 10 − 1 = 9 degrees of freedom and α = 0.05, $\chi^2_{0.025} = 19.023$ and $\chi^2_{0.975} = 2.700$.
The sample variance s² provides a point estimate of σ². For these readings, x̄ = 68.1 and $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{6.30}{9} = 0.70$. A 95% confidence interval for the population variance is given by:
$\frac{9(0.70)}{19.023} \le \sigma^2 \le \frac{9(0.70)}{2.700}$, i.e., $0.33 \le \sigma^2 \le 2.33$, so $0.58 \le \sigma \le 1.53$

Hypothesis Testing About a Population Variance
Test statistic: $\chi^2 = \frac{(n - 1)s^2}{\sigma_0^2}$, where σ0² is the hypothesized value of the population variance.
Lower-tailed: $H_0: \sigma^2 \ge \sigma_0^2$, $H_a: \sigma^2 < \sigma_0^2$; reject H0 if $\chi^2 \le \chi^2_{1-\alpha}$
Upper-tailed: $H_0: \sigma^2 \le \sigma_0^2$, $H_a: \sigma^2 > \sigma_0^2$; reject H0 if $\chi^2 \ge \chi^2_{\alpha}$
Two-tailed: $H_0: \sigma^2 = \sigma_0^2$, $H_a: \sigma^2 \ne \sigma_0^2$; reject H0 if $\chi^2 \le \chi^2_{1-\alpha/2}$ or $\chi^2 \ge \chi^2_{\alpha/2}$
For each type of test, the chi-square critical values are based on a chi-square distribution with n − 1 degrees of freedom.

Example: Buyer's Digest (B)
Recall that Buyer's Digest is rating ThermoRite thermostats. Buyer's Digest gives an "acceptable" rating to a thermostat with a temperature variance of 0.5 or less. Using the 10 readings, we will conduct a hypothesis test (with α = 0.10) to determine whether the ThermoRite thermostat's temperature variance is "acceptable".
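Both the Buyer's Digest (A) interval and the test statistic for the (B) question (whether the variance exceeds 0.5) can be reproduced from the ten ThermoRite readings. This is a minimal standard-library sketch; the chi-square critical values are copied from the chi-square table in the notes, since Python's standard library has no chi-square distribution.

```python
from statistics import variance

# ThermoRite readings (degrees F) from the Buyer's Digest example
readings = [67.4, 67.8, 68.2, 69.3, 69.5, 67.0, 68.1, 68.6, 67.9, 67.2]
n = len(readings)
s2 = variance(readings)  # sample variance, divisor n - 1

# (A) 95% interval for sigma^2 with 9 df, table values:
# chi2_0.025 = 19.023 (upper tail) and chi2_0.975 = 2.700
chi2_upper, chi2_lower = 19.023, 2.700
ci_low = (n - 1) * s2 / chi2_upper
ci_high = (n - 1) * s2 / chi2_lower
print(f"s^2 = {s2:.2f}; 95% CI for sigma^2: {ci_low:.2f} to {ci_high:.2f}")
print(f"95% CI for sigma: {ci_low ** 0.5:.2f} to {ci_high ** 0.5:.2f}")

# (B) H0: sigma^2 <= 0.5 vs Ha: sigma^2 > 0.5, alpha = 0.10 (upper-tailed)
sigma0_sq = 0.5
chi2_stat = (n - 1) * s2 / sigma0_sq
chi2_crit = 14.684  # chi2_0.10 with 9 df, from the table
print(f"chi2 = {chi2_stat:.1f}; reject H0: {chi2_stat >= chi2_crit}")
```

With SciPy available, `scipy.stats.chi2.ppf` and `scipy.stats.chi2.sf` could supply the critical values and an exact p-value instead of the table lookups.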
Hypotheses: $H_0: \sigma^2 \le 0.5$ and $H_a: \sigma^2 > 0.5$ (upper-tailed test).
For n − 1 = 10 − 1 = 9 df and α = 0.10, the chi-square table above gives $\chi^2_{0.10} = 14.684$.
Rejection rule: Reject H0 if $\chi^2 \ge 14.684$.
Test statistic (the ten ThermoRite readings give s² = 0.70): $\chi^2 = \frac{(n - 1)s^2}{\sigma_0^2} = \frac{9(0.70)}{0.5} = 12.6$
Conclusion: Because 12.6 < 14.684, we cannot reject H0.
Rejection region using the p-value: because χ² = 12.6 is below $\chi^2_{0.10} = 14.684$, the area in the upper tail (the p-value) is greater than 0.10 = α, so again we cannot reject H0. The sample evidence is insufficient to conclude that the temperature variance of ThermoRite thermostats is unacceptable.

Inferences About Two Population Variances
We may want to compare the variances in:
product quality resulting from two different production processes,
temperatures for two heating devices, or
assembly times for two assembly methods.
We use data collected from two independent random samples, one from population 1 and another from population 2. The two sample variances will be the basis for making inferences about the two population variances.

Hypothesis Testing About the Variances of Two Populations
Test statistic: $F = \frac{s_1^2}{s_2^2}$, where population 1 is the one with the larger sample variance. When σ1² = σ2² and both populations are normal, F follows an F distribution with n1 − 1 numerator and n2 − 1 denominator degrees of freedom. For each type of test, the critical values are based on this F distribution.

Example: Buyer's Digest (C)
Buyer's Digest has conducted the same test, as described earlier, on another 10 thermostats, this time manufactured by TempKing. We will conduct a hypothesis test with α = 0.10 to see if the variances are equal for ThermoRite's thermostats and TempKing's thermostats.
ThermoRite sample
Thermostat    1     2     3     4     5     6     7     8     9     10
Temperature   67.4  67.8  68.2  69.3  69.5  67.0  68.1  68.6  67.9  67.2
TempKing sample
Thermostat    1     2     3     4     5     6     7     8     9     10
Temperature   67.7  66.4  69.2  70.1  69.5  69.7  68.1  66.6  67.3  67.5
Hypotheses: $H_0: \sigma_1^2 = \sigma_2^2$, $H_a: \sigma_1^2 \ne \sigma_2^2$ (population 1 = TempKing, the sample with the larger variance; population 2 = ThermoRite)
Rejection rule: The F distribution table shows that with α/2 = 0.05, 9 numerator df and 9 denominator df, $F_{0.05} = 3.18$. We reject H0 if F ≥ 3.18.
Test statistic: TempKing's sample variance is 1.768.
ThermoRite's sample variance is 0.7.
$F = \frac{s_1^2}{s_2^2} = \frac{1.768}{0.700} = 2.53$
Conclusion: Because F = 2.53 < 3.18, we cannot reject H0. There is insufficient evidence to conclude that the temperature variances of the two brands differ.
Determining and using the p-value: Because F = 2.53 is between 2.44 ($F_{0.10}$) and 3.18 ($F_{0.05}$), the area in the upper tail of the distribution is between 0.10 and 0.05. But this is a two-tailed test; after doubling the upper-tail area, the p-value is between 0.20 and 0.10. Because α = 0.10, we have p-value > α and therefore we cannot reject the null hypothesis.

F Test Explanation and Chi-Square Basics

F Distribution and F Test for Comparing Two Population Variances
Genesis (see the diagram): each single population's test statistic follows a chi-square distribution, with n1 − 1 and n2 − 1 degrees of freedom respectively, i.e.,
$\frac{(n_1 - 1)s_1^2}{\sigma_1^2} \sim \chi^2_{n_1 - 1}$ and $\frac{(n_2 - 1)s_2^2}{\sigma_2^2} \sim \chi^2_{n_2 - 1}$
When comparing two populations' variances, we use the test statistic $s_1^2/s_2^2$; when σ1² = σ2², it follows an F distribution with n1 − 1 numerator and n2 − 1 denominator degrees of freedom.
The F value is always positive because of the range and shape of the distribution, so the table always gives positive values.
In most cases, the hypotheses are about whether the ratio σ1²/σ2² is 1 or not. By interchanging s1 and s2 (generally the larger sample variance is placed in the numerator and the smaller one in the denominator), the observed ratio is always 1 or greater, so we only need to check whether it is significantly greater than 1. In most cases, therefore, the F test is carried out in the upper tail; for a two-tailed hypothesis (Ha: σ1² ≠ σ2²) the upper-tail critical value is taken at α/2, as in Buyer's Digest (C).

Hypothesis Tests for More Than Two Populations
For comparing proportions: chi-square test for equality of proportions
For comparing means: ANOVA
For comparing variances: Bartlett's test (follows a chi-square distribution); NOT IN SYLLABUS
The chi-square test has other applications as well; today we will discuss various chi-square tests (including the chi-square test for equality of proportions).

Various Chi-Square Tests (with the Chi-Square Distribution)
I. Chi-square test for equality of population proportions for more than two populations
$H_0: p_1 = p_2 = p_3 = \cdots = p_n$
$H_a$: Not all population proportions are the same
If H0 is rejected, we carry out a pairwise comparison after the chi-square test to determine which population proportion(s) is/are significantly different from the others (a follow-up test, again with the chi-square distribution).
DoF: n − 1, where n is the number of populations.
Example: sales of cars of various brands (Chapter 12, TB).

II. Chi-square test of independence, i.e., whether a categorical variable (e.g., sales count of a particular product type) is independent of another categorical variable (e.g., region / gender / age, etc.) or not (a non-parametric test used in lieu of regression).
H0: Variable B (categorical) is independent of Variable A (categorical)
Ha: Variable B (categorical) is not independent of Variable A (categorical)
The procedure is the same as the previous one; only the follow-up part is not required here.
DoF: (r − 1)(c − 1), where r is the number of rows (levels of variable A) and c is the number of columns (levels of variable B).
Example: the type of beer and gender (TB problem, Chapter 12).

III. Chi-square goodness-of-fit (GOF) test: whether a particular hypothesized multinomial distribution holds or not.
H0: pa : pb : pc = 3 : 5 : 2
Ha: pa : pb : pc does not follow the proportion 3 : 5 : 2
Although the procedure is more or less the same as for the other chi-square tests, the calculation is a bit different.
DoF: n − 1 in the case of n categories.
Example: sales of products of various companies (Chapter 12, TB).

Basics of Experimental Design and Analysis of Variance (Single Factor)

An Introduction to Experimental Design and Analysis of Variance
Statistical studies can be classified as being either experimental or observational.
In an experimental study, one or more factors are controlled so that data can be obtained about how the factors influence the variables of interest.
In an observational study, no attempt is made to control the factors.
Cause-and-effect relationships are easier to establish in experimental studies than in observational studies.
Analysis of variance (ANOVA) can be used to analyze the data obtained from experimental or observational studies.

Three types of experimental designs:
- A completely randomized design
- A randomized block design
- A factorial experiment

A factor is a variable that the experimenter has selected for investigation.
A treatment is a level of a factor.
Experimental units are the objects of interest in the experiment.
A completely randomized design is an experimental design in which the treatments are randomly assigned to the experimental units.

Analysis of Variance: A Conceptual Overview
Analysis of Variance (ANOVA) can be used to test for the equality of three or more population means.
Data obtained from observational or experimental studies can be used for the analysis.
We want to use the sample results to test the following hypotheses:
$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$
$H_a$: Not all population means are equal
If $H_0$ is rejected, we cannot conclude that all population means are different. Rejecting $H_0$ means that at least two population means have different values.

Assumptions for Analysis of Variance
1. For each population, the response (dependent) variable is normally distributed.
2. The variance of the response variable, denoted $\sigma^2$, is the same for all of the populations.
3. The observations must be independent.
ANOVA can be used for both experimental studies and observational studies; the method is the same in both cases.

Testing for the Equality of k Population Means: A Completely Randomized Design
AutoShine, Inc. is considering marketing a long-lasting car wax. Three different waxes (Type 1, Type 2, and Type 3) have been developed. In order to test the durability of these waxes, 5 new cars were waxed with Type 1, 5 with Type 2, and 5 with Type 3. Each car was then repeatedly run through an automatic carwash until the wax coating showed signs of deterioration.
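The only mechanical requirement of a completely randomized design is that treatments are assigned to the experimental units at random. A minimal sketch of that step for the AutoShine setup (15 cars, 3 wax types, 5 cars each) follows; the car labels, the function name `assign_treatments`, and the use of Python's `random` module are illustrative assumptions, not part of the original example.

```python
import random


def assign_treatments(units, treatments, per_treatment, seed=None):
    """Randomly assign `per_treatment` units to each treatment
    (a completely randomized design with equal sample sizes).
    Hypothetical helper for illustration only."""
    if len(units) != len(treatments) * per_treatment:
        raise ValueError("number of units must equal treatments x per_treatment")
    rng = random.Random(seed)   # seeded generator so the assignment can be reproduced
    shuffled = list(units)      # copy so the caller's list is not modified
    rng.shuffle(shuffled)       # a random ordering of units gives a random assignment
    # Consecutive blocks of the shuffled list become the treatment groups
    return {
        t: shuffled[i * per_treatment:(i + 1) * per_treatment]
        for i, t in enumerate(treatments)
    }


# Hypothetical labels for the 15 new cars
cars = [f"Car {i}" for i in range(1, 16)]
design = assign_treatments(cars, ["Type 1", "Type 2", "Type 3"], per_treatment=5, seed=1)
for wax, assigned in design.items():
    print(wax, assigned)
```

Shuffling once and slicing guarantees exactly 5 cars per wax type, whereas assigning each car independently at random (e.g., a coin or die per car) could produce unequal group sizes.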
The number of times each car went through the carwash before its wax deteriorated is shown on the next slide. AutoShine, Inc. must decide which wax to market. Are the three waxes equally effective?
Factor: Car wax
Treatments: Type 1, Type 2, Type 3
Experimental units: Cars
Response variable: Number of washes

Observation       Wax Type 1   Wax Type 2   Wax Type 3
1                     27           33           29
2                     30           28           28
3                     29           31           30
4                     28           30           32
5                     31           30           31
Sample Mean           29.0         30.4         30.0
Sample Variance        2.5          3.3          2.5

Mean Square Between Treatments (because the sample sizes are all equal, the overall mean is the average of the sample means, $\bar{\bar{x}} = (29.0 + 30.4 + 30.0)/3 = 29.8$):
$MSTR = \frac{\sum_{j=1}^{k} n_j (\bar{x}_j - \bar{\bar{x}})^2}{k - 1} = \frac{5[(29.0 - 29.8)^2 + (30.4 - 29.8)^2 + (30.0 - 29.8)^2]}{3 - 1} = \frac{5.2}{2} = 2.60$

Mean Square Error:
$MSE = \frac{\sum_{j=1}^{k} (n_j - 1) s_j^2}{n_T - k} = \frac{4(2.5) + 4(3.3) + 4(2.5)}{15 - 3} = \frac{33.2}{12} = 2.77$

Rejection Rule: Reject $H_0$ if p-value ≤ α (equivalently, if $F \ge F_{\alpha}$ with 2 numerator and 12 denominator df).

Test Statistic:
$F = \frac{MSTR}{MSE} = \frac{2.60}{2.77} = 0.939$

Conclusion: The p-value is 0.42, which exceeds any commonly used level of significance, so we cannot reject $H_0$. There is insufficient evidence to conclude that the mean numbers of washes for the three wax types are not all the same.

ANOVA Table
Source of     Sum of     Degrees of    Mean
Variation     Squares    Freedom       Square    F        p-Value
Treatments     5.2        2            2.60      0.939    0.42
Error         33.2       12            2.77
Total         38.4       14

Testing for the Equality of k Population Means: An Observational Study
Example: Reed Manufacturing
Janet Reed would like to know if there is any significant difference in the mean number of hours worked per week for the department managers at her three manufacturing plants (in Buffalo, Pittsburgh, and Detroit). An F test will be conducted using α = 0.05.
A simple random sample of five managers from each of the three plants was taken, and the number of hours worked by each manager in the previous week is shown on the next slide.
Factor: Manufacturing plant
Treatments: Buffalo, Pittsburgh, Detroit
Experimental units: Managers
Response variable: Number of hours worked

Observation       Plant 1    Plant 2      Plant 3
                  Buffalo    Pittsburgh   Detroit
1                   48         73           51
2                   54         63           63
3                   57         66           61
4                   54         64           54
5                   62         74           56
Sample Mean         55         68           57
Sample Variance     26.0       26.5         24.5

1. Develop the hypotheses.
$H_0: \mu_1 = \mu_2 = \mu_3$
$H_a$: Not all the means are equal
where $\mu_1$, $\mu_2$, $\mu_3$ are the mean numbers of hours worked per week by the department managers at Plants 1, 2, and 3.
2. Specify the level of significance: α = 0.05.
3. Compute the value of the test statistic.

ANOVA Table
Source of     Sum of     Degrees of    Mean
Variation     Squares    Freedom       Square    F        p-Value
Treatment     490         2            245       9.55     0.0033
Error         308        12             25.667
Total         798        14

p-Value Approach
4. Compute the p-value. With 2 numerator df and 12 denominator df, the p-value is 0.01 for F = 6.93. Therefore, the p-value is less than 0.01 for F = 9.55.
5. Because the p-value < α = 0.05, we reject $H_0$. We can conclude that the mean number of hours worked per week by department managers is not the same at all 3 plants.

Critical Value Approach
4. Determine the critical value and rejection rule. With 2 numerator df and 12 denominator df, $F_{0.05} = 3.89$; reject $H_0$ if $F \ge 3.89$.
5. Because $F = 9.55 \ge 3.89$, we reject $H_0$.

© 2020 Cengage. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.

Multiple Comparison Procedures
Suppose that analysis of variance has provided statistical evidence to reject the null hypothesis of equal population means. Fisher’s least significant difference (LSD) procedure can be used to determine where the differences occur.

Fisher’s LSD Procedure
Hypotheses: $H_0: \mu_i = \mu_j$, $H_a: \mu_i \neq \mu_j$
Test Statistic:
$t = \frac{\bar{x}_i - \bar{x}_j}{\sqrt{MSE\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}}$
Rejection Rule: Reject $H_0$ if p-value ≤ α, or if $t \le -t_{\alpha/2}$ or $t \ge t_{\alpha/2}$, where $t_{\alpha/2}$ is based on a t distribution with $n_T - k$ degrees of freedom.

Fisher’s LSD Procedure Based on the Test Statistic $\bar{x}_i - \bar{x}_j$
Reject $H_0$ if $|\bar{x}_i - \bar{x}_j| \ge LSD$, where
$LSD = t_{\alpha/2} \sqrt{MSE\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$

Example: Reed Manufacturing
Recall that Janet Reed wants to know if there is any significant difference in the mean number of hours worked per week for the department managers at her three manufacturing plants.
Analysis of variance has provided statistical evidence to reject the null hypothesis of equal population means, so Fisher’s LSD procedure is used to determine where the differences occur.
With α = 0.05 and $n_T - k = 12$ df, $t_{0.025} = 2.179$, so $LSD = 2.179\sqrt{25.667\left(\frac{1}{5} + \frac{1}{5}\right)} = 6.98$.
Conclusion (A): $|55 - 68| = 13 \ge 6.98$. The mean number of hours worked at Plant 1 is not equal to the mean number worked at Plant 2.
Conclusion (B): $|55 - 57| = 2 < 6.98$. There is no significant difference between the mean number of hours worked at Plant 1 and the mean number of hours worked at Plant 3.
Conclusion (C): $|68 - 57| = 11 \ge 6.98$. The mean number of hours worked at Plant 2 is not equal to the mean number worked at Plant 3.
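The full Reed Manufacturing analysis (single-factor ANOVA followed by Fisher’s LSD) can be reproduced with a short script. This is a sketch using only Python’s standard library; the function names are hypothetical. Two assumptions are worth stating loudly: the p-value uses the closed form $P(F > f) = (1 + 2f/\text{df}_2)^{-\text{df}_2/2}$, which is valid only when the numerator has 2 df (k = 3 treatments), and the value $t_{0.025} = 2.179$ for 12 df is taken from a t table rather than computed.

```python
from itertools import combinations
from statistics import mean, variance


def one_way_anova(samples):
    """Single-factor ANOVA for a completely randomized design.
    Returns (SSTR, SSE, df_treatment, df_error, F)."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    grand_mean = mean(x for s in samples for x in s)
    # Between-treatments sum of squares: sum of n_j * (xbar_j - grand mean)^2
    sstr = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    # Error (within-treatments) sum of squares: sum of (n_j - 1) * s_j^2
    sse = sum((len(s) - 1) * variance(s) for s in samples)
    df_tr, df_err = k - 1, n_total - k
    f_stat = (sstr / df_tr) / (sse / df_err)   # F = MSTR / MSE
    return sstr, sse, df_tr, df_err, f_stat


def f_pvalue_df1_2(f_stat, df_err):
    """Upper-tail area of an F distribution with 2 numerator df.
    This closed form is valid ONLY when df_treatment = 2."""
    return (1 + 2 * f_stat / df_err) ** (-df_err / 2)


def fisher_lsd(samples, mse, t_crit):
    """Compare every pair of treatment means with Fisher's LSD.
    `t_crit` is t_{alpha/2} with n_T - k df, read from a t table."""
    results = {}
    for i, j in combinations(range(len(samples)), 2):
        lsd = t_crit * (mse * (1 / len(samples[i]) + 1 / len(samples[j]))) ** 0.5
        diff = abs(mean(samples[i]) - mean(samples[j]))
        results[(i + 1, j + 1)] = (diff, lsd, diff >= lsd)
    return results


# Reed Manufacturing: hours worked by five managers at each plant
buffalo = [48, 54, 57, 54, 62]
pittsburgh = [73, 63, 66, 64, 74]
detroit = [51, 63, 61, 54, 56]
plants = [buffalo, pittsburgh, detroit]

sstr, sse, df_tr, df_err, f_stat = one_way_anova(plants)
mse = sse / df_err
p_value = f_pvalue_df1_2(f_stat, df_err)
print(f"SSTR={sstr:.1f} SSE={sse:.1f} MSE={mse:.3f} F={f_stat:.2f} p={p_value:.4f}")

# t_{0.025} with 12 df = 2.179 (t table value for alpha = 0.05)
for (i, j), (diff, lsd, significant) in fisher_lsd(plants, mse, t_crit=2.179).items():
    print(f"Plant {i} vs Plant {j}: |diff|={diff:.0f}, LSD={lsd:.2f}, significant={significant}")
```

The same `one_way_anova` call applied to the AutoShine wax data gives SSTR = 5.2, SSE = 33.2 and F ≈ 0.94. For numerator df other than 2, the closed-form p-value no longer applies; a library routine such as SciPy’s `scipy.stats.f.sf(f_stat, df_tr, df_err)` would be the usual substitute.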