Inference of the Mean

Summary

This document explains inference for the mean of a population. It covers the one-sample z-test, the one-sample and matched-pairs t-tests, the two-sample z-test and pooled two-sample t-test, and an introduction to one-way analysis of variance. The worked examples make it suitable for postgraduate statistics courses.

Full Transcript

Inference for the Mean of a Population

RECALL: Z tests for a population mean
• To test the hypothesis $H_0: \mu = \mu_0$ based on an SRS of size $n$ from a population with unknown mean $\mu$ and known standard deviation $\sigma$, compute the test statistic:
$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$

3 January 2014

Example for the Population Mean
Do middle-aged male executives have different average blood pressure than the general population? The National Center for Health Statistics reports that the mean systolic blood pressure for males 35 to 44 years of age is 128 mm Hg and the standard deviation in this population is 15 mm Hg. The medical director of a company looks at 72 company executives in this age group and finds that the mean systolic blood pressure in this sample is 126.07 mm Hg. Is this evidence that the executive blood pressure differs from the national average?

• Step 1: State your hypotheses
$H_0: \mu = \mu_0 = 128$ mm Hg
$H_a: \mu \neq 128$ mm Hg

• Step 2: Calculate your test statistic
We make the unrealistic assumption that the population standard deviation is known.
$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \frac{126.07 - 128}{15 / \sqrt{72}} = -1.09$$

• Step 3: Calculate the p-value
The p-value is the probability that a standard normal variable $Z$ takes a value at least 1.09 away from zero.
P-value $= 2\,P(Z \ge 1.09) = 2(0.5 - 0.3621) = 0.2758$
This means that 27.6% of the time an SRS of size 72 from the general male population would have a mean blood pressure at least as far from 128 mm Hg as that of the executive sample.

• Step 4: Make your conclusion
At a significance level of 0.05 we fail to reject $H_0$: the data do not provide enough evidence to conclude that the mean blood pressure of executives is different from 128 mm Hg.
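The four steps of the blood pressure example can be checked numerically. The following is a minimal sketch using only Python's standard library; the function name `one_sample_z_test` is a hypothetical helper introduced for illustration, not something from the slides.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(xbar, mu0, sigma, n):
    """Return (z, two-sided p-value) for H0: mu = mu0 when sigma is known."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    # Two-sided p-value: chance that a standard normal lands at least |z| from zero.
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided


# Numbers from the executive blood pressure example.
z, p = one_sample_z_test(xbar=126.07, mu0=128, sigma=15, n=72)
print(f"z = {z:.2f}, two-sided p-value = {p:.4f}")
```

The p-value printed here is computed from the unrounded z, so it can differ in the third decimal place from the table-based 0.2758 in the slides; the conclusion at the 0.05 level is the same.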
The t-Distribution
• Both confidence intervals and tests of significance for the mean $\mu$ of a normal population are based on the sample mean $\bar{x}$.
• The sampling distribution of $\bar{x}$ depends on $\sigma$.
• $\sigma$ is either known or is estimated using the sample standard deviation $s$.
• Setting: SRS of size $n$ from a normally distributed population with mean $\mu$ and standard deviation $\sigma$. This is based on the results of the CLT:
$$\bar{x} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right)$$

Test Statistics
• The standardized sample mean, or one-sample z statistic, when $\sigma$ is known:
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \sim N(0, 1)$$
• When we substitute the sample standard deviation $s$ for $\sigma$ we get:
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}} \sim t(n - 1)$$

One-sample t-test
• An SRS is drawn from a population having unknown mean $\mu$. Test the hypothesis $H_0: \mu = \mu_0$.
• Test statistic:
$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$
• The random variable $T$ has a $t(n-1)$ distribution, so the p-value for the test is:
  $H_a: \mu > \mu_0$: p-value $= P(T \ge t)$
  $H_a: \mu < \mu_0$: p-value $= P(T \le t)$
  $H_a: \mu \neq \mu_0$: p-value $= 2\,P(T \ge |t|)$
• These p-values are exact if the population distribution is normal and are approximately correct for large $n$ in other cases.

Rejection Region
• From the table for the t-distribution with $(n-1)$ degrees of freedom and the choice of $\alpha$, the rejection region is determined as follows.
For a one-sided test (with $\alpha = 0.05$) use the column corresponding to an upper tail area of 0.05:
  $t < -$(tabulated value) for $H_a: \mu < \mu_0$
  $t >$ tabulated value for $H_a: \mu > \mu_0$
For a two-sided test use the column corresponding to an upper tail area of 0.025:
  $t < -$(tabulated value) OR $t >$ tabulated value

Example for 1-sample t-test
• The following data are amounts of vitamin C (mg/100 g) for a random sample of corn soy blend (population is normally distributed).
26 31 23 22 11 22 14 31
• The specifications are designed to produce a mean vitamin C content of 40 mg/100 g.
Test the null hypothesis that the mean vitamin C content of the production run from which we got our sample conforms to these specifications.
• We are told that: $\bar{x} = 22.5$ and $s = 7.19$

Answer:
1. $H_0: \mu = 40$ mg/100 g, $H_a: \mu \neq 40$ mg/100 g
2. Calculate the test statistic:
$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{22.5 - 40}{7.19 / \sqrt{8}} = -6.88$$
3. This test statistic has the $t(7)$ distribution. Need to calculate the p-value: $2\,P(T \ge 6.88) = 0.00024$, obtained as 2*TDIST(6.88,7,1) = TDIST(6.88,7,2)
4. Therefore we reject the null hypothesis at an $\alpha$-level of 0.05 and conclude that the vitamin C content for this run does not meet the specifications.

Matched Pairs t Procedures
• One common comparative design is the matched-pairs study, where subjects are matched in pairs and are compared within each pair.
• One example of matched pairs is before-and-after observations on the same subjects.
• In this setting the variable that we are measuring on the subjects is a continuous measure. We looked at the design where the outcome was dichotomous.

Matched Pairs t-procedure
• With large sample size and assuming that the null hypothesis of no difference is true, the mean $\bar{d}$ of these differences is distributed as normal with mean and variance given by:
$$\mu_{\bar{d}} = 0, \qquad \sigma_{\bar{d}}^2 = \frac{\sigma_d^2}{n}$$
• Since we do not know the variance, it has to be estimated from our data by the sample variance.
• Therefore our test statistic becomes:
$$t = \frac{\bar{d} - 0}{s_d / \sqrt{n}} \sim t(n - 1)$$

Matched Pairs Example
• 20 teachers were tested for their understanding of French before and after a 4-week immersion program. We want to test whether the program improved the teachers' comprehension of spoken French.
• We are told the following:
  The average difference in scores is 2.5.
  The sample standard deviation of the differences in the scores is 2.893.

Answer for Matched-Pairs Example
1. $H_0: \mu_d = 0$, $H_a: \mu_d > 0$
2. Calculate the test statistic:
$$t = \frac{\bar{d} - 0}{s_d / \sqrt{n}} = \frac{2.5 - 0}{2.893 / \sqrt{20}} = 3.86$$
3.
This test statistic has the $t(19)$ distribution. Need to calculate the p-value: $P(T \ge 3.86)$ = TDIST(3.86,19,1) = 0.00053
4. Therefore we reject the null hypothesis at an $\alpha$-level of 0.05 and conclude that there is strong evidence that the program improved comprehension.

Comparing Two Means
• A common goal of inference is to compare the responses in two groups.
• Each group is considered to be a sample from a distinct population.
• The responses in each group are independent of those in the other group.
• The sample sizes in the two groups need not be the same.

Two-sample Problem
• We have two independent samples from 2 distinct populations and the same continuous variable is measured for both samples.

Population   Variable   Mean      Standard Deviation
1            $x_1$      $\mu_1$   $\sigma_1$
2            $x_2$      $\mu_2$   $\sigma_2$

Sample Notation
• Inference is based on 2 independent SRSs, one from each population.

Population   Sample Size   Sample Mean   Sample Standard Deviation
1            $n_1$         $\bar{x}_1$   $s_1$
2            $n_2$         $\bar{x}_2$   $s_2$

2-sample z-test (population standard deviations are known)
• We use $\bar{x}_1 - \bar{x}_2$ to estimate $\mu_1 - \mu_2$.
• The sampling distribution of $\bar{x}_1 - \bar{x}_2$ has:
  mean: $\mu_1 - \mu_2$
  variance: $\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}$
• If the two population distributions are both normal, then the distribution of $\bar{x}_1 - \bar{x}_2$ is also normal.

Two-sample t-test
• Hypotheses for comparing two population means:
  One-tailed test: $H_0: (\mu_1 - \mu_2) = D_0$, $H_a: (\mu_1 - \mu_2) < D_0$ OR $H_0: (\mu_1 - \mu_2) = D_0$, $H_a: (\mu_1 - \mu_2) > D_0$
  Two-tailed test: $H_0: (\mu_1 - \mu_2) = D_0$, $H_a: (\mu_1 - \mu_2) \neq D_0$

2-sample z statistic
• Suppose that $\bar{x}_1$ is the mean of an SRS of size $n_1$ drawn from an $N(\mu_1, \sigma_1^2)$ population and that $\bar{x}_2$ is the mean of an SRS of size $n_2$ drawn from an $N(\mu_2, \sigma_2^2)$ population.
Then the 2-sample z statistic is:
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
• It has the $N(0, 1)$ sampling distribution.

t procedures
• If the population standard deviations are not known we estimate them by the sample standard deviations from our two samples.
• To simplify the test we will assume that the two normal population distributions have the same standard deviation, so we use a pooled standard error in the test statistic:
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
and
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
with the $t(n_1 + n_2 - 2)$ distribution.

Rejection Region
• From the table for the t-distribution with $(n_1 + n_2 - 2)$ degrees of freedom and the choice of $\alpha$, the rejection region is determined as follows.
For a one-sided test (with $\alpha = 0.05$) use the column corresponding to an upper tail area of 0.05; $H_0$ is rejected if:
  $t < -$(tabulated value) for $H_a: \mu_1 < \mu_2$
  $t >$ tabulated value for $H_a: \mu_1 > \mu_2$
For a two-sided test use the column corresponding to an upper tail area of 0.025; $H_0$ is rejected if:
  $t < -$(tabulated value) OR $t >$ tabulated value

2-sample t-test Example
• Independent random samples selected from two normal populations produced the sample means and standard deviations shown in the table:

                             Sample 1   Sample 2
Sample size                  17         12
Mean                         5.4        7.9
Sample standard deviation    3.4        4.8

• Test the null hypothesis that the population means are equal vs. the alternative that they are not equal.

Answer:
1. $H_0: \mu_1 - \mu_2 = 0$, $H_a: \mu_1 - \mu_2 \neq 0$
2. Calculate the test statistic:
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(17 - 1)(3.4^2) + (12 - 1)(4.8^2)}{17 + 12 - 2} = 16.24$$
and
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - D_0}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{(5.4 - 7.9) - 0}{\sqrt{16.24\left(\dfrac{1}{17} + \dfrac{1}{12}\right)}} = -1.645$$
This test statistic follows the t-distribution with 27 degrees of freedom.
3. P-value $= 2\,P(T > 1.645)$ = 2*TDIST(1.645,27,1) = TDIST(1.645,27,2) = 0.112; the critical value for the rejection region is 2.052.
4.
Therefore we fail to reject the null hypothesis at an $\alpha$-level of 0.05: the data do not provide sufficient evidence that the two population means differ.

Two-Sample t-test Example
• Independent random samples from approximately normal populations produced the results shown in the table. Do the data provide sufficient evidence to conclude that $(\mu_2 - \mu_1) > 10$?

Sample 1: 52 33 42 44 41 50 44 51 45 38 37 40 44 50 43
Sample 2: 52 43 47 56 62 53 61 50 56 52 53 60 50 48 60 55

Answer
1. $H_0: \mu_2 - \mu_1 = 10$ and $H_a: \mu_2 - \mu_1 > 10$
2. Calculate the test statistic:
$\bar{x}_1 = 43.6$, $\bar{x}_2 = 53.63$, $s_1 = 5.47$, $s_2 = 5.41$
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(15 - 1)(5.47^2) + (16 - 1)(5.41^2)}{15 + 16 - 2} = 29.58$$
and
$$t = \frac{(\bar{x}_2 - \bar{x}_1) - D_0}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{(53.63 - 43.6) - 10}{\sqrt{29.58\left(\dfrac{1}{15} + \dfrac{1}{16}\right)}} = 0.015$$
This test statistic follows the t-distribution with 29 degrees of freedom.
3. P-value $= P(T > 0.015)$ = TDIST(0.015,29,1) = 0.49; the critical value for the rejection region is 1.699.
4. Therefore we fail to reject the null hypothesis at an $\alpha$-level of 0.05: the data do not provide sufficient evidence that $\mu_2 - \mu_1$ exceeds 10.

Analysis of Variance
• To assess the equality of several population means we compare the variation among the means of several groups with the variation within groups.
• This method is called Analysis of Variance.

Examples of Data for ANOVA
• A medical researcher wants to compare the effectiveness of 3 different treatments to lower the cholesterol of patients with high blood cholesterol levels. He assigns 60 individuals at random to the 3 treatments and records the reduction in cholesterol for each patient.
• An ecologist is interested in comparing the concentration of the pollutant cadmium in 5 streams. She collects 50 water specimens from each stream and measures the concentration of cadmium in each specimen.
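Looking back at the two pooled two-sample t-test examples above, the arithmetic can be reproduced from the summary statistics alone. This is a minimal sketch in standard-library Python; `pooled_t` is a hypothetical helper name, and it assumes, as the slides do, that both populations are normal with a common standard deviation.

```python
from math import sqrt


def pooled_t(n_a, mean_a, sd_a, n_b, mean_b, sd_b, d0=0.0):
    """Pooled two-sample t statistic for H0: mu_a - mu_b = d0.

    Returns (pooled variance, t statistic, degrees of freedom).
    """
    df = n_a + n_b - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    sp2 = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / df
    t = ((mean_a - mean_b) - d0) / sqrt(sp2 * (1 / n_a + 1 / n_b))
    return sp2, t, df


# First example: H0: mu1 - mu2 = 0 against a two-sided alternative.
sp2_1, t_1, df_1 = pooled_t(17, 5.4, 3.4, 12, 7.9, 4.8)

# Second example: H0: mu2 - mu1 = 10, so sample 2 is passed first.
sp2_2, t_2, df_2 = pooled_t(16, 53.63, 5.41, 15, 43.6, 5.47, d0=10)

# Two-sided test at alpha = 0.05 with 27 df: compare |t| with the
# tabulated value 2.052 quoted in the first example.
reject_1 = abs(t_1) > 2.052

print(f"Example 1: sp^2 = {sp2_1:.2f}, t = {t_1:.3f}, df = {df_1}, reject H0: {reject_1}")
print(f"Example 2: sp^2 = {sp2_2:.2f}, t = {t_2:.3f}, df = {df_2}")
```

The sketch stops at the test statistic because the standard library has no t-distribution; the p-values would still come from a t table or a statistics package.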
Comparing Means
• For the cholesterol example the mean reduction in each treatment group was: 0.2, 1.5, 0.8
• Is the observed difference the result of chance variation?
• We would not expect the sample means to be equal even if the population means are identical.
• To answer this we need to know the variation within the groups under observation and the sizes of the samples.

Hypotheses
• The null and alternative hypotheses for the one-way ANOVA are:
$H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$
$H_a$: at least one $\mu_i$ is different from some other $\mu_j$

ANOVA Table

Source of Variation   SS    df      MS                 F-stat        P-value
Between Groups        SSB   k - 1   MSB = SSB/(k - 1)  F = MSB/MSW   p
Within Samples        SSW   n - k   MSW = SSW/(n - k)
TOTAL                 SST   n - 1

Where: k = number of groups, n = total number of subjects (all groups combined)
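To make the entries of the ANOVA table concrete, here is a minimal sketch that computes SSB, SSW, SST, the mean squares and the F statistic in standard-library Python. The function name `one_way_anova` and the three small groups of numbers are made up purely for illustration; they are not data from the slides.

```python
from statistics import mean


def one_way_anova(groups):
    """Return the entries of a one-way ANOVA table for a list of samples."""
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total number of subjects
    grand_mean = sum(sum(g) for g in groups) / n

    # Between groups: how far each group mean sits from the grand mean.
    ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within samples: spread of observations around their own group mean.
    ssw = sum((x - mean(g)) ** 2 for g in groups for x in g)
    # Total: spread of all observations around the grand mean.
    sst = sum((x - grand_mean) ** 2 for g in groups for x in g)

    msb = ssb / (k - 1)
    msw = ssw / (n - k)
    return {
        "SSB": ssb, "SSW": ssw, "SST": sst,
        "df_between": k - 1, "df_within": n - k, "df_total": n - 1,
        "MSB": msb, "MSW": msw, "F": msb / msw,
    }


# Hypothetical illustration data: three groups of three observations each.
table = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
for name, value in table.items():
    print(f"{name}: {value:.3f}")
```

The p-value in the last column of the table is the upper-tail area beyond F under the F distribution with (k - 1, n - k) degrees of freedom, which needs an F table or a statistics package and is therefore left out of this sketch.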
