Lecture 5: Sampling Distribution Theory
HKU Business School
Ping Yu
Summary
These lecture notes cover sampling distribution theory: sampling from a population; the sampling distributions of sample means, sample proportions, and sample variances; and properties of point estimators.
Full Transcript
Lecture 5. Sampling Distribution Theory (Chapter 6)
Ping Yu, HKU Business School, The University of Hong Kong

Plan of This Lecture
- Sampling from a Population
- Sampling Distributions of Sample Means
- Sampling Distributions of Sample Proportions
- Sampling Distributions of Sample Variances
- Properties of Point Estimators (Section 7.1)

[Review] Descriptive and Inferential Statistics
Descriptive statistics: collecting, presenting, and describing data.
Inferential statistics: drawing conclusions and/or making decisions concerning a population based only on sample data. [figure here]
- Estimation, e.g., estimate the population mean weight using the sample mean weight.
- Hypothesis Testing, e.g., use sample evidence to test the claim that the population mean weight is 120 pounds.

Sampling from a Population

Population and Simple Random Sample
Statistical analysis requires that we obtain a proper sample from a population of items of interest that have measured characteristics.
Recall that a population means all (say, $N$) items of interest.
- If $N$ is large enough, $N$ can be treated as $\infty$.
- A population is generated by a process that can be modeled as a series of random experiments (see Lecture 2).
A (simple) random sample is a sample of $n$ objects drawn randomly.
- Recall the definition of random sampling in Lecture 1.
Random sampling with replacement means drawing a member from the population by chance (i.e., with probability $1/N$), putting it back into the population, and then independently drawing the next one.
- This is the random sampling in Lecture 1.
Random sampling without replacement means randomly drawing each group of $n$ distinct items with probability $1/C_n^N$, which seems easier in practice.
- The first item is sampled with probability $1/N$; conditional on the first item's being chosen, the second item is sampled with probability $1/(N-1)$, etc.

Why Sample?
- Less time consuming than a census.
- Less costly to administer than a census.
- It is possible to obtain statistical results of sufficiently high precision based on samples.
Samples can be obtained from a table of random numbers or computer random number generators.

Sampling Distributions
The randomness of a random sample comes from the random drawing, i.e., not all items are drawn ($n < N$), so the identities of the random sample are not determined beforehand.
Let $X$ be the population r.v. taking each value in $\{x_i\}_{i=1}^N$ with probability $1/N$, and let $\{x_i\}_{i=1}^n$ be a random sample.¹
The population mean is $\mu = E[X] = \frac{1}{N}\sum_{i=1}^N x_i$, and the sample mean $\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i$ is a natural estimator of $\mu$.
The population variance is $\sigma^2 = E\left[(X-\mu)^2\right] = \frac{1}{N}\sum_{i=1}^N (x_i-\mu)^2$, and the sample variance $s^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i-\bar{x})^2$ is a natural estimator of $\sigma^2$. [The reason for $n-1$ instead of $n$ will be explained below]
- The population counterpart of $s^2$ should be $S^2 = \frac{1}{N-1}\sum_{i=1}^N (x_i-\mu)^2$.
- The sample standard deviation is $s = \sqrt{s^2}$.
The sampling distribution of a statistic such as the sample mean or sample variance is the probability distribution obtained from all possible samples of the same number of observations drawn from the population.
¹ The textbook uses $\{X_i\}_{i=1}^n$ to emphasize the randomness of $x_i$, but the notations are not consistent. In my lectures, you can tell from the context whether $\{x_i\}_{i=1}^n$ are random or just realizations.
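To make the two sampling schemes concrete, here is a minimal Python sketch (not part of the original slides; the population values and the seed are made up for illustration) that draws a sample with and without replacement and computes the natural estimators $\bar{x}$ and $s^2$:

```python
# A minimal sketch, assuming numpy is available; values are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
population = np.array([18.0, 20.0, 22.0, 24.0])  # N = 4 items of interest
n = 2

with_rep = rng.choice(population, size=n, replace=True)      # each draw has prob 1/N
without_rep = rng.choice(population, size=n, replace=False)  # n distinct items

for sample in (with_rep, without_rep):
    x_bar = sample.mean()        # (1/n) * sum(x_i), estimator of mu
    s2 = sample.var(ddof=1)      # (1/(n-1)) * sum((x_i - x_bar)^2), estimator of sigma^2
    print(sample, x_bar, s2)
```

Note `ddof=1` gives the $n-1$ divisor of $s^2$; the default `ddof=0` would give the $1/n$ divisor used for the population variance.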
Development of a Sampling Distribution
Assume there is a population:
- Population size $N = 4$.
- The random variable, $X$, is the age of individuals.
- Values of $X$: 18, 20, 22, 24 (years).

In this example the population distribution is uniform: [figure here]

Now consider all possible samples of size $n = 2$: [figure here]

Sampling Distribution of All Sample Means: [figure here]
- Notation: the $P(\bar{X})$ in the figure should be $p(\bar{x})$ in our notations.
When $N$ is large, it is impossible to list all possible outcomes, so the abstract analysis in the next section is helpful.

Comparing the Population with Its Sampling Distribution
Population ($N = 4$): $\mu = 21$, $\sigma = 2.236$. Sample means distribution ($n = 2$): $\mu_{\bar{x}} = 21$, $\sigma_{\bar{x}} = 1.58$.
$$\mu = \frac{\sum_{i=1}^N x_i}{N} = \frac{18+20+22+24}{4} = 21, \qquad \sigma = \sqrt{\frac{\sum_{i=1}^N (x_i-\mu)^2}{N}} = 2.236,$$
$$\mu_{\bar{x}} = E[\bar{x}] = \frac{\sum \bar{x}_i}{16} = \frac{18+19+\cdots+24}{16} = 21 = \mu,$$
$$\sigma_{\bar{x}} = \sqrt{\frac{\sum (\bar{x}_i-\mu_{\bar{x}})^2}{16}} = \sqrt{\frac{(18-21)^2+(19-21)^2+\cdots+(24-21)^2}{16}} = 1.58 = \frac{2.236}{\sqrt{2}} = \frac{\sigma}{\sqrt{n}}.$$

Sampling Distributions of Sample Means

Mean of the Sample Means
For random sampling with replacement,
$$E[\bar{x}] = E\left[\frac{1}{n}(x_1 + \cdots + x_n)\right] = \frac{n\mu}{n} = \mu.$$
- In random sampling with replacement, $x_i$ (for each $i$) and $X$ have the same distribution because $x_i$ takes each value in $\{x_j\}_{j=1}^N$ with probability $1/N$, which is exactly the distribution of $X$. [check the $N = 4$ and $n = 2$ example above]
- This means that if we draw $n$ samples repeatedly, and for each draw we calculate $\bar{x}$, then the average of these $\bar{x}$'s is the population mean.
- A particular $\bar{x}$ value can be considerably far from $\mu$.
- Here, $x_i$ is treated as a r.v. rather than a realization.
(*) For random sampling without replacement,
$$E[\bar{x}] = \frac{1}{C_n^N}\sum_{j=1}^{C_n^N}\frac{1}{n}\sum_{i=1}^{n} x_i^j = \frac{C_{n-1}^{N-1}}{n\,C_n^N}\sum_{i=1}^{N} x_i = \frac{1}{N}\sum_{i=1}^{N} x_i = \mu,$$
where $x_i^j$ is the $i$th draw in the $j$th sampling, and the second equality is from $n\,C_n^N = N\,C_{n-1}^{N-1}$ (intuition? for a rigorous analysis, see the next slide).

Variance of the Sample Means
For random sampling with replacement,
$$\mathrm{Var}(\bar{x}) = \mathrm{Var}\left(\frac{1}{n}(x_1+\cdots+x_n)\right) = \frac{1}{n^2}\sum_{i=1}^n \sigma_i^2 = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}.$$
- $\mathrm{Var}(\bar{x})$ decreases with $n$, i.e., larger sample sizes result in more concentrated sampling distributions.
- Denote $\mathrm{Var}(\bar{x})$ as $\sigma_{\bar{x}}^2$; then the standard deviation of $\bar{x}$ is $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.
(*) For random sampling without replacement,
$$\mathrm{Var}(\bar{x}) = \frac{\sigma^2}{n}\cdot\frac{N-n}{N-1} = \frac{S^2}{n}\cdot\frac{N-n}{N} = \left(\frac{1}{n}-\frac{1}{N}\right)S^2.$$
- Why? The variances of a hypergeometric distribution and a binomial distribution are $np(1-p)\frac{N-n}{N-1}$ and $np(1-p)$, respectively. The difference term $\frac{N-n}{N-1}$ appears for the same reason as here. [for more details, see the next slide]
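The $N = 4$, $n = 2$ age example can be checked exhaustively. Here is a minimal Python sketch (not part of the original slides; it assumes numpy and the standard library only) that enumerates every possible sample and verifies both the with-replacement numbers on the comparison slide and the without-replacement variance formula:

```python
# A minimal sketch: exhaustive enumeration of the age example, illustrative only.
import itertools
import numpy as np

pop = np.array([18.0, 20.0, 22.0, 24.0])
N, n = len(pop), 2
sigma2 = pop.var()        # population variance (divisor N): 5.0, so sigma = 2.236
S2 = pop.var(ddof=1)      # population counterpart of s^2 (divisor N-1): 20/3

# With replacement: all N^n = 16 equally likely ordered samples.
means_wr = [np.mean(s) for s in itertools.product(pop, repeat=n)]
print(np.mean(means_wr), np.std(means_wr))      # 21.0 and 1.5811 = sigma/sqrt(n)

# Without replacement: all C(N, n) = 6 equally likely unordered samples.
means_wor = [np.mean(s) for s in itertools.combinations(pop, n)]
print(np.var(means_wor), (1/n - 1/N) * S2)      # both 1.6667 = (1/n - 1/N) S^2
```

The two printed pairs match $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ for sampling with replacement and $\mathrm{Var}(\bar{x}) = (1/n - 1/N)S^2$ for sampling without replacement.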
(**) Rigorous Analysis for Random Sampling Without Replacement
Note that $\bar{x} = \frac{1}{n}\sum_{i=1}^N R_i x_i$, where the $R_i$'s are exchangeable Bernoulli r.v.'s and $\sum_{i=1}^N R_i = n$, so
$$P\left((R_1,\ldots,R_N) = (r_1,\ldots,r_N)\right) = \frac{1}{C_n^N} = \frac{n!\,(N-n)!}{N!}$$
for any $(r_1,\ldots,r_N)$ such that $\sum_{i=1}^N r_i = n$.
Conditional on $n$, $E[R_i] = \frac{n}{N}$ because $\sum_{i=1}^N E[R_i] = N\,E[R_i] = n$, and $\mathrm{Var}(R_i) = \frac{n}{N}\left(1-\frac{n}{N}\right)$.
However, the $R_i$'s are not independent. To study their covariances, note that for $1 \le j < k \le N$,
$$0 = \mathrm{Var}\left(\sum_{i=1}^N R_i\right) = N\,\mathrm{Var}(R_i) + N(N-1)\,\mathrm{Cov}(R_j, R_k),$$
so
$$\mathrm{Cov}(R_j, R_k) = -\frac{\mathrm{Var}(R_i)}{N-1} = -\frac{n}{N(N-1)}\left(1-\frac{n}{N}\right) < 0,$$
and
$$E[R_j R_k] = \mathrm{Cov}(R_j, R_k) + E[R_j]\,E[R_k] = \frac{n(n-1)}{N(N-1)}, \qquad (1)$$
where $E[R_j R_k]$ will be used in the future.

(**) continued
First,
$$E[\bar{x}] = \frac{1}{n}\sum_{i=1}^N E[R_i]\,x_i = \frac{1}{N}\sum_{i=1}^N x_i = \mu.$$
After some calculation, we can show
$$N S^2 = \sum_{i=1}^N x_i^2 - \frac{2}{N-1}\sum_{i=1}^N \sum_{j=i+1}^N x_i x_j. \qquad (2)$$
Therefore,
$$\mathrm{Var}(\bar{x}) = \frac{1}{n^2}\left[\sum_{i=1}^N \mathrm{Var}(R_i)\,x_i^2 + 2\sum_{i=1}^N\sum_{j=i+1}^N \mathrm{Cov}(R_i,R_j)\,x_i x_j\right] = \frac{\mathrm{Var}(R_1)}{n^2}\left[\sum_{i=1}^N x_i^2 - \frac{2}{N-1}\sum_{i=1}^N\sum_{j=i+1}^N x_i x_j\right] = \frac{\mathrm{Var}(R_1)}{n^2}\,N S^2 = \frac{S^2}{n}\left(1-\frac{n}{N}\right).$$

Finite Population Correction Factor
$\frac{N-n}{N-1}$ is often called a finite population correction factor.
- When $N$ is large, the differences between the two random sampling schemes can be neglected: $\frac{N-n}{N-1} \to 1$ as $N \to \infty$ and $\frac{n}{N} \to 0$.
- In business applications such as auditing, $N$ is indeed not large.
- $n$, rather than the fraction of the sample in the population, $\frac{n}{N}$, is the dominant factor in $\mathrm{Var}(\bar{x})$.
Without special mention, we always mean random sampling with replacement, or without replacement but with $N$ large enough and $n$ a small proportion of $N$.

Sampling Distribution of the Sample Means
If the population follows the normal distribution, then $\bar{x}$ follows the normal distribution $N\left(\mu, \frac{\sigma^2}{n}\right)$ since it is a linear combination of the $x_i$'s, which follow the same normal distribution as the population.
- Implicitly, $N = \infty$ because the normal distribution is continuous.
- Recall that a normal distribution is determined only by its mean and variance.
- Both $X$ and $\bar{x}$ follow the normal distribution with the same mean, but $\bar{x}$ has a smaller variance (more so as $n$ gets larger). [figure here]
The standardized normal random variable is
$$z = \frac{\bar{x} - E[\bar{x}]}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \sim N(0,1). \qquad (3)$$
Terminology: the standard error (SE) of a statistic (usually an estimate of a parameter) is the standard deviation of its sampling distribution, or an estimate of that standard deviation.
- In our case, $\sigma/\sqrt{n}$ and $s/\sqrt{n}$ are both called the SE of $\bar{x}$.
- Usually, only the latter is called the SE of $\bar{x}$ because it is feasible, and the former already has a name – standard deviation.
[figure here]
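The exact normality of $\bar{x}$ for a normal population, and the standardization in equation (3), can be seen by simulation. A minimal Python sketch (not part of the original slides; $\mu$, $\sigma$, $n$, and the seed are made-up illustrative values):

```python
# A minimal sketch, assuming numpy: x-bar from a normal population is
# N(mu, sigma^2/n), so its standardized version is N(0, 1).
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n = 100.0, 15.0, 25
reps = 100_000

x_bars = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
print(x_bars.mean())                     # ~ mu = 100
print(x_bars.std())                      # ~ sigma / sqrt(n) = 3.0 (the SE of x-bar)
z = (x_bars - mu) / (sigma / np.sqrt(n))
print(z.mean(), z.std())                 # ~ 0 and ~ 1, consistent with z ~ N(0, 1)
```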
Example 6.3: Spark Plug Life
A spark plug manufacturer claims that the lives of its plugs follow $N(60{,}000, 4000^2)$. We observe that the sample mean of a random sample of size 16 is 58,500 miles. Do you think the manufacturer's claim is credible?
Solution: Since
$$P(\bar{x} \le 58{,}500) = P\left(\frac{\bar{x}-\mu}{\sigma/\sqrt{n}} \le \frac{58{,}500 - 60{,}000}{4000/\sqrt{16}}\right) = P(z \le -1.50) = .0668,$$
which is quite small, the manufacturer's claim is questionable.
Figure: (a) $P(\bar{x} \le 58{,}500)$; (b) $P(z \le -1.50)$

The Law of Large Numbers (LLN)
Without normality, what is the distribution of $\bar{x}$? When $n$ is fixed, there is no tractable description in general, but when $n$ is large, we can say something.
First, the distribution of $\bar{x}$ will degenerate at $\mu$.
LLN: If $x_i$, $i = 1,\ldots,n$, are independent and identically distributed (i.i.d.) with mean $\mu$ (as in a random sample with replacement), then $\bar{x}$ approaches $\mu$ as $n \to \infty$.
- Not $N \to \infty$.
- Only requires $E[x_i] = \mu < \infty$, regardless of what the distribution of $x_i$ is.
- This is different from $E[\bar{x}] = \mu$ (which fixes $n$ and repeatedly samples $\{x_i\}_{i=1}^n$): it claims that for any random sample $\{x_i\}_{i=1}^\infty$, if we calculate $\bar{x}$ for the first $n$ samples to obtain a sequence of numbers, say $\bar{x}_n$, then $\bar{x}_n \to \mu$ as $n \to \infty$.
- Intuitively, $\mu = \frac{1}{N}\sum_{i=1}^N x_i$ involves all values of $\{x_i\}_{i=1}^N$; in $E[\bar{x}] = \mu$, although $n$ is fixed, we repeatedly sample so that all values of $\{x_i\}_{i=1}^n$ would be sampled; in $\bar{x}_n \to \mu$, by letting $n \to \infty$, we potentially sample all values $\{x_i\}_{i=1}^N$.
(*) Rigorously, "approaches $\mu$" means "is consistent for $\mu$", where consistency is defined in the Appendix of Chapter 7, Page 330.
Jacob Bernoulli proved the first LLN with $\{x_i\}_{i=1}^n$ being Bernoulli trials (i.e., $x_i$ i.i.d. Bernoulli($p$)); the current form of the LLN is attributed to Khinchin [figure here], so it is called the Khinchin LLN.

History of the LLN
Aleksandr Khinchin (1894-1959), Moscow State University [figure here]

The Central Limit Theorem (CLT)
CLT: If $x_i$, $i = 1,\ldots,n$, are i.i.d. with mean $\mu$ and variance $\sigma^2$, then the distribution of $z = \frac{\bar{x}-\mu}{\sigma/\sqrt{n}}$ approaches that of $N(0,1)$ as $n \to \infty$.
- The result of the CLT is stronger than that of the LLN since it not only claims that $\bar{x}$ approaches $\mu$, but also claims that the variance of $\bar{x}$ approaches $\sigma^2/n$ (which is expected), and that the standardized $\bar{x}$ is eventually bell-shaped as $n \to \infty$ (which is surprising). That is, the distribution of $\bar{x}$ not only degenerates at $\mu$, but degenerates to $\mu$ at the rate $\sqrt{n}$ and in the bell shape.
- Requires $\mathrm{Var}(x_i) = \sigma^2 < \infty$ besides $E[x_i] = \mu < \infty$, i.e., a stronger result needs stronger assumptions.
- It does not require $x_i$ to be normally distributed. [figure here]
- Intuitively, when $n$ is large enough, the claim for the normally distributed $x_i$ in (3) is roughly correct, i.e., $\bar{x}$ is approximately $N(\mu, \sigma^2/n)$.
- How large an $n$ is required for a satisfactory approximation? If $x_i$ is symmetrically distributed, then $n = 20$ to 25 is enough; otherwise, $n$ needs to be much larger, e.g., $> 50$. [see the simulation sketch below]
The De Moivre-Laplace theorem is a special case of the CLT with $\{x_i\}_{i=1}^n$ being Bernoulli trials; the current form of the CLT is attributed to Lindeberg and Lévy [figure here], so it is called the Lindeberg-Lévy CLT.
- As mentioned for the De Moivre-Laplace theorem, we can use a continuous r.v. to approximate a discrete r.v. when $n$ is large.

Figure: As $n$ increases, the sampling distribution of $\bar{x}$ becomes almost normal regardless of the shape of the population; the sampling distribution of $\sqrt{n}(\bar{x}-\mu)/\sigma$ compared with $N(0,1)$ (at $n = 1$, $x_i$ is discrete).
Intuition: as $n \to \infty$, $\bar{x} - \mu \to 0$, but $\sqrt{n}(\bar{x}-\mu)$ will not diverge or degenerate!
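Here is a minimal Python sketch of the CLT in action (not part of the original slides; the exponential population, sample sizes, and seed are made-up illustrative choices). A skewed Exponential(1) population has $\mu = \sigma = 1$, and the standardized sample mean gets close to $N(0,1)$ as $n$ grows:

```python
# A minimal sketch, assuming numpy and scipy are available.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
mu, sigma = 1.0, 1.0          # mean and sd of the Exponential(1) population
reps = 50_000

for n in (5, 50):
    x_bars = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
    z = (x_bars - mu) / (sigma / np.sqrt(n))
    # Compare the simulated P(z <= 1) with the N(0,1) value Phi(1) = 0.8413.
    print(n, np.mean(z <= 1.0), stats.norm.cdf(1.0))
```

For $n = 5$ the skewness still shows; for $n = 50$ the simulated probability is close to $\Phi(1)$, consistent with the guidance above that asymmetric populations need larger $n$.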
History of the CLT
Jarl W. Lindeberg (1876-1932), University of Helsinki; Paul P. Lévy (1886-1971), École Polytechnique [figure here]

[Example] Applying the CLT
Suppose a large population has mean $\mu = 8$ and $\sigma = 3$, and a random sample of size $n = 36$ is selected. What is the probability that the sample mean is between 7.8 and 8.2?
Solution: Even if the population is not normally distributed, the central limit theorem can be used ($n > 25$), so the sampling distribution of $\bar{x}$ is approximately $N\left(\mu, \frac{\sigma^2}{n}\right) = N\left(8, \frac{3^2}{36}\right)$. As a result,
$$P(7.8 < \bar{x} < 8.2) = P\left(\frac{7.8-\mu}{\sigma/\sqrt{n}} < \frac{\bar{x}-\mu}{\sigma/\sqrt{n}} < \frac{8.2-\mu}{\sigma/\sqrt{n}}\right) = P(-0.4 < z < 0.4) = 0.3108.$$

(**) CLT for Random Sampling Without Replacement
CLT for finite populations: Consider a finite population $\{x_{N1},\ldots,x_{NN}\}$ with $N$ units. Define
$$\mu_N = \frac{1}{N}\sum_{i=1}^N x_{Ni}, \qquad S_N^2 = \frac{1}{N-1}\sum_{i=1}^N (x_{Ni}-\mu_N)^2, \qquad m_N = \max_{1\le i\le N}\,(x_{Ni}-\mu_N)^2.$$
Assume that
$$\frac{1}{\min(n, N-n)}\cdot\frac{m_N}{S_N^2} \to 0 \qquad (4)$$
as $N \to \infty$. Then with $\bar{x} = \frac{1}{n}\sum_{i=1}^N R_i x_{Ni}$, we have
$$\frac{\bar{x}-\mu_N}{\sqrt{\left(\frac{1}{n}-\frac{1}{N}\right)S_N^2}} \overset{d}{\to} N(0,1).$$

(**) Discussion
Condition (4) implies $n \to \infty$ and $N-n \to \infty$ because $\frac{m_N}{S_N^2} \ge \frac{N-1}{N} = 1 - \frac{1}{N}$.
Suppose $\lim_{N\to\infty}\frac{n}{N} =: \alpha \in [0,1]$, and scale $x_{Ni}$ as $x_{Ni}/S_N$ so that $S_N^2 = 1$. Then when $\alpha = 0$, (4) is equivalent to $m_N/n \to 0$; when $\alpha \in (0,1)$, (4) is equivalent to $m_N/N \to 0$; when $\alpha = 1$, (4) is equivalent to $m_N/(N-n) \to 0$.
When $\sup_i |x_{Ni}| \le c < \infty$, (4) is equivalent to $n \to \infty$ and $N-n \to \infty$.
When $\alpha \in (0,1)$ and $\lim_{N\to\infty} S_N^2 = S_\infty^2$, then
$$\sqrt{n}\,(\bar{x}-\mu_N) \overset{d}{\to} N\left(0, (1-\alpha)S_\infty^2\right).$$
We can estimate $S_N^2$ by
$$\hat{S}_N^2 = \frac{1}{n-1}\sum_{i=1}^N R_i (x_{Ni}-\bar{x})^2,$$
which is shown to be unbiased below. Actually, $\hat{S}_N^2/S_N^2 \overset{p}{\to} 1$ as $N \to \infty$ under condition (4). So
$$\frac{\bar{x}-\mu_N}{\sqrt{\left(\frac{1}{n}-\frac{1}{N}\right)\hat{S}_N^2}} \overset{d}{\to} N(0,1).$$

Sampling Distributions of Sample Proportions

Sampling Distribution of the Sample Proportion
Everything is the same as in the last section except that $x_i$ can only take 0 or 1 and follows the Bernoulli($p$) distribution.
Now, $X := \sum_{i=1}^n x_i \sim$ Binomial($n,p$), and the sample proportion is
$$\hat{p} = \frac{X}{n}.$$
$E[\hat{p}] = p$, and $\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$, where recall that $\mathrm{Var}(x_i) = p(1-p)$ is a function of only $p$ – its mean.
As $n \to \infty$, $\hat{p}$ approaches $p$ by the LLN, and
$$z = \frac{\hat{p}-p}{\sigma_{\hat{p}}}$$
approaches $N(0,1)$ by the CLT. [figure here]
- Recall that the approximation by normality is good if $np(1-p) > 5$.²
- Note that $X - np = \sigma_{\hat{p}}\,nz = \sqrt{np(1-p)}\,z \sim N(0, np(1-p))$, where $np(1-p) \to \infty$, so the difference between the observed number of successes and the expected number of successes might increase with $n$.
² Since $p(1-p) \le \frac{1}{4}$, this requires $n > 20$.

Figure: Density for $\hat{p}$ with $p = 0.80$. $\sigma_{\hat{p}}$ decreases with $n$, and $\hat{p}$ is approximately normally distributed.
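The behavior in the $p = 0.80$ figure can be reproduced in a few lines. A minimal Python sketch (not part of the original slides; the sample sizes and seed are illustrative) checks $E[\hat{p}] = p$, the formula for $\sigma_{\hat{p}}$, and the $np(1-p) > 5$ rule of thumb:

```python
# A minimal sketch, assuming numpy is available.
import numpy as np

rng = np.random.default_rng(3)
p, reps = 0.80, 100_000

for n in (25, 100):
    p_hats = rng.binomial(n, p, size=reps) / n      # X/n with X ~ Binomial(n, p)
    print(n,
          n * p * (1 - p) > 5,                      # is the normal approx. safe?
          p_hats.mean(),                            # ~ p = 0.80
          p_hats.std(), np.sqrt(p * (1 - p) / n))   # simulated vs theoretical SE
```

For $n = 25$, $np(1-p) = 4 < 5$, so the rule of thumb flags the normal approximation as shaky; for $n = 100$ it is comfortably satisfied.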
Example 6.8: Business Course Selection
Suppose 43% of business graduates believe that a course in business ethics is very important. What is the probability that more than half of a random sample of 80 business graduates hold this belief?
Solution: Given that
$$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.43(1-0.43)}{80}} = 0.055,$$
we have
$$P(\hat{p} > 0.5) = P\left(\frac{\hat{p}-p}{\sigma_{\hat{p}}} > \frac{0.5-0.43}{0.055}\right) = P(z > 1.27) = 1 - \Phi(1.27) = 0.102,$$
where $z \sim N(0,1)$ by the CLT.

Sampling Distributions of Sample Variances

Sampling Distribution of the Sample Variance
Variance is important nowadays because consumers care about whether the particular item they bought works. Also, a smaller population variance reduces the variance of the sample mean: recall that $\sigma_{\bar{x}}^2 = \sigma^2/n$, where we assume random sampling with replacement or $N$ large.
Recall that $s^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i-\bar{x})^2$ is a natural estimator of $\sigma^2$.
$E[s^2] = \sigma^2$, and if the population r.v. $X$ is normally distributed, then
$$\mathrm{Var}(s^2) = \frac{2\sigma^4}{n-1},$$
and
$$\chi^2 = \frac{(n-1)s^2}{\sigma^2} = \frac{\sum_{i=1}^n (x_i-\bar{x})^2}{\sigma^2} \sim \chi^2_{n-1}, \quad \text{[proof not required]}$$
the chi-square distribution with $(n-1)$ degrees of freedom (df). [see the next slide for the definition of the chi-square distribution]
Unlike the CLT for the sample mean, the chi-square result is sensitive (i.e., not robust) to the normality assumption.

χ²-Distribution
If $Z_1,\ldots,Z_v$ are i.i.d. such that $Z_i \sim N(0,1)$, $i = 1,\ldots,v$, then $X = \sum_{i=1}^v Z_i^2 \sim \chi^2_v$.
Figure: Density of the $\chi^2_v$ distribution with $v = 4, 6, 8$.
The $\chi^2$ distribution can take only positive values (think of $\sum_{i=1}^v Z_i^2 > 0$ and $s^2 > 0$). [Appendix Table 7 contains chi-square probabilities]

History of the χ² Distribution
Friedrich R. Helmert (1843-1917), University of Berlin [figure here]
The $\chi^2$ distribution was first described by Friedrich Robert Helmert in papers of 1875-6, and was independently rediscovered by Karl Pearson in 1900.

Mean and Variance of the Sample Variance
$E[\chi^2_v] = v$ and $\mathrm{Var}(\chi^2_v) = 2v$ increase with $v$ [refer to the figure in the last slide].
- Why? $\mathrm{Var}(Z_i) = E[Z_i^2] - E[Z_i]^2$ implies $E[Z_i^2] = E[Z_i]^2 + \mathrm{Var}(Z_i) = 0^2 + 1 = 1$, and $\mathrm{Var}(Z_i^2) = E[Z_i^4] - E[Z_i^2]^2 = 3 - 1^2 = 2$.
So $E\left[\frac{(n-1)s^2}{\sigma^2}\right] = n-1$ implies $E[s^2] = \sigma^2$.
- $E[s^2] = \sigma^2$ even if $X$ is not normally distributed, i.e., even if $\frac{(n-1)s^2}{\sigma^2}$ does not follow $\chi^2_{n-1}$. [(*) see the next slide, which follows Appendix 3 of Chapter 6, Page 287]
Also, $\mathrm{Var}\left(\frac{(n-1)s^2}{\sigma^2}\right) = \frac{(n-1)^2}{\sigma^4}\mathrm{Var}(s^2) = 2(n-1)$ implies $\mathrm{Var}(s^2) = \frac{2(n-1)\sigma^4}{(n-1)^2} = \frac{2\sigma^4}{n-1}$, decreasing in $n$ as with $\mathrm{Var}(\bar{x})$.
(*) Why do we lose one df in $\sum_{i=1}^n (x_i-\bar{x})^2$? Because the $n$ values $\{(x_i-\bar{x})\}_{i=1}^n$ have only $(n-1)$ "independent" or "free" values: if we know $\{(x_i-\bar{x})\}_{i=1}^{n-1}$, then $(x_n-\bar{x}) = -\sum_{i=1}^{n-1}(x_i-\bar{x})$ because $\sum_{i=1}^n (x_i-\bar{x}) = \sum_{i=1}^n x_i - \sum_{i=1}^n \bar{x} = n\bar{x} - n\bar{x} = 0$.
- The df of $\{(x_i-\mu)\}_{i=1}^n$ is $n$, so we lose one df when we estimate $\mu$ by $\bar{x}$. [see more details in the next slide]
- In general, the number of df lost equals the number of parameters estimated.
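A minimal Python sketch (not part of the original slides; $\sigma$, $n$, and the seed are illustrative) checks the chi-square result by simulation: for normal data, $(n-1)s^2/\sigma^2$ should have mean $n-1$ and variance $2(n-1)$:

```python
# A minimal sketch, assuming numpy is available.
import numpy as np

rng = np.random.default_rng(4)
mu, sigma, n, reps = 0.0, 2.0, 10, 100_000

s2 = rng.normal(mu, sigma, size=(reps, n)).var(axis=1, ddof=1)
stat = (n - 1) * s2 / sigma**2           # should behave like chi-square, 9 df
print(s2.mean(), sigma**2)               # E[s^2] = sigma^2 (unbiasedness)
print(stat.mean(), n - 1)                # ~ 9  = n - 1
print(stat.var(), 2 * (n - 1))           # ~ 18 = 2(n - 1)
```

Repeating this with a markedly non-normal population (e.g., replacing `rng.normal` with an exponential draw) keeps `s2.mean()` near $\sigma^2$ but breaks the variance match, illustrating that only the unbiasedness of $s^2$ is robust to non-normality.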
(*) The Mean of s² Without Normality
Note that
$$\sum_{i=1}^n (x_i-\bar{x})^2 = \sum_{i=1}^n \left[(x_i-\mu)-(\bar{x}-\mu)\right]^2 = \sum_{i=1}^n \left[(x_i-\mu)^2 - 2(\bar{x}-\mu)(x_i-\mu) + (\bar{x}-\mu)^2\right]$$
$$= \sum_{i=1}^n (x_i-\mu)^2 - 2(\bar{x}-\mu)\sum_{i=1}^n (x_i-\mu) + \sum_{i=1}^n (\bar{x}-\mu)^2 = \sum_{i=1}^n (x_i-\mu)^2 - 2n(\bar{x}-\mu)^2 + n(\bar{x}-\mu)^2 = \sum_{i=1}^n (x_i-\mu)^2 - n(\bar{x}-\mu)^2,$$
so
$$E\left[\sum_{i=1}^n (x_i-\bar{x})^2\right] = E\left[\sum_{i=1}^n (x_i-\mu)^2\right] - nE\left[(\bar{x}-\mu)^2\right] = n\sigma^2 - n\cdot\frac{\sigma^2}{n} = (n-1)\sigma^2.$$
As a result, $E[s^2] = E\left[\frac{1}{n-1}\sum_{i=1}^n (x_i-\bar{x})^2\right] = \frac{1}{n-1}(n-1)\sigma^2 = \sigma^2$.
We lose one df in $\sum_{i=1}^n (x_i-\bar{x})^2$ because of the extra term $n(\bar{x}-\mu)^2$, since
$$\frac{\sum_{i=1}^n (x_i-\mu)^2}{\sigma^2} = \sum_{i=1}^n \left(\frac{x_i-\mu}{\sigma}\right)^2 = \sum_{i=1}^n z_i^2 \sim \chi^2_n,$$
where $z_i = \frac{x_i-\mu}{\sigma}$ is the z-score of $x_i$ and follows $N(0,1)$.

[Example] Applying the χ² Distribution
A commercial freezer must hold a selected temperature with little variation. Specifications call for a standard deviation of no more than 4 degrees. A sample of 14 freezers is to be tested. What is the upper limit ($K$) for the sample variance such that the probability of exceeding this limit, given that the population standard deviation is 4, is less than 0.05?
Solution: Our target is to find $K$ such that $P(s^2 > K) = 0.05$, which implies
$$P\left(\frac{(n-1)s^2}{\sigma^2} > \frac{(n-1)K}{\sigma^2}\right) = P\left(\chi^2_{13} > \frac{13K}{16}\right) = 0.05,$$
so
$$\frac{13K}{16} = 22.36 \implies K = \frac{22.36 \times 16}{13} = 27.52.$$
If $s^2$ from the sample of size $n = 14$ is greater than 27.52, there is strong evidence to suggest the population variance exceeds 16.

(**) Further Results
In random sampling without replacement, $E[s^2] = S^2 = \frac{N}{N-1}\sigma^2$. [see the next slide for details]
So in random sampling with replacement, an unbiased estimator of $\mathrm{Var}(\bar{x}) = \frac{\sigma^2}{n}$ is $\frac{s^2}{n}$, and in random sampling without replacement, an unbiased estimator of $\mathrm{Var}(\bar{x}) = \frac{S^2}{n}\cdot\frac{N-n}{N}$ is
$$\frac{s^2}{n}\cdot\frac{N-n}{N} = \left(\frac{1}{n}-\frac{1}{N}\right)s^2,$$
where unbiasedness will be defined in the next section.
In random sampling with replacement with $X$ not normally distributed,
$$\mathrm{Var}(s^2) = \frac{\mu_4}{n} - \frac{(n-3)\sigma^4}{n(n-1)}, \quad \text{[exercise]}$$
which reduces to $\frac{2\sigma^4}{n-1}$ when $X$ is normally distributed because the fourth central moment $\mu_4 := E\left[(X-\mu)^4\right] = 3\sigma^4$ then.³
³ $\frac{3\sigma^4}{n} - \frac{(n-3)\sigma^4}{n(n-1)} = \frac{3(n-1)-(n-3)}{n(n-1)}\sigma^4 = \frac{2n\sigma^4}{n(n-1)} = \frac{2\sigma^4}{n-1}$.

(**) Mean of s² in Random Sampling Without Replacement
First, using $R_i^2 = R_i$,
$$\hat{\sigma}^2 := \frac{1}{n}\sum_{i=1}^n (x_i-\bar{x})^2 = \frac{1}{n}\sum_{i=1}^N R_i x_i^2 - \left(\frac{1}{n}\sum_{i=1}^N R_i x_i\right)^2 = \frac{1}{n}\sum_{i=1}^N R_i x_i^2 - \frac{1}{n^2}\sum_{i=1}^N R_i x_i^2 - \frac{2}{n^2}\sum_{i=1}^N\sum_{j=i+1}^N R_i R_j x_i x_j.$$
Therefore, from $E[R_i] = \frac{n}{N}$ and (1),
$$E[\hat{\sigma}^2] = \frac{1}{n}\left(1-\frac{1}{n}\right)\frac{n}{N}\sum_{i=1}^N x_i^2 - \frac{2}{n^2}\cdot\frac{n(n-1)}{N(N-1)}\sum_{i=1}^N\sum_{j=i+1}^N x_i x_j = \frac{n-1}{nN}\left[\sum_{i=1}^N x_i^2 - \frac{2}{N-1}\sum_{i=1}^N\sum_{j=i+1}^N x_i x_j\right] = \frac{n-1}{n}S^2 \quad \text{by (2)}.$$
As a result,
$$E[s^2] = E\left[\frac{n}{n-1}\hat{\sigma}^2\right] = S^2.$$

Summary (*)
Estimating $\mu$ by $\bar{x}$:
- Mean of the estimator: $E[\bar{x}] = \mu$ both with and without replacement.
- Variance of the estimator: $\frac{\sigma^2}{n}$ with replacement; $\frac{\sigma^2}{n}\cdot\frac{N-n}{N-1}$ without replacement.
- Normalized estimator $z = \frac{\bar{x}-E[\bar{x}]}{\sigma_{\bar{x}}}$: exactly $N(0,1)$ under normality; approaches $N(0,1)$ as $n \to \infty$. (†)
Estimating $\sigma^2$ by $s^2$:
- Mean of the estimator: $\sigma^2$ with replacement; $S^2 = \frac{N}{N-1}\sigma^2$ without replacement.
- Variance of the estimator: $\frac{\mu_4}{n} - \frac{(n-3)\sigma^4}{n(n-1)}$ with replacement (‡); unknown (?) without replacement.
- Normalized estimator $\chi^2 = \frac{(n-1)s^2}{\sigma^2}$: mean $n-1$ with replacement and $\frac{N}{N-1}(n-1)$ without (variance ?); exactly $\chi^2_{n-1}$ under normality; no general result (?) as $n \to \infty$.
(*) When studying the distributions of the normalized estimators, we assume the $x_i$'s are i.i.d., i.e., random sampling of $x_i$ with replacement.
(†) If $\bar{x}$ is the sample proportion, i.e., $x_i \sim$ Bernoulli($p$) (not normal), then $E[\bar{x}] = p$ and $\sigma_{\bar{x}} = \sqrt{\frac{p(1-p)}{n}}$.
(‡) The variance reduces to $\frac{2\sigma^4}{n-1}$ if $x_i \sim N(\mu, \sigma^2)$.
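The freezer example can be solved directly with scipy instead of Appendix Table 7. A minimal Python sketch (not part of the original slides; it assumes scipy is available):

```python
# A minimal sketch: invert the chi-square tail instead of using the table.
from scipy import stats

n, sigma2 = 14, 16.0                      # sample size; sigma^2 = 4^2
cutoff = stats.chi2.ppf(0.95, df=n - 1)   # value with P(chi2_13 > cutoff) = 0.05
K = sigma2 * cutoff / (n - 1)             # K = sigma^2 * cutoff / (n - 1)
print(cutoff, K)                          # 22.362..., 27.52...
```

`chi2.ppf(0.95, df=13)` returns the 22.36 table value, and the implied $K \approx 27.52$ matches the hand calculation above.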
Properties of Point Estimators

Estimator and Estimate
We mentioned "estimator" above, and here we provide a rigorous definition.
An estimator of a population parameter is a function of the sample whose value provides an approximation to this unknown parameter. If the sample is $\{x_i\}_{i=1}^n$, then an estimator is $f(x_1,\ldots,x_n)$, which is also a random variable given that the $x_i$ are random.
An estimate is a realized value of the estimator. So an estimate is just a number.
A point estimator of a population parameter is a function of the sample that produces a single number called a point estimate.
An interval estimator of a population parameter is a function of the sample that produces an interval.
- An example of an interval estimator is the confidence interval, which will be discussed in Lecture 7.
No single mechanism exists for the determination of a uniquely "best" point estimator in all circumstances. What is available instead is a set of criteria under which particular estimators can be evaluated. The two criteria discussed here are unbiasedness and efficiency.

Unbiasedness
A point estimator $\hat{\theta}$ is said to be an unbiased estimator of a population parameter $\theta$ if $E[\hat{\theta}] = \theta$ for any possible value of $\theta$.
We showed above that $E[\bar{x}] = \mu$, $E[\hat{p}] = p$, and $E[s^2] = \sigma^2$, so $\bar{x}$, $\hat{p}$ and $s^2$ are unbiased estimators of $\mu$, $p$ and $\sigma^2$, respectively.
Figure: Density of an unbiased estimator $\hat{\theta}_1$ and a biased estimator $\hat{\theta}_2$.
$\mathrm{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta$. So the bias of an unbiased estimator is 0.

Most Efficient
There may be many unbiased estimators. To choose among them, we use variance as a criterion. The unbiased estimator with the smallest variance is preferred, and is called the most efficient estimator, or the minimum variance unbiased estimator (MVUE).
For two unbiased estimators of $\theta$ based on the same sample, $\hat{\theta}_1$ and $\hat{\theta}_2$, $\hat{\theta}_1$ is said to be more efficient than $\hat{\theta}_2$ if $\mathrm{Var}(\hat{\theta}_1) < \mathrm{Var}(\hat{\theta}_2)$.
The relative efficiency of $\hat{\theta}_1$ with respect to (w.r.t.) $\hat{\theta}_2$ is $\mathrm{Var}(\hat{\theta}_2)/\mathrm{Var}(\hat{\theta}_1)$, i.e., if $\mathrm{Var}(\hat{\theta}_2) > \mathrm{Var}(\hat{\theta}_1)$, then $\hat{\theta}_1$ is more efficient, so its relative efficiency w.r.t. $\hat{\theta}_2$ is greater than 1.
Given a random sample $\{x_i\}_{i=1}^n$ with $x_i \sim N(\mu, \sigma^2)$, both the sample mean $\bar{x}$ and the sample median $x_{.5}$ are unbiased estimators of $\mu$. But $\mathrm{Var}(\bar{x}) = \frac{\sigma^2}{n}$, and $\mathrm{Var}(x_{.5}) = \frac{\pi}{2}\cdot\frac{\sigma^2}{n} = \frac{1.57\sigma^2}{n}$ when $n$ is large [proof not required], so the sample mean is more efficient than the sample median, and the relative efficiency of the former to the latter is
$$\text{relative efficiency} = \frac{\mathrm{Var}(x_{.5})}{\mathrm{Var}(\bar{x})} = 1.57.$$

(**) Proofs of the efficiency properties are not required.
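The mean-versus-median comparison is easy to check numerically. A minimal Python sketch (not part of the original slides; the sample size and seed are illustrative) estimates the relative efficiency by simulation:

```python
# A minimal sketch, assuming numpy: for normal data the variance ratio
# Var(median) / Var(mean) should be close to pi/2 ~ 1.57 for large n.
import numpy as np

rng = np.random.default_rng(5)
mu, sigma, n, reps = 0.0, 1.0, 201, 50_000   # odd n so the median is one observation

samples = rng.normal(mu, sigma, size=(reps, n))
means = samples.mean(axis=1)
medians = np.median(samples, axis=1)
print(np.var(medians) / np.var(means), np.pi / 2)   # relative efficiency ~ 1.57
```

Both estimators center on $\mu$ (unbiasedness), but the simulated ratio confirms that the sample mean is the more efficient of the two under normality.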