Homework 2 Solutions - Fall 2024 - STAT 4380
Document Details
Uploaded by BelovedSulfur
2024
Summary
This document contains solutions to homework problems in statistical inference. The problems address topics like sample variance, normal approximation, and more. The document is likely part of a university course.
Full Transcript
STAT 4380 Statistical Inference
Homework II (Due on 10/09/2024 Wednesday midnight 11:59 pm)

Instructions: This assignment covers sample statistics and sampling distributions. While discussions are permitted, please attempt the problems independently and direct any questions to your instructor or teaching assistant. Detailed explanations are required to receive partial or full credit. Late submissions will not be accepted, as solutions will be posted right after the deadline.

Section I:

1. Show that in the computational formula for the sample variance, the sum of squares (SS) quantity is

$$\sum_{i=1}^n (x_i - \bar{x})^2 = \sum_{i=1}^n x_i^2 - n\bar{x}^2 = \sum_{i=1}^n x_i^2 - \frac{\left(\sum_{i=1}^n x_i\right)^2}{n},$$

with $\bar{x} = \sum_{i=1}^n x_i / n$ being the sample mean.

2. Suppose that 81% of the people in El Paso live in the city and 19% of the people live in the suburbs. If the 21,000 UTEP undergraduate students represent a random sample of the El Paso population, what is the probability that the number of UTEP undergraduates from the suburbs will be fewer than 4,000? (Hint: Apply the normal approximation to the binomial probability via the CLT.)

3. Suppose that the body mass index (BMI) measure for adults is normally distributed with mean 21.7 and standard deviation 3.6.
(a) An adult is considered overweight if their BMI is higher than that of 95% of the population. What is the cutoff value for this definition of overweight?
(b) Obesity is defined as a BMI above 30. What is the proportion of obese adults?
(c) For a randomly selected group of $n = 10$ adults, let $\bar{X}$ denote the average BMI and $s^2$ denote the sample variance.
   i. What is the probability that the average BMI of these 10 adults is above 25?
   ii. What is the probability that the sample variance $s^2$ is above 2? Express this probability in terms of an appropriate $\chi^2$ distribution.

4. Given $X \sim \chi^2(\nu)$, show that $\mathrm{var}(X) = 2\nu$. (Hint: One way is to use the moment generating function (MGF).)

5.
Given that the $Z_i$ are IID $N(0, 1)$ random variables, specify the distribution of each of the following quantities:
(a) $\sqrt{3} \cdot \dfrac{Z_1}{\sqrt{Z_2^2 + Z_3^2 + Z_4^2}}$
(b) $2 \cdot \dfrac{Z_1^2 + Z_2^2}{Z_3^2 + Z_4^2 + Z_5^2 + Z_6^2}$

6. Given $X_i \overset{\text{IID}}{\sim} N(\mu, \sigma^2)$ for $i = 1, \ldots, n$, show that the sample mean $\bar{X}$ and the quantity $X_i - \bar{X}$ are independent by checking that their joint and marginal MGFs satisfy

$$M_{\bar{X},\, X_i - \bar{X}}(s, t) = M_{\bar{X}}(s) \cdot M_{X_i - \bar{X}}(t)$$

for any $s$ and $t$ within a certain neighborhood of 0. (Hint: Find the distribution of $X_i - \bar{X}$ by recognizing it is a linear combination of the $X_i$'s.)

7. Given a random sample $X_i \overset{\text{IID}}{\sim} N(\mu, \sigma^2)$ for $i = 1, \ldots, n$, let $\bar{X}$ denote the sample average. Show that $(X_1 - X_2) \perp \bar{X}$, i.e., they are independent, using the fact that, for random variables that have a joint normal distribution, the necessary and sufficient condition for independence is zero covariance.

8. Show that, if $X \sim F(\nu_1, \nu_2)$, then $1/X \sim F(\nu_2, \nu_1)$.

Section II: The problems in this section are required for graduate students and optional for undergraduate students (with the opportunity for extra credit).

9. Prove Markov's inequality: If $X$ is a nonnegative random variable and $a > 0$ is a constant, then

$$P(X \ge a) \le \frac{E(X)}{a}.$$

10. Given a random sample $\{X_1, X_2, \ldots, X_n\}$ of size $n$ from the Exponential($\lambda$) distribution with PDF $f(x) = \lambda e^{-\lambda x}$ for $\lambda > 0$ and $x > 0$, let $\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$ be the sample average.
(a) Find the exact distribution of $\bar{X}$.
(b) Find the asymptotic distribution of $\bar{X}$ by the central limit theorem (CLT).
(c) Given $\lambda = 0.1$, let $n$ vary in $\{10, 20, 50, 100, 200\}$. Use R to compute $P(\bar{X} > 12)$ based on the exact and asymptotic distributions, respectively, and compare.

STAT 4380 Statistical Inference
Solutions for HW II

1. Consider

$$
\begin{aligned}
\sum_{i=1}^n (x_i - \bar{x})^2 &= \sum_{i=1}^n \left(x_i^2 - 2x_i\bar{x} + \bar{x}^2\right) \\
&= \sum_{i=1}^n x_i^2 - 2\bar{x}\sum_{i=1}^n x_i + n\bar{x}^2 \\
&= \sum_{i=1}^n x_i^2 - 2\bar{x} \cdot n\bar{x} + n\bar{x}^2, \quad \text{since } \sum_i x_i = n\bar{x} \\
&= \sum_{i=1}^n x_i^2 - n\bar{x}^2, \quad \text{which completes the first part} \\
&= \sum_{i=1}^n x_i^2 - n\left(\frac{\sum_i x_i}{n}\right)^2 \\
&= \sum_{i=1}^n x_i^2 - \frac{\left(\sum_{i=1}^n x_i\right)^2}{n}.
\end{aligned}
$$

2.
Let $X$ denote the number of UTEP undergraduates from the suburbs. Then $X \sim \text{binomial}(n = 21{,}000,\ p = 0.19)$. By the CLT, $X$ approximately follows a normal distribution with mean $\mu = np = 21000 \times 0.19 = 3{,}990$ and variance $\sigma^2 = np(1 - p) = 21000 \times 0.19 \times 0.81 = 3231.9$. Therefore,

$$\Pr(X < 4000) = \Pr\left(Z < \frac{4000 - 3990}{\sqrt{3231.9}}\right) = \Pr(Z < 0.1759) = 0.5698,$$

or 56.98%.

3. Let $X$ denote the BMI of a random adult. Then $X \sim N(\mu = 21.7,\ \sigma = 3.6)$.

(a) A cutoff value $c$ is sought such that $\Pr(X \le c) = 0.95$. Equivalently,

$$\Pr\left(Z < \frac{c - 21.7}{3.6}\right) = 0.95.$$

According to the $Z$ or $N(0, 1)$ table, $\Pr(Z < 1.645) = 0.95$, i.e., the 95th percentile of the standard normal distribution is 1.645. Therefore,

$$\frac{c - 21.7}{3.6} = 1.645 \ \Rightarrow\ c = 3.6 \times 1.645 + 21.7 = 27.62.$$

(b) We want

$$\Pr(X > 30) = \Pr\left(Z > \frac{30 - 21.7}{3.6}\right) = \Pr(Z > 2.31) = 1 - \Pr(Z < 2.31) = 1 - 0.9896 = 0.0104.$$

(c) This question asks about the sampling distributions of $\bar{X}$ and $s^2$ from a normal distribution. We know

$$\bar{X} \sim N\left(\mu = 21.7,\ \frac{\sigma^2}{n} = \frac{3.6^2}{10}\right) \quad \text{and} \quad \frac{(10 - 1)s^2}{\sigma^2} = \frac{9s^2}{3.6^2} \sim \chi^2(n - 1) = \chi^2(9).$$

i. We want

$$\Pr(\bar{X} > 25) = \Pr\left(Z > \frac{25 - 21.7}{3.6/\sqrt{10}}\right) = \Pr(Z > 2.899) = 1 - \Pr(Z \le 2.899) = 0.0019.$$

ii. We want

$$\Pr(s^2 > 2) = \Pr\left(\frac{9s^2}{3.6^2} > \frac{9 \times 2}{3.6^2}\right) = \Pr\left(\chi^2(9) > 1.389\right).$$

Although not required, the above probability can be found to be 0.9979 with R; a range can be found via the table of $\chi^2$ distributions.

4. Recall that the MGF of $X$ is

$$M_X(t) = (1 - 2t)^{-\nu/2}, \quad \text{for } t < \tfrac{1}{2}.$$

It follows that

$$M_X'(t) = -\frac{\nu}{2}(1 - 2t)^{-\frac{\nu}{2} - 1}(-2) = \nu(1 - 2t)^{-\frac{\nu}{2} - 1},$$

and hence $E(X) = M_X'(0) = \nu$. Also,

$$
\begin{aligned}
M_X''(t) &= \nu\left(-\frac{\nu}{2} - 1\right)(1 - 2t)^{-\frac{\nu}{2} - 2}(-2) \\
&= \nu(\nu + 2)(1 - 2t)^{-\frac{\nu}{2} - 2},
\end{aligned}
$$

and $E(X^2) = M_X''(0) = \nu(\nu + 2)$. Thus, it follows that

$$\mathrm{var}(X) = E(X^2) - [E(X)]^2 = \nu(\nu + 2) - \nu^2 = 2\nu.$$

5. (a) Rewrite the quantity as

$$\frac{Z_1}{\sqrt{(Z_2^2 + Z_3^2 + Z_4^2)/3}} \sim t(3).$$

(b) Rewrite it as

$$\frac{(Z_1^2 + Z_2^2)/2}{(Z_3^2 + Z_4^2 + Z_5^2 + Z_6^2)/4} \sim F(2, 4).$$

6. Write $X_i - \bar{X}$ as a linear combination of the $X_j$'s:

$$X_i - \bar{X} = \sum_{j=1}^n a_j X_j, \quad \text{with } a_j = \begin{cases} -1/n & \text{if } j \ne i, \\ 1 - 1/n & \text{if } j = i. \end{cases}$$
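Since a linear combination of independent normal r.v.'s is normal, $X_i - \bar{X}$ must follow a normal distribution with mean

$$\sum_{j=1}^n a_j E(X_j) = E(X_i) - E(\bar{X}) = \mu - \mu = 0$$

and variance

$$\sum_{j=1}^n a_j^2\, \mathrm{var}(X_j) = \sigma^2 \sum_{j=1}^n a_j^2 = \sigma^2\left[\frac{n - 1}{n^2} + \left(1 - \frac{1}{n}\right)^2\right] = \sigma^2 \cdot \left(1 - \frac{1}{n}\right).$$

Namely,

$$X_i - \bar{X} \sim N\left(0,\ \sigma^2 \cdot \left(1 - \frac{1}{n}\right)\right).$$

Thus, the product of the marginal MGFs on the right-hand side (RHS) is, recalling $\bar{X} \sim N(\mu, \sigma^2/n)$,

$$
\begin{aligned}
\text{RHS} &= M_{\bar{X}}(s) \cdot M_{X_i - \bar{X}}(t) \\
&= \exp\left\{\mu s + \frac{\sigma^2}{2n}s^2\right\} \cdot \exp\left\{0 \cdot t + \left(1 - \frac{1}{n}\right)\frac{\sigma^2 t^2}{2}\right\} \\
&= \exp\left\{\mu s + \frac{\sigma^2}{2}\left[\frac{s^2}{n} + t^2\left(1 - \frac{1}{n}\right)\right]\right\}.
\end{aligned}
$$

Consider the joint MGF on the left-hand side (LHS). Since $s\bar{X} + t(X_i - \bar{X}) = \sum_{j \ne i} \frac{s - t}{n} X_j + \left(\frac{s - t}{n} + t\right) X_i$,

$$
\begin{aligned}
\text{LHS} &= M_{\bar{X},\, X_i - \bar{X}}(s, t) = E \exp\left\{s\bar{X} + t(X_i - \bar{X})\right\} \\
&= E \exp\left\{\sum_{j \ne i} \frac{s - t}{n} X_j + \left(\frac{s - t}{n} + t\right) X_i\right\} \\
&= E\left[\prod_{j \ne i} \exp\left\{\frac{s - t}{n} X_j\right\} \cdot \exp\left\{\left(\frac{s - t}{n} + t\right) X_i\right\}\right] \\
&= \prod_{j \ne i} E \exp\left\{\frac{s - t}{n} X_j\right\} \cdot E \exp\left\{\left(\frac{s - t}{n} + t\right) X_i\right\} \quad \text{by independence of the } X_j\text{'s} \\
&= \prod_{j \ne i} M_{X_j}\left(\frac{s - t}{n}\right) \cdot M_{X_i}\left(\frac{s - t}{n} + t\right) \\
&= \prod_{j \ne i} \exp\left\{\frac{s - t}{n}\mu + \frac{\sigma^2 (s - t)^2}{2n^2}\right\} \cdot \exp\left\{\mu\left(\frac{s - t}{n} + t\right) + \frac{\sigma^2}{2}\left(\frac{s - t}{n} + t\right)^2\right\} \\
&= \exp\left\{(n - 1)\left[\frac{s - t}{n}\mu + \frac{\sigma^2 (s - t)^2}{2n^2}\right] + \mu\left(\frac{s - t}{n} + t\right) + \frac{\sigma^2}{2}\left(\frac{s - t}{n} + t\right)^2\right\} \\
&= \exp\left\{\mu s + \frac{\sigma^2}{2}\left[\frac{s^2}{n} + t^2\left(1 - \frac{1}{n}\right)\right]\right\}, \quad \text{after simplification,}
\end{aligned}
$$

which is the same as the RHS. In the simplification, the $\mu$ terms sum to $(s - t) + t = s$, and the $\sigma^2/2$ terms sum to $\frac{(s - t)^2}{n} + \frac{2t(s - t)}{n} + t^2 = \frac{s^2}{n} + t^2\left(1 - \frac{1}{n}\right)$. The proof is completed.

7. Two approaches are given below.

(I) Since $(X_1 - \bar{X}) \perp \bar{X}$ and $(X_2 - \bar{X}) \perp \bar{X}$, it follows that

$$X_1 - X_2 = (X_1 - \bar{X}) - (X_2 - \bar{X}) \perp \bar{X}.$$

This step relies on joint normality: each term has zero covariance with $\bar{X}$, so their difference does too, and for jointly normal variables zero covariance implies independence.

(II) Rewrite $X_1 - X_2 = a^T x$ with $a = (1, -1, 0, \ldots, 0)^T$ and $\bar{X} = b^T x$ with $b = (1/n, 1/n, \ldots, 1/n)^T$, where $x = (X_1, \ldots, X_n)^T$. Thus $X_1 - X_2$ and $\bar{X}$ have a joint bivariate normal distribution. It suffices to show that they have a covariance of 0. Consider

$$\mathrm{cov}(X_1 - X_2, \bar{X}) = a^T \sigma^2 I_n\, b = \sigma^2 a^T b = 0,$$

where $a^T b = 1/n - 1/n + 0 + \cdots + 0 = 0$.

8. This is because, by definition, $X$ can be written as

$$X = \frac{\chi_1^2/\nu_1}{\chi_2^2/\nu_2},$$

where $\chi_1^2 \sim \chi^2(\nu_1)$ is independent of $\chi_2^2 \sim \chi^2(\nu_2)$. Now

$$\frac{1}{X} = \frac{\chi_2^2/\nu_2}{\chi_1^2/\nu_1},$$

which must follow $F(\nu_2, \nu_1)$, again by definition of the $F$ distribution.

9. Proof: Suppose $X$ is continuous with PDF $f$ on its support $\mathcal{X} \subseteq [0, \infty)$ (the discrete case is analogous, with sums in place of integrals). Consider

$$
\begin{aligned}
E(X) &= \int_{\mathcal{X}} x f(x)\,dx \\
&= \int_{\mathcal{X}} I(x \ge a)\, x f(x)\,dx + \int_{\mathcal{X}} I(x < a)\, x f(x)\,dx \\
&\ge \int_{\mathcal{X}} I(x \ge a)\, x f(x)\,dx, \quad \text{since the second term is nonnegative} \\
&= \int_a^\infty x f(x)\,dx \\
&\ge \int_a^\infty a f(x)\,dx \\
&= a \int_a^\infty f(x)\,dx \\
&= a P(X \ge a).
\end{aligned}
$$

Dividing both sides by $a > 0$ gives $P(X \ge a) \le E(X)/a$. The proof is completed.

10.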
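(a) Recall the following properties of the Gamma distribution:
   (i) If $X \sim \text{exponential}(\lambda)$, then $X \sim \text{Gamma}(1, \lambda)$.
   (ii) Given independent $X_i \sim \text{Gamma}(\alpha_i, \beta)$ for $i = 1, \ldots, n$, $\sum_i X_i \sim \text{Gamma}\left(\sum_i \alpha_i, \beta\right)$.
   (iii) Given $X \sim \text{Gamma}(\alpha, \beta)$, $bX \sim \text{Gamma}(\alpha, \beta/b)$ for $b > 0$.
Note that both $\lambda$ and $\beta$ are rate (not scale) parameters in the above statements; $\alpha$ is the shape parameter.

Therefore, we first have $X_i \sim \text{Gamma}(1, \lambda)$ in this question. It follows that

$$\sum_i X_i \sim \text{Gamma}(n, \lambda),$$

and hence, by (iii) with $b = 1/n$,

$$\bar{X} = \frac{1}{n}\sum_i X_i \sim \text{Gamma}(n, n\lambda).$$

(b) Given $X_i \sim \text{Exponential}(\lambda)$, we have $E(X_i) = 1/\lambda$ and $\mathrm{var}(X_i) = 1/\lambda^2$. By the CLT,

$$\bar{X} \sim AN\left(\frac{1}{\lambda},\ \frac{1}{n\lambda^2}\right).$$

(c) The following computation is done in R.

Sample size n      10       20       50       100      200
Exact              0.2424   0.1803   0.0844   0.0279   0.0037
Asymptotic         0.2635   0.1855   0.0786   0.0228   0.0023

An example R code is provided below:

x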
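As a cross-check of the table in part (c), here is a minimal Python sketch; it is not the course's R code. It assumes the results of parts (a) and (b), namely $\bar{X} \sim \text{Gamma}(n, n\lambda)$ in the rate parameterization and $\bar{X} \sim AN(1/\lambda, 1/(n\lambda^2))$. The function names `exact_tail` and `clt_tail` are made up for this illustration. To stay within the standard library, the exact tail uses the identity that, for integer shape $n$, $P(\text{Gamma}(n, \text{rate } r) > c) = P(\text{Poisson}(rc) \le n - 1)$. In R, the analogous calls would presumably be `pgamma` and `pnorm` with `lower.tail = FALSE`.

```python
import math


def exact_tail(n, lam, c):
    """P(Xbar > c) for Xbar ~ Gamma(shape=n, rate=n*lam), n a positive integer.

    Uses the Erlang/Poisson identity:
    P(Gamma(n, rate r) > c) = sum_{k=0}^{n-1} exp(-m) * m**k / k!, with m = r*c.
    """
    m = n * lam * c
    term = math.exp(-m)      # k = 0 term of the Poisson CDF
    total = term
    for k in range(1, n):
        term *= m / k        # builds exp(-m) * m**k / k! recursively
        total += term
    return total


def clt_tail(n, lam, c):
    """P(Xbar > c) under the CLT approximation Xbar ~ N(1/lam, 1/(n*lam**2))."""
    mean = 1.0 / lam
    sd = 1.0 / (lam * math.sqrt(n))
    z = (c - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))   # upper tail of N(0, 1)


if __name__ == "__main__":
    lam, c = 0.1, 12.0       # values given in part (c)
    print(f"{'n':>5} {'exact':>8} {'asymptotic':>11}")
    for n in (10, 20, 50, 100, 200):
        print(f"{n:>5} {exact_tail(n, lam, c):8.4f} {clt_tail(n, lam, c):11.4f}")
```

Running the script prints one row per sample size, which can be compared against the Exact and Asymptotic rows of the table. The two columns should move closer together as $n$ grows, which is what the CLT predicts.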