Probability Theory and Statistics: Lecture Notes
Ukrainian Catholic University
2018
Rostyslav Hryniv
Summary
These are lecture notes on probability theory and statistics, taught in Autumn 2018 at Ukrainian Catholic University. The notes cover topics like inequalities, laws of large numbers, the central limit theorem, and characteristic functions.
Full Transcript
Probability theory and statistics
Lecture Notes
Rostyslav Hryniv
Ukrainian Catholic University
Computer Science and Business Analytics Programmes
3rd term, Autumn 2018

Lecture 7. Law of Large Numbers and Central Limit Theorem

Outline
1 Inequalities and weak law of large numbers: motivation; inequalities; weak law of large numbers
2 Convergence and strong law of large numbers: convergence in probability; convergence almost everywhere; strong law of large numbers
3 Central limit theorem: characteristic functions; convergence in law; convergence of sums

Frequentist approach to probability

Definition. Frequentist probability defines an event's probability as the limit of its relative frequency in a large number of trials.

Given is an experiment in which an event A may happen:
- repeat it n times (with n large enough);
- record the number nA of trials in which A occurred;
- define P(A) = lim_{n→∞} nA/n.
This definition is justified by the Law of Large Numbers. Problems: what n are large enough? Repeating the whole procedure gives different results.

Sums of independent identically distributed r.v.'s

Consider i.i.d. r.v.'s X1, X2, ... with E(X1) = µ and Var(X1) = σ², and set Sn = X1 + ··· + Xn, the sum of the first n of them. By independence, Var(Sn) = nσ², so Sn spreads out with n and cannot have a meaningful limit. For the sample mean
  Mn := (X1 + ··· + Xn)/n = Sn/n
one finds that E(Mn) = µ and Var(Mn) = σ²/n, and thus the Mn are concentrated near µ. Informally, the Law of Large Numbers asserts that Mn converges to the true mean µ as n → ∞.

Markov inequality

For a non-negative r.v. X and every a > 0,
  P(X ≥ a) ≤ E(X)/a.
Proof. Introduce an auxiliary r.v. Ya with Ya = 0 if X < a and Ya = a if X ≥ a. Then Ya ≤ X, whence E(Ya) ≤ E(X). Since E(Ya) = a P(X ≥ a), we get P(X ≥ a) ≤ a⁻¹ E(X) as claimed.

Example. If X ~ B(100, 1/2), then P(X ≥ 60) ≤ 50/60 ≈ .8333. The true value is P(X ≥ 60) ≈ .028.

Chebyshev's inequality

If X is a r.v. with mean µ and variance σ², then for all c > 0
  P(|X − µ| ≥ c) ≤ σ²/c².
Proof. Observe that {|X − µ| ≥ c} = {(X − µ)² ≥ c²} and use the Markov inequality with a = c² and (X − µ)² in place of X.

Example. If X ~ B(100, 1/2), then P(X ≥ 60) = P(X − 50 ≥ 10) ≤ P(|X − 50| ≥ 10) ≤ 25/10² = .25. The true value is P(X ≥ 60) ≈ .028.

One-sided Chebyshev's inequality

If X is a r.v. with mean µ and variance σ², then for all c > 0
  P(X − µ ≥ c) ≤ σ²/(σ² + c²).
Proof. For every b ≥ 0, {X − µ ≥ c} = {X − µ + b ≥ b + c} ⊂ {|X − µ + b|² ≥ (b + c)²}, so that
  P(X − µ ≥ c) ≤ (σ² + b²)/(b + c)².
Minimizing over b (the minimum is attained at b = σ²/c) gives the result.

Example. If X ~ B(100, 1/2), then P(X ≥ 60) = P(X − 50 ≥ 10) ≤ 25/(10² + 25) = .2.

Derivation of the WLLN

Consider i.i.d. r.v.'s X1, X2, ... with E(X1) = µ and Var(X1) = σ². For the sample mean Mn we find that E(Mn) = µ and Var(Mn) = σ²/n. Applying Chebyshev's inequality gives
  P(|Mn − µ| ≥ ε) ≤ σ²/(nε²),
which goes to zero as n → ∞.
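The three inequalities can be checked numerically. The following is a small sketch, not part of the original notes; it assumes Python with scipy is available, and the variable names are illustrative only. It compares the Markov, Chebyshev and one-sided Chebyshev bounds for P(X ≥ 60), X ~ B(100, 1/2), with the exact value.

```python
# Sketch (not from the notes): Markov, Chebyshev and one-sided Chebyshev bounds
# for P(X >= 60) with X ~ B(100, 1/2), compared with the exact probability.
from scipy.stats import binom

n, p = 100, 0.5
mu, var = n * p, n * p * (1 - p)      # mu = 50, sigma^2 = 25
a, c = 60, 10                         # threshold a and deviation c = a - mu

markov    = mu / a                    # P(X >= a) <= E(X)/a
chebyshev = var / c**2                # P(|X - mu| >= c) <= sigma^2/c^2
one_sided = var / (var + c**2)        # P(X - mu >= c) <= sigma^2/(sigma^2 + c^2)
exact     = binom.sf(a - 1, n, p)     # exact P(X >= 60)

print(markov, chebyshev, one_sided, exact)   # 0.8333..., 0.25, 0.2, ~0.028
```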
Weak law of large numbers

In fact, it is not necessary that the Xk have finite variance:

Theorem (Weak law of large numbers). Let X1, X2, ... be independent identically distributed r.v.'s with mean µ. Then for every ε > 0, as n → ∞,
  P(|Mn − µ| ≥ ε) = P(|(X1 + ··· + Xn)/n − µ| ≥ ε) → 0.

Example. If the Xk have normal distribution N(µ, σ²), then Mn ~ N(µ, σ²/n), and with Z := √n (Mn − µ)/σ ~ N(0, 1) one finds
  P(|Mn − µ| ≥ ε) = 2 P(Z ≤ −ε√n/σ) = 2 Φ(−ε√n/σ) → 0 as n → ∞.

Example: opinion poll

A sample of size n is chosen at random from a large population, and each person in the sample is asked whether he or she supports some topic T (a party, an initiative etc.); the relative fraction k/n of supporters is taken as an estimate of the support rate p. Denote by Ij the indicator of the event that the j-th respondent supports T; the Ij can be taken to be independent Bernoulli r.v.'s with parameter p. Now k = I1 + ··· + In = Sn, and
  P(|Mn − p| ≥ ε) ≤ p(1 − p)/(nε²) ≤ 1/(4nε²).
Usually one takes ε = .03 and wants this upper bound to be at most α = .05; to achieve this, one needs n ≥ (4ε²α)⁻¹ ≈ 5556. This will be improved using the Central Limit Theorem.

Convergence in probability

Definition. A sequence of r.v.'s X1, X2, ... converges in probability to a r.v. Y (written Xn →P Y) if, for every ε > 0, P{|Xn − Y| ≥ ε} → 0 as n → ∞.

Thus the WLLN asserts that the sample means Mn of i.i.d. r.v.'s X1, X2, ... with mean µ converge to µ in probability.

Problem. Assume that Xn →P µ as n → ∞ and that a function g is continuous at the point µ. Prove that g(Xn) →P g(µ) as n → ∞.

Convergence almost everywhere

Definition (Convergence almost everywhere). A sequence of r.v.'s X1, X2, ... converges almost everywhere (almost surely, with probability 1) to a r.v. X as n → ∞ (written Xn →a.e. X) if
  P{ω ∈ Ω | Xn(ω) → X(ω)} = 1.

Example. Ω = [0, 1] with uniform probability; Xn(ω) = ωⁿ converges to 0 for every ω ≠ 1, hence Xn → 0 almost everywhere.

Theorem. Convergence almost everywhere implies convergence in probability.

Example (convergent in probability but not almost everywhere). Set Ak,n = [k·2⁻ⁿ, (k + 1)·2⁻ⁿ), n ∈ N, k = 0, ..., 2ⁿ − 1, and Xk,n = χ_{Ak,n}. Then Xk,n →P 0, but for every ω ∈ [0, 1) the sequence Xk,n(ω) takes the value 1 infinitely often and hence does not converge to 0.

Strong law of large numbers

Theorem (Strong law of large numbers). If X1, X2, ... are independent identically distributed random variables with mean µ, then Mn = Sn/n →a.e. µ.

Corollary (Estimating the c.d.f.). Let X1, X2, ... be i.i.d. r.v.'s with c.d.f. F. For t ∈ R, introduce the r.v.'s Ik by Ik(t; ω) = 1 if Xk(ω) ≤ t and Ik(t; ω) = 0 if Xk(ω) > t. Then the Ik(t) are independent and have Bernoulli distribution with parameter p := F(t); thus E(Ik(t)) = F(t), and by the SLLN
  F̂_{X,n}(t) := (I1(t) + ··· + In(t))/n → F(t)
with probability 1.

Empirical c.d.f.

Definition (Empirical c.d.f.). Assume that X1, X2, ... are i.i.d. r.v.'s with c.d.f. F and xk = Xk(ω) is a realization of Xk. The empirical c.d.f. F̂_{x,n} for the realization x = (x1, ..., xn) is
  F̂_{x,n}(t) := #{k | xk ≤ t}/n.
Clearly, F̂_{x,n}(t) = F̂_{X,n}(t; ω) is a realization of the r.v. F̂_{X,n}(t); therefore, for every t, F̂_{x,n}(t) → F(t) with probability 1.

Theorem (Glivenko–Cantelli). Almost surely,
  sup_{t∈R} |F̂_{x,n}(t) − F(t)| → 0,
i.e., with probability 1 the empirical distribution function converges to F uniformly.
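As an illustration of the empirical c.d.f. and the Glivenko–Cantelli theorem, here is a small sketch; it is not part of the original notes and assumes Python with numpy and scipy. It builds F̂ from a standard normal sample and approximates the sup-distance to Φ on a grid, which shrinks as n grows.

```python
# Sketch (not from the notes): empirical c.d.f. of an i.i.d. N(0,1) sample and
# the (grid-approximated) sup-distance to the true c.d.f. Phi.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def ecdf(sample):
    """Empirical c.d.f.: t -> #{k : x_k <= t} / n (vectorized in t)."""
    xs = np.sort(sample)
    return lambda t: np.searchsorted(xs, t, side="right") / len(xs)

for n in (100, 1_000, 10_000):
    x = rng.standard_normal(n)              # realizations of X_1, ..., X_n
    F_hat = ecdf(x)
    grid = np.linspace(-4.0, 4.0, 2001)     # sup over t approximated on a grid
    print(n, np.max(np.abs(F_hat(grid) - norm.cdf(grid))))
```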
Monte Carlo

The SLLN is the basis for the Monte Carlo simulation method.

Example. Assume that g is a continuous function over (0, 1) and let U ~ U(0, 1). Then
  ∫₀¹ g(x) dx = E g(U) = lim_{n→∞} (1/n)(g(U1) + ··· + g(Un)),
where U1, U2, ... are i.i.d. r.v.'s from U(0, 1).

Example. Let f be a continuous non-negative function on the square (0, 1) × (0, 1) assuming values less than 1. We generate n points uniformly in the unit cube and denote by kn the number of those below the graph of f. Then
  ∫∫ f(x, y) dx dy = lim_{n→∞} kn/n.

Characteristic functions

Recall that the moment generating function is E(e^{sX}); it need not be defined for any s ≠ 0 (example: the Cauchy distribution with density f(x) = π⁻¹(1 + x²)⁻¹). On the contrary, the characteristic function
  φX(t) := E(e^{itX})
is well defined for all t ∈ R; it is the Fourier transform of the p.d.f. It has all the properties of generating functions:
- uniqueness: φX determines the distribution of X uniquely;
- derivatives are related to moments: φX^{(k)}(0) = i^k µk(X);
- for independent X and Y, φ_{X+Y} = φX · φY.

Examples

Normal distribution X ~ N(µ, σ²):
  φX(t) = E(e^{itX}) = ∫_R exp{(2iσ²tx − (x − µ)²)/(2σ²)} dx/(√(2π)σ)
        = exp{iµt − σ²t²/2} ∫_R exp{−(x − µ − itσ²)²/(2σ²)} dx/(√(2π)σ).
Observe that the remaining integral equals 1; thus φX(t) = exp{iµt − σ²t²/2}, and for independent Xk ~ N(µk, σk²) one gets X1 + ··· + Xn ~ N(Σ µk, Σ σk²).

Exponential distribution. For X ~ E(λ),
  φX(t) = E(e^{itX}) = ∫₀^∞ λ e^{−(λ−it)x} dx = λ/(λ − it).
If X1, X2 ~ E(λ) are independent, then X1 + X2 is not exponentially distributed, as φ_{X1+X2} = φ_{X1} φ_{X2} is not the characteristic function of any E(µ).

Convergence in law

We have seen two types of convergence for r.v.'s X1, X2, ...: almost everywhere (SLLN) and in probability (WLLN). Another type is convergence in law, which occurs in the Central Limit Theorem.

Definition. R.v.'s X1, X2, ... converge in law to a r.v. X if FXn(t) → FX(t) as n → ∞ at all points of continuity of FX.

Example (Poisson approximation). The Poisson theorem states that binomial r.v.'s Xn ~ B(n, pn) converge in law to the Poisson distribution P(λ) if npn → λ as n → ∞.

Theorem. Convergence in law is equivalent to convergence of the characteristic functions.

Central limit theorem

Assume X1, X2, ... are i.i.d. r.v.'s with mean µ and variance σ². The sum Sn := X1 + ··· + Xn has mean nµ and variance nσ² and cannot converge in any sense; the sample mean Mn := Sn/n converges to µ by the SLLN, and Var(Mn) = σ²/n → 0. What happens in the intermediate case, e.g. if we normalize to constant variance? Consider the standardized sums
  Zn := √n (Mn − µ)/σ = (Sn − nµ)/(σ√n).
The Central Limit Theorem asserts that the Zn converge in law to the standard Gaussian r.v. Z ~ N(0, 1).

Theorem (Central Limit Theorem). Under the above assumptions, the Zn converge in law to Z ~ N(0, 1), i.e., for every t ∈ R,
  P{(X1 + ··· + Xn − nµ)/(σ√n) ≤ t} → Φ(t).
Moreover, the convergence is uniform on R.

Proof. We prove convergence of the characteristic functions of Zn to that of Z. Set Yk = (Xk − µ)/σ; then Zn = (Y1 + ··· + Yn)/√n and
  φ_{Zn}(t) = [φ_{Y1}(t/√n)]ⁿ.
Now φ_{Y1}(t/√n) = 1 − t²/(2n) + o(n⁻¹); recalling that (1 + x/n)ⁿ → e^x, we obtain
  φ_{Zn}(t) ≈ (1 − t²/(2n))ⁿ → e^{−t²/2} = φZ(t).
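The convergence established in the proof can also be seen experimentally. The sketch below is not part of the original notes and assumes Python with numpy and scipy; it simulates standardized sums Zn of Exp(1) variables (so µ = σ = 1) and compares empirical probabilities P(Zn ≤ t) with Φ(t).

```python
# Sketch (not from the notes): standardized sums of Exp(1) variables compared
# with the standard normal c.d.f., illustrating convergence in law (CLT).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
mu, sigma = 1.0, 1.0                      # mean and standard deviation of Exp(1)

def standardized_sums(n, reps=50_000):
    """reps independent copies of Z_n = (S_n - n*mu) / (sigma * sqrt(n))."""
    s = rng.exponential(scale=1.0, size=(reps, n)).sum(axis=1)
    return (s - n * mu) / (sigma * np.sqrt(n))

for n in (5, 30, 200):
    z = standardized_sums(n)
    for t in (-1.0, 0.0, 1.0):
        print(n, t, np.mean(z <= t), norm.cdf(t))   # empirical P(Z_n <= t) vs Phi(t)
```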
Berry–Esseen theorem

The rate of convergence in the CLT is given by the following theorem.

Theorem (Berry–Esseen). Assume X1, X2, ... are i.i.d. r.v.'s with E(X1) =: µ, Var(X1) =: σ² > 0 and E(|X1|³) =: ρ < ∞, and set Zn := (X1 + ··· + Xn − nµ)/(σ√n). Then for all n ∈ N
  sup_{x∈R} |P{Zn ≤ x} − Φ(x)| ≤ ρ/(σ³√n).

Remark (on sample sizes). For most distributions, the CLT gives a good approximation of Zn by N(0, 1) if n ≥ 30.

Opinion polls

Task: estimate the fraction p supporting topic T. Let Ik be the indicator of support for the k-th respondent; the Ik are i.i.d. Bernoulli. Set p̂ := Mn = (I1 + ··· + In)/n, the relative fraction of supporters. Then by the CLT
  Zn := √n (p̂ − p)/√(p(1 − p)) → Z ~ N(0, 1),
and therefore
  P{|p̂ − p| ≥ ε} = P{|Zn| ≥ ε√n/√(p(1 − p))} ≈ 2 Φ(−ε√n/√(p(1 − p))).
To make this at most .05 with ε = .03, we need ε√n/√(p(1 − p)) ≥ 1.96; since p(1 − p) ≤ 1/4, it suffices to have √n ≥ 1.96/(2 · 0.03), i.e., n ≥ 1067.

Example: post office overweight

Assume that a post office gets approximately 100 parcels every day, whose weights are uniformly distributed in the interval from 1 to 10 kg. What is the probability that the total weight will not be greater than 500 kg?

Solution. Denote by Xn the weight of the n-th parcel and set S100 := X1 + ··· + X100. As E(Xk) = 11/2 and Var(Xk) = 27/4, we conclude that
  P(S100 ≤ 500) = P(S100 − 550 ≤ −50) = P((S100 − 550)/(10√(27/4)) ≤ −10/(3√3)) ≈ Φ(−1.925) ≈ 0.027.
Note that the (one-sided) Chebyshev inequality only gives
  P(S100 ≤ 500) = P(S100 − 550 ≤ −50) ≤ 27·25/(27·25 + 50²) ≈ 0.213.
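Both calculations above can be reproduced numerically. The following sketch is not part of the original notes and assumes Python with numpy and scipy; it recomputes the poll sample size and the post-office probability, and adds a Monte Carlo check of the latter.

```python
# Sketch (not from the notes): the poll sample size and the post-office
# probability from the last two slides, computed numerically.
import numpy as np
from scipy.stats import norm

# Opinion poll: smallest n with 2*Phi(-eps*sqrt(n)/sqrt(p(1-p))) <= alpha
# in the worst case p(1-p) = 1/4, i.e. sqrt(n) >= z_{0.975}/(2*eps).
eps, alpha = 0.03, 0.05
z = norm.ppf(1 - alpha / 2)                                       # about 1.96
print("poll sample size:", int(np.ceil((z / (2 * eps)) ** 2)))    # 1068 (the notes round to 1067)

# Post office: P(S_100 <= 500) for a sum of 100 i.i.d. U(1, 10) weights.
n, mu, var = 100, 11 / 2, 27 / 4
print("normal approx   :", norm.cdf((500 - n * mu) / np.sqrt(n * var)))   # about 0.027

# Monte Carlo check of the same probability (SLLN in action).
rng = np.random.default_rng(2)
sums = rng.uniform(1.0, 10.0, size=(200_000, n)).sum(axis=1)
print("simulation      :", np.mean(sums <= 500))
```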