Almost Sure Convergence
Document Details
University of Calcutta
Dr. Rahul Bhattacharya
Summary
This document provides a detailed explanation of almost sure convergence in probability theory. It covers related modes of convergence, the weak and strong laws of large numbers, and the Borel-Cantelli lemma, and explains when the stochastic big O and small o notations are used.
Full Transcript
Large sample Inference: Module 11

What we provide in this module:
- Almost sure convergence
- Weak law of large numbers
- Strong law of large numbers
- Borel-Cantelli Lemma

Co-Ordinator: Dr. Rahul Bhattacharya, Department of Statistics, Calcutta University

1 Convergence almost surely

Convergence in probability does not ensure $\lim X_n = X$ with probability 1, and hence we have another mode of convergence, known as almost sure convergence.

Almost sure convergence: Consider a sequence of random variables $X_n$, $n \ge 1$, and a random variable $X$, all defined on a common sample space. Then $X_n$ is said to converge almost surely to $X$, written $X_n \xrightarrow{a.s.} X$ (equivalently, $X_n - X \xrightarrow{a.s.} 0$), if for every $\epsilon > 0$

$$P\Big(\sup_{m \ge n} d(X_m, X) > \epsilon\Big) \to 0 \quad \text{as } n \to \infty.$$

This is a stronger mode of convergence because $\sup_{m \ge n} d(X_m, X) \xrightarrow{P} 0$ implies $d(X_n, X) \xrightarrow{P} 0$.

1.1 An example

Consider $X_n \overset{iid}{\sim} R(0,1)$, $n \ge 1$, and let $X_{(1:n)} = \min(X_1, \dots, X_n)$. Then it is well known that $E(X_{(1:n)}) = \frac{1}{n+1} \to 0$ for large $n$. Now consider the quantity $P(|X_{(1:m)} - 0| \le \epsilon \ \forall\, m \ge n)$. Since $X_{(1:m)} \in [0,1]$ with probability one, we get $|X_{(1:m)} - 0| \in [0,1]$ with probability one. Then for $\epsilon \ge 1$, the above probability is simply one. Now assume $0 < \epsilon < 1$; then the above probability reduces to $P(X_{(1:m)} \le \epsilon \ \forall\, m \ge n)$. Since $X_{(1:n)} \ge X_{(1:n+1)}$, we have

$$P(X_{(1:m)} \le \epsilon \ \forall\, m \ge n) = P(X_{(1:n)} \le \epsilon).$$

Now $P(X_{(1:n)} \le \epsilon) = 1 - P(X_{(1:n)} > \epsilon) = 1 - (1-\epsilon)^n \to 1$, and hence $X_{(1:n)} \xrightarrow{a.s.} 0$.
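To see this convergence numerically, here is a minimal Python sketch (not part of the original module; it assumes numpy is available) that estimates $P(X_{(1:n)} \le \epsilon)$ by simulation and compares it with the exact value $1 - (1-\epsilon)^n$ derived above.

```python
# Minimal simulation sketch of the example above: the running minimum of
# iid R(0,1) variables is nonincreasing in n, so sup_{m >= n} X_(1:m)
# equals X_(1:n), and P(X_(1:n) <= eps) should approach 1 like
# 1 - (1 - eps)^n.
import numpy as np

rng = np.random.default_rng(0)
eps = 0.05
reps = 10_000  # number of independently simulated sequences

for n in (10, 50, 200):
    # each row is one realisation of (X_1, ..., X_n); take its minimum X_(1:n)
    mins = rng.uniform(0.0, 1.0, size=(reps, n)).min(axis=1)
    empirical = (mins <= eps).mean()
    exact = 1.0 - (1.0 - eps) ** n
    print(f"n={n:4d}  P(X_(1:n) <= eps) ~ {empirical:.4f}  exact {exact:.4f}")
```

The empirical and exact probabilities agree closely and both tend to one as $n$ grows, in line with the computation above.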
2 Convergence in the r-th mean

Consider random variables $X_n$ and $X$ defined on a common sample space such that $E|X_n|^r < \infty$ for some $r > 0$. Then $X_n$ is said to converge in the $r$-th mean to $X$ if

$$E|X_n - X|^r \to 0,$$

provided $E|X|^r < \infty$. In notation, we write $X_n \xrightarrow{r} X$. It should be noted that if $X_n \xrightarrow{r} X$ for some $r > 0$, then $X_n \xrightarrow{P} X$. However, the converse may not always be true.

3 Stochastic order relations

We have already introduced the big O and small o notations for ordinary sequences. Similar notations can be introduced to simplify the notions regarding stochastic convergence.

Stochastic big O: For a sequence of random variables $X_n$, if for every $\epsilon > 0$ there exist a positive constant $K(\epsilon)$ and an integer $n_0 = n_0(\epsilon)$ such that

$$P(|X_n| < K(\epsilon)) \ge 1 - \epsilon, \quad n \ge n_0(\epsilon),$$

then $X_n = O_p(1)$, i.e. $X_n$ is bounded in probability. For sequences of random variables $X_n$ and $Y_n$, if for every $\epsilon > 0$ there exist a positive constant $K(\epsilon)$ and an integer $n_0 = n_0(\epsilon)$ such that

$$P\bigg(\Big|\frac{X_n}{Y_n}\Big| < K(\epsilon)\bigg) \ge 1 - \epsilon, \quad n \ge n_0(\epsilon),$$

then $X_n = O_p(Y_n)$.

Stochastic small o: For a sequence of random variables $X_n$, if for every $\epsilon > 0$ and $\eta > 0$ there exists a positive integer $n_0 = n_0(\epsilon, \eta)$ such that

$$P(|X_n| > \eta) < \epsilon, \quad n \ge n_0(\epsilon, \eta),$$

then $X_n = o_p(1)$. It is easy to observe that $X_n = o_p(1) \Leftrightarrow X_n \xrightarrow{P} 0$. One can use the Euclidean norm to extend the idea to vector-valued random variables. For sequences of random variables $X_n$ and $Y_n$, if for every $\epsilon > 0$ and $\eta > 0$ there exists a positive integer $n_0 = n_0(\epsilon, \eta)$ such that

$$P\bigg(\Big|\frac{X_n}{Y_n}\Big| > \eta\bigg) < \epsilon, \quad n \ge n_0(\epsilon, \eta),$$

then $X_n = o_p(Y_n)$.

For a sequence of random variables $X_n$, if $X_n = o_p(1)$ then $X_n = O_p(1)$ also, but the converse is not true in general. As an example, consider $X \sim N(0,1)$ and define $X_n = (-1)^n X$. Then $X_n \overset{D}{=} X$ and hence $X_n = O_p(1)$. But $P(|X_n| > \epsilon) = 2\{1 - \Phi(\epsilon)\} \ne 0$, and hence we do not have the relation $X_n = o_p(1)$.

We often use representations like $X_n = Y_n + R_n$, where either $R_n \to 0$ in probability or $nR_n \to 0$ in probability. Using small o notation, we can write these as $X_n = Y_n + o_p(1)$ and $X_n = Y_n + o_p(1/n)$, respectively.

4 Convergence of a series of random variables

We have already discussed sequences of random variables and their convergence. However, like ordinary series, we can also define a series with random variables, whose terms may be either independent or dependent. Often our interest lies in the asymptotic behaviour of the average or some standardised version of it. Consequently, we have laws of large numbers and central limit theorems.

4.1 Weak law of large numbers & law of averages

Let $X_n$, $n \ge 1$, be a sequence of random variables and let $S_n = \sum_{k=1}^{n} X_k$. Consider a sequence of constants $b_n$, $n \ge 1$, where $b_n$ is positive, nondecreasing and diverging to $+\infty$. Then $X_n$, $n \ge 1$, satisfies the weak law of large numbers (WLLN) with respect to $b_n$ if there exists a sequence of real constants $a_n$ such that

$$\frac{S_n - a_n}{b_n} \xrightarrow{P} 0$$

as $n \to \infty$. The $a_n$ are called centering constants and the $b_n$ are called norming constants. Most often we take $a_n = E(S_n)$ and $b_n = \sqrt{Var(S_n)}$.

Suppose a random experiment $A$ is performed $n$ times and $f_n$ is the sample frequency. Naturally, nothing can be predicted about the rate of occurrence $P(A)$ based on a single trial. However, considering the averages $f_n/n$, $n \ge 1$, we arrive at an experiment in which the outcome can be predicted with high accuracy. The laws of large numbers are therefore often described as the law of averages.

4.1.1 Different WLLNs

Depending on different models, different variations of the WLLN exist. We provide a few of them below.

Markov WLLN: Consider a sequence of random variables $X_n$ with $E(X_k) = \mu_k$. If $Var(S_n)/n^2 \to 0$, then

$$\frac{S_n}{n} - \frac{\sum_{k=1}^{n} \mu_k}{n} \xrightarrow{P} 0,$$

that is, $X_n$, $n \ge 1$, satisfies the WLLN. If, in addition, the $X_n$ are independent, we have Chebyshev's WLLN.

Poisson's WLLN: Consider a sequence of independent random variables $X_n$ with $X_n \sim Bin(1, \mu_n)$. Then

$$\frac{S_n}{n} - \frac{\sum_{k=1}^{n} \mu_k}{n} \xrightarrow{P} 0.$$

If $\mu_k = \mu$ for all $k$, the above WLLN reduces to that of Bernoulli.

Khintchine WLLN: Consider a sequence of iid random variables $X_n$ with $E(X_1) = \mu < \infty$. Then

$$\frac{S_n}{n} \xrightarrow{P} \mu.$$

4.1.2 An example

Consider tossing a coin $n$ times. If $f_n$ is the number of heads turned up, then $f_n/n$ gives the proportion of heads in $n$ throws. Naturally, nothing is known about the probability of a head at the outset. The law of large numbers predicts that $f_n/n$ will be close to the unknown probability for large $n$; thus the value of $f_n/n$ for large $n$ provides a close idea about the unknown probability.

For better understanding, we simulate the tosses of a coin with success probability 0.7. For varying $n$ (50, 500 and 5000, clockwise), we have constructed the histogram of the proportion of heads in $n$ trials. It is easy to observe that the distribution of the observed success proportion becomes more and more concentrated about the true proportion 0.7 as we increase $n$.

[Figure 1: Convergence of observed success proportion — histograms of the observed proportion of heads for n = 50, 500 and 5000.]
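The concentration shown in Figure 1 is easy to reproduce. The following minimal Python sketch (not part of the original module; it assumes numpy is available) simulates many independent runs of $n$ tosses with success probability 0.7 and reports how the spread of the observed proportion shrinks as $n$ grows.

```python
# Minimal sketch mirroring Figure 1: the standard deviation of the observed
# proportion of heads is sqrt(p(1-p)/n), so it shrinks like 1/sqrt(n) and
# the histograms concentrate around the true value 0.7 as n increases.
import numpy as np

rng = np.random.default_rng(1)
p, reps = 0.7, 10_000  # true success probability, number of simulated runs

for n in (50, 500, 5000):
    # number of heads in each run is Binomial(n, p); divide by n to get f_n/n
    props = rng.binomial(n, p, size=reps) / n
    print(f"n={n:5d}  mean {props.mean():.4f}  sd {props.std():.4f}")
```

The reported standard deviation drops by roughly a factor of $\sqrt{10}$ at each step, which is the concentration visible in the histograms.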
4.2 Strong law of large numbers (SLLN)

Let $X_n$, $n \ge 1$, be a sequence of random variables and let $S_n = \sum_{k=1}^{n} X_k$. Consider a sequence of constants $b_n$, $n \ge 1$, where $b_n$ is positive and diverging to $+\infty$. Then $X_n$, $n \ge 1$, satisfies the strong law of large numbers (SLLN) with respect to $b_n$ if there exists a sequence of real constants $a_n$ such that

$$\frac{S_n - a_n}{b_n} \xrightarrow{a.s.} 0$$

as $n \to \infty$. As with the WLLN we take $a_n = E(S_n)$, but unlike the WLLN we take $b_n = n$.

4.2.1 SLLN & Borel-Cantelli Lemma

Almost sure convergence is not easy to check in a straightforward way, and hence we need some sufficient conditions. The following result provides a way to establish almost sure convergence.

Borel-Cantelli Lemma: If, for a sequence of events $A_n$, $n \ge 1$, $\sum_{n=1}^{\infty} P(A_n) < \infty$, then $P(\limsup_n A_n) = 0$. However, if the $A_n$, $n \ge 1$, are pairwise independent with $\sum_{n=1}^{\infty} P(A_n) = \infty$, then $P(\limsup_n A_n) = 1$.

Observe that

$$X_n - X \xrightarrow{a.s.} 0 \iff P\big(\limsup_n \{|X_n - X| > \epsilon\}\big) = 0 \ \text{for every } \epsilon > 0,$$

and hence we can use the above lemma to establish almost sure convergence.
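For instance, revisiting the example of Section 1.1: with $A_n = \{X_{(1:n)} > \epsilon\}$ for $0 < \epsilon < 1$, we have $P(A_n) = (1-\epsilon)^n$, so

$$\sum_{n=1}^{\infty} P(A_n) = \sum_{n=1}^{\infty} (1-\epsilon)^n = \frac{1-\epsilon}{\epsilon} < \infty,$$

while for $\epsilon \ge 1$ the sum is trivially zero. The first part of the lemma then gives $P(\limsup_n A_n) = 0$ for every $\epsilon > 0$, which is exactly the criterion displayed above, and hence $X_{(1:n)} \xrightarrow{a.s.} 0$, in agreement with the direct computation in Section 1.1.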