Lecture: Moment Generating Function, Prelim 2 (PDF)
Cornell University
Summary
These lecture notes cover moment generating functions, Laplace transforms, and characteristic functions of random variables, with examples and key properties. They also review the law of large numbers, the important discrete and continuous distributions, transformations of random variables and random vectors, and conditional distributions, conditional expectation, and computation by conditioning.
Full Transcript
Moment generating function, Laplace transform and characteristic function

For a random variable X, the moment generating function ϕ_X is
ϕ_X(t) = ϕ(t) = E(e^{tX}).
The reason for the name: for every n = 1, 2, ...,
(d^n/dt^n) ϕ_X(t) |_{t=0} = E(X^n),
the nth moment of X.

Example
X: the standard exponential random variable, with density f_X(x) = e^{−x}, x ≥ 0. Compute the moment generating function and the moments of X.

The Laplace transform of a nonnegative random variable X:
L_X(t) = E(e^{−tX}) = ϕ_X(−t), t ≥ 0.
The Laplace transform is always finite.

Properties of the moment generating function and of the Laplace transform

1. Both the moment generating function and the Laplace transform uniquely determine the cdf of a random variable. When asked to find the distribution of a random variable, the moment generating function or the Laplace transform are acceptable answers.

2. If X and Y are independent, then ϕ_{X+Y}(t) = ϕ_X(t) ϕ_Y(t). The moment generating function of a sum of independent random variables equals the product of the moment generating functions of these random variables. The Laplace transform has the same property.

Example
X_1, ..., X_n: independent standard exponential random variables, Y = X_1 + ... + X_n. Compute the Laplace transform of Y.

Question
Which of the following statements characterize a standard exponential random variable X?
A. The density f_X(x) = e^{−x}, x ≥ 0
B. The moment generating function ϕ_X(t) = 1/(1 − t), t < 1
C. The Laplace transform L_X(t) = 1/(1 + t), t ≥ 0
D. All of the above

For some random variables neither the moment generating function ϕ_X(t) = E(e^{tX}) nor the Laplace transform is defined:
if t > 0, then e^{tX} becomes very large if X is very large;
if t < 0, then e^{tX} becomes very large if X is very small (very negative).
The characteristic function of a random variable X is always defined.

The characteristic function ψ_X(t) of a random variable X is
ψ_X(t) = ψ(t) = E(e^{itX}), i = √(−1).
Since |e^{ia}| = 1 for any real a, the characteristic function of a random variable is always defined. Characteristic functions have properties and applications similar to those of moment generating functions and Laplace transforms.

Example
Compute the characteristic function of the standard uniform random variable X.

Law of Large Numbers

X_1, X_2, ...: independent and identically distributed (iid) random variables with mean µ = E(X_j) and variance σ² = Var(X_j). The sample mean of the first n observations:
X̄_n = (X_1 + X_2 + ... + X_n)/n, n = 1, 2, ....

Recall that
E(X̄_n) = µ, Var(X̄_n) = σ²/n, n = 1, 2, ....
The sample mean becomes more and more concentrated around the (population) true mean µ.

The Law of Large Numbers: the sample mean of a sequence of iid random variables with mean µ converges to this true mean as the sample size n grows:
X̄_n → µ as n → ∞.

1. Markov's inequality
For a nonnegative random variable X and a number a > 0,
P(X > a) ≤ E(X)/a.

2. Chebyshev's inequality
For a random variable X with mean µ and a number h > 0,
P(|X − µ| > h) ≤ Var(X)/h².

Question
Suppose X is nonnegative, E(X) = 2. Which of the following is necessarily true?
A. P(|X − 2| > 1) ≤ 1/3
B. P(X > 10) ≤ 1/5
C. P(X > 10) ≥ 1/5
D. More than one statement is necessarily true

3. The Weak Law of Large Numbers
For any ϵ > 0,
P(|X̄_n − µ| > ϵ) → 0 as n → ∞.

If a random variable does not have a finite expectation, averaging does not lead to reduced variability.
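The following is a minimal simulation sketch (not from the notes; it assumes NumPy, and uses the standard exponential as a finite-mean case and the standard Cauchy as a case with no finite mean) illustrating the contrast shown in the figure below.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_means(draw, n, reps=5):
    """Return `reps` independent sample means, each based on n draws."""
    return np.array([draw(n).mean() for _ in range(reps)])

for n in (1, 10, 100, 10_000):
    finite = sample_means(lambda size: rng.exponential(size=size), n)     # mean exists (mu = 1)
    heavy = sample_means(lambda size: rng.standard_cauchy(size=size), n)  # no finite mean
    print(f"n={n:>6}  exponential sample means: {np.round(finite, 3)}")
    print(f"          Cauchy sample means:      {np.round(heavy, 3)}")
```

As n grows, the exponential sample means settle near µ = 1, while the Cauchy sample means keep fluctuating no matter how large n becomes.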
[Figures: simulated sample-mean paths for n = 1 (black), n = 10 (red), n = 100 (blue); one panel for a distribution with finite mean, one for a distribution with infinite mean.]

Review of the important discrete random variables

Important discrete random variables are based on Bernoulli trials: a sequence of independent trials with probability p of success in each trial.

The Binomial random variable: counts the number of successes in n trials. Notation: X ∼ Bin(n, p).
p_X(k) = C(n, k) p^k (1 − p)^{n−k}, k = 0, 1, ..., n.
A Bin(1, p) random variable is called Bernoulli.

If X ∼ Bin(n, p), then EX = np, Var(X) = np(1 − p).

Question
10 balls are drawn from a box containing 50 black balls and 50 white balls. X is the number of white balls drawn. Is X a binomial random variable?
a. Yes, if the balls are drawn without replacement.
b. Yes, if the balls are drawn with replacement.
c. Yes, in any case.
d. No, no matter what.

The Negative Binomial random variable: counts the number of trials until the nth success. Notation: X ∼ NB(n, p).
p_X(k) = C(k − 1, n − 1) p^n (1 − p)^{k−n}, k = n, n + 1, ....
An NB(1, p) random variable is called Geometric.

If X ∼ NB(n, p), then EX = n/p, Var(X) = n(1 − p)/p².

The Hypergeometric random variable: counts the number of successes in n trials in a finite population. N objects of type A and M objects of type B. A sample of n ≤ N + M objects is drawn without replacement. A success = drawing a type A object.

p_X(k) = C(N, k) C(M, n − k) / C(N + M, n), max(n − M, 0) ≤ k ≤ min(n, N).
EX = nN/(N + M); Var(X) is complicated.

Question
A box contains N white balls and M black balls. We draw n balls. X counts the number of white balls drawn. Is Var(X) larger when the balls are drawn with or without replacement?
a. Var(X) is larger when the balls are drawn with replacement
b. Var(X) is larger when the balls are drawn without replacement
c. Var(X) is the same in both cases
d. None of the above is correct

The Poisson random variable: the limiting case of the Binomial random variable as n → ∞, p → 0, np → λ > 0. Notation: X ∼ Poiss(λ).
p_X(k) = e^{−λ} λ^k / k!, k = 0, 1, 2, ....
EX = λ, Var(X) = λ.

Reproducing property of Poisson random variables
X_1, X_2, ..., X_n: independent Poisson random variables with parameters λ_1, λ_2, ..., λ_n. Then the sum Y = X_1 + X_2 + ... + X_n has the Poisson distribution with parameter λ_1 + λ_2 + ... + λ_n.

Review of the important continuous random variables

1. Continuous models on a bounded interval

The Uniform random variable: takes values in a bounded interval (a, b) and has the constant density
f_X(x) = 1/(b − a) if a ≤ x ≤ b, and 0 otherwise.
Notation: X ∼ U(a, b).
EX = (a + b)/2, Var(X) = (b − a)²/12.

Question
Does a U(−∞, ∞) model exist?
a. Yes
b. No
c. The model does not exist but it is still used.
d. More than one answer is correct.

The Beta random variable: offers flexible shapes of a density on a bounded interval. A random variable X has the Beta distribution on [0, 1] with parameters α > 0 and β > 0 if it has the density
f_X(x) = x^{α−1}(1 − x)^{β−1} / B(α, β) if 0 < x < 1, and 0 otherwise.
Notation: X ∼ Beta(α, β).

B(α, β) is the Beta function:
B(α, β) = ∫_0^1 x^{α−1}(1 − x)^{β−1} dx = Γ(α)Γ(β)/Γ(α + β), α, β > 0.
Γ(α) is the Gamma function:
Γ(α) = ∫_0^∞ x^{α−1} e^{−x} dx, α > 0.

Beta(1, 1) = U(0, 1).

Mean and variance: for X ∼ Beta(α, β),
EX = α/(α + β), Var(X) = αβ / ((α + β)²(α + β + 1)).
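As a quick sanity check of these two formulas, here is a small Monte Carlo sketch (my own, assuming NumPy; the shape parameters are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
a, b = 2.0, 5.0                       # arbitrary shape parameters alpha, beta

x = rng.beta(a, b, size=200_000)
mean_formula = a / (a + b)
var_formula = a * b / ((a + b) ** 2 * (a + b + 1))

print("mean:    ", round(x.mean(), 4), "  formula:", round(mean_formula, 4))
print("variance:", round(x.var(), 5), "  formula:", round(var_formula, 5))
```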
The Beta distribution with parameters α > 0, β > 0 over an interval [a, b] has the density
f_X(x) = (1/(b − a)) (1/B(α, β)) ((x − a)/(b − a))^{α−1} ((b − x)/(b − a))^{β−1}
for a < x < b, and 0 otherwise.

2. Continuous models on unbounded intervals

The Exponential random variable: parameter λ > 0, takes values in (0, ∞), with density
f_X(x) = λe^{−λx} if x ≥ 0, and 0 if x < 0.
Notation: X ∼ Exp(λ).
EX = 1/λ, Var(X) = 1/λ².

Lack of memory of the exponential random variable
If X ∼ Exp(λ), then for any x, y > 0,
P(X > x + y | X > x) = P(X > y) = e^{−λy}.

Question
Does a uniform random variable U(a, b) have the lack-of-memory property?
a. Yes
b. No
c. It depends on the parameters a, b
d. More than one answer is correct

Competition between exponential random variables
X_1, X_2, ..., X_n: independent exponential random variables, X_i ∼ Exp(λ_i), i = 1, 2, ..., n. Y = min(X_1, X_2, ..., X_n). Questions:
1. Which one of the X_i is the smallest (equal to Y)?
2. What is the distribution of Y?

Distribution of Y: Y ∼ Exp(λ_1 + ... + λ_n).

The probability that X_i is the smallest is proportional to its parameter λ_i:
P(min(X_1, X_2, ..., X_n) = X_i) = λ_i / (λ_1 + λ_2 + ... + λ_n), for i = 1, 2, ..., n.

The time of the winner is independent of the identity of the winner:
P(min(X_1, X_2, ..., X_n) = X_i | Y = y) = λ_i / (λ_1 + λ_2 + ... + λ_n), for i = 1, 2, ..., n.

Example
Suppose that the lifetimes of 3 light bulbs follow the exponential distribution with means 1000 hours, 800 hours and 600 hours, respectively. Assuming that the lifetimes are independent, compute the expected time until one of the light bulbs burns out, and the probability that the first one to burn out is the bulb with the longest expected lifetime.

Question
Famous runners, a snail and a hare, keep competing in a race. Suppose that, on average, it takes the snail 2 hours to complete the race, while the hare completes the race, on average, in 10 seconds. On a particular day nobody could finish the race faster than in 2 hours.
a. This makes it more likely that the snail won the race that day.
b. This makes it less likely that the snail won the race that day.
c. This does not change the likelihood that the snail won the race that day.
d. The answer depends on more than just the average times of the snail and the hare.

The Gamma random variable: shape parameter α > 0 and scale parameter λ > 0, takes values in (0, ∞); the pdf
f_X(x) = λ(λx)^{α−1} e^{−λx} / Γ(α) if x ≥ 0, and 0 if x < 0.
Notation: X ∼ Gamma(α, λ). A Gamma(1, λ) random variable is an Exp(λ) random variable.

Properties: Let X ∼ Gamma(α, λ). If the shape parameter α > 0 is an integer, X is an Erlang random variable with α degrees of freedom and scale λ.
Mean and variance: EX = α/λ, Var(X) = α/λ².
Moment generating function: ϕ_X(t) = E(e^{tX}) = (λ/(λ − t))^α for t < λ.

Reproducing property of Gamma random variables
X_1, X_2, ..., X_n: independent Gamma random variables with the same scale λ > 0 and shape parameters α_1, α_2, ..., α_n. Then the sum Y = X_1 + X_2 + ... + X_n has the Gamma distribution with the same scale λ > 0 and shape α_1 + α_2 + ... + α_n.

The Normal random variable: mean µ, variance σ²; can take any real value, with density
f_X(x) = (1/(σ√(2π))) e^{−(x−µ)²/(2σ²)} for −∞ < x < ∞.
Notation: X ∼ N(µ, σ²).
The N(0, 1) random variable: standard normal, with density
f_X(x) = (1/√(2π)) e^{−x²/2} for −∞ < x < ∞.

The moment generating function of a normal random variable
If X ∼ N(µ, σ²), then
ϕ_X(t) = E(e^{tX}) = e^{µt + σ²t²/2}, for all −∞ < t < ∞.
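This closed form is easy to check against a Monte Carlo estimate of E(e^{tX}); a small sketch of my own, assuming NumPy, with arbitrary parameter values:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma = 1.5, 2.0                  # arbitrary mean and standard deviation

x = rng.normal(mu, sigma, size=500_000)
for t in (-0.5, 0.1, 0.4):
    mc = np.exp(t * x).mean()                        # Monte Carlo estimate of E(e^{tX})
    formula = np.exp(mu * t + sigma**2 * t**2 / 2)   # e^{mu*t + sigma^2 t^2 / 2}
    print(f"t = {t:+.1f}:  Monte Carlo {mc:.4f},  formula {formula:.4f}")
```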
Normal random variables stay normal after a linear transformation. If X ∼ N(µ, σ²) and Y = aX + b, then Y ∼ N(aµ + b, a²σ²).

Question
True or false? If X ∼ N(µ, σ²), then −X has the same density as X.
a. True
b. False
c. Only true if µ = 0
d. Only true if µ ≠ 0

Reproducing property of Normal random variables
X_1, X_2, ..., X_n: independent, X_i ∼ N(µ_i, σ_i²), i = 1, 2, ..., n. Y = X_1 + ... + X_n. Then Y has the Normal distribution, with mean µ = µ_1 + µ_2 + ... + µ_n and variance σ² = σ_1² + σ_2² + ... + σ_n².

A linear combination of two independent normal random variables
X_1 ∼ N(µ_1, σ_1²), X_2 ∼ N(µ_2, σ_2²) independent normal random variables; a, b numbers; Y = aX_1 + bX_2. Then Y ∼ N(aµ_1 + bµ_2, a²σ_1² + b²σ_2²).

Linear combinations of several independent normal random variables
X_i ∼ N(µ_i, σ_i²), i = 1, ..., n, independent normal random variables; a_1, ..., a_n numbers; Y = Σ_{i=1}^n a_i X_i. Then
Y ∼ N(Σ_{i=1}^n a_i µ_i, Σ_{i=1}^n a_i² σ_i²).

Normal approximations

1. The normal approximation to the binomial distribution. X ∼ Bin(n, p), n is large, p is not too close to either 0 or 1. Then
Y = (X − np) / √(np(1 − p))
has, approximately, the standard normal distribution N(0, 1). Rule of thumb: np > 5 and n(1 − p) > 5.

2. The normal approximation to the Poisson distribution. X ∼ Poiss(λ), λ large. Then
Y = (X − λ) / √λ
has, approximately, the standard normal distribution N(0, 1). Rule of thumb: λ > 5.

Hierarchy of approximations
hypergeometric ⇒ binomial ⇒ Poisson ⇒ normal.

Question
Under what conditions can the hypergeometric distribution with parameters N, M, n be directly approximated by the normal distribution?
a. N, M, n are all large, and n is much larger than N, M
b. N, M, n are all large, and n is much smaller than N, M
c. N, M, n are not very large
d. The hypergeometric distribution cannot be directly approximated by the normal distribution

Continuous models related to the normal distribution

The chi-square random variable with n degrees of freedom: takes values in (0, ∞), with density
f_X(x) = x^{n/2 − 1} e^{−x/2} / (2^{n/2} Γ(n/2)) for x > 0.
Notation: X ∼ χ²_n.

Relation to the normal distribution: Y_1, ..., Y_n iid standard normal random variables. Then X = Y_1² + ... + Y_n² has the chi-square distribution with n degrees of freedom.

Relation to the Gamma distribution: χ²_n = Gamma(n/2, 1/2).
Reproducing property: X_1, ..., X_n independent chi-square, X_i with k_i degrees of freedom, i = 1, ..., n. Then Y = X_1 + ... + X_n is chi-square with k = k_1 + ... + k_n degrees of freedom.

Question
True or false: some exponential random variable is a chi-square random variable.
a. True
b. False
c. Any exponential random variable is a chi-square random variable
d. More than one answer is correct

The t random variable with n degrees of freedom: can take any real value, with density
f_X(x) = Γ((n+1)/2) / (√(nπ) Γ(n/2)) · (1 + x²/n)^{−(n+1)/2} for −∞ < x < ∞.
Also known as: the Student t random variable.

Relation to the normal distribution: X a standard normal, Y chi-square with n degrees of freedom, X independent of Y; then the random variable
T = X / √(Y/n)
is a Student t random variable with n degrees of freedom.
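A short simulation sketch (my own, assuming NumPy) that builds T = X/√(Y/n) from a standard normal and a sum of squared standard normals, and checks it against two standard facts about the t distribution, ET = 0 and Var(T) = n/(n − 2) for n > 2:

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 5, 400_000                  # degrees of freedom, number of replications

x = rng.standard_normal(reps)                            # X ~ N(0, 1)
y = (rng.standard_normal((reps, n)) ** 2).sum(axis=1)    # Y = sum of n squared normals ~ chi-square(n)
t = x / np.sqrt(y / n)                                    # T = X / sqrt(Y / n)

print("sample mean of T:    ", round(t.mean(), 4), "  (theory: 0)")
print("sample variance of T:", round(t.var(), 4), f"  (theory n/(n-2) = {n / (n - 2):.4f})")
```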
The F random variable with n_1 and n_2 degrees of freedom: takes values in (0, ∞); the density is complicated. Related to the normal distribution through the chi-square distribution: X_1 a chi-square random variable with n_1 degrees of freedom, X_2 a chi-square random variable with n_2 degrees of freedom, X_1 independent of X_2; then the random variable
Z = (X_1/n_1) / (X_2/n_2)
is an F random variable with n_1 and n_2 degrees of freedom.

Transformations of random variables

Statement of the problem: X a random variable with a known distribution, e.g. a known cdf, or a known pmf, or a known pdf. Y = T(X) a function, or a transformation, of X. Find the distribution of Y.

Computing the cdf of Y = T(X)
If X is discrete with pmf p_X, then
F_Y(y) = P(Y ≤ y) = P(T(X) ≤ y) = Σ_{x_i : T(x_i) ≤ y} p_X(x_i), −∞ < y < ∞.
If X is continuous with pdf f_X, then
F_Y(y) = P(Y ≤ y) = P(T(X) ≤ y) = ∫_{x : T(x) ≤ y} f_X(x) dx, −∞ < y < ∞.

Example
Suppose that X is uniformly distributed between −1 and 1. Find the distribution of Y = X².

This approach sometimes works even for transformations of several random variables.
Example: Let X and Y be two independent standard exponential random variables, and Z = T(X, Y) = X + Y. Find the distribution of Z.

Computing the pmf or pdf of Y = T(X)
If the computation of the cdf of Y = T(X) is not practical:
in the discrete case, try computing the pmf of Y = T(X);
in the continuous case, try computing the pdf of Y = T(X).

If X is discrete with probability mass function p_X: Y = T(X) is also discrete; the pmf of Y is
p_Y(y_j) = P(Y = y_j) = P(T(X) = y_j) = Σ_{x_i : T(x_i) = y_j} p_X(x_i).
If T is one-to-one:
p_Y(y_j) = p_X(T^{−1}(y_j))
(T^{−1} is the inverse map).

Example
Suppose X takes the values −1, 0 and 1 with probability 1/3 each. Find the distribution of Y = |X|.

Question
True or false? If X is continuous and Y = T(X), then Y is also continuous.
a. Always true
b. Always false
c. Sometimes true and sometimes false
d. None of the above

If X is continuous with pdf f_X: sometimes Y = T(X) is also continuous. To compute the density of Y, adjust by the derivative of the inverse transformation.

Monotone transformations
T is either increasing or decreasing on the range of X. A monotone function T is automatically one-to-one. The pdf of Y:
f_Y(y) = f_X(T^{−1}(y)) · |dT^{−1}(y)/dy|.

Example
X a standard exponential random variable. Compute the density of Y = 1/X².

This method may work even when the function T is not monotone. The equation T(x) = y may have several roots, T_1^{−1}(y), T_2^{−1}(y), .... Each of the roots contributes to the density of Y = T(X):
f_Y(y) = Σ_i f_X(T_i^{−1}(y)) · |dT_i^{−1}(y)/dy|.

Example
X standard normal, Y = T(X), with T(x) = −x if x ≤ 0 and T(x) = 2x if x > 0. Compute the pdf of Y.

Question
Suppose that X is standard normal and Y is the sign of X (1 if X ≥ 0 and −1 if X < 0). How would you compute the distribution of Y?
a. Find the inverse T^{−1}(x) and its derivative to compute the pdf of Y
b. Compute the cdf of Y directly
c. Compute the pmf of Y
d. More than one of these approaches works
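For the example above with T(x) = −x for x ≤ 0 and 2x for x > 0, the roots of T(x) = y for y > 0 are x = −y and x = y/2, so the root-summation formula gives f_Y(y) = f_X(−y)·1 + f_X(y/2)·(1/2). The following sketch (my own, assuming NumPy) checks this density against a direct simulation of Y:

```python
import numpy as np

rng = np.random.default_rng(4)

def f_Y(y):
    """Root-summation density of Y = T(X) for y > 0: roots x = -y and x = y/2."""
    phi = lambda u: np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)   # standard normal pdf
    return phi(-y) * 1.0 + phi(y / 2) * 0.5

x = rng.standard_normal(1_000_000)
y = np.where(x <= 0, -x, 2 * x)                              # Y = T(X)

for a, b in [(0.0, 0.5), (0.5, 1.0), (1.0, 2.0)]:
    empirical = np.mean((y > a) & (y <= b))
    grid = np.linspace(a, b, 2001)
    vals = f_Y(grid)
    integral = np.sum((vals[:-1] + vals[1:]) / 2 * np.diff(grid))   # trapezoid rule
    print(f"P({a} < Y <= {b}):  simulation {empirical:.4f},  density formula {integral:.4f}")
```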
Transformations of random vectors

Statement of the problem: (X_1, ..., X_k) a random vector with a known joint pmf (if discrete) or a known joint pdf (if continuous). (Y_1, ..., Y_k) = T(X_1, ..., X_k) a function, or a transformation, of (X_1, ..., X_k). Find the joint pmf of (Y_1, ..., Y_k) in the discrete case; find the joint pdf of (Y_1, ..., Y_k) in the continuous case.

If (X_1, ..., X_k) is discrete: (Y_1, ..., Y_k) = T(X_1, ..., X_k) is also discrete. The joint pmf of (Y_1, ..., Y_k):
p_{Y_1,...,Y_k}(y_{j_1}, ..., y_{j_k}) = Σ p_{X_1,...,X_k}(x_{i_1}, ..., x_{i_k}),
where the sum is over all (x_{i_1}, ..., x_{i_k}) with T(x_{i_1}, ..., x_{i_k}) = (y_{j_1}, ..., y_{j_k}).

(X_1, ..., X_k) continuous, with joint pdf f_{X_1,...,X_k}(x_1, ..., x_k). Sometimes (Y_1, ..., Y_k) = T(X_1, ..., X_k) is also continuous. Instead of adjusting by the derivative of the inverse transformation, adjust by the Jacobian of the inverse transformation.

Suppose the function T is one-to-one on the range of (X_1, ..., X_k). The notation for the inverse transformation:
(x_1, ..., x_k) = T^{−1}(y_1, ..., y_k) = (h_1(y_1, ..., y_k), ..., h_k(y_1, ..., y_k)).

Compute the Jacobian of the inverse transformation: the determinant
J_{T^{−1}}(y_1, ..., y_k) = det [ ∂h_i/∂y_j ]
of the k × k matrix whose (i, j) entry is ∂h_i/∂y_j.
The joint density of (Y_1, ..., Y_k):
f_{Y_1,...,Y_k}(y_1, ..., y_k) = f_{X_1,...,X_k}(T^{−1}(y_1, ..., y_k)) · |J_{T^{−1}}(y_1, ..., y_k)|.

Example
X_1 and X_2: independent standard exponential random variables, Y_1 = X_1 + X_2, Y_2 = X_1/X_2. Compute the joint pdf of Y_1 and Y_2.

The range of (Y_1, Y_2) might be an issue.
Example
(X_1, X_2) continuous, with joint pdf f_{X_1,X_2}(x_1, x_2) = 1 if 0 < x_1, x_2 < 1, and 0 otherwise. Y_1 = X_1 + X_2, Y_2 = X_1 − X_2. Compute the joint pdf of (Y_1, Y_2).

Question
True or false? If (X_1, ..., X_k) are uniformly distributed in some region and T is a linear transformation, then (Y_1, ..., Y_k) are uniformly distributed in some region.
a. True
b. False
c. True only in some cases
d. More than one answer is correct.

Standard transformations

The sum of independent random variables
X_1, X_2: independent continuous random variables with densities f_{X_1}, f_{X_2}. Z = X_1 + X_2 has the density
f_Z(z) = ∫_{−∞}^{∞} f_{X_1}(z − v) f_{X_2}(v) dv,
the convolution of the densities f_{X_1} and f_{X_2}.

Example
X, Y iid exponential with parameter λ. Find the pdf of the sum Z = X + Y.

The difference of independent random variables
Z = X_1 − X_2 has the density
f_Z(z) = ∫_{−∞}^{∞} f_{X_1}(z + v) f_{X_2}(v) dv.

Question
Is it true that the distribution of the sum of any two random variables can be computed by the convolution formula?
a. True
b. Only if the random variables are independent
c. Only if the random variables are continuous
d. Only if the random variables are continuous and independent

The product of independent random variables
X_1, X_2 independent continuous random variables with densities f_{X_1}, f_{X_2}. Z = X_1 X_2 has the density
f_Z(z) = ∫_{−∞}^{∞} (1/|v|) f_{X_1}(z/v) f_{X_2}(v) dv.

Example
X, Y: iid standard uniform. Find the pdf of the product Z = XY.

The ratio of independent random variables
Z = X_1/X_2 has the density
f_Z(z) = ∫_{−∞}^{∞} |v| f_{X_1}(zv) f_{X_2}(v) dv.
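For the sum-of-exponentials example in this block (X, Y iid Exp(λ), Z = X + Y), the convolution integral can be evaluated numerically and compared with the Gamma(2, λ) density λ²z e^{−λz} predicted by the reproducing property of the Gamma distribution. A small sketch of my own, assuming NumPy, with λ chosen arbitrarily:

```python
import numpy as np

lam = 1.5                                        # arbitrary rate parameter

def f_exp(x):
    """Exp(lam) density."""
    x = np.asarray(x, dtype=float)
    return np.where(x >= 0, lam * np.exp(-lam * np.abs(x)), 0.0)

for z in (0.5, 1.0, 2.0, 4.0):
    v = np.linspace(0.0, z, 2001)                # f_X(z - v) f_Y(v) vanishes outside 0 <= v <= z
    vals = f_exp(z - v) * f_exp(v)
    conv = np.sum((vals[:-1] + vals[1:]) / 2 * np.diff(v))   # trapezoid rule for the convolution
    gamma2 = lam**2 * z * np.exp(-lam * z)                   # Gamma(2, lam) density at z
    print(f"z = {z}:  convolution {conv:.5f},  Gamma(2, lam) density {gamma2:.5f}")
```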
Conditional distributions

(X, Y): a random vector. A marginal distribution vs. a conditional distribution:
the marginal distribution of X consists of probabilities associated with X separately, without regard to Y;
if the value y of Y is known, the conditional distribution of X given Y = y consists of conditional probabilities associated with X given that value of Y.

Question
Suppose that X is a continuous random variable and Y = 2X. Then the conditional distribution of Y given X = x is
a. discrete
b. continuous
c. can be either discrete or continuous
d. none of the above

Conditional distributions in the discrete case
(X, Y) discrete, with joint pmf p_{X,Y}(x_i, y_j). Then
P(X = x_i | Y = y_j) = p_{X,Y}(x_i, y_j) / p_Y(y_j)
(p_Y is the marginal pmf of Y). This is the conditional pmf of X given Y = y_j. Notation: p_{X|Y}(x_i | y_j).

The conditional pmf of X given Y = y_j:
p_{X|Y}(x_i | y_j) = p_{X,Y}(x_i, y_j) / p_Y(y_j),
the ratio of the joint pmf of X and Y and the marginal pmf of Y.

The conditional pmf of Y given X = x_i:
p_{Y|X}(y_j | x_i) = p_{X,Y}(x_i, y_j) / p_X(x_i),
the ratio of the joint pmf of X and Y and the marginal pmf of X.

If X and Y are independent, then:
p_{X|Y}(x_i | y_j) = p_X(x_i) for all x_i;
p_{Y|X}(y_j | x_i) = p_Y(y_j) for all y_j.
If X and Y are independent, the conditional distributions coincide with the marginal distributions.

Conditional expectation and conditional variance
Conditional expectation E(X | Y = y_j) is the expectation with respect to the conditional distribution:
E(X | Y = y_j) = Σ_{x_i} x_i p_{X|Y}(x_i | y_j).

Conditional variance Var(X | Y = y_j) is the variance with respect to the conditional distribution:
Var(X | Y = y_j) = E(X² | Y = y_j) − (E(X | Y = y_j))²,
where E(X² | Y = y_j) = Σ_{x_i} x_i² p_{X|Y}(x_i | y_j).

Example (sequel)
X, Y: the daily demand for two partially substitutable items. The joint and marginal pmfs:

x_i \ y_j      0      1      2      3   |  p_X(x_i)
0            .840   .030   .020   .010  |   .900
1            .060   .010   .008   .002  |   .080
2            .010   .005   .004   .001  |   .020
p_Y(y_j)     .910   .045   .032   .013  |    1

1. Compute the conditional pmf of Y given X = 2;
2. compute the conditional mean E(Y | X = 2) and the conditional variance Var(Y | X = 2).

Conditional distributions in the continuous case
(X, Y) continuous, with joint pdf f_{X,Y}(x, y). Conditionally on Y = y, the random variable X is continuous. The conditional density is denoted by f_{X|Y}(x|y). The conditional pdf is
f_{X|Y}(x|y) = f_{X,Y}(x, y) / f_Y(y).

It is only legitimate to condition on values y in the range of Y. The conditional pdf of X given Y = y:
f_{X|Y}(x|y) = f_{X,Y}(x, y) / f_Y(y),
the ratio of the joint pdf of X and Y and the marginal pdf of Y.

The conditional pdf of Y given X = x:
f_{Y|X}(y|x) = f_{X,Y}(x, y) / f_X(x),
the ratio of the joint pdf of X and Y and the marginal pdf of X.

If X and Y are independent, the conditional densities are the same as the marginal densities:
f_{X|Y}(x|y) = f_X(x) for all y in the range of Y,
f_{Y|X}(y|x) = f_Y(y) for all x in the range of X.

Question
Suppose that X and Y are independent standard uniform random variables. Then
a. f_{X|Y}(x|y) = 1 if 0 < x < 1, for all 0 < y < 1
b. f_{X|Y}(x|y) = 1 if 0 ≤ x ≤ 1, for all 0 ≤ y ≤ 1
c. f_{X|Y}(x|y) = 1 if 0 ≤ x ≤ 1, for all 0 < y < 1
d. All of the above are true

Conditional expectation and conditional variance
Conditional expectation E(X | Y = y): the expectation with respect to the conditional distribution,
E(X | Y = y) = ∫_{−∞}^{∞} x f_{X|Y}(x|y) dx.

Conditional variance Var(X | Y = y): the variance with respect to the conditional distribution,
Var(X | Y = y) = E(X² | Y = y) − (E(X | Y = y))²,
where E(X² | Y = y) = ∫_{−∞}^{∞} x² f_{X|Y}(x|y) dx.

Example
(X, Y) continuous, with joint pdf f_{X,Y}(x, y) = 15xy², 0 ≤ x, y ≤ 1, y ≤ x.
Compute the conditional densities f_{X|Y} and f_{Y|X};
compute the conditional mean and the conditional variance of Y given X = x for 0 < x < 1.
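The conditional quantities in this last example can be checked by one-dimensional numerical integration. In the sketch below (my own, assuming NumPy), the closed forms used for comparison, E(Y | X = x) = 3x/4 and Var(Y | X = x) = 3x²/80, are my own hand computation from f_{Y|X}(y|x) = 3y²/x³, not quoted from the notes:

```python
import numpy as np

def f_xy(x, y):
    """Joint density f_{X,Y}(x, y) = 15 x y^2 on 0 <= y <= x <= 1."""
    return np.where((y >= 0) & (y <= x) & (x <= 1), 15 * x * y**2, 0.0)

x0 = 0.6                                         # condition on X = x0, any 0 < x0 < 1
y = np.linspace(0.0, x0, 5001)
dy = np.diff(y)

def integrate(vals):                             # trapezoid rule on the y-grid
    return np.sum((vals[:-1] + vals[1:]) / 2 * dy)

f_x0 = integrate(f_xy(x0, y))                    # marginal f_X(x0)
cond = f_xy(x0, y) / f_x0                        # conditional density f_{Y|X}(y | x0)
mean = integrate(y * cond)                       # E(Y | X = x0)
var = integrate(y**2 * cond) - mean**2           # Var(Y | X = x0)

print("E(Y | X = 0.6)   =", round(mean, 5), "  hand computation 3*x0/4    =", 3 * x0 / 4)
print("Var(Y | X = 0.6) =", round(var, 6), "  hand computation 3*x0**2/80 =", 3 * x0**2 / 80)
```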
Question
Suppose that X is standard normal and Y = 2X. Which of the following statements is true?
a. E(Y | X = x) = 0, Var(Y | X = 2) = 4
b. E(Y | X = x) = 2x, Var(Y | X = 2) = 0
c. E(Y | X = x) = 2x, Var(Y | X = 2) = 4x²
d. None of the above is true

Computation of expectation and variance via conditioning

Using conditional distributions can sometimes simplify the computation of means and variances. The idea: E(Y | X = x) and Var(Y | X = x) are functions of the value x of the random variable X. The conditional mean and the conditional variance can therefore be viewed as random variables.

Suppose X and Y are continuous, with joint pdf f_{X,Y}. Then
EY = ∫_{−∞}^{∞} ∫_{−∞}^{∞} y f_{X,Y}(x, y) dx dy = ∫_{−∞}^{∞} E(Y | X = x) f_X(x) dx.

Abbreviated expression:
EY = E(E(Y | X));
true for any random variables (X, Y): discrete, continuous or mixed. Terminology: the formula of double expectation.

The exact meaning of the formula of double expectation:
EY = E(E(Y | X)) = Σ_{x_i} E(Y | X = x_i) p_X(x_i) if X is discrete with pmf p_X;
EY = E(E(Y | X)) = ∫_{−∞}^{∞} E(Y | X = x) f_X(x) dx if X is continuous with pdf f_X.

Question
Choose the correct claim(s). The double expectation formula EY = E(E(Y | X)) is valid
a. only if X and Y are of the same type (both discrete or both continuous)
b. only if X is one-dimensional
c. always
d. none of the above is correct

Example
The number of claims arriving at an insurance company in a week is a Poisson random variable with mean 20. The amounts of different claims are independent exponentially distributed random variables with mean 800. Assume that the claim amounts are independent of the number of claims arriving in a week. Compute the expected total amount of claims received in a week.

The formula of double expectation EY = E(E(Y | X)) is a device to compute expectations by conditioning. There is also a device to compute variances by conditioning: for any two random variables X and Y,
Var(Y) = E(Var(Y | X)) + Var(E(Y | X)).

The exact meaning of the terms in the formula for the variance:
E(Var(Y | X)) = Σ_{x_i} Var(Y | X = x_i) p_X(x_i) if X is discrete with pmf p_X;
E(Var(Y | X)) = ∫_{−∞}^{∞} Var(Y | X = x) f_X(x) dx if X is continuous with pdf f_X.

Var(E(Y | X)) = Σ_{x_i} (E(Y | X = x_i))² p_X(x_i) − (Σ_{x_i} E(Y | X = x_i) p_X(x_i))² if X is discrete with pmf p_X;
Var(E(Y | X)) = ∫_{−∞}^{∞} (E(Y | X = x))² f_X(x) dx − (∫_{−∞}^{∞} E(Y | X = x) f_X(x) dx)² if X is continuous with pdf f_X.

Example (sequel)
The number of claims arriving at an insurance company in a week is a Poisson random variable with mean 20; the amounts of different claims are independent exponentially distributed random variables with mean 800. Compute the variance of the total claims received in a week.

Question
Choose the correct statement(s).
a. EY = E(E(Y | X))
b. Var(Y) = Var(Var(Y | X))
c. Both are correct
d. None are correct
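A Monte Carlo sketch of the insurance example (my own, assuming NumPy; 20 and 800 are the lecture's parameters). Since a claim amount is exponential with mean 800, its variance is 800², and Var(N) = 20 for the Poisson count, so the two conditioning formulas above give EY = 20·800 and Var(Y) = 20·800² + 20·800²:

```python
import numpy as np

rng = np.random.default_rng(5)
lam, mean_claim, weeks = 20, 800.0, 100_000      # lecture's parameters; 100,000 simulated weeks

counts = rng.poisson(lam, size=weeks)            # N: number of claims in a week
totals = np.array([rng.exponential(mean_claim, size=n).sum() for n in counts])

mean_formula = lam * mean_claim                                  # E(N) * E(claim)
var_formula = lam * mean_claim**2 + lam * mean_claim**2          # E(N)Var(claim) + Var(N)(E claim)^2

print("mean of weekly total:    ", round(totals.mean()), "  formula:", mean_formula)
print("variance of weekly total:", round(totals.var()), "  formula:", var_formula)
```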
The Law of Total Probability in the context of random variables

X, Y, Z, ... random variables; some may be discrete and some others continuous; A an event related to X, Y, Z, .... The law of total probability: compute the probability of A by conditioning on one (or more) of the random variables.

Conditioning on a discrete random variable
Suppose X is discrete, with pmf p_X. Then
P(A) = Σ_{x_i} p_X(x_i) P(A | X = x_i).

Conditioning on a continuous random variable
Suppose X is continuous, with pdf f_X. Then
P(A) = ∫_{−∞}^{∞} f_X(x) P(A | X = x) dx.

Question
Suppose that X is discrete, Y is continuous, and A an event related to X and Y. Choose the correct statement(s).
a. One can compute P(A) by conditioning on X
b. One can compute P(A) by conditioning on Y
c. One can compute P(A) by conditioning on X and Y at the same time.
d. Any of the above are possible.

Example (Competition of exponential random variables)
X_1, X_2, ..., X_n: independent exponential random variables; X_i ∼ Exp(λ_i), i = 1, 2, ..., n. Compute the probability that X_i is the smallest among X_1, X_2, ..., X_n.
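To close, a simulation sketch (my own, assuming NumPy) for this example, using the rates from the light-bulb example earlier (mean lifetimes 1000, 800 and 600 hours). It estimates P(X_i is the smallest) and compares with λ_i/(λ_1 + λ_2 + λ_3), and estimates the expected time of the first burnout and compares with 1/(λ_1 + λ_2 + λ_3):

```python
import numpy as np

rng = np.random.default_rng(6)
means = np.array([1000.0, 800.0, 600.0])         # expected lifetimes from the light-bulb example
rates = 1.0 / means                              # lambda_i = 1 / mean_i

lifetimes = rng.exponential(means, size=(500_000, 3))   # column i: Exp with mean means[i]
winner = lifetimes.argmin(axis=1)                        # which bulb burns out first in each run

for i in range(3):
    print(f"bulb {i} (mean {means[i]:.0f}h): simulated {np.mean(winner == i):.4f},",
          f"formula {rates[i] / rates.sum():.4f}")

print("expected time to first burnout:", round(lifetimes.min(axis=1).mean(), 1),
      "  formula 1/(lambda_1+lambda_2+lambda_3):", round(1.0 / rates.sum(), 1))
```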