Discrete Probability Distributions PDF

Summary

This document introduces discrete probability distributions, covering topics such as Bernoulli, Binomial, Poisson, and Geometric distributions. It defines key concepts like random variables, probability mass functions, and cumulative distribution functions, including examples and formulas to explain each type of distribution.

Full Transcript


Chapter 1: Discrete Probability Distributions

1.1 Introduction

Random variables are numerical quantities that depend on the outcome of a random experiment. They are used to model uncertain phenomena, such as the lifetime of an electronic component, the number of customers in a store, or the score of a candidate on an exam. Discrete random variables take distinct values in a finite or countable set: for example, the face obtained when rolling a die, the number of children in a family, or the number of customers who arrive in a store. We will see in particular how to define the Probability Mass Function of a discrete random variable and its properties, which describes the distribution of possible values and their chances of occurrence. To study the properties and behavior of discrete random variables, we use mathematical tools such as the Distribution Function, Expectation, Variance, Standard Deviation and the Moment Generating Function. In this chapter, we define these notions and illustrate them with examples. We also present some common discrete probability distributions that serve as models for many applications, such as the Bernoulli, Binomial, Poisson and Geometric distributions.

1.2 Random variable

1.2.1 Introductory example

As an introductory example, consider the experiment of throwing two six-sided dice. A possible outcome is a couple $(x_1, x_2)$, where $x_i \in \{1, 2, 3, 4, 5, 6\}$, $i = 1, 2$. If we are interested in the sum of the two realised numbers $x_1$ and $x_2$, we associate to each elementary event $\omega$ of $\Omega$ a positive integer by

$$X(\omega) = x_1 + x_2,$$

where $X$ is a function from $\Omega$ to $\mathbb{N}$.

Now suppose that $\Omega$ is the fundamental space corresponding to a certain experiment. The results of the experiment need not be numbers, but it is often desirable to assign a specific number to each outcome: for example, the sum of the points obtained by throwing a pair of dice, the number of aces in a hand in a game of bridge, or the life (in hours) of a light bulb. Such an assignment is called a random variable. More precisely, the following definition formalizes this approach.

Definition 1.1. A random variable $X$ on a fundamental space $\Omega$ is a function from $\Omega$ into the set $\mathbb{R}$ of real numbers, such that for each interval $I$ of $\mathbb{R}$ the inverse image $X^{-1}(I) = \{\omega \in \Omega : X(\omega) \in I\}$ is an event of $\Omega$.

1.2.2 Discrete Random Variable

Definition 1.2 (Discrete set). A discrete set is a finite or countably infinite set.

For example, the set of positive integers $\mathbb{N}$ is a discrete set since it is countably infinite. The set of possible values of the experiment of rolling a die, $\Omega = \{1, 2, 3, 4, 5, 6\}$, is a finite set and is therefore a discrete set.

Definition 1.3. A discrete random variable $X$ on a fundamental space $\Omega$ is a real function defined on $\Omega$ with values in a discrete set $\mathcal{X} = X(\Omega) \subset \mathbb{R}$:

$$X : \Omega \to \mathcal{X}, \qquad \omega \mapsto X(\omega).$$

If $\Omega$ is a discrete set, the set $\mathcal{X}$ of possible values of the random variable (r.v.) $X$ is countable, and this justifies the qualification of discrete random variable. We denote $\mathcal{X} = \{x_i\}_{i \in I}$, where $I$ is a set of indices.

The abbreviated notation $P(X = a)$ designates the probability of the event "X takes the value a", and $P(a \le X \le b)$ the probability of the event "X takes its value in the interval [a, b]". More precisely:

$$P(X = a) = P(\{\omega \in \Omega : X(\omega) = a\}), \qquad P(a \le X \le b) = P(\{\omega \in \Omega : a \le X(\omega) \le b\}).$$

The symbols $P(X \ge a)$, $P(X > a)$, $P(X \le a)$, $P(X < a)$, $P(a < X \le b)$, $P(a < X < b)$, $P(a \le X < b)$ have similar meanings.
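As an added illustration (not part of the original notes), the short Python sketch below makes Definitions 1.1 and 1.3 concrete for the two-dice example: it enumerates the fundamental space $\Omega$, defines $X(\omega) = x_1 + x_2$ as an ordinary function, and computes $P(X = a)$ and $P(a \le X \le b)$ by counting outcomes in the corresponding inverse images, assuming equiprobable outcomes.

```python
from itertools import product
from fractions import Fraction

# Fundamental space: all ordered couples (x1, x2) of two fair six-sided dice.
omega = list(product(range(1, 7), repeat=2))

# The random variable X as a function from Omega to the integers.
def X(outcome):
    x1, x2 = outcome
    return x1 + x2

def prob(event):
    """P(event) for an equiprobable finite space: |event| / |Omega|."""
    return Fraction(len(event), len(omega))

# P(X = 7): probability of the inverse image {omega : X(omega) = 7}.
print(prob([w for w in omega if X(w) == 7]))        # 1/6
# P(4 <= X <= 6)
print(prob([w for w in omega if 4 <= X(w) <= 6]))   # 1/3
```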
1.3 Probability Mass Function

Definition 1.4. The Probability Mass Function (PMF) or Probability Distribution of a discrete random variable $X$ is the numerical sequence of probabilities $p_X(k) = P(X = k)$ for all $k \in X(\Omega)$.

Properties 1.1. The probability mass function has the following properties:

1. For all $k \in X(\Omega)$, $0 \le p_X(k) \le 1$;
2. $\sum_{k \in X(\Omega)} p_X(k) = 1$.

If we can write $X(\Omega) = \{x_1, x_2, \ldots, x_n\}$, the probability distribution of $X$ can be presented in a table of the form:

| $x_i$      | $x_1$ | $x_2$ | ... | $x_{n-1}$ | $x_n$ | $\sum_{i=1}^{n} p_X(x_i)$ |
|------------|-------|-------|-----|-----------|-------|---------------------------|
| $p_X(x_i)$ | $p_1$ | $p_2$ | ... | $p_{n-1}$ | $p_n$ | 1                         |

Example 1.1. We toss a fair coin and look at the top face. There are two possible outcomes, Heads or Tails, so the universe is $\Omega = \{\text{Heads}, \text{Tails}\}$. Let the random variable $X$ associate the value 0 with Heads and the value 1 with Tails. We therefore have a discrete r.v. with values in $\mathcal{X} = X(\Omega) = \{0, 1\}$. The two faces of the coin being equiprobable, the probability mass function of $X$ is given by $p_X(1) = \frac{1}{2}$ and $p_X(0) = \frac{1}{2}$.

| $x_i$      | 0   | 1   | $\sum_{i=1}^{2} p_X(x_i)$ |
|------------|-----|-----|---------------------------|
| $p_X(x_i)$ | 1/2 | 1/2 | 1                         |

Example 1.2. We roll two fair dice and denote by $X$ the sum of the results of the two dice. The set of values of the r.v. $X$ is $\mathcal{X} = X(\Omega) = \{2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12\}$. The probability distribution of $X$ is given in the following table:

| $k$      | 2    | 3    | 4    | 5    | 6    | 7    | 8    | 9    | 10   | 11   | 12   | $\sum_{k \in \mathcal{X}} p_X(k)$ |
|----------|------|------|------|------|------|------|------|------|------|------|------|-----------------------------------|
| $p_X(k)$ | 1/36 | 2/36 | 3/36 | 4/36 | 5/36 | 6/36 | 5/36 | 4/36 | 3/36 | 2/36 | 1/36 | 1                                 |

1.4 Cumulative Distribution Function

Definition 1.5 (Cumulative Distribution Function). Let $X$ be a discrete real r.v. We call Cumulative Distribution Function (CDF) of $X$ the function $F_X$ defined on $\mathbb{R}$ by:

$$F_X(x) = P(X \le x) = \sum_{x_i \le x,\; x_i \in \mathcal{X}} p_X(x_i).$$

This function is increasing and

$$F_X(-\infty) = \lim_{x \to -\infty} F_X(x) = 0 \quad \text{and} \quad F_X(+\infty) = \lim_{x \to +\infty} F_X(x) = 1.$$

Remark 1.1. The cumulative distribution function of $X$ depends only on the probability mass function of $X$. Conversely, it characterizes this probability mass function.

Example 1.3. Consider Example 1.2. The cumulative distribution function $F_X$ is given for all $x \in \mathbb{R}$ by:

$$F_X(x) = \begin{cases}
0, & \text{if } x < 2 \\
1/36, & \text{if } 2 \le x < 3 \\
3/36, & \text{if } 3 \le x < 4 \\
6/36, & \text{if } 4 \le x < 5 \\
10/36, & \text{if } 5 \le x < 6 \\
15/36, & \text{if } 6 \le x < 7 \\
21/36, & \text{if } 7 \le x < 8 \\
26/36, & \text{if } 8 \le x < 9 \\
30/36, & \text{if } 9 \le x < 10 \\
33/36, & \text{if } 10 \le x < 11 \\
35/36, & \text{if } 11 \le x < 12 \\
1, & \text{if } x \ge 12.
\end{cases} \tag{1.1}$$

Proposition 1.1. Let $X$ be a discrete r.v. We have:

1. $P(X > x) = 1 - F_X(x)$;
2. If $a < b$, then $P(a < X \le b) = F_X(b) - F_X(a)$;
3. If $X$ is a r.v. taking values in $\mathcal{X} = \{x_1, x_2, \ldots, x_n\}$ with $x_1 < x_2 < \cdots < x_n$, then $p_X(x_1) = P(X = x_1) = F_X(x_1)$ and, for all $i \in \{2, \ldots, n\}$, $p_X(x_i) = F_X(x_i) - F_X(x_{i-1})$.
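As a quick numerical cross-check of Examples 1.2 and 1.3 and of point 3 of Proposition 1.1 (an added sketch, not from the original notes), the following Python code builds the PMF of the sum of two dice by enumeration, accumulates it into the CDF, and recovers the PMF from successive differences of the CDF.

```python
from itertools import product
from fractions import Fraction

outcomes = list(product(range(1, 7), repeat=2))

# PMF of X = sum of two fair dice, by direct enumeration.
pmf = {}
for x1, x2 in outcomes:
    s = x1 + x2
    pmf[s] = pmf.get(s, Fraction(0)) + Fraction(1, len(outcomes))

# CDF F_X(k) = sum of the PMF over values <= k.
values = sorted(pmf)
cdf, running = {}, Fraction(0)
for k in values:
    running += pmf[k]
    cdf[k] = running

print(cdf[6])                 # 5/12, i.e. 15/36 as in equation (1.1)
# Proposition 1.1, point 3: p_X(x_i) = F_X(x_i) - F_X(x_{i-1}).
assert all(pmf[k] == cdf[k] - cdf.get(k - 1, Fraction(0)) for k in values)
```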
1.5 Function of a discrete random variable

Let $X$ be a r.v. with values in a finite set $\mathcal{X}$, and $g$ a function from $\mathcal{X}$ with values in a finite set $\mathcal{Y}$. Then $g(X)$ defines a random variable with values in $\mathcal{Y}$ and its probability distribution is given by:

$$p_{g(X)}(y) = P\left(X \in g^{-1}(\{y\})\right) = \sum_{x \in g^{-1}(\{y\})} p_X(x).$$

Example 1.4. Let us take Example 1.2, and suppose that if the result of the r.v. $X$ is even then we lose 5\$, and if the result is odd then we win 10\$. We therefore have a function $g$ defined by:

$$g(x) = \begin{cases} -5, & \text{if } x \in \{2, 4, 6, 8, 10, 12\} \\ +10, & \text{if } x \in \{3, 5, 7, 9, 11\} \\ 0, & \text{otherwise.} \end{cases}$$

Then the r.v. $Y = g(X)$ takes its values in $\mathcal{Y} = \{-5, +10\}$, and

$$p_Y(-5) = P\left(X \in g^{-1}(\{-5\})\right) = \sum_{x \in \{2,4,6,8,10,12\}} p_X(x) = \frac{18}{36}.$$

In the same way,

$$p_Y(+10) = P\left(X \in g^{-1}(\{+10\})\right) = \sum_{x \in \{3,5,7,9,11\}} p_X(x) = \frac{18}{36}.$$

1.6 Expectation of a Discrete Random Variable

The mathematical Expectation, or mean, is a measure of the central tendency of a random variable.

Definition 1.6. Let $X$ be a discrete random variable with values in $\mathcal{X} = \{x_1, x_2, \ldots, x_n\}$. The Expectation of $X$, denoted by $E(X)$, is:

$$E(X) = x_1 \, p_X(x_1) + x_2 \, p_X(x_2) + \cdots + x_{n-1} \, p_X(x_{n-1}) + x_n \, p_X(x_n) \tag{1.2}$$
$$= \sum_{i=1}^{n} x_i \, p_X(x_i). \tag{1.3}$$

In general, for a countable set of values $\mathcal{X}$:

$$E(X) = \sum_{x_i \in \mathcal{X}} x_i \, p_X(x_i) \tag{1.4}$$

if the series is absolutely convergent, i.e., if

$$\sum_{x_i \in \mathcal{X}} |x_i| \, p_X(x_i) < +\infty. \tag{1.5}$$

Otherwise, the expectation is undefined.

This means that $E(X)$ is the weighted average of the values that $X$ can take, each value being weighted by its probability. The mean (or expected value) of a discrete random variable is also the value that we expect to observe per repetition, on average, if we perform the experiment a large number of times. For example, we may expect a car salesperson to sell, on average, 2.4 cars per week. This does not mean that every week this salesperson will sell exactly 2.4 cars (obviously one cannot sell exactly 2.4 cars). It simply means that if we observe for many weeks, the salesperson will sell a different number of cars during different weeks; however, the average over all these weeks will be 2.4 cars per week.

Example 1.5. We throw a pair of well-balanced dice. We obtain the finite equiprobable space $\Omega$ whose elements are the 36 ordered pairs of numbers from 1 to 6:

$$\Omega = \{(1,1), (1,2), \ldots, (6,6)\}.$$

To each point $(a, b)$ of $\Omega$ we associate the maximum $X$ of its components, i.e. $X(a, b) = \max(a, b)$. $X$ is then a random variable on $\Omega$, having as image space $\mathcal{X} = X(\Omega) = \{1, 2, 3, 4, 5, 6\}$, and

$$p_X(1) = P(X = 1) = P(\{(1,1)\}) = \frac{1}{36}$$
$$p_X(2) = P(X = 2) = P(\{(2,1), (2,2), (1,2)\}) = \frac{3}{36}$$
$$p_X(3) = P(X = 3) = P(\{(3,1), (3,2), (3,3), (2,3), (1,3)\}) = \frac{5}{36}$$
$$p_X(4) = P(X = 4) = P(\{(4,1), (4,2), (4,3), (4,4), (1,4), (2,4), (3,4)\}) = \frac{7}{36}.$$

In the same way,

$$p_X(5) = P(X = 5) = \frac{9}{36} \quad \text{and} \quad p_X(6) = P(X = 6) = \frac{11}{36}.$$

This information can be presented in the form of a table, as follows:

| $k$      | 1    | 2    | 3    | 4    | 5    | 6     | $\sum_{k \in \mathcal{X}} p_X(k)$ |
|----------|------|------|------|------|------|-------|-----------------------------------|
| $p_X(k)$ | 1/36 | 3/36 | 5/36 | 7/36 | 9/36 | 11/36 | 1                                 |

We then calculate the mean of $X$:

$$E(X) = 1 \times \frac{1}{36} + 2 \times \frac{3}{36} + 3 \times \frac{5}{36} + 4 \times \frac{7}{36} + 5 \times \frac{9}{36} + 6 \times \frac{11}{36} = \frac{161}{36} \approx 4.47.$$

Remark 1.2. There exist discrete random variables for which the expectation is not defined (because the series does not converge).

Example 1.6 (Undefined expectation). Let the r.v. $X$ be such that $\mathcal{X} = \mathbb{N}^*$ and, for all $x \in \mathbb{N}^*$,

$$p_X(x) = \frac{1}{x(x+1)}.$$

This is indeed a probability distribution since:

- for all $x \in \mathbb{N}^*$, $0 \le p_X(x) \le 1$;
- $\sum_{x=1}^{\infty} p_X(x) = \sum_{x=1}^{\infty} \frac{1}{x(x+1)} = \sum_{x=1}^{\infty} \left(\frac{1}{x} - \frac{1}{x+1}\right) = 1 - \frac{1}{2} + \frac{1}{2} - \frac{1}{3} + \cdots = 1.$

However, the series

$$\sum_{x=1}^{\infty} x \, p_X(x) = \sum_{x=1}^{\infty} \frac{1}{x+1}$$

diverges, and therefore $E(X)$ is undefined.

Remark 1.3. If a r.v. $X$ takes its values in $\mathbb{N}$, its expectation can also be written in the form:

$$E(X) = \sum_{k=0}^{\infty} (1 - F_X(k)) = \sum_{k=0}^{\infty} P(X > k). \tag{1.6}$$
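To see the "long-run average" interpretation of the expectation in action, the sketch below (an added illustration, not part of the original notes) computes $E(X)$ for Example 1.5 exactly via the weighted sum (1.3), and then approximates it by averaging a large number of simulated throws.

```python
import random
from fractions import Fraction
from itertools import product

# Exact expectation of X = max of two dice, via E(X) = sum_i x_i * p_X(x_i).
pmf = {}
for a, b in product(range(1, 7), repeat=2):
    m = max(a, b)
    pmf[m] = pmf.get(m, Fraction(0)) + Fraction(1, 36)
exact = sum(x * p for x, p in pmf.items())
print(exact, float(exact))        # 161/36, about 4.47

# Monte Carlo estimate: the sample mean approaches E(X) over many repetitions.
n = 100_000
mean = sum(max(random.randint(1, 6), random.randint(1, 6)) for _ in range(n)) / n
print(round(mean, 3))             # close to 4.47
```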
1.7 Common Discrete Distributions

1.7.1 Discrete Uniform Distribution

The discrete uniform distribution is associated with a system of equiprobable events. The r.v. $X$ follows a discrete uniform distribution of order $n$ if $\mathcal{X} = \{1, 2, \ldots, n\}$ and if

$$p_X(k) = \frac{1}{n}, \quad \forall\, k = 1, \ldots, n.$$

We denote $X \sim U(\mathcal{X})$. Its cumulative distribution function is given by:

$$F_X(x) = \begin{cases} 0, & \text{if } x < 0 \\ \dfrac{[x]}{n}, & \text{if } 0 \le x \le n \\ 1, & \text{if } x \ge n, \end{cases}$$

where $[x]$ is the integer part of $x$.

Expectation of the Discrete Uniform Distribution. If $X \sim U(\{1, 2, \ldots, n\})$ follows a uniform distribution with values in $\mathcal{X} = \{1, 2, \ldots, n\}$, its expectation is given by

$$E(X) = \sum_{i=1}^{n} \frac{i}{n} = \frac{1}{n} \cdot \frac{n(n+1)}{2} = \frac{n+1}{2}.$$

Example 1.7. If $X$ represents the result of a fair die roll, then $X \sim U(\{1, 2, 3, 4, 5, 6\})$ and, for all $k \in \{1, 2, 3, 4, 5, 6\}$,

$$p_X(k) = \frac{1}{6}.$$

The expectation is given by the formula

$$E(X) = \frac{n+1}{2} = \frac{6+1}{2} = 3.5.$$

1.7.2 Bernoulli Distribution

The Bernoulli distribution models an experiment with two possible outcomes. The value 1 is associated with a success (obtaining Tails, for example, in the coin toss experiment), and the value 0 with a failure.

Definition 1.7. Let $p \in [0, 1]$. We call the Bernoulli distribution with parameter $p$, denoted by $\mathcal{B}(p)$, the probability distribution defined on $\mathcal{X} = \{0, 1\}$ by:

$$p_X(x) = \begin{cases} p, & \text{if } x = 1 \\ 1 - p, & \text{if } x = 0. \end{cases}$$

Expectation of the Bernoulli distribution. Let $X \sim \mathcal{B}(p)$ be a Bernoulli random variable with parameter $p$, with $p_X(1) = p$ and $p_X(0) = 1 - p$. We have

$$E(X) = 1 \times p_X(1) + 0 \times p_X(0) = p.$$

Example 1.8. The toss of a fair coin follows a Bernoulli distribution $\mathcal{B}(\frac{1}{2})$ with parameter $p = \frac{1}{2}$. Its expectation is then given by

$$E(X) = 1 \times \frac{1}{2} + 0 \times \frac{1}{2} = \frac{1}{2}.$$

Example 1.9. An urn contains $m$ black balls and $n$ white balls. We randomly draw a ball from the urn and define the random variable

$$X = \begin{cases} 1, & \text{if we draw a black ball} \\ 0, & \text{if we draw a white ball.} \end{cases}$$

Then $X \sim \mathcal{B}\left(\frac{m}{m+n}\right)$, where the probability of success is $p = \frac{m}{m+n}$. The expectation is:

$$E(X) = p = \frac{m}{m+n}.$$

Example 1.10. If $A$ is an event from a fundamental space $\Omega$, the indicator variable of $A$,

$$I_A(\omega) = \begin{cases} 1, & \text{if } \omega \in A \\ 0, & \text{if } \omega \notin A, \end{cases}$$

is a Bernoulli r.v. with parameter $p = P(A)$. Success is defined as the outcome of the trial belonging to the event $A$. The expectation of the indicator variable of an event $A$ is equal to the probability of this event (probability of success):

$$E(I_A) = P(A).$$

Remark 1.4. If $X \sim \mathcal{B}(p)$ then $1 - X \sim \mathcal{B}(1 - p)$.

Remark 1.5. $\mathcal{B}(\frac{1}{2}) \equiv U(\{0, 1\})$.

1.7.3 Binomial distribution

The binomial distribution with parameters $n$ and $p$, $n \in \mathbb{N}^*$ and $0 \le p \le 1$, denoted $\mathcal{B}(n, p)$, is the probability distribution of a random variable equal to the number of successes encountered during $n$ successive independent Bernoulli trials, $p$ being the probability of success in each trial. For example, in the experiment of drawing balls with replacement from an urn, if the proportion of white balls in the urn is equal to $p$ and if the number of draws is equal to $n$, then the random variable $X$, "number of white balls drawn", is a Binomial random variable with parameters $(n, p)$.

Definition 1.8. Let $X$ be a r.v. following a Binomial distribution $\mathcal{B}(n, p)$. The set of possible values of $X$ is $X(\Omega) = \{0, 1, 2, \ldots, n\}$ and, for all $k \in \{0, 1, 2, \ldots, n\}$,

$$p_X(k) = C_n^k \, p^k (1 - p)^{n-k}.$$

We can verify that $\sum_{k=0}^{n} p_X(k) = 1$ using Newton's binomial formula:

$$\sum_{k=0}^{n} p_X(k) = \sum_{k=0}^{n} C_n^k \, p^k (1 - p)^{n-k} = (p + (1 - p))^n = 1.$$
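As a numerical sanity check of Definition 1.8 (an added sketch, not part of the original notes), the code below computes the $\mathcal{B}(n, p)$ probabilities with math.comb, confirms that they sum to 1, and compares $p_X(k)$ to the frequency of $k$ successes over many simulated runs of $n$ Bernoulli trials.

```python
import random
from math import comb

n, p = 10, 0.3

# Binomial PMF: p_X(k) = C(n, k) * p^k * (1 - p)^(n - k).
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
print(sum(pmf))                       # 1.0 (up to rounding), by Newton's binomial formula

# Simulate the number of successes in n independent Bernoulli(p) trials.
def binomial_draw():
    return sum(random.random() < p for _ in range(n))

trials = 100_000
freq3 = sum(binomial_draw() == 3 for _ in range(trials)) / trials
print(round(pmf[3], 4), round(freq3, 4))   # the two values should be close
```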
Expectation of the Binomial distribution. Let $X \sim \mathcal{B}(n, p)$ be a Binomial random variable with parameters $n$ and $p$. We have

$$E(X) = \sum_{k=0}^{n} k \, C_n^k \, p^k (1-p)^{n-k} = \sum_{k=1}^{n} k \, \frac{n!}{k!(n-k)!} \, p^k (1-p)^{n-k} = \sum_{k=1}^{n} \frac{n!}{(k-1)!\,((n-1)-(k-1))!} \, p^k (1-p)^{n-k}$$

$$= n \sum_{k=1}^{n} C_{n-1}^{k-1} \, p^k (1-p)^{n-k} = n \sum_{k=0}^{n-1} C_{n-1}^{k} \, p^{k+1} (1-p)^{n-1-k} = np \sum_{k=0}^{n-1} C_{n-1}^{k} \, p^{k} (1-p)^{n-1-k} = np\,(p + 1 - p)^{n-1} = np.$$

Hence, if $X \sim \mathcal{B}(n, p)$, then

$$E(X) = np. \tag{1.7}$$

Example 1.11. An urn contains $M$ black balls and $N$ white balls. We draw successively, with replacement, $n$ balls from the urn. The number $X$ of white balls obtained among these $n$ balls is a Binomial random variable $\mathcal{B}(n, p)$ with $p = \frac{N}{M+N}$, the proportion of white balls in the urn, and for all $k$, $0 \le k \le n$, we have

$$P(X = k) = p_X(k) = C_n^k \left(\frac{N}{M+N}\right)^k \left(1 - \frac{N}{M+N}\right)^{n-k}.$$

The expectation is given by formula (1.7):

$$E(X) = n \times p = n \times \frac{N}{M+N} = \frac{nN}{M+N}.$$

1.7.4 Geometric distribution

A typical experiment to introduce a geometric random variable is the following: we consider a sequence of independent Bernoulli trials with the same probability of success $p$; it then suffices to consider the r.v. that associates to each $\omega$ the number of Bernoulli trials executed until the first success.

In 1986, Weinberg and Gladen [1] studied the number of menstrual cycles it took women to become pregnant, measured from the moment they decided to become pregnant. We model the number of cycles until pregnancy by a random variable $X$. Suppose that the probability that a woman becomes pregnant in a particular cycle is equal to $p$, for some $p$ with $0 < p \le 1$, independently of previous cycles. Then clearly $P(X = 1) = p$. Because of the independence of consecutive cycles, we find for $k = 1, 2, \ldots$ that

$$P(X = k) = P(\text{no pregnancy in the first } k-1 \text{ cycles, pregnancy in the } k\text{-th}) = (1-p)^{k-1} p.$$

[1] C.R. Weinberg and B.C. Gladen. The beta-geometric distribution applied to comparative fecundability studies. Biometrics, 42(3):547–560, 1986.

Definition 1.9. The r.v. $X$ follows a geometric distribution with parameter $p$, denoted $\mathcal{G}(p)$, if $\mathcal{X} = \mathbb{N}^*$ and

$$p_X(k) = q^{k-1} p, \quad \text{where } q = 1 - p.$$

We easily verify that $\sum_{k=1}^{\infty} p_X(k) = 1$ because

$$\sum_{k=1}^{\infty} p_X(k) = \sum_{k=1}^{\infty} q^{k-1} p = p \sum_{j=0}^{\infty} q^j = \frac{p}{1-q} = 1.$$

The CDF of this variable is written:

$$F_X(x) = 1 - q^{[x]}.$$

Very close to the geometric distribution, the modified geometric distribution associates with $\omega$ the number of failures up to the first success. Thus, the r.v. $Z$ follows a modified geometric distribution with parameter $p$ if $\mathcal{X} = \mathbb{N}$ and if

$$p_Z(k) = q^k p.$$

Expectation of the Geometric distribution. If $X$ is a Geometric r.v. with parameter $p$, $X \sim \mathcal{G}(p)$, we have:

$$E(X) = \sum_{i=1}^{\infty} i (1-p)^{i-1} p = p \sum_{i=1}^{\infty} i (1-p)^{i-1} = p \, \frac{d}{dq}\left(\sum_{i=0}^{\infty} q^i\right) = p \, \frac{d}{dq}\left(\frac{1}{1-q}\right) = \frac{p}{(1-q)^2} = \frac{1}{p}.$$

Hence, if $X \sim \mathcal{G}(p)$, then:

$$E(X) = \frac{1}{p}. \tag{1.8}$$

For example, if you throw a fair die, it takes on average 6 throws to obtain a given face.

Memoryless property of the Geometric distribution. To introduce this property, let us assume that we observe a sequence of Bernoulli trials (independent and with the same probability of success) and that, after $n$ trials, no success has yet occurred. If we define the r.v. $Y$ as the number of additional trials to be performed to obtain the first success, we have $Y = X - n$, where $X \sim \mathcal{G}(p)$. Then the conditional distribution of $Y$ given $X > n$, i.e.,

$$p_Y(k) = P(Y = k \mid X > n) = q^{k-1} p,$$

is also the Geometric distribution with the same parameter. Indeed:

$$P(Y = k \mid X > n) = P(X = n + k \mid X > n) = \frac{P(\{X = n+k\} \cap \{X > n\})}{P(X > n)} = \frac{P(X = n+k)}{P(X > n)} \quad (\text{since } \{X = n+k\} \subset \{X > n\})$$

$$= \frac{p q^{n+k-1}}{1 - F_X(n)} = \frac{p q^{n+k-1}}{q^n} = p q^{k-1}.$$

The Geometric distribution is the only discrete distribution to possess this property; even the modified geometric distribution does not possess it.
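The simulation below (an added sketch, not from the original notes) draws many geometric variables by counting Bernoulli trials until the first success, checks that the sample mean is close to $1/p$ as in (1.8), and illustrates the memoryless property by comparing an estimate of $P(X > n + k \mid X > n)$ with an estimate of $P(X > k)$.

```python
import random

p, trials = 0.25, 200_000

def geometric(p):
    """Number of independent Bernoulli(p) trials until the first success."""
    k = 1
    while random.random() >= p:
        k += 1
    return k

samples = [geometric(p) for _ in range(trials)]
print(round(sum(samples) / trials, 3))          # close to 1/p = 4.0

# Memoryless property: P(X > n + k | X > n) is approximately P(X > k).
n, k = 3, 2
given = [x for x in samples if x > n]
lhs = sum(x > n + k for x in given) / len(given)
rhs = sum(x > k for x in samples) / trials
print(round(lhs, 3), round(rhs, 3))             # the two estimates should agree
```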
1.7.5 Negative Binomial distribution

A typical experiment to introduce this r.v. consists of considering a sequence of Bernoulli trials (independent and with the same probability of success) and defining the r.v. $X$ as the number of trials to be carried out to obtain the $r$-th success. If we introduce the following three events:

A: "$X = n$",
B: "exactly $(r-1)$ successes in the first $(n-1)$ trials",
C: "success in the $n$-th trial",

we have $A = B \cap C$. Since the Bernoulli trials are independent, this implies that $P(A) = P(B)P(C)$, with $P(B) = C_{n-1}^{r-1} p^{r-1} (1-p)^{n-r}$ and $P(C) = p$, which gives, for $n = r, r+1, \ldots$,

$$p_X(n) = P(A) = C_{n-1}^{r-1} \, p^r (1-p)^{n-r}.$$

1.7.6 Hypergeometric distribution

A typical experiment to introduce this probability distribution is the following: let an urn contain $d$ white balls and $N - d$ black balls, and consider a draw without replacement of $n$ balls. Define the r.v. $X$ as the number of white balls obtained in this draw. The probability of obtaining $k$ white balls, for $k$ ranging from $\max(0, n - (N - d))$ to $\min(d, n)$, is equal to:

$$P(X = k) = \frac{C_d^k \, C_{N-d}^{n-k}}{C_N^n}.$$

We denote it by $X \sim \mathcal{H}(n, N, d)$.

1.7.7 Poisson distribution

By definition, a r.v. $X$ follows a Poisson distribution with parameter $\lambda > 0$ if $\mathcal{X} = \mathbb{N}$ and if, for all $k \in \mathbb{N}$,

$$p_X(k) = e^{-\lambda} \frac{\lambda^k}{k!}.$$

We easily verify that $\sum_{k=0}^{\infty} p_X(k) = 1$ since:

$$\sum_{k=0}^{\infty} p_X(k) = e^{-\lambda} \sum_{k=0}^{\infty} \frac{\lambda^k}{k!} = e^{-\lambda} e^{\lambda} = 1.$$

Expectation of the Poisson Distribution. Let $X \sim \mathcal{P}(\lambda)$ be a Poisson random variable with parameter $\lambda$:

$$E(X) = \sum_{k=0}^{\infty} k \, \frac{\lambda^k}{k!} e^{-\lambda} = \lambda e^{-\lambda} \sum_{k=1}^{\infty} \frac{\lambda^{k-1}}{(k-1)!} = \lambda e^{-\lambda} e^{\lambda} = \lambda.$$

Hence, if $X \sim \mathcal{P}(\lambda)$, then

$$E(X) = \lambda. \tag{1.9}$$

Approximation of the Binomial distribution by the Poisson distribution

We now examine an example that connects the Poisson distribution and the Binomial distribution. Consider a computer server connected to a network and receiving requests. We observe the number of requests arriving during a time interval of length $T$. We assume that the arrivals are all independent and that, provided we consider a sufficiently small time slice $\triangle t$, only one arrival can occur per time slice. Finally, we assume that the probability of such an arrival is equal to $\lambda \triangle t$, with $\lambda \triangle t \ll 1$. If the observation is carried out over $n$ time slices, with $n = T / \triangle t$, we have observed, given the above assumptions, $n$ independent Bernoulli trials with the same probability of success $p = \lambda \triangle t$. If we introduce the r.v. $X$ defined as the number of requests arriving during these $n$ trials, the probability distribution of $X$ is the Binomial distribution $\mathcal{B}(n, \lambda \triangle t)$, or again $\mathcal{B}\left(n, \frac{\lambda T}{n}\right)$, i.e., for $k = 0, 1, \ldots, n$:

$$P(X = k) = C_n^k \left(\frac{\lambda T}{n}\right)^k \left(1 - \frac{\lambda T}{n}\right)^{n-k}. \tag{1.10}$$

We can wonder what happens to this distribution when, keeping the same observation interval $T$, we take increasingly smaller time slices $\triangle t$, so that $n$ tends to infinity. In other words, what happens to $\mathcal{B}\left(n, \frac{\lambda T}{n}\right)$ when $n \to \infty$? Recalling that

$$\lim_{m \to +\infty} \left(1 + \frac{x}{m}\right)^m = e^x$$

and developing (1.10) in the form

$$P(X = k) = \frac{n}{n} \times \frac{n-1}{n} \times \cdots \times \frac{n-k+1}{n} \times \frac{(\lambda T)^k}{k!} \left(1 - \frac{\lambda T}{n}\right)^{n} \left(1 - \frac{\lambda T}{n}\right)^{-k},$$

we obtain for $k = 0, 1, \ldots, n$

$$\lim_{n \to +\infty} P(X = k) = \frac{(\lambda T)^k}{k!} e^{-\lambda T}.$$

The limit therefore corresponds to the Poisson distribution with parameter $\lambda T$. In fact, we use this result to approximate the Binomial distribution by the Poisson distribution when $n$ is large and $p$ is small, taking $\lambda = np$.
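The comparison below (an added numerical sketch) puts the limit result into numbers: for a large $n$ and small $p$ it prints the Binomial$(n, p)$ probabilities next to the Poisson$(np)$ probabilities, which should agree to a few decimal places.

```python
from math import comb, exp, factorial

n, p = 200, 0.02          # large n, small p, so lambda = np = 4
lam = n * p

for k in range(6):
    binom = comb(n, k) * p**k * (1 - p)**(n - k)
    poisson = exp(-lam) * lam**k / factorial(k)
    print(k, round(binom, 4), round(poisson, 4))   # the two columns are close
```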
In practice, we can use this approximation when $n \ge 30$, $p \le 0.1$ and $np < 15$. These conditions ensure that the mean number of successes, $np$, is not too high, making the Poisson approximation reasonable. The approximation tends to be more accurate as $n$ increases and $p$ decreases, with the constraint that $\lambda$ remains moderate.

Example 1.12. From a practical point of view, we are led to introduce the Poisson distribution in problems of the following type. Suppose that we take samples of $n$ units from a population containing only two kinds of individuals, $A_1$ and $A_2$, in proportions $p$ and $q$ ($p + q = 1$). If $n \ge 30$ is large and $p \le 0.1$, so that $np < 15$, we can admit that the number of individuals of type $A_1$ in a sample is approximately a Poisson random variable with parameter $\lambda = np$.

Example 1.13. In a liquid solution containing bacteria $A_1$ and other particles $A_2$, the number of bacteria is small compared to the total number of particles. On the other hand, even in a small volume of solution, the number of particles is large. To determine the probability distribution of the variable $X$, "number of bacteria in a fixed elementary volume", it is necessary, even using the Poisson distribution, to know the proportion of bacteria in the solution. To estimate this, we take a drop of liquid and spread it on a hemacytometer, which has on the order of 400 cells. If the solution is homogeneous, the number $X_i$ of bacteria in the $i$-th cell is a Poisson random variable with parameter $\lambda$. If we perform a test $\omega$, that is to say such an experiment, we obtain 400 independent observations $X_1(\omega), X_2(\omega), \ldots, X_{400}(\omega)$ (homogeneity of the liquid). The parameter $\lambda$ is then estimated by

$$\hat{\lambda} = \frac{\sum_{i=1}^{400} X_i(\omega)}{400}.$$

1.8 Expectation of a function of a discrete r.v.

Let $X$ be a discrete r.v. and $(p_X(x_i))_{x_i \in \mathcal{X}}$ its probability distribution. We construct $Y = g(X)$, where $g$ is a real-valued function; to an elementary event $\omega$ corresponds a realization $x_i$ of the r.v. $X$, and $g(X)$ takes the value $y_i = g(x_i)$.

Theorem 1.1.

$$E[g(X)] = \sum_{x_i \in \mathcal{X}} g(x_i) \, p_X(x_i)$$

if the series on the right-hand side is absolutely convergent.

Example 1.14. Let $X$ be a r.v. in $\mathcal{X} = \{1, 2\}$ with $p_X(1) = \frac{1}{2}$ and $p_X(2) = \frac{1}{2}$. We have $E(X) = \frac{3}{2}$. Let the r.v. $Y = \frac{1}{X}$; then

$$E(Y) = \sum_{x_i \in \mathcal{X}} \frac{1}{x_i} \, p_X(x_i) = 1 \times \frac{1}{2} + \frac{1}{2} \times \frac{1}{2} = \frac{3}{4}.$$

But we have $\frac{1}{E(X)} = \frac{2}{3}$. So do not confuse $E\left(\frac{1}{X}\right)$ with $\frac{1}{E(X)}$.

Corollary 1.1. For two constants $a$ and $b$,

$$E(aX + b) = aE(X) + b.$$

Corollary 1.2. For two functions $g$ and $h$,

$$E[g(X) + h(X)] = E[g(X)] + E[h(X)].$$

The special cases where $g(X) = X^k$, $k = 1, 2, 3, \ldots$, define the moments of order $k \in \mathbb{N}^*$.

1.8.1 Moment of a discrete r.v.

Definition 1.10. $E\left(X^k\right)$ is called the moment of order $k$ of the r.v. $X$:

$$E\left(X^k\right) = \sum_{x_i \in \mathcal{X}} x_i^k \, p_X(x_i).$$

$E\left[(X - E(X))^k\right]$ is called the centered moment of order $k$ of the r.v. $X$. For $k = 2$, the centered moment of order 2 is called the variance of $X$.
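The helper below (an added sketch, not part of the original notes) evaluates Theorem 1.1 for an arbitrary function $g$: it reproduces Example 1.14 ($E(1/X) = 3/4$, which differs from $1/E(X) = 2/3$) and computes the first two moments of a fair die roll as instances of $g(X) = X^k$.

```python
from fractions import Fraction

def expectation(pmf, g=lambda x: x):
    """E[g(X)] = sum over x of g(x) * p_X(x) (Theorem 1.1)."""
    return sum(g(x) * p for x, p in pmf.items())

# Example 1.14: X uniform on {1, 2}.
pmf_x = {1: Fraction(1, 2), 2: Fraction(1, 2)}
print(expectation(pmf_x, lambda x: Fraction(1, x)))   # E(1/X) = 3/4
print(1 / expectation(pmf_x))                         # 1/E(X) = 2/3

# Moments of a fair die: E(X) and E(X^2).
die = {k: Fraction(1, 6) for k in range(1, 7)}
print(expectation(die))                    # 7/2
print(expectation(die, lambda x: x**2))    # 91/6
```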
1.9 Variance of a discrete r.v.

$$V(X) = E\left[(X - E(X))^2\right]. \tag{1.11}$$

We have the following formula:

$$V(X) = E\left(X^2\right) - E(X)^2. \tag{1.12}$$

Indeed,

$$V(X) = E\left[(X - E(X))^2\right] = E\left[X^2 - 2E(X)X + E(X)^2\right] = E\left(X^2\right) - 2E(X)E(X) + E(X)^2 = E\left(X^2\right) - E(X)^2,$$

because $E(X)$ and $E(X)^2$ are constants.

The variance provides an indication of the dispersion of the numerical values that can be taken by the random variable. By definition, $V(X) \ge 0$, and the variance is zero, $V(X) = 0$, if and only if the r.v. is a constant. The square root of the variance is called the standard deviation and is denoted $\sigma_X = \sqrt{V(X)}$.

Properties 1.2. Let $X$ be a random variable and $k$ a real number. We have the following properties of the variance:

1. $V(X + k) = V(X)$;
2. $V(kX) = k^2 V(X)$.

Example 1.15. Let $X$ follow a discrete uniform distribution on $\mathcal{X} = \{1, 2, \ldots, n\}$. Then

$$E\left(X^2\right) = \sum_{i \in \mathcal{X}} i^2 \frac{1}{n} = \frac{1}{n} \sum_{i=1}^{n} i^2 = \frac{1}{n} \cdot \frac{n(n+1)(2n+1)}{6} = \frac{(n+1)(2n+1)}{6}$$

and

$$V(X) = E\left(X^2\right) - E(X)^2 = \frac{(n+1)(2n+1)}{6} - \left(\frac{n+1}{2}\right)^2 = \frac{(n+1)(n-1)}{12}.$$

1.10 Generating Function

This notion is a mathematical tool that allows us to simplify a number of formal calculations on discrete random variables with values in $\mathbb{N}$.

Definition 1.11. Let $X$ be a discrete r.v. with values in $\mathbb{N}$ and $(p_n)_{n \in \mathbb{N}}$ its probability distribution. We call generating function of $X$ the function

$$G_X(z) = E\left(z^X\right) = \sum_{n=0}^{\infty} p_n z^n. \tag{1.13}$$

This function converges for any complex value $z$ such that $|z| < 1$. The generating function $G_X(z)$ characterizes the probability distribution $(p_n)_{n \in \mathbb{N}}$ and vice versa. Indeed,

$$\frac{d^k G_X(z)}{dz^k} = G_X^{(k)}(z) = k!\, p_k + \frac{(k+1)!}{1!}\, p_{k+1} z + \frac{(k+2)!}{2!}\, p_{k+2} z^2 + \cdots$$

For $z = 0$, we obtain for all $k \ge 0$

$$p_k = \frac{1}{k!} G_X^{(k)}(0). \tag{1.14}$$

Conversely, $G_X(z)$ depends by construction on all the terms $(p_n)_{n \in \mathbb{N}}$ of the distribution, and only on them. We note that

$$G_X'(z) = \sum_{n=1}^{\infty} n p_n z^{n-1},$$

hence for $z = 1$

$$G_X'(1) = E(X) \quad \text{and} \quad G_X''(1) = E\left(X^2\right) - E(X).$$

1.10.1 Generating function of the Bernoulli distribution

Let $X \sim \mathcal{B}(p)$; we have $p_0 = 1 - p$ and $p_1 = p$, so

$$G_X(z) = p_0 z^0 + p_1 z^1 = 1 - p + pz = pz + 1 - p.$$

We have $G_X'(z) = p$ and $G_X''(z) = 0$. Therefore, we obtain the mean and the second-order moment of the Bernoulli distribution:

$$E(X) = G_X'(1) = p \quad \text{and} \quad E\left(X^2\right) = G_X''(1) + G_X'(1) = p.$$

Then the variance of the Bernoulli random variable $X \sim \mathcal{B}(p)$ is $V(X) = p - p^2 = p(1-p)$. Hence, if $X \sim \mathcal{B}(p)$, then

$$V(X) = p(1-p). \tag{1.15}$$

1.10.2 Generating function of the Geometric distribution

Let $X \sim \mathcal{G}(p)$; the probability mass function of the Geometric distribution is given for $k \in \mathbb{N}^*$ by $p_k = (1-p)^{k-1} p = q^{k-1} p$. Then the generating function of the Geometric distribution is obtained by

$$G_X(z) = \sum_{k=1}^{\infty} (1-p)^{k-1} p z^k = pz \sum_{k=1}^{\infty} (qz)^{k-1} = pz \sum_{k=0}^{\infty} (qz)^k = \frac{pz}{1 - qz} = \frac{pz}{1 - (1-p)z}.$$

Hence, if $X \sim \mathcal{G}(p)$, then its generating function is given by:

$$G_X(z) = \frac{pz}{1 - (1-p)z}. \tag{1.16}$$

We can obtain the expectation, second moment and variance of the Geometric distribution using its generating function (1.16) as follows:

$$G_X'(z) = \frac{p(1 - qz) + pzq}{(1 - qz)^2} = \frac{p}{(1 - qz)^2},$$

so that

$$E(X) = G_X'(1) = \frac{1}{p},$$

and

$$G_X''(z) = \frac{2pq(1 - qz)}{(1 - qz)^4} = \frac{2pq}{(1 - qz)^3}, \quad \text{hence} \quad G_X''(1) = \frac{2pq}{(1-q)^3} = \frac{2q}{p^2}.$$

Thus, the second-order moment of the Geometric distribution is given by

$$E\left(X^2\right) = G_X''(1) + G_X'(1) = \frac{2q}{p^2} + \frac{1}{p} = \frac{2q + p}{p^2}$$

and the variance is given by

$$V(X) = \frac{2q}{p^2} + \frac{1}{p} - \frac{1}{p^2} = \frac{1 - p}{p^2}.$$

1.10.3 Generating function of the Binomial distribution

Let $X \sim \mathcal{B}(n, p)$; the probability distribution of the Binomial distribution is given for $k \in \{0, 1, \ldots, n\}$ by $p_X(k) = C_n^k p^k (1-p)^{n-k}$. Then the generating function of the Binomial distribution is obtained as follows:

$$G_X(z) = \sum_{k=0}^{n} C_n^k p^k (1-p)^{n-k} z^k = \sum_{k=0}^{n} C_n^k (pz)^k (1-p)^{n-k} = (pz + 1 - p)^n.$$

Hence, if $X \sim \mathcal{B}(n, p)$, then its generating function is given by

$$G_X(z) = (pz + 1 - p)^n. \tag{1.17}$$
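To confirm these generating-function computations mechanically, here is a small symbolic sketch (an addition, assuming the third-party sympy library is available) that differentiates $G_X(z)$ from (1.16) and recovers $E(X) = 1/p$ and $V(X) = (1-p)/p^2$.

```python
import sympy as sp   # requires the third-party sympy package

z, p = sp.symbols('z p', positive=True)
q = 1 - p

# Generating function of the Geometric distribution, equation (1.16).
G = p * z / (1 - q * z)

G1 = sp.diff(G, z)          # G'(z)
G2 = sp.diff(G, z, 2)       # G''(z)

mean = sp.simplify(G1.subs(z, 1))                   # E(X) = 1/p
second_moment = sp.simplify(G2.subs(z, 1) + mean)   # E(X^2) = G''(1) + G'(1)
variance = sp.simplify(second_moment - mean**2)     # V(X) = (1 - p)/p^2

print(mean, second_moment, variance)
```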
Remark 1.6. Let $Y$ follow a Bernoulli distribution $\mathcal{B}(p)$ with probability of success $p$, and let $X$ follow a Binomial distribution $\mathcal{B}(n, p)$. We have seen that the generating function of the Bernoulli distribution is

$$G_Y(z) = pz + 1 - p, \tag{1.18}$$

and the generating function of the Binomial distribution is

$$G_X(z) = (G_Y(z))^n = (pz + 1 - p)^n.$$

This comes from the fact that the Binomial distribution is the sum of $n$ independent Bernoulli trials [2].

[2] We will see this property of the generating function of a sum of independent random variables in Chapter 2.

We can obtain the expectation, second moment and variance of the Binomial distribution using its generating function (1.17) as follows:

$$G_X'(z) = np(pz + (1-p))^{n-1}.$$

We therefore have the mean of the Binomial distribution

$$E(X) = G_X'(1) = np.$$

For the second-order moment and the variance we calculate the second derivative of the generating function:

$$G_X''(z) = n(n-1)p^2 (pz + (1-p))^{n-2}.$$

Thus, $G_X''(1) = n^2 p^2 - np^2$, and the moment of order 2 is therefore

$$E\left(X^2\right) = G_X''(1) + G_X'(1) = n^2 p^2 - np^2 + np = np(np - p + 1),$$

and the variance is equal to

$$V(X) = E\left(X^2\right) - E(X)^2 = np(1 - p).$$

Hence, if $X \sim \mathcal{B}(n, p)$, then its variance is given by

$$V(X) = np(1 - p). \tag{1.19}$$

1.10.4 Generating function of the Poisson distribution

Let $X \sim \mathcal{P}(\lambda)$; the probability distribution of the Poisson law is given for $k \in \mathbb{N}$ by

$$p_X(k) = e^{-\lambda} \frac{\lambda^k}{k!}.$$

The generating function of the Poisson distribution is obtained as follows:

$$G_X(z) = \sum_{k=0}^{\infty} e^{-\lambda} \frac{\lambda^k}{k!} z^k = e^{-\lambda} \sum_{k=0}^{\infty} \frac{(\lambda z)^k}{k!} = e^{-\lambda} e^{\lambda z} = e^{\lambda(z-1)}.$$

Hence, if $X \sim \mathcal{P}(\lambda)$, then its generating function is given by

$$G_X(z) = e^{\lambda(z-1)}. \tag{1.20}$$

We can obtain the expectation, second moment and variance of the Poisson distribution using its generating function (1.20) as follows:

$$G_X'(z) = \lambda e^{\lambda(z-1)}.$$

We therefore have the mean of the Poisson distribution

$$E(X) = G_X'(1) = \lambda.$$

For the second-order moment and the variance we calculate the second derivative of the generating function:

$$G_X''(z) = \lambda^2 e^{\lambda(z-1)}.$$

Thus, $G_X''(1) = \lambda^2$, and the moment of order 2 is therefore

$$E\left(X^2\right) = G_X''(1) + G_X'(1) = \lambda^2 + \lambda,$$

and the variance is equal to

$$V(X) = E\left(X^2\right) - E(X)^2 = \lambda.$$

Hence, if $X \sim \mathcal{P}(\lambda)$, then its variance is given by

$$V(X) = \lambda. \tag{1.21}$$

Remark 1.7. The Poisson distribution is well known for having the property that its mean and variance are equal, that is, $E(X) = V(X) = \lambda$. This is also a useful rule of thumb in practice for recognizing whether a sample may come from a Poisson population: if the sample mean $\bar{x}$ and the sample variance $S^2$ satisfy $\bar{x} \approx S^2$, this indicates a possible fit of the data by a Poisson distribution. This should then be confirmed by adequacy tests such as the chi-square goodness-of-fit test, since other distributions can also exhibit this property under certain conditions [3], and the sample could exhibit it by random chance.

[3] For example, the Negative Binomial distribution $NB(r, p)$ has mean $E(X) = r/p$ and variance $V(X) = r(1-p)/p^2$; for probability parameter $p = 1/2$, we have $E(X) = V(X) = 2r$.
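Remark 1.7 suggests a quick diagnostic: compare the sample mean with the sample variance. The sketch below (an addition; the Poisson sampler uses Knuth's multiplication method, since the Python standard library has no built-in Poisson generator) simulates Poisson data and checks that $\bar{x} \approx S^2$.

```python
import random
from math import exp

def poisson_draw(lam):
    """Knuth's method: count uniform draws until their product falls below exp(-lam)."""
    threshold, k, prod = exp(-lam), 0, 1.0
    while True:
        prod *= random.random()
        if prod <= threshold:
            return k
        k += 1

lam, n = 4.0, 50_000
data = [poisson_draw(lam) for _ in range(n)]

mean = sum(data) / n
var = sum((x - mean) ** 2 for x in data) / (n - 1)   # sample variance S^2
print(round(mean, 2), round(var, 2))                 # both close to lambda = 4
```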
1.10.5 The Moment Generating Function

The moment generating function $M_X(t)$ of the random variable $X$ is defined for all real values of $t$ by replacing $z$ in the generating function (1.13) by $z = e^t$. We then have

$$M_X(t) = G_X(e^t) = E\left(e^{tX}\right) = \sum_{k=0}^{\infty} p_k e^{tk}. \tag{1.22}$$

It is called the moment generating function because all of the moments of $X$ can be obtained by successively differentiating $M_X(t)$ and then evaluating the result at $t = 0$. For example,

$$M_X'(t) = \frac{d}{dt} E\left(e^{tX}\right) = E\left(\frac{d}{dt} e^{tX}\right) = E\left(X e^{tX}\right).$$

Evaluating at $t = 0$, we obtain

$$M_X'(0) = E(X). \tag{1.23}$$

In general, the $n$-th derivative of $M_X(t)$ is given, for $n \ge 1$, by

$$M_X^{(n)}(t) = E\left(X^n e^{tX}\right),$$

implying that for $n \ge 1$

$$M_X^{(n)}(0) = E\left(X^n\right). \tag{1.24}$$

Example 1.16 (Moment Generating Function of the Poisson Distribution). By replacing $z = e^t$ in the generating function (1.20) of the Poisson distribution, we obtain the moment generating function of the Poisson distribution:

$$M_X(t) = \exp\left(\lambda\left(e^t - 1\right)\right). \tag{1.25}$$

Differentiation yields

$$M_X'(t) = \lambda e^t \exp\left(\lambda\left(e^t - 1\right)\right)$$
$$M_X''(t) = \left(\lambda e^t\right)^2 \exp\left(\lambda\left(e^t - 1\right)\right) + \lambda e^t \exp\left(\lambda\left(e^t - 1\right)\right);$$

then, evaluating at $t = 0$, we obtain the expectation and the second moment:

$$E(X) = M_X'(0) = \lambda, \qquad E\left(X^2\right) = M_X''(0) = \lambda^2 + \lambda.$$

1.11 Chebyshev's inequality

1.11.1 Markov inequality

Let $X$ be a discrete r.v. and $f$ a function from $\mathcal{X} = X(\Omega)$ to $\mathbb{R}$ such that

$$E(|f(X)|) = \sum_{x \in \mathcal{X}} |f(x)| \, p_X(x) < \infty.$$

Then, for $a > 0$,

$$P(|f(X)| \ge a) \le \frac{E(|f(X)|)}{a}.$$

In particular,

$$P(|X| \ge a) \le \frac{E(|X|)}{a}. \tag{1.26}$$

To prove (1.26), let $C = \{x : |x| \ge a\}$; we have $\{X \in C\} \equiv \{|X| \ge a\}$, and therefore

$$|x| = \mathbb{1}_C |x| + \mathbb{1}_{\bar{C}} |x| \ge \mathbb{1}_C |x| \ge \mathbb{1}_C \, a.$$

Hence

$$E(|X|) \ge E(a \mathbb{1}_C) = a E(\mathbb{1}_C) = a P(X \in C) = a P(|X| \ge a).$$

The proof remains the same for a function $f$, replacing $|X|$ by $|f(X)|$.

1.11.2 Chebyshev's inequality

Applying Markov's inequality to $f(x) = (x - m)^2$ and $a = \epsilon^2$, $\epsilon > 0$, with $m = E(X)$, we obtain

$$P\left((X - m)^2 \ge \epsilon^2\right) \le \frac{E\left((X - m)^2\right)}{\epsilon^2},$$

which gives Chebyshev's inequality for all $\epsilon > 0$:

$$P(|X - m| \ge \epsilon) \le \frac{V(X)}{\epsilon^2}. \tag{1.27}$$
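To close the chapter, the short check below (an added sketch, not part of the original notes) compares both sides of Chebyshev's inequality (1.27) for a Binomial random variable, estimating $P(|X - m| \ge \epsilon)$ by simulation and bounding it by $V(X)/\epsilon^2$.

```python
import random

n, p = 50, 0.4
m, var = n * p, n * p * (1 - p)     # E(X) = np, V(X) = np(1 - p)

def binomial_draw():
    return sum(random.random() < p for _ in range(n))

trials = 100_000
for eps in (5, 8, 12):
    tail = sum(abs(binomial_draw() - m) >= eps for _ in range(trials)) / trials
    bound = var / eps**2
    print(eps, round(tail, 4), "<=", round(bound, 4))   # the bound always holds
```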
