Statbook Hogg Gen AI PDF
Summary
This textbook introduces the concepts of probability and distributions. It defines random experiments, sample spaces, and events, and explains the relative frequency approach to probability. The text also covers set theory, including complements, subsets, unions, and intersections of sets.
Full Transcript
Chapter 1 Probability and Distributions
1.1 Introduction
In this section, we intuitively discuss the concepts of a probability model, which we formalize in Section 1.3.
Many kinds of investigations may be characterized in part by the fact that repeated experimentation, under essentially the same conditions, is more or less standard procedure. For instance, in medical research, interest may center on the effect of a drug that is to be administered; or an economist may be concerned with the prices of three specified commodities at various time intervals; or an agronomist may wish to study the effect that a chemical fertilizer has on the yield of a cereal grain. The only way in which an investigator can elicit information about any such phenomenon is to perform the experiment. Each experiment terminates with an outcome. But it is characteristic of these experiments that the outcome cannot be predicted with certainty prior to the experiment.
Suppose that we have such an experiment, but the experiment is of such a nature that a collection of every possible outcome can be described prior to its performance. If this kind of experiment can be repeated under the same conditions, it is called a random experiment, and the collection of every possible outcome is called the experimental space or the sample space. We denote the sample space by $\mathcal{C}$.
Example 1.1.1. In the toss of a coin, let the outcome tails be denoted by T and let the outcome heads be denoted by H. If we assume that the coin may be repeatedly tossed under the same conditions, then the toss of this coin is an example of a random experiment in which the outcome is one of the two symbols T or H; that is, the sample space is the collection of these two symbols. For this example, then, $\mathcal{C} = \{H, T\}$.
Example 1.1.2. In the cast of one red die and one white die, let the outcome be the ordered pair (number of spots up on the red die, number of spots up on the white die).
If we assume that these two dice may be repeatedly cast under the same conditions, then the cast of this pair of dice is a random experiment. The sample space consists of the 36 ordered pairs: $\mathcal{C} = \{(1, 1), \ldots, (1, 6), (2, 1), \ldots, (2, 6), \ldots, (6, 6)\}$.
We generally use small Roman letters for the elements of $\mathcal{C}$ such as a, b, or c. Often for an experiment, we are interested in the chances of certain subsets of elements of the sample space occurring. Subsets of $\mathcal{C}$ are often called events and are generally denoted by capital Roman letters such as A, B, or C. If the experiment results in an element in an event A, we say the event A has occurred. We are interested in the chances that an event occurs. For instance, in Example 1.1.1 we may be interested in the chances of getting heads; i.e., the chances of the event $A = \{H\}$ occurring. In the second example, we may be interested in the occurrence of the sum of the upfaces of the dice being "7" or "11"; that is, in the occurrence of the event $A = \{(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1), (5, 6), (6, 5)\}$.
Now conceive of our having made N repeated performances of the random experiment. Then we can count the number f of times (the frequency) that the event A actually occurred throughout the N performances. The ratio $f/N$ is called the relative frequency of the event A in these N experiments. A relative frequency is usually quite erratic for small values of N, as you can discover by tossing a coin. But as N increases, experience indicates that we associate with the event A a number, say p, that is equal or approximately equal to that number about which the relative frequency seems to stabilize. If we do this, then the number p can be interpreted as that number which, in future performances of the experiment, the relative frequency of the event A will either equal or approximate.
Thus, although we cannot predict the outcome of a random experiment, we can, for a large value of N, predict approximately the relative frequency with which the outcome will be in A. The number p associated with the event A is given various names. Sometimes it is called the probability that the outcome of the random experiment is in A; sometimes it is called the probability of the event A; and sometimes it is called the probability measure of A. The context usually suggests an appropriate choice of terminology.
Example 1.1.3. Let $\mathcal{C}$ denote the sample space of Example 1.1.2 and let B be the collection of every ordered pair of $\mathcal{C}$ for which the sum of the pair is equal to seven. Thus $B = \{(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)\}$. Suppose that the dice are cast N = 400 times and let f denote the frequency of a sum of seven. Suppose that 400 casts result in f = 60. Then the relative frequency with which the outcome was in B is $f/N = 60/400 = 0.15$. Thus we might associate with B a number p that is close to 0.15, and p would be called the probability of the event B.
Remark 1.1.1. The preceding interpretation of probability is sometimes referred to as the relative frequency approach, and it obviously depends upon the fact that an experiment can be repeated under essentially identical conditions. However, many persons extend probability to other situations by treating it as a rational measure of belief. For example, the statement $p = 2/5$ for an event A would mean to them that their personal or subjective probability of the event A is equal to 2/5. Hence, if they are not opposed to gambling, this could be interpreted as a willingness on their part to bet on the outcome of A so that the two possible payoffs are in the ratio $p/(1 - p) = (2/5)/(3/5) = 2/3$.
Moreover, if they truly believe that $p = 2/5$ is correct, they would be willing to accept either side of the bet: (a) win 3 units if A occurs and lose 2 if it does not occur, or (b) win 2 units if A does not occur and lose 3 if it does. However, since the mathematical properties of probability given in Section 1.3 are consistent with either of these interpretations, the subsequent mathematical development does not depend upon which approach is used.
The primary purpose of having a mathematical theory of statistics is to provide mathematical models for random experiments. Once a model for such an experiment has been provided and the theory worked out in detail, the statistician may, within this framework, make inferences (that is, draw conclusions) about the random experiment. The construction of such a model requires a theory of probability. One of the more logically satisfying theories of probability is that based on the concepts of sets and functions of sets. These concepts are introduced in Section 1.2.
1.2 Sets
The concept of a set or a collection of objects is usually left undefined. However, a particular set can be described so that there is no misunderstanding as to what collection of objects is under consideration. For example, the set of the first 10 positive integers is sufficiently well described to make clear that the numbers 3/4 and 14 are not in the set, while the number 3 is in the set. If an object belongs to a set, it is said to be an element of the set. For example, if C denotes the set of real numbers x for which $0 \le x \le 1$, then 3/4 is an element of the set C. The fact that 3/4 is an element of the set C is indicated by writing $3/4 \in C$. More generally, $c \in C$ means that c is an element of the set C.
The sets that concern us are frequently sets of numbers. However, the language of sets of points proves somewhat more convenient than that of sets of numbers. Accordingly, we briefly indicate how we use this terminology.
In analytic geometry considerable emphasis is placed on the fact that to each point on a line (on which an origin and a unit point have been selected) there corresponds one and only one number, say x; and that to each number x there corresponds one and only one point on the line. This one-to-one correspondence between the numbers and points on a line enables us to speak, without misunderstanding, of the "point x" instead of the "number x." Furthermore, with a plane rectangular coordinate system and with x and y numbers, to each symbol (x, y) there corresponds one and only one point in the plane; and to each point in the plane there corresponds but one such symbol. Here again, we may speak of the "point (x, y)," meaning the "ordered number pair x and y." This convenient language can be used when we have a rectangular coordinate system in a space of three or more dimensions. Thus the "point $(x_1, x_2, \ldots, x_n)$" means the numbers $x_1, x_2, \ldots, x_n$ in the order stated. Accordingly, in describing our sets, we frequently speak of a set of points (a set whose elements are points), being careful, of course, to describe the set so as to avoid any ambiguity. The notation $C = \{x : 0 \le x \le 1\}$ is read "C is the one-dimensional set of points x for which $0 \le x \le 1$." Similarly, $C = \{(x, y) : 0 \le x \le 1, 0 \le y \le 1\}$ can be read "C is the two-dimensional set of points (x, y) that are interior to, or on the boundary of, a square with opposite vertices at (0, 0) and (1, 1)."
We say a set C is countable if C is finite or has as many elements as there are positive integers. For example, the sets $C_1 = \{1, 2, \ldots, 100\}$ and $C_2 = \{1, 3, 5, 7, \ldots\}$ are countable sets. The interval of real numbers (0, 1], though, is not countable.
1.2.1 Review of Set Theory
As in Section 1.1, let $\mathcal{C}$ denote the sample space for the experiment. Recall that events are subsets of $\mathcal{C}$. We use the words event and subset interchangeably in this section.
An elementary algebra of sets will prove quite useful for our purposes. We review this algebra below along with illustrative examples. For illustration, we also make use of Venn diagrams. Consider the collection of Venn diagrams in Figure 1.2.1. The interior of the rectangle in each plot represents the sample space $\mathcal{C}$. The shaded region in Panel (a) represents the event A.
[Figure 1.2.1: A series of Venn diagrams. The sample space $\mathcal{C}$ is represented by the interior of the rectangle in each plot. Panel (a) depicts the event A; Panel (b) depicts $A \subset B$; Panel (c) depicts $A \cup B$; and Panel (d) depicts $A \cap B$.]
We first define the complement of an event A.
Definition 1.2.1. The complement of an event A is the set of all elements in $\mathcal{C}$ which are not in A. We denote the complement of A by $A^c$. That is, $A^c = \{x \in \mathcal{C} : x \notin A\}$.
The complement of A is represented by the white space in the Venn diagram in Panel (a) of Figure 1.2.1.
The empty set is the event with no elements in it. It is denoted by $\phi$. Note that $\mathcal{C}^c = \phi$ and $\phi^c = \mathcal{C}$. The next definition defines when one event is a subset of another.
Definition 1.2.2. If each element of a set A is also an element of set B, the set A is called a subset of the set B. This is indicated by writing $A \subset B$. If $A \subset B$ and also $B \subset A$, the two sets have the same elements, and this is indicated by writing $A = B$.
Panel (b) of Figure 1.2.1 depicts $A \subset B$.
The event A or B is defined as follows:
Definition 1.2.3. Let A and B be events. Then the union of A and B is the set of all elements that are in A or in B or in both A and B. The union of A and B is denoted by $A \cup B$.
Panel (c) of Figure 1.2.1 shows $A \cup B$.
The event that both A and B occur is defined as follows:
Definition 1.2.4. Let A and B be events. Then the intersection of A and B is the set of all elements that are in both A and B. The intersection of A and B is denoted by $A \cap B$.
Panel (d) of Figure 1.2.1 illustrates $A \cap B$.
Two events are disjoint if they have no elements in common. More formally we define:
Definition 1.2.5. Let A and B be events. Then A and B are disjoint if $A \cap B = \phi$.
If A and B are disjoint, then we say $A \cup B$ forms a disjoint union. The next two examples illustrate these concepts.
Example 1.2.1. Suppose we have a spinner with the numbers 1 through 10 on it. The experiment is to spin the spinner and record the number spun. Then $\mathcal{C} = \{1, 2, \ldots, 10\}$. Define the events A, B, and C by $A = \{1, 2\}$, $B = \{2, 3, 4\}$, and $C = \{3, 4, 5, 6\}$, respectively. Then
$A^c = \{3, 4, \ldots, 10\}$; $A \cup B = \{1, 2, 3, 4\}$; $A \cap B = \{2\}$
$A \cap C = \phi$; $B \cap C = \{3, 4\}$; $B \cap C \subset B$; $B \cap C \subset C$
$A \cup (B \cap C) = \{1, 2\} \cup \{3, 4\} = \{1, 2, 3, 4\}$ (1.2.1)
$(A \cup B) \cap (A \cup C) = \{1, 2, 3, 4\} \cap \{1, 2, 3, 4, 5, 6\} = \{1, 2, 3, 4\}$ (1.2.2)
The reader should verify these results.
Example 1.2.2. For this example, suppose the experiment is to select a real number in the open interval (0, 5); hence, the sample space is $\mathcal{C} = (0, 5)$. Let $A = (1, 3)$, $B = (2, 4)$, and $C = [3, 4.5)$. Then
$A \cup B = (1, 4)$; $A \cap B = (2, 3)$; $B \cap C = [3, 4)$
$A \cap (B \cup C) = (1, 3) \cap (2, 4.5) = (2, 3)$ (1.2.3)
$(A \cap B) \cup (A \cap C) = (2, 3) \cup \phi = (2, 3)$ (1.2.4)
A sketch of the real number line between 0 and 5 helps to verify these results.
Expressions (1.2.1)–(1.2.2) and (1.2.3)–(1.2.4) are illustrations of general distributive laws. For any sets A, B, and C,
$A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$
$A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$. (1.2.5)
These follow directly from set theory. To verify each identity, sketch Venn diagrams of both sides.
The next two identities are collectively known as DeMorgan's Laws. For any sets A and B,
$(A \cap B)^c = A^c \cup B^c$ (1.2.6)
$(A \cup B)^c = A^c \cap B^c$. (1.2.7)
For instance, in Example 1.2.1,
$(A \cup B)^c = \{1, 2, 3, 4\}^c = \{5, 6, \ldots, 10\} = \{3, 4, \ldots, 10\} \cap \{1, 5, 6, \ldots, 10\} = A^c \cap B^c$;
while, from Example 1.2.2,
$(A \cap B)^c = (2, 3)^c = (0, 2] \cup [3, 5) = [(0, 1] \cup [3, 5)] \cup [(0, 2] \cup [4, 5)] = A^c \cup B^c$.
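The finite computations of Example 1.2.1, the distributive laws (1.2.5), and DeMorgan's Laws (1.2.6)–(1.2.7) can be checked mechanically. The following is only an illustrative sketch in Python (the book's own code is in R); the names `space` and `complement` are hypothetical helpers introduced here, not part of the text.

```python
# Sample space and events of Example 1.2.1 (spinner numbered 1 through 10).
space = set(range(1, 11))
A, B, C = {1, 2}, {2, 3, 4}, {3, 4, 5, 6}

def complement(event, sample_space=space):
    """Return the elements of the sample space that are not in the event."""
    return sample_space - event

# Basic results listed in Example 1.2.1.
assert complement(A) == set(range(3, 11))
assert A | B == {1, 2, 3, 4}
assert A & C == set()            # A and C are disjoint

# Distributive laws (1.2.5).
assert A & (B | C) == (A & B) | (A & C)
assert A | (B & C) == (A | B) & (A | C)

# DeMorgan's Laws (1.2.6) and (1.2.7).
assert complement(A & B) == complement(A) | complement(B)
assert complement(A | B) == complement(A) & complement(B)

print(sorted(complement(A | B)))  # [5, 6, 7, 8, 9, 10]
```

Python's built-in `set` type only handles finite sets, so the interval computations of Example 1.2.2 are not covered by this sketch.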
As the last expression suggests, it is easy to extend unions and intersections to more than two sets. If $A_1, A_2, \ldots, A_n$ are any sets, we define
$A_1 \cup A_2 \cup \cdots \cup A_n = \{x : x \in A_i, \text{ for some } i = 1, 2, \ldots, n\}$ (1.2.8)
$A_1 \cap A_2 \cap \cdots \cap A_n = \{x : x \in A_i, \text{ for all } i = 1, 2, \ldots, n\}$. (1.2.9)
We often abbreviate these by $\cup_{i=1}^{n} A_i$ and $\cap_{i=1}^{n} A_i$, respectively. Expressions for countable unions and intersections follow directly; that is, if $A_1, A_2, \ldots, A_n, \ldots$ is a sequence of sets then
$A_1 \cup A_2 \cup \cdots = \{x : x \in A_n, \text{ for some } n = 1, 2, \ldots\} = \cup_{n=1}^{\infty} A_n$ (1.2.10)
$A_1 \cap A_2 \cap \cdots = \{x : x \in A_n, \text{ for all } n = 1, 2, \ldots\} = \cap_{n=1}^{\infty} A_n$. (1.2.11)
The next two examples illustrate these ideas.
Example 1.2.3. Suppose $\mathcal{C} = \{1, 2, 3, \ldots\}$. If $A_n = \{1, 3, \ldots, 2n - 1\}$ and $B_n = \{n, n + 1, \ldots\}$, for $n = 1, 2, 3, \ldots$, then
$\cup_{n=1}^{\infty} A_n = \{1, 3, 5, \ldots\}$; $\cap_{n=1}^{\infty} A_n = \{1\}$; (1.2.12)
$\cup_{n=1}^{\infty} B_n = \mathcal{C}$; $\cap_{n=1}^{\infty} B_n = \phi$. (1.2.13)
Example 1.2.4. Suppose $\mathcal{C}$ is the interval of real numbers (0, 5). Suppose $C_n = (1 - n^{-1}, 2 + n^{-1})$ and $D_n = (n^{-1}, 3 - n^{-1})$, for $n = 1, 2, 3, \ldots$. Then
$\cup_{n=1}^{\infty} C_n = (0, 3)$; $\cap_{n=1}^{\infty} C_n = [1, 2]$ (1.2.14)
$\cup_{n=1}^{\infty} D_n = (0, 3)$; $\cap_{n=1}^{\infty} D_n = (1, 2)$. (1.2.15)
We occasionally have sequences of sets that are monotone. They are of two types. We say a sequence of sets $\{A_n\}$ is nondecreasing (nested upward) if $A_n \subset A_{n+1}$ for $n = 1, 2, 3, \ldots$. For such a sequence, we define
$\lim_{n \to \infty} A_n = \cup_{n=1}^{\infty} A_n$. (1.2.16)
The sequence of sets $A_n = \{1, 3, \ldots, 2n - 1\}$ of Example 1.2.3 is such a sequence. So in this case, we write $\lim_{n \to \infty} A_n = \{1, 3, 5, \ldots\}$. The sequence of sets $\{D_n\}$ of Example 1.2.4 is also a nondecreasing sequence of sets.
The second type of monotone sets consists of the nonincreasing (nested downward) sequences. A sequence of sets $\{A_n\}$ is nonincreasing if $A_n \supset A_{n+1}$ for $n = 1, 2, 3, \ldots$. In this case, we define
$\lim_{n \to \infty} A_n = \cap_{n=1}^{\infty} A_n$. (1.2.17)
The sequences of sets $\{B_n\}$ and $\{C_n\}$ of Examples 1.2.3 and 1.2.4, respectively, are examples of nonincreasing sequences of sets.
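The nesting of the sequences in Example 1.2.3 can be explored on a finite scale. The sketch below is illustrative only: it is written in Python rather than the book's R, and it looks at the first `N` sets inside a finite window `universe` of the positive integers (both are assumptions made here so that every set is finite). Finite checks like these suggest, but of course do not prove, the limits in (1.2.12) and (1.2.13).

```python
N = 20                     # number of sets examined (illustrative choice)
universe = range(1, 200)   # finite window into {1, 2, 3, ...}

def A(n):
    """A_n = {1, 3, ..., 2n - 1}, restricted to the window."""
    return {x for x in universe if x % 2 == 1 and x <= 2 * n - 1}

def B(n):
    """B_n = {n, n + 1, ...}, restricted to the window."""
    return {x for x in universe if x >= n}

# {A_n} is nondecreasing (nested upward); {B_n} is nonincreasing (nested downward).
assert all(A(n) <= A(n + 1) for n in range(1, N))
assert all(B(n) >= B(n + 1) for n in range(1, N))

union_A = set().union(*(A(n) for n in range(1, N + 1)))
inter_A = set.intersection(*(A(n) for n in range(1, N + 1)))
inter_B = set.intersection(*(B(n) for n in range(1, N + 1)))

assert union_A == A(N)     # for a nondecreasing sequence, the finite union is the last set
assert inter_A == {1}      # only 1 belongs to every A_n
assert inter_B == B(N)     # for a nonincreasing sequence, the finite intersection is the last set
print(min(inter_B))        # 20
```

The smallest element of the finite intersection of $B_1, \ldots, B_N$ is N itself, which grows without bound as N increases; this is consistent with the countable intersection being empty.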
1.2.2 Set Functions
Many of the functions used in calculus and in this book are functions that map real numbers into real numbers. We are concerned also with functions that map sets into real numbers. Such functions are naturally called functions of a set or, more simply, set functions. Next we give some examples of set functions and evaluate them for certain simple sets.
Example 1.2.5. Let $\mathcal{C} = R$, the set of real numbers. For a subset A in $\mathcal{C}$, let Q(A) be equal to the number of points in A that correspond to positive integers. Then Q(A) is a set function of the set A. Thus, if $A = \{x : 0 < x < 5\}$, then $Q(A) = 4$; if $A = \{-2, -1\}$, then $Q(A) = 0$; and if $A = \{x : -\infty < x < 6\}$, then $Q(A) = 5$.
Example 1.2.6. Let $\mathcal{C} = R^2$. For a subset A of $\mathcal{C}$, let Q(A) be the area of A if A has a finite area; otherwise, let Q(A) be undefined. Thus, if $A = \{(x, y) : x^2 + y^2 \le 1\}$, then $Q(A) = \pi$; if $A = \{(0, 0), (1, 1), (0, 1)\}$, then $Q(A) = 0$; and if $A = \{(x, y) : 0 \le x, 0 \le y, x + y \le 1\}$, then $Q(A) = 1/2$.
Often our set functions are defined in terms of sums or integrals.1 With this in mind, we introduce the following notation. The symbol $\int_A f(x)\,dx$ means the ordinary (Riemann) integral of f(x) over a prescribed one-dimensional set A, and the symbol $\iint_A g(x, y)\,dx\,dy$ means the Riemann integral of g(x, y) over a prescribed two-dimensional set A. This notation can be extended to integrals over n dimensions. To be sure, unless these sets A and these functions f(x) and g(x, y) are chosen with care, the integrals frequently fail to exist. Similarly, the symbol $\sum_A f(x)$ means the sum extended over all $x \in A$, and the symbol $\sum\sum_A g(x, y)$ means the sum extended over all $(x, y) \in A$. As with integration, this notation extends to sums over n dimensions.
Footnote 1: Please see Chapters 2 and 3 of Mathematical Comments, at the site noted in the Preface, for a review of sums and integrals.
The first example is for a set function defined on sums involving a geometric series.
As pointed out in Example 2.3.1 of Mathematical Comments,2 if $|a| < 1$, then the following series converges to $1/(1 - a)$:
$\sum_{n=0}^{\infty} a^n = \frac{1}{1 - a}$, if $|a| < 1$. (1.2.18)
Footnote 2: Downloadable at the site noted in the Preface.
Example 1.2.7. Let $\mathcal{C}$ be the set of all nonnegative integers and let A be a subset of $\mathcal{C}$. Define the set function Q by
$Q(A) = \sum_{n \in A} \left(\frac{2}{3}\right)^n$. (1.2.19)
It follows from (1.2.18) that $Q(\mathcal{C}) = 3$. If $A = \{1, 2, 3\}$ then $Q(A) = 38/27$. Suppose $B = \{1, 3, 5, \ldots\}$ is the set of all odd positive integers. The computation of Q(B) is given next. This derivation consists of rewriting the series so that (1.2.18) can be applied. Frequently, we perform such derivations in this book.
$Q(B) = \sum_{n \in B} \left(\frac{2}{3}\right)^n = \sum_{n=0}^{\infty} \left(\frac{2}{3}\right)^{2n+1} = \frac{2}{3} \sum_{n=0}^{\infty} \left[\left(\frac{2}{3}\right)^2\right]^n = \frac{2}{3} \cdot \frac{1}{1 - (4/9)} = \frac{6}{5}$
In the next example, the set function is defined in terms of an integral involving the exponential function $f(x) = e^{-x}$.
Example 1.2.8. Let $\mathcal{C}$ be the interval of positive real numbers, i.e., $\mathcal{C} = (0, \infty)$. Let A be a subset of $\mathcal{C}$. Define the set function Q by
$Q(A) = \int_A e^{-x}\,dx$, (1.2.20)
provided the integral exists. The reader should work through the following integrations:
$Q[(1, 3)] = \int_1^3 e^{-x}\,dx = -e^{-x}\Big|_1^3 = e^{-1} - e^{-3} \doteq 0.318$
$Q[(5, \infty)] = \int_5^{\infty} e^{-x}\,dx = -e^{-x}\Big|_5^{\infty} = e^{-5} \doteq 0.007$
$Q[(1, 3) \cup [3, 5)] = \int_1^5 e^{-x}\,dx = \int_1^3 e^{-x}\,dx + \int_3^5 e^{-x}\,dx = Q[(1, 3)] + Q([3, 5))$
$Q(\mathcal{C}) = \int_0^{\infty} e^{-x}\,dx = 1$.
Our final example involves an n-dimensional integral.
Example 1.2.9. Let $\mathcal{C} = R^n$. For A in $\mathcal{C}$ define the set function
$Q(A) = \int \cdots \int_A dx_1\,dx_2 \cdots dx_n$,
provided the integral exists. For example, if $A = \{(x_1, x_2, \ldots, x_n) : 0 \le x_1 \le x_2, 0 \le x_i \le 1, \text{ for } i = 3, 4, \ldots, n\}$, then upon expressing the multiple integral as an iterated integral3 we obtain
$Q(A) = \int_0^1 \left[\int_0^{x_2} dx_1\right] dx_2 \cdot \prod_{i=3}^{n} \left[\int_0^1 dx_i\right] = \frac{x_2^2}{2}\Big|_0^1 \cdot 1 = \frac{1}{2}$.
If $B = \{(x_1, x_2, \ldots, x_n) : 0 \le x_1 \le x_2 \le \cdots \le x_n \le 1\}$, then
$Q(B) = \int_0^1 \left[\int_0^{x_n} \cdots \left[\int_0^{x_3} \left[\int_0^{x_2} dx_1\right] dx_2\right] \cdots dx_{n-1}\right] dx_n = \frac{1}{n!}$,
where $n! = n(n - 1) \cdots 3 \cdot 2 \cdot 1$.
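The values obtained in Examples 1.2.7 and 1.2.8 can be checked numerically. The sketch below is illustrative only and uses Python rather than the book's R. The function names `q_sum` and `q_int`, the truncation of the infinite series at 199, and the midpoint Riemann sum with 100,000 subintervals are all assumptions made here for the purpose of the check; they are not part of the text.

```python
from fractions import Fraction
import math

def q_sum(A):
    """Set function (1.2.19): sum of (2/3)^n over the integers n in A (A must be finite)."""
    return sum(Fraction(2, 3) ** n for n in A)

def q_int(a, b, steps=100_000):
    """Approximate (1.2.20) on the interval (a, b) by a midpoint Riemann sum of exp(-x)."""
    h = (b - a) / steps
    return h * sum(math.exp(-(a + (i + 0.5) * h)) for i in range(steps))

# Example 1.2.7: Q({1, 2, 3}) is exactly 38/27.
assert q_sum({1, 2, 3}) == Fraction(38, 27)

# Partial sum over the odd integers 1, 3, ..., 199 is already very close to 6/5.
partial_odd = float(q_sum(range(1, 200, 2)))
assert abs(partial_odd - 6 / 5) < 1e-12

# Example 1.2.8: Q[(1, 3)] = e^{-1} - e^{-3}, and Q is additive over (1, 3) and [3, 5).
assert abs(q_int(1, 3) - (math.exp(-1) - math.exp(-3))) < 1e-9
assert abs(q_int(1, 5) - (q_int(1, 3) + q_int(3, 5))) < 1e-9

print(round(q_int(1, 3), 3))  # 0.318
```

Exact rational arithmetic via `fractions.Fraction` is used for the sums so that the value 38/27 can be compared exactly; the integral is only approximated, so it is compared within a tolerance.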
Footnote 3: For a discussion of multiple integrals in terms of iterated integrals, see Chapter 3 of Mathematical Comments.
EXERCISES
1.2.1. Find the union $C_1 \cup C_2$ and the intersection $C_1 \cap C_2$ of the two sets $C_1$ and $C_2$, where
(a) $C_1 = \{0, 1, 2\}$, $C_2 = \{2, 3, 4\}$.
(b) $C_1 = \{x : 0 < x < 2\}$, $C_2 = \{x : 1 \le x < 3\}$.
(c) $C_1 = \{(x, y) : 0 < x < 2, 1 < y < 2\}$, $C_2 = \{(x, y) : 1 < x < 3, 1 < y < 3\}$.
1.2.2. Find the complement $C^c$ of the set C with respect to the space $\mathcal{C}$ if
(a) $\mathcal{C} = \{x : 0 < x < 1\}$, $C = \{x : 5/8 < x < 1\}$.
(b) $\mathcal{C} = \{(x, y, z) : x^2 + y^2 + z^2 \le 1\}$, $C = \{(x, y, z) : x^2 + y^2 + z^2 = 1\}$.
(c) $\mathcal{C} = \{(x, y) : |x| + |y| \le 2\}$, $C = \{(x, y) : x^2 + y^2 < 2\}$.
1.2.3. List all possible arrangements of the four letters m, a, r, and y. Let $C_1$ be the collection of the arrangements in which y is in the last position. Let $C_2$ be the collection of the arrangements in which m is in the first position. Find the union and the intersection of $C_1$ and $C_2$.
1.2.4. Concerning DeMorgan's Laws (1.2.6) and (1.2.7):
(a) Use Venn diagrams to verify the laws.
(b) Show that the laws are true.
(c) Generalize the laws to countable unions and intersections.
1.2.5. By the use of Venn diagrams, in which the space $\mathcal{C}$ is the set of points enclosed by a rectangle containing the circles $C_1$, $C_2$, and $C_3$, compare the following sets. These laws are called the distributive laws.
(a) $C_1 \cap (C_2 \cup C_3)$ and $(C_1 \cap C_2) \cup (C_1 \cap C_3)$.
(b) $C_1 \cup (C_2 \cap C_3)$ and $(C_1 \cup C_2) \cap (C_1 \cup C_3)$.
1.2.6. Show that the following sequences of sets, $\{C_k\}$, are nondecreasing, (1.2.16), then find $\lim_{k \to \infty} C_k$.
(a) $C_k = \{x : 1/k \le x \le 3 - 1/k\}$, $k = 1, 2, 3, \ldots$.
(b) $C_k = \{(x, y) : 1/k \le x^2 + y^2 \le 4 - 1/k\}$, $k = 1, 2, 3, \ldots$.
1.2.7. Show that the following sequences of sets, $\{C_k\}$, are nonincreasing, (1.2.17), then find $\lim_{k \to \infty} C_k$.
(a) $C_k = \{x : 2 - 1/k < x \le 2\}$, $k = 1, 2, 3, \ldots$.
(b) $C_k = \{x : 2 < x \le 2 + 1/k\}$, $k = 1, 2, 3, \ldots$.
(c) $C_k = \{(x, y) : 0 \le x^2 + y^2 \le 1/k\}$, $k = 1, 2, 3, \ldots$.
1.2.8.
For every one-dimensional set C, define the function $Q(C) = \sum_C f(x)$, where $f(x) = (2/3)(1/3)^x$, $x = 0, 1, 2, \ldots$, zero elsewhere. If $C_1 = \{x : x = 0, 1, 2, 3\}$ and $C_2 = \{x : x = 0, 1, 2, \ldots\}$, find $Q(C_1)$ and $Q(C_2)$.
Hint: Recall that $S_n = a + ar + \cdots + ar^{n-1} = a(1 - r^n)/(1 - r)$ and, hence, it follows that $\lim_{n \to \infty} S_n = a/(1 - r)$ provided that $|r| < 1$.
1.2.9. For every one-dimensional set C for which the integral exists, let $Q(C) = \int_C f(x)\,dx$, where $f(x) = 6x(1 - x)$, $0 < x < 1$, zero elsewhere; otherwise, let Q(C) be undefined. If $C_1 = \{x : 1/4 < x < 3/4\}$, $C_2 = \{1/2\}$, and $C_3 = \{x : 0 < x < 10\}$, find $Q(C_1)$, $Q(C_2)$, and $Q(C_3)$.
1.2.10. For every two-dimensional set C contained in $R^2$ for which the integral exists, let $Q(C) = \iint_C (x^2 + y^2)\,dx\,dy$. If $C_1 = \{(x, y) : -1 \le x \le 1, -1 \le y \le 1\}$, $C_2 = \{(x, y) : -1 \le x = y \le 1\}$, and $C_3 = \{(x, y) : x^2 + y^2 \le 1\}$, find $Q(C_1)$, $Q(C_2)$, and $Q(C_3)$.
1.2.11. Let $\mathcal{C}$ denote the set of points that are interior to, or on the boundary of, a square with opposite vertices at the points (0, 0) and (1, 1). Let $Q(C) = \iint_C dy\,dx$.
(a) If $C \subset \mathcal{C}$ is the set $\{(x, y) : 0 < x < y < 1\}$, compute Q(C).
(b) If $C \subset \mathcal{C}$ is the set $\{(x, y) : 0 < x = y < 1\}$, compute Q(C).
(c) If $C \subset \mathcal{C}$ is the set $\{(x, y) : 0 < x/2 \le y \le 3x/2 < 1\}$, compute Q(C).
1.2.12. Let $\mathcal{C}$ be the set of points interior to or on the boundary of a cube with edge of length 1. Moreover, say that the cube is in the first octant with one vertex at the point (0, 0, 0) and an opposite vertex at the point (1, 1, 1). Let $Q(C) = \iiint_C dx\,dy\,dz$.
(a) If $C \subset \mathcal{C}$ is the set $\{(x, y, z) : 0 < x < y < z < 1\}$, compute Q(C).
(b) If C is the subset $\{(x, y, z) : 0 < x = y = z < 1\}$, compute Q(C).
1.2.13. Let C denote the set $\{(x, y, z) : x^2 + y^2 + z^2 \le 1\}$. Using spherical coordinates, evaluate $Q(C) = \iiint_C (x^2 + y^2 + z^2)\,dx\,dy\,dz$.
1.2.14. To join a certain club, a person must be either a statistician or a mathematician or both. Of the 25 members in this club, 19 are statisticians and 16 are mathematicians.
How many persons in the club are both a statistician and a mathematician?
1.2.15. After a hard-fought football game, it was reported that, of the 11 starting players, 8 hurt a hip, 6 hurt an arm, 5 hurt a knee, 3 hurt both a hip and an arm, 2 hurt both a hip and a knee, 1 hurt both an arm and a knee, and no one hurt all three. Comment on the accuracy of the report.
1.3 The Probability Set Function
Given an experiment, let $\mathcal{C}$ denote the sample space of all possible outcomes. As discussed in Section 1.1, we are interested in assigning probabilities to events, i.e., subsets of $\mathcal{C}$. What should be our collection of events? If $\mathcal{C}$ is a finite set, then we could take the set of all subsets as this collection. For infinite sample spaces, though, with assignment of probabilities in mind, this poses mathematical technicalities that are better left to a course in probability theory. We assume that in all cases, the collection of events is sufficiently rich to include all possible events of interest and is closed under complements and countable unions of these events. Using DeMorgan's Laws, (1.2.6)–(1.2.7), the collection is then also closed under countable intersections. We denote this collection of events by $\mathcal{B}$. Technically, such a collection of events is called a σ-field of subsets.
Now that we have a sample space, $\mathcal{C}$, and our collection of events, $\mathcal{B}$, we can define the third component in our probability space, namely a probability set function. In order to motivate its definition, we consider the relative frequency approach to probability.
Remark 1.3.1. The definition of probability consists of three axioms which we motivate by the following three intuitive properties of relative frequency. Let $\mathcal{C}$ be a sample space and let $A \subset \mathcal{C}$. Suppose we repeat the experiment N times. Then the relative frequency of A is $f_A = \#\{A\}/N$, where $\#\{A\}$ denotes the number of times A occurred in the N repetitions. Note that $f_A \ge 0$ and $f_{\mathcal{C}} = 1$.
These are the first two properties. For the third, suppose that $A_1$ and $A_2$ are disjoint events. Then $f_{A_1 \cup A_2} = f_{A_1} + f_{A_2}$. These three properties of relative frequencies form the axioms of a probability, except that the third axiom is in terms of countable unions. As with the axioms of probability, the readers should check that the theorems we prove below about probabilities agree with their intuition of relative frequency.
Definition 1.3.1 (Probability). Let $\mathcal{C}$ be a sample space and let $\mathcal{B}$ be the set of events. Let P be a real-valued function defined on $\mathcal{B}$. Then P is a probability set function if P satisfies the following three conditions:
1. $P(A) \ge 0$, for all $A \in \mathcal{B}$.
2. $P(\mathcal{C}) = 1$.
3. If $\{A_n\}$ is a sequence of events in $\mathcal{B}$ and $A_m \cap A_n = \phi$ for all $m \ne n$, then $P\left(\bigcup_{n=1}^{\infty} A_n\right) = \sum_{n=1}^{\infty} P(A_n)$.
A collection of events whose members are pairwise disjoint, as in (3), is said to be a mutually exclusive collection and its union is often referred to as a disjoint union. The collection is further said to be exhaustive if the union of its events is the sample space, in which case $\sum_{n=1}^{\infty} P(A_n) = 1$. We often say that a mutually exclusive and exhaustive collection of events forms a partition of $\mathcal{C}$.
A probability set function tells us how the probability is distributed over the set of events, $\mathcal{B}$. In this sense we speak of a distribution of probability. We often drop the word "set" and refer to P as a probability function.
The following theorems give us some other properties of a probability set function. In the statement of each of these theorems, P(A) is taken, tacitly, to be a probability set function defined on the collection of events $\mathcal{B}$ of a sample space $\mathcal{C}$.
Theorem 1.3.1. For each event $A \in \mathcal{B}$, $P(A) = 1 - P(A^c)$.
Proof: We have $\mathcal{C} = A \cup A^c$ and $A \cap A^c = \phi$. Thus, from (2) and (3) of Definition 1.3.1, it follows that $1 = P(A) + P(A^c)$, which is the desired result.
Theorem 1.3.2. The probability of the null set is zero; that is, $P(\phi) = 0$.
Proof: In Theorem 1.3.1, take $A = \phi$ so that $A^c = \mathcal{C}$. Accordingly, we have $P(\phi) = 1 - P(\mathcal{C}) = 1 - 1 = 0$ and the theorem is proved.
Theorem 1.3.3. If A and B are events such that $A \subset B$, then $P(A) \le P(B)$.
Proof: Now $B = A \cup (A^c \cap B)$ and $A \cap (A^c \cap B) = \phi$. Hence, from (3) of Definition 1.3.1, $P(B) = P(A) + P(A^c \cap B)$. From (1) of Definition 1.3.1, $P(A^c \cap B) \ge 0$. Hence, $P(B) \ge P(A)$.
Theorem 1.3.4. For each $A \in \mathcal{B}$, $0 \le P(A) \le 1$.
Proof: Since $\phi \subset A \subset \mathcal{C}$, we have by Theorem 1.3.3 that $P(\phi) \le P(A) \le P(\mathcal{C})$ or $0 \le P(A) \le 1$, the desired result.
Part (3) of the definition of probability says that $P(A \cup B) = P(A) + P(B)$ if A and B are disjoint, i.e., $A \cap B = \phi$. The next theorem gives the rule for any two events, regardless of whether they are disjoint.
Theorem 1.3.5. If A and B are events in $\mathcal{C}$, then $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
Proof: Each of the sets $A \cup B$ and B can be represented, respectively, as a union of nonintersecting sets as follows:
$A \cup B = A \cup (A^c \cap B)$ and $B = (A \cap B) \cup (A^c \cap B)$. (1.3.1)
That these identities hold for all sets A and B follows from set theory. Also, the Venn diagrams of Figure 1.3.1 offer a verification of them. Thus, from (3) of Definition 1.3.1, $P(A \cup B) = P(A) + P(A^c \cap B)$ and $P(B) = P(A \cap B) + P(A^c \cap B)$. If the second of these equations is solved for $P(A^c \cap B)$ and this result is substituted in the first equation, we obtain $P(A \cup B) = P(A) + P(B) - P(A \cap B)$. This completes the proof.
[Figure 1.3.1: Venn diagrams depicting the two disjoint unions given in expression (1.3.1). Panel (a), labeled $A \cup B = A \cup (A^c \cap B)$, depicts the first disjoint union while Panel (b), labeled $A = (A \cap B^c) \cup (A \cap B)$, shows the second disjoint union.]
Example 1.3.1. Let $\mathcal{C}$ denote the sample space of Example 1.1.2. Let the probability set function assign a probability of 1/36 to each of the 36 points in $\mathcal{C}$; that is, the dice are fair.
If $C_1 = \{(1, 1), (2, 1), (3, 1), (4, 1), (5, 1)\}$ and $C_2 = \{(1, 2), (2, 2), (3, 2)\}$, then $P(C_1) = 5/36$, $P(C_2) = 3/36$, $P(C_1 \cup C_2) = 8/36$, and $P(C_1 \cap C_2) = 0$.
Example 1.3.2. Two coins are to be tossed and the outcome is the ordered pair (face on the first coin, face on the second coin). Thus the sample space may be represented as $\mathcal{C} = \{(H, H), (H, T), (T, H), (T, T)\}$. Let the probability set function assign a probability of 1/4 to each element of $\mathcal{C}$. Let $C_1 = \{(H, H), (H, T)\}$ and $C_2 = \{(H, H), (T, H)\}$. Then $P(C_1) = P(C_2) = 1/2$, $P(C_1 \cap C_2) = 1/4$, and, in accordance with Theorem 1.3.5, $P(C_1 \cup C_2) = 1/2 + 1/2 - 1/4 = 3/4$.
For a finite sample space, we can generate probabilities as follows. Let $\mathcal{C} = \{x_1, x_2, \ldots, x_m\}$ be a finite set of m elements. Let $p_1, p_2, \ldots, p_m$ be fractions such that
$0 \le p_i \le 1$ for $i = 1, 2, \ldots, m$ and $\sum_{i=1}^{m} p_i = 1$. (1.3.2)
Suppose we define P by
$P(A) = \sum_{x_i \in A} p_i$, for all subsets A of $\mathcal{C}$. (1.3.3)
Then $P(A) \ge 0$ and $P(\mathcal{C}) = 1$. Further, it follows that $P(A \cup B) = P(A) + P(B)$ when $A \cap B = \phi$. Therefore, P is a probability on $\mathcal{C}$. For illustration, each of the following four assignments forms a probability on $\mathcal{C} = \{1, 2, \ldots, 6\}$. For each, we also compute P(A) for the event $A = \{1, 6\}$.
$p_1 = p_2 = \cdots = p_6 = 1/6$; $P(A) = 1/3$. (1.3.4)
$p_1 = p_2 = 0.1$, $p_3 = p_4 = p_5 = p_6 = 0.2$; $P(A) = 0.3$.
$p_i = i/21$, $i = 1, 2, \ldots, 6$; $P(A) = 7/21$.
$p_1 = 3/\pi$, $p_2 = 1 - 3/\pi$, $p_3 = p_4 = p_5 = p_6 = 0.0$; $P(A) = 3/\pi$.
Note that the individual probabilities for the first probability set function, (1.3.4), are the same. This is an example of the equilikely case which we now formally define.
Definition 1.3.2 (Equilikely Case). Let $\mathcal{C} = \{x_1, x_2, \ldots, x_m\}$ be a finite sample space. Let $p_i = 1/m$ for all $i = 1, 2, \ldots, m$ and for all subsets A of $\mathcal{C}$ define
$P(A) = \sum_{x_i \in A} \frac{1}{m} = \frac{\#(A)}{m}$,
where #(A) denotes the number of elements in A. Then P is a probability on $\mathcal{C}$ and it is referred to as the equilikely case.
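The construction (1.3.3) of a probability on a finite sample space, together with Examples 1.3.1 and 1.3.2, can be sketched in a few lines. This is an illustrative Python sketch (the book's own code is in R); the function name `prob` and the dictionaries of weights are hypothetical names introduced here, and exact fractions are used so that the results can be compared without rounding.

```python
from fractions import Fraction
from itertools import product

def prob(event, weights):
    """P(A) as in (1.3.3): the sum of p_i over the outcomes x_i in the event A."""
    return sum(weights[x] for x in event)

# Example 1.3.1: equilikely case on the 36 outcomes of two fair dice.
p_dice = {outcome: Fraction(1, 36) for outcome in product(range(1, 7), repeat=2)}
C1 = {(1, 1), (2, 1), (3, 1), (4, 1), (5, 1)}
C2 = {(1, 2), (2, 2), (3, 2)}
assert prob(C1, p_dice) == Fraction(5, 36)
assert prob(C1 | C2, p_dice) == Fraction(8, 36)
assert prob(C1 & C2, p_dice) == 0

# Example 1.3.2: two fair coins, checked against Theorem 1.3.5.
p_coins = {outcome: Fraction(1, 4) for outcome in product("HT", repeat=2)}
E1 = {("H", "H"), ("H", "T")}
E2 = {("H", "H"), ("T", "H")}
lhs = prob(E1 | E2, p_coins)
rhs = prob(E1, p_coins) + prob(E2, p_coins) - prob(E1 & E2, p_coins)
assert lhs == rhs == Fraction(3, 4)

# Third assignment above: p_i = i/21 on {1, ..., 6}, with A = {1, 6}.
p_linear = {i: Fraction(i, 21) for i in range(1, 7)}
assert sum(p_linear.values()) == 1           # condition (1.3.2)
print(prob({1, 6}, p_linear))                # 1/3, i.e. 7/21 in lowest terms
```

Representing the weights as a dictionary keyed by outcome keeps the sketch close to the notation $p_i$ attached to $x_i$; only the equilikely assignments and the assignment $p_i = i/21$ are checked here.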
Equilikely cases are frequently probability models of interest. Examples include: the flip of a fair coin; five cards drawn from a well shuffled deck of 52 cards; a spin of a fair spinner with the numbers 1 through 36 on it; and the upfaces of the roll of a pair of balanced dice. For each of these experiments, as stated in the definition, we only need to know the number of elements in an event to compute the probability of that event. For example, a card player may be interested in the probability of getting a pair (two of a kind) in a hand of five cards dealt from a well shuffled deck of 52 cards. To compute this probability, we need to know the number of five card hands and the number of such hands which contain a pair. Because the equilikely case is often of interest, we next develop some counting rules which can be used to compute the probabilities of events of interest.
1.3.1 Counting Rules
We discuss three counting rules that are usually discussed in an elementary algebra course.
The first rule is called the mn-rule (m times n-rule), which is also called the multiplication rule. Let $A = \{x_1, x_2, \ldots, x_m\}$ be a set of m elements and let $B = \{y_1, y_2, \ldots, y_n\}$ be a set of n elements. Then there are mn ordered pairs, $(x_i, y_j)$, $i = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, n$, of elements, the first from A and the second from B. Informally, we often speak of ways, here. For example, suppose there are five roads (ways) between cities I and II and there are ten roads (ways) between cities II and III. Hence, there are $5 \cdot 10 = 50$ ways to get from city I to city III by going from city I to city II and then from city II to city III. This rule extends immediately to more than two sets. For instance, suppose in a certain state that driver license plates have the pattern of three letters followed by three numbers. Then there are $26^3 \cdot 10^3$ possible license plates in this state.
Next, let A be a set with n elements.
Suppose we are interested in $k$-tuples whose components are elements of $A$. Then by the extended mn-rule, there are $n \cdot n \cdots n = n^k$ such $k$-tuples. Next, suppose $k \le n$ and we are interested in $k$-tuples whose components are distinct (no repeats) elements of $A$. There are $n$ elements from which to choose for the first component, $n - 1$ for the second component, $\ldots$, $n - (k - 1)$ for the $k$th. Hence, by the mn-rule, there are $n(n-1) \cdots (n-(k-1))$ such $k$-tuples with distinct elements. We call each such $k$-tuple a permutation and use the symbol $P^n_k$ to denote the number of $k$ permutations taken from a set of $n$ elements. This number of permutations, $P^n_k$, is our second counting rule. We can rewrite it as

$$P^n_k = n(n-1) \cdots (n-(k-1)) = \frac{n!}{(n-k)!}. \tag{1.3.5}$$

Example 1.3.3 (Birthday Problem). Suppose there are $n$ people in a room. Assume that $n < 365$ and that the people are unrelated in any way. Find the probability of the event $A$ that at least 2 people have the same birthday. For convenience, assign the numbers 1 through $n$ to the people in the room. Then use $n$-tuples to denote the birthdays of the first person through the $n$th person in the room. Using the mn-rule, there are $365^n$ possible birthday $n$-tuples for these $n$ people. This is the number of elements in the sample space. Now assume that birthdays are equilikely to occur on any of the 365 days. Hence, each of these $n$-tuples has probability $365^{-n}$. Notice that the complement of $A$ is the event that all the birthdays in the room are distinct; that is, the number of $n$-tuples in $A^c$ is $P^{365}_n$. Thus, the probability of $A$ is

$$P(A) = 1 - \frac{P^{365}_n}{365^n}.$$

For instance, if $n = 2$ then $P(A) = 1 - (365 \cdot 364)/365^2 = 0.0027$. This formula is not easy to compute by hand. The following R function (an R primer for the course is found in Appendix B) computes $P(A)$ for the input $n$, and it can be downloaded at the sites mentioned in the Preface.
    bday = function(n){
      # P(at least two of n people share a birthday), 365 equilikely days
      bday = 1; nm1 = n - 1
      # seq_len(nm1) is empty when n = 1, whereas 1:nm1 would loop over 1, 0
      for(j in seq_len(nm1)){bday = bday*((365-j)/365)}
      bday <- 1 - bday; return(bday)}

Assuming that the file bday.R contains this function, the following R segment computes $P(A)$ for $n = 10$:

    > source("bday.R")
    > bday(10)
    [1] 0.1169482

For our last counting rule, as with permutations, we are drawing from a set $A$ of $n$ elements. Now, suppose order is not important, so instead of counting the number of permutations we want to count the number of subsets of $k$ elements taken from $A$. We use the symbol $\binom{n}{k}$ to denote the total number of these subsets. Consider a subset of $k$ elements from $A$. By the permutation rule it generates $P^k_k = k(k-1) \cdots 1 = k!$ permutations. Furthermore, all these permutations are distinct from the permutations generated by other subsets of $k$ elements from $A$. Finally, each permutation of $k$ distinct elements drawn from $A$ must be generated by one of these subsets. Hence, we have shown that $P^n_k = \binom{n}{k} k!$; that is,

$$\binom{n}{k} = \frac{n!}{k!(n-k)!}. \tag{1.3.6}$$

We often use the terminology combinations instead of subsets. So we say that there are $\binom{n}{k}$ combinations of $k$ things taken from a set of $n$ things. Another common symbol for $\binom{n}{k}$ is $C^n_k$.

It is interesting to note that if we expand the binomial series, $(a+b)^n = (a+b)(a+b) \cdots (a+b)$, we get

$$(a+b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k}, \tag{1.3.7}$$

because we can select the $k$ factors from which to take $a$ in $\binom{n}{k}$ ways. So $\binom{n}{k}$ is also referred to as a binomial coefficient.

Example 1.3.4 (Poker Hands). Let a card be drawn at random from an ordinary deck of 52 playing cards that has been well shuffled. The sample space $\mathcal{C}$ consists of 52 elements; each element represents one and only one of the 52 cards. Because the deck has been well shuffled, it is reasonable to assume that each of these outcomes has the same probability $\frac{1}{52}$. Accordingly, if $E_1$ is the set of outcomes that are spades, $P(E_1) = \frac{13}{52} = \frac{1}{4}$ because there are 13 spades in the deck; that is, $\frac{1}{4}$ is the probability of drawing a card that is a spade.
If $E_2$ is the set of outcomes that are kings, $P(E_2) = \frac{4}{52} = \frac{1}{13}$ because there are 4 kings in the deck; that is, $\frac{1}{13}$ is the probability of drawing a card that is a king. These computations are very easy because there are no difficulties in the determination of the number of elements in each event.

However, instead of drawing only one card, suppose that five cards are taken, at random and without replacement, from this deck; i.e., a five card poker hand. In this instance, order is not important. So a hand is a subset of five elements drawn from a set of 52 elements. Hence, by (1.3.6) there are $\binom{52}{5}$ poker hands. If the deck is well shuffled, each hand should be equilikely; i.e., each hand has probability $1/\binom{52}{5}$. We can now compute the probabilities of some interesting poker hands. Let $E_1$ be the event of a flush, all five cards of the same suit. There are $\binom{4}{1} = 4$ suits to choose for the flush and in each suit there are $\binom{13}{5}$ possible hands; hence, using the multiplication rule, the probability of getting a flush is

$$P(E_1) = \frac{\binom{4}{1}\binom{13}{5}}{\binom{52}{5}} = \frac{4 \cdot 1287}{2598960} = 0.00198.$$

Real poker players note that this includes the probability of obtaining a straight flush.

Next, consider the probability of the event $E_2$ of getting exactly three of a kind (the other two cards are distinct and are of different kinds). Choose the kind for the three, in $\binom{13}{1}$ ways; choose the three, in $\binom{4}{3}$ ways; choose the other two kinds, in $\binom{12}{2}$ ways; and choose one card from each of these last two kinds, in $\binom{4}{1}\binom{4}{1}$ ways. Hence the probability of exactly three of a kind is

$$P(E_2) = \frac{\binom{13}{1}\binom{4}{3}\binom{12}{2}\binom{4}{1}^2}{\binom{52}{5}} = 0.0211.$$

Now suppose that $E_3$ is the set of outcomes in which exactly three cards are kings and exactly two cards are queens. Select the kings, in $\binom{4}{3}$ ways, and select the queens, in $\binom{4}{2}$ ways. Hence, the probability of $E_3$ is

$$P(E_3) = \binom{4}{3}\binom{4}{2} \Big/ \binom{52}{5} = 0.0000092.$$

The event $E_3$ is an example of a full house: three of one kind and two of another kind.
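The poker-hand counts in Example 1.3.4 can be reproduced with R's built-in choose function, which evaluates the binomial coefficient of (1.3.6). This is a minimal sketch for checking the arithmetic; the variable names are hypothetical and not from the text.

```r
# Number of equilikely five-card hands, from (1.3.6).
n_hands <- choose(52, 5)          # 2598960

# Flush: choose the suit, then 5 of its 13 cards (straight flushes included).
p_flush <- choose(4, 1) * choose(13, 5) / n_hands

# Exactly three of a kind: the kind, three of its four cards,
# two other kinds, and one card from each of those two kinds.
p_three <- choose(13, 1) * choose(4, 3) * choose(12, 2) * choose(4, 1)^2 / n_hands

# Exactly three kings and exactly two queens.
p_kings_queens <- choose(4, 3) * choose(4, 2) / n_hands

c(flush = p_flush, three_of_a_kind = p_three, kings_queens = p_kings_queens)
```

The same pattern, a product of choose terms divided by choose(52, 5), applies to other hands defined by counts of kinds and suits.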
Exercise 1.3.19 asks for the determination of the probability of a full house.

1.3.2 Additional Properties of Probability

We end this section with several additional properties of probability which prove useful in the sequel. Recall in Exercise 1.2.6 we said that a sequence of events $\{C_n\}$ is a nondecreasing sequence if $C_n \subset C_{n+1}$, for all $n$, in which case we wrote $\lim_{n\to\infty} C_n = \bigcup_{n=1}^{\infty} C_n$. Consider $\lim_{n\to\infty} P(C_n)$. The question is: can we legitimately interchange the limit and $P$? As the following theorem shows, the answer is yes. The result also holds for a decreasing sequence of events. Because of this interchange, this theorem is sometimes referred to as the continuity theorem of probability.

Theorem 1.3.6. Let $\{C_n\}$ be a nondecreasing sequence of events. Then

$$\lim_{n\to\infty} P(C_n) = P\left(\lim_{n\to\infty} C_n\right) = P\left(\bigcup_{n=1}^{\infty} C_n\right). \tag{1.3.8}$$

Let $\{C_n\}$ be a decreasing sequence of events. Then

$$\lim_{n\to\infty} P(C_n) = P\left(\lim_{n\to\infty} C_n\right) = P\left(\bigcap_{n=1}^{\infty} C_n\right). \tag{1.3.9}$$

Proof. We prove the result (1.3.8) and leave the second result as Exercise 1.3.20. Define the sets, called rings, as $R_1 = C_1$ and, for $n > 1$, $R_n = C_n \cap C_{n-1}^c$. It follows that $\bigcup_{n=1}^{\infty} C_n = \bigcup_{n=1}^{\infty} R_n$ and that $R_m \cap R_n = \phi$, for $m \ne n$. Also, $P(R_n) = P(C_n) - P(C_{n-1})$ for $n > 1$. Applying the third axiom of probability yields the following string of equalities:

$$P\left[\lim_{n\to\infty} C_n\right] = P\left(\bigcup_{n=1}^{\infty} C_n\right) = P\left(\bigcup_{n=1}^{\infty} R_n\right) = \sum_{n=1}^{\infty} P(R_n) = \lim_{n\to\infty} \sum_{j=1}^{n} P(R_j)$$

$$= \lim_{n\to\infty} \left\{ P(C_1) + \sum_{j=2}^{n} [P(C_j) - P(C_{j-1})] \right\} = \lim_{n\to\infty} P(C_n). \tag{1.3.10}$$

This is the desired result.

Another useful result for arbitrary unions is given by

Theorem 1.3.7 (Boole's Inequality). Let $\{C_n\}$ be an arbitrary sequence of events. Then

$$P\left(\bigcup_{n=1}^{\infty} C_n\right) \le \sum_{n=1}^{\infty} P(C_n). \tag{1.3.11}$$

Proof: Let $D_n = \bigcup_{i=1}^{n} C_i$. Then $\{D_n\}$ is an increasing sequence of events that go up to $\bigcup_{n=1}^{\infty} C_n$. Also, for all $j$, $D_j = D_{j-1} \cup C_j$. Hence, by Theorem 1.3.5, $P(D_j) \le P(D_{j-1}) + P(C_j)$, that is, $P(D_j) - P(D_{j-1}) \le P(C_j)$. In this case, the $C_i$s are replaced by the $D_i$s in expression (1.3.10).
Hence, using the above inequality in this expression and the fact that $P(C_1) = P(D_1)$, we have

$$P\left(\bigcup_{n=1}^{\infty} C_n\right) = P\left(\bigcup_{n=1}^{\infty} D_n\right) = \lim_{n\to\infty} \left\{ P(D_1) + \sum_{j=2}^{n} [P(D_j) - P(D_{j-1})] \right\} \le \lim_{n\to\infty} \sum_{j=1}^{n} P(C_j) = \sum_{n=1}^{\infty} P(C_n).$$

Theorem 1.3.5 gave a general additive law of probability for the union of two events. As the next remark shows, this can be extended to an additive law for an arbitrary union.

Remark 1.3.2 (Inclusion Exclusion Formula). It is easy to show (Exercise 1.3.9) that $P(C_1 \cup C_2 \cup C_3) = p_1 - p_2 + p_3$, where

$$p_1 = P(C_1) + P(C_2) + P(C_3)$$
$$p_2 = P(C_1 \cap C_2) + P(C_1 \cap C_3) + P(C_2 \cap C_3)$$
$$p_3 = P(C_1 \cap C_2 \cap C_3). \tag{1.3.12}$$

This can be generalized to the inclusion exclusion formula:

$$P(C_1 \cup C_2 \cup \cdots \cup C_k) = p_1 - p_2 + p_3 - \cdots + (-1)^{k+1} p_k, \tag{1.3.13}$$

where $p_i$ equals the sum of the probabilities of all possible intersections involving $i$ sets. When $k = 3$, it follows that $p_1 \ge p_2 \ge p_3$. For larger $k$ this ordering need not hold; for instance, if $C_1 = C_2 = C_3 = C_4$ with $P(C_1) > 0$, then $p_2 = 6P(C_1) > 4P(C_1) = p_1$. As shown in Theorem 1.3.7,

$$p_1 = P(C_1) + P(C_2) + \cdots + P(C_k) \ge P(C_1 \cup C_2 \cup \cdots \cup C_k).$$

For $k = 2$, we have

$$1 \ge P(C_1 \cup C_2) = P(C_1) + P(C_2) - P(C_1 \cap C_2),$$

which gives Bonferroni's inequality,

$$P(C_1 \cap C_2) \ge P(C_1) + P(C_2) - 1, \tag{1.3.14}$$

that is only useful when $P(C_1)$ and $P(C_2)$ are large. The inclusion exclusion formula provides other inequalities that are useful, such as

$$p_1 \ge P(C_1 \cup C_2 \cup \cdots \cup C_k) \ge p_1 - p_2$$

and

$$p_1 - p_2 + p_3 \ge P(C_1 \cup C_2 \cup \cdots \cup C_k) \ge p_1 - p_2 + p_3 - p_4.$$

EXERCISES

1.3.1. A positive integer from one to six is to be chosen by casting a die. Thus the elements $c$ of the sample space $\mathcal{C}$ are 1, 2, 3, 4, 5, 6. Suppose $C_1 = \{1, 2, 3, 4\}$ and $C_2 = \{3, 4, 5, 6\}$. If the probability set function $P$ assigns a probability of $\frac{1}{6}$ to each of the elements of $\mathcal{C}$, compute $P(C_1)$, $P(C_2)$, $P(C_1 \cap C_2)$, and $P(C_1 \cup C_2)$.

1.3.2. A random experiment consists of drawing a card from an ordinary deck of 52 playing cards.
Let the probability set function $P$ assign a probability of $\frac{1}{52}$ to each of the 52 possible outcomes. Let $C_1$ denote the collection of the 13 hearts and let $C_2$ denote the collection of the 4 kings. Compute $P(C_1)$, $P(C_2)$, $P(C_1 \cap C_2)$, and $P(C_1 \cup C_2)$.

1.3.3. A coin is to be tossed as many times as necessary to turn up one head. Thus the elements $c$ of the sample space $\mathcal{C}$ are $H$, $TH$, $TTH$, $TTTH$, and so forth. Let the probability set function $P$ assign to these elements the respective probabilities $\frac{1}{2}$, $\frac{1}{4}$, $\frac{1}{8}$, $\frac{1}{16}$, and so forth. Show that $P(\mathcal{C}) = 1$. Let $C_1 = \{c : c \text{ is } H, TH, TTH, TTTH, \text{ or } TTTTH\}$. Compute $P(C_1)$. Next, suppose that $C_2 = \{c : c \text{ is } TTTTH \text{ or } TTTTTH\}$. Compute $P(C_2)$, $P(C_1 \cap C_2)$, and $P(C_1 \cup C_2)$.

1.3.4. If the sample space is $\mathcal{C} = C_1 \cup C_2$ and if $P(C_1) = 0.8$ and $P(C_2) = 0.5$, find $P(C_1 \cap C_2)$.

1.3.5. Let the sample space be $\mathcal{C} = \{c : 0 < c < \infty\}$. Let $C \subset \mathcal{C}$ be defined by $C = \{c : 4 < c < \infty\}$ and take $P(C) = \int_C e^{-x}\,dx$. Show that $P(\mathcal{C}) = 1$. Evaluate $P(C)$, $P(C^c)$, and $P(C \cup C^c)$.

1.3.6. If the sample space is $\mathcal{C} = \{c : -\infty < c < \infty\}$ and if $C \subset \mathcal{C}$ is a set for which the integral $\int_C e^{-|x|}\,dx$ exists, show that this set function is not a probability set function. What constant do we multiply the integrand by to make it a probability set function?

1.3.7. If $C_1$ and $C_2$ are subsets of the sample space $\mathcal{C}$, show that

$$P(C_1 \cap C_2) \le P(C_1) \le P(C_1 \cup C_2) \le P(C_1) + P(C_2).$$

1.3.8. Let $C_1$, $C_2$, and $C_3$ be three mutually disjoint subsets of the sample space $\mathcal{C}$. Find $P[(C_1 \cup C_2) \cap C_3]$ and $P(C_1^c \cup C_2^c)$.

1.3.9. Consider Remark 1.3.2.

(a) If $C_1$, $C_2$, and $C_3$ are subsets of $\mathcal{C}$, show that

$$P(C_1 \cup C_2 \cup C_3) = P(C_1) + P(C_2) + P(C_3) - P(C_1 \cap C_2) - P(C_1 \cap C_3) - P(C_2 \cap C_3) + P(C_1 \cap C_2 \cap C_3).$$

(b) Now prove the general inclusion exclusion formula given by the expression (1.3.13).

Remark 1.3.3. In order to solve Exercises (1.3.10)–(1.3.19), certain reasonable assumptions must be made.

1.3.10.
A bowl contains 16 chips, of which 6 are red, 7 are white, and 3 are blue. If four chips are taken at random and without replacement, find the probability that: (a) each of the four chips is red; (b) none of the four chips is red; (c) there is at least one chip of each color.

1.3.11. A person has purchased 10 of 1000 tickets sold in a certain raffle. To determine the five prize winners, five tickets are to be drawn at random and without replacement. Compute the probability that this person wins at least one prize. Hint: First compute the probability that the person does not win a prize.

1.3.12. Compute the probability of being dealt at random and without replacement a 13-card bridge hand consisting of: (a) 6 spades, 4 hearts, 2 diamonds, and 1 club; (b) 13 cards of the same suit.

1.3.13. Three distinct integers are chosen at random from the first 20 positive integers. Compute the probability that: (a) their sum is even; (b) their product is even.

1.3.14. There are five red chips and three blue chips in a bowl. The red chips are numbered 1, 2, 3, 4, 5, respectively, and the blue chips are numbered 1, 2, 3, respectively. If two chips are to be drawn at random and without replacement, find the probability that these chips have either the same number or the same color.

1.3.15. In a lot of 50 light bulbs, there are 2 bad bulbs. An inspector examines five bulbs, which are selected at random and without replacement.

(a) Find the probability of at least one defective bulb among the five.

(b) How many bulbs should be examined so that the probability of finding at least one bad bulb exceeds $\frac{1}{2}$?

1.3.16. For the birthday problem, Example 1.3.3, use the given R function bday to determine the value of $n$ so that $p(n) \ge 0.5$ and $p(n-1) < 0.5$, where $p(n)$ is the probability that at least two people in the room of $n$ people have the same birthday.

1.3.17. If $C_1, \ldots$
, $C_k$ are $k$ events in the sample space $\mathcal{C}$, show that the probability that at least one of the events occurs is one minus the probability that none of them occur; i.e.,

$$P(C_1 \cup \cdots \cup C_k) = 1 - P(C_1^c \cap \cdots \cap C_k^c). \tag{1.3.15}$$

1.3.18. A secretary types three letters and the three corresponding envelopes. In a hurry, he places at random one letter in each envelope. What is the probability that at least one letter is in the correct envelope? Hint: Let $C_i$ be the event that the $i$th letter is in the correct envelope. Expand $P(C_1 \cup C_2 \cup C_3)$ to determine the probability.

1.3.19. Consider poker hands drawn from a well-shuffled deck as described in Example 1.3.4. Determine the probability of a full house, i.e., three of one kind and two of another.

1.3.20. Prove expression (1.3.9).

1.3.21. Suppose the experiment is to choose a real number at random in the interval $(0, 1)$. For any subinterval $(a, b) \subset (0, 1)$, it seems reasonable to assign the probability $P[(a, b)] = b - a$; i.e., the probability of selecting the point from a subinterval is directly proportional to the length of the subinterval. If this is the case, choose an appropriate sequence of subintervals and use expression (1.3.9) to show that $P[\{a\}] = 0$, for all $a \in (0, 1)$.

1.3.22. Consider the events $C_1$, $C_2$, $C_3$.

(a) Suppose $C_1$, $C_2$, $C_3$ are mutually exclusive events. If $P(C_i) = p_i$, $i = 1, 2, 3$, what is the restriction on the sum $p_1 + p_2 + p_3$?

(b) In the notation of part (a), if $p_1 = 4/10$, $p_2 = 3/10$, and $p_3 = 5/10$, are $C_1$, $C_2$, $C_3$ mutually exclusive?

For the last two exercises it is assumed that the reader is familiar with $\sigma$-fields.

1.3.23. Suppose $\mathcal{D}$ is a nonempty collection of subsets of $\mathcal{C}$. Consider the collection of events

$$\mathcal{B} = \cap \{\mathcal{E} : \mathcal{D} \subset \mathcal{E} \text{ and } \mathcal{E} \text{ is a } \sigma\text{-field}\}.$$

Note that $\phi \in \mathcal{B}$ because it is in each $\sigma$-field, and, hence, in particular, it is in each $\sigma$-field $\mathcal{E} \supset \mathcal{D}$. Continue in this way to show that $\mathcal{B}$ is a $\sigma$-field.

1.3.24.
Let $\mathcal{C} = R$, where $R$ is the set of all real numbers. Let $\mathcal{I}$ be the set of all open intervals in $R$. The Borel $\sigma$-field on the real line is given by

$$\mathcal{B}_0 = \cap \{\mathcal{E} : \mathcal{I} \subset \mathcal{E} \text{ and } \mathcal{E} \text{ is a } \sigma\text{-field}\}.$$

By definition, $\mathcal{B}_0$ contains the open intervals. Because $[a, \infty) = (-\infty, a)^c$ and $\mathcal{B}_0$ is closed under complements, it contains all intervals of the form $[a, \infty)$, for $a \in R$. Continue in this way and show that $\mathcal{B}_0$ contains all the closed and half-open intervals of real numbers.

1.4 Conditional Probability and Independence

In some random experiments, we are interested only in those outcomes that are elements of a subset $A$ of the sample space $\mathcal{C}$. This means, for our purposes, that the sample space is effectively the subset $A$. We are now confronted with the problem of defining a probability set function with $A$ as the "new" sample space.

Let the probability set function $P$ be defined on the sample space $\mathcal{C}$ and let $A$ be a subset of $\mathcal{C}$ such that $P(A) > 0$. We agree to consider only those outcomes of the random experiment that are elements of $A$; in essence, then, we take $A$ to be a sample space. Let $B$ be another subset of $\mathcal{C}$. How, relative to the new sample space $A$, do we want to define the probability of the event $B$? Once defined, this probability is called the conditional probability of the event $B$, relative to the hypothesis of the event $A$, or, more briefly, the conditional probability of $B$, given $A$. Such a conditional probability is denoted by the symbol $P(B|A)$. The "$|$" in this symbol is usually read as "given." We now return to the question that was raised about the definition of this symbol. Since $A$ is now the sample space, the only elements of $B$ that concern us are those, if any, that are also elements of $A$, that is, the elements of $A \cap B$. It seems desirable, then, to define the symbol $P(B|A)$ in such a way that

$$P(A|A) = 1 \quad \text{and} \quad P(B|A) = P(A \cap B|A).$$
Moreover, from a relative frequency point of view, it would seem logically inconsistent if we did not require that the ratio of the probabilities of the events $A \cap B$ and $A$, relative to the space $A$, be the same as the ratio of the probabilities of these events relative to the space $\mathcal{C}$; that is, we should have

$$\frac{P(A \cap B|A)}{P(A|A)} = \frac{P(A \cap B)}{P(A)}.$$

These three desirable conditions imply that conditional probability is reasonably defined as follows.

Definition 1.4.1 (Conditional Probability). Let $B$ and $A$ be events with $P(A) > 0$. Then we define the conditional probability of $B$ given $A$ as

$$P(B|A) = \frac{P(A \cap B)}{P(A)}. \tag{1.4.1}$$

Moreover, we have

1. $P(B|A) \ge 0$.
2. $P(A|A) = 1$.
3. $P(\bigcup_{n=1}^{\infty} B_n | A) = \sum_{n=1}^{\infty} P(B_n|A)$, provided that $B_1, B_2, \ldots$ are mutually exclusive events.

Properties (1) and (2) are evident. For Property (3), suppose the sequence of events $B_1, B_2, \ldots$ is mutually exclusive. It follows that also $(B_n \cap A) \cap (B_m \cap A) = \phi$, $n \ne m$. Using this and the first of the distributive laws (1.2.5) for countable unions, we have

$$P\left(\bigcup_{n=1}^{\infty} B_n \Big| A\right) = \frac{P\left[\bigcup_{n=1}^{\infty} (B_n \cap A)\right]}{P(A)} = \sum_{n=1}^{\infty} \frac{P[B_n \cap A]}{P(A)} = \sum_{n=1}^{\infty} P[B_n|A].$$

Properties (1)–(3) are precisely the conditions that a probability set function must satisfy. Accordingly, $P(B|A)$ is a probability set function, defined for subsets of $A$. It may be called the conditional probability set function, relative to the hypothesis $A$, or the conditional probability set function, given $A$. It should be noted that this conditional probability set function, given $A$, is defined at this time only when $P(A) > 0$.

Example 1.4.1. A hand of five cards is to be dealt at random without replacement from an ordinary deck of 52 playing cards. The conditional probability of an all-spade hand ($B$), relative to the hypothesis that there are at least four spades in the
hand ($A$), is, since $A \cap B = B$,

$$P(B|A) = \frac{P(B)}{P(A)} = \frac{\binom{13}{5} \big/ \binom{52}{5}}{\left[\binom{13}{4}\binom{39}{1} + \binom{13}{5}\right] \big/ \binom{52}{5}} = \frac{\binom{13}{5}}{\binom{13}{4}\binom{39}{1} + \binom{13}{5}} = 0.0441.$$

Note that this is not the same as drawing for a spade to complete a flush in draw poker; see Exercise 1.4.3.

From the definition of the conditional probability set function, we observe that

$$P(A \cap B) = P(A)P(B|A).$$

This relation is frequently called the multiplication rule for probabilities. Sometimes, after considering the nature of the random experiment, it is possible to make reasonable assumptions so that both $P(A)$ and $P(B|A)$ can be assigned. Then $P(A \cap B)$ can be computed under these assumptions. This is illustrated in Examples 1.4.2 and 1.4.3.

Example 1.4.2. A bowl contains eight chips. Three of the chips are red and the remaining five are blue. Two chips are to be drawn successively, at random and without replacement. We want to compute the probability that the first draw results in a red chip ($A$) and that the second draw results in a blue chip ($B$). It is reasonable to assign the following probabilities:

$$P(A) = \frac{3}{8} \quad \text{and} \quad P(B|A) = \frac{5}{7}.$$

Thus, under these assignments, we have $P(A \cap B) = (\frac{3}{8})(\frac{5}{7}) = \frac{15}{56} = 0.2679$.

Example 1.4.3. From an ordinary deck of playing cards, cards are to be drawn successively, at random and without replacement. The probability that the third spade appears on the sixth draw is computed as follows. Let $A$ be the event of two spades in the first five draws and let $B$ be the event of a spade on the sixth draw. Thus the probability that we wish to compute is $P(A \cap B)$. It is reasonable to take

$$P(A) = \frac{\binom{13}{2}\binom{39}{3}}{\binom{52}{5}} = 0.2743 \quad \text{and} \quad P(B|A) = \frac{11}{47} = 0.2340.$$

The desired probability $P(A \cap B)$ is then the product of these two numbers, which to four places is 0.0642.

The multiplication rule can be extended to three or more events. In the case of three events, we have, by using the multiplication rule for two events,

$$P(A \cap B \cap C) = P[(A \cap B) \cap C] = P(A \cap B)P(C|A \cap B).$$
But $P(A \cap B) = P(A)P(B|A)$. Hence, provided $P(A \cap B) > 0$,

$$P(A \cap B \cap C) = P(A)P(B|A)P(C|A \cap B).$$

This procedure can be used to extend the multiplication rule to four or more events. The general formula for $k$ events can be proved by mathematical induction.

Example 1.4.4. Four cards are to be dealt successively, at random and without replacement, from an ordinary deck of playing cards. The probability of receiving a spade, a heart, a diamond, and a club, in that order, is $(\frac{13}{52})(\frac{13}{51})(\frac{13}{50})(\frac{13}{49}) = 0.0044$. This follows from the extension of the multiplication rule.

Consider $k$ mutually exclusive and exhaustive events $A_1, A_2, \ldots, A_k$ such that $P(A_i) > 0$, $i = 1, 2, \ldots, k$; i.e., $A_1, A_2, \ldots, A_k$ form a partition of $\mathcal{C}$. Here the events $A_1, A_2, \ldots, A_k$ do not need to be equally likely. Let $B$ be another event such that $P(B) > 0$. Thus $B$ occurs with one and only one of the events $A_1, A_2, \ldots, A_k$; that is,

$$B = B \cap (A_1 \cup A_2 \cup \cdots \cup A_k) = (B \cap A_1) \cup (B \cap A_2) \cup \cdots \cup (B \cap A_k).$$

Since $B \cap A_i$, $i = 1, 2, \ldots, k$, are mutually exclusive, we have

$$P(B) = P(B \cap A_1) + P(B \cap A_2) + \cdots + P(B \cap A_k).$$

However, $P(B \cap A_i) = P(A_i)P(B|A_i)$, $i = 1, 2, \ldots, k$; so

$$P(B) = P(A_1)P(B|A_1) + P(A_2)P(B|A_2) + \cdots + P(A_k)P(B|A_k) = \sum_{i=1}^{k} P(A_i)P(B|A_i). \tag{1.4.2}$$

This result is sometimes called the law of total probability and it leads to the following important theorem.

Theorem 1.4.1 (Bayes). Let $A_1, A_2, \ldots, A_k$ be events such that $P(A_i) > 0$, $i = 1, 2, \ldots, k$. Assume further that $A_1, A_2, \ldots, A_k$ form a partition of the sample space $\mathcal{C}$. Let $B$ be any event with $P(B) > 0$. Then

$$P(A_j|B) = \frac{P(A_j)P(B|A_j)}{\sum_{i=1}^{k} P(A_i)P(B|A_i)}. \tag{1.4.3}$$

Proof: Based on the definition of conditional probability, we have

$$P(A_j|B) = \frac{P(B \cap A_j)}{P(B)} = \frac{P(A_j)P(B|A_j)}{P(B)}.$$

The result then follows by the law of total probability, (1.4.2).

This theorem is the well-known Bayes' Theorem.
This permits us to calculate the conditional probability of $A_j$, given $B$, from the probabilities of $A_1, A_2, \ldots, A_k$ and the conditional probabilities of $B$, given $A_i$, $i = 1, 2, \ldots, k$. The next three examples illustrate the usefulness of Bayes' theorem to determine probabilities.

Example 1.4.5. Say it is known that bowl $A_1$ contains three red and seven blue chips and bowl $A_2$ contains eight red and two blue chips. All chips are identical in size and shape. A die is cast and bowl $A_1$ is selected if five or six spots show on the side that is up; otherwise, bowl $A_2$ is selected. For this situation, it seems reasonable to assign $P(A_1) = \frac{2}{6}$ and $P(A_2) = \frac{4}{6}$. The selected bowl is handed to another person and one chip is taken at random. Say that this chip is red, an event which we denote by $B$. By considering the contents of the bowls, it is reasonable to assign the conditional probabilities $P(B|A_1) = \frac{3}{10}$ and $P(B|A_2) = \frac{8}{10}$. Thus the conditional probability of bowl $A_1$, given that a red chip is drawn, is

$$P(A_1|B) = \frac{P(A_1)P(B|A_1)}{P(A_1)P(B|A_1) + P(A_2)P(B|A_2)} = \frac{(\frac{2}{6})(\frac{3}{10})}{(\frac{2}{6})(\frac{3}{10}) + (\frac{4}{6})(\frac{8}{10})} = \frac{3}{19}.$$

In a similar manner, we have $P(A_2|B) = \frac{16}{19}$.

In Example 1.4.5, the probabilities $P(A_1) = \frac{2}{6}$ and $P(A_2) = \frac{4}{6}$ are called prior probabilities of $A_1$ and $A_2$, respectively, because they are known to be due to the random mechanism used to select the bowls. After the chip is taken and is observed to be red, the conditional probabilities $P(A_1|B) = \frac{3}{19}$ and $P(A_2|B) = \frac{16}{19}$ are called posterior probabilities. Since $A_2$ has a larger proportion of red chips than does $A_1$, it appeals to one's intuition that $P(A_2|B)$ should be larger than $P(A_2)$ and, of course, $P(A_1|B)$ should be smaller than $P(A_1)$. That is, intuitively the chances of having bowl $A_2$ are better once a red chip is observed than before a chip is taken.
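The posterior probabilities of Example 1.4.5 can be recomputed directly from (1.4.2) and (1.4.3). The following R sketch is only an illustration; the helper name bayes_posterior and its argument names are hypothetical and do not come from the text.

```r
# Posterior probabilities P(A_j | B) from priors P(A_j) and conditional
# probabilities P(B | A_j), following (1.4.3).
# bayes_posterior is a hypothetical helper name, not a function from the text.
bayes_posterior <- function(prior, likelihood) {
  # the A_j must form a partition, so the priors sum to 1
  stopifnot(length(prior) == length(likelihood),
            isTRUE(all.equal(sum(prior), 1)))
  joint <- prior * likelihood   # P(A_j) P(B | A_j)
  joint / sum(joint)            # divide by P(B), the law of total probability (1.4.2)
}

# Example 1.4.5: bowls A1 and A2, with B the event that a red chip is drawn.
posterior <- bayes_posterior(prior = c(2/6, 4/6), likelihood = c(3/10, 8/10))
posterior                       # 3/19 and 16/19
```

The function requires $P(B) > 0$, since the last step divides by the sum of the joint probabilities.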
Bayes' theorem provides a method of determining exactly what those probabilities are.

Example 1.4.6. Three plants, $A_1$, $A_2$, and $A_3$, produce respectively 10%, 50%, and 40% of a company's output. Although plant $A_1$ is a small plant, its manager believes in high quality and only 1% of its products are defective. The other two, $A_2$ and $A_3$, are worse and produce items that are 3% and 4% defective, respectively. All products are sent to a central warehouse. One item is selected at random and observed to be defective, say event $B$. The conditional probability that it comes from plant $A_1$ is found as follows. It is natural to assign the respective prior probabilities of getting an item from the plants as $P(A_1) = 0.1$, $P(A_2) = 0.5$, and $P(A_3) = 0.4$, while the conditional probabilities of defective items are $P(B|A_1) = 0.01$, $P(B|A_2) = 0.03$, and $P(B|A_3) = 0.04$. Thus the posterior probability of $A_1$, given a defective, is

$$P(A_1|B) = \frac{P(A_1 \cap B)}{P(B)} = \frac{(0.10)(0.01)}{(0.1)(0.01) + (0.5)(0.03) + (0.4)(0.04)} = \frac{1}{32}.$$

This is much smaller than the prior probability $P(A_1) = \frac{1}{10}$. This is as it should be because the fact that the item is defective decreases the chances that it comes from the high-quality plant $A_1$.

Example 1.4.7. Suppose we want to investigate the percentage of abused children in a certain population. The events of interest are: a child is abused ($A$) and its complement a child is not abused ($N = A^c$). For the purposes of this example, we assume that $P(A) = 0.01$ and, hence, $P(N) = 0.99$. The classification as to whether a child is abused or not is based upon a doctor's examination. Because doctors are not perfect, they sometimes classify an abused child ($A$) as one that is not abused ($N_D$, where $N_D$ means classified as not abused by a doctor). On the other hand, doctors sometimes classify a nonabused child ($N$) as abused ($A_D$).
Suppose these error rates of misclassification are $P(N_D|A) = 0.04$ and $P(A_D|N) = 0.05$; thus the probabilities of correct decisions are $P(A_D|A) = 0.96$ and $P(N_D|N) = 0.95$. Let us compute the probability that a child taken at random is classified as abused by a doctor. Because this can happen in two ways, $A \cap A_D$ or $N \cap A_D$, we have

$$P(A_D) = P(A_D|A)P(A) + P(A_D|N)P(N) = (0.96)(0.01) + (0.05)(0.99) = 0.0591,$$

which is quite high relative to the probability of an abused child, 0.01. Further, the probability that a child is abused when the doctor classified the child as abused is

$$P(A|A_D) = \frac{P(A \cap A_D)}{P(A_D)} = \frac{(0.96)(0.01)}{0.0591} = 0.1624,$$

which is quite low. In the same way, the probability that a child is not abused when the doctor classified the child as abused is 0.8376, which is quite high. The reason that these probabilities are so poor at recording the true situation is that the doctors' error rates are so high relative to the fraction 0.01 of the population that is abused. An investigation such as this would, hopefully, lead to better training of doctors for classifying abused children. See also Exercise 1.4.17.

1.4.1 Independence

Sometimes it happens that the occurrence of event $A$ does not change the probability of event $B$; that is, when $P(A) > 0$,

$$P(B|A) = P(B).$$

In this case, we say that the events $A$ and $B$ are independent. Moreover, the multiplication rule becomes

$$P(A \cap B) = P(A)P(B|A) = P(A)P(B). \tag{1.4.4}$$

This, in turn, implies, when $P(B) > 0$, that

$$P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A)P(B)}{P(B)} = P(A).$$

Note that if $P(A) > 0$ and $P(B) > 0$, then by the above discussion, independence is equivalent to

$$P(A \cap B) = P(A)P(B). \tag{1.4.5}$$

What if either $P(A) = 0$ or $P(B) = 0$? In either case, the right side of (1.4.5) is 0. However, the left side is 0 also because $A \cap B \subset A$ and $A \cap B \subset B$. Hence, we take Equation (1.4.5) as our formal definition of independence; that is,

Definition 1.4.2.
Let $A$ and $B$ be two events. We say that $A$ and $B$ are independent if $P(A \cap B) = P(A)P(B)$.

Suppose $A$ and $B$ are independent events. Then the following three pairs of events are independent: $A^c$ and $B$, $A$ and $B^c$, and $A^c$ and $B^c$. We show the first and leave the other two to the exercises; see Exercise 1.4.11. Using the disjoint union, $B = (A^c \cap B) \cup (A \cap B)$, we have

$$P(A^c \cap B) = P(B) - P(A \cap B) = P(B) - P(A)P(B) = [1 - P(A)]P(B) = P(A^c)P(B). \tag{1.4.6}$$

Hence, $A^c$ and $B$ are also independent.

Remark 1.4.1. Events that are independent are sometimes called statistically independent, stochastically independent, or independent in a probability sense. In most instances, we use independent without a modifier if there is no possibility of misunderstanding.

Example 1.4.8. A red die and a white die are cast in such a way that the numbers of spots on the two sides that are up are independent events. If $A$ represents a four on the red die and $B$ represents a three on the white die, with an equally likely assumption for each side, we assign $P(A) = \frac{1}{6}$ and $P(B) = \frac{1}{6}$. Thus, from independence, the probability of the ordered pair (red = 4, white = 3) is

$$P[(4, 3)] = (\tfrac{1}{6})(\tfrac{1}{6}) = \tfrac{1}{36}.$$

The probability that the sum of the up spots of the two dice equals seven is

$$P[\{(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)\}] = (\tfrac{1}{6})(\tfrac{1}{6}) + (\tfrac{1}{6})(\tfrac{1}{6}) + (\tfrac{1}{6})(\tfrac{1}{6}) + (\tfrac{1}{6})(\tfrac{1}{6}) + (\tfrac{1}{6})(\tfrac{1}{6}) + (\tfrac{1}{6})(\tfrac{1}{6}) = \tfrac{6}{36}.$$

In a similar manner, it is easy to show that the probabilities of the sums of the upfaces 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 are, respectively,

$$\tfrac{1}{36}, \tfrac{2}{36}, \tfrac{3}{36}, \tfrac{4}{36}, \tfrac{5}{36}, \tfrac{6}{36}, \tfrac{5}{36}, \tfrac{4}{36}, \tfrac{3}{36}, \tfrac{2}{36}, \tfrac{1}{36}.$$

Suppose now that we have three events, $A_1$, $A_2$, and $A_3$. We say that they are mutually independent if and only if they are pairwise independent:

$$P(A_1 \cap A_3) = P(A_1)P(A_3), \quad P(A_1 \cap A_2) = P(A_1)P(A_2), \quad P(A_2 \cap A_3) = P(A_2)P(A_3),$$

and

$$P(A_1 \cap A_2 \cap A_3) = P(A_1)P(A_2)P(A_3).$$

More generally, the $n$ events $A_1, A_2, \ldots$
, $A_n$ are mutually independent if and only if for every collection of $k$ of these events, $2 \le k \le n$, that is, for every choice of $k$ distinct indices $d_1, d_2, \ldots, d_k$ from $1, 2, \ldots, n$,

$$P(A_{d_1} \cap A_{d_2} \cap \cdots \cap A_{d_k}) = P(A_{d_1})P(A_{d_2}) \cdots P(A_{d_k}).$$

In particular, if $A_1, A_2, \ldots, A_n$ are mutually independent, then

$$P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1)P(A_2) \cdots P(A_n).$$

Also, as with two sets, many combinations of these events and their complements are independent, such as

1. The events $A_1^c$ and $A_2 \cup A_3^c \cup A_4$ are independent,
2. The events $A_1 \cup A_2^c$, $A_3^c$ and $A_4 \cap A_5^c$ are mutually independent.

If there is no possibility of misunderstanding, independent is often used without the modifier mutually when considering more than two events.

Example 1.4.9. Pairwise independence does not imply mutual independence. As an example, suppose we twice spin a fair spinner with the numbers 1, 2, 3, and 4. Let $A_1$ be the event that the sum of the numbers spun is 5, let $A_2$ be the event that the first number spun is a 1, and let $A_3$ be the event that the second number spun is a 4. Then $P(A_i) = 1/4$, $i = 1, 2, 3$, and for $i \ne j$, $P(A_i \cap A_j) = 1/16$. So the three events are pairwise independent. But $A_1 \cap A_2 \cap A_3$ is the event that $(1, 4)$ is spun, which has probability $1/16 \ne 1/64 = P(A_1)P(A_2)P(A_3)$. Hence the events $A_1$, $A_2$, and $A_3$ are not mutually independent.

We often perform a sequence of random experiments in such a way that the events associated with one of them are independent of the events associated with the others. For convenience, we refer to these events as outcomes of independent experiments, meaning that the respective events are independent. Thus we often refer to independent flips of a coin or independent casts of a die or, more generally, independent trials of some given random experiment.

Example 1.4.10. A coin is flipped independently several times. Let the event $A_i$ represent a head (H) on the $i$th toss; thus $A_i^c$ represents a tail (T).
Assume that $A_i$ and $A_i^c$ are equally likely; that is, $P(A_i) = P(A_i^c) = \frac{1}{2}$. Thus the probability of an ordered sequence like HHTH is, from independence,

$$P(A_1 \cap A_2 \cap A_3^c \cap A_4) = P(A_1)P(A_2)P(A_3^c)P(A_4) = \left(\tfrac{1}{2}\right)^4 = \tfrac{1}{16}.$$

Similarly, the probability of observing the first head on the third flip is

$$P(A_1^c \cap A_2^c \cap A_3) = P(A_1^c)P(A_2^c)P(A_3) = \left(\tfrac{1}{2}\right)^3 = \tfrac{1}{8}.$$

Also, the probability of getting at least one head on four flips is

$$\begin{aligned}
P(A_1 \cup A_2 \cup A_3 \cup A_4) &= 1 - P[(A_1 \cup A_2 \cup A_3 \cup A_4)^c] \\
&= 1 - P(A_1^c \cap A_2^c \cap A_3^c \cap A_4^c) \\
&= 1 - \left(\tfrac{1}{2}\right)^4 = \tfrac{15}{16}.
\end{aligned}$$

See Exercise 1.4.13 to justify this last probability.

Example 1.4.11. A computer system is built so that if component $K_1$ fails, it is bypassed and $K_2$ is used. If $K_2$ fails, then $K_3$ is used. Suppose that the probability that $K_1$ fails is 0.01, that $K_2$ fails is 0.03, and that $K_3$ fails is 0.08. Moreover, we can assume that the failures are mutually independent events. Then the probability of failure of the system is

$$(0.01)(0.03)(0.08) = 0.000024,$$

as all three components would have to fail. Hence, the probability that the system does not fail is $1 - 0.000024 = 0.999976$.

1.4.2 Simulations

Many of the exercises at the end of this section are designed to aid the reader in understanding the concepts of conditional probability and independence. With diligence and patience, the reader will derive the exact answer. Many real-life problems, though, are too complicated to allow for exact derivation. In such cases, scientists often turn to computer simulations to estimate the answer.

As an example, suppose for an experiment, we want to obtain $P(A)$ for some event $A$. A program is written that performs one trial (one simulation) of the experiment and records whether or not $A$ occurs. We then obtain $n$ independent simulations (runs) of the program. Denote by $\hat{p}_n$ the proportion of these $n$ simulations in which $A$ occurred. Then $\hat{p}_n$ is our estimate of $P(A)$.
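As a minimal sketch of this procedure (not code from the text), the following R snippet estimates the probability of at least one head in four independent flips of a fair coin, the event of Example 1.4.10 whose exact probability is 15/16. The function names one_trial and estimate_prob, the number of runs n = 10000, and the seed are illustrative assumptions.

```r
# One trial: flip a fair coin four times and report whether
# at least one head appears (the event A).
# one_trial and estimate_prob are hypothetical names, not from the text.
one_trial <- function() {
  flips <- sample(c("H", "T"), size = 4, replace = TRUE)
  any(flips == "H")
}

# Run n independent trials and return the proportion of trials
# in which A occurred; this proportion is the estimate of P(A).
estimate_prob <- function(n) {
  mean(replicate(n, one_trial()))
}

set.seed(20240101)            # arbitrary seed, used only for reproducibility
phat <- estimate_prob(10000)  # n = 10000 is an assumed number of runs
phat                          # should be close to the exact value 15/16 = 0.9375
```

Increasing n should bring the estimate closer to the exact value, at the cost of a longer run.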
Besides the estimation of $P(A)$, we also obtain an error of estimation given by $1.96\sqrt{\hat{p}_n(1 - \hat{p}_n)/n}$. As we discuss theoretically in Chapter 4, we are 95% confident that $P(A)$ lies in the interval

$$\hat{p}_n \pm 1.96\sqrt{\frac{\hat{p}_n(1 - \hat{p}_n)}{n}}. \tag{1.4.7}$$

In Chapter 4, we call this interval a 95% confidence interval for $P(A)$. For now, we make use of this confidence interval for our simulations.

Example 1.4.12. As an example, consider the game: Person A tosses a coin and then person B rolls a die. This is repeated independently until a head or one of the numbers 1, 2, 3, 4 appears, at which time the game is stopped. Person A wins with the head and B wins with one of the numbers 1, 2, 3, 4. Compute the probability $P(A)$ that person A wins the game.

For an exact derivation, notice that it is implicit in the statement A wins the game that the game is completed. Using abbreviated notation, the game is completed if $H$ or $T\{1, \ldots, 4\}$ occurs. Using independence, the probability that A wins is thus the conditional probability

$$\frac{1/2}{(1/2) + (1/2)(4/6)} = \frac{3}{5}.$$

The following R function, abgame, simulates the problem. This function can be downloaded and sourced at the site discussed in the Preface. The first line of the program sets up the draws for persons A and B, respectively. The second line sets up a flag for the while loop, and the returning values, Awin and Bwin, are initialized at 0. The command sample(rngA,1,pr=pA) draws a sample of size 1 from rngA with pmf pA. Each execution of the while loop returns one complete game. Further, the executions are independent of one another.

abgame