Introduction to Mathematical Statistics PDF
2019
Robert V. Hogg, Joseph W. McKean, Allen T. Craig
Summary
This textbook, 'Introduction to Mathematical Statistics,' 8th edition by Hogg, McKean, and Craig, provides a comprehensive introduction to statistical methods. The book covers fundamental concepts including probability, distributions, and estimation techniques.
Introduction to Mathematical Statistics
Eighth Edition
Robert V. Hogg, University of Iowa
Joseph W. McKean, Western Michigan University
Allen T. Craig, Late Professor of Statistics, University of Iowa
Director, Portfolio Management: Deirdre Lynch
Courseware Portfolio Manager: Patrick Barbera
Portfolio Management Assistant: Morgan Danna
Content Producer: Lauren Morse
Managing Producer: Scott Disanno
Product Marketing Manager: Yvonne Vannatta
Field Marketing Manager: Evan St. Cyr
Marketing Assistant: Jon Bryant
Senior Author Support/Technology Specialist: Joe Vetere
Manager, Rights and Permissions: Gina Cheselka
Manufacturing Buyer: Carol Melville, LSC Communications
Art Director: Barbara Atkinson
Production Coordination and Illustrations: Integra
Cover Design: Studio Montage
Cover Image: Aleksandarvelasevic/Digital Vision Vectors/Getty Images.
Copyright 2019, 2013, 2005 by Pearson Education, Inc. All Rights Reserved. Printed in the United States of America. This publication is protected by copyright, and permission should be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise. For information regarding permissions, request forms and the appropriate contacts within the Pearson Education Global Rights & Permissions department, please visit www.pearsoned.com/permissions/.
PEARSON and ALWAYS LEARNING are exclusive trademarks owned by Pearson Education, Inc. or its affiliates in the U.S. and/or other countries. Unless otherwise indicated herein, any third-party trademarks that may appear in this work are the property of their respective owners and any references to third-party trademarks, logos or other trade dress are for demonstrative or descriptive purposes only.
Such references are not intended to imply any sponsorship, endorsement, authorization, or promotion of Pearson’s products by the owners of such marks, or any relationship between the owner and Pearson Education, Inc. or its affiliates, authors, licensees or distributors.
Library of Congress Cataloging-in-Publications Data
Names: Hogg, Robert V., author. | McKean, Joseph W., 1944- author. | Craig, Allen T. (Allen Thornton), 1905- author.
Title: Introduction to mathematical statistics / Robert V. Hogg, Late Professor of Statistics, University of Iowa, Joseph W. McKean, Western Michigan University, Allen T. Craig, Late Professor of Statistics, University of Iowa.
Description: Eighth edition. | Boston : Pearson, | Includes bibliographical references and index.
Identifiers: LCCN 2017033015 | ISBN 9780134686998 | ISBN 0134686993
Subjects: LCSH: Mathematical statistics.
Classification: LCC QA276.H59 2019 | DDC 519.5–dc23
LC record available at https://lccn.loc.gov/2017033015
ISBN 13: 978-0-13-468699-8
ISBN 10: 0-13-468699-3
Dedicated to my wife Marge and to the memory of Bob Hogg
Contents
Preface
1 Probability and Distributions
  1.1 Introduction
  1.2 Sets
    1.2.1 Review of Set Theory
    1.2.2 Set Functions
  1.3 The Probability Set Function
    1.3.1 Counting Rules
    1.3.2 Additional Properties of Probability
  1.4 Conditional Probability and Independence
    1.4.1 Independence
    1.4.2 Simulations
  1.5 Random Variables
  1.6 Discrete Random Variables
    1.6.1 Transformations
  1.7 Continuous Random Variables
    1.7.1 Quantiles
    1.7.2 Transformations
    1.7.3 Mixtures of Discrete and Continuous Type Distributions
  1.8 Expectation of a Random Variable
    1.8.1 R Computation for an Estimation of the Expected Gain
  1.9 Some Special Expectations
  1.10 Important Inequalities
2 Multivariate Distributions
  2.1 Distributions of Two Random Variables
    2.1.1 Marginal Distributions
    2.1.2 Expectation
  2.2 Transformations: Bivariate Random Variables
  2.3 Conditional Distributions and Expectations
  2.4 Independent Random Variables
  2.5 The Correlation Coefficient
  2.6 Extension to Several Random Variables
    2.6.1 ∗ Multivariate Variance-Covariance Matrix
  2.7 Transformations for Several Random Variables
  2.8 Linear Combinations of Random Variables
3 Some Special Distributions
  3.1 The Binomial and Related Distributions
    3.1.1 Negative Binomial and Geometric Distributions
    3.1.2 Multinomial Distribution
    3.1.3 Hypergeometric Distribution
  3.2 The Poisson Distribution
  3.3 The Γ, χ², and β Distributions
    3.3.1 The χ²-Distribution
    3.3.2 The β-Distribution
  3.4 The Normal Distribution
    3.4.1 ∗ Contaminated Normals
  3.5 The Multivariate Normal Distribution
    3.5.1 Bivariate Normal Distribution
    3.5.2 ∗ Multivariate Normal Distribution, General Case
    3.5.3 ∗ Applications
  3.6 t- and F-Distributions
    3.6.1 The t-distribution
    3.6.2 The F-distribution
    3.6.3 Student’s Theorem
  3.7 ∗ Mixture Distributions
4 Some Elementary Statistical Inferences
  4.1 Sampling and Statistics
    4.1.1 Point Estimators
    4.1.2 Histogram Estimates of pmfs and pdfs
  4.2 Confidence Intervals
    4.2.1 Confidence Intervals for Difference in Means
    4.2.2 Confidence Interval for Difference in Proportions
  4.3 ∗ Confidence Intervals for Parameters of Discrete Distributions
  4.4 Order Statistics
    4.4.1 Quantiles
    4.4.2 Confidence Intervals for Quantiles
  4.5 Introduction to Hypothesis Testing
  4.6 Additional Comments About Statistical Tests
    4.6.1 Observed Significance Level, p-value
  4.7 Chi-Square Tests
  4.8 The Method of Monte Carlo
    4.8.1 Accept–Reject Generation Algorithm
  4.9 Bootstrap Procedures
    4.9.1 Percentile Bootstrap Confidence Intervals
    4.9.2 Bootstrap Testing Procedures
  4.10 ∗ Tolerance Limits for Distributions
5 Consistency and Limiting Distributions
  5.1 Convergence in Probability
    5.1.1 Sampling and Statistics
  5.2 Convergence in Distribution
    5.2.1 Bounded in Probability
    5.2.2 Δ-Method
    5.2.3 Moment Generating Function Technique
  5.3 Central Limit Theorem
  5.4 ∗ Extensions to Multivariate Distributions
6 Maximum Likelihood Methods
  6.1 Maximum Likelihood Estimation
  6.2 Rao–Cramér Lower Bound and Efficiency
  6.3 Maximum Likelihood Tests
  6.4 Multiparameter Case: Estimation
  6.5 Multiparameter Case: Testing
  6.6 The EM Algorithm
7 Sufficiency
  7.1 Measures of Quality of Estimators
  7.2 A Sufficient Statistic for a Parameter
  7.3 Properties of a Sufficient Statistic
  7.4 Completeness and Uniqueness
  7.5 The Exponential Class of Distributions
  7.6 Functions of a Parameter
    7.6.1 Bootstrap Standard Errors
  7.7 The Case of Several Parameters
  7.8 Minimal Sufficiency and Ancillary Statistics
  7.9 Sufficiency, Completeness, and Independence
8 Optimal Tests of Hypotheses
  8.1 Most Powerful Tests
  8.2 Uniformly Most Powerful Tests
  8.3 Likelihood Ratio Tests
    8.3.1 Likelihood Ratio Tests for Testing Means of Normal Distributions
    8.3.2 Likelihood Ratio Tests for Testing Variances of Normal Distributions
  8.4 ∗ The Sequential Probability Ratio Test
  8.5 ∗ Minimax and Classification Procedures
    8.5.1 Minimax Procedures
    8.5.2 Classification
9 Inferences About Normal Linear Models
  9.1 Introduction
  9.2 One-Way ANOVA
  9.3 Noncentral χ² and F-Distributions
  9.4 Multiple Comparisons
  9.5 Two-Way ANOVA
    9.5.1 Interaction between Factors
  9.6 A Regression Problem
    9.6.1 Maximum Likelihood Estimates
    9.6.2 ∗ Geometry of the Least Squares Fit
  9.7 A Test of Independence
  9.8 The Distributions of Certain Quadratic Forms
  9.9 The Independence of Certain Quadratic Forms
10 Nonparametric and Robust Statistics
  10.1 Location Models
  10.2 Sample Median and the Sign Test
    10.2.1 Asymptotic Relative Efficiency
    10.2.2 Estimating Equations Based on the Sign Test
    10.2.3 Confidence Interval for the Median
  10.3 Signed-Rank Wilcoxon
    10.3.1 Asymptotic Relative Efficiency
    10.3.2 Estimating Equations Based on Signed-Rank Wilcoxon
    10.3.3 Confidence Interval for the Median
    10.3.4 Monte Carlo Investigation
  10.4 Mann–Whitney–Wilcoxon Procedure
    10.4.1 Asymptotic Relative Efficiency
    10.4.2 Estimating Equations Based on the Mann–Whitney–Wilcoxon
    10.4.3 Confidence Interval for the Shift Parameter Δ
    10.4.4 Monte Carlo Investigation of Power
  10.5 ∗ General Rank Scores
    10.5.1 Efficacy
    10.5.2 Estimating Equations Based on General Scores
    10.5.3 Optimization: Best Estimates
  10.6 ∗ Adaptive Procedures
  10.7 Simple Linear Model
  10.8 Measures of Association
    10.8.1 Kendall’s τ
    10.8.2 Spearman’s Rho
  10.9 Robust Concepts
    10.9.1 Location Model
    10.9.2 Linear Model
11 Bayesian Statistics
  11.1 Bayesian Procedures
    11.1.1 Prior and Posterior Distributions
    11.1.2 Bayesian Point Estimation
    11.1.3 Bayesian Interval Estimation
    11.1.4 Bayesian Testing Procedures
    11.1.5 Bayesian Sequential Procedures
  11.2 More Bayesian Terminology and Ideas
  11.3 Gibbs Sampler
  11.4 Modern Bayesian Methods
    11.4.1 Empirical Bayes
A Mathematical Comments
  A.1 Regularity Conditions
  A.2 Sequences
B R Primer
  B.1 Basics
  B.2 Probability Distributions
  B.3 R Functions
  B.4 Loops
  B.5 Input and Output
  B.6 Packages
C Lists of Common Distributions
D Tables of Distributions
E References
F Answers to Selected Exercises
Index
Preface
We have made substantial changes in this edition of Introduction to Mathematical Statistics. Some of these changes help students appreciate the connection between statistical theory and statistical practice while other changes enhance the development and discussion of the statistical theory presented in this book.
Many of the changes in this edition reflect comments made by our readers. One of these comments concerned the small number of real data sets in the previous editions. In this edition, we have included more real data sets, using them to illustrate statistical methods or to compare methods.
Further, we have made these data sets accessible to students by including them in the free R package hmcpkg. They can also be individually downloaded in an R session at the URL listed below. In general, the R code for the analyses on these data sets is given in the text.
We have also expanded the use of the statistical software R. We selected R because it is a powerful statistical language that is free and runs on all three main platforms (Windows, Mac, and Linux). Instructors, though, can select another statistical package. We have also expanded our use of R functions to compute analyses and simulation studies, including several games. We have kept the level of coding for these functions straightforward. Our goal is to show students that with a few simple lines of code they can perform significant computations. Appendix B contains a brief R primer, which suffices for the understanding of the R used in the text. As with the data sets, these R functions can be sourced individually at the cited URL; however, they are also included in the package hmcpkg.
We have supplemented the mathematical review material in Appendix A, placing it in the document Mathematical Primer for Introduction to Mathematical Statistics. It is freely available for students to download at the listed URL. Besides sequences, this supplement reviews the topics of infinite series, differentiation, and integration (univariate and bivariate). We have also expanded the discussion of iterated integrals in the text. We have added figures to clarify discussion.
We have retained the order of elementary statistical inferences (Chapter 4) and asymptotic theory (Chapter 5). In Chapters 5 and 6, we have written brief reviews of the material in Chapter 4, so that Chapters 4 and 5 are essentially independent of one another and, hence, can be interchanged. In Chapter 3, we now begin the section on the multivariate normal distribution with a subsection on the bivariate normal distribution.
Several important topics have been added. These include Tukey’s multiple comparison procedure in Chapter 9 and confidence intervals for the correlation coefficients found in Chapters 9 and 10. Chapter 7 now contains a discussion on standard errors for estimates obtained by bootstrapping the sample. Several topics that were discussed in the Exercises are now discussed in the text. Examples include quantiles, Section 1.7.1, and hazard functions, Section 3.3. In general, we have made more use of subsections to break up some of the discussion. Also, several more sections are now indicated by ∗ as being optional.
Content and Course Planning
Chapters 1 and 2 develop probability models for univariate and multivariate variables while Chapter 3 discusses many of the most widely used probability models. Chapter 4 discusses statistical theory for much of the inference found in a standard statistical methods course. Chapter 5 presents asymptotic theory, concluding with the Central Limit Theorem. Chapter 6 provides a complete inference (estimation and testing) based on maximum likelihood theory. The EM algorithm is also discussed. Chapters 7–8 contain optimal estimation procedures and tests of statistical hypotheses. The final three chapters provide theory for three important topics in statistics. Chapter 9 contains inference for normal theory methods for basic analysis of variance, univariate regression, and correlation models. Chapter 10 presents nonparametric methods (estimation and testing) for location and univariate regression models. It also includes discussion on the robust concepts of efficiency, influence, and breakdown. Chapter 11 offers an introduction to Bayesian methods. This includes traditional Bayesian procedures as well as Markov Chain Monte Carlo techniques.
Several courses can be designed using our book.
The basic two-semester course in mathematical statistics covers most of the material in Chapters 1–8 with topics selected from the remaining chapters. For such a course, the instructor would have the option of interchanging the order of Chapters 4 and 5, thus beginning the second semester with an introduction to statistical theory (Chapter 4). A one-semester course could consist of Chapters 1–4 with a selection of topics from Chapter 5. Under this option, the student sees much of the statistical theory for the methods discussed in a non-theoretical course in methods. On the other hand, as with the two-semester sequence, after covering Chapters 1–3, the instructor can elect to cover Chapter 5 and finish the course with a selection of topics from Chapter 4.
The data sets and R functions used in this book and the R package hmcpkg can be downloaded at the site:
https://media.pearsoncmg.com/cmg/pmmg_mml_shared/mathstatsresources/home/index.html
Acknowledgements
Bob Hogg passed away in 2014, so he did not work on this edition of the book. Often, though, when I was trying to decide whether or not to make a change in the manuscript, I found myself thinking of what Bob would do. In his memory, I have retained the order of the authors for this edition.
As with earlier editions, comments from readers are always welcomed and appreciated. We would like to thank these reviewers of the previous edition: James Baldone, Virginia College; Steven Culpepper, University of Illinois at Urbana-Champaign; Yuichiro Kakihara, California State University; Jaechoul Lee, Boise State University; Michael Levine, Purdue University; Tingni Sun, University of Maryland, College Park; and Daniel Weiner, Boston University. We appreciated and took into consideration their comments for this revision. We appreciate the helpful comments of Thomas Hettmansperger of Penn State University, Ash Abebe of Auburn University, and Professor Ioannis Kalogridis of the University of Leuven.
A special thanks to Patrick Barbera (Portfolio Manager, Statistics), Lauren Morse (Content Producer, Math/Stats), Yvonne Vannatta (Product Marketing Manager), and the rest of the staff at Pearson for their help in putting this edition together. Thanks also to Richard Ponticelli, North Shore Community College, who accuracy checked the page proofs. Also, a special thanks to my wife Marge for her unwavering support and encouragement of my efforts in writing this edition.
Joe McKean
Chapter 1
Probability and Distributions
1.1 Introduction
In this section, we intuitively discuss the concepts of a probability model which we formalize in Section 1.3. Many kinds of investigations may be characterized in part by the fact that repeated experimentation, under essentially the same conditions, is more or less standard procedure. For instance, in medical research, interest may center on the effect of a drug that is to be administered; or an economist may be concerned with the prices of three specified commodities at various time intervals; or an agronomist may wish to study the effect that a chemical fertilizer has on the yield of a cereal grain. The only way in which an investigator can elicit information about any such phenomenon is to perform the experiment. Each experiment terminates with an outcome. But it is characteristic of these experiments that the outcome cannot be predicted with certainty prior to the experiment.
Suppose that we have such an experiment, but the experiment is of such a nature that a collection of every possible outcome can be described prior to its performance. If this kind of experiment can be repeated under the same conditions, it is called a random experiment, and the collection of every possible outcome is called the experimental space or the sample space. We denote the sample space by $\mathcal{C}$.
Example 1.1.1. In the toss of a coin, let the outcome tails be denoted by T and let the outcome heads be denoted by H.
If we assume that the coin may be repeatedly tossed under the same conditions, then the toss of this coin is an example of a random experiment in which the outcome is one of the two symbols T or H; that is, the sample space is the collection of these two symbols. For this example, then, $\mathcal{C} = \{H, T\}$.
Example 1.1.2. In the cast of one red die and one white die, let the outcome be the ordered pair (number of spots up on the red die, number of spots up on the white die). If we assume that these two dice may be repeatedly cast under the same conditions, then the cast of this pair of dice is a random experiment. The sample space consists of the 36 ordered pairs: $\mathcal{C} = \{(1, 1), \ldots, (1, 6), (2, 1), \ldots, (2, 6), \ldots, (6, 6)\}$.
We generally use small Roman letters for the elements of $\mathcal{C}$ such as a, b, or c. Often for an experiment, we are interested in the chances of certain subsets of elements of the sample space occurring. Subsets of $\mathcal{C}$ are often called events and are generally denoted by capital Roman letters such as A, B, or C. If the experiment results in an element in an event A, we say the event A has occurred. We are interested in the chances that an event occurs. For instance, in Example 1.1.1 we may be interested in the chances of getting heads; i.e., the chances of the event $A = \{H\}$ occurring. In the second example, we may be interested in the occurrence of the sum of the upfaces of the dice being “7” or “11”; that is, in the occurrence of the event $A = \{(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1), (5, 6), (6, 5)\}$.
Now conceive of our having made N repeated performances of the random experiment. Then we can count the number f of times (the frequency) that the event A actually occurred throughout the N performances. The ratio $f/N$ is called the relative frequency of the event A in these N experiments. A relative frequency is usually quite erratic for small values of N, as you can discover by tossing a coin.
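The behavior of the relative frequency $f/N$ can also be explored without physically tossing a coin or casting dice. The following is a minimal simulation sketch, not taken from the text: the choice of Python, the function name `relative_frequencies`, the fixed seed, and the checkpoint values of N are all assumptions made for illustration. It casts two simulated dice repeatedly and records $f/N$ for the event A (a sum of 7 or 11) at a few values of N.

```python
import random


def relative_frequencies(event, n_max, checkpoints, seed=0):
    """Simulate n_max casts of two dice and return {N: f/N} at each checkpoint N.

    `event` is a set of ordered pairs (red die, white die).
    The fixed seed is an assumption of this sketch so that runs are repeatable.
    """
    rng = random.Random(seed)
    f = 0  # number of casts so far whose outcome lies in the event
    result = {}
    for n in range(1, n_max + 1):
        outcome = (rng.randint(1, 6), rng.randint(1, 6))
        if outcome in event:
            f += 1
        if n in checkpoints:
            result[n] = f / n
    return result


# The event "sum of the upfaces is 7 or 11" as a subset of the 36 ordered pairs.
A = {(r, w) for r in range(1, 7) for w in range(1, 7) if r + w in (7, 11)}

if __name__ == "__main__":
    freqs = relative_frequencies(A, 10_000, {10, 100, 1_000, 10_000})
    for n, rel in sorted(freqs.items()):
        print(f"N = {n:>6}: f/N = {rel:.4f}")
```

Running the sketch with different seeds gives a feel for how much $f/N$ varies from run to run at each value of N.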
But as N increases, experience indicates that we associate with the event A a number, say p, that is equal or approximately equal to that number about which the relative frequency seems to stabilize. If we do this, then the number p can be interpreted as that number which, in future performances of the experiment, the relative frequency of the event A will either equal or approximate. Thus, although we cannot predict the outcome of a random experiment, we can, for a large value of N, predict approximately the relative frequency with which the outcome will be in A. The number p associated with the event A is given various names. Sometimes it is called the probability that the outcome of the random experiment is in A; sometimes it is called the probability of the event A; and sometimes it is called the probability measure of A. The context usually suggests an appropriate choice of terminology.
Example 1.1.3. Let $\mathcal{C}$ denote the sample space of Example 1.1.2 and let B be the collection of every ordered pair of $\mathcal{C}$ for which the sum of the pair is equal to seven. Thus $B = \{(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)\}$. Suppose that the dice are cast N = 400 times and let f denote the frequency of a sum of seven. Suppose that 400 casts result in f = 60. Then the relative frequency with which the outcome was in B is $f/N = \frac{60}{400} = 0.15$. Thus we might associate with B a number p that is close to 0.15, and p would be called the probability of the event B.
Remark 1.1.1. The preceding interpretation of probability is sometimes referred to as the relative frequency approach, and it obviously depends upon the fact that an experiment can be repeated under essentially identical conditions. However, many persons extend probability to other situations by treating it as a rational measure of belief. For example, the statement $p = \frac{2}{5}$ for an event A would mean to them that their personal or subjective probability of the event A is equal to $\frac{2}{5}$.
Hence, if they are not opposed to gambling, this could be interpreted as a willingness on their part to bet on the outcome of A so that the two possible payoffs are in the ratio $p/(1 - p) = \frac{2}{5} / \frac{3}{5} = \frac{2}{3}$. Moreover, if they truly believe that $p = \frac{2}{5}$ is correct, they would be willing to accept either side of the bet: (a) win 3 units if A occurs and lose 2 if it does not occur, or (b) win 2 units if A does not occur and lose 3 if it does. However, since the mathematical properties of probability given in Section 1.3 are consistent with either of these interpretations, the subsequent mathematical development does not depend upon which approach is used.
The primary purpose of having a mathematical theory of statistics is to provide mathematical models for random experiments. Once a model for such an experiment has been provided and the theory worked out in detail, the statistician may, within this framework, make inferences (that is, draw conclusions) about the random experiment. The construction of such a model requires a theory of probability. One of the more logically satisfying theories of probability is that based on the concepts of sets and functions of sets. These concepts are introduced in Section 1.2.
1.2 Sets
The concept of a set or a collection of objects is usually left undefined. However, a particular set can be described so that there is no misunderstanding as to what collection of objects is under consideration. For example, the set of the first 10 positive integers is sufficiently well described to make clear that the numbers $\frac{3}{4}$ and 14 are not in the set, while the number 3 is in the set. If an object belongs to a set, it is said to be an element of the set. For example, if C denotes the set of real numbers x for which $0 \le x \le 1$, then $\frac{3}{4}$ is an element of the set C. The fact that $\frac{3}{4}$ is an element of the set C is indicated by writing $\frac{3}{4} \in C$. More generally, $c \in C$ means that c is an element of the set C.
The sets that concern us are frequently sets of numbers. However, the language of sets of points proves somewhat more convenient than that of sets of numbers. Accordingly, we briefly indicate how we use this terminology. In analytic geometry considerable emphasis is placed on the fact that to each point on a line (on which an origin and a unit point have been selected) there corresponds one and only one number, say x; and that to each number x there corresponds one and only one point on the line. This one-to-one correspondence between the numbers and points on a line enables us to speak, without misunderstanding, of the “point x” instead of the “number x.” Furthermore, with a plane rectangular coordinate system and with x and y numbers, to each symbol $(x, y)$ there corresponds one and only one point in the plane; and to each point in the plane there corresponds but one such symbol. Here again, we may speak of the “point $(x, y)$,” meaning the “ordered number pair x and y.” This convenient language can be used when we have a rectangular coordinate system in a space of three or more dimensions. Thus the “point $(x_1, x_2, \ldots, x_n)$” means the numbers $x_1, x_2, \ldots, x_n$ in the order stated. Accordingly, in describing our sets, we frequently speak of a set of points (a set whose elements are points), being careful, of course, to describe the set so as to avoid any ambiguity. The notation $C = \{x : 0 \le x \le 1\}$ is read “C is the one-dimensional set of points x for which $0 \le x \le 1$.” Similarly, $C = \{(x, y) : 0 \le x \le 1, 0 \le y \le 1\}$ can be read “C is the two-dimensional set of points $(x, y)$ that are interior to, or on the boundary of, a square with opposite vertices at $(0, 0)$ and $(1, 1)$.”
We say a set C is countable if C is finite or has as many elements as there are positive integers. For example, the sets $C_1 = \{1, 2, \ldots, 100\}$ and $C_2 = \{1, 3, 5, 7, \ldots\}$ are countable sets. The interval of real numbers $(0, 1]$, though, is not countable.
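A set described by a rule, as in the notation above, can be thought of as a membership test, and a countably infinite set as one whose elements can be listed one at a time against the positive integers. The short sketch below illustrates both ideas. It is not from the text: Python and the function names `in_unit_interval`, `in_unit_square`, and `odd_numbers` are assumptions chosen for illustration.

```python
from itertools import islice


def in_unit_interval(x):
    """Membership test for the one-dimensional set {x : 0 <= x <= 1}."""
    return 0 <= x <= 1


def in_unit_square(point):
    """Membership test for the set {(x, y) : 0 <= x <= 1, 0 <= y <= 1}."""
    x, y = point
    return 0 <= x <= 1 and 0 <= y <= 1


def odd_numbers():
    """List {1, 3, 5, 7, ...} in order: the n-th positive integer is paired with 2n - 1."""
    n = 1
    while True:
        yield 2 * n - 1
        n += 1


if __name__ == "__main__":
    print(in_unit_interval(0.75), in_unit_interval(14))
    print(in_unit_square((0.5, 1.0)), in_unit_square((0.5, 1.5)))
    print(list(islice(odd_numbers(), 5)))
```

The generator makes the pairing between positive integers and odd numbers explicit, which is what "as many elements as there are positive integers" asks for. No such listing exists for the interval $(0, 1]$.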
1.2.1 Review of Set Theory
As in Section 1.1, let $\mathcal{C}$ denote the sample space for the experiment. Recall that events are subsets of $\mathcal{C}$. We use the words event and subset interchangeably in this section. An elementary algebra of sets will prove quite useful for our purposes. We now review this algebra along with illustrative examples. For illustration, we also make use of Venn diagrams. Consider the collection of Venn diagrams in Figure 1.2.1. The interior of the rectangle in each plot represents the sample space $\mathcal{C}$. The shaded region in Panel (a) represents the event A.
Figure 1.2.1: A series of Venn diagrams. The sample space $\mathcal{C}$ is represented by the interior of the rectangle in each plot. Panel (a) depicts the event A; Panel (b) depicts $A \subset B$; Panel (c) depicts $A \cup B$; and Panel (d) depicts $A \cap B$.
We first define the complement of an event A.
Definition 1.2.1. The complement of an event A is the set of all elements in $\mathcal{C}$ which are not in A. We denote the complement of A by $A^c$. That is, $A^c = \{x \in \mathcal{C} : x \notin A\}$.
The complement of A is represented by the white space in the Venn diagram in Panel (a) of Figure 1.2.1.
The empty set is the event with no elements in it. It is denoted by $\phi$. Note that $\mathcal{C}^c = \phi$ and $\phi^c = \mathcal{C}$. The next definition defines when one event is a subset of another.
Definition 1.2.2. If each element of a set A is also an element of set B, the set A is called a subset of the set B. This is indicated by writing $A \subset B$. If $A \subset B$ and also $B \subset A$, the two sets have the same elements, and this is indicated by writing $A = B$.
Panel (b) of Figure 1.2.1 depicts $A \subset B$.
The event A or B is defined as follows:
Definition 1.2.3. Let A and B be events. Then the union of A and B is the set of all elements that are in A or in B or in both A and B. The union of A and B is denoted by $A \cup B$.
Panel (c) of Figure 1.2.1 shows $A \cup B$.
The event that both A and B occur is defined as follows:
Definition 1.2.4.
Let A and B be events. Then the intersection of A and B is the set of all elements that are in both A and B. The intersection of A and B is denoted by A ∩ B Panel (d) of Figure 1.2.1 illustrates A ∩ B. Two events are disjoint if they have no elements in common. More formally we define Definition 1.2.5. Let A and B be events. Then A and B are disjoint if A∩B = φ If A and B are disjoint, then we say A ∪ B forms a disjoint union. The next two examples illustrate these concepts. Example 1.2.1. Suppose we have a spinner with the numbers 1 through 10 on it. The experiment is to spin the spinner and record the number spun. Then C = {1, 2,... , 10}. Define the events A, B, and C by A = {1, 2}, B = {2, 3, 4}, and C = {3, 4, 5, 6}, respectively. Ac = {3, 4,... , 10}; A ∪ B = {1, 2, 3, 4}; A ∩ B = {2} A ∩ C = φ; B ∩ C = {3, 4}; B ∩ C ⊂ B; B ∩ C ⊂ C A ∪ (B ∩ C) = {1, 2} ∪ {3, 4} = {1, 2, 3, 4} (1.2.1) (A ∪ B) ∩ (A ∪ C) = {1, 2, 3, 4} ∩ {1, 2, 3, 4, 5, 6} = {1, 2, 3, 4} (1.2.2) The reader should verify these results. Example 1.2.2. For this example, suppose the experiment is to select a real number in the open interval (0, 5); hence, the sample space is C = (0, 5). Let A = (1, 3), 6 Probability and Distributions B = (2, 4), and C = [3, 4.5). A ∪ B = (1, 4); A ∩ B = (2, 3); B ∩ C = [3, 4) A ∩ (B ∪ C) = (1, 3) ∩ (2, 4.5) = (2, 3) (1.2.3) (A ∩ B) ∪ (A ∩ C) = (2, 3) ∪ φ = (2, 3) (1.2.4) A sketch of the real number line between 0 and 5 helps to verify these results. Expressions (1.2.1)–(1.2.2) and (1.2.3)–(1.2.4) are illustrations of general dis- tributive laws. For any sets A, B, and C, A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C) A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C). (1.2.5) These follow directly from set theory. To verify each identity, sketch Venn diagrams of both sides. The next two identities are collectively known as DeMorgan’s Laws. For any sets A and B, (A ∩ B)c = Ac ∪ B c (1.2.6) (A ∪ B)c = Ac ∩ B c. (1.2.7) For instance, in Example 1.2.1, (A∪B)c = {1, 2, 3, 4}c = {5, 6,... 
, 10} = {3, 4,... , 10}∩{{1, 5, 6,... , 10} = Ac ∩B c ; while, from Example 1.2.2, (A ∩ B)c = (2, 3)c = (0, 2] ∪ [3, 5) = [(0, 1] ∪ [3, 5)] ∪ [(0, 2] ∪ [4, 5)] = Ac ∪ B c. As the last expression suggests, it is easy to extend unions and intersections to more than two sets. If A1 , A2 ,... , An are any sets, we define A1 ∪ A2 ∪ · · · ∪ An = {x : x ∈ Ai , for some i = 1, 2,... , n} (1.2.8) A1 ∩ A2 ∩ · · · ∩ An = {x : x ∈ Ai , for all i = 1, 2,... , n}. (1.2.9) We often abbreviative these by ∪ni=1 Ai and ∩ni=1 Ai , respectively. Expressions for countable unions and intersections follow directly; that is, if A1 , A2 ,... , An... is a sequence of sets then A1 ∪ A2 ∪ · · · = {x : x ∈ An , for some n = 1, 2,...} = ∪∞ n=1 An (1.2.10) A1 ∩ A2 ∩ · · · = {x : x ∈ An , for all n = 1, 2,...} = ∩∞ n=1 An. (1.2.11) The next two examples illustrate these ideas. Example 1.2.3. Suppose C = {1, 2, 3,...}. If An = {1, 3,... , 2n − 1} and Bn = {n, n + 1,...}, for n = 1, 2, 3,..., then ∪∞ ∞ n=1 An = {1, 3, 5,...}; ∩n=1 An = {1}; (1.2.12) ∪∞ ∞ n=1 Bn = C; ∩n=1 Bn = φ. (1.2.13) 1.2. Sets 7 Example 1.2.4. Suppose C is the interval of real numbers (0, 5). Suppose Cn = (1 − n−1 , 2 + n−1 ) and Dn = (n−1 , 3 − n−1 ), for n = 1, 2, 3,.... Then ∪∞ ∞ n=1 Cn = (0, 3); ∩n=1 Cn = [1, 2] (1.2.14) ∪∞ n=1 Dn = (0, 3); ∩∞ n=1 Dn = (1, 2). (1.2.15) We occassionally have sequences of sets that are monotone. They are of two types. We say a sequence of sets {An } is nondecreasing, (nested upward), if An ⊂ An+1 for n = 1, 2, 3,.... For such a sequence, we define lim An = ∪∞ n=1 An. (1.2.16) n→∞ The sequence of sets An = {1, 3,... , 2n − 1} of Example 1.2.3 is such a sequence. So in this case, we write limn→∞ An = {1, 3, 5,...}. The sequence of sets {Dn } of Example 1.2.4 is also a nondecreasing suquence of sets. The second type of monotone sets consists of the nonincreasing, (nested downward) sequences. A sequence of sets {An } is nonincreasing, if An ⊃ An+1 for n = 1, 2, 3,.... 
In this case, we define
$\lim_{n\to\infty} A_n = \cap_{n=1}^{\infty} A_n$. (1.2.17)
The sequences of sets {Bn} and {Cn} of Examples 1.2.3 and 1.2.4, respectively, are examples of nonincreasing sequences of sets.
1.2.2 Set Functions
Many of the functions used in calculus and in this book are functions that map real numbers into real numbers. We are concerned also with functions that map sets into real numbers. Such functions are naturally called functions of a set or, more simply, set functions. Next we give some examples of set functions and evaluate them for certain simple sets.
Example 1.2.5. Let 𝒞 = R, the set of real numbers. For a subset A in 𝒞, let Q(A) be equal to the number of points in A that correspond to positive integers. Then Q(A) is a set function of the set A. Thus, if A = {x : 0 < x < 5}, then Q(A) = 4; if A = {−2, −1}, then Q(A) = 0; and if A = {x : −∞ < x < 6}, then Q(A) = 5.
Example 1.2.6. Let 𝒞 = R^2. For a subset A of 𝒞, let Q(A) be the area of A if A has a finite area; otherwise, let Q(A) be undefined. Thus, if A = {(x, y) : x^2 + y^2 ≤ 1}, then Q(A) = π; if A = {(0, 0), (1, 1), (0, 1)}, then Q(A) = 0; and if A = {(x, y) : 0 ≤ x, 0 ≤ y, x + y ≤ 1}, then Q(A) = 1/2.
Often our set functions are defined in terms of sums or integrals. (Please see Chapters 2 and 3 of Mathematical Comments, at the site noted in the Preface, for a review of sums and integrals.) With this in mind, we introduce the following notation. The symbol
$\int_A f(x)\,dx$
means the ordinary (Riemann) integral of f(x) over a prescribed one-dimensional set A and the symbol
$\iint_A g(x, y)\,dx\,dy$
means the Riemann integral of g(x, y) over a prescribed two-dimensional set A. This notation can be extended to integrals over n dimensions. To be sure, unless these sets A and these functions f(x) and g(x, y) are chosen with care, the integrals frequently fail to exist.
Similarly, the symbol
$\sum_A f(x)$
means the sum extended over all x ∈ A and the symbol
$\sum\sum_A g(x, y)$
means the sum extended over all (x, y) ∈ A. As with integration, this notation extends to sums over n dimensions.
The first example is for a set function defined on sums involving a geometric series. As pointed out in Example 2.3.1 of Mathematical Comments (downloadable at the site noted in the Preface), if |a| < 1, then the following series converges to 1/(1 − a):
$\sum_{n=0}^{\infty} a^n = \frac{1}{1-a}$, if |a| < 1. (1.2.18)
Example 1.2.7. Let 𝒞 be the set of all nonnegative integers and let A be a subset of 𝒞. Define the set function Q by
$Q(A) = \sum_{n \in A} \left(\frac{2}{3}\right)^n$. (1.2.19)
It follows from (1.2.18) that Q(𝒞) = 3. If A = {1, 2, 3} then Q(A) = 38/27. Suppose B = {1, 3, 5, ...} is the set of all odd positive integers. The computation of Q(B) is given next. This derivation consists of rewriting the series so that (1.2.18) can be applied. Frequently, we perform such derivations in this book.
$Q(B) = \sum_{n \in B} \left(\frac{2}{3}\right)^n = \sum_{n=0}^{\infty} \left(\frac{2}{3}\right)^{2n+1} = \frac{2}{3} \sum_{n=0}^{\infty} \left[\left(\frac{2}{3}\right)^2\right]^n = \frac{2}{3} \cdot \frac{1}{1 - (4/9)} = \frac{6}{5}$
In the next example, the set function is defined in terms of an integral involving the exponential function f(x) = e^{−x}.
Example 1.2.8. Let 𝒞 be the interval of positive real numbers, i.e., 𝒞 = (0, ∞). Let A be a subset of 𝒞. Define the set function Q by
$Q(A) = \int_A e^{-x}\,dx$, (1.2.20)
provided the integral exists. The reader should work through the following integrations:
$Q[(1, 3)] = \int_1^3 e^{-x}\,dx = -e^{-x}\Big|_1^3 = e^{-1} - e^{-3} \doteq 0.318$
$Q[(5, \infty)] = \int_5^{\infty} e^{-x}\,dx = -e^{-x}\Big|_5^{\infty} = e^{-5} \doteq 0.007$
$Q[(1, 3) \cup [3, 5)] = \int_1^5 e^{-x}\,dx = \int_1^3 e^{-x}\,dx + \int_3^5 e^{-x}\,dx = Q[(1, 3)] + Q([3, 5))$
$Q(\mathcal{C}) = \int_0^{\infty} e^{-x}\,dx = 1$.
Our final example involves an n-dimensional integral.
Example 1.2.9. Let 𝒞 = R^n. For A in 𝒞 define the set function
$Q(A) = \int \cdots \int_A dx_1\,dx_2 \cdots dx_n$,
provided the integral exists. For example, if A = {(x1, x2, ..., xn) : 0 ≤ x1 ≤ x2 ≤ 1, 0 ≤ xi ≤ 1, for i = 3, 4, ..., n}, then upon expressing the multiple integral as an iterated integral (for a discussion of multiple integrals in terms of iterated integrals, see Chapter 3 of Mathematical Comments) we obtain
$Q(A) = \left[\int_0^1 \int_0^{x_2} dx_1\,dx_2\right] \cdot \prod_{i=3}^{n} \left[\int_0^1 dx_i\right] = \frac{x_2^2}{2}\Big|_0^1 \cdot 1 = \frac{1}{2}$.
If B = {(x1, x2, ..., xn) : 0 ≤ x1 ≤ x2 ≤ ··· ≤ xn ≤ 1}, then
$Q(B) = \int_0^1 \left[\int_0^{x_n} \cdots \left[\int_0^{x_3} \left[\int_0^{x_2} dx_1\right] dx_2\right] \cdots dx_{n-1}\right] dx_n = \frac{1}{n!}$,
where n! = n(n − 1) ··· 3 · 2 · 1.
EXERCISES
1.2.1. Find the union C1 ∪ C2 and the intersection C1 ∩ C2 of the two sets C1 and C2, where
(a) C1 = {0, 1, 2}, C2 = {2, 3, 4}.
(b) C1 = {x : 0 < x < 2}, C2 = {x : 1 ≤ x < 3}.
(c) C1 = {(x, y) : 0 < x < 2, 1 < y < 2}, C2 = {(x, y) : 1 < x < 3, 1 < y < 3}.
1.2.2. Find the complement C^c of the set C with respect to the space 𝒞 if
(a) 𝒞 = {x : 0 < x < 1}, C = {x : 5/8 < x < 1}.
(b) 𝒞 = {(x, y, z) : x^2 + y^2 + z^2 ≤ 1}, C = {(x, y, z) : x^2 + y^2 + z^2 = 1}.
(c) 𝒞 = {(x, y) : |x| + |y| ≤ 2}, C = {(x, y) : x^2 + y^2 < 2}.
1.2.3. List all possible arrangements of the four letters m, a, r, and y. Let C1 be the collection of the arrangements in which y is in the last position. Let C2 be the collection of the arrangements in which m is in the first position. Find the union and the intersection of C1 and C2.
1.2.4. Concerning DeMorgan’s Laws (1.2.6) and (1.2.7):
(a) Use Venn diagrams to verify the laws.
(b) Show that the laws are true.
(c) Generalize the laws to countable unions and intersections.
1.2.5. By the use of Venn diagrams, in which the space 𝒞 is the set of points enclosed by a rectangle containing the circles C1, C2, and C3, compare the following sets. These laws are called the distributive laws.
(a) C1 ∩ (C2 ∪ C3) and (C1 ∩ C2) ∪ (C1 ∩ C3).
(b) C1 ∪ (C2 ∩ C3) and (C1 ∪ C2) ∩ (C1 ∪ C3).
1.2.6. Show that the following sequences of sets, {Ck}, are nondecreasing, (1.2.16), then find $\lim_{k\to\infty} C_k$.
(a) Ck = {x : 1/k ≤ x ≤ 3 − 1/k}, k = 1, 2, 3, ....
(b) Ck = {(x, y) : 1/k ≤ x^2 + y^2 ≤ 4 − 1/k}, k = 1, 2, 3, ....
1.2.7. Show that the following sequences of sets, {Ck}, are nonincreasing, (1.2.17), then find $\lim_{k\to\infty} C_k$.
(a) Ck = {x : 2 − 1/k < x ≤ 2}, k = 1, 2, 3, ....
(b) Ck = {x : 2 < x ≤ 2 + 1/k}, k = 1, 2, 3, ....
(c) Ck = {(x, y) : 0 ≤ x^2 + y^2 ≤ 1/k}, k = 1, 2, 3, ....
1.2.8. For every one-dimensional set C, define the function $Q(C) = \sum_C f(x)$, where $f(x) = \left(\frac{2}{3}\right)\left(\frac{1}{3}\right)^x$, x = 0, 1, 2, ..., zero elsewhere. If C1 = {x : x = 0, 1, 2, 3} and C2 = {x : x = 0, 1, 2, ...}, find Q(C1) and Q(C2).
Hint: Recall that $S_n = a + ar + \cdots + ar^{n-1} = a(1 - r^n)/(1 - r)$ and, hence, it follows that $\lim_{n\to\infty} S_n = a/(1 - r)$ provided that |r| < 1.
1.2.9. For every one-dimensional set C for which the integral exists, let $Q(C) = \int_C f(x)\,dx$, where f(x) = 6x(1 − x), 0 < x < 1, zero elsewhere; otherwise, let Q(C) be undefined. If C1 = {x : 1/4 < x < 3/4}, C2 = {1/2}, and C3 = {x : 0 < x < 10}, find Q(C1), Q(C2), and Q(C3).
1.2.10. For every two-dimensional set C contained in R^2 for which the integral exists, let $Q(C) = \iint_C (x^2 + y^2)\,dx\,dy$. If C1 = {(x, y) : −1 ≤ x ≤ 1, −1 ≤ y ≤ 1}, C2 = {(x, y) : −1 ≤ x = y ≤ 1}, and C3 = {(x, y) : x^2 + y^2 ≤ 1}, find Q(C1), Q(C2), and Q(C3).
1.2.11. Let 𝒞 denote the set of points that are interior to, or on the boundary of, a square with opposite vertices at the points (0, 0) and (1, 1). Let $Q(C) = \iint_C dy\,dx$.
(a) If C ⊂ 𝒞 is the set {(x, y) : 0 < x < y < 1}, compute Q(C).
(b) If C ⊂ 𝒞 is the set {(x, y) : 0 < x = y < 1}, compute Q(C).
(c) If C ⊂ 𝒞 is the set {(x, y) : 0 < x/2 ≤ y ≤ 3x/2 < 1}, compute Q(C).
1.2.12. Let 𝒞 be the set of points interior to or on the boundary of a cube with edge of length 1. Moreover, say that the cube is in the first octant with one vertex at the point (0, 0, 0) and an opposite vertex at the point (1, 1, 1). Let $Q(C) = \iiint_C dx\,dy\,dz$.
(a) If C ⊂ 𝒞 is the set {(x, y, z) : 0 < x < y < z < 1}, compute Q(C).
(b) If C is the subset {(x, y, z) : 0 < x = y = z < 1}, compute Q(C).
1.2.13. Let C denote the set {(x, y, z) : x^2 + y^2 + z^2 ≤ 1}. Using spherical coordinates, evaluate
$Q(C) = \iiint_C (x^2 + y^2 + z^2)\,dx\,dy\,dz$.
1.2.14. To join a certain club, a person must be either a statistician or a mathematician or both. Of the 25 members in this club, 19 are statisticians and 16 are mathematicians. How many persons in the club are both a statistician and a mathematician?
1.2.15. After a hard-fought football game, it was reported that, of the 11 starting players, 8 hurt a hip, 6 hurt an arm, 5 hurt a knee, 3 hurt both a hip and an arm, 2 hurt both a hip and a knee, 1 hurt both an arm and a knee, and no one hurt all three. Comment on the accuracy of the report.
1.3 The Probability Set Function
Given an experiment, let 𝒞 denote the sample space of all possible outcomes. As discussed in Section 1.1, we are interested in assigning probabilities to events, i.e., subsets of 𝒞. What should be our collection of events? If 𝒞 is a finite set, then we could take the set of all subsets as this collection. For infinite sample spaces, though, with assignment of probabilities in mind, this poses mathematical technicalities that are better left to a course in probability theory. We assume that in all cases, the collection of events is sufficiently rich to include all possible events of interest and is closed under complements and countable unions of these events. Using DeMorgan’s Laws, (1.2.6)–(1.2.7), the collection is then also closed under countable intersections. We denote this collection of events by ℬ. Technically, such a collection of events is called a σ-field of subsets.
Now that we have a sample space, 𝒞, and our collection of events, ℬ, we can define the third component in our probability space, namely a probability set function. In order to motivate its definition, we consider the relative frequency approach to probability.
Remark 1.3.1. The definition of probability consists of three axioms which we motivate by the following three intuitive properties of relative frequency. Let 𝒞 be a sample space and let A ⊂ 𝒞. Suppose we repeat the experiment N times. Then the relative frequency of A is $f_A = \#\{A\}/N$, where #{A} denotes the number of times A occurred in the N repetitions. Note that $f_A \ge 0$ and $f_{\mathcal{C}} = 1$. These are the first two properties. For the third, suppose that A1 and A2 are disjoint events. Then $f_{A_1 \cup A_2} = f_{A_1} + f_{A_2}$. These three properties of relative frequencies form the axioms of a probability, except that the third axiom is in terms of countable unions. As with the axioms of probability, the readers should check that the theorems we prove below about probabilities agree with their intuition of relative frequency.
Definition 1.3.1 (Probability). Let 𝒞 be a sample space and let ℬ be the set of events. Let P be a real-valued function defined on ℬ. Then P is a probability set function if P satisfies the following three conditions:
1. P(A) ≥ 0, for all A ∈ ℬ.
2. P(𝒞) = 1.
3. If {An} is a sequence of events in ℬ and Am ∩ An = φ for all m ≠ n, then
$P\left(\bigcup_{n=1}^{\infty} A_n\right) = \sum_{n=1}^{\infty} P(A_n)$.
A collection of events whose members are pairwise disjoint, as in (3), is said to be a mutually exclusive collection and its union is often referred to as a disjoint union. The collection is further said to be exhaustive if the union of its events is the sample space, in which case $\sum_{n=1}^{\infty} P(A_n) = 1$. We often say that a mutually exclusive and exhaustive collection of events forms a partition of 𝒞.
A probability set function tells us how the probability is distributed over the set of events, ℬ. In this sense we speak of a distribution of probability. We often drop the word “set” and refer to P as a probability function.
The following theorems give us some other properties of a probability set function.
In the statement of each of these theorems, P(A) is taken, tacitly, to be a probability set function defined on the collection of events ℬ of a sample space 𝒞.
Theorem 1.3.1. For each event A ∈ ℬ, P(A) = 1 − P(A^c).
Proof: We have 𝒞 = A ∪ A^c and A ∩ A^c = φ. Thus, from (2) and (3) of Definition 1.3.1, it follows that 1 = P(A) + P(A^c), which is the desired result.
Theorem 1.3.2. The probability of the null set is zero; that is, P(φ) = 0.
Proof: In Theorem 1.3.1, take A = φ so that A^c = 𝒞. Accordingly, we have P(φ) = 1 − P(𝒞) = 1 − 1 = 0 and the theorem is proved.
Theorem 1.3.3. If A and B are events such that A ⊂ B, then P(A) ≤ P(B).
Proof: Now B = A ∪ (A^c ∩ B) and A ∩ (A^c ∩ B) = φ. Hence, from (3) of Definition 1.3.1, P(B) = P(A) + P(A^c ∩ B). From (1) of Definition 1.3.1, P(A^c ∩ B) ≥ 0. Hence, P(B) ≥ P(A).
Theorem 1.3.4. For each A ∈ ℬ, 0 ≤ P(A) ≤ 1.
Proof: Since φ ⊂ A ⊂ 𝒞, we have by Theorem 1.3.3 that P(φ) ≤ P(A) ≤ P(𝒞) or 0 ≤ P(A) ≤ 1, the desired result.
Part (3) of the definition of probability says that P(A ∪ B) = P(A) + P(B) if A and B are disjoint, i.e., A ∩ B = φ. The next theorem gives the rule for any two events, whether or not they are disjoint.
Theorem 1.3.5. If A and B are events in 𝒞, then P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
Proof: Each of the sets A ∪ B and B can be represented, respectively, as a union of nonintersecting sets as follows:
A ∪ B = A ∪ (A^c ∩ B) and B = (A ∩ B) ∪ (A^c ∩ B). (1.3.1)
That these identities hold for all sets A and B follows from set theory. Also, the Venn diagrams of Figure 1.3.1 offer a verification of them. Thus, from (3) of Definition 1.3.1, P(A ∪ B) = P(A) + P(A^c ∩ B) and P(B) = P(A ∩ B) + P(A^c ∩ B). If the second of these equations is solved for P(A^c ∩ B) and this result is substituted in the first equation, we obtain P(A ∪ B) = P(A) + P(B) − P(A ∩ B). This completes the proof.
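As a quick numerical check of Theorem 1.3.5, the following R sketch evaluates both sides of the identity on the spinner sample space of Example 1.2.1. The equal weights of 1/10 per outcome and the helper name prob are assumptions made only for this illustration; they are not part of the text.

```r
# Sample space of the spinner in Example 1.2.1.
# Assumption for illustration: every outcome has probability 1/10.
space <- 1:10
prob <- function(event) length(intersect(event, space)) / length(space)

A <- c(1, 2)
B <- c(2, 3, 4)
Ac <- setdiff(space, A)                            # complement of A

lhs <- prob(union(A, B))                           # P(A u B)
rhs <- prob(A) + prob(B) - prob(intersect(A, B))   # Theorem 1.3.5

# First disjoint union in (1.3.1): A u B = A u (A^c n B)
disjoint_sum <- prob(A) + prob(intersect(Ac, B))

print(c(lhs = lhs, rhs = rhs, disjoint_sum = disjoint_sum))
```

Under these assumptions all three quantities agree (up to floating-point rounding) at 4/10, matching the count of the four outcomes in A ∪ B = {1, 2, 3, 4}.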
Figure 1.3.1: Venn diagrams depicting the two disjoint unions given in expression (1.3.1). Panel (a) depicts the first disjoint union while Panel (b) shows the second disjoint union. (Panel titles: (a) A ∪ B = A ∪ (A^c ∩ B); (b) A = (A ∩ B^c) ∪ (A ∩ B).)
Example 1.3.1. Let 𝒞 denote the sample space of Example 1.1.2. Let the probability set function assign a probability of 1/36 to each of the 36 points in 𝒞; that is, the dice are fair. If C1 = {(1, 1), (2, 1), (3, 1), (4, 1), (5, 1)} and C2 = {(1, 2), (2, 2), (3, 2)}, then P(C1) = 5/36, P(C2) = 3/36, P(C1 ∪ C2) = 8/36, and P(C1 ∩ C2) = 0.
Example 1.3.2. Two coins are to be tossed and the outcome is the ordered pair (face on the first coin, face on the second coin). Thus the sample space may be represented as 𝒞 = {(H, H), (H, T), (T, H), (T, T)}. Let the probability set function assign a probability of 1/4 to each element of 𝒞. Let C1 = {(H, H), (H, T)} and C2 = {(H, H), (T, H)}. Then P(C1) = P(C2) = 1/2, P(C1 ∩ C2) = 1/4, and, in accordance with Theorem 1.3.5, P(C1 ∪ C2) = 1/2 + 1/2 − 1/4 = 3/4.
For a finite sample space, we can generate probabilities as follows. Let 𝒞 = {x1, x2, ..., xm} be a finite set of m elements. Let p1, p2, ..., pm be fractions such that
$0 \le p_i \le 1$ for i = 1, 2, ..., m and $\sum_{i=1}^{m} p_i = 1$. (1.3.2)
Suppose we define P by
$P(A) = \sum_{x_i \in A} p_i$, for all subsets A of 𝒞. (1.3.3)
Then P(A) ≥ 0 and P(𝒞) = 1. Further, it follows that P(A ∪ B) = P(A) + P(B) when A ∩ B = φ. Therefore, P is a probability on 𝒞. For illustration, each of the following four assignments forms a probability on 𝒞 = {1, 2, ..., 6}. For each, we also compute P(A) for the event A = {1, 6}.
p1 = p2 = ··· = p6 = 1/6; P(A) = 1/3. (1.3.4)
p1 = p2 = 0.1, p3 = p4 = p5 = p6 = 0.2; P(A) = 0.3.
pi = i/21, i = 1, 2, ..., 6; P(A) = 7/21.
p1 = 3/π, p2 = 1 − 3/π, p3 = p4 = p5 = p6 = 0.0; P(A) = 3/π.
Note that the individual probabilities for the first probability set function, (1.3.4), are the same. This is an example of the equilikely case which we now formally define.
Definition 1.3.2 (Equilikely Case). Let 𝒞 = {x1, x2, ..., xm} be a finite sample space. Let pi = 1/m for all i = 1, 2, ..., m and for all subsets A of 𝒞 define
$P(A) = \sum_{x_i \in A} \frac{1}{m} = \frac{\#(A)}{m}$,
where #(A) denotes the number of elements in A. Then P is a probability on 𝒞 and it is referred to as the equilikely case.
Equilikely cases are frequently probability models of interest. Examples include: the flip of a fair coin; five cards drawn from a well shuffled deck of 52 cards; a spin of a fair spinner with the numbers 1 through 36 on it; and the upfaces of the roll of a pair of balanced dice. For each of these experiments, as stated in the definition, we only need to know the number of elements in an event to compute the probability of that event. For example, a card player may be interested in the probability of getting a pair (two of a kind) in a hand of five cards dealt from a well shuffled deck of 52 cards. To compute this probability, we need to know the number of five card hands and the number of such hands which contain a pair. Because the equilikely case is often of interest, we next develop some counting rules which can be used to compute the probabilities of events of interest.
1.3.1 Counting Rules
We discuss three counting rules that are usually discussed in an elementary algebra course.
The first rule is called the mn-rule (m times n-rule), which is also called the multiplication rule. Let A = {x1, x2, ..., xm} be a set of m elements and let B = {y1, y2, ..., yn} be a set of n elements. Then there are mn ordered pairs, (xi, yj), i = 1, 2, ..., m and j = 1, 2, ..., n, of elements, the first from A and the second from B. Informally, we often speak of ways, here.
For example, there are five roads (ways) between cities I and II and there are ten roads (ways) between cities II and III. Hence, there are 5 · 10 = 50 ways to get from city I to city III by going from city I to city II and then from city II to city III. This rule extends immediately to more than two sets. For instance, suppose in a certain state that driver license plates have the pattern of three letters followed by three numbers. Then there are $26^3 \cdot 10^3$ possible license plates in this state.
Next, let A be a set with n elements. Suppose we are interested in k-tuples whose components are elements of A. Then by the extended mn rule, there are $n \cdot n \cdots n = n^k$ such k-tuples whose components are elements of A. Next, suppose k ≤ n and we are interested in k-tuples whose components are distinct (no repeats) elements of A. There are n elements from which to choose for the first component, n − 1 for the second component, ..., n − (k − 1) for the kth. Hence, by the mn rule, there are n(n − 1) ··· (n − (k − 1)) such k-tuples with distinct elements. We call each such k-tuple a permutation and use the symbol $P_k^n$ to denote the number of k permutations taken from a set of n elements. This number of permutations, $P_k^n$, is our second counting rule. We can rewrite it as
$P_k^n = n(n-1)\cdots(n-(k-1)) = \frac{n!}{(n-k)!}$. (1.3.5)
Example 1.3.3 (Birthday Problem). Suppose there are n people in a room. Assume that n < 365 and that the people are unrelated in any way. Find the probability of the event A that at least 2 people have the same birthday. For convenience, assign the numbers 1 through n to the people in the room. Then use n-tuples to denote the birthdays of the first person through the nth person in the room. Using the mn-rule, there are $365^n$ possible birthday n-tuples for these n people. This is the number of elements in the sample space. Now assume that birthdays are equilikely to occur on any of the 365 days. Hence, each of these n-tuples has probability $365^{-n}$.
Notice that the complement of A is the event that all the birthdays in the room are distinct; that is, the number of n-tuples in A^c is $P_n^{365}$. Thus, the probability of A is
$P(A) = 1 - \frac{P_n^{365}}{365^n}$.
For instance, if n = 2 then P(A) = 1 − (365 · 364)/(365^2) = 0.0027. This formula is not easy to compute by hand. The following R function computes P(A) for the input n and it can be downloaded at the sites mentioned in the Preface. (An R primer for the course is found in Appendix B.)
    bday = function(n){
      # probability that all n birthdays are distinct
      distinct = 1
      for(j in seq_len(n - 1)){distinct = distinct*((365-j)/365)}
      # P(A): at least two of the n people share a birthday
      1 - distinct
    }
For instance, with the function saved in the file bday.R:
    > source("bday.R")
    > bday(10)
    [1] 0.1169482
For our last counting rule, as with permutations, we are drawing from a set A of n elements. Now, suppose order is not important, so instead of counting the number of permutations we want to count the number of subsets of k elements taken from A. We use the symbol $\binom{n}{k}$ to denote the total number of these subsets. Consider a subset of k elements from A. By the permutation rule it generates $P_k^k = k(k-1)\cdots 1 = k!$ permutations. Furthermore, all these permutations are distinct from the permutations generated by other subsets of k elements from A. Finally, each permutation of k distinct elements drawn from A must be generated by one of these subsets. Hence, we have shown that $P_k^n = \binom{n}{k} k!$; that is,
$\binom{n}{k} = \frac{n!}{k!(n-k)!}$. (1.3.6)
We often use the terminology combinations instead of subsets. So we say that there are $\binom{n}{k}$ combinations of k things taken from a set of n things. Another common symbol for $\binom{n}{k}$ is $C_k^n$.
It is interesting to note that if we expand the binomial series, $(a + b)^n = (a + b)(a + b) \cdots (a + b)$, we get
$(a + b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k}$, (1.3.7)
because we can select the k factors from which to take a in $\binom{n}{k}$ ways. So $\binom{n}{k}$ is also referred to as a binomial coefficient.
Example 1.3.4 (Poker Hands). Let a card be drawn at random from an ordinary deck of 52 playing cards that has been well shuffled.
The sample space 𝒞 consists of 52 elements; each element represents one and only one of the 52 cards. Because the deck has been well shuffled, it is reasonable to assume that each of these outcomes has the same probability 1/52. Accordingly, if E1 is the set of outcomes that are spades, P(E1) = 13/52 = 1/4 because there are 13 spades in the deck; that is, 1/4 is the probability of drawing a card that is a spade. If E2 is the set of outcomes that are kings, P(E2) = 4/52 = 1/13 because there are 4 kings in the deck; that is, 1/13 is the probability of drawing a card that is a king. These computations are very easy because there are no difficulties in the determination of the number of elements in each event.
However, instead of drawing only one card, suppose that five cards are taken, at random and without replacement, from this deck; i.e., a five card poker hand. In this instance, order is not important. So a hand is a subset of five elements drawn from a set of 52 elements. Hence, by (1.3.6) there are $\binom{52}{5}$ poker hands. If the deck is well shuffled, each hand should be equilikely; i.e., each hand has probability $1/\binom{52}{5}$. We can now compute the probabilities of some interesting poker hands. Let E1 be the event of a flush, all five cards of the same suit. There are $\binom{4}{1} = 4$ suits to choose for the flush and in each suit there are $\binom{13}{5}$ possible hands; hence, using the multiplication rule, the probability of getting a flush is
$P(E_1) = \frac{\binom{4}{1}\binom{13}{5}}{\binom{52}{5}} = \frac{4 \cdot 1287}{2598960} = 0.00198$.
Real poker players note that this includes the probability of obtaining a straight flush.
Next, consider the probability of the event E2 of getting exactly three of a kind (the other two cards are distinct and are of different kinds). Choose the kind for the three, in $\binom{13}{1}$ ways; choose the three, in $\binom{4}{3}$ ways; choose the other two kinds, in $\binom{12}{2}$ ways; and choose one card from each of these last two kinds, in $\binom{4}{1}\binom{4}{1}$ ways.
Hence the probability of exactly three of a kind is
$P(E_2) = \frac{\binom{13}{1}\binom{4}{3}\binom{12}{2}\binom{4}{1}^2}{\binom{52}{5}} = 0.0211$.
Now suppose that E3 is the set of outcomes in which exactly three cards are kings and exactly two cards are queens. Select the kings, in $\binom{4}{3}$ ways, and select the queens, in $\binom{4}{2}$ ways. Hence, the probability of E3 is
$P(E_3) = \binom{4}{3}\binom{4}{2} \Big/ \binom{52}{5} = 0.0000093$.
The event E3 is an example of a full house: three of one kind and two of another kind. Exercise 1.3.19 asks for the determination of the probability of a full house.
1.3.2 Additional Properties of Probability
We end this section with several additional properties of probability which prove useful in the sequel. Recall in Exercise 1.2.6 we said that a sequence of events {Cn} is a nondecreasing sequence if Cn ⊂ Cn+1, for all n, in which case we wrote $\lim_{n\to\infty} C_n = \cup_{n=1}^{\infty} C_n$. Consider $\lim_{n\to\infty} P(C_n)$. The question is: can we legitimately interchange the limit and P? As the following theorem shows, the answer is yes. The result also holds for a decreasing sequence of events. Because of this interchange, this theorem is sometimes referred to as the continuity theorem of probability.
Theorem 1.3.6. Let {Cn} be a nondecreasing sequence of events. Then
$\lim_{n\to\infty} P(C_n) = P\left(\lim_{n\to\infty} C_n\right) = P\left(\bigcup_{n=1}^{\infty} C_n\right)$. (1.3.8)
Let {Cn} be a decreasing sequence of events. Then
$\lim_{n\to\infty} P(C_n) = P\left(\lim_{n\to\infty} C_n\right) = P\left(\bigcap_{n=1}^{\infty} C_n\right)$. (1.3.9)
Proof. We prove the result (1.3.8) and leave the second result as Exercise 1.3.20. Define the sets, called rings, as R1 = C1 and, for n > 1, $R_n = C_n \cap C_{n-1}^c$. It follows that $\bigcup_{n=1}^{\infty} C_n = \bigcup_{n=1}^{\infty} R_n$ and that Rm ∩ Rn = φ, for m ≠ n. Also, P(Rn) = P(Cn) − P(Cn−1). Applying the third axiom of probability yields the following string of equalities:
$P\left[\lim_{n\to\infty} C_n\right] = P\left(\bigcup_{n=1}^{\infty} C_n\right) = P\left(\bigcup_{n=1}^{\infty} R_n\right) = \sum_{n=1}^{\infty} P(R_n) = \lim_{n\to\infty} \sum_{j=1}^{n} P(R_j)$
$= \lim_{n\to\infty} \left\{ P(C_1) + \sum_{j=2}^{n} [P(C_j) - P(C_{j-1})] \right\} = \lim_{n\to\infty} P(C_n)$. (1.3.10)
This is the desired result.
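The telescoping argument in (1.3.10) can be illustrated numerically. The R sketch below assumes the probability P(A) = ∫_A e^{−x} dx on (0, ∞) (the set function of Example 1.2.8, whose value on the whole space is 1) and the nondecreasing events C_n = (0, n); the names p_C, p_ring, and partial are hypothetical and chosen only for this illustration.

```r
# Assumed model: P(A) = integral over A of exp(-x) dx on (0, Inf),
# with nondecreasing events C_n = (0, n), so P(C_n) = 1 - exp(-n).
p_C <- function(n) 1 - exp(-n)

n <- 1:20
# Ring probabilities: P(R_1) = P(C_1), P(R_n) = P(C_n) - P(C_{n-1}) for n > 1
p_ring <- c(p_C(1), diff(p_C(n)))
# Partial sums of the ring probabilities, as in (1.3.10)
partial <- cumsum(p_ring)

print(max(abs(partial - p_C(n))))   # telescoping: difference is essentially 0
print(p_C(c(1, 5, 10, 20)))         # increases toward P(union of C_n) = 1
```

Under these assumptions the partial sums of the ring probabilities reproduce P(C_n), and P(C_n) = 1 − e^{−n} increases toward the probability of the union, which is 1.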
Another useful result for arbitrary unions is given by
Theorem 1.3.7 (Boole’s Inequality). Let {Cn} be an arbitrary sequence of events. Then
$P\left(\bigcup_{n=1}^{\infty} C_n\right) \le \sum_{n=1}^{\infty} P(C_n)$. (1.3.11)
Proof: Let $D_n = \bigcup_{i=1}^{n} C_i$. Then {Dn} is an increasing sequence of events that go up to $\bigcup_{n=1}^{\infty} C_n$. Also, for all j, Dj = Dj−1 ∪ Cj. Hence, by Theorem 1.3.5, P(Dj) ≤ P(Dj−1) + P(Cj), that is, P(Dj) − P(Dj−1) ≤ P(Cj). In this case, the Ci's are replaced by the Di's in expression (1.3.10). Hence, using the above inequality in this expression and the fact that P(C1) = P(D1), we have
$P\left(\bigcup_{n=1}^{\infty} C_n\right) = P\left(\bigcup_{n=1}^{\infty} D_n\right) = \lim_{n\to\infty} \left\{ P(D_1) + \sum_{j=2}^{n} [P(D_j) - P(D_{j-1})] \right\} \le \lim_{n\to\infty} \sum_{j=1}^{n} P(C_j) = \sum_{n=1}^{\infty} P(C_n)$.
Theorem 1.3.5 gave a general additive law of probability for the union of two events. As the next remark shows, this can be extended to an additive law for an arbitrary union.
Remark 1.3.2 (Inclusion Exclusion Formula). It is easy to show (Exercise 1.3.9) that P(C1 ∪ C2 ∪ C3) = p1 − p2 + p3, where
p1 = P(C1) + P(C2) + P(C3)
p2 = P(C1 ∩ C2) + P(C1 ∩ C3) + P(C2 ∩ C3)
p3 = P(C1 ∩ C2 ∩ C3). (1.3.12)
This can be generalized to the inclusion exclusion formula:
$P(C_1 \cup C_2 \cup \cdots \cup C_k) = p_1 - p_2 + p_3 - \cdots + (-1)^{k+1} p_k$, (1.3.13)
where pi equals the sum of the probabilities of all possible intersections involving i sets.
When k = 3, it follows that p1 ≥ p2 ≥ p3, but more generally p1 ≥ p2 ≥ ··· ≥ pk. As shown in Theorem 1.3.7, p1 = P(C1) + P(C2) + ··· + P(Ck) ≥ P(C1 ∪ C2 ∪ ··· ∪ Ck). For k = 2, we have 1 ≥ P(C1 ∪ C2) = P(C1) + P(C2) − P(C1 ∩ C2), which gives Bonferroni’s inequality,
P(C1 ∩ C2) ≥ P(C1) + P(C2) − 1, (1.3.14)
that is only useful when P(C1) and P(C2) are large. The inclusion exclusion formula provides other inequalities that are useful, such as
p1 ≥ P(C1 ∪ C2 ∪ ··· ∪ Ck) ≥ p1 − p2
and
p1 − p2 + p3 ≥ P(C1 ∪ C2 ∪ ··· ∪ Ck) ≥ p1 − p2 + p3 − p4.
EXERCISES
1.3.1. A positive integer from one to six is to be chosen by casting a die. Thus the elements c of the sample space 𝒞 are 1, 2, 3, 4, 5, 6. Suppose C1 = {1, 2, 3, 4} and C2 = {3, 4, 5, 6}. If the probability set function P assigns a probability of 1/6 to each of the elements of 𝒞, compute P(C1), P(C2), P(C1 ∩ C2), and P(C1 ∪ C2).
1.3.2. A random experiment consists of drawing a card from an ordinary deck of 52 playing cards. Let the probability set function P assign a probability of 1/52 to each of the 52 possible outcomes. Let C1 denote the collection of the 13 hearts and let C2 denote the collection of the 4 kings. Compute P(C1), P(C2), P(C1 ∩ C2), and P(C1 ∪ C2).
1.3.3. A coin is to be tossed as many times as necessary to turn up one head. Thus the elements c of the sample space 𝒞 are H, TH, TTH, TTTH, and so forth. Let the probability set function P assign to these elements the respective probabilities 1/2, 1/4, 1/8, 1/16, and so forth. Show that P(𝒞) = 1. Let C1 = {c : c is H, TH, TTH, TTTH, or TTTTH}. Compute P(C1). Next, suppose that C2 = {c : c is TTTTH or TTTTTH}. Compute P(C2), P(C1 ∩ C2), and P(C1 ∪ C2).
1.3.4. If the sample space is 𝒞 = C1 ∪ C2 and if P(C1) = 0.8 and P(C2) = 0.5, find P(C1 ∩ C2).
1.3.5. Let the sample space be 𝒞 = {c : 0 < c < ∞}. Let C ⊂ 𝒞 be defined by C = {c : 4 < c < ∞} and take $P(C) = \int_C e^{-x}\,dx$. Show that P(𝒞) = 1. Evaluate P(C), P(C^c), and P(C ∪ C^c).
1.3.6. If the sample space is 𝒞 = {c : −∞ < c < ∞} and if C ⊂ 𝒞 is a set for which the integral $\int_C e^{-|x|}\,dx$ exists, show that this set function is not a probability set function. What constant do we multiply the integrand by to make it a probability set function?
1.3.7. If C1 and C2 are subsets of the sample space 𝒞, show that
P(C1 ∩ C2) ≤ P(C1) ≤ P(C1 ∪ C2) ≤ P(C1) + P(C2).
1.3.8. Let C1, C2, and C3 be three mutually disjoint subsets of the sample space 𝒞.
Find $P[(C_1 \cup C_2) \cap C_3]$ and $P(C_1^c \cup C_2^c)$.

1.3.9. Consider Remark 1.3.2.

(a) If $C_1$, $C_2$, and $C_3$ are subsets of $\mathcal{C}$, show that
\begin{align*}
P(C_1 \cup C_2 \cup C_3) = {} & P(C_1) + P(C_2) + P(C_3) - P(C_1 \cap C_2) \\
& - P(C_1 \cap C_3) - P(C_2 \cap C_3) + P(C_1 \cap C_2 \cap C_3).
\end{align*}

(b) Now prove the general inclusion exclusion formula given by the expression (1.3.13).

Remark 1.3.3. In order to solve Exercises (1.3.10)–(1.3.19), certain reasonable assumptions must be made.

1.3.10. A bowl contains 16 chips, of which 6 are red, 7 are white, and 3 are blue. If four chips are taken at random and without replacement, find the probability that: (a) each of the four chips is red; (b) none of the four chips is red; (c) there is at least one chip of each color.

1.3.11. A person has purchased 10 of 1000 tickets sold in a certain raffle. To determine the five prize winners, five tickets are to be drawn at random and without replacement. Compute the probability that this person wins at least one prize. Hint: First compute the probability that the person does not win a prize.

1.3.12. Compute the probability of being dealt at random and without replacement a 13-card bridge hand consisting of: (a) 6 spades, 4 hearts, 2 diamonds, and 1 club; (b) 13 cards of the same suit.

1.3.13. Three distinct integers are chosen at random from the first 20 positive integers. Compute the probability that: (a) their sum is even; (b) their product is even.

1.3.14. There are five red chips and three blue chips in a bowl. The red chips are numbered 1, 2, 3, 4, 5, respectively, and the blue chips are numbered 1, 2, 3, respectively. If two chips are to be drawn at random and without replacement, find the probability that these chips have either the same number or the same color.

1.3.15. In a lot of 50 light bulbs, there are 2 bad bulbs. An inspector examines five bulbs, which are selected at random and without replacement.

(a) Find the probability of at least one defective bulb among the five.
(b) How many bulbs should be examined so that the probability of finding at least one bad bulb exceeds $\frac{1}{2}$?

1.3.16. For the birthday problem, Example 1.3.3, use the given R function bday to determine the value of $n$ so that $p(n) \ge 0.5$ and $p(n - 1) < 0.5$, where $p(n)$ is the probability that at least two people in the room of $n$ people have the same birthday.

1.3.17. If $C_1, \ldots, C_k$ are $k$ events in the sample space $\mathcal{C}$, show that the probability that at least one of the events occurs is one minus the probability that none of them occur; i.e.,
\[
P(C_1 \cup \cdots \cup C_k) = 1 - P(C_1^c \cap \cdots \cap C_k^c). \tag{1.3.15}
\]

1.3.18. A secretary types three letters and the three corresponding envelopes. In a hurry, he places at random one letter in each envelope. What is the probability that at least one letter is in the correct envelope? Hint: Let $C_i$ be the event that the $i$th letter is in the correct envelope. Expand $P(C_1 \cup C_2 \cup C_3)$ to determine the probability.

1.3.19. Consider poker hands drawn from a well-shuffled deck as described in Example 1.3.4. Determine the probability of a full house, i.e., three of one kind and two of another.

1.3.20. Prove expression (1.3.9).

1.3.21. Suppose the experiment is to choose a real number at random in the interval $(0, 1)$. For any subinterval $(a, b) \subset (0, 1)$, it seems reasonable to assign the probability $P[(a, b)] = b - a$; i.e., the probability of selecting the point from a subinterval is directly proportional to the length of the subinterval. If this is the case, choose an appropriate sequence of subintervals and use expression (1.3.9) to show that $P[\{a\}] = 0$, for all $a \in (0, 1)$.

1.3.22. Consider the events $C_1$, $C_2$, $C_3$.

(a) Suppose $C_1$, $C_2$, $C_3$ are mutually exclusive events. If $P(C_i) = p_i$, $i = 1, 2, 3$, what is the restriction on the sum $p_1 + p_2 + p_3$?

(b) In the notation of part (a), if $p_1 = 4/10$, $p_2 = 3/10$, and $p_3 = 5/10$, are $C_1$, $C_2$, $C_3$ mutually exclusive?
For the last two exercises it is assumed that the reader is familiar with $\sigma$-fields.

1.3.23. Suppose $\mathcal{D}$ is a nonempty collection of subsets of $\mathcal{C}$. Consider the collection of events
\[
\mathcal{B} = \cap\{\mathcal{E} : \mathcal{D} \subset \mathcal{E} \text{ and } \mathcal{E} \text{ is a } \sigma\text{-field}\}.
\]
Note that $\phi \in \mathcal{B}$ because it is in each $\sigma$-field, and, hence, in particular, it is in each $\sigma$-field $\mathcal{E} \supset \mathcal{D}$. Continue in this way to show that $\mathcal{B}$ is a $\sigma$-field.

1.3.24. Let $\mathcal{C} = R$, where $R$ is the set of all real numbers. Let $\mathcal{I}$ be the set of all open intervals in $R$. The Borel $\sigma$-field on the real line is given by
\[
\mathcal{B}_0 = \cap\{\mathcal{E} : \mathcal{I} \subset \mathcal{E} \text{ and } \mathcal{E} \text{ is a } \sigma\text{-field}\}.
\]
By definition, $\mathcal{B}_0$ contains the open intervals. Because $[a, \infty) = (-\infty, a)^c$ and $\mathcal{B}_0$ is closed under complements, it contains all intervals of the form $[a, \infty)$, for $a \in R$. Continue in this way and show that $\mathcal{B}_0$ contains all the closed and half-open intervals of real numbers.

1.4 Conditional Probability and Independence

In some random experiments, we are interested only in those outcomes that are elements of a subset $A$ of the sample space $\mathcal{C}$. This means, for our purposes, that the sample space is effectively the subset $A$. We are now confronted with the problem of defining a probability set function with $A$ as the “new” sample space.

Let the probability set function $P(A)$ be defined on the sample space $\mathcal{C}$ and let $A$ be a subset of $\mathcal{C}$ such that $P(A) > 0$. We agree to consider only those outcomes of the random experiment that are elements of $A$; in essence, then, we take $A$ to be a sample space. Let $B$ be another subset of $\mathcal{C}$. How, relative to the new sample space $A$, do we want to define the probability of the event $B$? Once defined, this probability is called the conditional probability of the event $B$, relative to the hypothesis of the event $A$, or, more briefly, the conditional probability of $B$, given $A$. Such a conditional probability is denoted by the symbol $P(B|A)$. The “|” in this symbol is usually read as “given.” We now return to the question that was raised about the definition of this symbol.
Since $A$ is now the sample space, the only elements of $B$ that concern us are those, if any, that are also elements of $A$, that is, the elements of $A \cap B$. It seems desirable, then, to define the symbol $P(B|A)$ in such a way that
\[
P(A|A) = 1 \quad \text{and} \quad P(B|A) = P(A \cap B|A).
\]
Moreover, from a relative frequency point of view, it would seem logically inconsistent if we did not require that the ratio of the probabilities of the events $A \cap B$ and $A$, relative to the space $A$, be the same as the ratio of the probabilities of these events relative to the space $\mathcal{C}$; that is, we should have
\[
\frac{P(A \cap B|A)}{P(A|A)} = \frac{P(A \cap B)}{P(A)}.
\]
These three desirable conditions imply that conditional probability is reasonably defined as follows.

Definition 1.4.1 (Conditional Probability). Let $B$ and $A$ be events with $P(A) > 0$. Then we define the conditional probability of $B$ given $A$ as
\[
P(B|A) = \frac{P(A \cap B)}{P(A)}. \tag{1.4.1}
\]

Moreover, we have

1. $P(B|A) \ge 0$.
2. $P(A|A) = 1$.
3. $P(\cup_{n=1}^{\infty} B_n|A) = \sum_{n=1}^{\infty} P(B_n|A)$, provided that $B_1, B_2, \ldots$ are mutually exclusive events.

Properties (1) and (2) are evident. For Property (3), suppose the sequence of events $B_1, B_2, \ldots$ is mutually exclusive. It follows that also $(B_n \cap A) \cap (B_m \cap A) = \phi$, $n \ne m$. Using this and the first of the distributive laws (1.2.5) for countable unions, we have
\begin{align*}
P(\cup_{n=1}^{\infty} B_n|A) &= \frac{P[\cup_{n=1}^{\infty} (B_n \cap A)]}{P(A)} \\
&= \sum_{n=1}^{\infty} \frac{P[B_n \cap A]}{P(A)} \\
&= \sum_{n=1}^{\infty} P[B_n|A].
\end{align*}
Properties (1)–(3) are precisely the conditions that a probability set function must satisfy. Accordingly, $P(B|A)$ is a probability set function, defined for subsets of $A$. It may be called the conditional probability set function, relative to the hypothesis $A$, or the conditional probability set function, given $A$. It should be noted that this conditional probability set function, given $A$, is defined at this time only when $P(A) > 0$.

Example 1.4.1. A hand of five cards is to be dealt at random without replacement from an ordinary deck of 52 playing cards.
The conditional probability of an all-spade hand ($B$), relative to the hypothesis that there are at least four spades in the hand ($A$), is, since $A \cap B = B$,
\begin{align*}
P(B|A) = \frac{P(B)}{P(A)} &= \frac{\binom{13}{5} \big/ \binom{52}{5}}{\left[\binom{13}{4}\binom{39}{1} + \binom{13}{5}\right] \big/ \binom{52}{5}} \\
&= \frac{\binom{13}{5}}{\binom{13}{4}\binom{39}{1} + \binom{13}{5}} = 0.0441.
\end{align*}
Note that this is not the same as drawing for a spade to complete a flush in draw poker; see Exercise 1.4.3.

From the definition of the conditional probability set function, we observe that
\[
P(A \cap B) = P(A)P(B|A).
\]
This relation is frequently called the multiplication rule for probabilities. Sometimes, after considering the nature of the random experiment, it is possible to make reasonable assumptions so that both $P(A)$ and $P(B|A)$ can be assigned. Then $P(A \cap B)$ can be computed under these assumptions. This is illustrated in Examples 1.4.2 and 1.4.3.

Example 1.4.2. A bowl contains eight chips. Three of the chips are red and the remaining five are blue. Two chips are to be drawn successively, at random and without replacement. We want to compute the probability that the first draw results in a red chip ($A$) and that the second draw results in a blue chip ($B$). It is reasonable to assign the following probabilities:
\[
P(A) = \tfrac{3}{8} \quad \text{and} \quad P(B|A) = \tfrac{5}{7}.
\]
Thus, under these assignments, we have $P(A \cap B) = (\frac{3}{8})(\frac{5}{7}) = \frac{15}{56} = 0.2679$.

Example 1.4.3. From an ordinary deck of playing cards, cards are to be drawn successively, at random and without replacement. The probability that the third spade appears on the sixth draw is computed as follows. Let $A$ be the event of two spades in the first five draws and let $B$ be the event of a spade on the sixth draw. Thus the probability that we wish to compute is $P(A \cap B)$. It is reasonable to take
\[
P(A) = \frac{\binom{13}{2}\binom{39}{3}}{\binom{52}{5}} = 0.2743 \quad \text{and} \quad P(B|A) = \frac{11}{47} = 0.2340.
\]
The desired probability $P(A \cap B)$ is then the product of these two numbers, which to four places is 0.0642.

The multiplication rule can be extended to three or more events.
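The arithmetic in Examples 1.4.1 through 1.4.3 can be checked numerically. The following is only an illustrative sketch in Python (the text itself works in R); the variable names are our own and are not part of the text. It uses exact rational arithmetic and rounds only when printing.

```python
from fractions import Fraction
from math import comb

# Example 1.4.1: P(all spades | at least four spades) for a five-card hand.
# The common denominator C(52, 5) cancels, so only the counts are needed.
all_spades = comb(13, 5)
at_least_four_spades = comb(13, 4) * comb(39, 1) + comb(13, 5)
p_ex_141 = Fraction(all_spades, at_least_four_spades)

# Example 1.4.2: multiplication rule, P(A and B) = P(A) P(B|A),
# with A = first chip red and B = second chip blue.
p_ex_142 = Fraction(3, 8) * Fraction(5, 7)

# Example 1.4.3: A = two spades in the first five draws,
# B = a spade on the sixth draw.
p_a = Fraction(comb(13, 2) * comb(39, 3), comb(52, 5))
p_b_given_a = Fraction(11, 47)
p_ex_143 = p_a * p_b_given_a

for label, p in [("Example 1.4.1", p_ex_141),
                 ("Example 1.4.2", p_ex_142),
                 ("Example 1.4.3", p_ex_143)]:
    print(f"{label}: {float(p):.4f}")
```

Working with `Fraction` keeps every intermediate value exact, so the four-place figures quoted in the examples are reproduced by the final rounding alone.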
In the case of three events, we have, by using the multiplication rule for two events,
\begin{align*}
P(A \cap B \cap C) &= P[(A \cap B) \cap C] \\
&= P(A \cap B)P(C|A \cap B).
\end{align*}
But $P(A \cap B) = P(A)P(B|A)$. Hence, provided $P(A \cap B) > 0$,
\[
P(A \cap B \cap C) = P(A)P(B|A)P(C|A \cap B).
\]
This procedure can be used to extend the multiplication rule to four or more events. The general formula for $k$ events can be proved by mathematical induction.

Example 1.4.4. Four cards are to be dealt successively, at random and without replacement, from an ordinary deck of playing cards. The probability of receiving a spade, a heart, a diamond, and a club, in that order, is $(\frac{13}{52})(\frac{13}{51})(\frac{13}{50})(\frac{13}{49}) = 0.0044$. This follows from the extension of the multiplication rule.

Consider $k$ mutually exclusive and exhaustive events $A_1, A_2, \ldots, A_k$ such that $P(A_i) > 0$, $i = 1, 2, \ldots, k$; i.e., $A_1, A_2, \ldots, A_k$ form a partition of $\mathcal{C}$. Here the events $A_1, A_2, \ldots, A_k$ do not need to be equally likely. Let $B$ be another event such that $P(B) > 0$. Thus $B$ occurs with one and only one of the events $A_1, A_2, \ldots, A_k$; that is,
\begin{align*}
B &= B \cap (A_1 \cup A_2 \cup \cdots \cup A_k) \\
&= (B \cap A_1) \cup (B \cap A_2) \cup \cdots \cup (B \cap A_k).
\end{align*}
Since $B \cap A_i$, $i = 1, 2, \ldots, k$, are mutually exclusive, we have
\[
P(B) = P(B \cap A_1) + P(B \cap A_2) + \cdots + P(B \cap A_k).
\]
However, $P(B \cap A_i) = P(A_i)P(B|A_i)$, $i = 1, 2, \ldots, k$; so
\begin{align*}
P(B) &= P(A_1)P(B|A_1) + P(A_2)P(B|A_2) + \cdots + P(A_k)P(B|A_k) \\
&= \sum_{i=1}^{k} P(A_i)P(B|A_i). \tag{1.4.2}
\end{align*}
This result is sometimes called the law of total probability and it leads to the following important theorem.

Theorem 1.4.1 (Bayes). Let $A_1, A_2, \ldots, A_k$ be events such that $P(A_i) > 0$, $i = 1, 2, \ldots, k$. Assume further that $A_1, A_2, \ldots, A_k$ form a partition of the sample space $\mathcal{C}$. Let $B$ be any event. Then
\[
P(A_j|B) = \frac{P(A_j)P(B|A_j)}{\sum_{i=1}^{k} P(A_i)P(B|A_i)}. \tag{1.4.3}
\]

Proof: Based on the definition of conditional probability, we have
\[
P(A_j|B) = \frac{P(B \cap A_j)}{P(B)} = \frac{P(A_j)P(B|A_j)}{P(B)}.
\]
The result then follows by the law of total probability, (1.4.2).

This theorem is the well-known Bayes’ Theorem. This permits us to calculate the conditional probability of $A_j$, given $B$, from the probabilities of $A_1, A_2, \ldots, A_k$ and the conditional probabilities of $B$, given $A_i$, $i = 1, 2, \ldots, k$. The next three examples illustrate the usefulness of Bayes’ Theorem to determine probabilities.

Example 1.4.5. Say it is known that bowl $A_1$ contains three red and seven blue chips and bowl $A_2$ contains eight red and two blue chips. All chips are identical in size and shape. A die is cast and bowl $A_1$ is selected if five or six spots show on the side that is up; otherwise, bowl $A_2$ is selected. For this situation, it seems reasonable to assign $P(A_1) = \frac{2}{6}$ and $P(A_2) = \frac{4}{6}$. The selected bowl is handed to another person and one chip is taken at random. Say that this chip is red, an event which we denote by $B$. By considering the contents of the bowls, it is reasonable to assign the conditional probabilities $P(B|A_1) = \frac{3}{10}$ and $P(B|A_2) = \frac{8}{10}$. Thus the conditional probability of bowl $A_1$, given that a red chip is drawn, is
\begin{align*}
P(A_1|B) &= \frac{P(A_1)P(B|A_1)}{P(A_1)P(B|A_1) + P(A_2)P(B|A_2)} \\
&= \frac{(\frac{2}{6})(\frac{3}{10})}{(\frac{2}{6})(\frac{3}{10}) + (\frac{4}{6})(\frac{8}{10})} = \frac{3}{19}.
\end{align*}
In a similar manner, we have $P(A_2|B) = \frac{16}{19}$.

In Example 1.4.5, the pr