Script - An Intuitive Introduction to Probability - Decision-Making in an Uncertain World
Damietta University
Summary
This document provides an introductory explanation of probability theory. It covers different types of probability, including classical, relative frequency, and subjective probability. The document uses examples such as dice rolls, roulette wheels, and stock prices to illustrate these concepts. A subjective understanding of the topic is developed.
1 - PROBABILITY THEORY

INTRODUCTION

WHY STUDY PROBABILITIES?

1. Over the long term, the S&P 500 outperforms corporate bonds, which in turn outperform government bonds. This is due to uncertainty and risk.
2. Oswald Grübel (former CEO of Credit Suisse and UBS) said, referring to banks: “The more risk you are willing to take, the more profitable you will be.”
3. Patrick Leach (author of “Why Can’t You Just Give Me The Number?”), on the other hand:
   1. “Business adds value because of the existence of uncertainty.”
   2. “... all value generated by business executives comes - directly or indirectly - from how they manage uncertainty. Without uncertainty, a share of a company’s stock is effectively a bond, with guaranteed cash flows. Guaranteed bonds don’t need management. But stocks (or rather, companies issuing stock) certainly do.”
4. Managers must make important decisions in uncertain business environments.
5. The most fundamental concept for dealing with risk and uncertainty? Probabilities!

SOME TERMINOLOGY

Random Experiment
A process leading to an uncertain outcome.

State Space
The state space is the collection of all possible outcomes of a random experiment, usually denoted by S.

Basic Outcome
A possible outcome of a random experiment.

Event
An event is a subset of basic outcomes. Any event which consists of a single outcome in the state space is called a simple event.

Probability
A probability is a measure of how likely an event of a random experiment is.
Notation: The probability of an event A is usually denoted by P(A).

EXAMPLES

Before we can talk about probabilities, we need a good understanding of random experiments, state spaces, basic outcomes and events.
Let us look at some simple examples.

EXAMPLE 1: THE SIX-SIDED DIE

BASIC OUTCOME
A basic outcome of rolling a six-sided die is, for example, a ‘6’.

RANDOM EXPERIMENT
Rolling a six-sided die is a random experiment, since the outcome is uncertain. However, once the die is rolled, the outcome can be precisely assessed.

STATE SPACE
The state space S for rolling a six-sided die is equal to {1, 2, 3, 4, 5, 6}.

EVENT
The event A = ‘score is smaller than 4’ is {1, 2, 3}. The event B = ‘score is 8’ is equal to ∅ (the empty set), because this is impossible with a six-sided die.

EXAMPLE 2: ROULETTE WHEEL IN MONTE CARLO

Map of Roulette Wheel in Las Vegas with two ‘Zeros’

         3   6   9  12  15  18  21  24  27  30  33  36
     0   2   5   8  11  14  17  20  23  26  29  32  35
    00   1   4   7  10  13  16  19  22  25  28  31  34

Map of Roulette Wheel in Monte Carlo with one ‘Zero’

         3   6   9  12  15  18  21  24  27  30  33  36
     0   2   5   8  11  14  17  20  23  26  29  32  35
         1   4   7  10  13  16  19  22  25  28  31  34

While a roulette wheel in Las Vegas has two zeros, a roulette wheel in Monte Carlo only has a single zero. Besides this difference, both roulette wheels are the same.

Every outcome on a roulette wheel is equally likely, because the basket at each number has exactly the same size. Therefore it is just as likely that the ball lands on the single zero as on any other number on the roulette wheel.

RANDOM EXPERIMENT
Rolling a ball in a roulette wheel is a random experiment, since the outcome is uncertain. However, once the ball lands in a basket, the outcome can be precisely assessed.

STATE SPACE
The state space S for a roulette wheel in Monte Carlo is equal to {1, 2, ..., 35, 36, 0}.

BASIC OUTCOME
A basic outcome of a game of roulette is a single number, for example ‘18’ or ‘0’. Each basic outcome represents exactly one number, has a color and is even or odd. Descriptions such as ‘black’ or ‘odd and red’ cover several numbers and are therefore events rather than basic outcomes.
EVENT
The event A = ‘the ball lands on a positive number smaller than 4’ is {1, 2, 3}. The event B = ‘the ball lands on red and black’ is ∅ (the empty set), because this is impossible. This can be seen on the map of the roulette wheel above. There are no numbers that are red and black at the same time.

Now we can turn to the definition of a probability. In fact, there are three possible definitions that are applied in practice.

PROBABILITY DEFINITIONS

CLASSICAL PROBABILITY

Classical Probability
P(A) = (Number of outcomes that satisfy the event) / (Total number of outcomes in the state space)
Remark: This method assumes that all outcomes in the state space are equally likely to occur.

EXAMPLES

EXAMPLE 1 CONT’D: THE SIX-SIDED DIE
In this example, we throw a six-sided die. Recall that the state space S is equal to {1, 2, 3, 4, 5, 6}.

QUESTION
What is the probability of getting a ‘6’?

Solution. All outcomes for throwing a six-sided die are equally likely (one can say in this case that the die is ‘fair’). Therefore we can use classical probability to determine this particular probability. In this example, the total number of outcomes in S is equal to six. There is exactly one outcome that satisfies the event of getting a ‘6’, and this is rolling a ‘6’. Hence, the probability of getting a ‘6’ is exactly equal to 1/6 (≈ 0.167).

EXAMPLE 2 CONT’D: ROULETTE WHEEL IN MONTE CARLO

[Figure: map of the roulette wheel in Monte Carlo with a single ‘Zero’, as in Example 2]

Recall that the state space S for a roulette wheel in Monte Carlo is equal to {1, 2, ..., 35, 36, 0}.

QUESTION
What is the probability that the ball lands on a red number?

Solution. In this question, S represents the state space that contains all possible outcomes of this random experiment. Every basket on the roulette wheel has the same size and hence all outcomes on the roulette wheel are equally likely.
Therefore we can use classical probability to assess this particular probability. In this example, S consists of 37 elements in total. There are 18 outcomes that satisfy the event of getting a red number. Hence, the probability that the ball lands on a red number is equal to 18/37 (≈ 0.486).

RELATIVE FREQUENCY PROBABILITY

Relative Frequency Probability
P(A) = (Number of times the event A occurs in repeated trials) / (Total number of trials in a random experiment)
Remark: P(A) converges to the true probability in the limit, i.e. as the number of trials grows.

EXAMPLES

EXAMPLE 3: COMBUNOX
Combunox is a drug that contains a combination of oxycodone and ibuprofen. Oxycodone is an opioid pain medication. Ibuprofen is a non-steroidal anti-inflammatory drug (NSAID). Combunox works by reducing substances in the body that cause pain, fever, and inflammation. During an experiment, it was reported that 49 out of 923 people vomited after taking Combunox.

QUESTION
What is the probability of vomiting after taking Combunox?

Solution. We want to determine how often people vomited after taking Combunox relative to the number of people who took it, and therefore we can use relative frequency probability. The total number of trials in this random experiment is equal to 923. The number of times that people vomited is equal to 49. The probability of vomiting after taking Combunox is therefore equal to 49/923 (≈ 5.3 %).

EXAMPLE 4: GOOGLE STOCK PRICE
The end-of-day stock prices of Google are given in the figure below from 04-01-2014 to 04-01-2015 (which is equivalent to roughly 250 trading days).

QUESTION
Assume it is 04-01-2015. Based on the figure above, what is the probability that there will be a stock price increase on the next trading day? Hint: first determine the state space.

Solution.
In order to determine the state space, we first need to determine all possible prices that the Google stock can attain.
- The minimum price of the stock is zero.
- The maximum price cannot be derived from historical stock price data with certainty.

We can instead define the following three events for the Google stock price change:
1. ‘Up’ (denoted by u), in case the stock price increases.
2. ‘Even’ (denoted by m), in case the stock price remains unchanged.
3. ‘Down’ (denoted by d), in case the stock price decreases.

Therefore the state space S is equal to {u, m, d}. Even though we defined the state space, it is impossible to determine the exact probability for each of the three elements in the state space, let alone for each price that is theoretically possible.

SUBJECTIVE PROBABILITY

Subjective Probability
An individual opinion or belief about the probability of occurrence.

EXAMPLES

EXAMPLE 5: STEVE JOBS
Steve Jobs had to use subjective probability in order to predict the probability of success of the first iPad. There was no data available on which Steve Jobs could rely (except his experience and beliefs) and therefore he could not use relative frequency probability or classical probability. Considering the huge success of the iPad, Steve Jobs assessed this subjective probability quite well.

EXAMPLE 6: WHAT IS YOUR SUBJECTIVE PROBABILITY?
Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations.

QUESTION
Which of the following eight options is most likely for Linda?
(1) Linda is a teacher in elementary school.
(2) Linda works in a bookstore and takes Yoga classes.
(3) Linda is active in the feminist movement.
(4) Linda is a psychiatric social worker.
(5) Linda is a member of the League of Women Voters.
(6) Linda is a bank teller.
(7) Linda is an insurance salesperson.
(8) Linda is a bank teller and active in a feminist movement.

SET THEORY

INTRODUCTION
For simple probability calculations, it is useful to know just a little set theory. Therefore we cover in this section a few basic definitions. This part may be boring, but please stay with us. Below are a few definitions for arbitrary sets A and B.

SET THEORY: DEFINITIONS

Union
The union of two events A and B is the set that contains all the basic outcomes that are in A, in B, or in both sets.
Notation: A ⋃ B.

Intersection
The intersection of two events A and B is the set that contains all the basic outcomes that are both in A and in B.
Notation: A ⋂ B.

Complement
The complement of an event A with respect to an event B refers to all elements that are in B, but not in A. This is usually denoted by B ⋂ A^c or B\A. The complement of A with respect to the whole state space S is written A^c.

Mutually Exclusive
Two events A and B are said to be mutually exclusive if they have no basic outcomes in common. This is usually denoted by B ⋂ A = ∅.

Collectively Exhaustive
Events A1, A2, ..., An are said to be collectively exhaustive events if A1 ⋃ A2 ⋃ ... ⋃ An = S, i.e. the events in union completely cover the sample space S.

SET THEORY: ILLUSTRATIONS WITH VENN DIAGRAMS

[Figures: Venn diagrams in which the shaded area is A ⋃ B, A ⋂ B and A^c respectively, and an illustration of mutually exclusive events A and B inside S with A ⋂ B = ∅]

EXAMPLES

EXAMPLE 1 CONT’D: THE SIX-SIDED DIE
In this example, we throw a six-sided die. Let A be the event that the outcome is either 1, 2 or 3. Let B be the event that the outcome is an even number, so either 2, 4 or 6.

QUESTION
Determine the following sets:
- The union of A and B, A ⋃ B.
- The intersection of A and B, A ⋂ B.
- The complement of A, A^c.
- The complement of B, B^c.

Solution. First of all, recall that the state space S for throwing a six-sided die is equal to {1, 2, 3, 4, 5, 6}. Also, both sets A and B are subsets of S (that means, both A and B are ‘contained’ inside S). In this example, A = {1, 2, 3} and B = {2, 4, 6}.
- A ⋃ B = {1, 2, 3, 4, 6}, as these are the numbers that are in A, in B, or in both.
- A ⋂ B = {2}, as this is the only number that is both in A and in B.
- A^c = {4, 5, 6}, as these are all the numbers in S that are not in A.
- B^c = {1, 3, 5}, as these are all the numbers in S that are not in B.

[Figures: Venn diagrams in which the shaded area is A ⋃ B, A ⋂ B, A^c and B^c respectively]

LAWS OF PROBABILITY

INTRODUCTION
Given a sample space S, the probabilities assigned to events must always satisfy the three laws of probability, which are listed below.

LAWS OF PROBABILITY

Law 1
P(S) = 1, where S is the state space.

Law 2
For any event A, 0 ≤ P(A) ≤ 1. The probability of an event can never be negative or larger than 1.

Law 3
For two disjoint events A and B (disjoint means that A ⋂ B = ∅), P(A ⋃ B) = P(A) + P(B).

TRIVIAL IMPLICATION FOR LAWS OF PROBABILITY THEORY
For two events A and B, it always holds that P(A ⋂ B) ≤ P(A) and also that P(A ⋂ B) ≤ P(B), because A ⋂ B is contained in both A and B. The Venn diagrams below show this trivial implication.

[Figures: Venn diagrams in which the shaded area is A, B and A ⋂ B respectively]

EXAMPLES

EXAMPLE 6 CONT’D: WHAT IS YOUR SUBJECTIVE PROBABILITY?
Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations.

QUESTION
Which of the following eight options is most likely for Linda?
(1) Linda is a teacher in elementary school.
(2) Linda works in a bookstore and takes Yoga classes.
(3) Linda is active in the feminist movement.
(4) Linda is a psychiatric social worker.
(5) Linda is a member of the League of Women Voters.
(6) Linda is a bank teller.
(7) Linda is an insurance salesperson.
(8) Linda is a bank teller and active in a feminist movement.

Solution.
Out of the eight possible events that we have, it is possible to say something about the relative likelihood of the following three events:
(3) Linda is active in the feminist movement.
(6) Linda is a bank teller.
(8) Linda is a bank teller and active in a feminist movement.

First, let A be the event that Linda is a bank teller and B be the event that Linda is active in a feminist movement. By using the trivial implication, we get:
- P(A ⋂ B) ≤ P(A),
- P(A ⋂ B) ≤ P(B).

Therefore the event that Linda is a bank teller and also active in a feminist movement can never be more likely than the event that she is a bank teller, nor more likely than the event that she is active in a feminist movement. Option (8) is at most as likely as option (3) and at most as likely as option (6).

INDEPENDENCE

INTRODUCTION

QUESTION
Assume you throw a fair, six-sided die once. You write down the outcome and then you throw the die again.
- What is the probability of getting a ‘1’ on the first roll?
- What is the probability of getting a ‘2’ on the second roll, given that you got a ‘1’ on the first roll?

Solution.
- The probability of getting a ‘1’ on the first roll is 1/6.
- The probability of getting a ‘2’ on the second roll is completely unaffected by the outcome of the first roll. In other words, the probability of getting a ‘2’ on the second roll is independent of previous outcomes. Therefore the probability of getting a ‘2’ on the second roll is also equal to 1/6.

QUESTION
Assume you again throw a fair six-sided die twice. What is the probability that you first roll a ‘1’ and then a ‘2’?

Solution. This question is slightly different from the previous question. In this question, we need to calculate P(‘1’ ⋂ ‘2’). As determined in the previous question, both individual probabilities are equal to 1/6. We can multiply the individual probabilities, because two subsequent rolls of a die are independent. Therefore we get
P(‘1’ ⋂ ‘2’) = P(‘1’) · P(‘2’) = 1/6 · 1/6 = 1/36 (≈ 0.0278).

INDEPENDENCE
Two events A and B can be independent in the sense that the outcome of A does not influence the outcome of B and vice versa.
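The two-roll calculation above can also be checked by brute force. The following is a minimal sketch, assuming a fair die so that all 36 ordered pairs of rolls are equally likely; the helper name `prob` is an illustrative choice and not part of the original notes. It uses classical probability (favorable outcomes divided by total outcomes) and compares the joint probability with the product of the two individual probabilities.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered outcomes (first roll, second roll) of a fair die.
outcomes = list(product(range(1, 7), repeat=2))


def prob(event):
    """Classical probability: favorable outcomes / total outcomes."""
    favorable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favorable, len(outcomes))


p_first_is_1 = prob(lambda o: o[0] == 1)   # '1' on the first roll
p_second_is_2 = prob(lambda o: o[1] == 2)  # '2' on the second roll
p_both = prob(lambda o: o == (1, 2))       # first a '1', then a '2'

print(p_first_is_1, p_second_is_2, p_both)
print(p_both == p_first_is_1 * p_second_is_2)
```

Exact fractions are used instead of floating-point numbers so that the comparison between the joint probability and the product is not affected by rounding.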
Multiplication Rule
Two events A and B are (statistically) independent if and only if the probability of both A and B occurring is the product of the probabilities of the two events,
P(A ⋂ B) = P(A and B) = P(A) · P(B).

EXAMPLES

EXAMPLE 7: GAME OF CRAPS
Craps is a dice game in which the players make wagers on the outcome of the roll, or a series of rolls, of a pair of dice. Let us assume that there are two dice and that each player can bet on the sum of the two dice. Note that the sum is at least equal to 2 (a ‘1’ and a ‘1’) and at most equal to 12 (a ‘6’ and a ‘6’). The state space S for the sum of the two dice is therefore equal to {2, 3, ..., 11, 12}.

QUESTION
- What is the probability that the sum is equal to the minimum value (i.e. 2)?
- What is the probability that the sum is equal to the maximum value (i.e. 12)?

Solution. The outcome of the first die is completely independent of the outcome of the second die. We know that the individual probability of getting, for example, a ‘1’ or a ‘6’ with either of the two dice is equal to 1/6. Since the outcomes of the two dice are independent, we can use the multiplication rule. We get:
- P(Sum equals 2) = P(‘1’ ⋂ ‘1’) = P(‘1’) · P(‘1’) = 1/6 · 1/6 = 1/36.
- P(Sum equals 12) = P(‘6’ ⋂ ‘6’) = 1/36.

QUESTION
- What is the probability that the sum is equal to the median value (i.e. 7)?
- What is the probability that the sum is equal to the second largest value (i.e. 11)?

Solution.
- The outcome of the first die is independent of the outcome of the second die. The outcomes of the first and the second die, respectively, can be ‘3 and 4’, ‘4 and 3’, ‘2 and 5’, ‘5 and 2’, ‘1 and 6’ or ‘6 and 1’. Hence, there are in total six possible combinations of numbers that satisfy the event ‘sum equal to 7’ and there are 36 possible outcomes in total. Therefore P(Sum equals 7) = 6/36 = 1/6.
- The outcome of the first die is again independent of the outcome of the second die. The outcomes of the first and the second die, respectively, can be ‘6 and 5’ or ‘5 and 6’. Hence, there are in total two possible combinations of numbers that satisfy the event ‘sum equal to 11’ and there are 36 possible outcomes in total. Therefore P(Sum equals 11) = 2/36 = 1/18.

EXAMPLE 7 CONT’D: GAME OF CRAPS
The probability distribution for the sum of two dice is not uniform (i.e. not each sum is equally likely to occur). We can simulate the outcomes of the two dice multiple times and see how often we get each sum.

[Figure: simulation of the sum of two dice, drawn in sets of 50 trials out of 500 trials in total]

LAW OF LARGE NUMBERS
It is clear from the simulation above that the probability that the sum equals 7 is far greater than the probability that the sum equals 2 or 3. If it were possible to repeat the simulation infinitely often, all relative frequencies would converge to the exact probabilities. If we perform a large number of simulations (for example 1,000 or 10,000), we can already expect quite accurate estimates of the true probabilities. This essentially means that in the long run, the relative frequency probability converges to the classical probability. This principle is called the law of large numbers.

EXAMPLE 8: APPLE STOCK
Suppose that during the last 250 trading days, the Apple stock went up (denoted by u) on 150 days and went down (denoted by d) on 100 days. There were no days on which the stock price of Apple remained unchanged and therefore we omit this event in this example.
We assume that the probabilities of a future up movement and a future down movement of the stock price are equal to:
- P(u) = 150/250 = 0.6,
- P(d) = 100/250 = 0.4.

In addition we assume that the performance of the Apple stock price on the current trading day is independent of the performance of the Apple stock price on previous trading days. This means that the probability of an up or down movement from one day to the next is not affected by the number of previous up and down movements, e.g. if the Apple stock went up three days in a row, the probability that it goes up the next day remains unchanged. Do you think this assumption is reasonable?

QUESTION
In this question, we consider a week from Monday to Friday.
- What is the probability that the stock price of Apple will increase on each consecutive day in the week?
- What is the probability that the Apple stock price will decrease on Monday, but subsequently will increase on Tuesday, Wednesday, Thursday and Friday?
- What is the probability that the Apple stock price will decrease on exactly one of the five days (Monday, Tuesday, Wednesday, Thursday or Friday) and will increase on the other four days?

Solution.
- There are five trading days in a week, so we need to calculate the probability that there will be five subsequent up movements of the Apple stock price. It is given that each movement of the stock price is independent of the previous one. Therefore, by the multiplication rule (using independence), we get:
  P(u ⋂ u ⋂ u ⋂ u ⋂ u) = P(u) · P(u) · P(u) · P(u) · P(u) = 0.6 · 0.6 · 0.6 · 0.6 · 0.6 = 0.6^5 ≈ 0.078.
- By similar reasoning as for the previous question, the multiplication rule (using independence) gives:
  P(d ⋂ u ⋂ u ⋂ u ⋂ u) = P(d) · P(u) · P(u) · P(u) · P(u) = 0.4 · 0.6^4 ≈ 0.052.
- The decrease could happen on either Monday, Tuesday, Wednesday, Thursday or Friday. As calculated in the previous question, the probability that the down movement happens on, for example, Monday is equal to 0.052.
This probability does not depend on which day the down movement happens! The five scenarios are mutually exclusive, so by using the multiplication rule for each scenario and summing these up, we get:
P(d ⋂ u ⋂ u ⋂ u ⋂ u) + P(u ⋂ d ⋂ u ⋂ u ⋂ u) + P(u ⋂ u ⋂ d ⋂ u ⋂ u) + P(u ⋂ u ⋂ u ⋂ d ⋂ u) + P(u ⋂ u ⋂ u ⋂ u ⋂ d) ≈ 5 · 0.052 = 0.26.

APPLICATIONS

MONTY HALL PROBLEM
The Monty Hall problem is based on the game show ‘Let’s Make a Deal’ and named after its host Monty Hall. There are three doors. Behind two doors is a goat and behind one door there is a car. The game contestant does not know behind which door the car is and he must find the car in order to win it. The game contestant can choose one door. The host, who knows behind which door the car is, then opens a door that was not chosen by the contestant and that has a goat behind it. The game contestant is then asked if he wants to stay with his initial door or switch to the other door that is still closed.

[Figure: three doors hiding a goat, a goat and a car]

QUESTION
Should the game contestant switch doors or stick to his original choice?

Solution. We only give the idea of the correct solution. The formal solution is beyond the scope of this lecture. Suppose the contestant initially chooses door number 1. We have:
- P(‘car behind door number 1’) = 1/3,
- P(‘car behind door number 2’ or ‘car behind door number 3’) = 2/3.

Since the host then opens one of the two other doors with a goat behind it, switching doors means the host automatically ‘pushes’ the game contestant out of one wrong door. We get:
P(‘winning’ given ‘switching’) = 2/3 and P(‘winning’ given ‘not switching’) = 1/3.

[Figure: simulation of the game with 462 trials, plotting wins against trials; chance of winning without switching = 0.335, chance of winning with switching = 0.665]

THE WORLD’S MOST INTELLIGENT WOMAN
The Monty Hall Problem became famous when it was addressed as a question to Marilyn vos Savant’s “Ask Marilyn” column in Parade magazine in 1990. In 1991, she was listed in the Guinness Book of World Records Hall of Fame for “Highest I.Q.”.
She answered a question similar to the Monty Hall problem and came up with the solution that switching doors is the only rational choice the game contestant can make. Her arguments were heavily criticized by professors and PhDs in mathematics. Some quotes:
- Mary Jane Still, Professor at Palm Beach Junior College: “Our math department had a good, self-righteous laugh at your expense”
- Robert Sachs, Professor of Mathematics at George Mason University: “You blew it!”
- E. Ray Bobo, Professor of Mathematics at Georgetown University: “You are utterly incorrect”

TAKE AWAY

RANDOM EXPERIMENT
A process leading to an uncertain outcome.

STATE SPACE
The state space is the collection of all possible outcomes of a random experiment, denoted by S.

BASIC OUTCOME
A possible outcome of a random experiment.

EVENT
An event is a subset of basic outcomes.

PROBABILITY
A probability is a measure of how likely an event of a random experiment is. For an event A, this is usually denoted by P(A).

LAWS OF PROBABILITIES
Law 1. For every state space S, P(S) = 1.
Law 2. For any event A, 0 ≤ P(A) ≤ 1.
Law 3. For disjoint events A and B, P(A ⋃ B) = P(A) + P(B).

INDEPENDENCE
Two events A and B are (statistically) independent if and only if the probability of both A and B occurring is the product of the probabilities of the two events, P(A ⋂ B) = P(A and B) = P(A) · P(B).

BEST PRACTICES
1. Make sure that your sample space includes all of the possibilities.
2. Check that the probabilities assigned to all of the possible outcomes add up to 1.

PITFALLS
1. Do not multiply probabilities of dependent events.
2. Avoid assigning the same probability to every outcome (unless, of course, you are convinced that all outcomes are equally likely).
3. Do not confuse independent events with disjoint events.
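The Monty Hall result from this chapter can be checked with a relative frequency experiment, in the spirit of the law of large numbers. The following is a minimal sketch, not the simulation used in the original notes: the function names `play_monty_hall` and `win_rate`, the number of trials and the fixed random seed are all illustrative assumptions. It assumes the host always opens a goat door that the contestant did not pick, choosing at random when two such doors are available.

```python
import random


def play_monty_hall(switch, rng):
    """Play one game and return True if the contestant wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)   # door hiding the car
    pick = rng.choice(doors)  # contestant's initial choice
    # The host opens a door that is neither the contestant's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Switch to the only door that is still closed and was not picked.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def win_rate(switch, trials=100_000, seed=0):
    """Relative frequency of wins over many independent games."""
    rng = random.Random(seed)
    wins = sum(play_monty_hall(switch, rng) for _ in range(trials))
    return wins / trials


print("win rate without switching:", win_rate(switch=False))
print("win rate with switching:   ", win_rate(switch=True))
```

With a large number of trials, the two relative frequencies should settle close to 1/3 and 2/3 respectively, which matches the probabilities stated in the solution.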
2 - CONDITIONAL PROBABILITY

INTRODUCTION

CONTINGENCY TABLE

Contingency Table
A contingency table shows counts of cases on one categorical variable contingent on the value of another (for every combination of both variables).

Contingency tables are useful for calculating conditional probabilities, as they contain all the ingredients necessary for the computation.

EXAMPLES

EXAMPLE 1: AMAZON.COM
We want to investigate which host sends more buyers to the internet shopping website Amazon.com. In order to answer this question, we must gather data on two (categorical) variables:
1. The host that identifies the originating site, which is either MSN, RecipeSource, or Yahoo.
2. A binary variable that indicates whether the visit results in a purchase.

The contingency table below shows data for web shoppers at Amazon.com. It contains the following pieces of information:
- The website through which a customer ended up at Amazon.com
- Whether the customer made a purchase or not

                   MSN     Recipe Source     Yahoo     Total
    No Purchase    6,973   4,282             5,848     17,103
    Purchase         285       1               230        516
    Total          7,258   4,283             6,078     17,619

The contingency table above shows counts (number of customers) and not probabilities. We can transform the counts into probabilities by dividing each number by the total (17,619 in this case). The table below shows the probabilities corresponding to the counts in the contingency table above.

                   MSN     Recipe Source     Yahoo     Total
    No Purchase    0.396   0.243             0.332     0.971
    Purchase       0.016   0.000             0.013     0.029
    Total          0.412   0.243             0.345     1.0

PROBABILITY TYPES

MARGINAL, JOINT AND CONDITIONAL PROBABILITIES

Marginal Probability
The marginal probability is the unconditional probability of an event. This probability is not conditioned on any other event.
Notation: The marginal probabilities of two arbitrary events A and B are denoted by P(A) and P(B) respectively.

Joint Probability
The joint probability is the probability of simultaneous events.
This is the probability of the intersection of two or more events.
Notation: The joint probability of two arbitrary events A and B is denoted by P(A ⋂ B).

Conditional Probability
The conditional probability is the probability of an event given that some other event has occurred. The information from the event that has occurred can influence the probability of the original event.
Notation: For two events A and B, the conditional probability of A given B is denoted by P(A | B) and the conditional probability of B given A is denoted by P(B | A).

General Rule
In general, for two arbitrary events A and B, P(A | B) ≠ P(B | A).

The first figure below shows the ‘old’ sample space S for two events A and B. Note that there is an overlap between A and B. The second figure below shows the ‘new’ sample space for two events A and B, because it shows the relation with respect to conditional probabilities. The conditional probability can be calculated by dividing the probability of the dark blue area (P(A ⋂ B)) by the probability of the light blue area (P(B)). Note that the area of A ⋂ B can be at most as large as the area of B (which is the case if B is fully contained in A). Therefore, if we calculate the conditional probability P(A | B) = P(A ⋂ B) / P(B), we immediately have that P(A | B) is smaller than or equal to 1.

[Figures: the ‘old’ sample space S with overlapping events A and B, and the ‘new’ sample space. Relationship: P(A | B) = P(A ⋂ B) / P(B)]

PROOF OF P(A | B) ≠ P(B | A)
Assume that:
- P(A) ≠ P(B), P(A) ≠ 0, P(B) ≠ 0 and P(A ⋂ B) ≠ 0.

We get:
P(A | B) = P(A ⋂ B) / P(B) = P(B ⋂ A) / P(B) ≠ P(B ⋂ A) / P(A) = P(B | A),
where we used P(A ⋂ B) = P(B ⋂ A) in the second step and P(A) ≠ P(B) in the second-to-last step. If P(A ⋂ B) = 0, both conditional probabilities are equal to zero and therefore equal to each other.
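As a numeric illustration of the general rule, the sketch below recomputes both conditional directions from the Amazon.com counts of Example 1. It is only a sketch under stated assumptions: the dictionary layout and the helper names `p_host`, `p_outcome` and `p_joint` are illustrative and not part of the original notes, and the counts are read with the dot as a thousands separator (so 6.973 means 6,973 customers).

```python
from fractions import Fraction

# Counts from the Amazon.com contingency table in Example 1.
counts = {
    "MSN":           {"No Purchase": 6973, "Purchase": 285},
    "Recipe Source": {"No Purchase": 4282, "Purchase": 1},
    "Yahoo":         {"No Purchase": 5848, "Purchase": 230},
}

total = sum(sum(row.values()) for row in counts.values())


def p_host(host):
    """Marginal probability that a visitor arrived from the given host."""
    return Fraction(sum(counts[host].values()), total)


def p_outcome(outcome):
    """Marginal probability of 'Purchase' or 'No Purchase'."""
    return Fraction(sum(row[outcome] for row in counts.values()), total)


def p_joint(host, outcome):
    """Joint probability of a host and an outcome."""
    return Fraction(counts[host][outcome], total)


# Conditional probability = joint probability / probability of the condition.
p_msn_given_purchase = p_joint("MSN", "Purchase") / p_outcome("Purchase")
p_purchase_given_msn = p_joint("MSN", "Purchase") / p_host("MSN")

print("P(MSN | Purchase) =", float(p_msn_given_purchase))
print("P(Purchase | MSN) =", float(p_purchase_given_msn))
```

The first value is roughly 0.552 and the second roughly 0.039. Both are built from the same joint probability P(MSN ⋂ Purchase), but they are divided by different marginal probabilities, which is exactly the step used in the proof.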
EXAMPLES

EXAMPLE 1 CONT’D: AMAZON.COM
Consider the internet shopping example we saw earlier. Recall that the contingency table showed the following pieces of information:
- The website through which a customer ended up at Amazon.com
- Whether the customer made a purchase or not

The probability types that are given in a table corresponding to the contingency table are always
- The marginal probabilities
- The joint probabilities

The conditional probabilities can be calculated based upon these marginal and joint probabilities.

In the table below, the marginal probabilities are visible in the column ‘Total’ and the row ‘Total’. The joint probabilities are visible in the columns ‘MSN’, ‘Recipe Source’ and ‘Yahoo’ and the rows ‘P’ (purchase) and ‘NP’ (no purchase).

             MSN            Recipe Source            Yahoo            Total
    P        P(MSN ⋂ P)     P(Recipe Source ⋂ P)     P(Yahoo ⋂ P)     P(P)
    NP       P(MSN ⋂ NP)    P(Recipe Source ⋂ NP)    P(Yahoo ⋂ NP)    P(NP)
    Total    P(MSN)         P(Recipe Source)         P(Yahoo)         1

QUESTION
Consider again the internet shopping example that we saw earlier. Customers from which website are most likely to make a purchase on Amazon.com? Hint: treat the fact that a customer makes a purchase as given.

Solution. The table below shows again the data (in terms of probabilities) for web shopping at Amazon.com. The table shows the website through which a customer ended up at Amazon.com and whether the customer made a purchase or not.

                   MSN     Recipe Source     Yahoo     Total
    No Purchase    0.396   0.243             0.332     0.971
    Purchase       0.016   0.000             0.013     0.029
    Total          0.412   0.243             0.345     1

In the question it is given that the customer made a purchase. Therefore we need to calculate the following probabilities:
- P(MSN | Purchase)
- P(Recipe Source | Purchase)
- P(Yahoo | Purchase)

We get:
- P(MSN | Purchase) = P(MSN ⋂ Purchase) / P(Purchase) = 0.016 / 0.029 ≈ 0.552.
- P(Recipe Source | Purchase) = P(Recipe Source ⋂ Purchase) / P(Purchase) = 0.000 / 0.029 ≈ 0.
- P(Yahoo | Purchase) = P(Yahoo ⋂ Purchase) / P(Purchase) = 0.013 / 0.029 ≈ 0.448.
Therefore, based on the contingency table above, a customer who makes a purchase at Amazon.com is most likely to have arrived from MSN.

PROBABILITY TREES

PROBABILITY TREE

Probability Tree
A probability tree is a graphical depiction of conditional probabilities. It shows a sequence of events as a path, like branches of a tree.

EXAMPLES

EXAMPLE 2: SUCCESS OF TV ADVERTISING
Assume there are three programs that can be viewed on a Sunday evening. Viewers can either watch ‘60 Minutes’, ‘Desperate Housewives’ or a football match. We want to investigate how successful TV advertising is, given the TV program that is watched. For each program, we collect data on the percentage of viewers that:
- Watch the ads,
- Skip the ads.

MARGINAL PROBABILITIES
It is given that the viewers watch the three respective shows with the following probabilities:
- P(60 Minutes) = 0.15,
- P(Desperate Housewives) = 0.35,
- P(Football match) = 0.5.

The three probabilities above represent marginal probabilities. These probabilities are illustrated in the probability tree below.

[Figure: probability tree with root ‘TV Show’ and branches ‘60 Minutes’ (0.15), ‘Desperate Housewives’ (0.35) and ‘Football match’ (0.5)]

CONDITIONAL PROBABILITIES
There are six different conditional probabilities on whether a viewer watches ads or not, e.g.:
- P(Sees ads | Football match) = 0.5,
- P(Skips ads | Football match) = 0.5,
- ...
- P(Skips ads | 60 Minutes) = 0.1.

These conditional probabilities are illustrated in the probability tree below.

[Figure: probability tree with root ‘TV Show’; each show branches into ‘Sees ads’ and ‘Skips ads’ with the probabilities listed here]

    TV show                 P(show)   P(Sees ads | show)   P(Skips ads | show)
    60 Minutes              0.15      0.9                  0.1
    Desperate Housewives    0.35      0.2                  0.8
    Football match          0.5       0.5                  0.5

WORKING WITH PROBABILITY TREES

Working with Probability Trees
Let A1, A2, ..., An be n mutually exclusive and collectively exhaustive events. In addition, let B1, B2, ..., Bk be k mutually exclusive and collectively exhaustive events. Then P(A1 ⋂ B2) is the joint probability of the events A1 and B2.
Computing the marginal probability of $A_1$ can be done in the following way: $P(A_1) = P(A_1 \cap B_1) + P(A_1 \cap B_2) + \ldots + P(A_1 \cap B_k)$.

EXAMPLES

QUESTION
Using the probability tree above, what is the joint probability of ‘Football match’ and ‘Sees Ads’, i.e. $P(\text{Sees Ads} \cap \text{Football match})$, and what is the marginal probability of ‘Sees Ads’, i.e. $P(\text{Sees Ads})$?

Solution.
_ $P(\text{Sees Ads} \cap \text{Football match})$. We know from conditional probability that $P(\text{Sees Ads} \cap \text{Football match}) = P(\text{Sees Ads} \mid \text{Football match}) \cdot P(\text{Football match})$. It is given that $P(\text{Football match}) = 0.5$ and $P(\text{Sees Ads} \mid \text{Football match}) = 0.5$. Therefore $P(\text{Sees Ads} \cap \text{Football match}) = 0.5 \cdot 0.5 = 0.25$.
_ $P(\text{Sees Ads})$. We know from ‘working with probability trees’ that we can calculate $P(\text{Sees Ads})$ in the following way: $P(\text{Sees Ads}) = P(\text{60 Min} \cap \text{Sees Ads}) + P(\text{D.H.} \cap \text{Sees Ads}) + P(\text{Football match} \cap \text{Sees Ads})$. We get: $P(\text{Sees Ads}) = 0.15 \cdot 0.9 + 0.35 \cdot 0.2 + 0.5 \cdot 0.5 = 0.455$.

QUESTION
Consider the previous question. Fill in the contingency table below.

| | 60 Minutes | Desperate Housewives | Football match | Total |
|---|---|---|---|---|
| Sees Ads | | | | |
| Skips Ads | | | | |
| Total | | | | 1.0 |

Solution. From the previous question, we know that $P(\text{Sees Ads}) = 0.455$ and $P(\text{Sees Ads} \cap \text{Football match}) = 0.25$. Therefore we can calculate $P(\text{Skips Ads})$ and $P(\text{Skips Ads} \cap \text{Football match})$. We get:
_ $P(\text{Skips Ads}) = 1 - 0.455 = 0.545$,
_ $P(\text{Skips Ads} \cap \text{Football match}) = P(\text{Football match}) - P(\text{Football match} \cap \text{Sees Ads}) = 0.5 - 0.25 = 0.25$.
From the probability tree, we know that $P(\text{60 Minutes}) = 0.15$, $P(\text{Desperate Housewives}) = 0.35$ and $P(\text{Football match}) = 0.50$. Similar reasoning as in the previous question for the other joint probabilities yields the contingency table below.
| | 60 Minutes | Desperate Housewives | Football match | Total |
|---|---|---|---|---|
| Sees Ads | 0.135 | 0.07 | 0.25 | 0.455 |
| Skips Ads | 0.015 | 0.28 | 0.25 | 0.545 |
| Total | 0.15 | 0.35 | 0.50 | 1.0 |

EXAMPLE 3: SPAM FILTER
Assume workers of a company want to filter out junk mail from important mail messages. They base their method on past data. For example, 20 % of all emails that were considered junk mail contained the word combination “Nigerian general”. Past data indicates the following probabilities:
_ $P(\text{Nigerian general appears} \mid \text{Junk mail}) = 0.20$,
_ $P(\text{Nigerian general appears} \mid \text{Not junk mail}) = 0.001$,
_ $P(\text{Junk mail}) = 0.50$.

QUESTION
Fill in the contingency table below.

| | Junk mail | Not junk mail | Total |
|---|---|---|---|
| Nigerian general appears | | | |
| Nigerian general does not appear | | | |
| Total | | | 1.0 |

Solution.
_ It is given that $P(\text{Junk mail}) = 0.5$. Therefore $P(\text{Not junk mail}) = 1 - 0.5 = 0.5$.
_ $P(\text{Nigerian general appears} \cap \text{Junk mail}) = P(\text{Nigerian general appears} \mid \text{Junk mail}) \cdot P(\text{Junk mail}) = 0.20 \cdot 0.50 = 0.10$.
_ $P(\text{Nigerian general appears} \cap \text{Not junk mail}) = P(\text{Nigerian general appears} \mid \text{Not junk mail}) \cdot P(\text{Not junk mail}) = 0.001 \cdot 0.50 = 0.0005$.
_ Similar reasoning for $P(\text{Nigerian general does not appear} \cap \text{Junk mail})$ and $P(\text{Nigerian general does not appear} \cap \text{Not junk mail})$ leads to the contingency table below.

| | Junk mail | Not junk mail | Total |
|---|---|---|---|
| Nigerian general appears | 0.1 | 0.0005 | 0.1005 |
| Nigerian general does not appear | 0.4 | 0.4995 | 0.8995 |
| Total | 0.5 | 0.5 | 1.0 |

QUESTION
Using the contingency table above, calculate the probability that an email should be considered junk mail given that the phrase “Nigerian general” appears.

Solution.
$P(\text{Junk mail} \mid \text{Nigerian general appears}) = \frac{P(\text{Junk mail} \cap \text{Nigerian general appears})}{P(\text{Nigerian general appears})} = \frac{0.1}{0.1005} \approx 0.995$.
We can conclude that email messages containing the phrase “Nigerian general” have a high probability (more than 99 %) of being spam. The spam filter should move emails containing this phrase straight to the junk folder.

APPLICATIONS

BIRTHDAY PROBLEM

QUESTION
Assume there are 30 people in one room.
How much would you bet that there is nobody in the room who shares the same birthday, assuming that a year has 365 days?

Solution. Let n be the number of people in the room. We denote by A the event that at least two people share the same birthday. The probability of A is then denoted by $P(A)$. Calculating $P(A)$ for several values of n results in the following table:

| n | 5 | 10 | 20 | 30 | 40 | 50 | 60 |
|---|---|---|---|---|---|---|---|
| P(A) | 0.0271 | 0.1169 | 0.4114 | 0.7063 | 0.8912 | 0.9704 | 0.9941 |

Note that for only 30 people, there is almost a 71 % chance that at least two people share the same birthday. This is surprisingly likely for a group that seems so small compared to the number of days in a year! The graph below shows on the x-axis the number of people in one room and on the y-axis the probability that at least two people share the same birthday.

[Figure: probability of a match (%) against the number of people (0 to 100). For n = 64 people chosen at random, the probability that at least two of them share the same birthday is 99.7190 %.]

CALCULATION
Let us make the following two assumptions:
_ Every day of the year is equally likely to be a birthday
_ There are 365 days in a year
The probability that at least two people share the same birthday (denoted by $P(A)$) is equal to 1 minus the probability that nobody shares the same birthday (denoted by $P(A^c)$). For n people in the room, there are in total $365^n$ possible outcomes, and the event $A^c$ (all birthdays are different) can happen in $365 \cdot 364 \cdot \ldots \cdot (365 - n + 1)$ ways. We get:
$P(A) = 1 - P(A^c) = 1 - \frac{365 \cdot 364 \cdot 363 \cdot \ldots \cdot (365 - n + 1)}{365^n}$.
Putting n = 30 people into the formula above indeed yields a probability of 0.7063 that at least two people share the same birthday.

NAIVE BAYES CLASSIFICATION
“Naive Bayes is one of the most efficient and effective inductive learning algorithms for machine learning and data mining.
Its competitive performance in classification is surprising, because the conditional independence assumption on which it is based, is rarely true in real-world applications.” (Zhang, 2004)
_ Naive Bayes is useful for classifying emails as spam or legitimate.
_ Naive Bayes is useful for machine learning (algorithms that can learn from data).
_ Naive Bayes is useful for clustering techniques, where objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters).

BAYES’ RULE

Naive Bayes Classification
Naive Bayes classification depends on Bayes’ Rule. For two events A and B, Bayes’ rule can be expressed by the following relationship:
$P(B \mid A) = \frac{P(B \cap A)}{P(A)} = \frac{P(A \cap B)}{P(A)} = \frac{P(A \mid B) \cdot P(B)}{P(A)}$, provided that $P(A) \neq 0$.
The second equality uses $P(B \cap A) = P(A \cap B)$. Here, $P(B)$ is called the a priori probability of B and $P(B \mid A)$ is called the a posteriori probability of B. Note that we expressed the probability of B given A in terms of A given B.

Conditional Independence Assumption
Naive Bayes Classification relies heavily on two assumptions:
1. All events are equally important (they all have equal weights).
2. All events are mutually independent.

EXAMPLE 4: A NEW DAY
Let us assume the following weather data, which counts how often play took place (‘yes’) or did not take place (‘no’) under each weather condition:

| Outlook | yes | no | Temperature | yes | no | Humidity | yes | no | Windy | yes | no | Play yes | Play no |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Sunny | 2 | 3 | Hot | 2 | 2 | High | 3 | 4 | False | 6 | 2 | 9 | 5 |
| Overcast | 4 | 0 | Mild | 4 | 2 | Normal | 6 | 1 | True | 3 | 3 | | |
| Rainy | 3 | 2 | Cold | 3 | 1 | | | | | | | | |

Converting the counts into probabilities yields the table below. Note that the table above is not a contingency table!
| Outlook | yes | no | Temperature | yes | no | Humidity | yes | no | Windy | yes | no | Play yes | Play no |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Sunny | 2/9 | 3/5 | Hot | 2/9 | 2/5 | High | 3/9 | 4/5 | False | 6/9 | 2/5 | 9/14 | 5/14 |
| Overcast | 4/9 | 0/5 | Mild | 4/9 | 2/5 | Normal | 6/9 | 1/5 | True | 3/9 | 3/5 | | |
| Rainy | 3/9 | 2/5 | Cold | 3/9 | 1/5 | | | | | | | | |

QUESTION
What is the probability of play being ‘yes’ and of play being ‘no’ on a day when the outlook is sunny, the temperature is cold, the humidity is high and it is windy?

Solution. For a sunny outlook, a cold temperature, high humidity, windy weather and play equal to yes, we get:
_ $\frac{2}{9} \cdot \frac{3}{9} \cdot \frac{3}{9} \cdot \frac{3}{9} \cdot \frac{9}{14} \approx 0.0053$.
For the same weather conditions and play equal to no, we get:
_ $\frac{3}{5} \cdot \frac{1}{5} \cdot \frac{4}{5} \cdot \frac{3}{5} \cdot \frac{5}{14} \approx 0.0206$.
Transforming these values into probabilities by normalizing them, we get
_ $P(\text{'yes'}) = \frac{0.0053}{0.0053 + 0.0206} = 0.205 = 20.5\,\%$,
_ $P(\text{'no'}) = \frac{0.0206}{0.0053 + 0.0206} = 0.795 = 79.5\,\%$.
We can immediately see that these probabilities add up to 1.

TAKE AWAY

ORDER OF UNIONS AND INTERSECTIONS
For events A and B,
1. $P(A \cap B) = P(B \cap A)$.
2. $P(A \cup B) = P(B \cup A)$.

CONTINGENCY TABLE
A contingency table shows counts of cases on one categorical variable contingent on the value of another (for every combination of both variables).

GENERAL RULE
In general, for two events A and B, $P(A \mid B) \neq P(B \mid A)$.

MARGINAL PROBABILITY
The unconditional probability of an event.

JOINT PROBABILITY
The probability of simultaneous events.

CONDITIONAL PROBABILITY
The probability of an event given some other event. For two events A and B, it holds that:
$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, where $P(B) \neq 0$.
$P(B \mid A) = \frac{P(B \cap A)}{P(A)}$, where $P(A) \neq 0$.

INDEPENDENCE
Two events A and B are independent if $P(A \cap B) = P(B \cap A) = P(A) \cdot P(B)$.
For independent events A and B, combining the rules for conditional probability and independence yields:
$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A) \cdot P(B)}{P(B)} = P(A)$.
$P(B \mid A) = \frac{P(B \cap A)}{P(A)} = \frac{P(A) \cdot P(B)}{P(A)} = P(B)$.
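The rules for conditional probability and independence can be checked numerically. The sketch below is a minimal illustration, assuming the joint probabilities from the spam-filter contingency table of Example 3. The dictionary layout and the helper names `marginal`, `conditional` and `is_independent` are our own choices for illustration and are not part of the course material.

```python
from math import isclose

# Joint probabilities P(phrase status ∩ mail type), taken from the
# spam-filter contingency table of Example 3.
joint = {
    ("appears", "junk"): 0.1,
    ("appears", "not junk"): 0.0005,
    ("does not appear", "junk"): 0.4,
    ("does not appear", "not junk"): 0.4995,
}


def marginal(table, index, value):
    """Marginal probability: sum all joint cells whose key[index] equals value."""
    return sum(p for key, p in table.items() if key[index] == value)


def conditional(table, phrase, mail):
    """P(mail | phrase) = P(phrase ∩ mail) / P(phrase)."""
    p_phrase = marginal(table, 0, phrase)
    if p_phrase == 0:
        raise ValueError("conditioning event has probability zero")
    return table[(phrase, mail)] / p_phrase


def is_independent(table, phrase, mail):
    """Check whether P(phrase ∩ mail) equals P(phrase) * P(mail)."""
    product = marginal(table, 0, phrase) * marginal(table, 1, mail)
    return isclose(table[(phrase, mail)], product)


p_junk_given_phrase = conditional(joint, "appears", "junk")
print(round(p_junk_given_phrase, 3))             # 0.995
print(is_independent(joint, "appears", "junk"))  # False
```

The two events are clearly dependent here: the joint probability is 0.1, while the product of the marginals is only 0.1005 · 0.5 = 0.05025. This is why the conditional probability (about 0.995) differs so much from the marginal probability of junk mail (0.5).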
PROBABILITY TREE
A probability tree is a graphical depiction of conditional probabilities.

BAYES’ RULE
$P(B \mid A) = \frac{P(A \mid B) \cdot P(B)}{P(A)}$, provided that $P(A) \neq 0$.

BEST PRACTICES
1. Presume events are dependent and use the General Multiplication Rule.
2. Use tables to organize probabilities.
3. Check that you have included all events.

PITFALLS
1. Do not confuse $P(A \mid B)$ with $P(B \mid A)$.
2. Do not confuse counts with probabilities in contingency tables.
3. Understand that conditional probabilities are not shown in contingency tables, but can be calculated directly from them.

4 - RANDOM VARIABLES

INTRODUCTION

RANDOM VARIABLES

Random Variable
A random variable is a variable whose value is subject to variations due to chance.
Notation: A random variable is usually denoted with a capital letter, for example X or Y. The individual values are denoted by x and y, respectively.

Probability
The likelihood that a random variable X is equal to an individual value x.
Notation: $P(X = x) = P(x) = p(x)$.

DISCRETE DISTRIBUTIONS

DISCRETE RANDOM VARIABLES
There are two types of random variables, discrete random variables and continuous random variables. In this section, we discuss discrete random variables.

Discrete Random Variable
A random variable X is said to be discrete if X can take on only a finite number of values (or at most a countably infinite number).

Probability Mass Function
A probability mass function (abbreviated PMF) completely describes the probability properties of the random variable. It shows the probability that a random variable X is exactly equal to some deterministic value x.
Cumulative Distribution Function
The cumulative distribution function (abbreviated CDF) shows the probability that a random variable X takes a value less than or equal to a deterministic value x.
Notation: $F(x) = P(X \le x)$, where $-\infty < x < \infty$.

Discrete Uniform Distribution
The discrete uniform distribution is a symmetric probability distribution whereby a finite number of values are equally likely to be observed; every one of n values has equal probability $\frac{1}{n}$.

EXAMPLES

EXAMPLE 1: FAIR DIE - THE DISCRETE UNIFORM DISTRIBUTION
The number of possible outcomes for throwing a fair die is finite. Let X be the random variable representing the possible outcomes of throwing a fair die.
_ Reason that X is a discrete random variable.
_ Reason that X is uniformly distributed (the discrete case).

Solution.
_ The number of possible outcomes for X is clearly finite. Therefore X is a discrete random variable.
_ All outcomes are equally likely. Therefore X is uniformly distributed. Since X is a discrete random variable, X follows the discrete uniform distribution.

PMF AND CDF
In the figures below, the probability mass function and the cumulative distribution function of X are shown.

EXAMPLE 2: SUM OF TWO DICE
In this example, two fair dice are thrown simultaneously. The sum is taken of the outcomes of each individual die. Not all sums are equally likely. Consider the case where the sum is equal to 7 and where the sum is equal to 2.
From the lecture on independence, recall that there are more combinations possible where the sum is equal to 7 than where the sum is equal to 2 and therefore the former is more likely to occur than the latter.

QUESTION
Let us denote by X the sum of the outcomes of the two dice. The state space S in this example is equal to $\{2, 3, \ldots, 11, 12\}$.
_ Reason that X is a discrete random variable.
_ Reason that X is not uniformly distributed.

Solution.
_ The number of possible outcomes for X is clearly finite. Therefore X is a discrete random variable.
_ Not all possible outcomes for the sum of two fair dice are equally likely. Therefore X is not uniformly distributed.

PMF AND CDF
In the figures below, the probability mass function and the cumulative distribution function of X are shown.

EXAMPLE 3: NUMBER OF DEFECTIVE PIECES
Assume there are 10 lamps in one box and each lamp is either new and working (with probability 0.7) or old and defective (with probability 0.3). Let X be the number of defective lamps in the box.

QUESTION
_ Reason that X is a discrete random variable.
_ Reason that X is not uniformly distributed.

Solution.
_ The number of possible outcomes for X is clearly finite. Therefore X is a discrete random variable.
_ Not all possible outcomes are equally likely. This can be seen for example in the probability mass function of X.

PMF AND CDF
In the figures below, the probability mass function and the cumulative distribution function of X are shown. In this particular case, X is said to be binomially distributed. The binomial distribution will be treated in further detail in lecture 5 of this course.

EXAMPLE 4: WAITING TIMES AT CALL CENTERS
The priority of every call center should be handling calls from customers efficiently and minimizing waiting times. This is not a straightforward task, as call arrivals and service times of calls are random. The difficult part is to staff each call center with a sufficient number of people.
Too many people are costly and inefficient, but useful at times when many calls suddenly arrive. Too few people are cost efficient, but lead to long waiting times when many calls arrive at the same time.

CALL ARRIVALS
A call center can be naturally viewed as a queueing system. Here we look at the likelihood of a certain number of calls arriving per minute. We denote by X the number of calls arriving per minute. Note that X is a discrete random variable. We omit technical details here, but this type of process can be modelled with the Poisson distribution.

PMF AND CDF
In the figures below, the probability mass function and the cumulative distribution function of X are shown.

CONTINUOUS DISTRIBUTIONS

CONTINUOUS RANDOM VARIABLES

Continuous Random Variable
A random variable X is said to be continuous if the number of possible outcomes for X is (uncountably) infinite.

Probability Density Function
For a continuous random variable, the probability density function, usually denoted by $f(x)$, has the following properties:
1. $f(x) \ge 0$
2. $\int_{-\infty}^{\infty} f(x)\,dx = 1$, where $\int$ refers to the integral.
If X is a continuous random variable with a density function f, then for any $a < b$, the probability that X falls in the interval $(a, b)$ is the area under the density function between a and b:
$P(a < X < b) = \int_a^b f(x)\,dx$.

Cumulative Distribution Function
The cumulative distribution function (abbreviated CDF) shows the probability that a random variable X takes a value less than or equal to a deterministic value x. It shows the area under the probability density function from $-\infty$ to x.
Notation: $F(x) = P(X \le x)$, where $-\infty < x < \infty$.
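The link between the density and the CDF can be illustrated numerically. The sketch below is a minimal example, assuming a standard normal random variable (our choice for illustration only) and Python's standard-library `statistics.NormalDist`. The helper `area_under_pdf` is a hypothetical name; it approximates the integral of the density with the trapezoidal rule and compares the result with $F(b) - F(a)$.

```python
from statistics import NormalDist

# Assumption for illustration: X follows a standard normal distribution.
X = NormalDist(mu=0.0, sigma=1.0)


def area_under_pdf(dist, a, b, steps=10_000):
    """Approximate the integral of dist.pdf over [a, b] with the trapezoidal rule."""
    width = (b - a) / steps
    total = 0.5 * (dist.pdf(a) + dist.pdf(b))
    for i in range(1, steps):
        total += dist.pdf(a + i * width)
    return total * width


a, b = -1.0, 1.0
p_area = area_under_pdf(X, a, b)   # area under the density between a and b
p_cdf = X.cdf(b) - X.cdf(a)        # same probability via the CDF

print(round(p_area, 4), round(p_cdf, 4))  # 0.6827 0.6827
```

Both routes give the same probability, which is the point of the definition: the CDF is the accumulated area under the density, so $P(a < X < b) = F(b) - F(a)$.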
EXAMPLES

Returns
Returns show the relative increase or decrease of the price of a stock at time t in comparison to the price at time t - 1. The return is calculated based on the price at time t, $P_t$, and the price at time t - 1, $P_{t-1}$. A time step can be a second, a day, a month, etc.
Notation: $R_t = \frac{P_t - P_{t-1}}{P_{t-1}}$.
Remark: For finance experts, we consider returns without dividends.

Histogram
A histogram is a graphical representation of the distribution of numerical data. It is an estimate of the probability distribution of a continuous random variable. Here, it shows the counts of the number of times the stock price had a certain increase or decrease.

RETURNS
Assume that the price of an Apple stock on 01-01-2015 is $100. We want to make a prediction for the Apple stock price one year from now by predicting the daily returns. We make the following two assumptions:
_ The returns are normally distributed
_ There are 250 trading days in one year
The 250-day forecast for the returns is visible in the figure below. We can clearly see that the returns ‘hover’ around zero. This makes sense intuitively, because (a series of) negative returns are usually followed by (a series of) positive returns.

HISTOGRAM AND CDF
We can visualize the count of daily stock return increases and decreases in a histogram. The histogram is shown in the top figure below together with a fitted normal distribution. The CDF for the daily return forecasts is shown in the bottom figure below.

SUMMARY MEASURES

INTRODUCTION
Below the probability density function of the standard normal distribution is shown (i.e. the normal distribution with mean 0 and variance 1. This will be explained in chapter 6 of this lecture).
Showing the PDF can be informative (as in the standard normal case), but usually probability distributions are difficult to grasp. In order to communicate the largest amount of information as simply as possible, we use summary measures.

MEAN AND EXPECTED VALUE

Measure of Location
The arithmetic mean is normally used as a measure of location/central tendency.
Notation: the arithmetic mean is usually denoted by μ.

Expected Value
The expected value $E[X]$ of a discrete random variable X is the probability-weighted sum of all possible values.
Notation: $E[X] = x_1 p(x_1) + x_2 p(x_2) + \ldots + x_k p(x_k)$, for k possible values of X.

EXAMPLES

EXAMPLE 7: EXPECTED VALUE OF A FAIR DIE
Recall that the state space S of the fair die is equal to $\{1, 2, 3, 4, 5, 6\}$ and that every possible outcome has an equal probability of $\frac{1}{6}$. We denote by X the random variable that represents the outcome of rolling the fair die. The table below shows $P(X = x)$ for each possible outcome.

| x | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| P(X=x) | 1/6 | 1/6 | 1/6 | 1/6 | 1/6 | 1/6 |

QUESTION
What is the expected value of the fair die?

Solution. Every outcome is equally likely and therefore we assign equal weight to each outcome. We simply multiply each outcome by its individual probability. We get:
$E[X] = 1 \cdot \frac{1}{6} + 2 \cdot \frac{1}{6} + 3 \cdot \frac{1}{6} + 4 \cdot \frac{1}{6} + 5 \cdot \frac{1}{6} + 6 \cdot \frac{1}{6} = 3.5$.
Interpretation: The average of an extremely large number of dice rolls will very likely be almost equal to 3.5. Note that the expected value of rolling the die is not an element of the state space.

EXAMPLE 8: EXPECTED VALUE OF A LOADED DIE
A loaded die has instead of the ‘1’ an extra ‘4’.
Therefore a ‘4’ is twice as likely as each other individual outcome. We denote by X the random variable that represents the outcome of rolling the loaded die. The table below shows $P(X = x)$ for each possible outcome.

| x | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|
| P(X=x) | 1/6 | 1/6 | 1/3 | 1/6 | 1/6 |

QUESTION
What is the expected value of the loaded die?

Solution. Multiplying each outcome by its individual probability yields the expected value of X. We get:
$E[X] = 2 \cdot \frac{1}{6} + 3 \cdot \frac{1}{6} + 4 \cdot \frac{2}{6} + 5 \cdot \frac{1}{6} + 6 \cdot \frac{1}{6} = 4$.
Interpretation: The average of an extremely large number of dice rolls with the loaded die will very likely be (almost) equal to 4.

RANDOM VARIABLES AS MODELS

INTRODUCTION
A random variable is a type of statistical model. In business applications, a statistical model usually represents a simplified or idealized view of reality.

EXAMPLES

EXAMPLE 9: VANGUARD 500 INDEX FUND
You are considering an investment in the oldest S&P 500 index fund. The “Vanguard 500 Index Fund” (ticker symbol: VFINX) was the first index fund for individual investors. This mutual fund invests in 500 of the largest U.S. companies, which span many different industries and account for about 75 % of the U.S. stock market’s value.
In the top figure below, the daily price development is visible from 01-01-1980 to 01-01-2015. The time period 01-01-1977 to 12-31-1980 is not visible in the figure below. We are interested in the quarterly returns, which we can visualize in a histogram. The histogram of the quarterly returns from 01-01-1977 to 01-01-2015 is displayed in the bottom figure below.

QUESTION
Denote the first quarterly return in 1977 by $R_1$ and the second quarterly return by $R_2$. Use similar notation for all subsequent returns up to the fourth quarterly return in 2014, $R_{152}$. Express the historical average of the returns in terms of $R_1, \ldots, R_{152}$.

Solution.
Using the notation introduced in the question, we can express the average quarterly return in terms of $R_1, \ldots, R_{152}$ in the following way: $\frac{R_1 + R_2 + \ldots + R_{152}}{152}$. Calculating the average quarterly return for our data sample yields $\frac{R_1 + R_2 + \ldots + R_{152}}{152} = 2.97\,\% \approx 3\,\%$.
Based on a statistical model that assumes that all past outcomes are equally likely to occur, the forecast for the first quarter in 2015 would be 3 %.

EXAMPLE 9 CONT’D: VANGUARD 500 INDEX FUND
Based on the histogram of the quarterly return data of VFINX, we can construct the probability distribution by deriving a table for the returns with their corresponding probabilities. This table is shown below.

| Return | -14.7491 | -7.6566 | -2.14 | 2.3877 | 7.1442 | 13.8112 |
|---|---|---|---|---|---|---|
| p | 0.0723 | 0.0592 | 0.171 | 0.2565 | 0.2763 | 0.1644 |

PMF AND CDF
In the figure below, the probability mass function and the cumulative distribution function of X are visible.

EXAMPLE 9 CONT’D: VANGUARD 500 INDEX FUND
Now, the economic forecast for the next quarter is very good. Therefore, we build a model by subjectively changing the probabilities (and rounding the conditional expectations). There are six scenarios with six probabilities. These probabilities with their corresponding returns are displayed in the table below.

| Return | -15.0 | -7.5 | -2.5 | 2.5 | 8.0 | 14.0 |
|---|---|---|---|---|---|---|
| p | 0.02 | 0.05 | 0.13 | 0.30 | 0.30 | 0.20 |

Our new expected value of the next quarterly return of VFINX is equal to 4.95 %, based on the model where we subjectively changed the probabilities.

PMF AND CDF
In the figure below, the probability mass function and the cumulative distribution function of X are visible.

SHORTCOMINGS OF EXPECTED VALUE AS A MEASURE
We have a new expected value of 4.95 % in our new model. This single number does not indicate anything about the risk of investing in VFINX.
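The expected value of 4.95 % can be reproduced directly from the definition of the expected value as a probability-weighted sum. The sketch below is a minimal illustration using the six scenarios of the subjective VFINX model; the function name `expected_value` and the input checks are our own additions for illustration.

```python
from math import isclose

# Scenario returns (in %) and subjective probabilities of the VFINX model.
returns = [-15.0, -7.5, -2.5, 2.5, 8.0, 14.0]
probs = [0.02, 0.05, 0.13, 0.30, 0.30, 0.20]


def expected_value(values, probabilities):
    """Probability-weighted sum of all possible values of a discrete random variable."""
    if len(values) != len(probabilities):
        raise ValueError("values and probabilities must have the same length")
    if not isclose(sum(probabilities), 1.0):
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in zip(values, probabilities))


ev = expected_value(returns, probs)
print(round(ev, 2))  # 4.95
```

The check that the probabilities sum to 1 is a small safeguard: a table with rounded probabilities, such as the historical one above, would need to be renormalized or checked with a looser tolerance before being used this way.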
VARIATION

INTRODUCTION
We need a measure which captures the ‘deviation’ from the mean, i.e. the variation of VFINX. If all historical data points are very close to the mean, then the realized return of VFINX will likely not be far away from the expected value and therefore there is not much risk. If, on the other hand, many historical data points are ‘far away’ from the mean, one could argue that it is more likely that VFINX either greatly outperforms the expectation or that a huge negative return is realized. This could indicate that investing in VFINX is very risky. How could we measure this variation?

SIMPLE DEVIATION

Simple Deviation
The simple deviation for a set of data values $x_i$, $i = 1, \ldots, n$, is equal to the sum of the differences between the data values and the mean μ.
Notation: $\sum_{i=1}^{n} (x_i - \mu)$.

EXAMPLES

EXAMPLE 10: SIMPLE DEVIATION OF A FAIR DIE
Consider once more the fair die with state space $S = \{1, 2, 3, 4, 5, 6\}$.

QUESTION
Calculate the simple deviation for the fair die.

Solution. The simple deviation in this example with n = 6 and μ = 3.5 is equal to $\sum_{i=1}^{6} (x_i - 3.5)$. The results for $x_i$ with $i = 1, \ldots, 6$ are visible in the table below.

| x_i | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| x_i - μ | -2.5 | -1.5 | -0.5 | 0.5 | 1.5 | 2.5 |

Taking the sum, we get $\sum_{i=1}^{6} (x_i - 3.5) = -2.5 - 1.5 - 0.5 + 0.5 + 1.5 + 2.5 = 0$. Therefore the simple deviation is equal to zero in this example.

SIMPLE DEVIATION AS A MEASURE OF VARIATION
In example 10, the simple deviation was equal to zero. In fact, the simple deviation is always equal to zero, for any random variable X, as the proof below shows.

PROOF THAT SIMPLE DEVIATION IS EQUAL TO ZERO
Let us assume that there are n observations for a random variable X, with mean $\mu = \frac{1}{n} \sum_{i=1}^{n} x_i$. Using the definition of simple deviation, we get:
$\sum_{i=1}^{n} (x_i - \mu) = \sum_{i=1}^{n} x_i - n \cdot \mu = \sum_{i=1}^{n} x_i - n \cdot \frac{1}{n} \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} x_i - \sum_{i=1}^{n} x_i = 0$.
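The result of the proof can also be checked numerically. The sketch below is a minimal illustration; the function name `simple_deviation` and the second data set are arbitrary choices of ours, used only to show that the cancellation is not specific to the fair die.

```python
def simple_deviation(values):
    """Sum of the differences between each data value and the arithmetic mean."""
    mu = sum(values) / len(values)
    return sum(x - mu for x in values)


die = [1, 2, 3, 4, 5, 6]          # state space of the fair die, mean 3.5
other = [2.0, 7.5, -3.0, 10.0]    # arbitrary illustrative data, mean 4.125

print(simple_deviation(die))                 # 0.0
print(abs(simple_deviation(other)) < 1e-12)  # True
```

Because positive and negative deviations always cancel, the simple deviation carries no information about the spread of the data, which motivates the measures of variation discussed next.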
ABSOLUTE VALUE FUNCTION
The kink at the origin of the absolute value function is undesirable, because it means that the function $|x|$ is not differentiable at $x = 0$.

VARIANCE
The next easiest measure of variation is to take the squares of the differences between the data values and the mean and to sum those values up. This leads to a measure of variation called variance.

Variance: General Case
The variance of a random variable X is the expected value of the squared deviations from μ.
Notation: $\sigma^2 = Var[X] = E[(X - E[X])^2] = E[X^2] - (E[X])^2$, where $E[X]$ is the expected value of X and $E[X^2]$ is the second moment of X.

Variance: Discrete Case
Only the notation changes for the variance.
Notation: $Var[X] = \sum_{i=1}^{n} (x_i - E[X])^2 \, p(x_i)$, where $p(x_i)$ is the probability of X being equal to $x_i$. When all n values are equally likely, the expression simplifies to $Var[X] = \frac{1}{n} \sum_{i=1}^{n} (x_i - E[X])^2$.
Remark: the units of the variance are the squared units of the random variable. Therefore it is convenient to take the square root of the variance. This leads us to the standard deviation.

Standard Deviation
The standard deviation of a random variable X is the square root of the variance of X.
Notation: $SD[X] = \sigma = \sqrt{Var[X]}$.

REMARK ON SQUARE ROOTS OF SQUARED UNITS
In general, the square root cancels out the square of a squared number. For example, $\sqrt{9} = \sqrt{3^2} = 3$. However, the square root of the variance does not cancel out the squares inside the sum. Illustration: $\sqrt{3^2 + 4^2} = \sqrt{5^2} = 5 \neq 7 = 3 + 4$.

$\sqrt{VAR[X]} \neq SDEV[X]$
To show that the square root of the variance is not equal to the simple deviation of X:
$\sqrt{Var[X]} = \sqrt{\sum_{i=1}^{n} (x_i - \mu)^2 \cdot p(x_i)} = \sqrt{\sum_{i=1}^{n} (x_i^2 + \mu^2 - 2 \cdot x_i \mu) \cdot p(x_i)} \neq \sum_{i=1}^{n} (x_i - \mu) \cdot p(x_i) = SDev[X]$

EXAMPLES

EXAMPLE 7 CONT’D: VARIANCE OF A FAIR DIE
Consider once more the fair die with state space $S = \{1, 2, 3, 4, 5, 6\}$.

QUESTION
Calculate the variance and standard deviation for the fair die.

Solution. Recall from the definition that for a discrete random variable X with n = 6,
$Var[X] = \sum_{i=1}^{6} (x_i - \mu)^2 p(x_i) = (x_1 - \mu)^2 p(x_1) + (x_2 - \mu)^2 p(x_2) + (x_3 - \mu)^2 p(x_3) + (x_4 - \mu)^2 p(x_4) + (x_5 - \mu)^2 p(x_5) + (x_6 - \mu)^2 p(x_6)$.
With mean μ = 3.5, we get:

| x_i | p(x_i) | x_i - μ | (x_i - μ)^2 | p(x_i)·(x_i - μ)^2 |
|---|---|---|---|---|
| 1 | 1/6 | -2.5 | 6.25 | 1.04167 |
| 2 | 1/6 | -1.5 | 2.25 | 0.375 |
| 3 | 1/6 | -0.5 | 0.25 | 0.04167 |
| 4 | 1/6 | 0.5 | 0.25 | 0.04167 |
| 5 | 1/6 | 1.5 | 2.25 | 0.375 |
| 6 | 1/6 | 2.5 | 6.25 | 1.04167 |

$Var[X] = 1.04167 + 0.375 + 0.04167 + 0.04167 + 0.375 + 1.04167 \approx 2.917$.
$SD[X] = \sqrt{Var[X]} = \sqrt{2.917} \approx 1.71$.
Hence, the variance is equal to approximately 2.917 and the standard deviation is equal to approximately 1.71.

EXAMPLE 8 CONT’D: VARIANCE OF A LOADED DIE
Consider once more the example of the loaded die, which has instead of the ‘1’ an extra ‘4’. Recall that it is therefore twice as likely that a ‘4’ comes up. We denote by X the random variable that represents the outcome of rolling the loaded die.

QUESTION
Calculate the variance and standard deviation for the loaded die. The probability of each outcome of the loaded die is given in the table below.

| x | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|
| P(X=x) | 1/6 | 1/6 | 1/3 | 1/6 | 1/6 |

Solution. Recall from the definition that for a discrete random variable X with n = 5 we have
$Var[X] = \sum_{i=1}^{5} (x_i - \mu)^2 p(x_i) = (x_1 - \mu)^2 p(x_1) + (x_2 - \mu)^2 p(x_2) + (x_3 - \mu)^2 p(x_3) + (x_4 - \mu)^2 p(x_4) + (x_5 - \mu)^2 p(x_5)$.
The mean μ is equal to 4. We get:

| x_i | p(x_i) | x_i - E[X] | (x_i - E[X])^2 | p(x_i)·(x_i - E[X])^2 |
|---|---|---|---|---|
| 2 | 1/6 | -2 | 4 | 2/3 |
| 3 | 1/6 | -1 | 1 | 1/6 |
| 4 | 2/6 | 0 | 0 | 0 |
| 5 | 1/6 | 1 | 1 | 1/6 |
| 6 | 1/6 | 2 | 4 | 2/3 |

$Var[X] = \sum_{i=1}^{5} (x_i - \mu)^2 p(x_i) = \frac{5}{3}$.
$SD[X] = \sqrt{Var[X]} = \sqrt{\frac{5}{3}} \approx 1.291$.
Hence, the variance is equal to approximately 1.667 and the standard deviation is equal to approximately 1.291.

EXAMPLE 9 CONT’D: VFINX
Recall the model that we used to predict the next quarterly return of VFINX. The model consists of six different returns with six subjectively assigned probabilities, which are displayed in the table below. Calculate the variance of this statistical model.

| p | 0.02 | 0.05 | 0.13 | 0.3 | 0.3 | 0.2 |
|---|---|---|---|---|---|---|
| Return | -15 | -7.5 | -2.5 | 2.5 | 8 | 14 |

Solution. For a discrete random variable X, the variance is given by:
$Var[X] = (x_1 - \mu)^2 p(x_1) + (x_2 - \mu)^2 p(x_2) + \ldots + (x_n - \mu)^2 p(x_n)$.
In our case, n = 6, so we get:
$Var[X] = (x_1 - \mu)^2 p(x_1) + (x_2 - \mu)^2 p(x_2) + (x_3 - \mu)^2 p(x_3) + \ldots + (x_6 - \mu)^2 p(x_6)$.
With μ = 4.95 %, this leads to the table below.

| x_i | p(x_i) | x_i - μ | (x_i - μ)^2 | p(x_i)·(x_i - μ)^2 |
|---|---|---|---|---|
| -15 | 0.02 | -19.95 | 398.0025 | 7.9600 |
| -7.5 | 0.05 | -12.45 | 155.0025 | 7.7501 |
| -2.5 | 0.13 | -7.45 | 55.5025 | 7.2153 |
| 2.5 | 0.3 | -2.45 | 6.0025 | 1.8008 |
| 8 | 0.3 | 3.05 | 9.3025 | 2.7907 |
| 14 | 0.2 | 9.05 | 81.9025 | 16.3805 |

$Var[X] = 7.9600 + 7.7501 + 7.2153 + 1.8008 + 2.7907 + 16.3805 = 43.8975$.
$SD[X] = \sqrt{Var[X]} = \sqrt{43.8975} = 6.6255$.
Hence, the variance is equal to approximately 43.90 and the standard deviation is equal to approximately 6.63.

CALCULATION RULES

INTRODUCTION
We can add constants to random variables or multiply random variables by constants. This has an impact on the expected value, the variance and the standard deviation. The rules for these operations on random variables are given below.

OPERATIONS ON RANDOM VARIABLES

Adding a constant
1. $E[X \pm a] = E[X] \pm a$
2. $Var[X \pm a] = Var[X]$
3. $SD[X \pm a] = SD[X]$

Multiplying by a constant
1. $E[c \cdot X] = c \cdot E[X]$
2. $Var[c \cdot X] = c^2 \cdot Var[X]$
3. $SD[c \cdot X] = |c| \cdot SD[X]$, where $|c|$ is the absolute value of c

Linear Functions of X (follows from adding and multiplying a constant)
1. $E[a + c \cdot X] = a + c \cdot E[X]$
2. $Var[a + c \cdot X] = c^2 \cdot Var[X]$
3.
SD[a + c · X] = |c| · SD[X]

EXAMPLES

QUESTION
Assume you throw a fair die once. Calculate:
- The expected value of X times 5,
- The variance of X times 5.

Solution. Denote by X the random variable representing the outcome of throwing a fair die.
- The expected value of throwing a fair die once is equal to 3.5, i.e. E[X] = 3.5. We now need to calculate E[5 · X]. We can use the rule for multiplying by a constant, i.e. E[c · X] = c · E[X]. By using this rule, we get E[5 · X] = 5 · E[X] = 5 · 3.5 = 17.5.
- The variance of throwing a fair die once is approximately 2.92 (more precisely, 35/12 ≈ 2.9167). We can use the rule for multiplying by a constant, i.e. Var[c · X] = c^2 · Var[X]. By using this rule, we get Var[5 · X] = 25 · 2.9167 ≈ 72.92.

APPLICATIONS

SHARPE RATIO
One of the most popular measures to compare investments with different means and standard deviations is the Sharpe Ratio.

Sharpe Ratio
The Sharpe ratio is the average return μ earned in excess of the risk-free rate r_f per unit of volatility σ (equal to the SD), or total risk. By subtracting the risk-free rate r_f from the mean return μ, the performance associated with risk-taking activities can be isolated. The higher the Sharpe Ratio, the better the performance of the investment.
Notation: S(X) = (μ_X - r_f) / σ_X, where X is the investment opportunity.

Risk-Free Interest Rate
The risk-free interest rate r_f is the theoretical rate of return of an investment with no risk of financial loss.

Assume you can invest in either Disney or McDonald’s stocks. The mean and standard deviation for both stock returns are given in the table below.

Company      Random Variable   Mean    SD
Disney       D                 0.61%   8.3%
McDonald’s   M                 0.53%   7.6%

QUESTION
Suppose the risk-free rate is 0.4%.
In which company should you invest when you compare their Sharpe Ratios?

Solution. Let us denote by μ_D and σ_D the mean and standard deviation of Disney’s stock return, and by μ_M and σ_M the mean and standard deviation of McDonald’s stock return. Calculating the Sharpe Ratios for Disney and McDonald’s respectively yields:
- S(D) = (μ_D - r_f) / σ_D = (0.61 - 0.40) / 8.3 ≈ 0.0253.
- S(M) = (μ_M - r_f) / σ_M = (0.53 - 0.40) / 7.6 ≈ 0.0171.
According to the Sharpe Ratio, Disney should be preferred to McDonald’s.

FINANCIAL DISTRESS AND AGENCY COST
Assume company XYZ has a loan of $10 million that is due at the end of the year. Company XYZ is in financial distress, because the market value of its assets at the end of the year will be only $9 million. In that case, company XYZ has to default on its debt. Company XYZ considers a new strategy with no upfront investment.
- The probability of success of the new strategy is only 20%. Hence the probability of failure of the new strategy is 80%.
- If the new strategy is successful, the value of the firm’s assets will increase to $15 million.
- If the new strategy is not successful, the value of the firm’s assets will fall to $5 million.

                  Old Strategy [mln]   Success [mln]   Failure [mln]
Value of Assets   9                    15              5
Debt              9                    10              5
Equity            0                    5               0

QUESTION
Consider the table above. Calculate the expected value of XYZ’s assets under the new strategy. Is it beneficial for the equity and/or debt holders if company XYZ executes this strategy?

Solution. We denote by the random variable X the possible values of XYZ’s assets. We get:
- E[X] = 0.2 · $15 mln + 0.8 · $5 mln = $7 mln. Hence, under the new strategy, the expected value of XYZ’s assets will decrease from $9 mln to $7 mln.
- If company XYZ does nothing, it will ultimately default and equity holders will get nothing with certainty. If the new strategy is implemented and succeeds, equity holders will get $5 million in total. The expected payoff under the new strategy is equal to 0.8 · $0 mln
+ 0.2 · $5 mln = $1 mln. Therefore equity holders have nothing to lose from this strategy.
- If company XYZ does nothing, it will ultimately default and debt holders will get $9 million with certainty. If the new strategy is implemented and succeeds, debt holders will get $10 million in total. If the new strategy is implemented and fails, debt holders will get $5 million in total. The expected payoff under the new strategy is equal to 0.8 · $5 mln + 0.2 · $10 mln = $6 mln. Therefore debt holders have a lot to lose from this strategy.

The results are summarized in the table below.

                  Old Strategy [mln]   Success [mln]   Failure [mln]   Expected [mln]
Value of Assets   9                    15              5               7
Debt              9                    10              5               6
Equity            0                    5               0               1

Effectively, the equity holders are gambling with the debt holders’ money. Shareholders have an incentive to invest in risky negative-NPV projects (where NPV stands for Net Present Value), even though a negative-NPV project destroys value for the firm overall.

TAKE AWAY

RANDOM VARIABLE
A random variable (usually denoted with a capital letter) is a variable whose value is subject to variations due to chance.

INDIVIDUAL VALUE
A value that is not subject to variations due to chance, denoted by lowercase letters x, y, z, etc.

PROBABILITY
The likelihood that a random variable is equal to an individual value.

PROBABILITY MASS FUNCTION
A probability mass function (abbreviated PMF) completely describes the probability properties of a discrete random variable.
It shows the probability that a discrete random variable X is exactly equal to some value.

PROBABILITY DENSITY FUNCTION
A probability density function (abbreviated PDF) describes the relative likelihood for a continuous random variable X to take on a given value.

PROBABILITY DISTRIBUTION
A probability distribution is a statistical function that shows the possible values and likelihoods that a discrete or continuous random variable can take within a given range.
- There exist many different probability distributions.

CUMULATIVE DISTRIBUTION FUNCTION
The cumulative distribution function (abbreviated CDF) shows the probability that a random variable X takes a value less than or equal to x.
Notation: F(x) = P(X ≤ x), where -∞ < x < ∞.

HISTOGRAM
A histogram is a graphical representation of the distribution of numerical data. It is an estimate of the probability distribution of a continuous random variable. It shows, for example, the number of times the stock price had a certain increase or decrease.

MEASURE OF LOCATION
The arithmetic mean is normally used as a measure of location/central tendency.

MEASURE OF STATISTICAL DISPERSION
The standard deviation is normally used as a measure of statistical dispersion.

EXPECTED VALUE
The expected value of a discrete random variable X is the probability-weighted sum of all possible values.
Notation: E[X] = x_1 p(x_1) + x_2 p(x_2) + ... + x_k p(x_k), for k possible values x_1, x_2, ..., x_k of X.

VARIANCE: GENERAL CASE
The variance of a random variable X is the expected value of its squared deviations from E[X].
Notation: σ^2 = Var[X] = E[(X - E[X])^2] = E[X^2] - (E[X])^2, where E[X] is the expected value of X and E[X^2] is the second moment of X. Note that μ = E[X].

VARIANCE: DISCRETE CASE
Only the notation changes for the variance. Var[X] = ∑_{i=1}^{n} (x_i - μ)^2 p(x_i), where p(x_i) is the probability of X being equal to x_i.
When all n values are equally likely to occur, the expression simplifies to Var[X] = (1/n) ∑_{i=1}^{n} (x_i - μ)^2, where μ is the mean of X.

STANDARD DEVIATION
The standard deviation of a random variable X is the square root of the variance of X.
Notation: SD[X] = σ = √Var[X].

ADDING A CONSTANT
1. E[X ± a] = E[X] ± a.
2. Var[X ± a] = Var[X].
3. SD[X ± a] = SD[X].

MULTIPLYING BY A CONSTANT
1. E[c · X] = c · E[X].
2. Var[c · X] = c^2 · Var[X].
3. SD[c · X] = |c| · SD[X], where |c| is the absolute value of c.

LINEAR FUNCTIONS OF X
This follows from adding and multiplying a constant.
1. E[a + c · X] = a + c · E[X].
2. Var[a + c · X] = c^2 · Var[X].
3. SD[a + c · X] = |c| · SD[X].

SHARPE RATIO
The Sharpe ratio is the average return μ earned in excess of the risk-free rate r_f per unit of volatility σ (equal to the SD), or total risk. By subtracting the risk-free rate r_f from the mean return μ, the performance associated with risk-taking activities can be isolated. The higher the Sharpe Ratio, the better the performance of the investment.
Notation: S(X) = (μ_X - r_f) / σ_X, where X is the investment opportunity.

BEST PRACTICES
1. Use random variables to represent uncertain outcomes.
2. Calculate the standard deviation by taking the square root of the variance.
3. Apply the formulas for means and variances carefully, i.e. weight each value (or each squared deviation) by its own probability.

PITFALLS
1. Do not mix up X with x.
2. In the variance formula, apply the square only after you have subtracted the mean.
3. Understand that the mean or expectation of a discrete random variable might never be attained as an actual outcome.

BINOMIAL DISTRIBUTION

INTRODUCTION

FLORIDA ELEVATED ROADWAYS
Florida Elevated Roadways INC. (“FERI”) is a construction company in Southern Florida that specializes in building elevated roadways.
FERI’s most recent construction project included building 218 pillars for an elevated expressway near Miami. The Florida Department of Transportation (FDOT) carefully evaluates each and every pillar to determine whether it satisfies the proper FDOT specifications. Toshi D., a Florida construction engineer and proud Kellogg alumnus, tells us that FDOT classifies such pillars as a “failure” if they do not satisfy its pre-specified technical requirements. In Toshi’s experience, the probability that a pillar is classified as a failure in a project like the one in Miami is 0.5%. Toshi also informs us that we can safely assume that all pillars are constructed independently of each other.

We are interested in the probability that, for example, ‘none’, ‘at most one’ or ‘at least two’ of the 218 pillars will fail. We are also interested in how many of the pillars we can expect to fail, and in the standard deviation of the total number of failed pillars in this construction project.

EXAMPLES

EXAMPLE 1: FIVE PILLARS
Before we analyze the case with 218 pillars, let us first assume that there are 5 pillars in total. The probability that a pillar is classified as a failure is still 0.5% and we can still safely assume that all pillars are constructed independently of each other.

QUESTION
What is the probability that none of the five pillars is classified as a failure?

Solution. We denote by n_i the event that pillar i (with i = 1,..., 5) is not classified as a failure. We need to calculate the following probability: P(n_1 ∩ n_2 ∩ n_3 ∩ n_4 ∩ n_5). Whether pillar i is classified as a failure is independent of whether pillar j (with j = 1,..., 5 and j ≠ i) is classified as a failure. Using this independence, we can rewrite this probability as a product of all the individual probabilities. We get:
P(n_1 ∩ n_2 ∩ n_3 ∩ n_4 ∩ n_5) = P(n_1) · P(n_2) · P(n_3) · P(n_4) · P(n_5)   (by independence).
It is given that P(n_i) = 0.995 (for i = 1,..., 5). Therefore we get P(n_1 ∩ n_2 ∩ n_3 ∩ n_4 ∩ n_5) = 0.995^5 ≈ 0.975. Therefore, the probability that none of the five pillars is classified as a failure is approximately 97.5%.

QUESTION
What is the probability that exactly the fourth pillar is classified as a failure?

Solution. We denote by n_i the event that pillar i (with i = 1, 2, 3, 5) is not classified as a failure. We denote by f_4 the event that pillar 4 is classified as a failure. We need to calculate the following probability: P(n_1 ∩ n_2 ∩ n_3 ∩ f_4 ∩ n_5). Using the fact that whether pillar i is classified as a failure is independent of whether pillar j (with j ≠ i) is classified as a failure, we can rewrite this probability as a product of all the individual probabilities. We get: Independence P(n_1 ∩ n_2 ∩ n_3 ∩ f_4 ∩ n_5) =. Since P(n_i) = 0.995 for P(n_1)