RSMT 5004 Lesson 2 Probability Rules PDF
Lesson 2 – Probability Rules

Lesson Agenda
- Introduction to Probability
  - Basics of Probability
  - Types of Probability
- Probability Rules
  - Complements, Intersections, Unions, and Mutually Exclusive Events
  - Addition Rule, Conditional Probability Rule, Multiplication Rule
  - Dependent vs. Independent Events
- Probability Calculations Using Contingency Tables
- Probability Calculations Using Probability Trees

Course Learning Outcomes Covered
- Apply probability rules to calculate probabilities of simple and compound events.

Introduction to Probability

Probability is all around us in our everyday lives. You have likely used and interacted with probability even if you haven’t been aware of it. Let’s see what you think about probability so far.

On your phone, tablet, or computer, open a web browser and go to menti.com. When I launch the Mentimeter activity, you will see a code on my screen. Enter this code to gain access to the Menti activity. This is not a quiz, so you will not need to specify a screen name.

RSMT 5004 – Lesson 2

Basics of Probability

The probability of an event is a measure of the likelihood of that event occurring. Probabilities are decimal values from 0 to 1. They can be converted to, and are often thought of as, percentages by multiplying by 100. By using a 0 to 100 scale in the Menti activity, you were effectively estimating the likelihoods of those events using percentages.

The Probability Scale:
- A probability of 0 means that the event cannot occur: it is impossible.
- A probability of 1 means that the event is guaranteed to occur: it is certain.
- A probability of 1/2 or 0.5 means that the event is just as likely to occur as it is to not occur.
- An event with a probability between 0 and 0.5 is less likely to occur than to not occur: it is an unlikely event.
- An event with a probability between 0.5 and 1 is more likely to occur than to not occur: it is a likely event.

A bit more specifically, probability is the long-term relative frequency of an event.

A key point about probability is that it provides an expectation of how likely an event is to occur. It is possible that an event with a lower probability will still occur instead of an event with a higher probability. However, in the long term, we would expect the more likely event to occur more frequently and the less likely event to occur less frequently. This is because the relative frequencies approach the theoretical probabilities of the respective events in the long term.

When working with probabilities, keep them as accurate as possible (i.e., retain as many decimal places as possible). Ideally, carry all of the decimals through the calculation and avoid rounding intermediate values. Only your final answer should be rounded, typically to three or four decimal places. If you do opt to round during a calculation, a general rule of thumb is to keep at least two decimal places beyond what is required in your final answer (i.e., keep at least six decimal places during a calculation if you’re rounding your final answer to four decimal places).
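The idea that relative frequencies settle toward the theoretical probability over many trials can be checked with a small simulation. The following is only a sketch under stated assumptions: the function name `relative_frequency`, the fixed seed, and the choice of "rolling a 4" as the event are illustrative and not part of the lesson.

```python
import random


def relative_frequency(target, n_rolls, seed=0):
    """Roll a fair six-sided die n_rolls times and return the share of rolls equal to target."""
    rng = random.Random(seed)  # fixed seed (an assumption) so repeated runs give the same rolls
    hits = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) == target)
    return hits / n_rolls


theoretical = 1 / 6  # classical probability of any single face on a fair die

# Compare the observed relative frequency with the theoretical value as the number of rolls grows.
for n in (10, 100, 1_000, 10_000, 100_000):
    observed = relative_frequency(4, n)
    print(f"{n:>7} rolls: relative frequency = {observed:.4f}, theoretical = {theoretical:.4f}")
```

With few rolls the observed value can sit far from 1/6; with many rolls it should typically land close to it, although any single run can still deviate.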
Key Definitions:
- An experiment is a process that leads to the occurrence of a single outcome
  - e.g., rolling a single fair die
- An outcome is a particular result of an experiment
  - e.g., a result of 4 after rolling the die
- The sample space is the set of all possible outcomes of an experiment
  - e.g., the sample space for rolling a single die is {1, 2, 3, 4, 5, 6}
- An event is any collection of outcomes of an experiment
  - An event could be a single outcome or a combination of two or more outcomes
  - A simple event is an event consisting of a single outcome
    - e.g., rolling a 4 on a fair die
  - A compound event is an event that consists of more than one outcome
    - e.g., rolling an even number on a fair die
    - e.g., rolling a number higher than 2 on a fair die

Notation: P(A) denotes the "probability of event A occurring". We generally use capital letters to denote generic events, but can also use text to describe the event: P(rolling a 4) = 1/6.

Types of Probability

There are three general approaches to determining probabilities:

1. The Classical (or Theoretical) Approach

In this approach, we assume that all outcomes in a sample space are equally likely to occur. If this is true, then

$$P(A) = \frac{\text{number of ways } A \text{ can occur}}{\text{total number of possible outcomes}}$$

Example: rolling a 4 on a fair die

There are six possible outcomes {1, 2, 3, 4, 5, 6} when rolling a fair die, each one equally likely. There is one outcome (rolling a 4) which satisfies the event and six possible outcomes. Thus,

$$P(\text{rolling a 4}) = \frac{1}{6}$$

2. The Empirical Approach (or Relative Frequency Approach)

In this approach, we conduct an experiment (or look at historical data) and count the number of times the event occurs. This count is then divided by the total number of trials to come up with the empirical probability.
$$P(A) = \frac{\text{the number of times } A \text{ is observed}}{\text{total number of repetitions of the experiment}}$$

Example: batting averages in baseball

In 2024, Bobby Witt Jr. of the Kansas City Royals had the highest batting average of any player (with more than 100 at bats) at 0.332. A batting average is the portion of a player’s at bats that resulted in a hit, so we divide the number of hits by the total number of at bats. Batting averages serve as an estimate of how likely it is that a player will get a hit. For example, we’d expect Bobby Witt to get a hit 33.2% of the time, or about 1/3 of the time.

$$P(\text{Bobby Witt gets a hit}) = \frac{\text{number of hits}}{\text{total number of at bats}} = \frac{211}{636} \approx 0.332$$

3. The Subjective (or Experiential) Approach

In this approach, the probability of a particular event is estimated using one's relevant knowledge and experience. There is no formula for this approach as it is very subjective.

Example: the likelihood the Toronto Maple Leafs win the Stanley Cup

Oddsmakers working for gambling establishments are required to predict the likelihood of various sporting outcomes, such as the result of a game or even of a season (for example, which NHL team will win the Stanley Cup). By the classical approach to probability, one would assume that each team is equally likely to win, and thus the probability that the Maple Leafs win the Stanley Cup would be 1/32 (since there are 32 teams). As in any sport, this is not a safe assumption. Teams with better players, fewer injuries, better schedules, etc. will have an advantage and be more likely to win it all. Oddsmakers ultimately have to take all of this information and come up with a probability that each team will win the Stanley Cup. These probabilities are then used to calculate the odds (payouts for bets).
As the 2019-2020 NHL season got underway, DraftKings had the Maple Leafs’ NHL Championship odds at +900, meaning that a winning $100 bet would return $1,000 ($900 in winnings plus the $100 bet back). This implies a probability of 0.1 (= 100/1000) that the Leafs win the Stanley Cup. In reality, since gambling establishments want to make money and not just break even, their calculated probability is probably slightly lower than this implied probability: the payout is set a little below what would be fair for their own estimate.

The approach used to calculate probabilities will depend on the setting. In our course we will be working with the classical approach.

Above it was stated that probability is the long-term relative frequency of an event and that the relative frequencies approach the theoretical probabilities of the events in the long term. This idea is more formally known as the law of large numbers, and it means that in the long term we would expect the Empirical Approach to result in the same probability as the Classical Approach.

To see the law of large numbers in effect, play around with the following simulators:
- Coin Flip Simulator: try it using a single coin
- Dice Roll Simulator: try it using a single die

For a single coin flip, we would expect each event (heads or tails) to be equally likely to occur. Thus, over 10 flips, we would expect 5 heads and 5 tails. What actually happens? Are the outcomes equally distributed? What about over 50 flips? 100 flips?

The same idea holds true for rolling a single die. Each of the six sides is equally likely to occur. Do the results match this expectation over 100 rolls? 1,000 rolls? 10,000 rolls?

Probability Rules

Let us continue with the idea of rolling dice. Recall that the sample space (visualized as the rectangle) is made up of all possible outcomes: {1, 2, 3, 4, 5, 6}. Often, we can visualize the probability of events in a Venn diagram.
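Because the outcomes of a fair die are equally likely, the sample space recalled above can be written as a set and the classical formula applied by counting. This is a minimal sketch; the names `SAMPLE_SPACE` and `classical_probability` are illustrative assumptions, and the events used are the simple and compound examples from the Key Definitions.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die.
SAMPLE_SPACE = {1, 2, 3, 4, 5, 6}


def classical_probability(event, sample_space=SAMPLE_SPACE):
    """Classical approach: favourable outcomes divided by total outcomes (all equally likely)."""
    favourable = event & sample_space  # ignore anything that is not in the sample space
    return Fraction(len(favourable), len(sample_space))


rolling_a_4 = {4}                                       # simple event
even_number = {x for x in SAMPLE_SPACE if x % 2 == 0}   # compound event {2, 4, 6}
greater_than_2 = {x for x in SAMPLE_SPACE if x > 2}     # compound event {3, 4, 5, 6}

for name, event in [("rolling a 4", rolling_a_4),
                    ("even #", even_number),
                    ("# > 2", greater_than_2)]:
    p = classical_probability(event)
    print(f"P({name}) = {p} = {float(p):.4f}")
```

Using `Fraction` keeps the values exact until the final step, in line with the advice to round only the final answer.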
We will now consider two events:
- rolling an even number, visualized by the left circle
- rolling a number greater than 2, visualized by the right circle

This results in the following Venn diagram.

[Venn diagram: two overlapping circles labelled "even #" (left) and "# > 2" (right) inside the sample-space rectangle]

Complements

The complement of an event consists of all outcomes not included in the original event. The word "not" is often associated with the complement. The complement of event A is denoted $\overline{A}$ (or A′) and consists of all outcomes in which A does not occur.

If we consider the event of rolling an even number (shown in the darker blue circle on the left), then there are three values {2, 4, 6} which satisfy the event. The complement to rolling an even number would be all remaining outcomes inside the sample space (shown in light orange): this would be the odd numbers {1, 3, 5}.

$$P(\text{even \#}) = \frac{3}{6} = 0.5$$

$$P(\overline{\text{even \#}}) = P(\text{odd \#}) = \frac{3}{6} = 0.5$$

If we consider the event of rolling a number greater than 2 (shown in the darker blue circle on the right), then there are four values {3, 4, 5, 6} which satisfy the event. The complement to rolling a number greater than 2 would be all remaining outcomes inside the sample space (shown in light orange): this would be the numbers less than or equal to 2, {1, 2}.

$$P(\# > 2) = \frac{4}{6} \approx 0.6667$$

$$P(\overline{\# > 2}) = P(\# \le 2) = \frac{2}{6} \approx 0.3333$$

For complementary events $A$ and $\overline{A}$, an outcome must be in either $A$ or $\overline{A}$. These two complementary events cover the entire sample space and thus $P(A) + P(\overline{A}) = 1$. Conversely,

$$P(\overline{A}) = 1 - P(A), \quad \text{and} \quad P(A) = 1 - P(\overline{A})$$

This fact can be useful in cases where $P(A)$ is quite cumbersome to calculate and $P(\overline{A})$ is much more easily calculated.

Repeating the same calculations from above,

$$P(\text{odd \#}) = P(\overline{\text{even \#}}) = 1 - \frac{3}{6} = 1 - 0.5 = 0.5$$

$$P(\# \le 2) = P(\overline{\# > 2}) = 1 - \frac{4}{6} \approx 1 - 0.6667 = 0.3333$$

Intersections

The intersection of two events is the set of outcomes that satisfies both events at the same time.
The word "and" is often associated with the intersection. It is denoted

P(A ∩ B) = P(A and B) = P(both A and B occur)

Visually, this can be seen as the intersection or overlap between the two events (circles), and it comprises the outcomes that satisfy both event A and event B.

Recall the event of rolling an even number contained the outcomes {2, 4, 6} while the event of rolling a number greater than 2 contained the outcomes {3, 4, 5, 6}. The intersection of these two events would be the outcomes that appear in both events: {4, 6}.

$$P(\text{even \#} \cap \# > 2) = \frac{2}{6} \approx 0.3333$$

Unions

The union of two events is the set of outcomes that satisfies either event. The word "or" is often associated with the union. It is denoted

P(A ∪ B) = P(A or B) = P(event A occurs, event B occurs, or both A and B occur)

Visually, this can be seen as the area covered by either of the two events (circles), so it is the combined area of the two circles.

Recall the event of rolling an even number contained the outcomes {2, 4, 6} while the event of rolling a number greater than 2 contained the outcomes {3, 4, 5, 6}. The union of these two events would be the outcomes that appear in either of the two events: {2, 3, 4, 5, 6}.

$$P(\text{even \#} \cup \# > 2) = \frac{5}{6} \approx 0.8333$$

Note: The default “or” is inclusive, meaning we would include the outcomes satisfying event A, event B, or both. Additional language would be used for the “exclusive or”, which excludes the outcomes appearing in both events. An example of this would be “what is the probability that event A or event B occurs, but not both?”

The Addition Rule

Intuitively, when calculating P(A ∪ B), we might simply add the probability of A occurring and the probability of B occurring.

$$P(\text{even \#} \cup \# > 2) = P(\text{even \#}) + P(\# > 2) = \frac{3}{6} + \frac{4}{6} = \frac{7}{6}$$

This, however, does not match the 5/6 found above and also violates a key property of probability: probabilities can never be greater than 1 (or 100%).
What happened here is that we have double counted the outcomes that appear in both events (4 and 6). So, we must perform this calculation in a way that only counts each outcome once. This is done by removing the outcomes that would be double counted. Thus, the formal addition rule states:

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
or
P(A or B) = P(A) + P(B) − P(A and B)

Based on the above,

$$P(\text{even \#} \cup \# > 2) = P(\text{even \#}) + P(\# > 2) - P(\text{even \#} \cap \# > 2) = \frac{3}{6} + \frac{4}{6} - \frac{2}{6} = \frac{5}{6}$$

Mutual Exclusivity

Not all events will have overlap. Two events are considered to be mutually exclusive (or disjoint) if they cannot occur at the same time. When two events are mutually exclusive, P(A and B) = 0.

Let us consider two new events:
- rolling a number greater than 4
- rolling a number less than 4

The event of rolling a number greater than 4 consists of the outcomes {5, 6}, while the event of rolling a number less than 4 consists of the outcomes {1, 2, 3}. No outcome appears in both events, so they are mutually exclusive:

$$P(\# > 4 \cap \# < 4) = 0$$

Conditional Probability

The conditional probability of event A given event B, written P(A | B), is the probability that A occurs when we already know that B has occurred:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

Returning to the earlier events, suppose we know that the number rolled is greater than 2. We can think that our sample space has now shrunk to only include the outcomes greater than 2: {3, 4, 5, 6}. All other outcomes have been ruled out. Within this revised ‘sample space’, we now ask the question: what is the probability that an even number was rolled? This would be the space that was originally the intersection of the two events: {4, 6}.

$$P(\text{even \#} \mid \# > 2) = \frac{2}{4} = 0.5$$

Using the formula presented above, a slightly different process leads to the same result.

$$P(\text{even \#} \mid \# > 2) = \frac{P(\text{even \#} \cap \# > 2)}{P(\# > 2)} = \frac{2/6}{4/6} = \frac{0.3333\ldots}{0.6666\ldots} = 0.5$$

The Multiplication Rule

While the Addition Rule is used to calculate the probability of the union of two events, the Multiplication Rule is used to calculate the probability of the intersection of two events, that is, when event A and event B both need to occur. These events can happen simultaneously (as seen above) or sequentially.

Intuitively, when calculating the probability of an intersection, we may think to multiply the probability of event A occurring by the probability of event B occurring, especially knowing that we are talking about the Multiplication Rule.
However, we must be careful to account for dependencies between probabilities. Accounting for these dependencies, the formal multiplication rule states:

P(A ∩ B) = P(A) ∙ P(B|A)
or
P(A and B) = P(A) ∙ P(B given A)

Note: this is actually a rearranged form of the conditional probability rule. In many instances, the probability of the intersection can be calculated directly with the information given, but in certain circumstances the multiplication rule will apply.

Dependent vs. Independent Events

The reason we need to write P(B|A) and can't just write P(B) is that event A's occurrence may affect the likelihood of event B occurring.

Think about drawing cards from a standard deck of playing cards. The first card drawn will impact the probabilities of all future draws: there is now one less card available within the deck. Such events are called dependent events.

However, when we consider the act of rolling a die, one roll does not affect the probabilities of future rolls: there are still six equally likely sides on the die. These events are called independent events.

More formally, two events are considered to be independent if the occurrence of one does not affect the probability of occurrence for the other. For independent events,

P(B|A) = P(B)

Thus, for independent events,

P(A and B) = P(A) ∙ P(B|A) = P(A) ∙ P(B)

Probability Calculations Using Contingency Tables

Instead of using a Venn diagram, we can portray data in a contingency table. This method of portraying data makes calculating probabilities easier as the data is broken down into very clear segments based on two variables.

To demonstrate the use of contingency tables, let’s consider the following example:

Upon visiting their website, a clothing retailer prompts customers to fill out a short survey including information about their age. They track the customer through their transaction to determine whether they ultimately make a purchase or not.
The retailer has summarized the first 500 customer responses in the following contingency table.

Age Group            Made a Purchase   Did Not Make a Purchase   Total
25 or younger                     17                        68      85
Between 25 and 50                106                       190     296
50 or older                       62                        57     119
Total                            185                       315     500

Now, let’s see how probability calculations are handled with a contingency table as opposed to Venn diagrams.

What is the probability that a randomly selected customer made an online purchase when visiting the retailer’s website?

This question focuses on a single section within the column variable. As a result, we can simply use the column total for Made a Purchase.

From the table, we can see that 185 total respondents made a purchase: this is the sum of all purchasers from each age group (17 + 106 + 62). This is out of 500 total respondents. Thus,

$$P(\text{purchase}) = \frac{185}{500} = 0.3700$$

What is the probability that a randomly selected respondent is between 25 and 50 years old?

This question focuses on a single section within the row variable. As a result, we can simply use the row total for Between 25 and 50.

From the table, we can see that 296 total respondents were aged between 25 and 50: this is the sum of those who made purchases and those who did not make purchases (106 + 190). Like before, this is out of 500 total respondents. Thus,

$$P(\text{between 25 and 50}) = \frac{296}{500} = 0.5920$$

What is the probability that a randomly selected respondent is older than 25?

This question focuses on multiple sections within the row variable. As a result, we will need to add up the row totals that satisfy this event: the "Between 25 and 50" row and the "50 or older" row.
We are effectively using the Addition Rule, and because the rows (not counting the total row) are mutually exclusive in a contingency table, there is no overlap to subtract. While it does not come up in this question, the columns are also mutually exclusive in a contingency table.

From the table, we can see that 296 total respondents were aged between 25 and 50 and 119 total respondents were 50 or older. Since our focus is on those older than 25, we will add these two. Like before, this is out of 500 total respondents. Thus,

$$P(\text{older than 25}) = \frac{296 + 119}{500} = \frac{415}{500} = 0.8300$$

What is the probability that a randomly selected respondent is 50 or older and made a purchase?

This question focuses on a single cell within the contingency table where a specific row and column overlap: this is the intersection of 50 or older and Made a Purchase. As a result, we can simply use the single cell value where these two sections overlap.

From the table, we can see that 62 respondents were both 50 or older and made a purchase. As an intersection, this is still out of the 500 total respondents. Thus,

$$P(\text{50 or older} \cap \text{purchase}) = \frac{62}{500} = 0.1240$$

What is the probability that a randomly selected respondent is older than 25 and made a purchase?

This question combines the ideas from the last two questions: an intersection using multiple rows. Here, we will need to add together two of the inner cells: the intersection of Between 25 and 50 and Made a Purchase, and the intersection of 50 or older and Made a Purchase.
From the table, we can see that 106 respondents were aged between 25 and 50 and made a purchase, and an additional 62 were aged 50 or older and made a purchase. Like before, we will add these two, and the total will still be the 500 total respondents. Thus,

$$P(\text{older than 25} \cap \text{purchase}) = \frac{106 + 62}{500} = \frac{168}{500} = 0.3360$$

Given that a randomly selected respondent made a purchase, what is the probability they are 25 or younger?

The word “given” indicates we are dealing with conditional probability. We are told the respondent made a purchase; so, we focus on the respondents in the Made a Purchase group. Within this group, we are interested in knowing the probability they are 25 or younger. This leads us to the group of respondents 25 or younger who made a purchase.

From the table, we can see that out of 185 respondents who made a purchase, 17 of them were 25 or younger. Thus,

$$P(\text{25 or younger} \mid \text{purchase}) = \frac{17}{185} = 0.09189189\ldots \approx 0.0919$$

Technically, the conditional probability formula uses probabilities in both the numerator and denominator. However, both the intersection probability (numerator) and the marginal probability (denominator) would be over the same total of 500. These two 500s effectively cancel each other out.

$$P(\text{25 or younger} \mid \text{purchase}) = \frac{P(\text{25 or younger} \cap \text{purchase})}{P(\text{purchase})} = \frac{17/500}{185/500} = \frac{0.0340}{0.3700} \approx 0.0919$$

If a randomly selected respondent was 50 or older, what is the probability that they did not make a purchase?

The word “if” is another way to indicate conditional probability. We are told to assume the respondent was 50 or older; so, we focus on those respondents in the 50 or older group.
Within this group, we are interested in knowing the probability they did not make a purchase. This leads us to the group of respondents who are 50 or older and Did Not Make a Purchase.

From the table, we can see that out of 119 respondents aged 50 or older, 57 of them did not make a purchase. Thus,

$$P(\text{no purchase} \mid \text{50 or older}) = \frac{57}{119} = 0.478991\ldots \approx 0.4790$$

Like in the previous question, the conditional probability formula could be applied more formally, with probabilities in both the numerator and denominator. We would still arrive at the same answer.

$$P(\text{no purchase} \mid \text{50 or older}) = \frac{P(\text{no purchase} \cap \text{50 or older})}{P(\text{50 or older})} = \frac{57/500}{119/500} = \frac{0.1140}{0.2380} \approx 0.4790$$

If two respondents are randomly selected, what is the probability that they both made purchases?

Despite the use of the word “if” again here, this is not conditional probability. In this case, the initial phrase tells us that two respondents are being selected and we want to know how likely it is that both made purchases. Another way we can think of this question is “what is the probability that one randomly selected respondent made a purchase and then a second randomly selected respondent also made a purchase?” Thinking of the question phrased in this way using “and then” may make it more apparent that the Multiplication Rule is required here:

P(A ∩ B) = P(A) ∙ P(B|A)

In this question, we could more specifically write

$$P(\text{1st purchase} \cap \text{2nd purchase}) = P(\text{1st purchase}) \cdot P(\text{2nd purchase} \mid \text{1st purchase})$$

Like we did in the first question, we are focusing on the number of respondents in the Made a Purchase group (regardless of age). However, we must also figure out how to calculate the conditional probability.
We do not actually need to use the typical conditional probability formula here, but rather ask “how does the probability change when selecting a second respondent who made a purchase if we have already selected one respondent who made a purchase?”

From the table, we can see that 185 total respondents made a purchase out of the 500 total respondents. So, the first piece can be filled in right away.

$$P(\text{1st purchase} \cap \text{2nd purchase}) = \frac{185}{500} \cdot P(\text{2nd purchase} \mid \text{1st purchase})$$

So now we must answer the question posed above: how does the probability change for the second selection? The answer relates to a larger topic of sampling with or without replacement (coming up next). In this case, we need to address the fact that there is now one less respondent who made a purchase (185 – 1 = 184) and, thus, one less total respondent that could be selected (500 – 1 = 499).

$$P(\text{1st purchase} \cap \text{2nd purchase}) = \frac{185}{500} \cdot \frac{184}{499} = \frac{185 \cdot 184}{500 \cdot 499} = \frac{34\,040}{249\,500} = 0.13643286\ldots \approx 0.1364$$

Sampling Without Replacement vs. Sampling With Replacement

Within the multiplication rule, P(B|A) is used since event A's occurrence may affect the likelihood of event B occurring. If the two events are independent, then P(B|A) = P(B) and the multiplication rule simplifies to P(A and B) = P(A) ∙ P(B). When we are working with sample data, the idea of dependence vs. independence is extremely important as sampling can be done with replacement or without replacement.

When drawing cards from a deck of cards, we typically do not replace each card after it is drawn. This is the idea of sampling without replacement. Like drawing cards from a deck of cards, sampling without replacement results in dependent events. In a single 'trial', we draw one item, observe it, then set it aside.
The 'pool' we’re drawing from now contains one less possibility, which affects the probability of events in future draws (trials). This is the default assumption when working with sample data.

On the other hand, if we were to draw a card, observe it, then shuffle it back into the deck, we would have sampling with replacement. This type of sampling results in independent events. In a single 'trial', we draw one item from the sample space, observe it, then return it to the 'pool'. The subsequent 'trial' has the exact same environment as before: all the same outcomes with all the same probabilities. Thus, probabilities are not affected by past events when sampling with replacement. This is the default assumption when working with a population (where counts are often not specified).

Example: If we solved the previous problem assuming replacement, we would get a slightly different answer:

$$P(\text{1st purchase} \cap \text{2nd purchase}) = \frac{185}{500} \cdot \frac{185}{500} = \frac{185 \cdot 185}{500 \cdot 500} = \frac{34\,225}{250\,000} = 0.1369$$

The difference seems small, but since the overall goal of statistics is to use sample information to make inferences about a larger population, a small difference can have substantial effects when applied to a large population.

When sampling a relatively small number of items from a large population, the difference between sampling with replacement and sampling without replacement is very small: the change in probabilities is small enough to be ignored. So, even though selections are probably made without replacement (dependent), we can treat them as if they were made with replacement (independent).

Example: According to Statistics Canada, the population of Toronto in 2021 was 2,794,356, and 1,020,980 of those 2,794,356 Torontonians were married. If we randomly select five individuals living in Toronto, what is the probability all five will be married?
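Both sampling schemes apply the Multiplication Rule repeatedly; the only difference is whether the counts shrink after each draw. The sketch below shows this with exact fractions, using the survey counts (185 purchasers out of 500) and the Toronto figures quoted above. The function name `prob_all_successes` and its parameters are illustrative assumptions, not part of the lesson.

```python
from fractions import Fraction


def prob_all_successes(successes, population, draws, replace):
    """Probability that every one of `draws` random selections is a 'success'.

    replace=True  -> independent draws: the same fraction is used every time.
    replace=False -> dependent draws: one success and one member leave the pool each time.
    """
    prob = Fraction(1)
    for i in range(draws):
        if replace:
            prob *= Fraction(successes, population)
        else:
            # max(..., 0) gives probability 0 once the successes run out
            prob *= Fraction(max(successes - i, 0), population - i)
    return prob


# Survey example: two respondents who both made a purchase.
survey_without = prob_all_successes(185, 500, 2, replace=False)
survey_with = prob_all_successes(185, 500, 2, replace=True)
print(f"Survey, without replacement: {float(survey_without):.4f}")
print(f"Survey, with replacement:    {float(survey_with):.4f}")

# Toronto example: five residents who are all married.
toronto_without = prob_all_successes(1_020_980, 2_794_356, 5, replace=False)
toronto_with = prob_all_successes(1_020_980, 2_794_356, 5, replace=True)
print(f"Toronto, without replacement: {float(toronto_without):.8f}")
print(f"Toronto, with replacement:    {float(toronto_with):.8f}")
```

Keeping the values as `Fraction` objects until printing avoids any intermediate rounding, which matters here because the two Toronto results differ only far to the right of the decimal point.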
Without Replacement:

$$P(\text{all five married}) = \frac{1{,}020{,}980}{2{,}794{,}356} \cdot \frac{1{,}020{,}979}{2{,}794{,}355} \cdot \frac{1{,}020{,}978}{2{,}794{,}354} \cdot \frac{1{,}020{,}977}{2{,}794{,}353} \cdot \frac{1{,}020{,}976}{2{,}794{,}352}$$

$$= 0.36537220 \cdot 0.36537197 \cdot 0.36537175 \cdot 0.36537152 \cdot 0.36537129 = 0.00651141 \approx 0.0065$$

With Replacement:

$$P(\text{all five married}) = \frac{1{,}020{,}980}{2{,}794{,}356} \cdot \frac{1{,}020{,}980}{2{,}794{,}356} \cdot \frac{1{,}020{,}980}{2{,}794{,}356} \cdot \frac{1{,}020{,}980}{2{,}794{,}356} \cdot \frac{1{,}020{,}980}{2{,}794{,}356}$$

$$= 0.36537220 \cdot 0.36537220 \cdot 0.36537220 \cdot 0.36537220 \cdot 0.36537220 = 0.00651145 \approx 0.0065$$

It is not until the 8th decimal place that we see a difference between the two calculations. This is why we can assume sampling with replacement when working with populations.

Probability Calculations Using Probability Trees

A probability tree is another method of portraying probabilities (or frequencies) to ultimately make probability calculations easier to visualize and solve. In a probability tree, events are laid out sequentially. The order of events is especially important when events are dependent upon one another. When the events are independent, the order becomes less important.

We will start with an independent case: flipping a coin three times.

The result of the first flip will be either Heads or Tails: 2 possible outcomes. The second flip also has 2 possible outcomes, and the same goes for the third flip. So, there should be 8 (= 2 × 2 × 2) possible outcomes when flipping a coin three times. Can you list them all? (The answer can be seen in the tree that follows.)

This task becomes easier if you have some sort of method or structure to the process. A probability tree provides that structure.

In a probability tree, each event is laid out in sequence and ‘resolved’ one at a time. In the case of dependent events, future outcomes may change or be affected by the preceding events. Usually it is the probabilities that are affected, but the possible events could change as well. A probability is associated with each individual outcome.
As the tree progresses, these probabilities would be conditional upon the preceding outcomes.

The following is a probability tree for flipping a coin three times. Note that the subsequent flips are independent of the preceding flips. Every flip of a coin has a 50-50 (i.e., ½ or 0.5 probability) chance of being heads or tails.

[Probability tree: three successive coin flips, each branching into Heads (1/2) and Tails (1/2), giving eight terminal nodes]

$$P(HHH) = \tfrac{1}{2} \cdot \tfrac{1}{2} \cdot \tfrac{1}{2} = \tfrac{1}{8} = 0.125$$
$$P(HHT) = \tfrac{1}{2} \cdot \tfrac{1}{2} \cdot \tfrac{1}{2} = \tfrac{1}{8} = 0.125$$
$$P(HTH) = \tfrac{1}{2} \cdot \tfrac{1}{2} \cdot \tfrac{1}{2} = \tfrac{1}{8} = 0.125$$
$$P(HTT) = \tfrac{1}{2} \cdot \tfrac{1}{2} \cdot \tfrac{1}{2} = \tfrac{1}{8} = 0.125$$
$$P(THH) = \tfrac{1}{2} \cdot \tfrac{1}{2} \cdot \tfrac{1}{2} = \tfrac{1}{8} = 0.125$$
$$P(THT) = \tfrac{1}{2} \cdot \tfrac{1}{2} \cdot \tfrac{1}{2} = \tfrac{1}{8} = 0.125$$
$$P(TTH) = \tfrac{1}{2} \cdot \tfrac{1}{2} \cdot \tfrac{1}{2} = \tfrac{1}{8} = 0.125$$
$$P(TTT) = \tfrac{1}{2} \cdot \tfrac{1}{2} \cdot \tfrac{1}{2} = \tfrac{1}{8} = 0.125$$

Each of the terminal nodes represents the intersection of the outcomes along the path. For example, P(HTH) is really P(H ∩ T ∩ H) or, even more specifically, P(1st H ∩ 2nd T ∩ 3rd H). The joint probabilities are calculated at the end of each path by multiplying the probabilities along the path.

The probability tree can then be used to assist with various probability calculations.

What is the probability of flipping exactly three Heads?

Looking at the tree, we can see there is only one outcome in which all three flips resulted in Heads. The probability of this single outcome is

$$P(3\ H) = \frac{1}{8} = 0.125$$

What is the probability of flipping exactly one Heads?

Looking at the tree, we can see there are three outcomes in which the three flips result in only a single Heads. Thus, the probability of this is

$$P(1\ H) = \frac{1}{8} + \frac{1}{8} + \frac{1}{8} = \frac{3}{8} = 0.375$$

What is the probability of flipping at least one Heads?

Looking at the tree, we can see that seven outcomes have at least one Heads being flipped. Thus, the probability of this is

$$P(\ge 1\ H) = \frac{1}{8} + \frac{1}{8} + \frac{1}{8} + \frac{1}{8} + \frac{1}{8} + \frac{1}{8} + \frac{1}{8} = \frac{7}{8} = 0.875$$

With this question, we could also have considered using the complement. The opposite of “at least one Heads” is “no Heads”. There is only one outcome in which no Heads are flipped.
This could be used as follows:

$$P(\ge 1\ H) = 1 - P(0\ H) = 1 - \frac{1}{8} = \frac{7}{8} = 0.875$$

Given that the first flip was Heads, what is the probability of flipping at least two Heads?

For this, we can apply the conditional probability formula. Our entire focus is on the top half of the tree, where the first flip was Heads. Here, 3 of the outcomes have a first flip of Heads and at least 2 total Heads, and that is out of 4 total outcomes where the first flip was Heads. The numerator and denominator, when expressed as probabilities, would each be over 8, the total number of outcomes.

$$P(\ge 2\ H \mid \text{1st } H) = \frac{P(\ge 2\ H \cap \text{1st } H)}{P(\text{1st } H)} = \frac{3/8}{4/8} = \frac{3}{4} = 0.750$$

Now let’s look at a dependent case:

Imagine that approximately 35% of adult smokers aged 65 and older have lung cancer. A particular cancer screening method will correctly provide a positive result (indicate the presence of cancer) 96% of the time in patients with cancer and will correctly provide a negative result (indicate an absence of cancer) 92% of the time in patients without cancer.

We will begin by drawing the structure of the tree. From above we know that the test result (positive or negative) depends on cancer status. So, whether or not an individual has cancer should be the first event, and the test result will be the second event.

Then we can add in the probabilities given in the scenario above.

P(cancer) = 0.35
P(+ | cancer) = 0.96
P(− | no cancer) = 0.92

All the branches out of a single node should add up to one (1.00). We can use this fact to fill in the missing probabilities:

P(no cancer) = 1 − P(cancer) = 1 − 0.35 = 0.65
P(− | cancer) = 1 − P(+ | cancer) = 1 − 0.96 = 0.04
P(+ | no cancer) = 1 − P(− | no cancer) = 1 − 0.92 = 0.08

Now that the tree is filled in, we would calculate the joint probability at each terminal node. From here, we could answer various probability questions.

What percent of smokers 65 or over would test positive for cancer?
What is the likelihood that a smoker who tested positive for cancer does not actually have cancer?

What is the likelihood that a smoker who tested negative for cancer does not actually have cancer?

Notice that using a probability tree has allowed us to flip the order of the condition.

Note: There is a formal rule called Bayes’ Theorem that leads to this result, but the formula looks more complicated than the process we followed. You can look up Bayes’ Theorem separately if you’re interested. It will not be tested specifically.
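The tree process for the screening example can also be followed in code: multiply along each path to get the joint probability at each terminal node, add the relevant nodes, then divide to flip the condition. This is a sketch only, using the probabilities stated in the scenario; the function name `two_stage_tree` and the dictionary layout are illustrative assumptions.

```python
def two_stage_tree(p_cancer, p_pos_given_cancer, p_neg_given_no_cancer):
    """Return the joint probability at each terminal node of the two-stage tree."""
    # Branches leaving a node must sum to 1, which gives the missing probabilities.
    p_no_cancer = 1 - p_cancer
    p_neg_given_cancer = 1 - p_pos_given_cancer
    p_pos_given_no_cancer = 1 - p_neg_given_no_cancer
    # Multiplication Rule along each path: P(A and B) = P(A) * P(B | A)
    return {
        ("cancer", "+"): p_cancer * p_pos_given_cancer,
        ("cancer", "-"): p_cancer * p_neg_given_cancer,
        ("no cancer", "+"): p_no_cancer * p_pos_given_no_cancer,
        ("no cancer", "-"): p_no_cancer * p_neg_given_no_cancer,
    }


joint = two_stage_tree(0.35, 0.96, 0.92)

# Terminal nodes are mutually exclusive, so their probabilities can simply be added.
p_positive = joint[("cancer", "+")] + joint[("no cancer", "+")]
p_negative = joint[("cancer", "-")] + joint[("no cancer", "-")]

# Conditional probability: P(A | B) = P(A and B) / P(B)
p_no_cancer_given_positive = joint[("no cancer", "+")] / p_positive
p_no_cancer_given_negative = joint[("no cancer", "-")] / p_negative

print(f"P(+) = {p_positive:.4f}")
print(f"P(no cancer | +) = {p_no_cancer_given_positive:.4f}")
print(f"P(no cancer | -) = {p_no_cancer_given_negative:.4f}")
```

Under these inputs the four joint probabilities are 0.336, 0.014, 0.052 and 0.598, so P(+) = 0.388, P(no cancer | +) = 0.052/0.388 ≈ 0.1340, and P(no cancer | −) = 0.598/0.612 ≈ 0.9771.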