Understanding Randomness PDF
Document Details
Uploaded by PalatialSakura
Summary
This chapter explores the concept of randomness. It discusses different methods of generating randomness, including pseudorandom numbers and random number generators. It explains why randomness is important in statistics.
Full Transcript
10 Understanding Randomness

We all know what it means for something to be random. Or do we?¹ Many children’s games rely on chance outcomes. Rolling dice, spinning spinners, and shuffling cards all select at random. Adult games use randomness as well, from card games to lotteries to Bingo. What’s the most important aspect of the randomness in these games? It must be fair.

What is it about random selection that makes it seem fair? It’s really two things. First, nobody can guess the outcome before it happens. Second, when we want things to be fair, usually some underlying set of outcomes will be equally likely (although in many games, some combinations of outcomes are more likely than others).

The most decisive conceptual event of twentieth century physics has been the discovery that the world is not deterministic.... A space was cleared for chance.
(Ian Hacking, The Taming of Chance)

Randomness is not always what we might think of as “at random.” Random outcomes have a lot of structure, especially when viewed in the long run. You can’t predict how a fair coin will land on any single toss, but you’re pretty confident that if you flipped it thousands of times you’d see about 50% heads. As we will see, randomness is an essential tool of Statistics.

Statisticians don’t think of randomness as the annoying tendency of things to be unpredictable or haphazard. Statisticians use randomness as a tool. In fact, without deliberately applying randomness, we couldn’t do most of Statistics, and this book would stop right about here.²

But truly random values are surprisingly hard to get. Just to see how fair humans are at selecting, pick a number at random from the top of the next page. Go ahead. Turn the page, look at the numbers quickly, and pick a number at random. Ready? Go.

¹ Don’t say “random” when you mean “unexpected,” as in “I got a random call from Jose last night.” Really? Your friend Jose was dialing random numbers and just happened to phone you? Ri-i-i-i-ight!
² Don’t get your hopes up.

266 PART III Gathering Data

1   2   3   4

It’s Not Easy Being Random

Did you pick 3? If so, you’ve got company. Almost 75% of all people pick the number 3. About 20% pick either 2 or 4. If you picked 1, well, consider yourself a little different. Only about 5% choose 1. Psychologists have proposed reasons for this phenomenon, but for us, it simply serves as a lesson that we’ve got to find a better way to choose things at random.

The generation of random numbers is too important to be left to chance.
(Robert R. Coveyou, Oak Ridge National Laboratory)

So how should we generate random numbers? It’s surprisingly difficult to get random values even when they’re equally likely. Computers, calculators, and today smartphones have become a way to generate random numbers. Even though they often do much better than humans, they can’t generate truly random numbers either. Start a program or app from the same place, and it will always follow exactly the same path, so the numbers it generates are not truly random. Technically, “random” numbers generated this way are pseudorandom numbers. Fortunately, pseudorandom values are virtually indistinguishable from truly random numbers, and that’s usually good enough.

There are ways to generate random numbers so that they are both equally likely and truly random. There are published tables of carefully generated random numbers.³ Or, we can find genuinely random digits on the Internet.
The sites use methods like timing the decay of a radioactive element to generate truly random digits.⁴ A string of random digits might look like this:

2217726304387410092537086270581997622725849795907032825001108963
3217535822643800292254644943760642389043766557204107354186024508
8906427308645681412198226653885873285801699027843110380420067664
8740522639824530519902027044464984322000946238678577902639002954
8887003319933147508331265192321413908608674496383528968974910533
6944182713168919406022181281304751019321546303870481407676636740
6070204916508913632855351361361043794293428486909462881431793360
7706356513310563210508993624272872250535395513645991015328128202

The best ways we know to generate data that give a fair and accurate picture of the world rely on randomness, and the ways in which we draw conclusions from those data depend on the randomness, too. If this sounds familiar to you, it should. In previous chapters we’ve shared with you some simulations we’ve done that use randomness to consider things like “accidental correlation.” In this unit we’re going to let you take the reins and start designing and conducting simulations yourself.

An ordinary deck of playing cards, like the ones used in bridge and many other card games, consists of 52 cards. There are numbered cards (2 through 10), and face cards (Jack, Queen, King, Ace) whose value depends on the game you are playing. Each card is also marked by one of four suits (clubs, diamonds, hearts, or spades) whose significance is also game-specific.

³ Believe it or not, in the mid-1900s scientists often had entire books on their shelves that contained nothing but tables of random digits! You’ll find a single table of random digits like that in the back of this book.
⁴ For example, www.random.org or www.randomnumbers.info.
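The point made earlier in this section, that a program started from the same place always follows the same path, can be illustrated with a short sketch. This is only an illustration under stated assumptions: it uses Python's standard `random` module, and the helper name `pseudorandom_digits` and the seed values 2024 and 2025 are hypothetical choices, not anything from the text.

```python
import random


def pseudorandom_digits(seed, n):
    """Return n pseudorandom digits (0-9) from a generator started at `seed`.

    The function name and the seeds used below are illustrative assumptions.
    """
    rng = random.Random(seed)  # private generator; the seed fixes its starting point
    return [rng.randrange(10) for _ in range(n)]


first_run = pseudorandom_digits(seed=2024, n=20)
second_run = pseudorandom_digits(seed=2024, n=20)  # same starting point
other_run = pseudorandom_digits(seed=2025, n=20)   # different starting point

# Identical seeds reproduce the identical sequence of "random" digits.
print(first_run == second_run)
print("".join(str(d) for d in first_run))
print("".join(str(d) for d in other_run))
```

Anyone who knows both the algorithm and the starting point can reproduce the whole sequence, which is why pseudorandom digits are fine for classroom simulations but can be a weakness where unpredictability matters.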
THE ACHILLES’ HEEL OF PSEUDORANDOMNESS

When people play poker using playing cards, the very visible act of shuffling the deck assures all players that the cards dealt will be unpredictable. But poker machines, found in many U.S. casinos, typically shuffle the virtual deck of cards used in the game using pseudorandom numbers. These not-truly-random numbers seem unpredictable, but in fact are generated using sophisticated hidden computer code; if you know the code, you can predict all the numbers. In 2014, a group of Russian hackers figured out this hidden code for a collection of poker machines in the United States. Four of them were soon earning about $250,000 a week between them by placing one-cent bets when they knew they would lose and $100 bets when they knew they would win. Interestingly, it was statistics that helped the casinos (and later the cops) detect this behavior! Some of those involved went to jail.⁵

Let’s Simulate!

Suppose a cereal manufacturer puts pictures of famous athletes on cards in boxes of cereal in the hope of boosting sales. The manufacturer announces that 20% of the boxes contain a picture of basketball star LeBron James, 30% a picture of race car driver Danica Patrick, and the rest a picture of golf champion Dustin Johnson. You want all three pictures. How many boxes of cereal do you expect to have to buy in order to get the complete set?

How can we answer questions like this? Well, one way is to buy hundreds of boxes of cereal to see what might happen. But let’s not. Instead, we’ll consider using a random model. Why random? When we pick a box of cereal off the shelf, we don’t know what picture is inside. We’ll assume that the pictures are randomly placed in the boxes and that the boxes are distributed randomly to stores around the country. Why a model? Because we won’t actually buy the cereal boxes.
We can’t afford all those boxes and we don’t want to waste food. So we need an imitation of the real process that we can manipulate and control. In short, we’re going to simulate reality.

A simulation mimics reality by using random numbers to represent the outcomes of real events. Just as pilots use flight simulators to learn about and practice real situations, we can learn a great deal about the real events by carefully modeling the randomness and analyzing the simulation results.

The question we’ve asked is how many boxes do you expect to buy to get a complete card collection. But we can’t answer our question by completing a card collection just once. We want to understand the typical number of boxes to open, how that number varies, and, often, the shape of the distribution. So we’ll have to do this over and over. We call each time we obtain a simulated answer to our question a trial.

For the sports cards, a trial’s outcome is the number of boxes. We’ll need at least 3 boxes to get one of each card, but with really bad luck, you could empty the shelves of several supermarkets before finally finding the card you lack to get all 3. So, the possible outcomes of a trial are 3, 4, 5, or lots more. But our simulation can’t simply pick one of those numbers at random, because they’re not equally likely. We’d be surprised if we only needed 3 boxes to get all the cards, but we’d probably be even more surprised to find that it took exactly 7,359 boxes. In fact, the reason we’re doing the simulation is that it’s hard to guess how many boxes we’d expect to have to open.

IT’S ALL RANDOM!
Modern physics has shown that randomness is not just a mathematical game; it is fundamentally the way the universe works.
Regardless of improvements in data collection or in computer power, the best we can ever do, according to quantum mechanics... is predict the probability that an electron, or a proton, or a neutron, or any other of nature’s constituents, will be found here or there. Probability reigns supreme in the microcosmos.
(Brian Greene, The Fabric of the Cosmos: Space, Time, and the Texture of Reality, p. 91)

⁵ www.abc.net.au/news/science/2017-06-13/dr-karl-how-russian-cheats-beat-the-pokies/8607598

Building a Simulation

We know how to find equally likely random digits: roll a ten-sided die, for example, or look in a random digits table, or use a computer program. How can we get from there to simulating the trial outcomes? We know the relative frequencies of the cards: 20% LeBron, 30% Danica, and 50% Dustin. So, we can interpret the digits 0 and 1 as finding LeBron; 2, 3, and 4 as finding Danica; and 5 through 9 as finding Dustin to simulate opening one box. Opening one box is the basic building block, called a simulation component. But the component’s outcome isn’t the result we want. We need to observe a sequence of components until our card collection is complete. The trial’s outcome is called the response variable; for this simulation that’s the number of components (boxes) in the sequence.

Let’s look at the steps for making a simulation:

Specify how to model a component outcome using equally likely random digits:

1. Identify the component to be repeated. In this case, our component is the opening of a box of cereal.
2. Explain how you will model the component’s outcome. The digits from 0 to 9 are equally likely to occur. Because 20% of the boxes contain LeBron’s picture, we’ll use 2 of the 10 digits to represent that outcome. Three of the 10 digits can model the 30% of boxes with Danica’s cards, and the remaining 5 digits can represent the 50% of boxes with Dustin. One possible assignment of the digits, then, is

   0, 1             LeBron
   2, 3, 4          Danica
   5, 6, 7, 8, 9    Dustin

Specify how to simulate trials:

3. Explain how you will combine the components to model a trial.
We pretend to open boxes (repeat components) until our collection is complete. We do this by looking at each random digit and indicating what picture it represents. We continue until we’ve found all three.
4. State clearly what the response variable is. What are we interested in? We want to find out the number of boxes it might take to get all three pictures.

Put it all together to run the simulation:

5. Run several trials. For example, consider the third line of random digits shown earlier (p. 266):

   8906427308645681412198226653885873285801699027843110380420067664

   Let’s see what happens. The first random digit, 8, means you get Dustin’s picture. So the first component’s outcome is Dustin. The second digit, 9, means Dustin’s picture is also in the next box. Continuing to interpret the random digits, we get LeBron’s picture (0) in the third, Dustin’s (6) again in the fourth, and finally Danica (4) on the fifth box. Because we’ve now found all three pictures, we’ve finished one trial of our simulation. This trial’s outcome is 5 boxes. Now we keep going, running more trials by looking at the rest of our line of random digits:

   89064 2730 8645681 41219 822665388587328580 169902 78431 1038 042006 7664

   It’s best to create a chart to keep track of what happens:

   Trial Number   Component Outcomes                                            Trial Outcomes: y = Number of boxes
   1              89064 = Dustin, Dustin, LeBron, Dustin, Danica                5
   2              2730 = Danica, Dustin, Danica, LeBron                         4
   3              8645681 = Dustin, Dustin, Danica,... , LeBron                 7
   4              41219 = Danica, LeBron, Danica, LeBron, Dustin                5
   5              822665388587328580 = Dustin, Danica,... , LeBron              18
   6              169902 = LeBron, Dustin, Dustin, Dustin, LeBron, Danica       6
   7              78431 = Dustin, Dustin, Danica, Danica, LeBron                5
   8              1038 = LeBron, LeBron, Danica, Dustin                         4
   9              042006 = LeBron, Danica, Danica, LeBron, LeBron, Dustin       6
   10             7664 … = Dustin, Dustin, Dustin, Danica …                     ?

Analyze the response variable:

6. Collect and summarize the results of all the trials.
You know how to summarize and display a response variable. You’ll certainly want to report the shape, center, and spread, and depending on the question asked, you may want to include more.
7. State your conclusion, as always, in the context of the question you wanted to answer. Based on this simulation, we estimate that customers hoping to complete their card collection will need to open a median of 5 boxes, but it could take a lot more.

[Figure: display of the trial outcomes; axis label: # of Boxes]

If you fear that these may not be accurate estimates because we ran only nine trials, you are absolutely correct. The more trials the better, and nine is woefully inadequate. How many is enough? We’ll explore that question later in this chapter.

FOR EXAMPLE Simulating a Dice Game

The game of 21 can be played with an ordinary 6-sided die. Competitors each roll the die repeatedly, trying to get the highest total less than or equal to 21. If your total exceeds 21, you lose. Suppose your opponent has rolled an 18. Your task is to try to beat him by getting more than 18 points without going over 21. How many rolls do you expect to make, and what are your chances of winning?

QUESTION: How will you simulate the components?
ANSWER: A component is one roll of the die. I’ll simulate each roll by looking at a random digit from a table or an Internet site. The digits 1 through 6 will represent the results on the die; I’ll ignore digits 7–9 and 0.

THE EASY WAY
Some internet sites and (Good News!) your graphing calculator will allow you to simply generate random digits from 1 to 6, just like a die. That makes things much easier, because then you don’t need to ignore the bogus “rolls” of 0, 7, 8, and 9 that show up in random digit tables.

QUESTION: How will you combine components to model a trial? What’s the response variable?
ANSWER: I’ll add components until my total is greater than 18, counting the number of rolls. If my total is greater than 21, it is a loss; if not, it is a win. There are two response variables. I’ll count the number of times I roll the die, and I’ll keep track of whether I win or lose.

QUESTION: How would you use these random digits to run trials? Show your method clearly for two trials.

   91129 58757 69274 92380 82464 33089

ANSWER: I’ve marked the discarded digits in color (shown here in parentheses).

   Trial #1: (9) 1 1 2 (9) 5 (8) (7) 5 (7) 6
   Total:    1, 2, 4, 9, 14, 20
   Outcomes: 6 rolls, won

   Trial #2: (9) 2 (7) 4 (9) 2 3 (8) (0) (8) 2 4 6
   Total:    2, 6, 8, 11, 13, 17, 23
   Outcomes: 7 rolls, lost

QUESTION: Suppose you run 30 trials, getting the outcomes tallied here. What is your conclusion?

   Number of rolls   Tally
   4                 >>>>
   5                 >>>> >>>>
   6                 >>>> >>>> >
   7                 >>>>
   8                 >

   Result   Tally
   Won      >>>> >>>> >>>> >>>> >
   Lost     >>>> >>>>

ANSWER: Based on my simulation, when competing against an opponent who has a score of 18, I expect my turn to usually last 5 or 6 rolls, and I should win about 70% of the time.

JUST CHECKING

The baseball World Series consists of up to seven games. The first team to win four games wins the series. The first two are played at one team’s home ballpark (Team A), the next three at the other team’s park (Team B), and the final two (if needed) are played back at Team A’s park. Records over the past century show that there is a home field advantage; in any game the home team has about a 55% chance of winning. Does the current system of alternating ballparks even out the home field advantage? How often will Team A, who begins at home, win the series?

Let’s set up the simulation:
1. What is the component to be repeated?
2. How will you model each component from equally likely random digits?
3. How will you model a trial by combining components?
4. What is the response variable?
5. How will you analyze the response variable?

STEP-BY-STEP EXAMPLE Simulation

Fifty-seven students participated in a lottery for a particularly desirable dorm room: a triple with a fireplace and private bath in the tower. Twenty of the participants were members of the same varsity team.
When all three winners were members of the team, the other students cried foul.

QUESTION: Could an all-team outcome reasonably be expected to happen if everyone had a fair shot at the room?

THINK

PLAN State the problem. Identify the important parts of your simulation.
   I’ll use a simulation to investigate whether it’s unlikely that three varsity athletes would get the great room in the dorm if the lottery were fair.

COMPONENTS Identify the components.
   A component is the selection of a student.

OUTCOMES State how you will model each component using equally likely random digits. You can’t just use the digits from 0 to 9 because the outcomes you are simulating are not multiples of 10%. There are 20 and 37 students in the two groups. This time you must use pairs of random digits (and ignore some of them) to represent the 57 students.
   I’ll look at two-digit random numbers.
   Let 00–19 represent the 20 varsity applicants.
   Let 20–56 represent the other 37 applicants.
   Skip 57–99. If I get a number in this range, I’ll throw it away and go back for another two-digit random number.

TRIAL Explain how you will combine the components to simulate a trial. In each of these trials, you can’t choose the same student twice, so you’ll need to ignore a random number if it comes up a second or third time. Be sure to mention this in describing your simulation.
   Each trial consists of identifying pairs of digits as V (varsity) or N (nonvarsity) until 3 people are chosen, ignoring out-of-range or repeated numbers (X). I can’t put the same person in the room twice.

RESPONSE VARIABLE Define your response variable.
   The response variable is whether or not all three selected students are on the varsity team.

SHOW

MECHANICS Run several trials. Carefully record the random numbers, indicating 1) the corresponding component outcomes (here, varsity, nonvarsity, or ignored number) and 2) the value of the response variable.

   Trial Number   Component Outcomes                            All Varsity?
   1              74 02 94 39 02 77 55 = X V X N X X N          No
   2              18 63 33 25 = V X N N                         No
   3              05 45 88 91 56 = V N X X N                    No
   4              39 09 07 = N V V                              No
   5              65 39 45 95 43 = X N N X N                    No
   6              98 95 11 68 77 12 17 = X X V X X V V          Yes
   7              26 19 89 93 77 27 = N V X X X N               No
   8              23 52 37 = N N N                              No
   9              16 50 83 44 = V N X N                         No
   10             74 17 46 85 09 = X V N X V                    No

ANALYZE Summarize the results across all trials to answer the initial question.
   “All varsity” occurred once, or 10% of the time.

TELL

CONCLUSION Describe what the simulation shows, and interpret your results in the context of the real world.
   In my simulation of “fair” room draws, the three people chosen were all varsity team members only 10% of the time. While this result could happen by chance, it is not particularly likely. I’m suspicious, but I’d need many more trials and a smaller frequency of the all-varsity outcome before I would make an accusation of unfairness.

TI TIPS Generating Random Numbers

Instead of using coins, dice, cards, or tables of random numbers, you may decide to use your calculator for simulations. There are several random number generators offered in the MATH PROB menu. randInt( is of particular importance. This command will produce any number of random integers in a specified interval. In the dialog box, the lower and upper bounds determine the interval and n indicates how many random numbers you want to simulate. It’s okay to leave n blank if you only want to simulate one number. Here are some examples showing how to use randInt for simulations:

◆ randInt(0,1) randomly chooses a 0 or a 1. This is an effective simulation of a coin toss. You could let 0 represent tails and 1 represent heads.
◆ randInt(1,6) produces a random integer from 1 to 6, a good way to simulate rolling a die.
◆ randInt(1,6,2) simulates rolling two dice.
To do several rolls in a row, just hit ENTER repeatedly.
◆ randInt(0,9,5) produces five random integers that might represent the pictures in the cereal boxes. Our run gave us one LeBron (0, 1), two Danicas (2, 3, 4), and two Dustins (5–9).
◆ randInt(0,56,3) produces three random integers between 0 and 56, a nice way to simulate the dorm room lottery. The window shows 4 trials, but we would skip the third one because one student was chosen twice. In none of the remaining 3 trials did three athletes (0–19) win.

How Many Trials?

When we showed you how to do simulations, first looking for the cereal box pictures and then checking the fairness of dorm room assignments, we ran just 10 trials. Let’s see why that’s not really enough. How? With a simulation, of course!

As an easy example, we’ll just pretend to open cereal boxes looking for pictures of LeBron James. While he’s in 20% of the boxes, that doesn’t tell us what we’ll actually find as we go box by box. The table shows the results of the first 10 trials.

   The simulation...                            Results so far...
   Box number   Random digit   Picture found    Number of LeBrons   Percent LeBron
   1            8              Dustin           0 out of 1          0%
   2            3              Danica           0 out of 2          0%
   3            3              Danica           0 out of 3          0%
   4            0              LeBron           1 out of 4          25%
   5            6              Dustin           1 out of 5          20%
   6            1              LeBron           2 out of 6          33%
   7            9              Dustin           2 out of 7          28%
   8            2              Danica           2 out of 8          25%
   9            9              Dustin           2 out of 9          22%
   10           1              LeBron           3 out of 10         30%

Remember that the intent of a simulation is to gain insight about situations we don’t understand. If we didn’t already know that LeBron’s picture is in 20% of the boxes, these 10 trials wouldn’t tell us that. At best, we might feel comfortable guessing that fewer than half of the boxes contain LeBron’s picture, but 10 trials just isn’t enough to say anything very definitive.

For the homework⁶ exercises we suggest you do 20 trials. How much better is that? Let’s open more cereal boxes.
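The running percentages in the table above can be recomputed with a few lines of code, and the same bookkeeping extends to many more boxes. The sketch below is illustrative only: the ten digits and the digit-to-picture assignment come from the text, while the function names, the use of Python's `random` module, and the seed value 1 are assumptions made for the example.

```python
import random

# The ten random digits from the table above.
table_digits = [8, 3, 3, 0, 6, 1, 9, 2, 9, 1]


def picture_for(digit):
    """Map a digit to a picture using the chapter's assignment."""
    if digit <= 1:
        return "LeBron"   # digits 0, 1 (20%)
    if digit <= 4:
        return "Danica"   # digits 2, 3, 4 (30%)
    return "Dustin"       # digits 5 through 9 (50%)


def running_percent_lebron(digits):
    """Percent of boxes containing LeBron after each box is opened."""
    lebrons = 0
    percents = []
    for boxes_opened, digit in enumerate(digits, start=1):
        if picture_for(digit) == "LeBron":
            lebrons += 1
        percents.append(100 * lebrons / boxes_opened)
    return percents


table_percents = running_percent_lebron(table_digits)
for box, pct in enumerate(table_percents, start=1):
    print(f"box {box:2d}: {pct:5.1f}% LeBron")

# Many more boxes, using pseudorandom digits (the seed is an arbitrary assumption).
rng = random.Random(1)
many_digits = [rng.randrange(10) for _ in range(1000)]
many_percents = running_percent_lebron(many_digits)
print(f"after 1000 boxes: {many_percents[-1]:.1f}% LeBron")
```

Printing selected entries of `many_percents` (for example after 10, 20, 100, and 1000 boxes) gives a rough numerical view of how the estimate behaves as the number of trials grows.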
Look at the graph displaying LeBron’s percentage after each of the first 10 trials in the table above and for 10 more trials.

[Figure: % of Cards for LeBron James (vertical axis, 0 to 35) versus Number of Trials (horizontal axis, 0 to 25)]

Now would we conclude that 20% was correct? Probably not, even though the estimated percentages do seem to be settling down a bit. It appears 20 trials is still too few.

So let’s go big. We used a computer to run 1000 trials. The graph on the next page shows what happened.

[Figure: % of Cards for LeBron James (vertical axis, 0 to 35) versus Number of Trials (horizontal axis, 0 to 1200)]

It appears that in this simulation there were quite a few LeBron pictures in the first 100 (or so) boxes, but as the number of trials mounted the percentage drifted toward the true value of 20%. With 1,000 trials we might be able to make a pretty good guess about the cereal boxes. Frankly, though, this is a pretty simple situation. In the real world, simulations are used to explore very complex issues like climate change, election outcomes, and even national defense. Those investigations require tens or even hundreds of thousands of trials!⁷

⁶ Stop making that face. You knew there’d be homework.

WHAT CAN GO WRONG?

◆ Don’t overstate your case. Let’s face it: In some sense, a simulation is always wrong. After all, it’s not the real thing. We didn’t buy any cereal or run a room draw. So beware of confusing what really happens with what a simulation suggests might happen. Never forget that future results will not match your simulated results exactly.
◆ Model outcome chances accurately. A common mistake in constructing a simulation is to adopt a strategy that may appear to produce the right kind of results, but that does not accurately model the situation. For example, in our room draw, we could have gotten 0, 1, 2, or 3 team members. Why not just see how often these digits occur in random digits from 0 to 9, ignoring the digits 4 and up?
   32179005973792524138
   321xx00xxx3xx2x2x13x

  This “simulation” makes it seem fairly likely that three team members would be chosen. There’s a big problem with this approach, though: The digits 0, 1, 2, and 3 occur with equal frequency among random digits, making each outcome appear to happen 25% of the time. In fact, the selection of 0, 1, 2, or all 3 team members are not all equally likely outcomes. In our correct simulation, we estimated that all 3 would be chosen only about 10% of the time. If your simulation overlooks important aspects of the real situation, your model will not be accurate.
◆ Run enough trials. Simulation is cheap and fairly easy to do. Don’t try to draw conclusions based on 5 or 10 trials (even though we did for illustration purposes here). We’ll get a better handle on how many trials to use in later chapters. For now, err on the side of large numbers of trials.

Simulations. Improve your predictions by running thousands of trials.

⁷ We hope that makes you feel better about doing just 20 trials for the homework. See, we’re actually being nice to you!

WHAT HAVE WE LEARNED?

We’ve learned to harness the power of randomness. We’ve learned that a simulation model can help us investigate a question for which many outcomes are possible, we can’t (or don’t want to) collect data, and a mathematical answer is hard to calculate. We’ve learned how to base our simulation on random values generated by a computer, generated by a randomizing device such as a die or spinner, or found on the Internet. Like all models, simulations can provide us with useful insights about the real world.

TERMS

Random: An outcome is random if we know the possible values it can have, but not which particular value it takes. A random outcome is free of human influence. (p. 265)
Generating random numbers: Random numbers are hard to generate. Nevertheless, several Internet sites offer an unlimited supply of equally likely random values. (p. 266)
Simulation: A simulation models a real-world situation by using random-digit outcomes to mimic the uncertainty of a response variable of interest. (p. 267)
Trial: The sequence of several components representing events that we are pretending will take place. (p. 267)
Simulation component: A component uses equally likely random digits to model simple random occurrences whose outcomes may not be equally likely. (p. 268)
Response variable: Values of the response variable record the results of each trial with respect to what we were interested in. (p. 268)

ON THE COMPUTER Simulation

Simulations are best done with the help of technology simply because running more trials makes for a better simulation, and computers are fast. There are special computer programs designed for simulation, and most statistics packages and calculators can at least generate random numbers to support a simulation.

All technology-generated random numbers are pseudorandom. The random numbers available on the Internet may technically be better, but the differences won’t matter for any simulation of modest size. Pseudorandom number generators produce the next random value from the previous one by a specified algorithm. But they have to start somewhere. This starting point is called the “seed.” Most programs let you set the seed. There’s usually little reason to do this, but if you wish to, go ahead. If you reset the seed to the same value, the programs will generate the same sequence of “random” numbers.

APPLET Generate random numbers

EXERCISES

1. Random outcomes: For each of the following scenarios, decide if the outcome is random.
   a) Flip a coin to decide who takes out the trash. Is who takes out the trash random?
   b) A friend asks you to quickly name a professional sports team. Is the sports team named random?
   c) Names are selected out of a hat to decide roommates in a dormitory. Is your roommate for the year random?

2. More random outcomes: For each of the following scenarios, decide if the outcome is random.
   a) You enter a contest in which the winning ticket is selected from a large drum of entries. Was the winner of the contest random?
   b) When playing a board game, the number of spaces you move is decided by rolling a six-sided die. Is the number of spaces you move random?
   c) Before flipping a coin, your friend asks you to “call it.” Is your choice (heads or tails) random?

3. The lottery: Many states run lotteries, giving away millions of dollars if you match a certain set of winning numbers. How are those numbers determined? Do you think this method guarantees randomness? Explain.

4. Games: Many kinds of games people play rely on randomness. Cite three different methods commonly used in the attempt to achieve this randomness, and discuss the effectiveness of each.

5. Birth defects: The American College of Obstetricians and Gynecologists says that out of every 100 babies born in the United States, 3 have some kind of major birth defect. How would you assign random numbers to conduct a simulation based on this statistic?

6. Colorblind: By some estimates, about 10% of all males have some color perception defect, most commonly red–green colorblindness. How would you assign random numbers to conduct a simulation based on this statistic?

7. Geography: An elementary school teacher with 25 students plans to have each of them make a poster about two different states. The teacher first numbers the states (in alphabetical order, from 01-Alabama to 50-Wyoming), then uses a random number table to decide which states each kid gets. Here are the random digits:
      45921 01710 22892 37076
   a) Which two state numbers does the first student get?
   b) Which two state numbers go to the second student?

8. Get rich: Your state’s BigBucks Lottery prize has reached $100,000,000, and you decide to play. You have to pick five numbers between 1 and 60, and you’ll win if your numbers match those drawn by the state. You decide to pick your “lucky” numbers using a random number table. Which numbers do you play, based on these random digits?
      43680 98750 13092 76561 58712

9. Play the lottery: Some people play state-run lotteries by always playing the same favorite “lucky” number. Assuming that the lottery is truly random, is this strategy better, worse, or the same as choosing different numbers for each play? Explain.

10. Play it again, Sam: In Exercise 8 you imagined playing the lottery by using random digits to decide what numbers to play. Is this a particularly good or bad strategy? Explain.

11. Bad simulations: Explain why each of the following simulations fails to model the real situation properly:
   a) Use a random integer from 0 through 9 to represent the number of heads when 9 coins are tossed.
   b) A basketball player takes a foul shot. Look at a random digit, using an odd digit to represent a good shot and an even digit to represent a miss.
   c) Use random numbers from 1 through 13 to represent the denominations of the cards in a five-card poker hand.

12. More bad simulations: Explain why each of the following simulations fails to model the real situation:
   a) Use random numbers 2 through 12 to represent the sum of the faces when two dice are rolled.
   b) Use a random integer from 0 through 5 to represent the number of boys in a family of 5 children.
   c) Simulate a baseball player’s performance at bat by letting 0 = an out, 1 = a single, 2 = a double, 3 = a triple, and 4 = a home run.

13. Wrong conclusion: A Statistics student properly simulated the length of checkout lines in a grocery store and then reported, “The average length of the line will be 3.2 people.” What’s wrong with this conclusion?

14. Another wrong conclusion: After simulating the spread of a disease, a researcher wrote, “24% of the people contracted the disease.” What should the correct conclusion be?

15. Election: You’re pretty sure that your candidate for class president has about 55% of the votes in the entire school. But you’re worried that only 100 students will show up to vote. How often will the underdog (the one with 45% support) win? To find out, you set up a simulation.
   a) Describe how you will simulate a component.
   b) Describe how you will simulate a trial.
   c) Describe the response variable.

16. Two pair or three of a kind? When drawing five cards randomly from a deck, which is more likely, two pairs or three of a kind? A pair is exactly two of the same denomination. Three of a kind is exactly 3 of the same denomination. (Don’t count three 8’s as a pair; that’s 3 of a kind. And don’t count 4 of the same kind as two pair; that’s 4 of a kind, a very special hand.) How could you simulate 5-card hands? Be careful; once you’ve picked the 8 of spades, you can’t get it again in that hand.
   a) Describe how you will simulate a component.
   b) Describe how you will simulate a trial.
   c) Describe the response variable.

17. Cereal: In the chapter’s example, 20% of the cereal boxes contained a picture of LeBron James, 30% Danica Patrick, and the rest Dustin Johnson. Suppose you buy five boxes of cereal. Estimate the probability that you end up with a complete set of the pictures. Your simulation should have at least 20 runs.

18. Cereal again: Suppose you really want the LeBron James picture. How many boxes of cereal do you need to buy to be pretty sure of getting at least one? Your simulation should use at least 10 trials.

19. Multiple choice: You take a quiz with 6 multiple choice questions. After you studied, you estimated that you would have about an 80% chance of getting any individual question right. What are your chances of getting them all right? Use at least 20 trials.

20. Lucky guessing? A friend of yours who took the multiple choice quiz in Exercise 19 got all 6 questions right, but now claims to have guessed blindly on every question. If each question offered 4 possible answers, do you believe her? Explain, basing your argument on a simulation involving at least 10 trials.

21. Beat the lottery: Many states run lotteries to raise money. A Web site advertises that it knows “how to increase YOUR chances of Winning the Lottery.” They offer several systems and criticize others as foolish. One system is called Lucky Numbers. People who play the Lucky Numbers system just pick a “lucky” number to play, but maybe some numbers are luckier than others. Let’s use a simulation to see how well this system works.
   To make the situation manageable, simulate a simple lottery in which a single digit from 0 to 9 is selected as the winning number. Pick a single value to bet, such as 1, and keep playing it over and over. You’ll want to run at least 100 trials. (If you can program the simulations on a computer, run several hundred. Or generalize the questions to a lottery that chooses two- or three-digit numbers, for which you’ll need thousands of trials.)
   a) What proportion of the time do you expect to win?

[Figure: histogram; vertical axis: Frequency of Number of Tests; horizontal axis: Number of Tests (0 to 10); bar labels as extracted: 32, 27, 15, 7, 7, 4, 4, 3, 1]

26. Basketball strategy: Late in a basketball game, the team that is behind often fouls someone in an attempt to get the ball back. Sometimes the rules put the foul shooter in a “one-and-one” situation. This means that if the shooter misses the first free throw, he stops shooting and gets no points. But if he makes the first shot, he gets to shoot again. If he misses the second shot, he has scored one point; if he makes the second shot, he has scored two points. Suppose the opposing player has made 72% of his foul shots this season.
   a) Create a plan for a simulation to estimate the number of points he will score in a one-and-one situation.
b) Here is a display of the results of 100 trials of a simulation. b) Would you expect better results if you picked a “luckier” Use these results to estimate the average number of points he number, such as 7? (Try it if you don’t know.) Explain. will score in one-and-one situations. 22. Random is as random does The “beat the lottery” Web site Frequency of Points 50 49 discussed in Exercise 21 suggests that because lottery numbers 40 are random, it is better to select your bet randomly. For the 28 30 same simple lottery in Exercise 21 (random values from 0 to 9), 23 generate each bet by choosing a separate random value between 20 0 and 9. Play many games. What proportion of the time do 10 you win? 23. It evens out in the end The “beat the lottery” Web site of 0 1 2 Points Exercise 21 notes that in the long run we expect each value to turn up about the same number of times. That leads to their 27. Still learning? As in Exercise 25, assume that your chance of recommended strategy. First, watch the lottery for a while, passing the driver’s test is 34% the first time and 72% for sub- recording the winners. Then bet the value that has turned up the sequent retests. Estimate the percentage of those tested who still least, because it will need to turn up more often to even things do not have a driver’s license after two attempts. out. If there is more than one “rarest” value, just take the lowest one (because it doesn’t matter). Simulating the simplified lot- 28. Blood donors A person with type O-positive blood can receive tery described in Exercise 21, play many games with this sys- blood only from other type O donors. About 44% of the U.S. tem. What proportion of the time do you win? population has type O blood. At a blood drive, how many poten- tial donors do you expect to examine in order to get three units 24. Play the winner? Another strategy for beating the lottery is of type O blood? the reverse of the system described in Exercise 23. 
Simulate the simplified lottery described in Exercise 21. Each time, bet the 29. Free groceries To attract shoppers, a supermarket runs a weekly number that just turned up. The Web site suggests that this contest that involves “scratch-off” cards. With each purchase, cus- method should do worse. Does it? Play many games and see. tomers get a card with a black spot obscuring a message. When the spot is scratched away, most of the cards simply say, “Sorry— 25. Driving test You are about to take the road test for your driver’s please try again.” But during the week, 100 customers will get license. You hear that only 34% of candidates pass the test the cards that make them eligible for a drawing for free groceries. Ten first time, but the percentage rises to 72% on subsequent retests. of the cards say they may be worth $200, 10 others say $100, 20 Because most teenagers really want to drive, they keep taking may be worth $50, and the rest could be worth $20. To register the test until they pass! those cards, customers write their names on them and put them in a) Create a plan for a simulation to estimate the average number a barrel at the front of the store. At the end of the week the store of tests drivers take in order to get a license. manager draws cards at random, awarding the lucky customers b) The histogram in the next column shows the results of 100 trials free groceries in the amount specified on their cards. The drawings of a simulation. Use these results to estimate the average num- continue until the store has given away more than $500 of free ber of tests drivers take in order to get a license. groceries. Estimate the average number of winners each week. CHAPTER 10 Understanding Randomness 277 30. Find the ace A technology store holds a contest to attract 37. Teammates Four couples at a dinner party play a board game shoppers. Once an hour, someone at checkout is chosen at ran- after the meal. 
They decide to play as teams of two and to select dom to play in the contest. Here’s how it works: An ace and the teams randomly. All eight people write their names on slips four other cards are shuffled and placed face down on a table. of paper. The slips are thoroughly mixed, then drawn two at a The customer gets to turn over cards one at a time, looking for time. How likely is it that every person will be teamed with the ace. The person wins $100 of store credit if the ace is the someone other than the person he or she came to the party with? first card, $50 if it is the second card, and $20, $10, or $5 if it is the third, fourth, or last card chosen. What is the average dollar 38. Second team Suppose the couples in Exercise 37 choose the amount of store credit given away in the contest? Estimate with teams by having one member of each couple write their names a simulation. on the cards and the other people each pick a card at random. How likely is it that every person will be teamed with someone 31. The family Many couples want to have both a boy and a girl. other than the person he or she came with? If they decide to continue to have children until they have one child of each sex, what would the average family size be? 39. Job discrimination? A company with a large sales staff Assume that boys and girls are equally likely. announces openings for three positions as regional managers. Twenty-two of the current salespersons apply, 12 men and 32. Repeat? You are listening to music on your phone. Your music 10 women. After the interviews, when the company announces player is in shuffle mode. After one of your favorite songs is the newly appointed managers, all three positions go to women. played, you are surprised to hear another song played by the The men complain of job discrimination. Do they have a case? same artist. And then a third! 
This makes you wonder if shuffle Simulate a random selection of three people from the applicant mode really shuffles all that randomly. However, you realize pool, and make a decision about the likelihood that a fair pro- that you are listening to a playlist with only your five favorite cess would result in hiring all women. artists. So perhaps three in a row with only five different choices (all roughly equal in proportion) isn’t that unusual. Run 40. Smartphones A proud legislator claims that your state’s new a simulation to untangle this dilemma! law banning texting and hand-held phones while driving re- duced occurrences of infractions to less than 10% of all drivers. 33. Dice game You are playing a children’s game in which the While on a long drive home from your college, you notice a few number of spaces you get to move is determined by the rolling people seemingly texting. You decide to count everyone using of a die. You must land exactly on the final space in order to their smartphones illegally who pass you on the expressway for win. If you are 10 spaces away, how many turns might it take the next 20 minutes. It turns out that 5 out of the 20 drivers you to win? were actually using their phones illegally. Does this cast doubt on the legislator’s figure of 10%? Use a simulation to estimate 34. Doubles In the game of Monopoly you roll two 6-sided dice. the likelihood of seeing at least 5 out of 20 drivers using their If both dice show the same number, that is called doubles. If phones illegally if the actual usage rate is only 10%. Explain you roll doubles, you get to roll again. Your friend is lucky and your conclusion clearly. rolls doubles twice in a row. How often do you expect this to happen? (Hint: Using technology to generate two numbers between 1 and 6 is the best way to run this simulation.) JUST CHECKING 35. The hot hand A basketball player with a 65% shooting per- Answers centage has just made 6 shots in a row. 
The announcer says this player “is hot tonight! She’s in the zone!” Assume the player 1. The component is one game. takes about 20 shots per game. Is it unusual for her to make 6 or 2. I’ll generate random numbers and assign numbers from 00 more shots in a row during a game? to 54 to the home team’s winning and from 55 to 99 to the visitors’ winning. 36. The World Series The World Series is a “best of 7” situation. That is, the teams play until one team has won 4 games. That 3. I’ll generate components until one team wins 4 games. might happen if one team wins 4 in a row. But it might take 7 I’ll record which team wins the series. games until that happens. Suppose that sports analysts consider 4. The response is who wins the series. one team a bit stronger, with a 55% chance to win any individ- 5. I’ll calculate the proportion of wins by Team A (who starts ual game. Estimate the likelihood that the underdog (the team at home). with only a 45% chance) wins the series.