Summary

This document is a study guide for a secondary school statistics unit on normal and sampling distributions. It covers normal distributions, the 68-95-99.7 rule, z-scores, the Central Limit Theorem, and practice problems.

Full Transcript

Name: _______________________________________

RS1: Unit 4 Study Guide
Randomness in Data: Normal and Sampling Distributions (Topics 9, 12, 14, and 15)

Date | Knowledge Check Topic(s) | New Topics
Mon. Dec. 2 | - | Topic 9
Tues./Weds. Dec. 3/4 | - | Topic 12; Paper Part 2 Intro
Thurs./Fri. Dec. 5/6 | Topics 9 and 12 | Topic 14
Mon. Dec. 9 | - | Topic 15
Tues./Weds. Dec. 10/11 | Topics 14 and 15 | Review Unit 4
Thurs./Fri. Dec. 12/13 | Unit 4 Test | -

Check Your Understanding
In general, you should be finishing all the material in the Study Guide for each topic. Then you should check your answers with the key posted on Schoology. If you have any questions, ask in class or sign up for an 8th period for extra review time. Most topics also have practice in the Google Doc for the unit. The problems in the Google Doc also have solutions in the key. If you need more practice, there are Extra Practice problems for each topic. Most are from the textbook, but some are in their own Google Doc and many in this unit in particular are included in the Study Guide. When you get to the end of the unit, complete the Review Sheet in the Study Guide. The answers are at the end of the problems in the Study Guide.

Page 1

Topic 9 Revisited: Measures of Spread

Complete Activity 9-4 in your Google Doc.

What you discovered in Activity 9-4 is the Empirical Rule, which can be used for mound-shaped, symmetric distributions:

The Empirical Rule (68-95-99.7 Rule)
In a normal distribution, approximately 68% of the values lie within 1 standard deviation of the mean, approximately 95% of the values lie within 2 standard deviations of the mean, and approximately 99.7% of the values lie within 3 standard deviations of the mean.
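The three percentages in the Empirical Rule can be checked against the standard normal curve. The sketch below is not part of the original study guide; it assumes Python's standard-library `statistics.NormalDist`, and the helper name `area_within` is made up for illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)


def area_within(k):
    """Area under the normal curve within k standard deviations of the mean.

    Hypothetical helper: area to the left of +k minus area to the left of -k.
    """
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    # Prints roughly 0.6827, 0.9545, and 0.9973.
    print(f"within {k} SD: {area_within(k):.4f}")
```

The rounded values 68%, 95%, and 99.7% are approximations of these areas, which is why the rule is stated with "approximately".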
Complete Activity 9-5 in your Google Doc.

Using the box on textbook page 183, write the definition of standardization:

Write the formula for calculating z-score:

From the Watch Out on page 184, observations above the mean should have ____________ z-scores and observations below the mean should have ___________ z-scores.

Recall the Empirical Rule in order to begin the following exercise:

Empirical Rule
Within ±1σ: ~68% of data
Within ±2σ: ~95% of data
Within ±3σ: ~99.7% of data

Page 2

Dog Ages

Here is a sample of ages of the dogs available for adoption at the local Animal Shelter last week:

7, 4.5, 3, 2.6, 0.75, 2.1, 3.2, 9, 6, 5.5, 1.25

Label the number line of the graph below with the approximate mean (4 years) and ±3 standard deviations (of 2 years each) to the right and to the left of the mean.

1. Using the empirical rule, what % of the dogs’ ages are less than 4 years old? (Label the diagram and shade the appropriate region.)
2. Using the empirical rule, what % of the dogs’ ages are less than 6 years old? (Label the diagram and shade the appropriate region.)
3. Using the empirical rule, what % of the dogs’ ages are greater than 8 years old? (Label the diagram and shade the appropriate region.)
4. What % of the dogs’ ages are less than 5 years old? (Label the diagram and shade the appropriate region.)

Page 3

You cannot find this last probability using the Empirical Rule (68-95-99.7 Rule). To find the area under the curve that is to the left of 5, you need Calculus. Luckily, this Calculus has already been done and has been recorded in Table II on pages T-3 and T-4 in the back of your book. This is called the Standard Normal Probabilities Table, and it uses the standard normal distribution, which has a mean of 0 and a standard deviation of 1. A portion of the table is reproduced below. To use the table, you just have to find the z-score for your data value and then look up the z-score on the table to find the probability.
$$z = \frac{x - \mu}{\sigma} = \frac{5 - 4}{2} = \frac{1}{2} = 0.5$$

Once we calculate a z-score, we are using the standard normal distribution to find probability. Label the mean in the center as 0 and label ±3 standard deviations of 1 from the mean. Then mark the z-score of 0.5 on the number line of the curve and shade all the region under the curve that is to the left of 0.5. The probability (the area that you shaded) = 0.6915 or 69.15%.

On the table, the left-hand margin lists the units place and the tenths place of the z-score. The top margin shows the hundredths place of the z-score (and also has 0 in the tenths place). So, P(z ≤ 0.50) = 0.6915. The solution to this problem is often written P(x < 5) = P(z ≤ 0.50) = 0.6915 to show the data value from the original distribution. The question asked “less than”, but there is no difference between “less than” and “less than or equal to” because the “equal to” part is a single line of area which can’t really be measured (even with Calculus!).

Page 4

5. Use the table on the previous page to find P(z ≤ 0.54) =
6. What % of the dogs’ ages are less than 7.9 years old? (Be sure to label and shade the curve and write the solution.)
7. What % of the dogs’ ages are less than 8.24 years old? (Be sure to label and shade the curve and write the solution.)
8. Suppose you wanted to find the percentage of dogs that are between 7.9 and 8.24 years old. How could you use the answers from above to answer this question?

You can also use your technology to answer these questions instead of a table, or if you don’t have one. We are using the standard normal curve (mean of zero and standard deviation of 1) to answer these questions. You will learn how to use a web app created by the authors of our textbook, Rossman and Chance. You will also learn how to use your calculator. You will need to use your calculator on assessments (knowledge checks, unit tests, the 2nd Quarter Exam).
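The table lookup for P(x < 5) worked above can be mirrored in a few lines of code. This is a sketch rather than a method from the study guide: it assumes Python's standard-library `statistics.NormalDist` in place of Table II, and the names `z_score`, `mean_age`, and `sd_age` are illustrative.

```python
from statistics import NormalDist

mean_age = 4.0  # approximate mean of the dog ages, as given in the example
sd_age = 2.0    # approximate standard deviation, as given in the example


def z_score(x, mean, sd):
    """Standardize x: how many standard deviations it lies from the mean."""
    return (x - mean) / sd


z = z_score(5, mean_age, sd_age)

# Area to the left of z under the standard normal curve (mean 0, SD 1),
# which plays the role of the table lookup.
p_less_than_5 = NormalDist().cdf(z)

print(z, round(p_less_than_5, 4))  # 0.5 0.6915
```

The result agrees with the table value P(z ≤ 0.50) = 0.6915.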
Page 5

For the Rossman/Chance Normal Probability Calculator, enter the mean and SD, then click “Scale to Fit”. Use the two rows at the bottom to find the probabilities you are looking for; you can do less than (<) or greater than (>), in between, or outside (using both rows). If you click on the inequality sign, it changes direction.

Areas under the curve and the probabilities that a z-score falls within a specified interval are one and the same. Therefore, we can use this command/app to find the area, which is equivalent to the probability.

9. What % of the dogs’ ages are more than 8.24 years old? (Be sure to label and shade the curve and write the solution.)
10. What % of the dogs’ ages are between 2.1 and 7 years old? (Be sure to label and shade the curve and write the solution.)

While it is important to find the probabilities of falling within a certain region, we sometimes need the reverse: what score will ensure that you are in the top 25%? Here, you are given the area/probability and asked to find the corresponding data value. To do this, we use the invNorm( function on our calculator. invNorm( finds the number for which the corresponding area occurs to the left of that value. You are answering the question P(Z ≤ z) = given area, where you are finding little z. invNorm( is under 2nd, vars, choice 3.

Page 6

For example, what is the highest z-score for dog ages that will mark where the bottom 25% for a standard normal curve begins? What you are looking for is this picture where 25% of the area is shaded, which is represented by P(Z ≤ z) = 0.25. Since we don’t know the upper bound, we need to use invNorm(area, mean, standard deviation). We find that P(Z ≤ −0.6744) = 0.25. Since we used mean = 0 and SD = 1, the value given is a z-score. We can then work backwards and discover that:

$$-0.6744 = \frac{x - 4}{2} \quad\Rightarrow\quad -1.349 = x - 4 \quad\Rightarrow\quad x = 2.651$$

This means that 25% of the dogs’ ages are less than 2.651 years.
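The invNorm( calculation above can also be sketched in code. The example below is an illustration, not the calculator's implementation: it assumes Python's standard-library `NormalDist.inv_cdf`, and `inv_norm` is a hypothetical wrapper named to echo the calculator command.

```python
from statistics import NormalDist


def inv_norm(area, mean=0.0, sd=1.0):
    """Return the value with the given area to its LEFT under a normal curve.

    Hypothetical stand-in for invNorm(area, mean, standard deviation).
    """
    return NormalDist(mean, sd).inv_cdf(area)


# Step 1: z-score that cuts off the bottom 25% of the standard normal curve.
z = inv_norm(0.25)           # about -0.6745 (the guide truncates this to -.6744)

# Step 2: work backwards from z to a dog age using x = mean + z * SD.
x_from_z = 4 + z * 2         # about 2.651 years

# Shortcut: give the mean and SD directly and skip the z-score step.
x_direct = inv_norm(0.25, 4, 2)

print(round(z, 4), round(x_from_z, 3), round(x_direct, 3))
```

Both routes give the same age, because converting a z-score back to a data value is just the standardization formula solved for x.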
Here’s what it looks like in the Rossman/Chance app: You use the same app, but now you input the probability and it will calculate the z-score. You then use the z-score to find a value in the dog ages distribution.

11. Draw a standard normal curve, shade the area under the curve that matches the given probability, then use the invNorm( function or the Rossman/Chance app to find the following z-values:
P(Z ≤ z) = 0.14
P(Z ≤ z) = 0.67
P(Z ≤ z) = 0.50
P(Z ≤ z) = 0.45
P(Z ≤ z) = 0.73

Page 7

Before the Knowledge Check on Topic 9, you should be able to:

Check When You Understand | Skill/Concept
____ Apply the empirical rule when it is appropriate (Activity 9-4)
____ Calculate z-scores and apply them to make relative comparisons (Activity 9-5)
____ Use your calculator (invNorm function) to find a value in the distribution using the probability/area along with the mean and standard deviation (Dog Ages)

Personal notes on Topic 9:

Extra Practice: Topic 9
Book problems: 9-11, 9-12, 9-14, 9-22 (parts a and c), 9-25 (lists linked on student schedule), 9-28 (list linked on student schedule)

Page 8

Topic 12: Normal Distributions

Complete Activity 12-1 in your Google Doc.

These distributions are both approximately normal. Normal distributions have three distinguishing characteristics:
They are symmetric.
They are mound shaped.
They follow a bell-shaped curve.

j. The following drawing contains three normal curves; think of them as approximating the distribution of exam scores for three different classes. One curve (call it A) has a mean of 70 and a standard deviation of 5; another curve (call it B) has a mean of 70 and a standard deviation of 10; the third curve (call it C) has a mean of 50 and a standard deviation of 10. Give the appropriate labels (A, B, or C) to the curves below to indicate which curve corresponds to which class.
Earlier in this activity, we discovered that if two observations from two normal distributions have the same z-score, then the normal model will predict that the same percentage of the distribution will be to the left of those observations.

Watch Out: Using page 252 in your textbook, fill in the following based on looking up the area to the left of -1.03 in the Standard Normal Probabilities table:

The probability, 0.1515, from the Body Temperatures histogram can be interpreted as…
* The ________ under the normal curve to the left of z = -1.03.
* Approximately _______ of the body temperatures in the population are _______________ 97.5°F.
* If we repeatedly select one healthy adult at random, then in the long run, approximately ______ of those selected adults will have a temperature ________ 97.5°F.
* The __________________ of a randomly selected adult having a temperature ______ 97.5°F is 0.1515.

Table II and the standard normal probability distribution require that you standardize your value (find the z-score) in order to answer this question. With a calculator, this is not necessary: it can use the mean and standard deviation of your distribution directly (without switching to the standard normal distribution). Under DISTR, choose 2:normalcdf. The inputs are the lower bound of where you are looking for probability, the upper bound of where you are looking for probability, the mean of the distribution, and the SD of the distribution. (CDF stands for Cumulative Distribution Function, and it calculates the area to the left of the value given, which is what we want. Think about the cumulative part as shading in the area. The other option, PDF, stands for Probability Density Function, and it doesn’t apply to the type of distributions we have, so don’t choose/use it!)

Page 9

The Rossman/Chance Normal Probability Calculator will also let you use the mean and SD of your distribution.
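The four inputs described above for normalcdf (lower bound, upper bound, mean, SD) can be illustrated with a short sketch. It is not the calculator's code: `normal_cdf` is a hypothetical helper built on Python's standard-library `statistics.NormalDist`, and using -1e99 to mean "no lower bound" is an assumption made for this example.

```python
from statistics import NormalDist


def normal_cdf(lower, upper, mean=0.0, sd=1.0):
    """Area under a normal curve between lower and upper.

    Hypothetical helper: area to the left of `upper` minus area to the
    left of `lower`, for the distribution with the given mean and SD.
    """
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)


# A very large negative number stands in for "no lower bound" (assumption).
NO_LOWER_BOUND = -1e99

# Dog ages (mean 4, SD 2): P(x < 5) straight from the original distribution.
p_direct = normal_cdf(NO_LOWER_BOUND, 5, 4, 2)

# Same question after standardizing: z = (5 - 4) / 2 = 0.5, mean 0, SD 1.
p_standardized = normal_cdf(NO_LOWER_BOUND, 0.5)

print(round(p_direct, 4), round(p_standardized, 4))  # 0.6915 0.6915
```

Both calls return the same area, which is why standardizing first is optional when the technology accepts a mean and SD.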
Don’t forget to click “Scale to Fit” after inputting a new mean and SD so the graph resets to an appropriate horizontal axis scale. Type into the “X” box if you are using a value from your distribution (earlier we used the “Z” box since we had z-scores, or really we were in the Standard Normal Distribution so our distribution values WERE z-scores).

Here’s what it looks like from the Rossman/Chance Normal Probability Calculator: You input the mean and SD in the top part, then click “Scale to Fit”. In the bottom part, you choose the correct direction and, in the “X” box, input the value from your distribution whose probability you want to find.

When writing a solution after using your calculator/app or a standard normal probability table, one of the following is necessary for a complete solution:
* If you use the standard normal distribution technique shown earlier, writing P(x < 2500) = P(z < -1.404) = 0.0802 is sufficient because it is clear that you used the correct mean and standard deviation when calculating the z-score.
* If you use the normal distribution technique, writing P(x < 2500) = 0.0802 is NOT sufficient because there are infinitely many distributions that contain the value 2500. You must write P(x < 2500) = 0.0802 given that the mean = 3300 and the standard deviation = 570. You may use the appropriate symbols for sample or population means and standard deviations.

Complete Activity 12-2 in your Google Doc.

Page 10

Turn to pages 255-6 and answer the following questions based on the “Watch Out.”

1. When using the Normal Probability Table, when you want to find the probability to the right of the z-score, what must you do?
2. When using the Normal Probability Table, when you want to find the probability between two values, what should you do?
3. Can probability ever be negative?
4. When using the term “at least”, are you looking at area to the left or right of the given value?
5. What is another term for “at least”?
6. If a z-score is too extreme to appear in the table, what should you do?

An important step in many statistical analyses is to judge whether or not sample data could have plausibly (likely, possibly) come from a population that follows a normal distribution. You can learn a lot from examining a dotplot or a histogram, but another graphical display has been developed especially for this purpose. A normal probability plot graphs the observed data as the x-values against what would be expected from a theoretical normal distribution (z-scores for each value as the y-values). This is sometimes also called a normal quantile plot. If the data were perfectly normal, this plot would display a straight line. With real data, you look to see if the normal probability plot roughly follows a straight line (an easier task than judging the fit of a curve to a histogram or dotplot). Skewed data reveal a curved pattern in a normal probability plot. If you’re not sure if the plot is close to straight, put your pencil on top of it to see if the data points are covered up. If they are, your distribution is approximately normal. If they’re not, because the plot is curved, your distribution is not approximately normal.

Page 11

Stapplet will create a normal probability plot if you choose One Quantitative Variable. It’s an option in the graph type drop-down.

There is a second way to check for normality. Draw a parallel dotplot and boxplot of the data, which you can easily do in Stapplet. If the data are approximately normal, the dotplot will be symmetric and mound shaped; the boxplot will also be symmetric. You need to make and check both plots to use this method to check for normality.

The graphs below show a histogram, dotplot, and boxplot for the diastolic blood pressure sample (DIABP), which we saw from the normal probability plot was approximately normal:

Be careful! It is very important to examine both plots.
The graphs below show a histogram, dotplot, and boxplot for the pulse rate sample (PLRT), which we saw from the normal probability plot was most likely NOT approximately normal, but the boxplot alone doesn’t look that different from the one above:

Complete Activity 12-3 in your Google Doc.

Page 12

Before the Knowledge Check on Topic 12, you should be able to:

Check When You Understand | Skill/Concept
____ Describe the properties of normal curves (Activity 12-1)
____ Perform calculations, both of the probabilities and percentiles, from a normal distribution (Activities 12-1, 12-2)
____ Use your calculator to find a probability using a lower and upper bound with the mean and standard deviation (Study Guide page 9)
____ Assess whether sample data could plausibly have arisen from a normally distributed population, based on normal probability plots and other graphs of the distribution (Activity 12-3)

Personal notes on Topic 12:

Extra Practice: Topic 12
Book problems: 12-5 (parts a-d), 12-6, 12-10, 12-11a, 12-16, 12-17, 12-18, 12-19 (lists linked on student schedule), 12-22

Page 13

Topic 14: Sampling Distributions - Means

Read the intro to Activity 14-1 in the textbook.

Complete Activity 14-1 in your Google Doc.

When you conduct a study (an observational study or an experiment), you collect data only once. The data are your sample. In this exercise, however, we investigate what would happen if we sampled many times. You saw in part f that the sample mean varied from sample to sample. The pattern of this variation, which is predictable over the long run, is known as a sampling distribution. The sample mean is an unbiased estimator of the population mean. In other words, the center of the sampling distribution of x̄ is the population mean. The spread of the sampling distribution of x̄ is smaller than the spread of the population, provided the sample size is greater than one.

Read the “Watch Out” on p. 295 of your textbook and answer the following questions:

1. What symbol designates population mean? Is this measure fixed or variable?
2. What symbol designates sample mean? Is this measure fixed or variable? Is the long-term pattern to this variation predictable?
3. What other term is given to the “mean of the sample means”?

Read the paragraph in the middle of page 298 in your textbook and fill in the blanks below.

The sampling distribution of the sample means becomes more and more ________________ as the sample size ______________. The theoretical means of these distributions are all equal to the ________________, but the variability in sample means ______________ as the _______________ increases.

Complete Activity 14-2 in your Google Doc.

Page 14

Read the Central Limit Theorem (CLT) for a Sample Mean on page 299 in your textbook and complete the following for your notes.

Suppose a ___________________ sample of size n is taken from a large population (at least __________ larger than the sample size) in which the variable of interest has mean __ and standard deviation __. Then the sampling distribution of the sample mean __ will have the following properties:

Shape:
Center:
Spread:

The shape of the distribution will be exactly normal for all sample sizes when the population itself is normal. With a large sample size, the shape of the distribution will be approximately normal regardless of the shape of the population distribution. What is the guideline size unless a population is extremely non-normal?

The “Watch Out” on page 300 discusses the CLT for both proportions and means. We are only concerned with means. Answer the following questions.

1. What is the shape of the sampling distribution?
2. What does the mean of the sampling distribution equal?
3. By what factor does the standard deviation of the sampling distribution decrease?
4. Before you apply CLT, what conditions must be met?
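The repeated-sampling idea behind Activities 14-1 and 14-2 can be imitated with a small simulation. Everything in this sketch is an assumption made for illustration: the skewed population is invented (exponential with mean about 10), and `sample_means` is a hypothetical helper, not something from the textbook or the Google Doc.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical, strongly right-skewed population of 100,000 values.
population = [random.expovariate(1 / 10) for _ in range(100_000)]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)


def sample_means(n, repetitions=2000):
    """Draw many random samples of size n and return each sample's mean."""
    return [
        statistics.fmean(random.sample(population, n))
        for _ in range(repetitions)
    ]


print(f"population mean {pop_mean:.2f}, population SD {pop_sd:.2f}")
for n in (5, 30, 100):
    means = sample_means(n)
    print(
        f"n={n:>3}: mean of sample means {statistics.fmean(means):.2f}, "
        f"SD of sample means {statistics.stdev(means):.2f}, "
        f"population SD / sqrt(n) = {pop_sd / n ** 0.5:.2f}"
    )
```

In runs like this, the mean of the sample means stays close to the population mean while their spread shrinks as n grows, tracking the population SD divided by the square root of n, which is the pattern stated in the Glossary entry for the Central Limit Theorem.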
Complete Activity 14-3 in your Google Doc.

Page 15

Before the Knowledge Check on Topic 14, you should be able to:

Check When You Understand | Skill/Concept
____ Perform simulation analyses to investigate the sampling distribution of a sample mean (Activities 14-1 and 14-2)
____ Describe the impact of the sample size, population mean, population standard deviation, and population shape on the sampling distribution of a sample mean (Activity 14-2)
____ Recognize when the Central Limit Theorem for a sample mean is applicable (Activities 14-2 and 14-3)
____ Perform probability calculations based on the Central Limit Theorem for a sample mean (Activity 14-3)

Personal notes on Topic 14:

Extra Practice: Topic 14
Book problems: 14-5 (parts a, b; list linked in student schedule), 14-9, 14-14, 14-15a, 14-16, 14-17

Page 16

Topic 15: Central Limit Theorem and Statistical Inference

Topic 15 Introduction: As we have seen, we expect there to be some variation between the value of the true population mean and the sample statistic calculated from collected data. The question that we want to ask is “what is the source of this variation?” Consider the following:

A simple random sample of 50 female adults in the Washington, D.C. suburbs was taken, and the mean height was found to be 66.3 inches. Census data indicate that the mean height of adult females in the same region is 65.4 inches and that the standard deviation is 3.5 inches.

1. What is the parameter? What symbol is used?
2. What is the statistic? What symbol is used?
3. Draw the normal curve with the population parameter and sampling distribution standard deviation. Shade the region where P(x̄ > 66.3).
4. What is P(x̄ > 66.3)? Remember to use the sampling distribution standard deviation.

We should not be surprised by this variability. But what caused it? There are three possible sources of this variation.

1. Bias. If the sampling procedure allowed bias to occur during data collection, then there will be a difference between the population and sample means.
2. Chance Error. In a simple random sample, every possible subset of the population has the same probability of being chosen. Therefore, a subset whose measurements deviate from the expected value can occur strictly by chance.
3. A Significant Event. When the difference between the population and sample means is so great that we can rule out chance error, the observed result is said to cast doubt on the validity of the population mean.

This idea is the basis for statistical significance. Roughly speaking, a sample result is said to be statistically significant if it is unlikely to occur due to random sampling variability alone. Or, in other words, a sample statistic is considered significant when the probability of data generated by a simple random sample yielding this statistic is very small. Generally, if the probability of the result is less than or equal to 0.05, then the result is considered significant. If the probability of the result is less than or equal to 0.01, the result is considered highly significant.

Page 17

5. Is the fact that the mean height of our sample is 66.3 significant?

This activity has introduced you to the concept of statistical significance. Look at page 312 in your textbook and answer the following questions:

1. How can you determine the statistical significance of a sample statistic?
2. When is a sample result said to be statistically significant?

It is unusual to know the population mean and standard deviation, but we often do in our practice activities so we can compare and learn to trust the statistical methods we are learning.

Complete Activity 15-2 in your Google Doc.

This activity has introduced you to the concept of statistical confidence. It relates to how close you expect a sample statistic to come to its corresponding population value.
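Returning to the height example in the Topic 15 Introduction, the probability calculation behind a significance judgment can be sketched as follows. This is an illustration under stated assumptions, not the textbook's procedure: it uses Python's standard-library `statistics.NormalDist`, relies on the Central Limit Theorem to treat the sampling distribution of the sample mean as normal, and all variable names are made up for the example.

```python
from math import sqrt
from statistics import NormalDist

mu = 65.4      # population mean height in inches (census value)
sigma = 3.5    # population standard deviation in inches
n = 50         # sample size
x_bar = 66.3   # observed sample mean in inches

# Standard deviation of the sampling distribution of the sample mean.
sd_of_sample_mean = sigma / sqrt(n)

# Sampling distribution of the sample mean, assumed normal via the CLT.
sampling_distribution = NormalDist(mu, sd_of_sample_mean)

# P(sample mean > 66.3) = 1 - (area to the left of 66.3).
p_value = 1 - sampling_distribution.cdf(x_bar)

print(f"SD of sample mean: {sd_of_sample_mean:.3f}")
print(f"P(sample mean > {x_bar}): {p_value:.4f}")
print("at or below 0.05:", p_value <= 0.05)
print("at or below 0.01:", p_value <= 0.01)
```

The last two lines compare the probability with the 0.05 and 0.01 guidelines given above; note that the spread used is σ/√n, not the population standard deviation itself.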
Turn to page 315 to help answer the following question: What two things determine the distance between the population parameter and sample statistic?

Always remember to check the technical conditions on which the validity of the Central Limit Theorem rests. If these conditions are not satisfied, calculations from the CLT can be erroneous and misleading. For sample means, the result holds exactly when the population itself is normally distributed and approximately for large sample sizes, n > 30, from non-normal populations.

Usually, we will want to make inferences about an unknown population parameter based on the observed value of a sample statistic. The CLT lets us use statistical inference (constructing confidence intervals, which we will learn next, and conducting significance testing, which is the focus of Unit 5) to produce and understand statistical reasoning.

Page 18

Before the Knowledge Check on Topic 15, you should be able to:

Check When You Understand | Skill/Concept
____ Perform and interpret results of calculations, based on the Central Limit Theorem, related to the issue of statistical significance (Topic 15 Introduction in Study Guide)
____ Perform and interpret results of calculations, based on the Central Limit Theorem, related to the issue of statistical confidence (Activity 15-2)
____ Recognize the conditions that must be checked to determine whether the Central Limit Theorem is applicable (Study Guide page 18)

Personal notes on Topic 15:

Extra Practice: Topic 15
Book problems: 15-6, 15-7, 15-19

Page 19

Extra Practice: Review Problems

True/False:
____ 1. The z-score indicates how many standard deviations above or below the mean a particular value falls.
____ 2. The empirical rule for mound-shaped, symmetric distributions states that approximately 68% of the observations fall within 1 standard deviation of the mean, 90% fall within 2 standard deviations of the mean, and 99.7% fall within 3 standard deviations of the mean.
____ 3. The sample statistic is fixed.
____ 4. The sample mean varies from sample to sample, but there is a predictable, long-term pattern to this variation.
____ 5. The variability of the sampling distribution increases as the sample size n increases, by a factor of 1/√n.
____ 6. The sampling distribution of the sample mean looks more and more like a normal distribution as the sample size increases.

Multiple Choice. Choose the best answer.

_____ 1. If the range of a set of data is 36 and the data is mound shaped, then a reasonable estimate of the standard deviation is
A. 6
B. 18
C. 72
D. 108
E. 216

_____ 2. Which of the following is NOT possible?
A. The standard deviation is greater than the mean.
B. The 5-number summary (min, lower quartile, median, upper quartile, max) has 3 identical values.
C. The mean is negative and the standard deviation is positive.
D. The interquartile range is equal to the range.
E. All of these are possible.

Page 20

3. Match the following graphs of normal distributions with the appropriate mean and standard deviation.
_____ i. Mean 15 and standard deviation 4
_____ ii. Mean 13 and standard deviation 2
_____ iii. Mean 15 and standard deviation 1
_____ iv. Mean 17 and standard deviation 2

4. A student is deciding whether to take the SAT exam a second time. He received a 540 on the verbal section on his first attempt. The mean score on the verbal section was 478 with a standard deviation of 92. He has read that the expected mean of the next test will be 485 with a standard deviation of 95. What is the minimum number of points by which he would have to raise his score on his second attempt to improve his performance?
A. 3
B. 10
C. 15
D. 18
E. 21

5. A symmetric, mound-shaped distribution has a mean of 42 and a standard deviation of 7. Which of the following is true?
A. There are more data values between 42 and 49 than between 28 and 35.
B. It is impossible that the distribution contains a data value greater than 70.
C. Approximately 95% of the data lie between 35 and 49.
D. The interquartile range is approximately 14.

Page 21

Short Answer:

1. In general, which vary more: averages or individual observations?
2. Give an intuitive explanation for your answer to question 1.
3. Which vary more: averages based on a few observations or averages based on many observations?
4. Give an intuitive explanation for your answer to question 3.
5. Suppose the IQ scores of students at a certain college follow a normal distribution with mean 115 and standard deviation 12.
a. Draw a well-labeled sketch of this distribution.
b. Shade in the area corresponding to the proportion of students with an IQ less than 100. Based on this shaded region, make an educated guess as to this proportion of students.
c. Use the normal model to determine the proportion of students with an IQ score less than 100.
d. Find the proportion of these undergraduates having IQs between 110 and 130.
e. With his IQ of 75, Forrest Gump would have a higher IQ than what percentage of these undergraduates?
f. Determine how high a student’s IQ must be to be in the top 1% of all IQs at this college.
g. Find the z* values that “cut off” the top 5%, top 2.5%, top 1%, and top 0.5% of a standard normal distribution.

Once you have completed the review questions above, complete the Self-Check problem at the end of each topic. The textbook page numbers and necessary calculator lists are given below. The solution to each Self-Check follows in the textbook so you can check your answers immediately.

Self-Check 12-4, textbook page 257
Self-Check 14-4, textbook page 302
Self-Check 15-4, textbook page 316

Answers

True/False: 1. T 2. F 3. F 4. T 5. F 6. T

Multiple Choice: 1. A 2. E 3i. B 3ii. C 3iii. A 3iv. D 4. B 5. A

Short Answer:
1. Individual observations vary more than averages, which “soften” the effects of extremely high or low values in a sample.
2. See above.
3. Averages based on a few observations vary more.
4. A small sample contains fewer observations to counterbalance the effect of an extremely large or small value in the sample.
5. a. Sketch: [normal curve with the horizontal axis labeled 79, 91, 103, 115, 127, 139, 151]
b. [same curve and axis labels, with the region to the left of 100 shaded]

Page 22

c. See graph above. Area = 0.10565, so that is the proportion of students with an IQ score less than 100.
d.
$$P(110 < x < 130) = P\left(\frac{110 - 115}{12} \le z \le \frac{130 - 115}{12}\right) = P\left(-\frac{5}{12} \le z \le 1.25\right) = 0.8944 - 0.3385 = 0.556$$
e.
$$P(x < 75) = P\left(z \le \frac{75 - 115}{12}\right) = P(z \le -3.33) = 0.00043$$
Only 4 hundredths of 1% of these undergraduates would have a lower IQ than Forrest Gump.
f.
$$P\left(z \le \frac{x - 115}{12}\right) = 0.99$$
The z* critical value from table or calculator: invNorm(.99, 0, 1) = 2.326. Therefore:
$$2.326 = \frac{x - 115}{12} \quad\Rightarrow\quad x = 142.9$$
g. 1.645, 1.960, 2.326, 2.576

Glossary

Central Limit Theorem for a sample mean: Suppose a simple random sample of size n is taken from a large population (at least ten times larger than the sample size) in which the variable of interest has mean μ and standard deviation σ. Then the sampling distribution of the sample mean x̄ will have the following properties:
Shape: The distribution will be (approximately) normal.
Center: The mean will equal μ.
Spread: The standard deviation will equal σ/√n.

Empirical Rule: With mound-shaped, symmetric distributions, approximately 68% of the observations fall within one standard deviation of the mean, approximately 95% of the observations fall within two standard deviations of the mean, and approximately 99.7% fall within three standard deviations of the mean.
Normal distribution: a symmetric, mound-shaped curve, often called bell-shaped; the mean, median and mode are equal and in the middle of the distribution; its shape is defined by its mean and standard deviation.

Normal probability plot: a useful tool for judging whether or not sample data could plausibly have come from a normally distributed population; if the plot is roughly linear, it suggests that the data can reasonably be modeled using a normal distribution.

Percentile: the percentage of scores that fall at or below a given score.

Population distribution: a statement of the frequency with which the units of analysis or cases that together make up a population are observed or are expected to be observed in the various classes or categories that make up a variable. This distribution is almost never observed, and the goal in a statistical study is usually to learn about this population distribution from the sample distribution.

Sample distribution: This distribution consists of the sample data that you actually observe and analyze. With random sampling, the sample distribution should roughly resemble the population distribution.

Page 23

Sampling distribution: This distribution describes how a sample statistic (such as a sample mean) varies if random samples are repeatedly taken from the population. You used simulation to study sampling distributions and now know the theoretical result, the Central Limit Theorem, that describes them. These sampling distributions often behave very differently than the population or sample distribution.

Standardization: the process of converting values to standard scores, where a standard score is the signed number of standard deviations by which an observation or data value is above the mean. Standard scores are also called z-values, z-scores, normal scores, and standardized variables; the use of “Z” is because the normal distribution is also known as the “Z” distribution.

Statistical confidence: relates to how close you expect a sample statistic to come to its corresponding population value. Although you cannot use a sample statistic to determine a population parameter exactly, you can be reasonably confident that the sample statistic falls within a certain distance of the population parameter. Therefore, once you observe the value of a sample statistic, you can be confident that the population parameter is within that distance of the sample statistic. This distance depends on how confident you want to be and on the size of the sample.

Statistical significance: relates to how unlikely an observed sample statistic is to have occurred, assuming some conjectured value for the population parameter. You determine the statistical significance of a sample statistic by exploring the sampling distribution of the statistic, investigating how often an observed sample result occurs simply by random chance. Roughly speaking, a sample result is said to be statistically significant if it is unlikely to occur due to random sampling variability alone.

z-score: A standard score found by subtracting the mean from the value of interest and then dividing by the standard deviation. A z-score indicates how many standard deviations above or below the mean a particular value falls.

Page 24
