Confidence Intervals for µ (σ Unknown) and for Proportions

Summary

This document discusses confidence interval estimation of the population mean when the population variance is unknown. It explains the concept and provides formulas, using examples and tables.

Full Transcript


Chapter 4: Estimation of Parameters
Lesson 4: Confidence Interval Estimation of the Population Mean (Part 2)

TIME FRAME: 60 minutes

LESSON OVERVIEW: In this lesson, learners continue to learn about interval estimation of the population mean discussed in the previous lesson, but this time under the assumption that the parent distribution follows a normal curve and that the population variance σ² is unknown. The interval estimate makes use of percentiles of a Student's t distribution with n − 1 degrees of freedom.

LEARNING COMPETENCIES
At the end of the lesson, the learner should be able to:
- Construct a (1 − α)100% confidence interval estimator of the population mean when the population variance is unknown
- Use the Student's t distribution table in getting a tabular value
- Construct a (1 − α)100% confidence interval estimator of the population mean when the population variance is unknown and the sample size is large enough to invoke the Central Limit Theorem
- Interpret confidence interval estimates

PREREQUISITE KNOWLEDGE AND SKILLS: Knowledge of confidence interval estimation of the population mean when the population variance is known

LESSON OUTLINE
A. Construction and interpretation of a (1 − α)100% confidence interval estimator of the population mean when the population variance is unknown
B. Use of the Student's t distribution table in getting a tabular value
C. Construction and interpretation of a (1 − α)100% confidence interval estimator of the population mean when the population variance is unknown and the sample size is large enough to invoke the Central Limit Theorem
D. Illustration of the computation of an interval estimate of the population mean and its interpretation

DEVELOPMENT OF THE LESSON
First, recall how to construct an interval estimator:
$$\text{point estimator} \pm (\text{tabular value}) \times (\text{standard error of the point estimator})$$
In this expression, the tabular value depends on the sampling distribution of the sample mean.
You learned in the previous lesson that, when the population variance is known, the tabular value in this expression is taken from the standard normal distribution. When the population variance is unknown, the construction of the confidence interval changes slightly: both the tabular value and the standard error of the sample mean change.

A. Construction and interpretation of a (1 − α)100% confidence interval estimator of the population mean when the population variance is unknown

When the population variance (σ²) is unknown, it has to be estimated using a simple random sample of size n. A point estimator of the population variance is the sample variance, denoted s² and computed as
$$s^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}.$$
The square root of the sample variance is the sample standard deviation, denoted s. This point estimate of the population standard deviation is used to compute the standard error of the sample mean, which is the ratio of the sample standard deviation to the square root of the sample size:
$$\widehat{se}(\bar{X}) = \frac{s}{\sqrt{n}}.$$

B. Use of the Student's t distribution table in getting a tabular value

The tabular value comes from the Student's t distribution table. We use the notation $t_{(\alpha/2,\,n-1)}$ for the tabular value in the Student's t distribution with n − 1 degrees of freedom. This tabular value is the point in the distribution whose area to its right is equal to α/2.

A Student's t distribution table (see the attached table generated using MS Excel®) gives the area, or probability, to the right of a given value t₀. The first row of the table gives selected probabilities (areas), while the first column gives the degrees of freedom (df). The needed tabular value is at the intersection of the area column and the degrees-of-freedom row. Part of the table is shown below.

        Selected probabilities (area to the right)
df      0.10    0.05    0.025   0.01    0.005
1       3.08    6.31    12.71   31.82   63.66
2       1.89    2.92    4.30    6.96    9.92
3       1.64    2.35    3.18    4.54    5.84
4       1.53    2.13    2.78    3.75    4.60
5       1.48    2.02    2.57    3.36    4.03
6       1.44    1.94    2.45    3.14    3.71

For example, the tabular value with an area of 0.025 to its right in a Student's t distribution with 3 df is t(0.025,3) = 3.18 (row 3, column 0.025). Graphically:
[Figure: Student's t distribution with 3 df; the shaded right-tail area of 0.025 lies beyond t(0.025,3) = 3.18]

Thus, a (1 − α)100% confidence interval (CI) of the population mean (µ) when the population variance (σ²) is unknown is constructed as
$$\bar{X} \pm t_{(\alpha/2,\,n-1)}\frac{s}{\sqrt{n}}$$
or
$$\left(\bar{X} - t_{(\alpha/2,\,n-1)}\frac{s}{\sqrt{n}},\ \bar{X} + t_{(\alpha/2,\,n-1)}\frac{s}{\sqrt{n}}\right),$$
where $\bar{X}$ and s are the sample mean and sample standard deviation, respectively, both computed from a simple random sample of size n. The lower limit of the interval is $\bar{X} - t_{(\alpha/2,\,n-1)}\frac{s}{\sqrt{n}}$, while the upper limit is $\bar{X} + t_{(\alpha/2,\,n-1)}\frac{s}{\sqrt{n}}$.

For this case, the width of the interval estimate is
$$2\,t_{(\alpha/2,\,n-1)}\frac{s}{\sqrt{n}},$$
and the maximum allowable deviation is $t_{(\alpha/2,\,n-1)}\frac{s}{\sqrt{n}}$.

C. Construction and interpretation of a (1 − α)100% confidence interval estimator of the population mean when the population variance is unknown and the sample size is large enough to invoke the Central Limit Theorem

A property of the Student's t distribution is that it approaches the standard normal distribution as its degrees of freedom increase. Since the degrees of freedom here depend on the sample size n, the Student's t distribution approaches the standard normal distribution as n increases. This is consistent with the Central Limit Theorem discussed in the previous chapter. Hence, when the sample size is at least 30, the tabular value used to construct the confidence interval for the population mean is taken from the Z distribution table. A (1 − α)100% confidence interval (CI) of the population mean (µ) when the population variance (σ²) is unknown and the sample size is at least 30 is constructed as
$$\bar{X} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}$$
or
$$\left(\bar{X} - z_{\alpha/2}\frac{s}{\sqrt{n}},\ \bar{X} + z_{\alpha/2}\frac{s}{\sqrt{n}}\right).$$
For this case, the width of the interval estimate is
$$2\,z_{\alpha/2}\frac{s}{\sqrt{n}},$$
and the maximum allowable deviation is $z_{\alpha/2}\frac{s}{\sqrt{n}}$.
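To make the two recipes concrete, here is a minimal Python sketch (not part of the original lesson). It hard-codes the 0.025 column of the Student's t table used in this lesson, switches to z = 1.96 when n ≥ 30 as Section C prescribes, and applies the interval to the 20 learner weights used in the illustration of this lesson. The names `mean_ci` and `tabular_value_0025` are hypothetical helpers chosen for illustration.

```python
import math
import statistics

# 0.025 column of the Student's t table in this lesson (df = 1, ..., 30)
T_0025 = dict(enumerate([
    12.71, 4.30, 3.18, 2.78, 2.57, 2.45, 2.36, 2.31, 2.26, 2.23,
    2.20, 2.18, 2.16, 2.14, 2.13, 2.12, 2.11, 2.10, 2.09, 2.09,
    2.08, 2.07, 2.07, 2.06, 2.06, 2.06, 2.05, 2.05, 2.05, 2.04,
], start=1))
Z_0025 = 1.96  # the "∞" row of the same table


def tabular_value_0025(n):
    """Tabular value for a 95% CI: z when n >= 30 (Section C), else t with n - 1 df."""
    return Z_0025 if n >= 30 else T_0025[n - 1]


def mean_ci(data, tabular_value):
    """Return (lower, upper) = (x̄ - margin, x̄ + margin), margin = tabular_value * s / sqrt(n)."""
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)                 # sample SD with divisor n - 1
    margin = tabular_value * s / math.sqrt(n)  # maximum allowable deviation
    return x_bar - margin, x_bar + margin


weights = [40, 45, 46, 48, 48, 50, 55, 55, 56, 58,
           58, 59, 60, 60, 62, 62, 64, 64, 65, 66]
lower, upper = mean_ci(weights, tabular_value_0025(len(weights)))
print(f"95% CI for the mean weight: ({lower:.2f}, {upper:.2f}) kg")
```

For the weights, n = 20 < 30, so the sketch picks t(0.025, 19) = 2.09 and returns about (52.54, 59.56) kg. A statistics library could supply exact t percentiles instead; the rounded table values are used here only so the numbers match what learners read from the table.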
D. Illustration of the Computation

Again, consider the numerical example used in point and interval estimation of the population mean, where the following observed weights (in kilograms) of a random sample of 20 learners were used:

40 45 46 48 48 50 55 55 56 58 58 59 60 60 62 62 64 64 65 66

The sample mean is
$$\bar{X} = \frac{1121}{20} = 56.05 \text{ kg}.$$
This time, there is no assumed value of the population standard deviation of the weights of all learners in the class, so a point estimate of the population standard deviation is needed. Using the same sample observations, the point estimate of the population standard deviation is
$$s = \sqrt{\frac{1072.95}{19}} \approx 7.51 \text{ kg}.$$
With the sample mean and standard deviation, and with t(0.025,19) = 2.09 from the table, the 95% confidence interval estimate of the true average weight of the learners is
$$56.05 \pm 2.09\left(\frac{7.51}{\sqrt{20}}\right) = 56.05 \pm 3.51 = (52.54,\ 59.56).$$
Thus, we say that we are 95% confident that the true average weight of all learners in the class is between about 52.5 kg and 59.6 kg (53 kg and 60 kg, rounded off to the nearest integer).

TEACHER TIPS
Use the same numerical example for future lessons.

ENRICHMENT
Plan an enrichment activity in which learners measure their foot sizes. The teacher records the foot sizes of all learners in class in order to obtain the population mean foot size of the entire class. The class is then divided into groups of 3 to 5 learners. Using a simple random sample of 10 learners, each group estimates the average foot size of the entire class. Numeric summaries (mean and five-number summary) and box plots can be used to obtain point and interval estimates, respectively, of the mean foot size of the entire class. The confidence level, or reliability, of the interval estimates computed by the learners is estimated by the proportion of interval estimates that "trap" the population average foot size of the entire class.

REFERENCES
Albert, J. R. G. (2008). Basic Statistics for the Tertiary Level (ed. Roberto Padua, Welfredo Patungan, Nelia Marquez). Philippines: Rex Bookstore.
Parks, S., Steinwachs, M., Diaz, R., and Molinaro, M. Did I Trap the Median? STatistics Education Web (STEW). Retrieved from https://www.amstat.org/education/stew/pdfs/DidITrapTheMedian.docx
De Veaux, R. D., Velleman, P. F., and Bock, D. E. (2006). Intro Stats. Pearson Education, Inc.
Freedman, D., Pisani, R., and Purves, R. (2007). Statistics, Fourth Edition. New York: W. W. Norton & Company.
Workbooks in Statistics 1: 11th Edition. Institute of Statistics, UP Los Baños, College, Laguna 4031.

ASSESSMENT
I. Using a problem in Lesson 2 of this chapter, ask learners to do the computational exercises on the construction of the confidence interval of the population mean when the population variance is unknown.

1. The nickel-metal hydride (NiMH) battery is one of the most highly advertised rechargeable batteries today. It is lighter and can last 2 to 4 times longer than alkaline or standard nickel-cadmium (NiCd) batteries. To evaluate its performance, a random sample of 10 NiMH batteries was taken. The number of photos taken using each battery in a digital camera is as follows: 405, 564, 342, 456, 435, 543, 473, 452, 462, and 475. Construct and interpret a 95% confidence interval for the true mean number of photos taken using the NiMH battery.
Answer: With $\bar{X}$ = 460.7 photos, a standard error of $s/\sqrt{n} = 63.03/\sqrt{10} \approx 19.93$, and t(0.025,9) = 2.26, the 95% confidence interval for the true mean number of photos taken using the NiMH battery is $460.7 \pm 2.26(19.93) = (415.66,\ 505.74)$. We say that we are 95% confident that the true mean number of photos taken using the NiMH battery is between 416 and 506 photos.

Further, we could have the following additional problems:

1. The Municipal Planning Officer of Los Baños wants to determine whether the average hourly wage of labourers in the municipality is below Php 320. A random sample of 40 labourers in the municipality yielded a mean of Php 300 per hour with a standard deviation of Php 50 per hour.
Using this information, construct a 99% confidence interval estimate of the true average hourly wage of labourers in the Municipality of Los Baños.
Answer: With $\bar{X}$ = Php 300, a standard error of $50/\sqrt{40} \approx 7.91$, and, since n = 40 is at least 30, a tabular value of z(0.005) = 2.58, the 99% confidence interval for the true average hourly wage of labourers in the Municipality of Los Baños is $300 \pm 2.58(7.91) = (279.59,\ 320.41)$. We say that we are 99% confident that the true average hourly wage of labourers in the Municipality of Los Baños is between Php 280 and Php 320.

2. A machine produces cylindrical metal pieces with a mean diameter of 14.20 cm when the machine is in good condition. A quality engineer evaluates the condition of the machine using a random sample of 36 runs, which resulted in a mean diameter of 14.25 cm with a standard deviation of 0.30 cm. Using this information, construct a 95% confidence interval estimate of the true average diameter of the cylindrical metal pieces produced by the machine.
Answer: With $\bar{X}$ = 14.25 cm, a standard error of $0.30/\sqrt{36} = 0.05$, and z(0.025) = 1.96, the 95% confidence interval for the true average diameter of the cylindrical metal pieces produced by the machine is $14.25 \pm 1.96(0.05) = (14.152,\ 14.348)$. We say that we are 95% confident that the average diameter of the cylindrical metal pieces produced by the machine is between 14.152 and 14.348 cm.

II. Provide the Best Choice

1. Which of the following is not true about the Student's t distribution?
a) It has more area in the tails and less in the center than does the normal distribution.
b) It is used to construct confidence intervals for the population mean when the population standard deviation is known.
c) It is bell-shaped and symmetrical.
d) As the number of degrees of freedom increases, the t distribution approaches the normal distribution.
ANSWER: B

2. The t distribution
a) assumes the population is normally distributed.
b) approaches the normal distribution as the sample size increases.
c) has more area in the tails than does the normal distribution.
d) All of the above.
ANSWER: D

3.
A major department store chain is interested in estimating the average amount its credit card customers spent on their first visit to the chain's new store in the mall. Fifteen credit card accounts were randomly sampled and analyzed, with the following results: $\bar{X}$ = Php 2,525 and s = Php 1,000. Assuming the distribution of the amount spent on the first visit is approximately normal, what is the shape of the sampling distribution of the sample mean that will be used to create the desired confidence interval for µ?
a) Approximately normal with a mean of Php 2525
b) A standard normal distribution
c) A t distribution with 15 degrees of freedom
d) A t distribution with 14 degrees of freedom
ANSWER: D

4. A major department store chain is interested in estimating the average amount its credit card customers spent on their first visit to the chain's new store in the mall. Fifteen credit card accounts were randomly sampled and analyzed, with the following results: $\bar{X}$ = Php 2,525 and s = Php 1,000. Construct a 95% confidence interval for the average amount its credit card customers spent on their first visit to the chain's new store in the mall.
a) 2525 pesos ± 454.5 pesos
b) 2525 pesos ± 506 pesos
c) 2525 pesos ± 550 pesos
d) 2525 pesos ± 554 pesos
ANSWER: D

5. As an aid to the establishment of personnel requirements, the director of a hospital wishes to estimate the mean number of people who are admitted to the emergency room during a 24-hour period. The director randomly selects 64 different 24-hour periods and determines the number of admissions for each. For this sample, $\bar{X}$ = 19.8 and s² = 25. Which of the following assumptions is necessary in order for a confidence interval to be valid?
a) The population sampled from has an approximate normal distribution.
b) The population sampled from has an approximate t distribution.
c) The mean of the sample equals the mean of the population.
d) None of these assumptions are necessary.
ANSWER: D
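The answers to the computational exercises above can be checked with a short Python sketch. It applies the interval x̄ ± (tabular value) · s/√n with the tabular values learners would read from the tables (t(0.025, 9) = 2.26, z(0.005) = 2.58, z(0.025) = 1.96). For item II.4 it assumes s = Php 1,000, the value consistent with all four answer choices, and uses the three-decimal value t(0.025, 14) = 2.145, because the two-decimal table value 2.14 gives about 552.5 rather than 554. `mean_ci` is a hypothetical helper.

```python
import math
import statistics


def mean_ci(x_bar, s, n, tabular_value):
    """Return (lower, upper) for x̄ ± tabular_value * s / sqrt(n)."""
    margin = tabular_value * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin


# I.1 NiMH batteries: n = 10 < 30, so the t table is used: t(0.025, 9) = 2.26
photos = [405, 564, 342, 456, 435, 543, 473, 452, 462, 475]
nimh = mean_ci(statistics.mean(photos), statistics.stdev(photos), len(photos), 2.26)

# Additional problem 1 (wages): n = 40 >= 30, 99% CI uses z(0.005) = 2.58
wage = mean_ci(300, 50, 40, 2.58)

# Additional problem 2 (diameters): n = 36 >= 30, 95% CI uses z(0.025) = 1.96
diameter = mean_ci(14.25, 0.30, 36, 1.96)

# II.4: ASSUMED s = 1000 (recovered from the answer choices); t(0.025, 14) = 2.145
store_margin = 2.145 * 1000 / math.sqrt(15)

print("NiMH:", [round(v, 2) for v in nimh])
print("wage:", [round(v, 2) for v in wage])
print("diameter:", [round(v, 3) for v in diameter])
print("store: 2525 +/-", round(store_margin))
```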
Student's t Distribution Table
[Figure: Student's t distribution with the area α shaded to the right of the tabular value t₀]

        Selected probability or area to the right of a tabular value (α)
df      0.10    0.05    0.025   0.01    0.005
1       3.08    6.31    12.71   31.82   63.66
2       1.89    2.92    4.30    6.96    9.92
3       1.64    2.35    3.18    4.54    5.84
4       1.53    2.13    2.78    3.75    4.60
5       1.48    2.02    2.57    3.36    4.03
6       1.44    1.94    2.45    3.14    3.71
7       1.41    1.89    2.36    3.00    3.50
8       1.40    1.86    2.31    2.90    3.36
9       1.38    1.83    2.26    2.82    3.25
10      1.37    1.81    2.23    2.76    3.17
11      1.36    1.80    2.20    2.72    3.11
12      1.36    1.78    2.18    2.68    3.05
13      1.35    1.77    2.16    2.65    3.01
14      1.35    1.76    2.14    2.62    2.98
15      1.34    1.75    2.13    2.60    2.95
16      1.34    1.75    2.12    2.58    2.92
17      1.33    1.74    2.11    2.57    2.90
18      1.33    1.73    2.10    2.55    2.88
19      1.33    1.73    2.09    2.54    2.86
20      1.33    1.72    2.09    2.53    2.85
21      1.32    1.72    2.08    2.52    2.83
22      1.32    1.72    2.07    2.51    2.82
23      1.32    1.71    2.07    2.50    2.81
24      1.32    1.71    2.06    2.49    2.80
25      1.32    1.71    2.06    2.49    2.79
26      1.31    1.71    2.06    2.48    2.78
27      1.31    1.70    2.05    2.47    2.77
28      1.31    1.70    2.05    2.47    2.76
29      1.31    1.70    2.05    2.46    2.76
30      1.31    1.70    2.04    2.46    2.75
∞       1.28    1.65    1.96    2.33    2.58

CHAPTER 4: ESTIMATION OF PARAMETERS
Lesson 5: Point and Confidence Interval Estimation of the Population Proportion

TIME FRAME: 60 minutes

OVERVIEW OF LESSON: In this lesson, learners learn how to construct interval estimates of the population proportion. They are also taught how to determine minimum sample size requirements for estimating the population proportion.
LEARNING COMPETENCIES
At the end of the lesson, the learner should be able to:
- Identify a point estimator of the population proportion
- Discuss the properties of the sample proportion as a point estimator
- Compute a point estimate of the population proportion
- Identify an appropriate confidence interval estimator of the population proportion using a large sample, based on the Central Limit Theorem
- Construct a (1 − α)100% confidence interval estimator of the population proportion using a large sample
- Interpret point and confidence interval estimates of the population proportion

PREREQUISITE KNOWLEDGE AND SKILLS: Knowledge of point estimation and of the sampling distribution of the sample proportion

LESSON OUTLINE
A. Point estimator of the population proportion
B. Properties of the sample proportion as point estimator of the population proportion
C. Construction and interpretation of a (1 − α)100% confidence interval estimator of the population proportion using a large sample
D. Illustration of the computation of point and interval estimates of the population proportion and their interpretation

DEVELOPMENT OF THE LESSON
First, review the lesson on proportion as a parameter. The ratio of the number of units possessing a characteristic to the total number of units in the population is a population proportion. Examples are the proportion of learners who passed the last examination, the proportion of Filipinos who live in poverty, the proportion of housing units in the Philippines with roofs made of strong materials, and the proportion of Piattos chips that are not broken.

As a motivational activity, present the partial list of variables below from a data set gathered from learners enrolled in Grade 11 Statistics and Probability this school year.
VARIABLE        DEFINITION/DESCRIPTION
HRS_STUD        usual number of hours spent studying outside school hours during weekdays
SEX             biological sex
HEIGHT          height measured in cm
WEIGHT          weight measured in kg
WAIST           waist girth measured in cm
HIP             hip girth measured in cm
MGINCOME        monthly family gross income
MONTH_ALLOW     monthly allowance
WEEK_FOOD       weekly expenditures on food outside home
AGE_FATHER      father's age
AGE_MOTHER      mother's age
NUM_SIBLINGS    number of siblings
MODE_TRANS      mode of transportation in going to school (private, service, public, not applicable (i.e., walking))
GENRE           preferred genre of music (e.g., rock, acoustic, mellow, etc.)

Ask learners to identify proportions that could be defined from these variables. Note that some variables are straightforward while others need to be redefined further. The following are some examples identified:
1. Proportion of learners who are enrolled in Grade 11 Statistics and Probability this school year and who spend at least 2 hours studying outside school hours during weekdays
2. Proportion of male learners who are enrolled in Grade 11 Statistics and Probability this school year
3. Proportion of learners who are enrolled in Grade 11 Statistics and Probability this school year and at least 160 cm tall
4. Proportion of learners who are enrolled in Grade 11 Statistics and Probability this school year and at most 100 kg
5. Proportion of learners who are enrolled in Grade 11 Statistics and Probability this school year and with a waist girth of at most 50 cm
6. Proportion of learners who are enrolled in Grade 11 Statistics and Probability this school year and with a hip girth of at least 60 cm
7. Proportion of learners who are enrolled in Grade 11 Statistics and Probability this school year and who belong to a family whose gross monthly income is at most Php 15,000
8. Proportion of learners who are enrolled in Grade 11 Statistics and Probability this school year and with a monthly allowance equal to Php 4,000
9. Proportion of learners who are enrolled in Grade 11 Statistics and Probability this school year and with a weekly food expenditure outside home equal to Php 500
10. Proportion of learners who are enrolled in Grade 11 Statistics and Probability this school year and with a father whose age is at least 60 years
11. Proportion of learners who are enrolled in Grade 11 Statistics and Probability this school year and with a mother whose age is at least 60 years
12. Proportion of learners who are enrolled in Grade 11 Statistics and Probability this school year and with at least 3 siblings
13. Proportion of learners who are enrolled in Grade 11 Statistics and Probability this school year and who go to school using private vehicles
14. Proportion of learners who are enrolled in Grade 11 Statistics and Probability this school year and who have rock as preferred music genre

Choose one of these proportions and ask learners what they would do if they were asked to estimate it. In the discussion, take note of the following.

If, in general, you were to estimate the proportion of learners enrolled in Grade 11 Statistics and Probability this school year with rock as preferred music genre from a simple random sample of size n, an estimator of this population proportion is the ratio of the number of sampled Grade 11 Statistics and Probability learners who preferred rock to the sample size n. This is referred to as the sample proportion, defined as the ratio of the number of sample units possessing the characteristic of interest to n. Mathematically, the point estimator of the population proportion, based on a simple random sample of size n, is
$$\hat{p} = \frac{a}{n},$$
where a is the number of sample units having the characteristic of interest. As an estimator of the population proportion P, the sample proportion is unbiased, with standard error
$$se(\hat{p}) = \sqrt{\frac{P(1-P)}{n}}.$$
This was discussed in the previous chapter on sampling.
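The unbiasedness and standard-error claims can be checked numerically. Assuming the number a of sampled units with the characteristic follows a binomial distribution (which corresponds to sampling with replacement, or to a very large population), the Python sketch below sums over every possible value of a to get the exact mean and standard deviation of p̂ = a/n, and compares them with P and √(P(1 − P)/n). The values n = 50 and P = 0.6 are hypothetical, and the function name is illustrative only.

```python
from math import comb, sqrt


def sampling_distribution_of_p_hat(n, P):
    """Exact mean and SD of p̂ = a/n when a ~ Binomial(n, P)
    (assumes sampling with replacement or a very large population)."""
    probs = [comb(n, a) * P**a * (1 - P) ** (n - a) for a in range(n + 1)]
    mean = sum(pr * a / n for a, pr in enumerate(probs))
    var = sum(pr * (a / n - mean) ** 2 for a, pr in enumerate(probs))
    return mean, sqrt(var)


n, P = 50, 0.6  # hypothetical sample size and population proportion
mean, sd = sampling_distribution_of_p_hat(n, P)
print(f"E(p_hat) = {mean:.4f}, SD(p_hat) = {sd:.4f}, "
      f"sqrt(P(1-P)/n) = {sqrt(P * (1 - P) / n):.4f}")
```

The exact mean equals P and the exact standard deviation equals √(P(1 − P)/n), up to floating-point rounding.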
With a sufficiently large sample size n (say, at least 100), the sampling distribution of the sample proportion can be approximated by a normal distribution, so that the standardized sample proportion is approximately standard normal, based on the Central Limit Theorem (CLT). Using these concepts, a (1 − α)100% confidence interval (CI) of the population proportion (P) is constructed as
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
or
$$\left(\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right),$$
where $\hat{p}$ is the sample proportion computed from a simple random sample of size n. The lower limit of the interval is $\hat{p} - z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$, while the upper limit is $\hat{p} + z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$. The width of the confidence interval estimate is
$$2\,z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},$$
where the maximum allowable deviation is $z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$.

Illustration of the Computation: Suppose that in a simple random sample of 50 Grade 11 Statistics and Probability learners, 30 said they preferred rock music. The sample proportion is $\hat{p} = 30/50 = 0.60$, with standard error $\sqrt{0.60(0.40)/50} \approx 0.069$. Using the same sample, the 95% confidence interval (CI) of the population proportion of Grade 11 Statistics and Probability learners with rock as preferred music genre is
$$0.60 \pm 1.96(0.069) = 0.60 \pm 0.14 = (0.46,\ 0.74).$$
Hence, we estimate that 6 out of every 10 Grade 11 Statistics and Probability learners would say that rock is their preferred music genre. Further, we are 95% confident that the true proportion of Grade 11 Statistics and Probability learners who would say that rock is their preferred music genre is between 0.46 and 0.74; that is, out of every 100 Grade 11 Statistics and Probability learners, between 46 and 74 would say that rock is their preferred music genre.

ENRICHMENT
Most national opinion polls sample at least 1,200 respondents (although there were more than 100 million Filipinos as of mid-2014) and typically ask about the approval ratings of government officials, especially the President. How is this possible? In this lesson, we saw that the likely size of the chance error in sample percentages depends on the size of the sample and hardly at all on the population size.
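As a check on the rock-music illustration and on the claim that population size hardly matters, here is a minimal Python sketch. `proportion_ci` implements the large-sample interval above; `se_with_fpc` applies the standard finite population correction √((N − n)/(N − 1)) for sampling without replacement to a hypothetical poll of 1,200 respondents with p = 0.5. Both function names are hypothetical.

```python
import math

Z_0025 = 1.96  # z(0.025), the "∞" row of the t table


def proportion_ci(a, n, z=Z_0025):
    """Large-sample CI p̂ ± z * sqrt(p̂(1 - p̂)/n), returned as (lower, upper)."""
    p_hat = a / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def se_with_fpc(p, n, N):
    """SE of p̂ for sampling without replacement from N units,
    using the finite population correction sqrt((N - n)/(N - 1))."""
    return math.sqrt(p * (1 - p) / n) * math.sqrt((N - n) / (N - 1))


rock = proportion_ci(30, 50)
print(f"rock: ({rock[0]:.2f}, {rock[1]:.2f})")

# Poll of 1,200 respondents with p = 0.5: SE in percentage points for several N
for N in (10_000, 1_000_000, 100_000_000):
    print(N, round(100 * se_with_fpc(0.5, 1200, N), 3))
```

For N of one million or one hundred million, the standard error is essentially 1.44 percentage points; only a population as small as 10,000 (less than ten times the sample size) lowers it noticeably, to about 1.35.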
The huge number of Filipinos who could be sampled does not affect the standard error (of the proportion); it only makes drawing the random sample operationally difficult.

Is 1,200 a big enough sample? Many critics of sample surveys find it illogical that 1,200 respondents could represent millions. It turns out that 1,200 is indeed a reasonable sample size for estimating approval ratings. If the true approval rating of the President were 50%, then with a sample size of 1,200 the standard error of the sample proportion is $\sqrt{0.5(0.5)/1200} \approx 0.0144$, or about 1.4 percentage points, giving a margin of error of about 1.96 × 1.44 ≈ 2.8, or roughly 3 percentage points, at 95% confidence. This shows why we ought to be able to assess the winner of a presidential race accurately even before election day itself, unless the proportions of votes for the two leading candidates are very close.

Suppose we are required to construct a 95% confidence interval for the proportion with a width of 5%. What sample size would be required? In the previous chapter, we noted that a conservative estimate of the standard error of the sample proportion is
$$\sqrt{\frac{0.5(0.5)}{n}},$$
since the maximum value p(1 − p) can take is attained at p = ½. Since we want the width of the confidence interval to be 5%, we would like the 95% confidence interval for the proportion to take the form

Sample proportion ± 2.5%

This means that we want 1.96 × (estimate of standard error) = 0.025, or equivalently
$$1.96\sqrt{\frac{0.5(0.5)}{n}} = 0.025.$$
Solving this equation for n yields
$$n = \left(1.96 \times \frac{0.5}{0.025}\right)^2 = 1536.64 \approx 1537.$$
Ask learners whether they should account for sampling without replacement. Theoretically, the required sample size of 1,537 has to be adjusted by incorporating the population size. For a population of 100,000,000, however, the adjusted sample size is still 1,537, the same as that obtained for sampling with replacement.
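The sample-size computation can be written as a small Python function. `sample_size_for_proportion` rounds n = z²p(1 − p)/e² up to the next integer, where e = width/2 is the maximum allowable deviation. The population adjustment uses n = n₀/(1 + n₀/N), one common form of the finite population correction; the exact expression in the original lesson did not survive extraction, so treat this form as an assumption. Both function names are hypothetical.

```python
import math


def sample_size_for_proportion(width, z=1.96, p=0.5):
    """Smallest n with z * sqrt(p(1 - p)/n) <= width / 2.
    p = 0.5 is the conservative choice because it maximizes p(1 - p)."""
    e = width / 2  # maximum allowable deviation
    return math.ceil(z**2 * p * (1 - p) / e**2)


def adjust_for_population(n0, N):
    """ASSUMED finite-population adjustment n = n0 / (1 + n0/N), rounded up."""
    return math.ceil(n0 / (1 + n0 / N))


n0 = sample_size_for_proportion(0.05)
print(n0, adjust_for_population(n0, 100_000_000))
```

For a 5% width at 95% confidence this reproduces n₀ = 1537, and the adjustment leaves it unchanged for N = 100,000,000; for a small population such as N = 10,000 it would drop to 1,333.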
This numerical result explains why nationwide polls typically use only 1,200 to 1,600 respondents. Emphasize to learners that a large population size has virtually no effect on the choice of sample size when estimating a population proportion.

REFERENCES
Albert, J. R. G. (2008). Basic Statistics for the Tertiary Level (ed. Roberto Padua, Welfredo Patungan, Nelia Marquez). Philippines: Rex Bookstore.
De Veaux, R. D., Velleman, P. F., and Bock, D. E. (2006). Intro Stats. Pearson Education, Inc.
Freedman, D., Pisani, R., and Purves, R. (2007). Statistics, Fourth Edition. New York: W. W. Norton & Company.
Workbooks in Statistics 1: 11th Edition. Institute of Statistics, UP Los Baños, College, Laguna 4031.

ASSESSMENT
The following problems could serve as computational exercises on point and confidence interval estimation of the population proportion.

1. Some government officials are proposing that the country's academic calendar be moved from June-March to August-May. According to the officials, the proposal, if approved, could further improve education by synchronizing our calendar with those of other countries. Government officials would push through with the proposal if at least 85% of the student population favor it. To gauge learners' opinions on the proposal, a simple random sample of learners was obtained and asked whether they were in favor of it. Of the 1,000 surveyed learners, 892 said they were in favor of the proposal.
a. Find a point estimate of the true proportion of learners who are in favor of the proposal, and find its standard error.
Answer: With a = 892, $\hat{p} = 892/1000 = 0.892$, and its standard error is $\sqrt{0.892(0.108)/1000} \approx 0.0098$.
b. Construct a 99% confidence interval for the true proportion of learners who are in favor of the proposal. Interpret the confidence interval obtained.
Answer: The 99% confidence interval estimate of the true proportion of learners who are in favor of the proposal is $0.892 \pm 2.58(0.0098) = (0.867,\ 0.917)$. We then say that we are 99% confident that the true proportion of learners who are in favor of the proposal is between 0.87 and 0.92.

2. Because of several political problems the country is currently experiencing, a lawyer became interested in the opinion of the residents of a certain municipality about plunder issues. She would push through with a proposed program on the resolution of plunder cases if a majority of the population were not satisfied with the results of plunder cases filed in the country. She randomly selected 500 individuals from the complete list of registered voters for the 2013 National Election in the municipality. Each respondent was asked whether he or she was satisfied with the outcome of plunder cases filed in the country. Of the surveyed citizens, 180 said they were satisfied with the results of plunder cases filed in the country.
a. Find a point estimate of the true proportion of citizens who are not satisfied with the results of plunder cases filed in the country. Calculate the standard error of the estimate.
Answer: With a = 500 − 180 = 320, $\hat{p} = 320/500 = 0.64$, and its standard error is $\sqrt{0.64(0.36)/500} \approx 0.021$.
b. Construct a 95% confidence interval for the true proportion of citizens who are not satisfied with the results of plunder cases filed in the country. Interpret the confidence interval obtained.
Answer: The 95% confidence interval estimate of the true proportion of citizens who are not satisfied with the results of plunder cases filed in the country is $0.64 \pm 1.96(0.021) = (0.60,\ 0.68)$. We then say that we are 95% confident that the true proportion of citizens who are not satisfied with the results of plunder cases filed in the country is between 0.60 and 0.68.

ENRICHMENT
For the problems described above, discuss how the confidence interval estimates could be used to meet the objective of each problem.
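One way to connect these intervals to the objectives in the Enrichment task is to check whether each interval lies entirely above the decision threshold (85% in Problem 1; a majority, taken here as above 50%, in Problem 2). The Python sketch below recomputes both answers and performs that check. The decision rule is an illustrative assumption, not a rule stated in the lesson, and `proportion_ci` is a hypothetical helper.

```python
import math


def proportion_ci(a, n, z):
    """Return (p̂, SE, (lower, upper)) for the large-sample CI p̂ ± z * SE."""
    p_hat = a / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, se, (p_hat - z * se, p_hat + z * se)


# Problem 1: 892 of 1,000 in favor; 99% CI uses z(0.005) = 2.58
p1, se1, ci1 = proportion_ci(892, 1000, 2.58)
# Problem 2: 500 - 180 = 320 of 500 not satisfied; 95% CI uses z(0.025) = 1.96
p2, se2, ci2 = proportion_ci(500 - 180, 500, 1.96)

# ASSUMED decision rule: act only if the whole interval clears the threshold
print("Problem 1:", [round(v, 3) for v in ci1], "entirely above 0.85?", ci1[0] > 0.85)
print("Problem 2:", [round(v, 3) for v in ci2], "entirely above 0.50?", ci2[0] > 0.50)
```

Under this rule both objectives would be met, since the lower limits (about 0.867 and 0.598) already exceed 0.85 and 0.50, respectively.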
TEACHER TIPS
Use the same numerical example for future lessons.

CHAPTER 4: ESTIMATION OF PARAMETERS
Lesson 6: More on Point Estimates and Confidence Intervals

TIME FRAME: 60 minutes

OVERVIEW OF LESSON: In this lesson, learners undertake an activity to deepen their understanding of point and interval estimation. This lesson is largely taken from a STatistics Education Web (STEW) lesson plan called "Did I Trap the Median?" Learners recall the information provided at the beginning of Chapter 1, particularly the rating score (from 1 to 10) they gave about their state of happiness. Learners collect a random sample of 10 of their classmates' records on their respective states of happiness in order to obtain point and interval estimates of the median state of happiness in the entire class.

LEARNING COMPETENCIES
At the end of the lesson, the learner should be able to:
- Calculate a point estimator of the population median number of text messages sent in a day in class
- Construct a (1 − α)100% confidence interval estimator of the population proportion using a large sample
- Interpret point and confidence interval estimates of the population proportion

MATERIALS REQUIRED: Ruler and pencil, calculators, activity sheet

LESSON OUTLINE
A. Introduction
B. Data Collection
C. Data Analysis
D. Enrichment

DEVELOPMENT OF THE LESSON
A. Introduction
This lesson involves an activity in which learners collect sample data from their class to estimate the median state of happiness among the population of Grade 11 learners in the entire class. Each student obtains a point estimate and constructs an interval estimate of the median state of happiness in the entire class by using a simple random sample of 10 learners in the class. Numeric summaries and graphs are used to obtain point and interval estimates, respectively, of the median state of happiness in the entire class.
The teacher examines the database on the state of happiness of all learners in class, collected in the first lesson of the first chapter, in order to obtain the population median state of happiness of the entire class. The confidence level of the interval estimates computed by the learners is estimated by the proportion of learners' sample interval estimates that trap the population median state of happiness of the entire class.

Ask learners to hypothesize answers to the following questions:
1. What are the advantages of collecting a sample of only 10 records of the state of happiness, rather than records of the entire class, to obtain the median state of happiness in the entire class?
2. What are the advantages and disadvantages of using the sample median to estimate the population median?
3. Is there any advantage to constructing an interval estimate, as opposed to a point estimate (the sample median), for the population median?
4. Is it possible to ascribe a reliability value to the interval estimate (that is, a probability that the interval contains the median)?
5. What are the factors that may affect the length and the reliability of an interval estimate?

B. Data Collection
Have the learners recall that they reported their state of happiness to the teacher at the beginning of Lesson 1-01. This was put in a database. Now, let them get similar records of the state of happiness of 10 randomly selected learners in class. To ensure that each student uses a random sample of 10 records from the class database:
1. Ask learners to generate 10 random numbers from one to the total number of learners in class using a table of random digits. They may use the Table of Random Digits from Lesson 3-06.
2. Have learners write down the generated numbers from least to greatest on the data table. Explain that each number corresponds to a classmate.
Next, list all the records of the state of happiness (from the database collected in Lesson 1-01), together with the respective student numbers. Tell each learner to write down only the records whose student numbers match his/her randomly generated numbers. A sample student data set is shown in Table 4-06.1 below. A blank data table is provided in the Activity Sheet.

Table 4-06.1. Example Student Data Sample
Sample Student    State of Happiness
1                 7
2                 7
3                 8.5
4                 4
5                 6.5
6                 7
7                 7.5
8                 5.5
9                 7.5
10                9.5

An example class data set is shown in Table 4-06.2 below.

Table 4-06.2. Example Class Data
Student Number    State of Happiness    Student Number    State of Happiness
1                 7                     11                6.5
2                 7                     12                6
3                 8.5                   13                8
4                 4                     14                5
5                 6.5                   15                4
6                 7                     16                8.5
7                 7.5                   17                5.5
8                 5.5                   18                8
9                 7.5                   19                5
10                9.5                   20                6

1. Computing and Displaying Numerical Summaries
Different statistical tools are used for estimating numerical values in a population. For example, from a random sample one can calculate the sample mean (average) or the sample median (50th percentile) to estimate a measure of the center of the distribution of values in the entire population. Likewise, the range, the inter-quartile range (the difference between the 25th percentile, or first quartile, and the 75th percentile, or third quartile), and the standard deviation of the sample data can be computed to estimate the spread of the values in the population (measures of variation).

2. Visualizing the Distribution
A box and whiskers plot is a graphical summary proposed by John Tukey that uses the 5-number summary (minimum, 25th percentile, median, 75th percentile, and maximum) to display the distribution of a data set, highlighting a measure of center (median), other positions (25th and 75th percentiles), and measures of variation (range, inter-quartile range). This plot also allows us to identify outliers, that is, values that are very different from the rest of the data.
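The summaries of center and spread named above can be computed for the example sample with Python's standard `statistics` module. This is a minimal sketch that assumes student 10's value is 9.5, the value used throughout the worked example that follows. Note that `statistics.stdev` uses the sample divisor n − 1.

```python
import statistics

# Example sample of 10 states of happiness (Table 4-06.1)
sample = [7.0, 7.0, 8.5, 4.0, 6.5, 7.0, 7.5, 5.5, 7.5, 9.5]

mean = statistics.mean(sample)           # measure of center
median = statistics.median(sample)       # 50th percentile, another measure of center
value_range = max(sample) - min(sample)  # simplest measure of spread
sd = statistics.stdev(sample)            # sample standard deviation (divisor n - 1)

print(f"mean = {mean:.2f}, median = {median:.2f}, "
      f"range = {value_range:.2f}, sd = {sd:.2f}")
# prints: mean = 7.00, median = 7.00, range = 5.50, sd = 1.51
```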
Some features of the box plot of a sample data set will be used to construct interval estimates of the population median.

To construct a box and whiskers plot, learners should compute the 5-number summary of their sample data. Ask learners to order the values in their sample from smallest to largest. Learners can then readily identify the minimum and maximum values in their sample data and proceed to compute the quartiles. The median or second quartile (Q2) is found by locating the midpoint of the entire ordered data set. Since the example used here has an even number of data points, there are two middle values, so the median is the average of these two values. The 25th percentile or first quartile (Q1) is found by calculating the median of the lower half of the sample data (the first five ordered values). For the sample data, Q1 is the single middle value (the third data point) of the first five values. The 75th percentile or third quartile (Q3) is similarly found by calculating the median of the upper half of the sample data (the last five ordered values); here, Q3 is the eighth data point in the ordered sample.

The steps to draw the box plot and use it to construct an interval estimate of the population median are best described by means of an example, given in the succeeding paragraphs using the sample data in Table 4-06.1. In addition, the teacher should construct a box plot of the data of the entire class for a later discussion. The median state of happiness in the entire class in this lesson plan's example (Table 4-06.2) is 6.75, while for the student data sample (Table 4-06.1) it is 7.0. Note that the sample median of 7.0 can be used as a point estimate of the population median of 6.75. Point estimates are obtained with the hope that they are close to the population value they are meant to estimate.
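The quartile rule described above (Q1 and Q3 as medians of the lower and upper halves) can be sketched in Python as follows. The helper name `five_number_summary` is hypothetical. For an odd number of values this sketch leaves the middle value out of both halves; that is an assumed convention, since textbooks differ and the lesson only works an even-sized example. The sample uses 9.5 for student 10, as in the worked example.

```python
import statistics


def five_number_summary(data):
    """Minimum, Q1, median, Q3, maximum using the median-of-halves rule.

    Hypothetical helper for illustration. For odd n, the middle value is
    excluded from both halves (an assumed convention).
    """
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]            # lower half of the ordered data
    upper = values[half + (n % 2):]  # upper half; skips the middle value when n is odd
    return (values[0], statistics.median(lower), statistics.median(values),
            statistics.median(upper), values[-1])


sample = [7.0, 7.0, 8.5, 4.0, 6.5, 7.0, 7.5, 5.5, 7.5, 9.5]   # Table 4-06.1
class_data = [7, 7, 8.5, 4, 6.5, 7, 7.5, 5.5, 7.5, 9.5,
              6.5, 6, 8, 5, 4, 8.5, 5.5, 8, 5, 6]            # Table 4-06.2

print(five_number_summary(sample))    # (4.0, 6.5, 7.0, 7.5, 9.5)
print(statistics.median(class_data))  # 6.75
```

Python also offers `statistics.quantiles`, but its interpolation methods do not generally reproduce the median-of-halves values, so the halves are split explicitly here.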
Confidence intervals have the extra advantage of conveying the uncertainty in the estimation process. With a confidence interval, we can state how confident we are that the exact population value being estimated (the population median in this case) is captured, or "trapped", by an interval constructed from sample data.

To construct an interval estimate of the population median using the features of a box plot, start by having each learner obtain the 5-number summary of his/her sample data as described above. Notice that the smallest level of happiness in the sample data in Table 4-06.1 is 4.0, while the largest is 9.5. The median (Q2) of 7.0 indicates that about half of the learners in the data set stated levels of happiness less than or equal to 7, and about half stated levels greater than or equal to 7. The first quartile of the student data sample is 6.5 and the third quartile is 7.5. These values of Q1 and Q3 indicate that about 25% of the learners in this sample have levels of happiness less than or equal to 6.5, and about 25% have levels greater than or equal to 7.5. They also indicate that about 50% of the learners in the sample have levels of happiness between 6.5 and 7.5.

To construct a box plot, follow these steps:
1. Mark the values Q1 = 6.5, Q2 = 7.0, and Q3 = 7.5 on a horizontal scale that spans all the values in the sample data. Then, construct a box above the scaled line using these values, as indicated in Figure 4-06.1.
[Figure 4-06.1. Box plot: Step 1 (box from Q1 = 6.5 to Q3 = 7.5, with the median Q2 = 7.0 marked inside)]
2. To find whether there are any outliers or extreme values in a data set, compute the inter-quartile range (IQR), which is the difference between the third and first quartiles.
Any data point beyond the lower outlier bound, Q1 − 1.5(IQR), or the upper outlier bound, Q3 + 1.5(IQR), is considered an outlier. In this case, IQR = 7.5 − 6.5 = 1; therefore, any level of happiness smaller than Q1 − 1.5(IQR) = 6.5 − 1.5(1) = 5.0, or larger than Q3 + 1.5(IQR) = 7.5 + 1.5(1) = 9.0, is an outlier. There are two outliers in this data set, 4.0 and 9.5. These outliers are indicated by drawing stars above the scaled line at about half the height of the box, as shown in Figure 4-06.2 below.
[Figure 4-06.2. Box plot: Step 2 (box from Q1 = 6.5 to Q3 = 7.5 with median Q2 = 7.0; outliers 4.0 and 9.5 marked with stars)]
3. Finally, find the minimum value that is not an outlier and the maximum value that is not an outlier. Here, the minimum value that is not an outlier is 5.5, and the maximum value that is not an outlier is 8.5. Then, add the whiskers to the box by drawing horizontal lines at about half the height of the box, first from Q1 down to the minimum value that is not an outlier, and second from Q3 up to the maximum value that is not an outlier, as indicated in Figure 4-06.3 below. Only when there are no outliers do the whiskers extend all the way to the minimum and maximum values of the data set. To avoid drawing the whiskers incorrectly, draw them only after the outliers (if any) have been added to the graph.
[Figure 4-06.3. Box plot: Step 3 (whiskers from 5.5 to Q1 = 6.5 and from Q3 = 7.5 to 8.5; outliers 4.0 and 9.5 marked with stars)]

Notice that the box and whiskers plot for the student sample data is quite symmetric. The distribution of random sample data tends to reflect the distribution of the population. At this point, you can draw on the board the box plot you obtained for the data of the entire class, and ask learners whether their sample box plots resemble that of the population.
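Steps 2 and 3 (the 1.5 × IQR outlier rule and the whisker endpoints) can be checked with a short sketch. The function name `outlier_analysis` is hypothetical; Q1 and Q3 are passed in as computed by the median-of-halves rule, and a value counts as an outlier only if it lies strictly beyond a bound, as in the text.

```python
def outlier_analysis(values, q1, q3):
    """Return the outlier bounds, the outliers, and the whisker ends.

    Hypothetical helper using the 1.5 x IQR rule described in the lesson.
    """
    iqr = q3 - q1
    lower_bound = q1 - 1.5 * iqr
    upper_bound = q3 + 1.5 * iqr
    # values strictly beyond either bound are outliers
    outliers = sorted(v for v in values if v < lower_bound or v > upper_bound)
    inside = [v for v in values if lower_bound <= v <= upper_bound]
    # whiskers run to the smallest and largest values that are not outliers
    whiskers = (min(inside), max(inside))
    return (lower_bound, upper_bound), outliers, whiskers


sample = [7.0, 7.0, 8.5, 4.0, 6.5, 7.0, 7.5, 5.5, 7.5, 9.5]   # Table 4-06.1
bounds, outliers, whiskers = outlier_analysis(sample, q1=6.5, q3=7.5)
print(bounds, outliers, whiskers)  # (5.0, 9.0) [4.0, 9.5] (5.5, 8.5)
```

The same helper applied to the assessment's sample exam scores with Q1 = 101 and Q3 = 115 gives bounds of 80 and 136 and no outliers.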
A small proportion of learners may have box plots that look quite different from the box plot of the data of the entire class. This is due to random variation among samples. However, most learners should have a box plot that resembles that of the population.

3. Constructing an Interval Estimate
Ask learners to discuss how much their sample median differs from the population median. In the above example, the sample median of 7.0 is off by 0.25 from the population median of 6.75. Learners should note the wide variability in estimation error when their sample median is used as an estimate of the population median (compared with the variability in estimation error when the population mean is estimated by the sample mean).

Now ask learners whether it would be reasonable to provide an interval estimate that has a high probability of capturing, or trapping, the exact median of the population. If they could provide such an interval using their own sample data, what would it be? One suggestion might be to use the endpoints of the whiskers of their box plots as an interval that has a high probability of trapping the population median. However, learners may also realize that this interval is too wide to help home in on the value of the population median (that is, the interval has a large margin of error). Then, ask learners whether the shorter interval from Q1 to Q3 (the endpoints of the box instead of the endpoints of the whiskers) would be more reasonable for estimating the location of the population median. Next, ask learners how confident they are that, each time they obtain a random sample of 10 learners and compute the first and third quartiles of this sample, the interval (Q1, Q3) captures or traps the population median. It would not be surprising to have learners whose intervals (Q1, Q3) did not capture the population median.
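The proportion of trapping intervals in class is an empirical estimate of the confidence level, and a computer can repeat the sampling many more times than there are learners. The sketch below is a Monte Carlo simulation added for illustration; it is not part of the original lesson. It assumes sampling without replacement from the example class data, the median-of-halves quartile rule, and that an interval traps the median when the median equals one of its endpoints. The names `quartiles` and `trap_rate` are hypothetical.

```python
import random
import statistics

# Example class data (Table 4-06.2)
class_data = [7, 7, 8.5, 4, 6.5, 7, 7.5, 5.5, 7.5, 9.5,
              6.5, 6, 8, 5, 4, 8.5, 5.5, 8, 5, 6]


def quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves (middle value skipped if n is odd)."""
    values = sorted(data)
    half = len(values) // 2
    return statistics.median(values[:half]), statistics.median(values[-half:])


def trap_rate(population, sample_size=10, reps=10_000, seed=2024):
    """Estimate how often (Q1, Q3) from a random sample traps the population median."""
    rng = random.Random(seed)
    target = statistics.median(population)
    hits = 0
    for _ in range(reps):
        q1, q3 = quartiles(rng.sample(population, sample_size))  # without replacement
        if q1 <= target <= q3:  # endpoints count as trapping (assumption)
            hits += 1
    return hits / reps


print(f"Estimated confidence level of (Q1, Q3): {trap_rate(class_data):.2f}")
```

The printed value depends on the class data and the seed, so it is only an estimate; a class's own proportion from about 20 intervals will usually differ from it somewhat.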
If so, this would prevent learners from saying that they are 100% confident that each time they take a sample of 10 learners and obtain the first and third quartiles of their sample, the interval (Q1, Q3) will trap the population median.

So what level of confidence can learners have in capturing the population median with the interval (Q1, Q3) from a random sample of 10 learners? To answer this question, learners can estimate the reliability, or level of confidence, of using (Q1, Q3) from their sample of 10 learners as an interval estimate of the population median: simply compute the proportion of learners in class whose interval estimate trapped the population median (the class median). For example, if 15 of the 20 learners (75%) in the class obtained an interval (Q1, Q3) that trapped the class median of 6.75, then we would expect about 75% of the intervals (Q1, Q3) from repeated samples of 10 learners from the class to trap the population median level of happiness.

C. Data Analysis
Learners should now have an idea of the advantages of interval estimates, which, once their level of reliability is known, are called confidence intervals. However, learners may agree that a sample interval (Q1, Q3) is still too wide (that is, the interval has a large margin of error) to pin down the location of the median. Ask learners questions about possible refinements of these confidence intervals, such as the following, which will be explored on the second day of this lesson plan:
1. What do you think would happen to the sample interval (Q1, Q3) if the sample size increased from 10 to 15 (or 20)?
2. What do you think would happen to the sample interval (Q1, Q3) if the population distribution were not symmetric?
3. Do you have any idea how to construct interval estimates that are shorter than the interval (Q1, Q3)? Would a shorter interval necessarily change the level of reliability?

D.
Enrichment
Extend this lesson and activity by increasing the sample size to 15 or 20, and let learners see how increasing the sample size produces tighter intervals (Q1, Q3). They should also explore how symmetric distributions produce interval estimates with smaller reliability (lower level of confidence) than non-symmetric distributions.

REFERENCES
Many of the materials in this lesson were adapted from:
Parks, S. (California State University Sacramento, Dept. of Mathematics and Statistics), Steinwachs, M. (University of California Davis, iAMSTEM Hub), Diaz, R. (California State University Sacramento, Dept. of Mathematics and Statistics), and Molinaro, M. (University of California Davis). Did I Trap the Median? STatistics Education Web (STEW). Retrieved from https://www.amstat.org/education/stew/pdfs/DidITrapTheMedian.docx
Albert, J. R. G. (2008). Basic Statistics for the Tertiary Level (ed. Roberto Padua, Welfredo Patungan, Nelia Marquez). Philippines: Rex Bookstore.
De Veaux, R. D., Velleman, P. F., and Bock, D. E. (2006). Intro Stats. Pearson Education, Inc.
Freedman, D., Pisani, R., and Purves, R. (2007). Statistics, Fourth Edition. New York: W. W. Norton & Company.
Workbooks in Statistics 1: 11th Edition. Institute of Statistics, UP Los Baños, College, Laguna 4031.

ASSESSMENT
A class of 25 learners is selected and their exam scores are recorded. A random sample of 10 learners is taken from the class. The data are shown in the tables below.

Class Data Table
Student   Exam Score    Student   Exam Score    Student   Exam Score
1         90            10        107           19        85
2         101           11        76            20        80
3         106           12        103           21        99
4         108           13        69            22        76
5         125           14        94            23        92
6         130           15        106           24        89
7         115           16        78            25        121
8         91            17        121
9         112           18        80

Sample Data Table
Student   Exam Score
1         96
2         101
3         106
4         108
5         125
6         130
7         115
8         93
9         112
10        107

Using the above tables, answer the following questions:
a) Calculate the 5-number summary for the sample data table.
Answer: 5-number summary: minimum = 93, first quartile Q1 = 101, median = 107.5, third quartile Q3 = 115, maximum = 130.
b) Determine the lower and upper outlier bounds. Are there any outliers?
Answer: Q1 = 101, Q3 = 115, IQR = Q3 − Q1 = 115 − 101 = 14; lower outlier bound: Q1 − 1.5(IQR) = 101 − 1.5(14) = 80; upper outlier bound: Q3 + 1.5(IQR) = 115 + 1.5(14) = 136. There are no outliers (no points lie beyond the outlier bounds).
c) What are the minimum and maximum values that are not outliers? Note: If there are no outliers below (above) the lower (upper) outlier bound, then the minimum (maximum) value that is not an outlier is the minimum (maximum) value of the data set.
Answer: Minimum that is not an outlier = minimum of the sample data (no outliers below the lower outlier bound) = 93; maximum that is not an outlier = maximum of the sample data (no outliers above the upper outlier bound) = 130.
d) Construct a box plot of the exam scores in the sample data table.
Answer: See the box plot below.
[Figure: Sample box plot of the sample exam scores]
e) Is the distribution of the data set symmetric or asymmetric?
Answer: The box plot indicates that the distribution of the sample data is asymmetric, with a longer upper whisker and a larger spread between the median and the third quartile than between the first quartile and the median. (On the second day of this lesson plan, learners will learn that when a box plot shows asymmetry in this direction, the distribution of the data is said to be skewed right or positively skewed.)
f) Compute the population median (the median of the entire class of 25 learners).
Answer: The median of the entire class is 99.
g) Does the interval (Q1, Q3) trap the median of the class data?
Answer: No. The class median of 99 does not fall between 101 (first quartile) and 115 (third quartile), so the sample interval (Q1, Q3) does not trap it.

Activity Sheet 4-06
1. Describe the data collection process that will be used.
2.
Recall your answer to Activity Sheet 1-01a: On a scale from 1 (very unhappy) to 10 (happiest), how do you feel today? ________
3. Record the state of happiness of 10 randomly chosen learners in your class.
Name    State of Happiness
4. Arrange the values of the state of happiness from smallest to largest.
5. Complete the table below showing numeric summaries of the state of happiness for your ten randomly chosen classmates.
Mean    Minimum    First Quartile (Q1)    Median    Third Quartile (Q3)    Maximum
6. Determine which values would be considered outliers for your 10 randomly chosen classmates. Are there any outliers?
7. Construct a horizontal box plot for your 10 randomly chosen classmates. If your data set has outliers, do not use them as the minimum or maximum values; instead, draw the whiskers to the minimum value that is not an outlier and the maximum value that is not an outlier.
[Blank plotting area with a horizontal axis labeled "state of happiness"]
8. What is the class median state of happiness? Does your Q1-to-Q3 interval estimate trap the median for the entire class?
9. Based on the median of the entire class given by your teacher and the Q1-to-Q3 intervals that all learners obtained from their 10 randomly chosen classmates, calculate what proportion (percent) of box plots trap the median for the entire class. This is the reliability (confidence level) of using interval estimates from Q1 to Q3.
10. Think about what would happen if the sample size were increased. Would the proportion of box plots that trap the median increase or decrease? Why?

Simulation Worksheet
1. Hypothesize what the answers to the following questions might be, and state why.
a) What happens to the width of the confidence intervals when the sample size increases? Do the bounds of the intervals vary more? Why?
b) What happens to the level of confidence (reliability, or percentage of sample intervals that trap the population median) of the interval estimate when the sample size increases? Why?
c) What happens to the width of the interval estimate when the shape of the population distribution changes? Do the bounds of the intervals vary more? Why?
d) What happens to the level of confidence (reliability, or percentage of sample intervals that trap the population median) when the shape of the population distribution changes? Why?
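After learners have written their hypotheses for questions a) and b), one way to explore them is to repeat the sampling by computer for several sample sizes. The sketch below is an illustration added here, not part of the original worksheet. It assumes sampling without replacement from the example class data (Table 4-06.2), the median-of-halves quartile rule, and, for odd sample sizes such as 15, that the middle value is left out of both halves (an assumed convention). The names `quartiles` and `simulate` are hypothetical. With a class of only 20, a sample of 20 is the entire class, so that interval is identical every time and always traps the class median; that row reflects the finite class, not a general rule. Questions c) and d) could be explored by replacing `class_data` with data of a different shape.

```python
import random
import statistics

# Example class data (Table 4-06.2)
class_data = [7, 7, 8.5, 4, 6.5, 7, 7.5, 5.5, 7.5, 9.5,
              6.5, 6, 8, 5, 4, 8.5, 5.5, 8, 5, 6]


def quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves (middle value skipped if n is odd)."""
    values = sorted(data)
    half = len(values) // 2
    return statistics.median(values[:half]), statistics.median(values[-half:])


def simulate(population, sample_size, reps=5_000, seed=7):
    """Return (average width of (Q1, Q3), proportion of intervals that trap the median)."""
    rng = random.Random(seed)
    target = statistics.median(population)
    widths = []
    hits = 0
    for _ in range(reps):
        q1, q3 = quartiles(rng.sample(population, sample_size))  # without replacement
        widths.append(q3 - q1)
        if q1 <= target <= q3:  # endpoints count as trapping (assumption)
            hits += 1
    return statistics.mean(widths), hits / reps


for n in (10, 15, 20):
    width, rate = simulate(class_data, n)
    print(f"n = {n:2d}: average width = {width:.2f}, trap rate = {rate:.2f}")
```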
