Measures of Dispersion for Grouped Data Chapter 7 PDF

Summary

This chapter explores measures of dispersion for grouped data. It covers class intervals, lower and upper limits, midpoints, class boundaries and cumulative frequencies; the construction of histograms, frequency polygons and ogives; quartiles and percentiles; range, interquartile range, variance and standard deviation; and box plots, all illustrated with worked examples. The chapter opens with a graph of daily Covid-19 cases in Malaysia during the Movement Control Order.

Full Transcript


CHAPTER 7 Measures of Dispersion for Grouped Data

What will you learn?
7.1 Dispersion
7.2 Measures of Dispersion

Why study this chapter?
Statistical analysis such as measures of dispersion is widely applied in various fields, including medicine, agriculture, finance, social science and many more. Career fields that apply statistical analysis include biometrics, actuarial science and financial analysis. These fields use big data to obtain statistical values and then represent the data in statistical graphs.

Do you know?
William Playfair (1759–1823) was a Scottish economist who used various common statistical graphs in his book, The Commercial and Political Atlas, published in 1786.
For more information: bit.do/DoYouKnowChap7

WORD BANK
grouped data – data terkumpul
histogram – histogram
cumulative histogram – histogram longgokan
cumulative frequency – kekerapan longgokan
quartile – kuartil
ogive – ogif
statistical investigation – penyiasatan statistik
percentile – persentil
frequency polygon – poligon kekerapan

196 KPM

[Figure: Number of Daily Cases of Covid-19 in Malaysia, covering MCO Phase 1 (18 Mar – 31 Mar), Phase 2 (1 Apr – 14 Apr), Phase 3 (15 Apr – 28 Apr) and Phase 4 (29 Apr – 12 May); vertical axis: number of cases, 0 to 260; horizontal axis: date (2020). Source: Ministry of Health Malaysia, July 2020]

The outbreak of the Covid-19 pandemic in early 2020 forced Malaysians to adjust to a new normal. The swift and efficient action taken by the authorities helped Malaysia control the growing number of patients infected by the virus. By issuing the Movement Control Order (MCO), Malaysia succeeded in flattening the curve of daily infected cases. In your opinion, what would the shape of the graph have been if the MCO had not been implemented?

7.1 Dispersion

How to construct histogram and frequency polygon?
Learning standard: Construct histogram and frequency polygon for a set of grouped data.

In Form 4, you learnt how to interpret the dispersion of ungrouped data using stem-and-leaf plots and dot plots. For grouped data, we can observe the dispersion by constructing a histogram and a frequency polygon. Before that, you need to know the following terms, all of which can be obtained from a frequency table:
- the class interval;
- the lower limit and the upper limit;
- the midpoint;
- the lower boundary and the upper boundary;
- the cumulative frequency.

Info Bulletin: A class interval is the range of one division of the data.

MIND MOBILISATION 1 (Group)
Aim: To recognise the lower limit, upper limit, midpoint, lower boundary and upper boundary of a set of data.
Steps:
The data below shows the amount of daily pocket money, in RM, received by 20 pupils on a particular day.
8 10 4 7 1
5 2 8 11 4
5 7 15 3 4
14 12 7 11 9
1. Identify the smallest and the largest data values.
2. By referring to the data, group the data into 3, 4, 5 or 6 parts in sequence. For example, three uniform parts means 1 – 5, 6 – 10 and 11 – 15.
3. Using the tally method, place each data value in its part of the group.
4. For each part of the data, determine
(a) the lower limit (the smallest value in the part) and the upper limit (the largest value in the part),
(b) the midpoint of the part,
(c) (i) the middle value between the lower limit of the part and the upper limit of the part before it,
(ii) the middle value between the upper limit of the part and the lower limit of the part after it.
5. Complete a frequency table with the results of steps 3, 4(a), 4(b), 4(c)(i) and 4(c)(ii). Use these column headings:
- Pocket money (RM)
- Lower limit and Upper limit (Step 4(a))
- Midpoint (Step 4(b))
- (i) and (ii) (Step 4(c))
- Frequency
Discussion:
Discuss and write down the definitions used to determine the lower limit, upper limit, midpoint, lower boundary and upper boundary of a set of data.
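Steps 1 to 4 of the activity can be sketched in a few lines of Python as a check on the tally. The sketch assumes the three uniform parts 1 – 5, 6 – 10 and 11 – 15 suggested in step 2. It also assumes a gap of 1 (whole ringgit) before the first part and after the last part, because those parts have no neighbour on one side. The variable names are illustrative only.

```python
# Daily pocket money (RM) of 20 pupils, from Mind Mobilisation 1.
data = [8, 10, 4, 7, 1,
        5, 2, 8, 11, 4,
        5, 7, 15, 3, 4,
        14, 12, 7, 11, 9]

# Step 1: smallest and largest data values.
smallest, largest = min(data), max(data)

# Step 2: three uniform parts (the example grouping; 4, 5 or 6 parts also work).
parts = [(1, 5), (6, 10), (11, 15)]

table = []
for i, (lower, upper) in enumerate(parts):
    # Step 3: tally = number of data values inside this part.
    frequency = sum(lower <= x <= upper for x in data)
    # Step 4(b): midpoint of the part.
    midpoint = (lower + upper) / 2
    # Step 4(c): middle values with the neighbouring parts. The first and last
    # parts have no neighbour on one side, so a gap of 1 is assumed there.
    prev_upper = parts[i - 1][1] if i > 0 else lower - 1
    next_lower = parts[i + 1][0] if i + 1 < len(parts) else upper + 1
    c_i = (prev_upper + lower) / 2
    c_ii = (upper + next_lower) / 2
    table.append((lower, upper, midpoint, c_i, c_ii, frequency))

print("smallest:", smallest, "largest:", largest)
for row in table:
    print(row)  # (lower limit, upper limit, midpoint, 4(c)(i), 4(c)(ii), frequency)
```

With this grouping the three parts hold 8, 7 and 5 pupils. The middle values 0.5, 5.5, 10.5 and 15.5 are the class boundaries that the next part of the chapter defines.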
The results of Mind Mobilisation 1 show that

Size of class interval = $\frac{\text{Largest data value} - \text{Smallest data value}}{\text{Number of classes}}$

The lower limit is the smallest value and the upper limit is the largest value of a class.

Midpoint = $\frac{\text{Lower limit} + \text{Upper limit}}{2}$

Lower boundary = $\frac{\text{Upper limit of the class before it} + \text{Lower limit of the class}}{2}$

Upper boundary = $\frac{\text{Upper limit of the class} + \text{Lower limit of the class after it}}{2}$

Example 1
The data below shows the heights, to the nearest cm, of a group of Form 5 pupils.
153 158 168 161
163 165 157 162
145 150 158 156
166 163 152 155
158 173 148 164
(a) Determine the class intervals for the data if the number of classes required is 6.
(b) Construct a frequency table based on the information in (a). Hence, complete the frequency table with the lower limit, upper limit, midpoint, lower boundary and upper boundary.

i-Technology: Scan the QR code or visit bit.do/WSChap7i to explore ways to organise raw data in a frequency table by using a spreadsheet.

Solution:
(a) The largest data value is 173 and the smallest data value is 145. If the number of classes is 6, then

Size of class interval = $\frac{173 - 145}{6} = 4.7 \approx 5$

The size is rounded up to 5 so that six classes cover all the data. Therefore, the class intervals are 145 – 149, 150 – 154, 155 – 159, 160 – 164, 165 – 169 and 170 – 174.

(b)
Height (cm)  Frequency  Lower limit  Upper limit  Midpoint  Lower boundary  Upper boundary
145 – 149    2          145          149          147       144.5           149.5
150 – 154    3          150          154          152       149.5           154.5
155 – 159    6          155          159          157       154.5           159.5
160 – 164    5          160          164          162       159.5           164.5
165 – 169    3          165          169          167       164.5           169.5
170 – 174    1          170          174          172       169.5           174.5

For grouped data with uniform class intervals, the size of the class interval can be calculated using two methods.
Method 1: The difference between the lower limits, or between the upper limits, of two consecutive classes.
Size of class interval of the first class
= Lower limit of class 150 – 154 – Lower limit of class 145 – 149 = 150 – 145 = 5
or
= Upper limit of class 150 – 154 – Upper limit of class 145 – 149 = 154 – 149 = 5

Method 2: The difference between the upper boundary and the lower boundary of a class interval.
Size of class interval of the first class
= Upper boundary of class 145 – 149 – Lower boundary of class 145 – 149 = 149.5 – 144.5 = 5

Note: Avoid using the lower and upper limits of the same class to determine the size of a class interval. For example, for the class interval 145 – 149, 149 – 145 = 4, which is not the size of the class interval.

Info Bulletin: In Example 1, the class interval 150 – 154 actually runs from 149.5 to 154.5 because the data is continuous. The lower boundary 149.5 and the upper boundary 154.5 separate the classes so that there are no gaps between 149 cm and 150 cm, or between 154 cm and 155 cm.

The cumulative frequency of data can also be obtained from a frequency table. The cumulative frequency of a class interval is the frequency of that class plus the total frequency of all the classes before it. This gives an ascending cumulative frequency.

Example 2
Construct a cumulative frequency table from the frequency table below.
Age        10 – 19  20 – 29  30 – 39  40 – 49  50 – 59
Frequency  4        5        8        7        3

Solution:
Age      Frequency  Cumulative frequency
10 – 19  4          4
20 – 29  5          4 + 5 = 9
30 – 39  8          9 + 8 = 17
40 – 49  7          17 + 7 = 24
50 – 59  3          24 + 3 = 27
The value 17 means that there are 17 people aged 39 years old and below.

Info Bulletin: Continuous data is data measured on a continuous scale, for example, the time taken by pupils to buy food at the canteen and the pupils' heights. Discrete data is data obtained by counting, for example, the number of pupils in the Mathematics Club.
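The frequency table of Example 1 and the running total of Example 2 can be reproduced with a short Python sketch. It assumes whole-number data (heights to the nearest cm), so each class runs from its lower limit to lower limit + size – 1 and its boundaries sit 0.5 below and above the limits. The class size is rounded up: rounding 4.67 down to 4 would give six classes ending at 168, leaving 173 cm outside the table. The names are illustrative only.

```python
import math
from itertools import accumulate

# Heights (cm, to the nearest cm) of the Form 5 pupils in Example 1.
heights = [153, 158, 168, 161, 163, 165, 157, 162, 145, 150,
           158, 156, 166, 163, 152, 155, 158, 173, 148, 164]
num_classes = 6

# (173 - 145) / 6 = 4.67, rounded UP so that six classes still reach 173.
size = math.ceil((max(heights) - min(heights)) / num_classes)

rows = []
for k in range(num_classes):
    lower = min(heights) + k * size   # lower limit
    upper = lower + size - 1          # upper limit (whole-number data assumed)
    rows.append({
        "class": f"{lower} - {upper}",
        "frequency": sum(lower <= h <= upper for h in heights),
        "midpoint": (lower + upper) / 2,
        # Boundaries lie halfway across the 1 cm gap between consecutive classes.
        "lower_boundary": lower - 0.5,
        "upper_boundary": upper + 0.5,
    })

# Ascending cumulative frequency = running total of the frequencies.
cumulative = list(accumulate(r["frequency"] for r in rows))

# Method 2 check: upper boundary - lower boundary gives the class size.
method2_size = rows[0]["upper_boundary"] - rows[0]["lower_boundary"]

# Example 2: cumulative frequencies of the ages table.
age_cumulative = list(accumulate([4, 5, 8, 7, 3]))

for r, cf in zip(rows, cumulative):
    print(r["class"], r["frequency"], r["midpoint"],
          r["lower_boundary"], r["upper_boundary"], cf)
print(size, method2_size, age_cumulative)
```

The frequencies 2, 3, 6, 5, 3, 1 and the boundaries agree with the table in Example 1. `accumulate` produces the same 4, 9, 17, 24, 27 column as Example 2.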
Histogram

A histogram is a graphical representation in which the data is grouped into ranges and displayed as contiguous bars. The height of each bar represents the frequency of a class.

Steps for constructing a histogram:
1. Find the lower boundary and upper boundary of each class interval.
2. Choose an appropriate scale on the vertical axis. Represent the frequencies on the vertical axis and the class boundaries on the horizontal axis.
3. Draw a bar for each class. The width of the bar is equal to the size of the class and the height is proportional to the frequency.

Frequency polygon

A frequency polygon is a graph that displays grouped data by using straight lines to connect the midpoints of the classes. These midpoints lie at the top of each bar in a histogram.

Info Bulletin: Histograms and frequency polygons can only be constructed for continuous data.

Steps for constructing a frequency polygon:
1. Mark the midpoint of each class on top of each bar.
2. Mark the midpoints of a class before the first class and a class after the last class, each with zero frequency.
3. Draw straight lines connecting adjacent midpoints.

Example 3
The frequency table below shows the speeds of cars, in km h–1, recorded by a speed trap camera along a highway over a certain period. Represent the data with a histogram and a frequency polygon. Use a scale of 2 cm to 10 km h–1 on the horizontal axis and 2 cm to 10 cars on the vertical axis.
Speed (km h–1)  70 – 79  80 – 89  90 – 99  100 – 109  110 – 119  120 – 129
Number of cars  5        10       20       30         25         10
(By using the frequency polygon, describe the speeds of the cars travelling at more than 90 km h–1.)

Solution:
Speed (km h–1)  Number of cars  Midpoint  Lower boundary  Upper boundary
70 – 79         5               74.5      69.5            79.5
80 – 89         10              84.5      79.5            89.5
90 – 99         20              94.5      89.5            99.5
100 – 109       30              104.5     99.5            109.5
110 – 119       25              114.5     109.5           119.5
120 – 129       10              124.5     119.5           129.5

[Figure: "Speeds of Cars" histogram and "Speeds of Cars" frequency polygon (midpoints joined by straight lines); horizontal axis: speed (km h–1), class boundaries 69.5 to 129.5; vertical axis: number of cars, 0 to 30]

A frequency polygon can also be constructed without first drawing a histogram. Steps for constructing a frequency polygon from a frequency table:
1. Add one class interval with zero frequency before the first class and one after the last class.
2. Find the midpoint of each class interval.
3. Choose an appropriate scale on the vertical axis. Represent the frequencies on the vertical axis and the midpoints on the horizontal axis.
4. Mark each midpoint against its corresponding frequency.
5. Connect adjacent points with straight lines.

Example 4
The frequency table below shows the times, in seconds, recorded by 20 participants in a qualifying round of a swimming competition. Represent the data with a frequency polygon. Use a scale of 2 cm to 5 seconds on the horizontal axis and 2 cm to 2 participants on the vertical axis.
Time recorded (s)       50 – 54  55 – 59  60 – 64  65 – 69  70 – 74
Number of participants  2        3        6        5        4

Solution:
Time recorded (s)  Number of participants  Midpoint
45 – 49            0                       47
50 – 54            2                       52
55 – 59            3                       57
60 – 64            6                       62
65 – 69            5                       67
70 – 74            4                       72
75 – 79            0                       77
A class interval with zero frequency is added before the first class and after the last class.

[Figure: "Time Recorded of Participants" frequency polygon; horizontal axis: time (s), midpoints 47 to 77; vertical axis: number of participants, 0 to 6]

Self Practice 7.1a
1. The data below shows the time taken, to the nearest minute, by 50 pupils to travel from their houses to school.
6 15 32 16 18 31 38 20 17 32
18 8 25 35 13 24 14 8 8 25
16 25 30 10 18 14 14 10 25 30
23 30 12 18 6 23 1 15 30 12
40 15 5 14 22 49 12 19 33 25
Construct a frequency table such that there are 5 classes.
Then, state the lower limit, upper limit, midpoint, lower boundary and upper boundary of each class interval.
2. The frequency table below shows the masses, in kg, of new-born babies in a hospital in a month. State the midpoint, lower limit, upper limit, lower boundary, upper boundary and cumulative frequency of the data.
Mass (kg)         2.0 – 2.4  2.5 – 2.9  3.0 – 3.4  3.5 – 3.9  4.0 – 4.4
Number of babies  9          15         24         20         10
3. The frequency table below shows the number of hours of sleep per day of a group of workers in a factory. Construct a histogram and a frequency polygon on the same graph to represent the data. Use a scale of 2 cm to 1 hour on the horizontal axis and 2 cm to 20 workers on the vertical axis.
Number of hours of sleep per day  4.05 – 5.04  5.05 – 6.04  6.05 – 7.04  7.05 – 8.04  8.05 – 9.04  9.05 – 10.04  10.05 – 11.04
Number of workers                 2            4            22           64           90           14            2
4. The frequency table below shows the heights, in m, of sugar cane plants taken from a plantation. Represent the data with a frequency polygon. Use a scale of 2 cm to 1 m on the horizontal axis and 2 cm to 10 sugar cane plants on the vertical axis.
Height (m)                   1.0 – 1.9  2.0 – 2.9  3.0 – 3.9  4.0 – 4.9  5.0 – 5.9  6.0 – 6.9
Number of sugar cane plants  25         33         46         50         44         36

How to compare and interpret the dispersions based on histogram and frequency polygon?

Learning standard: Compare and interpret the dispersions of two or more sets of grouped data based on histograms and frequency polygons, and hence make conclusions.

Distribution shapes of data
When describing grouped data, it is important to be able to recognise the shape of the distribution. The distribution shape can be identified from a histogram or a frequency polygon.

MIND MOBILISATION 2 (Group)
Aim: To explore the possible shapes of a distribution.
Steps:
1. Divide the class into groups.
2. Open the worksheet by scanning the QR code. Each group is given the worksheet.
3. In the group, classify the distribution shapes into two categories: symmetrical or skewed.
(Scan the QR code or visit bit.do/WSChap7ii to obtain the worksheet.)
Discussion:
Can you differentiate between symmetrical and skewed shapes?

The results of Mind Mobilisation 2 show that a distribution is symmetric if its two halves, left and right, are almost the same in shape and size. A distribution is skewed if one tail is longer than the other tail.

[Figure: symmetric histograms (bell-shaped and uniform-shaped) and skewed histograms (right-skewed and left-skewed); axes: frequency against variable]

Info Bulletin: Other distribution shapes are (i) U-shaped, (ii) J-shaped, (iii) reverse J-shaped and (iv) bimodal.

Example 5
The diagram below shows two histograms representing the times taken by 25 swimmers to complete two different events.
[Figure: histograms "100 m Backstroke" and "100 m Freestyle"; horizontal axis: time (s), class boundaries 69.5 to 104.5; vertical axis: number of swimmers, 0 to 6]
(a) State the distribution shape of the histogram for each event.
(b) Which event has a wider dispersion of the time taken? Give your reason.
(c) Between backstroke and freestyle, in which event did the swimmers perform better?

Info Bulletin: Distributions are rarely perfectly shaped, so it is necessary to identify the overall pattern.

[Figure (Info Bulletin): determining the distribution shape using hands, (i) skewed to the right, (ii) skewed to the left]

Solution:
(a) The histogram for the 100 m backstroke shows a bell-shaped distribution. The histogram for the 100 m freestyle shows a uniform distribution.
(b) The 100 m backstroke event has a wider dispersion because the range of the times recorded is larger, that is, 30 seconds (102 s – 72 s).
(c) 100 m freestyle.
This is because most of the swimmers recorded better times.

Example 6
The frequency polygons below show the selling prices of the houses sold in two different areas in the last six months.
[Figure: "Selling Prices of Houses in Area A and Area B" frequency polygons; horizontal axis: selling prices (RM), 64 999.5 to 379 999.5; vertical axis: house units, 0 to 20]
(a) State the distribution shapes in the two areas.
(b) Compare the dispersions of the house prices in the two areas.
(c) In your opinion, which area represents an urban area and which area represents a rural area?

Solution:
(a) The distribution of the selling prices in area A is skewed to the right, whereas in area B it is skewed to the left.
(b) The dispersions of the selling prices in area A and area B are approximately the same, even though their distribution shapes are different.
(c) Area A represents a rural area because most of the selling prices are lower. Area B represents an urban area because most of the selling prices are higher.

Self Practice 7.1b
1. The diagram below shows two histograms of Mathematics test marks obtained by two groups, Arif and Bestari.
[Figure: histograms "Mathematics Test Marks of Arif Group" and "Mathematics Test Marks of Bestari Group"; horizontal axis: marks, 10.5 to 90.5; vertical axis: number of pupils]
(a) State the distribution shape of the histogram for each group.
(b) Compare the dispersions of the test marks between the two groups.
(c) Which group shows better results? Give your reason.
2. The diagram below shows the survey results of the traffic flow at two different locations. Each location records the speeds of 50 cars.
[Figure: "Speeds of Cars" frequency polygons for Location A and Location B; horizontal axis: speed (km h–1), 0.5 to 120.5; vertical axis: number of cars, 0 to 22]
(a) State the distribution shapes at both locations.
(b) Compare the dispersions of the car speeds at both locations.
(c) In your opinion, which location is a highway and which location is a housing area?

How to construct an ogive for a set of grouped data?

Learning standard: Construct an ogive for a set of grouped data and determine the quartiles.

Besides histograms and frequency polygons, a frequency distribution can also be displayed by drawing a cumulative frequency graph, also known as an ogive. When the cumulative frequencies of the data are plotted and connected, they produce an S-shaped curve. Ogives are useful for determining quartiles and percentiles. We will learn how to use an ogive for this purpose in the next section.

Steps for constructing an ogive:
1. Add one class with zero frequency before the first class. Find the upper boundary and the cumulative frequency of each class.
2. Choose an appropriate scale on the vertical axis to represent the cumulative frequencies and on the horizontal axis to represent the upper boundaries.
3. Plot each cumulative frequency against its corresponding upper boundary.
4. Draw a smooth curve passing through all the points.

Quartile
For grouped data with N data values, the quartiles can be determined from the ogive. Q1, Q2 and Q3 are the values that correspond to the cumulative frequencies $\frac{N}{4}$, $\frac{N}{2}$ and $\frac{3N}{4}$ respectively.

Info Bulletin: Quartiles are values that divide a set of data into four equal parts. Each set of data has three quartiles: Q1, Q2 (median) and Q3.
- The first quartile, Q1, is also known as the lower quartile. It is the middle value of the lower half of the data, below the median, and 25% of the data lies at or below it.
- The second quartile, Q2, is also known as the median. It is the middle value of a set of data.
- The third quartile, Q3, is also known as the upper quartile. It is the middle value of the upper half of the data, above the median, and 75% of the data lies at or below it.

Example 7
The frequency table below shows the salt content of 60 types of food.
Salt content (mg)  Frequency
100 – 149          4
150 – 199          11
200 – 249          15
250 – 299          21
300 – 349          8
350 – 399          1
(a) Construct an ogive to represent the data.
(b) From your ogive, determine
(i) the first quartile,
(ii) the median,
(iii) the third quartile.

Info Bulletin: The average salt intake per day among Malaysians is 7.9 g (1.6 teaspoons). This is above the level recommended by the World Health Organization (WHO), which is less than 5 g (one teaspoon) per day.

Solution:
(a)
Salt content (mg)  Frequency  Upper boundary  Cumulative frequency
50 – 99            0          99.5            0
100 – 149          4          149.5           4
150 – 199          11         199.5           15
200 – 249          15         249.5           30
250 – 299          21         299.5           51
300 – 349          8          349.5           59
350 – 399          1          399.5           60

[Figure: "Salt Content in Foods" ogive; horizontal axis: salt content (mg), 99.5 to 399.5; vertical axis: cumulative frequency, 0 to 60; readings at cumulative frequencies 15, 30 and 45 marked]

Steps to determine the quartiles:
1. The number of data values is N = 60, therefore $\frac{N}{4} = 15$, $\frac{N}{2} = 30$ and $\frac{3N}{4} = 45$.
2. Draw a horizontal line from the cumulative frequency axis at 15 until it intersects the ogive.
3. From the intersection point in step 2, draw a vertical line down until it meets the horizontal (salt content) axis.
4. The salt content obtained is the value of Q1.
5. Repeat steps 2 to 4 for the values 30 and 45 to obtain Q2 and Q3.

(b) (i) $\frac{1}{4} \times 60 = 15$
From the graph, the first quartile, Q1 = 199.5 mg. The salt content of 15 types of food is less than or equal to 199.5 mg.
(ii) $\frac{1}{2} \times 60 = 30$
From the graph, the median, Q2 = 249.5 mg. The salt content of 30 types of food is less than or equal to 249.5 mg.
(iii) $\frac{3}{4} \times 60 = 45$
From the graph, the third quartile, Q3 = 284.5 mg. The salt content of 45 types of food is less than or equal to 284.5 mg.

Example 7 shows that the first quartile, median and third quartile of grouped data can be determined using an ogive:
First quartile position, Q1 = $\frac{1}{4} \times$ total frequency, N
Median position, Q2 = $\frac{1}{2} \times$ total frequency, N
Third quartile position, Q3 = $\frac{3}{4} \times$ total frequency, N
[Figure: general ogive, cumulative frequency against variable, with $\frac{1}{4}N$, $\frac{1}{2}N$ and $\frac{3}{4}N$ read across to Q1, Q2 and Q3]

Application & Career: A financial manager needs to be an expert in the features of market capital that involve financial assets such as stocks and bonds. Statistical methods can be used to analyse these features through the distributions of stocks and bonds.

Cumulative histograms and ogives can both be constructed from a cumulative frequency table. A cumulative histogram is constructed just like a histogram, but the vertical axis represents the cumulative frequency. For Example 7, the cumulative histogram and the related ogive are shown below.
[Figure: "Salt Content in Foods" cumulative histogram and ogive; horizontal axis: salt content (mg), 99.5 to 399.5; vertical axis: cumulative frequency, 0 to 60]
How is the construction of an ogive related to the construction of a cumulative histogram?

Percentile
A large set of data is easier and more effective to analyse when it is divided into small parts known as percentiles. Percentiles are the 99 values P1, P2, P3, …, P99 that divide a set of data into 100 equal parts.

Example 8
The ogive below shows the scores of an aptitude test obtained by candidates who are applying for a post in a company.
[Figure: "Aptitude Test Score" ogive; horizontal axis: score, 30.5 to 90.5; vertical axis: cumulative frequency, 0 to 60]
(a) Based on the ogive, find
(i) the 10th percentile, P10,
(ii) the 46th percentile, P46.
(b) Only candidates who obtain the 92nd percentile and above will be called for an interview. What is the minimum score required in order to be called for an interview?
(c) What percentage of the candidates obtained a score of 57 and below?

Solution:
(a) (i) 10% of the total frequency = $\frac{10}{100} \times 50 = 5$
From the ogive, P10 = 46.5
(ii) 46% of the total frequency = $\frac{46}{100} \times 50 = 23$
From the ogive, P46 = 63.5
(b) 92% of the total frequency = $\frac{92}{100} \times 50 = 46$
From the ogive, P92 = 77. Therefore, only candidates with a minimum score of 77 will be called for an interview.
(c) From the ogive, 15 candidates obtained a score of 57 and below, and $\frac{15}{50} \times 100 = 30\%$. Therefore, 30% of the candidates obtained a score of 57 and below.
[Figure: the aptitude test ogive with the readings for P10, P46, P92 and a score of 57 marked]

Info Bulletin: The 25th percentile is also known as the first quartile, the 50th percentile as the median and the 75th percentile as the third quartile.
What is the difference between a quartile and a percentile?

Self Practice 7.1c
1. The frequency table below shows the marks of 100 pupils in an examination.
Marks             11 – 20  21 – 30  31 – 40  41 – 50  51 – 60  61 – 70  71 – 80  81 – 90
Number of pupils  2        13       25       25       19       10       4        2
(a) Construct an ogive to represent the data.
(b) From your ogive, determine
(i) the first quartile,
(ii) the median,
(iii) the third quartile.
2. The frequency table below shows the lengths of the soles of 40 pupils.
(a) Construct an ogive to represent the data.
(b) Based on the ogive, find
(i) the 20th percentile, P20,
(ii) the 55th percentile, P55,
(iii) the 85th percentile, P85.
(c) What percentage of the pupils have a sole length of 24.6 cm and below?
Length of soles (cm)  Number of pupils
21.0 – 21.9           1
22.0 – 22.9           4
23.0 – 23.9           10
24.0 – 24.9           18
25.0 – 25.9           5
26.0 – 26.9           2

7.2 Measures of Dispersion

How to determine range, interquartile range, variance and standard deviation for grouped data?

Learning standard: Determine the range, interquartile range, variance and standard deviation as measures to describe dispersion for grouped data.

In Form 4, you learnt how to determine the range, interquartile range, variance and standard deviation as measures to describe dispersion for ungrouped data. In this section, we move on to the measures of dispersion for grouped data.

Range and Interquartile Range

Example 9
Pak Hamidi recorded the masses of the pineapples that he harvested from his farm. The following frequency table and ogive show the data that he obtained. Determine the range and interquartile range for the data.
Mass (g)   Number of pineapples
400 – 499  6
500 – 599  12
600 – 699  16
700 – 799  24
800 – 899  14
900 – 999  8
[Figure: "Masses of Pineapples" ogive; horizontal axis: mass (g), 399.5 to 999.5; vertical axis: cumulative frequency, 0 to 80]

Info Bulletin: Interquartile range = Q3 – Q1. The interquartile range of grouped data can be determined from an ogive by first finding Q1 and Q3.

Solution:
Range = midpoint of the highest class – midpoint of the lowest class
= $\frac{900 + 999}{2} - \frac{400 + 499}{2}$
= 949.5 – 449.5
= 500 g
The difference between the masses of the heaviest and the lightest pineapples is about 500 g.

From the ogive:
Position of Q1: $\frac{1}{4} \times 80 = 20$, so Q1 = 614.5
Position of Q3: $\frac{3}{4} \times 80 = 60$, so Q3 = 809.5
[Figure: the "Masses of Pineapples" ogive with Q1 = 614.5 and Q3 = 809.5 marked]
Therefore, the interquartile range = 809.5 – 614.5 = 195 g.
Within the middle 50% of the distribution, the difference between the masses of the heaviest and the lightest pineapples is 195 g.

Variance and Standard Deviation
The variance and standard deviation for grouped data can be obtained using the following formulae.
$$\sigma^2 = \frac{\sum f x^2}{\sum f} - \bar{x}^2, \qquad \sigma = \sqrt{\frac{\sum f x^2}{\sum f} - \bar{x}^2}$$
where x = midpoint of the class interval, f = frequency and $\bar{x}$ = mean of the data.

Info Bulletin: Variance is the mean of the squares of the differences between each data value and the mean. Standard deviation is a measure of dispersion relative to the mean, measured in the same unit as the original data.

Example 10
The frequency table below shows the volumes of water, to the nearest litre, used daily by a group of families in a housing area. Calculate the variance and standard deviation of the data.
Volume of water (l)  150 – 159  160 – 169  170 – 179  180 – 189  190 – 199  200 – 209
Number of families   8          12         15         24         20         16

Solution:
Volume of water (l)  Frequency, f  Midpoint, x  fx        x²          fx²
150 – 159            8             154.5        1 236     23 870.25   190 962
160 – 169            12            164.5        1 974     27 060.25   324 723
170 – 179            15            174.5        2 617.5   30 450.25   456 753.75
180 – 189            24            184.5        4 428     34 040.25   816 966
190 – 199            20            194.5        3 890     37 830.25   756 605
200 – 209            16            204.5        3 272     41 820.25   669 124
                     Σf = 95                    Σfx = 17 417.5        Σfx² = 3 215 133.75

Mean, $\bar{x} = \frac{\sum fx}{\sum f} = \frac{17\,417.5}{95} = 183.34$ l

Variance, $\sigma^2 = \frac{\sum fx^2}{\sum f} - \bar{x}^2 = \frac{3\,215\,133.75}{95} - \left(\frac{17\,417.5}{95}\right)^2 = 229.1856 = 229.19$ l² (correct to 2 decimal places)

Standard deviation, $\sigma = \sqrt{\frac{\sum fx^2}{\sum f} - \bar{x}^2} = \sqrt{229.1855956} = 15.1389 = 15.14$ l (correct to 2 decimal places)

Checking Answer (scientific calculator):
1. Press MODE. The display shows SD, REG and BASE. Choose 1 (SD).
2. Enter the midpoint, press SHIFT , and enter the frequency, then press M+. Repeat for the subsequent values.
3. Press AC, then SHIFT 2. The display shows x̄, xσn and xσn–1 as options 1, 2 and 3.
Press 1 for the mean. The display shows 183.3421053.
Press 2 for the standard deviation. The display shows 15.13887696.

Self Practice 7.2a
1. The frequency table below shows the electricity bills of apartment units for a certain month.
Electricity bill (RM)      30 – 49  50 – 69  70 – 89  90 – 109  110 – 129
Number of apartment units  4        9        11       15        13
Construct an ogive for the data and hence, calculate the range and interquartile range.
Explain the meaning of the range and interquartile range obtained.
2. Calculate the variance and standard deviation of each of the following data. Give your answers correct to two decimal places.
(a)
Time (minutes)  1 – 2  3 – 4  5 – 6  7 – 8  9 – 10  11 – 12
Frequency       15     20     28     35     30      24
(b)
Distance (m)  11 – 20  21 – 30  31 – 40  41 – 50  51 – 60  61 – 70  71 – 80
Frequency     5        8        13       20       22       21       11

How to construct and interpret a box plot for a set of grouped data?

Learning standard: Construct and interpret a box plot for a set of grouped data.

You have learnt that a box plot displays a group of numerical data graphically based on the five-number summary of the data. The five numbers are the minimum value, first quartile, median, third quartile and maximum value. As with histograms and frequency polygons, the shape of a distribution can also be identified from a box plot.

[Figure: three box plots with whiskers and Q1, Q2, Q3 marked: (a) symmetric distribution, (b)(i) left-skewed distribution, (b)(ii) right-skewed distribution]

(a) The median lies in the middle of the box and the whiskers are about the same length on both sides of the box.
(b) The median cuts the box into two parts of different sizes.
(i) If the left part of the box is longer, the data distribution is left-skewed.
(ii) If the right part of the box is longer, the data distribution is right-skewed.

The left and right whiskers represent the data outside the box, that is, below Q1 and above Q3. If the box is divided into two equal parts but the left whisker is longer than the right whisker, the data distribution is left-skewed, and vice versa.

Example 11
The ogive below shows the masses, in g, of 90 starfruits.
(a) Construct a box plot based on the ogive.
(b) Hence, state the distribution shape of the data.
[Figure: "Masses of Starfruits" ogive; horizontal axis: mass (g), 80 to 150; vertical axis: cumulative frequency, 0 to 90; positions of Q1, Q2 and Q3 marked]
Info Bulletin: Ogive and box plot on the same graph.
[Figure: the "Masses of Starfruits" ogive drawn above a box plot on the same mass axis (80 to 150 g), with minimum 80, 116, 123, 128 and maximum 150 marked]

Solution:
(a) From the ogive:
Minimum value = 80
Maximum value = 150
Position of Q1: $\frac{1}{4} \times 90 = 22.5$, so Q1 = 116
Position of Q2: $\frac{1}{2} \times 90 = 45$, so Q2 = 123
Position of Q3: $\frac{3}{4} \times 90 = 67.5$, so Q3 = 128

Info Bulletin: Each of the four sections of a box plot (the two whiskers and the two parts of the box) contains 25% of the data.

Box plot:
[Figure: box plot on a mass (g) axis from 80 to 150; whiskers from 80 to 116 and from 128 to 150; box from 116 to 128, divided at 123]

(b) The distribution of the data is skewed to the left because the left part of the box (116 to 123) is longer than the right part (123 to 128).

Self Practice 7.2b
1. The ogive below shows the number of units of electrical power consumed by 80 households in a particular month.
[Figure: "Units of Power Consumed" ogive; horizontal axis: electrical power (units), 50 to 450; vertical axis: cumulative frequency, 0 to 80]
(a) Construct a box plot based on the ogive.
(b) Hence, state the distribution shape of the data.
2. The ogive below shows the durations, in seconds, of 60 songs aired by a radio station at a certain time.
[Figure: "Duration of Songs" ogive; horizontal axis: duration (s), 149.5 to 449.5; vertical axis: cumulative frequency, 0 to 60]
(a) Construct a box plot based on the ogive.
(b) Hence, state the distribution shape of the data.

How to compare and interpret two or more sets of grouped data based on measures of dispersion?

Learning standard: Compare and interpret two or more sets of grouped data, based on measures of dispersion, and hence make conclusions.

Example 12
A botanist sowed 40 hibiscus seeds of each of two different hybrids, A and B. The flowers of both hybrids were measured under close observation in order to develop an extra-large hibiscus. The following frequency table shows the diameters of the petals for hybrids A and B.
Diameter (cm)   13.0 – 13.4   13.5 – 13.9   14.0 – 14.4   14.5 – 14.9   15.0 – 15.4
Hybrid A        4             8             9             10            9
Hybrid B        9             10            8             6             7

Based on the mean and standard deviation, determine which hybrid produces larger and more consistent petals. Justify your answer.

Solution:
For hibiscus of hybrid A,
Diameter (cm)   Frequency, f   Midpoint, x   fx      x²       fx²
13.0 – 13.4     4              13.2          52.8    174.24   696.96
13.5 – 13.9     8              13.7          109.6   187.69   1 501.52
14.0 – 14.4     9              14.2          127.8   201.64   1 814.76
14.5 – 14.9     10             14.7          147     216.09   2 160.9
15.0 – 15.4     9              15.2          136.8   231.04   2 079.36
                Σf = 40                      Σfx = 574        Σfx² = 8 253.5
Mean, $\bar{x} = \frac{574}{40} = 14.35$ cm
Standard deviation, $\sigma = \sqrt{\frac{8\,253.5}{40} - 14.35^2} = \sqrt{0.415} = 0.64$ cm

For hibiscus of hybrid B,
Diameter (cm)   Frequency, f   Midpoint, x   fx      x²       fx²
13.0 – 13.4     9              13.2          118.8   174.24   1 568.16
13.5 – 13.9     10             13.7          137     187.69   1 876.9
14.0 – 14.4     8              14.2          113.6   201.64   1 613.12
14.5 – 14.9     6              14.7          88.2    216.09   1 296.54
15.0 – 15.4     7              15.2          106.4   231.04   1 617.28
                Σf = 40                      Σfx = 564        Σfx² = 7 972
Mean, $\bar{x} = \frac{564}{40} = 14.1$ cm
Standard deviation, $\sigma = \sqrt{\frac{7\,972}{40} - 14.1^2} = \sqrt{0.49} = 0.7$ cm

Hybrid A produces larger petals because its mean is larger than that of hybrid B (14.35 cm > 14.1 cm), and its smaller standard deviation (0.64 cm < 0.7 cm) shows that the diameters of its petals are more consistent.

Tunku Abdul Rahman Putra Al-Haj declared the hibiscus as the National Flower in 1960. The five petals of the flower represent the five principles of Rukun Negara.

Self Practice 7.2c
1. A ball manufacturing factory needs to regulate the internal air pressure, in psi, of the balls it produces before they are marketed. The frequency table below shows the internal air pressures of 50 ball samples taken from machine P and machine.
Air pressure (psi)   8.0 – 8.9   9.0 – 9.9   10.0 – 10.9   11.0 – 11.9   12.0 – 12.9   13.0 – 13.9
Machine P            7           11          13            12            5             2
Machine              1           3           5             20            18            3
The factory specified that the internal air pressure of a ball should be between 11.3 psi and 11.7 psi.
Which machine shows better performance in terms of air pressure accuracy?
2. The frequency table below shows the lifespans, in years, of brand X and brand Y batteries.
Lifespan (years)   0 – 0.9   1.0 – 1.9   2.0 – 2.9   3.0 – 3.9   4.0 – 4.9
Brand X battery    4         10          17          20          9
Brand Y battery    10        21          15          8           6
By using suitable measures, determine which brand of battery is better and lasts longer.

How to solve problems involving measures of dispersion for grouped data?
Solve problems involving measures of dispersion for grouped data.

Example 13
A survey on the duration of time, in hours, spent by customers to buy goods in a supermarket is carried out. The results of the survey are shown in the ogive on the right.
[Figure: ogive "Time Spent by Customers", cumulative frequency (0 to 80) against time (hours), with upper boundaries 0.45, 0.95, 1.45, 1.95, 2.45 and 2.95]
(a) Construct a frequency table for the time taken by the customers to buy goods in the supermarket using the classes 0.5 – 0.9, 1.0 – 1.4, 1.5 – 1.9, 2.0 – 2.4 and 2.5 – 2.9.
(b) Hence, estimate the mean and standard deviation of the data.

Solution:
Understanding the problem
Determine the mean and standard deviation from the ogive.
Devising a strategy
(a) Construct the frequency table from the ogive.
(b) Calculate the mean and standard deviation using the formulae.
Implementing the strategy
(a)
Time (hours)   Number of customers
0.5 – 0.9      6 – 0 = 6
1.0 – 1.4      22 – 6 = 16
1.5 – 1.9      54 – 22 = 32
2.0 – 2.4      70 – 54 = 16
2.5 – 2.9      80 – 70 = 10
[Figure: the "Time Spent by Customers" ogive with cumulative frequencies 6, 22, 54, 70 and 80 read off at the upper boundaries 0.95, 1.45, 1.95, 2.45 and 2.95]
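The mean and standard deviation in Example 12, and the estimates asked for in part (b) of Example 13, all follow the same steps: take each class midpoint as $x$, then apply $\bar{x} = \frac{\sum fx}{\sum f}$ and $\sigma = \sqrt{\frac{\sum fx^2}{\sum f} - \bar{x}^2}$. The Python sketch below is one way to check such working by computer. The helper names `grouped_mean_sd` and `frequencies_from_cumulative` are hypothetical (they are not from the textbook), and the sketch assumes, as the worked examples do, that every value in a class is represented by the class midpoint and that the population standard deviation is wanted.

```python
import math


def grouped_mean_sd(classes, freqs):
    """Estimate the mean and standard deviation of grouped data.

    classes: list of (lower_limit, upper_limit) pairs; each class is
             represented by its midpoint x (an assumption of the method).
    freqs:   list of class frequencies f.
    Uses sigma = sqrt(sum(f * x^2) / sum(f) - mean^2), as in Example 12.
    """
    if len(classes) != len(freqs):
        raise ValueError("classes and freqs must have the same length")
    midpoints = [(lower + upper) / 2 for lower, upper in classes]
    total_f = sum(freqs)
    sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
    sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))
    mean = sum_fx / total_f
    variance = sum_fx2 / total_f - mean ** 2
    # Guard against a tiny negative value caused by floating-point rounding.
    return mean, math.sqrt(max(variance, 0.0))


def frequencies_from_cumulative(cumulative):
    """Recover class frequencies from cumulative frequencies read off an ogive."""
    freqs = []
    previous = 0
    for c in cumulative:
        freqs.append(c - previous)  # e.g. 22 - 6 = 16
        previous = c
    return freqs


# Example 12: petal diameters (cm) of hybrids A and B
petal_classes = [(13.0, 13.4), (13.5, 13.9), (14.0, 14.4), (14.5, 14.9), (15.0, 15.4)]
hybrid_a = [4, 8, 9, 10, 9]
hybrid_b = [9, 10, 8, 6, 7]

# Example 13: cumulative frequencies read at the upper boundaries
# 0.95, 1.45, 1.95, 2.45 and 2.95 hours
time_classes = [(0.5, 0.9), (1.0, 1.4), (1.5, 1.9), (2.0, 2.4), (2.5, 2.9)]
time_freqs = frequencies_from_cumulative([6, 22, 54, 70, 80])

if __name__ == "__main__":
    for name, freqs in [("Hybrid A", hybrid_a), ("Hybrid B", hybrid_b)]:
        mean, sd = grouped_mean_sd(petal_classes, freqs)
        print(f"{name}: mean = {mean:.2f} cm, sd = {sd:.2f} cm")
    mean, sd = grouped_mean_sd(time_classes, time_freqs)
    print(f"Example 13: frequencies = {time_freqs}, mean = {mean:.2f} h, sd = {sd:.2f} h")
```

Running the script prints the statistics for both hybrids, which agree with the hand calculations in Example 12 to two decimal places, followed by the frequencies recovered from the ogive in Example 13 and the resulting estimates of the mean and standard deviation. Because grouping discards the individual values, these figures are estimates rather than exact statistics.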
