Numerical Summaries PDF
Summary
This document contains mathematical examples of Numerical Summaries, Arithmetic Mean, Median, Grouped Data, and their applications.
Full Transcript
Chapter 1: Numerical Summaries

Measures of Central Tendency

The three most common measures of central tendency are the arithmetic mean, the median, and the mode.

The Arithmetic Mean

Arithmetic mean: the sum of the values divided by the number of values.

Example 1: The grades of a student on eight 100-point examinations were 70, 65, 69, 85, 94, 62, 79, and 100. Find the mean.

Solution: In this example n = 8, so the mean of this set of grades is

$$\bar{x} = \frac{70 + 65 + 69 + 85 + 94 + 62 + 79 + 100}{8} = \frac{624}{8} = 78.$$

Example 2: Suppose that the following sample represents the ages (in years) of a sample of 3 men: [values not preserved in the transcript]. Then the sample mean is: [not preserved in the transcript].

The Mean for Grouped Data

$$\bar{x} = \frac{\sum f x}{\sum f},$$

where $f$ denotes the frequency of the class, $x$ denotes the midpoint of the class, and $\sum f$ denotes the total frequency.

Example:

Class interval   Frequency (f)   Midpoint (x)   fx
5 ≤ t < 10        1               7.5             7.5
10 ≤ t < 15       4              12.5            50
15 ≤ t < 20       6              17.5           105
20 ≤ t < 25       4              22.5            90
25 ≤ t < 30       2              27.5            55
30 ≤ t < 35       3              32.5            97.5
Total            20                             405

Thus, the mean of the grouped data is $\bar{x} = \frac{405}{20} = 20.25$.

Example 3: Find the mean for the following grouped data.

Age (years)   Frequency
1 - 10        12
11 - 20       30
21 - 30       18
31 - 40       12
41 - 50        9
51 - 60        6
61 - 70        0

Solution:

Age (years)   Frequency (f)   x    fx
1 - 10        12               5     60
11 - 20       30              15    450
21 - 30       18              25    450
31 - 40       12              35    420
41 - 50        9              45    405
51 - 60        6              55    330
61 - 70        0              65      0
Total         87                   2115

$$\bar{x} = \frac{\sum f x}{\sum f} = \frac{2115}{87} = 24.31$$

Example 4: Find the mean of the following set of 30 numbers by grouping them into a frequency distribution: 4, 3, 6, 7, 5, 5, 3, 4, 9, 6, 5, 5, 6, 8, 3, 6, 6, 3, 5, 4, 7, 6, 4, 1, 9, 7, 8, 6, 4, 6.
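One way to carry out the grouping asked for in Example 4 is sketched below in Python. It counts how often each value occurs and then applies the grouped-mean formula, the sum of f times x divided by the sum of f. The helper name `grouped_mean` is an illustrative assumption, not something taken from the slides.

```python
from collections import Counter

# The 30 numbers of Example 4.
data = [4, 3, 6, 7, 5, 5, 3, 4, 9, 6, 5, 5, 6, 8, 3,
        6, 6, 3, 5, 4, 7, 6, 4, 1, 9, 7, 8, 6, 4, 6]


def grouped_mean(freq):
    """Mean of grouped data: sum(f * x) / sum(f).

    `freq` maps each value (or class midpoint) x to its frequency f.
    """
    total_f = sum(freq.values())
    total_fx = sum(x * f for x, f in freq.items())
    return total_fx / total_f


freq = Counter(data)       # value -> frequency
mean = grouped_mean(freq)  # same result as sum(data) / len(data)

for x in sorted(freq):
    print(x, freq[x], x * freq[x])
print(round(mean, 2))
```

The same helper works for class intervals if the keys are the class midpoints, for example `grouped_mean({7.5: 1, 12.5: 4, 17.5: 6, 22.5: 4, 27.5: 2, 32.5: 3})` for the first grouped table.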
Solution: The data are grouped as follows.

x     Tally      f    fx
1     |          1     1
2                0     0
3     ||||       4    12
4     |||||      5    20
5     |||||      5    25
6     ||||||||   8    48
7     |||        3    21
8     ||         2    16
9     ||         2    18
Sum             30   161

Thus, the mean of the grouped data is

$$\bar{x} = \frac{\sum f x}{\sum f} = \frac{161}{30} \approx 5.4$$

Example 5: Find the mean of the following frequency distribution.

Class       50-60   60-70   70-80   80-90   90-100   100-110   110-120
Frequency   7       9       19      14      14       6         2

Solution:

Class      Frequency (f)   Midpoint (x)   fx
50-60       7               55             385
60-70       9               65             585
70-80      19               75            1425
80-90      14               85            1190
90-100     14               95            1330
100-110     6              105             630
110-120     2              115             230
Total      N = 71                         5775

$$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} = \frac{5775}{71} = 81.34$$

Example 6: The following table gives the number of patients visiting a hospital in a month. Find the average number of patients visiting the hospital in a day.

Number of patients   Number of days
0-10                 2
10-20                6
20-30                9
30-40                7
40-50                4
50-60                2

Solution:

Classes   Class mark (x_i)   Frequency (f_i)   x_i f_i
0-10       5                 2                  10
10-20     15                 6                  90
20-30     25                 9                 225
30-40     35                 7                 245
40-50     45                 4                 180
50-60     55                 2                 110
Total                       30                 860

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i f_i}{\sum_{i=1}^{n} f_i} = \frac{860}{30} = 28.67$$

Advantages and disadvantages of the mean

Advantages:
- Uniqueness. For a given set of data there is one and only one mean.
- Simplicity. It is easy to understand and to compute.
- The mean takes into account all values of the data.

Disadvantages:
- It is affected by extreme values, since all values enter into the computation.
- The mean can only be found for quantitative variables.

Example:

Sample   Data                   Mean
A        2, 4, 5, 7, 7, 10       5.83
B        2, 4, 5, 7, 7, 100     20.83

The Median ($\tilde{X}$)

Median: the midpoint of the values after they have been arranged from the smallest to the largest (or from the largest to the smallest). There are as many values above the median as below it.

$$\tilde{X} = \begin{cases} X_{\frac{n+1}{2}} & \text{if } n \text{ is an odd number} \\ \frac{1}{2}\left(X_{\frac{n}{2}} + X_{\frac{n}{2}+1}\right) & \text{if } n \text{ is an even number} \end{cases}$$

Example 7: For the data set 2, 2, 3, 4, 6, 7, 7, 7, 11 calculate the median.

Solution: Here n = 9 (an odd number), hence the median value is $\tilde{X} = X_5 = 6$.

Example 8: For the data set 60, 56, 48, 46, 50, 70 calculate the median.
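The piecewise definition of the median can be sketched in Python as follows. This is a minimal illustration assuming a non-empty list of numbers; the function name `median` is chosen here for convenience and shadows nothing from the slides.

```python
def median(values):
    """Median of ungrouped data, following the odd/even rule.

    The values are sorted first; positions in the formula are 1-based,
    while Python lists are 0-based, hence the index shifts below.
    """
    xs = sorted(values)
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        # n odd: the single middle value X_{(n+1)/2}
        return xs[mid]
    # n even: average of X_{n/2} and X_{n/2 + 1}
    return (xs[mid - 1] + xs[mid]) / 2


example_7 = median([2, 2, 3, 4, 6, 7, 7, 7, 11])
example_8 = median([60, 56, 48, 46, 50, 70])
print(example_7, example_8)
```

Sorting inside the function means the caller does not have to arrange the data beforehand, which is the step that is easiest to forget when working by hand.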
Solution: Arrange the data from low to high: 46, 48, 50, 56, 60, 70. Here n = 6 (an even number), hence the median is the average of the 3rd and 4th values, (50 + 56)/2 = 53.

Example 9: Suppose we have the data 4, 2, 1, 4, 5, 2, 1. To get the median we must first rearrange the data into the form 1, 1, 2, 2, 4, 4, 5. Therefore the median is 2. If instead we have an even number of values, 4, 2, 1, 4, 5, 2, then after rearranging the data as 1, 2, 2, 4, 4, 5 the median is (2 + 4)/2 = 3.

The Median for Grouped Data

$$\text{Median} = L + \frac{\frac{n}{2} - F}{f}\, i,$$

where
- $L$ is the lower true limit of the class containing the median,
- $n$ is the total number of frequencies,
- $F$ is the cumulative number of frequencies in all the classes immediately preceding the class containing the median,
- $f$ is the frequency in the class containing the median,
- $i$ is the size of the interval.

Example 10: Calculate the median for the following frequency table.

Class       50-60   60-70   70-80   80-90   90-100   100-110   110-120
Frequency   7       9       19      14      14       6         2

Solution:

Class      Frequency   Cumulative frequency
50-60       7           7
60-70       9          16
70-80      19          35
80-90      14          49
90-100     14          63
100-110     6          69
110-120     2          71
Total      71

The median is the point in the data that has 50% of the entries above it and 50% below it. Now 50% of 71 is 35.5, so we are interested in finding the point in the distribution with 35 entries above it and 35 below it; thus the median must lie in the interval 80-90. Now, with L = 80, n/2 = 35.5, F = 35, f = 14 and i = 10,

$$\text{Median} = 80 + \frac{35.5 - 35}{14} \times 10 \approx 80.36.$$

Example 11: Based on the grouped data below, find the median.

Time to travel to work   Frequency   Cumulative frequency
1-10                      8            8
10-20                    14           22
20-30                    12           34
30-40                     9           43
40-50                     7           50

1st step: n/2 = 50/2 = 25.

2nd step: The median class is the 3rd class (20-30), the first class whose cumulative frequency reaches 25. So L = 20, F = 22, f = 12 and i = 10. Therefore,

$$\text{Median} = 20 + \frac{25 - 22}{12} \times 10 = 22.5.$$

Example 12: Find the median marks for the following distribution.

Classes              0-10   10-20   20-30   30-40   40-50
Number of students   2      12      22      8       6

Solution:

Classes   Frequency   Cumulative frequency
0-10       2           2
10-20     12           2 + 12 = 14
20-30     22          14 + 22 = 36
30-40      8          36 + 8 = 44
40-50      6          44 + 6 = 50

Here n/2 = 25, so the median class is 20-30, with L = 20, F = 14, f = 22 and i = 10. Hence

$$\text{Median} = 20 + \frac{25 - 14}{22} \times 10 = 25.$$

Advantages and disadvantages of the median

Advantages: Uniqueness. For a given set of data there is one and only one median. Simplicity.
It is easy to calculate. It is not affected by extreme values, as the mean is.

Disadvantages: [not preserved in the transcript]

The Mode

Mode: the mode of a set of real numbers is the value that occurs with the greatest frequency, provided that frequency exceeds 1. For example:
- 2, 4, 5, 1, 7, 9, 0: no mode
- 2, 4, 2, 5, 4, 2: the mode is 2
- 2, 4, 2, 5, 4, 2, 4, 7: the modes are 2 and 4

The data set 2, 3, 4, 5, 7, 15 has no mode. The data set 2, 2, 2, 3, 3, 7, 7, 7, 11, 15 has two modes, 2 and 7, and is bimodal.

The Mode of Grouped Data

As a first approximation, the mode of grouped data is the midpoint of the class containing the largest class frequency.

Example: Calculate the mode from the following frequency table.

Class       10-15   15-20   20-25   25-30
Frequency   6       9       7       4

The largest class frequency is 9 and the midpoint of that class interval is 17.5, hence the mode is 17.5.

A more refined value is given by

$$\text{Mode} = L_{mo} + \frac{\Delta_1}{\Delta_1 + \Delta_2}\, i,$$

where $L_{mo}$ is the lower limit of the modal class, $\Delta_1$ is the frequency of the modal class minus the frequency of the preceding class, $\Delta_2$ is the frequency of the modal class minus the frequency of the succeeding class, and $i$ is the class width.

Example: Based on the table below, find the mode.

Time to travel to work   Frequency
1-10                      8
10-20                    14
20-30                    12
30-40                     9
40-50                     7

The modal class is 10-20, so i = 10, L_mo = 10, Δ1 = 14 - 8 = 6 and Δ2 = 14 - 12 = 2. Therefore,

$$\text{Mode} = 10 + \frac{6}{6 + 2} \times 10 = 17.5.$$

The mode can also be obtained from a histogram.
Step 1: Identify the modal class and the bar representing it.
Step 2: Draw two crossing lines, each joining a top corner of the modal bar to the nearer top corner of the adjacent bar on the opposite side, as shown in the diagram.
Step 3: Drop a perpendicular from the intersection of the two lines until it touches the horizontal axis.
Step 4: Read the mode from the horizontal axis.

For the last example, the histogram construction gives the same result.

Example 17: Find the mode of the given data.

Marks obtained       0-20   20-40   40-60   60-80   80-100
Number of students   5      10      12      6       3

Solution: The highest frequency is 12, so the modal class is 40-60.
- L = lower limit of the modal class = 40
- f1 = frequency of the modal class = 12
- f0 = frequency of the class preceding the modal class = 10
- f2 = frequency of the class succeeding the modal class = 6
- h = class width = 20

Using the mode formula,

$$\text{Mode} = L + \frac{f_1 - f_0}{2 f_1 - f_0 - f_2} \times h = 40 + \frac{12 - 10}{24 - 10 - 6} \times 20 = 45.$$

Example 18: The heights, in cm, of 50 students are recorded below. Calculate the mode.

Height (in cm)       125-130   130-135   135-140   140-145   145-150
Number of students   7         14        10        10        9

Solution: Here, the maximum frequency is 14 and the corresponding class is 130-135. So, 130-135 is the modal class, with L = 130, h = 5, f1 = 14, f0 = 7 and f2 = 10.
$$\text{Mode} = L + \frac{f_1 - f_0}{2 f_1 - f_0 - f_2} \times h = 130 + \frac{14 - 7}{2(14) - 7 - 10} \times 5 = 133.18.$$

Hence, the modal height is 133.18 cm.

Example 19: Find the mean, mode and median for the following data.

Class       0-10   10-20   20-30   30-40   40-50
Frequency   8      16      36      34      6

Solution:

Class   Frequency (f)   Midpoint (x)   Cumulative frequency   fx
0-10      8              5               8                       40
10-20    16             15              24                      240
20-30    36             25              60                      900
30-40    34             35              94                     1190
40-50     6             45             100                      270
Sum     100                                                    2640

$$\text{Mean} = \frac{\sum_{i=1}^{n} x_i f_i}{\sum_{i=1}^{n} f_i} = \frac{2640}{100} = 26.4$$

Here N = 100, so N/2 = 50. The cumulative frequency just greater than 50 is 60, and the corresponding class is 20-30. Thus, the median class is 20-30. Hence L = 20, i = 10, f = 36, F = cumulative frequency of the preceding class = 24, and N/2 = 50, so

$$\text{Median} = 20 + \frac{50 - 24}{36} \times 10 \approx 27.2.$$

Using the empirical relation Mode = 3 Median - 2 Mean,

$$\text{Mode} = 3(27.2) - 2(26.4) = 28.8.$$

Advantages and disadvantages of the mode

Advantages:
- Simplicity. It is easy to calculate.
- It may be used for both quantitative and qualitative data.
- It is not affected by extreme values.

Disadvantages:
- There might be no mode, or more than one mode.
- The mode does not take into account all the values of the sample.

Measures of Variation

Measures of central tendency locate the center of a distribution. They do not indicate how the values are distributed around the center. Measures of variation examine the spread, or variation, of data values around the center.

Example: The grades of class A in math are 55, 60, 65, 70, 75, 80, 85, 90, 95, 100.

$$\mu = \frac{55 + 60 + 65 + 70 + 75 + 80 + 85 + 90 + 95 + 100}{10} = 77.5$$

Example: The grades of class B in math are 73, 74, 75, 76, 77, 78, 79, 80, 81, 82.

$$\mu = \frac{73 + 74 + 75 + 76 + 77 + 78 + 79 + 80 + 81 + 82}{10} = 77.5$$

The two distributions have the same mean. Are they the same? How exactly are they different?
- The difference is in the spread of values around the mean.
- In class B, the data values are clustered closer to the mean.
- The grades of class B are more consistent.

We'll consider three measures of variation:
1. The range.
2. The variance and the standard deviation.
3. The coefficient of variation.

1.
The Range

The range, R, is defined by

R = highest value - lowest value

In the grades example:
- For class A, R = 100 - 55 = 45
- For class B, R = 82 - 73 = 9

But the range is sometimes misleading.

Example: The salaries for the staff of the XYZ Manufacturing Co. are shown here.

Staff         Salary
Owner         $100,000
Manager         40,000
Salesperson     30,000
Workers         25,000
                15,000
                18,000

R = 100,000 - 15,000 = 85,000

The presence of an outlier (the owner's salary) distorts the measure of variation given by the range.

2. The Variance

The population variance is defined by

$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$

Note:
- The variance is a mean.
- It is the mean of the squared distances to the population mean.
- The squaring is needed so that deviations of values above and below the mean do not cancel each other.

Computing the Variance

For class A: 55, 60, 65, 70, 75, 80, 85, 90, 95, 100.

1. Find the mean: μ = 77.5.
2. Subtract the mean from each data value.
3. Square each result.

X      X - μ    (X - μ)²
55     -22.5    506.25
60     -17.5    306.25
65     -12.5    156.25
70      -7.5     56.25
75      -2.5      6.25
80       2.5      6.25
85       7.5     56.25
90      12.5    156.25
95      17.5    306.25
100     22.5    506.25

4. Find the sum of the squares: 2062.5.
5. Divide by N (= 10): σ² = 2062.5 / 10 = 206.25 ≈ 206.3.

For class B (73, 74, 75, 76, 77, 78, 79, 80, 81, 82), the variance is 8.25.

Note: The variance of class B is smaller than that of class A. This is expected, since the variance measures the variation of the data: the greater the spread of the data around the mean, the greater the variance.

3. The Standard Deviation

The standard deviation is the square root of the variance.
- It has the same units as the raw data.
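The five steps used for class A, together with the square root that gives the standard deviation, can be sketched in Python as follows. This is a minimal illustration rather than the slides' own method; the function name `population_variance` and the variable names are assumptions introduced here.

```python
import math


def population_variance(values):
    """Population variance: mean of the squared distances to the mean."""
    n = len(values)
    mu = sum(values) / n                       # step 1: find the mean
    squared = [(x - mu) ** 2 for x in values]  # steps 2 and 3: deviations, squared
    return sum(squared) / n                    # steps 4 and 5: sum, divide by N


class_a = list(range(55, 101, 5))  # 55, 60, ..., 100
class_b = list(range(73, 83))      # 73, 74, ..., 82

var_a = population_variance(class_a)
var_b = population_variance(class_b)

# The standard deviation is the square root of the variance.
sd_a = math.sqrt(var_a)
sd_b = math.sqrt(var_b)

print(var_a, var_b)
print(round(sd_a, 2), round(sd_b, 2))
```

Both classes have the same mean, but the standard deviations (about 14.36 for class A and 2.87 for class B) express the difference in spread in the same units as the grades.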
For a population: $\sigma = \sqrt{\sigma^2}$

For a sample: $s = \sqrt{s^2}$

Computing the standard deviation

Find the sample standard deviation for the amount of European auto sales for a sample of 6 years shown. The data are in millions of dollars.

11.2, 11.9, 12.0, 12.8, 13.4, 14.3

Solution (found using the SD mode of a calculator): s = 1.1296 ≈ 1.13

Variance for Grouped Data

For a population:

$$\sigma^2 = \frac{\sum f (X_m - \mu)^2}{\sum f},$$

where $X_m$ is the class midpoint and $f$ is the class frequency.

For a sample: [formula not preserved in the transcript]

Example:

Class         Frequency
5.5 - 10.5    1
10.5 - 15.5   2
15.5 - 20.5   3
20.5 - 25.5   5
25.5 - 30.5   4
30.5 - 35.5   3
35.5 - 40.5   2
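As a hedged sketch, the population formula for grouped data can be applied to this table in Python as shown below. The function name `grouped_variance` and its `sample` flag are assumptions made for illustration; in particular, the sample version is not preserved in the transcript, so the code assumes the common convention of dividing by n - 1 instead of n.

```python
def grouped_variance(classes, freqs, sample=False):
    """Variance of grouped data from class boundaries and frequencies.

    classes: list of (lower, upper) class boundaries
    freqs:   list of class frequencies f
    sample:  if True, divide by n - 1 (assumed convention), else by n
    """
    midpoints = [(lo + hi) / 2 for lo, hi in classes]  # X_m for each class
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
    ss = sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints))
    return ss / (n - 1 if sample else n)


classes = [(5.5, 10.5), (10.5, 15.5), (15.5, 20.5), (20.5, 25.5),
           (25.5, 30.5), (30.5, 35.5), (35.5, 40.5)]
freqs = [1, 2, 3, 5, 4, 3, 2]

pop_var = grouped_variance(classes, freqs)
samp_var = grouped_variance(classes, freqs, sample=True)
print(pop_var, round(samp_var, 2))
```

The midpoints here are 8, 13, ..., 38 and the grouped mean is 24.5, so the weighted sum of squared deviations is 1305; dividing by 20 or by 19 gives the population or the assumed sample variance respectively.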