Chapter 2 - Describing Data Graphically and Numerically
Describing Data Graphically and Numerically
CE 190: Engineering Data Analysis with Research Methods
2nd Semester A.Y. 2023-2024
COLLEGE OF ENGINEERING
Department of Civil Engineering

Topic Outline
1. Frequency Distribution Tables
2. Graphical Description of Data
3. Numerical Measures of Quantitative Data
4. Numerical Measures of Grouped Data
5. Measures of Relative Position
6. Box-Whisker Plot

Frequency Distribution Tables
Chapter 2: Describing Data Graphically and Numerically

Frequency Distribution Table (FDT)
A frequency distribution table shows how data are partitioned among several categories (or classes) by listing the categories along with the number (frequency) of data values in each of them.

An example of an FDT is shown below. (Frequency is the count of data values in each category; C.F. = cumulative frequency, R.F. = relative frequency, C.R.F. = cumulative relative frequency.)

Categories   Tally                                Frequency   C.F.   R.F.     C.R.F.
Category 1   ||||| ||||| ||||| ||||| ||||| |||    28          28     0.2545   0.2545
Category 2   ||||| ||||| ||||| ||||| ||||| |      26          54     0.2364   0.4909
Category 3   ||||| ||||| ||||| |||||              20          74     0.1818   0.6727
Category 4   ||||| ||||| ||||| |                  16          90     0.1455   0.8182
Category 5   ||||| ||||| ||||| |||||              20          110    0.1818   1.0000
Total                                             110                1.0000

Organizing Qualitative Data into FDT
Example: A physical therapist wants to determine the types of rehabilitation required by her patients. To do so, she obtains a simple random sample of 30 of her patients and records the body part requiring rehabilitation. Construct a frequency distribution table of location of injury.

Back       Back       Hand
Wrist      Back       Groin
Elbow      Back       Back
Back       Shoulder   Shoulder
Hip        Knee       Hip
Neck       Knee       Knee
Shoulder   Shoulder   Back
Back       Back       Back
Knee       Knee       Back
Hand       Back       Wrist

1. Create a list of the body parts (categories) and write them in the first column.
Body Part
Back
Wrist
Elbow
Hip
Shoulder
Knee
Hand
Groin
Neck

2. Tally each occurrence.
Body Part   Tally
Back        ||||| ||||| ||
Wrist       ||
Elbow       |
Hip         ||
Shoulder    ||||
Knee        |||||
Hand        ||
Groin       |
Neck        |

3. Add up the number of tallies to determine the frequency.
Body Part   Tally            Frequency
Back        ||||| ||||| ||   12
Wrist       ||               2
Elbow       |                1
Hip         ||               2
Shoulder    ||||             4
Knee        |||||            5
Hand        ||               2
Groin       |                1
Neck        |                1

4. Complete the cumulative frequency (C.F.) column by getting, for each category, the sum of the frequencies for that category and all previous categories.
Body Part   Tally            Frequency   C.F.
Back        ||||| ||||| ||   12          12
Wrist       ||               2           14
Elbow       |                1           15
Hip         ||               2           17
Shoulder    ||||             4           21
Knee        |||||            5           26
Hand        ||               2           28
Groin       |                1           29
Neck        |                1           30

5. Complete the relative frequency (R.F.) of each category using the formula
\[ \text{R.F.} = \frac{\text{frequency}}{\text{sum of all frequencies}} \]
Body Part   Tally            Frequency   C.F.   R.F.
Back        ||||| ||||| ||   12          12     0.4000
Wrist       ||               2           14     0.0667
Elbow       |                1           15     0.0333
Hip         ||               2           17     0.0667
Shoulder    ||||             4           21     0.1333
Knee        |||||            5           26     0.1667
Hand        ||               2           28     0.0667
Groin       |                1           29     0.0333
Neck        |                1           30     0.0333

6. Complete the cumulative relative frequency (C.R.F.) of each category using the formula
\[ \text{C.R.F.} = \frac{\text{cumulative frequency}}{\text{sum of all frequencies}} \]
Body Part   Tally            Frequency   C.F.   R.F.     C.R.F.
Back        ||||| ||||| ||   12          12     0.4000   0.4000
Wrist       ||               2           14     0.0667   0.4667
Elbow       |                1           15     0.0333   0.5000
Hip         ||               2           17     0.0667   0.5667
Shoulder    ||||             4           21     0.1333   0.7000
Knee        |||||            5           26     0.1667   0.8667
Hand        ||               2           28     0.0667   0.9333
Groin       |                1           29     0.0333   0.9667
Neck        |                1           30     0.0333   1.0000

Organizing Quantitative Data into FDT
To construct a frequency distribution table for a quantitative data set, we follow these steps:
1) Find the range (R):
Range = R = largest data point − smallest data point
2) Divide the data set into an appropriate number of classes (m) using Sturges's formula, where n is the number of data points and log is the base-10 logarithm (m is rounded to the nearest whole number):
Number of classes = m = 1 + 3.3 log n
3) Determine the width of the classes as follows:
Class width = R/m
The class width is always rounded up, never down, to the same number of decimal places as the data value with the most decimal digits in the data set. For whole-number data, this means rounding up to the next whole number.
4) Finally, prepare the frequency distribution table by assigning each data point to an appropriate class.

Example: The following data give the lengths (in millimeters) of 40 randomly selected rods manufactured by a company:
145 140 120 110 135 150 130 132 137 115
142 115 130 124 139 133 118 127 144 143
131 120 117 129 148 130 121 136 133 147
147 128 142 147 151 122 120 145 126 151

Step 1: Find the range (R): R = 151 − 110 = 41
Step 2: Determine the number of classes: m = 1 + 3.3 log 40 = 6.29 ≈ 6
Step 3: Solve for the class width: class width = 41/6 = 6.83 ≈ 7 (rounded up to a whole number, since the data set has whole numbers)
Step 4: Construct the FDT.

Write the lower class limits in the first column. Choose the value for the first lower class limit by using either the minimum value or a convenient value below the minimum.
Classes
110

Using the first lower class limit and the class width, list the other lower class limits. (Add the class width to the first lower class limit to get the second lower class limit, and so on.)
Classes
110
117
124
131
138
145

Determine the upper class limits, and enter them in the table.
Classes
110 – 116
117 – 123
124 – 130
131 – 137
138 – 144
145 – 151

Tally each data point in its corresponding class, and write the frequencies. Then complete the table.
Classes     Tally        Frequency   C.F.   R.F.    C.R.F.
110 – 116   |||          3           3      0.075   0.075
117 – 123   ||||| ||     7           10     0.175   0.250
124 – 130   ||||| |||    8           18     0.200   0.450
131 – 137   ||||| ||     7           25     0.175   0.625
138 – 144   ||||| |      6           31     0.150   0.775
145 – 151   ||||| ||||   9           40     0.225   1.000
Total                    40                 1.000

Graphical Description of Data

Dot Plot
A dot plot consists of a graph in which each data value is plotted as a point (or dot) along a horizontal scale of values. Dots representing equal values are stacked. A dot plot provides visual information about the distribution of a single variable.

Example: The following data give the number of defective motors received in 20 different shipments:
8 12 10 16 10 25 21 15 17 5 26 21 29 8 6 21 10 17 15 13
Construct a dot plot for these data.

[Figure: Dot plot of the number of defective motors per shipment]

Pie Chart
A pie chart is a graph that depicts categorical data as slices of a circle, in which the size of each slice is proportional to the frequency count for the category.
The angle of a slice is determined using the formula
Angle of a slice in degrees = (relative frequency of the given category) × 360
A pie chart helps us understand at a glance the composition of the population with respect to the characteristic of interest.

Example: In a manufacturing operation, we are interested in understanding defect rates as a function of various process steps. The inspection points (categories) in the process are initial cutoff, turning, drilling, and assembly. The frequency distribution table for these data is shown below. Construct a pie chart for these data.

Process Step     Frequency   Relative Frequency
Initial Cutoff   86          0.2382
Turning          182         0.5042
Drilling         83          0.2299
Assembly         10          0.0277
Total            361         1.0000

Solution: Compute the angle of the slice for each category:

Process Step     Frequency   Relative Frequency   Angle of Slice (degrees)
Initial Cutoff   86          0.2382               85.75
Turning          182         0.5042               181.51
Drilling         83          0.2299               82.77
Assembly         10          0.0277               9.97
Total            361         1.0000               360

[Figure: Pie chart of defects by process step]

Bar Chart
A bar graph (or bar chart) uses bars of equal width to show the frequencies of the categories of categorical (or qualitative) data. The vertical scale represents frequencies or relative frequencies. The horizontal scale identifies the different categories of qualitative data. The bars may or may not be separated by small gaps. A multiple bar graph has two or more sets of bars and is used to compare two or more data sets.

Example: The following data give the annual revenues (in millions of dollars) of five companies A, B, C, D, and E for the year 2011: 78, 92, 95, 94, 102.

Histogram
A histogram is a graph consisting of bars of equal width drawn adjacent to each other (unless there are gaps in the data). The horizontal scale represents classes of quantitative data values, and the vertical scale represents frequencies (or relative frequencies).
A histogram is basically a graph of a frequency distribution table.

Example: The following data give the survival times (in hours) of 50 parts involved in a field test under extraneous operating conditions. Construct a frequency distribution table for these data. Then, construct frequency and relative frequency histograms for these data.

Solution:
1. Range (R) = 195 − 30 = 165
2. Number of classes (m) = 1 + 3.3 log 50 = 6.61 ≈ 7
3. Class width = R/m = 165/7 = 23.57 ≈ 24

Class       Tally         Frequency   Relative Frequency
30 – 53     |||||         5           0.10
54 – 77     ||||| |||||   10          0.20
78 – 101    ||||| ||||    9           0.18
102 – 125   ||||| ||      7           0.14
126 – 149   ||||| |       6           0.12
150 – 173   ||||| |       6           0.12
174 – 197   ||||| ||      7           0.14
Total                     50          1.00

[Figure: Frequency and relative frequency histograms of the survival times]

Line Graph or Time-Series Graph
A line graph or time-series graph is a graph of time-series data, that is, data collected at different points in time, such as monthly or yearly. A line graph is commonly used to study any trends in the variable of interest that might occur over time. In a line graph, time is marked on the horizontal axis (the x-axis) and the variable on the vertical axis (the y-axis).

Example: The data in the table give the number of lawn mowers (LM) sold by a garden shop over a period of 12 months of a given year. Prepare a line graph for these data.

Month       LM Sold
January     2
February    1
March       4
April       10
May         57
June        62
July        64
August      68
September   40
October     15
November    10
December    5

[Figure: Line graph of the monthly lawn mower sales]

Stem-and-Leaf Plot
A stem-and-leaf plot represents quantitative data by separating each value into two parts: the stem (such as the leftmost digit) and the leaf (such as the rightmost digit). One advantage of the stem-and-leaf plot is that we can see the distribution of the data while keeping the original data values. It is also a quick way to sort data (arrange them in order).
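Because a stem-and-leaf display is just the data regrouped by leading digits, it is easy to generate programmatically. The Python sketch below is a minimal illustration, not a standard routine: it assumes non-negative integer data, takes the last digit as the leaf and the remaining digits as the stem, and the function names are hypothetical. Since the weekly output of the 80 workers in the following example is not reproduced in this transcript, the sketch uses the 40 rod lengths from the FDT example.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Map each stem to its sorted leaves (stem = all digits but the last, leaf = last digit).

    Assumes non-negative integers. Empty stems between the smallest and the
    largest stem are kept so that gaps in the data stay visible.
    """
    if not data:
        return {}
    groups = defaultdict(list)
    for value in data:
        stem, leaf = divmod(value, 10)
        groups[stem].append(leaf)
    first, last = min(groups), max(groups)
    return {stem: sorted(groups.get(stem, [])) for stem in range(first, last + 1)}


def print_stem_and_leaf(data):
    """Print one row per stem, e.g. ' 11 | 0 5 5 7 8'."""
    for stem, leaves in stem_and_leaf(data).items():
        print(f"{stem:>3} | {' '.join(str(leaf) for leaf in leaves)}")


# Rod lengths (mm) from the quantitative FDT example
rod_lengths = [
    145, 140, 120, 110, 135, 150, 130, 132, 137, 115,
    142, 115, 130, 124, 139, 133, 118, 127, 144, 143,
    131, 120, 117, 129, 148, 130, 121, 136, 133, 147,
    147, 128, 142, 147, 151, 122, 120, 145, 126, 151,
]

print_stem_and_leaf(rod_lengths)
```

For the rod lengths, the stems run from 11 to 15; for instance, stem 11 carries the leaves 0 5 5 7 8 (110, 115, 115, 117, 118). Keeping empty stems is a design choice so that gaps in the data remain visible, as they would in a hand-drawn plot.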
Example: A manufacturing company has been awarded a huge contract by the Defense Department to supply spare parts. In order to provide these parts on schedule, the company needs to hire a large number of new workers. To estimate how many workers to hire, representatives of the Human Resources Department decided to take a random sample of 80 workers and find the number of parts each worker produces per week. Prepare a stem-and-leaf diagram for these data.

[Figure: Stem-and-leaf plot]

Numerical Measures of Quantitative Data

Measures of Centrality
A measure of center (or measure of central tendency) is a value at the center or middle of a data set. The three most widely used measures of center are the mean, the median, and the mode.

Mean | Measures of Centrality
The mean (or arithmetic mean) of a variable is obtained by adding the scores and dividing the total by the number of scores:
\[ \bar{x} = \frac{\sum x_i}{n} \qquad \mu = \frac{\sum x_i}{N} \]
The sample mean is denoted by \(\bar{x}\) (n = sample size), and the population mean is denoted by the Greek letter \(\mu\) (N = population size).

Important Properties of the Mean
- Sample means drawn from the same population tend to vary less than other measures of center.
- The mean of a data set uses every data value.
- A disadvantage of the mean is that just one extreme value (outlier) can change the value of the mean substantially. The mean is not a resistant measure of center.

Example 1: The hourly wages (in dollars) of randomly selected workers in a manufacturing company are: 8, 6, 9, 10, 8, 7, 11, 9, 8. Find the mean hourly wage of these workers.

Solution: Since the wages listed in these data are for only some of the workers in the company, the data represent a sample with n = 9:
\[ \bar{x} = \frac{\sum x_i}{n} = \frac{8+6+9+10+8+7+11+9+8}{9} = \frac{76}{9} = 8.44 \]
The average hourly wage of these employees is $8.44.
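The mean calculation above, and the warning that the mean is not resistant, can be checked with a few lines of Python. This is a minimal sketch; the helper name `mean` is illustrative, and the extra wage of 44 dollars is a hypothetical extreme value added only to show the effect of an outlier, not part of the example's data.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)


# Hourly wages (dollars) from Example 1, a sample with n = 9
wages = [8, 6, 9, 10, 8, 7, 11, 9, 8]
print(f"Sample mean: {mean(wages):.2f}")  # 76 / 9

# Non-resistance: one hypothetical extreme wage pulls the mean up sharply
with_outlier = wages + [44]
print(f"Mean with one extreme value: {mean(with_outlier):.2f}")
```

The single hypothetical value of 44 lifts the mean from about 8.44 to 12.00, although the other nine wages are unchanged.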
Example 2: The following data give the ages of all the employees in a city hardware store: 22, 25, 26, 36, 26, 29, 26, 26. Find the mean age of the employees in that hardware store.

Solution: Since the data give the ages of all the employees of the hardware store, we are dealing with a population with N = 8. Thus,
\[ \mu = \frac{\sum x_i}{N} = \frac{22+25+26+36+26+29+26+26}{8} = \frac{216}{8} = 27 \]
The mean age of the employees in the hardware store is 27 years.

Median | Measures of Centrality
The median of a variable is the value that lies in the middle of the data when arranged in ascending (or descending) order. We shall use \(M_d\) to denote the median.

Important Properties of the Median
- The median does not change by large amounts when we include just a few extreme values, so the median is a resistant measure of center.
- The median does not use every data value.

Steps to determine the median of a data set of size n:
1. Arrange the observations in the data set in ascending order and rank them from 1 to n.
2. Find the rank of the median, which is given by
\[ \text{Rank of } M_d = \begin{cases} \dfrac{n+1}{2} & \text{if } n \text{ is odd} \\[6pt] \dfrac{n}{2} \text{ and } \dfrac{n}{2} + 1 & \text{if } n \text{ is even} \end{cases} \]
3. Find the value of the observation corresponding to the rank of the median found in step 2. If n is even, the median is the average of the two observations with these ranks.

Example 1: The following data give the length (in mm) of an alignment pin for a printer shaft in a batch of production: 30, 24, 34, 28, 32, 35, 29, 26, 36, 30, 33. Find the median alignment pin length.

Solution: Write the data in ascending order and rank them.
Observation  24  26  28  29  30  30  32  33  34  35  36
Rank          1   2   3   4   5   6   7   8   9  10  11
Since n = 11 is odd, the rank of the median is
\[ \text{Rank of } M_d = \frac{n+1}{2} = \frac{11+1}{2} = 6 \]
Therefore, the median is the 6th observation: \(M_d = 30\) mm.
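The three-step median procedure translates directly into code. The Python sketch below (the function name is illustrative) follows the rank rule: it sorts the data, then takes rank (n + 1)/2 when n is odd and averages ranks n/2 and n/2 + 1 when n is even. It uses the pin lengths of Example 1 for the odd case and, for the even case, the ten exam scores of the example that follows, and it cross-checks both results against Python's built-in `statistics.median`.

```python
import statistics


def median(values):
    """Median via the rank rule: rank (n+1)/2 for odd n, ranks n/2 and n/2 + 1 for even n."""
    data = sorted(values)  # Step 1: ascending order; rank r sits at list index r - 1
    n = len(data)
    if n == 0:
        raise ValueError("median requires at least one value")
    if n % 2 == 1:
        return data[(n + 1) // 2 - 1]  # single middle rank
    lower, upper = data[n // 2 - 1], data[n // 2]  # ranks n/2 and n/2 + 1
    return (lower + upper) / 2


pin_lengths = [30, 24, 34, 28, 32, 35, 29, 26, 36, 30, 33]  # Example 1, n = 11 (odd)
exam_scores = [82, 77, 90, 71, 62, 68, 74, 84, 94, 88]      # Example 2, n = 10 (even)

print(median(pin_lengths))  # value at rank 6
print(median(exam_scores))  # average of the values at ranks 5 and 6

# Cross-check against Python's standard-library implementation
assert median(pin_lengths) == statistics.median(pin_lengths)
assert median(exam_scores) == statistics.median(exam_scores)
```

Ranks in the text are 1-based, while Python list indices start at 0, which is why each rank is reduced by one before indexing.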
Example 2: The data in the table represent the first exam scores of 10 students enrolled in CE 190. Find the median score of the data.

Student    Score
Michelle   82
Ryanne     77
Bilal      90
Pam        71
Jennifer   62
Dave       68
Joel       74
Sam        84
Justine    94
Juan       88

Solution: Write the data in ascending order and rank them.
Observation  62  68  71  74  77  82  84  88  90  94
Rank          1   2   3   4   5   6   7   8   9  10
Since n = 10 is even, the ranks of the median are n/2 = 5 and n/2 + 1 = 6. Therefore, the median is the average of the 5th and the 6th observations:
\[ M_d = \frac{77 + 82}{2} = 79.5 \]

Weighted Mean | Measures of Centrality
The weighted mean is the average of a data set in which each observation is given a relative importance numerically by a set of values called weights, \(w_i\). The formula is
\[ \bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} \]

Example: Elizabeth took five courses in a given semester with 5, 4, 3, 3, and 2 credit hours. The grade points she earned in these courses at the end of the semester were 3.7, 4.0, 3.3, 3.7, and 4.0, respectively. Find her GPA for that semester.

Solution: The credit hours are the weights \(w_i\) and the grade points are the scores \(x_i\).
Weight (w)   Grade point (x)   w × x
5            3.7               18.5
4            4.0               16.0
3            3.3               9.9
3            3.7               11.1
2            4.0               8.0
Sum of weights = 17; sum of w × x = 63.5
\[ \bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{63.5}{17} = 3.735 \]
Her GPA is 3.735.

Mode | Measures of Centrality
The mode of a variable is the value that occurs with the greatest frequency. A data set can have one, more than one, or no mode. The mode is the measure of center usually used for data at the nominal level of measurement, which is why it is called the nominal average. We denote the mode by \(M_0\).

Example 1: Find the mode for the following data set: 3, 8, 5, 6, 10, 17, 19, 20, 3, 2, 11
Solution: In the data set of this example, each value occurs once except 3, which occurs twice.
Thus, the mode for this set is \(M_0 = 3\).

Example 2: Find the mode for the following data set: 1, 7, 19, 23, 11, 12, 1, 12, 19, 7, 11, 23
Solution: Note that in this data set, each value occurs twice. Thus, this data set does not have a mode.

Example 3: Find the mode for the following data set: 5, 7, 12, 13, 14, 21, 7, 21, 23, 26, 5
Solution: In this data set, the values 5, 7, and 21 occur twice, and the rest of the values occur only once. Thus, in this example there are three modes, that is, \(M_0 = 5, 7,\) and \(21\).

Measures of Centrality and the Shape of the Distribution
The shape of a distribution can be classified into three categories:
- Symmetric: A data set is symmetric when the values in the data set that lie equidistant from the mean, on either side, occur with equal frequency.
- Left-skewed: A data set is left-skewed when values in the data set that are greater than the median occur with relatively higher frequency than those values that are smaller than the median. The values smaller than the median are scattered far to the left of the median.
- Right-skewed: A data set is right-skewed when values in the data set that are smaller than the median occur with relatively higher frequency than those values that are greater than the median. The values greater than the median are scattered far to the right of the median.

Measures of Dispersion
Dispersion is the degree to which the data are spread out. Information about the variation in a data set is provided by measures known as measures of dispersion. The three most common measures of dispersion are the range, the variance, and the standard deviation.

Range | Measures of Dispersion
The range is defined as the distance between the highest and the lowest value.
Range = Largest value − Smallest value
The range is not an efficient measure of dispersion because it takes into consideration only the largest and the smallest values and none of the remaining observations.

Example: The following data give the tensile strength (in psi) of a sample of a certain material submitted for inspection. Find the range for this data set:
8538.24, 8450.16, 8494.27, 8317.34, 8443.99, 8368.04, 8368.94, 8424.41, 8427.34, 8517.64
Solution: The largest and the smallest values in the data set are 8538.24 and 8317.34, respectively. Therefore, the range for this data set is
Range = 8538.24 − 8317.34 = 220.90 psi

Variance | Measures of Dispersion
The variance is the average value of the squared deviations from the mean. Basically, the variance measures how far the observations within a data set deviate from their mean. The population variance is denoted by \(\sigma^2\), while the sample variance is denoted by \(s^2\). Variance is expressed in the square of the units of the data values.

The population variance (N = population size) is defined as
\[ \sigma^2 = \frac{1}{N}\sum (x_i - \mu)^2 \]
The sample variance (n = sample size) is defined as
\[ s^2 = \frac{1}{n-1}\sum (x_i - \bar{x})^2 \]

For computational purposes, we can use the following equivalent formulas for the population and sample variance:
\[ \sigma^2 = \frac{1}{N}\left[\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{N}\right] \]
\[ s^2 = \frac{1}{n-1}\left[\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}\right] \]

Standard Deviation | Measures of Dispersion
The standard deviation is obtained by taking the square root of the variance: we use the same formula as for the variance and then take the square root. It is a more commonly used measure of dispersion than the variance because its unit is the same as the unit of the values in the data set. The population standard deviation is denoted by \(\sigma\), and the sample standard deviation is denoted by \(s\).
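The definitional and computational variance formulas are algebraically equivalent, and a short script makes it easy to confirm that they agree up to floating-point rounding. The Python sketch below is illustrative (the function names are hypothetical); its `sample` flag switches the divisor between n − 1 (sample) and N (population). The input is the chip-length data from the example that follows.

```python
import math


def variance(values, sample=True):
    """Definitional formula: sum of squared deviations from the mean, divided by n - 1 (sample) or N (population)."""
    n = len(values)
    if n < (2 if sample else 1):
        raise ValueError("not enough values")
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return squared_deviations / (n - 1 if sample else n)


def variance_shortcut(values, sample=True):
    """Computational formula: [sum(x^2) - (sum x)^2 / n], divided by n - 1 (sample) or N (population)."""
    n = len(values)
    if n < (2 if sample else 1):
        raise ValueError("not enough values")
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1 if sample else n)


chips = [4, 2, 5, 1, 3, 6, 2, 4, 3, 5]  # chip lengths (mm) from the example below

s2 = variance(chips)
s = math.sqrt(s2)  # standard deviation = square root of the variance
print(f"s^2 = {s2:.2f} mm^2, s = {s:.2f} mm")
print("Computational formula agrees:", math.isclose(s2, variance_shortcut(chips)))
```

Both formulas give a sample variance of 2.5 mm² and a standard deviation of about 1.58 mm for the chip data; the computational form needs only the sum and the sum of squares, which is why it is convenient for hand calculation.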
Example: The following data give the length (in millimeters) of material chips removed during a machining operation: 4, 2, 5, 1, 3, 6, 2, 4, 3, 5. Determine the variance and the standard deviation for these data.

Solution: Calculate the sum of all the data values:
\[ \sum x_i = 4+2+5+1+3+6+2+4+3+5 = 35 \]
Calculate the sum of the squares of all observations:
\[ \sum x_i^2 = 4^2+2^2+5^2+1^2+3^2+6^2+2^2+4^2+3^2+5^2 = 145 \]
Substitute all values in the formula for the sample variance:
\[ s^2 = \frac{1}{n-1}\left[\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}\right] = \frac{1}{10-1}\left[145 - \frac{35^2}{10}\right] = 2.5 \text{ mm}^2 \]
Take the square root to solve for the sample standard deviation:
\[ s = \sqrt{2.5} = 1.58 \text{ mm} \]

Empirical Rule
The empirical rule shows how the standard deviation of a data set helps us measure the variability of the data. If the data have a distribution that is approximately bell-shaped, the empirical rule can be used to estimate the percentage of data that will fall within k = 1, 2, or 3 standard deviations of the mean:
1. About 68% of the data will fall within one standard deviation of the mean, that is, between µ − 1σ and µ + 1σ.
2. About 95% of the data will fall within two standard deviations of the mean, that is, between µ − 2σ and µ + 2σ.
3. About 99.7% of the data will fall within three standard deviations of the mean, that is, between µ − 3σ and µ + 3σ.

Example 1: A soft-drink filling machine is used to fill 16-oz soft-drink bottles. The amount of beverage varies slightly from bottle to bottle, and it is assumed that the actual amount of beverage in a bottle forms a bell-shaped distribution with a mean of 15.8 oz and a standard deviation of 0.15 oz. Use the empirical rule to find what percentage of bottles contain between 15.5 and 16.1 oz of beverage.

Solution: From the information provided to us in this problem, we have µ = 15.8 oz and σ = 0.15 oz. We are interested in knowing the percentage of bottles that will contain between 15.5 and 16.1 oz of beverage.
We can see that µ ± 2σ = 15.8 ± 2(0.15) = (15.5, 16.1). Since 15.5 and 16.1 are two standard deviations away from the mean, the empirical rule indicates that approximately 95% of the bottles contain between 15.5 and 16.1 oz of beverage.

Example 2: At the end of each fiscal year, a manufacturer writes off or adjusts its financial records to show the number of units of bad production occurring over all lots of production during the year. Suppose that the dollar values associated with the various units of bad production form a bell-shaped distribution with mean \(\bar{x}\) = $35,700 and standard deviation \(s\) = $2500. Find the percentage of units of bad production that have a dollar value between $28,200 and $43,200.

Solution: From the information provided, we have \(\bar{x}\) = $35,700 and \(s\) = $2500. Since the limits $28,200 and $43,200 are three standard deviations away from the mean (35,700 ± 3 × 2500), the empirical rule shows that approximately 99.7% of the units of bad production have a dollar value between $28,200 and $43,200.

Chebyshev's Inequality
If the data do not have a bell-shaped distribution, we use Chebyshev's inequality, which holds for data with any distribution. Chebyshev's inequality states that for any k > 1, at least
\[ \left(1 - \frac{1}{k^2}\right) \times 100\% \]
of the data values fall within k standard deviations of the mean.

Example: Sodium is an important component of the metabolic panel. The average sodium level for 1000 American male adults who were tested for low sodium was found to be 132 mEq/L with a standard deviation of 3 mEq/L. Using Chebyshev's inequality, determine at least how many of the adults tested have a sodium level between 124.5 and 139.5 mEq/L.

Solution: From the given information, the mean and the standard deviation of the sodium level for these adults are \(\bar{x} = 132\) and \(s = 3\) mEq/L. To find how many of the 1000 adults have their sodium level between 124.5 and 139.5 mEq/L, we need to determine the value of k. Since each of these values is 7.5 points away from the mean, the value of k in Chebyshev's inequality is such that \(ks = 7.5\), so that \(k = 7.5/3 = 2.5\). Hence, the number of adults in the sample who have their sodium level between 124.5 and 139.5 mEq/L is at least
\[ \left(1 - \frac{1}{2.5^2}\right) \times 1000 = 840 \]

Numerical Measures of Grouped Data
A set of data presented in the form of a frequency distribution table (FDT) is called grouped data. To compute the measures of centrality and dispersion of grouped data, each measurement in a given class is approximated by its midpoint. The average of the lower and upper class limits of a class (bin) is called the class midpoint or class mark.

Mean of Grouped Data
To compute the mean of a grouped data set, the first step is to find the midpoint (m) of each class, which is defined as
m = (Lower limit + Upper limit)/2
Then, the mean (\(\mu_G\) or \(\bar{x}_G\)) is computed as
\[ \bar{x}_G = \frac{\sum f_i m_i}{n} \]
where \(f_i\) is the frequency and \(m_i\) the midpoint of the ith class, and \(n = \sum f_i\) is the total frequency (N for a population, giving \(\mu_G\)).

Example: Find the mean of the grouped data that is the frequency distribution of a group of 40 basketball fans watching a basketball game.
Class     Frequency (f)
11 – 20   8
21 – 30   10
31 – 40   6
41 – 50   11
51 – 60   5

Variance of Grouped Data
The population and sample variance of grouped data are computed using the following formulas:
Population variance:
\[ \sigma_G^2 = \frac{1}{N}\left[\sum f_i m_i^2 - \frac{\left(\sum f_i m_i\right)^2}{N}\right] \]
Sample variance:
\[ s_G^2 = \frac{1}{n-1}\left[\sum f_i m_i^2 - \frac{\left(\sum f_i m_i\right)^2}{n}\right] \]

Example: Find the variance and standard deviation of the grouped data that is the frequency distribution of a group of 40 basketball fans watching a basketball game.
Class     Frequency (f)
11 – 20   8
21 – 30   10
31 – 40   6
41 – 50   11
51 – 60   5

Measures of Relative Position
Measures of relative position determine the position of a single value in relation to other values in a sample or a population data set. We commonly refer to these measures of position as quantiles or fractiles. The most commonly used quantiles are the percentiles, deciles, and quartiles.

Percentiles
Percentiles divide the ordered observations (or score distribution) into 100 equal parts, each containing at most 1% of the data; the 99 dividing values are the percentiles, numbered 1 to 99. The pth percentile, denoted \(P_p\), of a set of data is a value such that p percent of the observations are less than or equal to that value.

We compute the percentiles as follows:
1. Write the data values in ascending order and rank them from 1 to n.
2. Find the rank of the pth percentile (p = 1, 2, ..., 99), which is given by
\[ \text{Rank of the } p\text{th percentile} = p \times \frac{n}{100} \]
3. Find the data value that corresponds to the rank of the pth percentile using one of the following rules:
a. If the computed rank is a whole number r, the value of the pth percentile is midway between the values at ranks r and r + 1 (their average).
b. If the computed rank is not a whole number, the value of the pth percentile is the value at the next whole-numbered rank.

Example: The following data give the salaries (in thousands of dollars) of 15 engineers in a corporation:
62 48 52 63 85 51 95 76 72 51 69 73 58 55 54
a) Find the 70th percentile for these data.
b) Find the percentile corresponding to the salary of $60,000.

Part A: Find the 70th percentile for these data.
Write the data values in ascending order and rank them from 1 to 15.
Data  48  51  51  52  54  55  58  62  63  69  72  73  76  85  95
Rank   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
Find the rank of the 70th percentile:
\[ \text{Rank} = 70 \times \frac{15}{100} = 10.5 \]
Since 10.5 is not a whole number, take the data value at the next whole-numbered rank, 11: \(P_{70} = 72\). This means that about 70% of the engineers have salaries of at most $72,000.

Part B: Find the percentile corresponding to the salary of $60,000.
From the same sorted and ranked list of salaries, 60 falls between ranks 7 and 8, so there are 7 data values less than 60. Therefore, the percentile of 60 is
\[ \frac{7}{15} \times 100 = 46.7 \approx 47 \]
Hence, the engineer who makes a salary of $60,000 is at the 47th percentile. In other words, about 47% of the engineers have salaries less than $60,000.

Quartiles
Quartiles are measures of location, denoted \(Q_1\), \(Q_2\), and \(Q_3\), which divide a set of data into four groups with about 25% of the values in each group. \(Q_1\), \(Q_2\), and \(Q_3\) are the 25th, 50th, and 75th percentiles, respectively; they are also known as the lower, middle, and upper quartiles. To determine the values of the quartiles, one simply finds the 25th, 50th, and 75th percentiles.

Some statistics are defined using quartiles and percentiles, as in the following:
1. Interquartile range (IQR): \(Q_3 - Q_1\)
2. Semi-interquartile range: \((Q_3 - Q_1)/2\)
3. Midquartile: \((Q_1 + Q_3)/2\)
4. 10-90 percentile range: \(P_{90} - P_{10}\)

Interquartile Range (IQR)
The interquartile range, IQR, is the range of the middle 50% of the observations in a data set.
That is, the IQR is the difference between the third and first quartiles: IQR = \(Q_3 - Q_1\).

Example: The following data give the salaries (in thousands of dollars) of 15 engineers in a corporation:
62 48 52 63 85 51 95 76 72 51 69 73 58 55 54
Find the interquartile range.

Solution: Find the quartiles \(Q_1\) and \(Q_3\), or equivalently the 25th percentile and the 75th percentile.
Data  48  51  51  52  54  55  58  62  63  69  72  73  76  85  95
Rank   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
Rank of \(Q_1\) = 25 × (15/100) = 3.75 → 4 (next whole-numbered rank)
Rank of \(Q_3\) = 75 × (15/100) = 11.25 → 12 (next whole-numbered rank)
Therefore, \(Q_1 = 52\) and \(Q_3 = 73\), and
\[ \text{IQR} = Q_3 - Q_1 = 73 - 52 = 21 \]
Since the data are in thousands of dollars, the IQR is $21,000.

Coefficient of Variation
The coefficient of variation, usually denoted by CV, is defined as the ratio of the standard deviation to the mean, expressed as a percentage:
\[ CV = \frac{s}{\bar{x}} \times 100\% \]
The coefficient of variation is a relative comparison of a standard deviation to its mean and is unitless. The CV is commonly used to compare the variability in two populations.

Example: A company uses two measuring instruments, one to measure the diameters of ball bearings and the other to measure the lengths of rods it manufactures. The quality control department of the company wants to find which instrument measures with more precision. To achieve this goal, a quality control engineer takes several measurements of a ball bearing using one instrument and finds the sample average \(\bar{x}\) and the standard deviation \(s\) to be 3.84 and 0.02 mm, respectively. Then, she takes several measurements of a rod using the other instrument and finds the sample average \(\bar{x}\) and the standard deviation \(s\) to be 29.5 and 0.035 cm, respectively. Estimate the coefficient of variation from the two sets of measurements.
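Since the CV is a simple ratio, the comparison in this example takes only a few lines of Python. The sketch below is illustrative (the function name is hypothetical) and assumes that the sample mean and standard deviation for each instrument are already computed and expressed in the same units, so that the units cancel.

```python
def coefficient_of_variation(std_dev, mean):
    """CV = (s / mean) x 100%. The units cancel, so different instruments can be compared."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return std_dev / mean * 100


cv_bearing = coefficient_of_variation(0.02, 3.84)  # instrument 1: mm / mm
cv_rod = coefficient_of_variation(0.035, 29.5)     # instrument 2: cm / cm

print(f"CV1 (ball bearings) = {cv_bearing:.2f}%")
print(f"CV2 (rods)          = {cv_rod:.3f}%")
print("More precise:", "instrument 2" if cv_rod < cv_bearing else "instrument 1")
```

A smaller CV means less variability relative to the size of what is being measured, so the sketch reports instrument 2 as the more precise one.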
Solution: Using the formula,
\(CV_1 = (0.02/3.84) \times 100\% = 0.52\%\)
\(CV_2 = (0.035/29.5) \times 100\% = 0.119\%\)
The measurements of the lengths of the rods are relatively less variable than those of the diameters of the ball bearings. Therefore, we can say the data show that instrument 2 is more precise than instrument 1.

Box-Whisker Plot
The box-whisker plot, or simply box plot, was invented by J. Tukey. The box-whisker plot is an important tool for determining which values in a data set are extreme values, also known as outliers. It uses the quartiles to determine the outliers.

[Figure: Box-whisker plot]

Construction of a Box Plot
1. Find the quartiles \(Q_1\), \(Q_2\), and \(Q_3\) for the given data set.
2. Draw a box with its outer lines standing at the first quartile (\(Q_1\)) and the third quartile (\(Q_3\)), and then draw a line at the second quartile (\(Q_2\)). The line at \(Q_2\) divides the box into two boxes, which may or may not be of equal size.
3. From the outer lines, draw straight lines extending outward up to three times the IQR and mark them as shown in the figure. Note that each of the distances between points A and B, B and C, D and E, and E and F is equal to one and one-half times the distance between points C and D, that is, 1.5 × IQR. Points B and E are the inner fences, and points A and F are the outer fences. The points S and L are, respectively, the smallest and largest data points that fall within the inner fences. The lines from S to C and from D to L are called the whiskers.

How to Use the Box Plot
About the Outliers
1. Any data points that fall beyond the lower and upper outer fences are extreme outliers. These points are usually excluded from the analysis.
2. Any data points between the inner and outer fences are mild outliers. These points are excluded from the analysis only if we are convinced that they were somehow recorded or measured in error.
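The fence rules above can be sketched in Python. This is a minimal illustration under stated assumptions: the quartiles use this chapter's percentile rank rule (textbooks and libraries that interpolate differently may give slightly different quartiles), the inner fences sit 1.5 × IQR and the outer fences 3 × IQR beyond the box, and the function names are hypothetical. It is applied to the noise-level data of the example that follows.

```python
import math


def percentile(values, p):
    """pth percentile (p = 1..99) using this chapter's rank rule.

    rank = p * n / 100; if the rank is a whole number r, average the values at
    ranks r and r + 1; otherwise take the value at the next whole-numbered rank.
    """
    if not 1 <= p <= 99:
        raise ValueError("p must be between 1 and 99")
    data = sorted(values)
    rank = p * len(data) / 100
    if rank.is_integer():
        r = int(rank)
        return (data[r - 1] + data[r]) / 2  # ranks are 1-based, list indices 0-based
    return data[math.ceil(rank) - 1]


def box_plot_summary(values):
    """Quartiles, IQR, fences, whisker ends (S and L), and outlier classes."""
    q1, q2, q3 = (percentile(values, p) for p in (25, 50, 75))
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)  # points B and E
    outer = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)  # points A and F
    inside = [x for x in values if inner[0] <= x <= inner[1]]
    mild = sorted(x for x in values
                  if outer[0] <= x < inner[0] or inner[1] < x <= outer[1])
    extreme = sorted(x for x in values if x < outer[0] or x > outer[1])
    return {
        "quartiles": (q1, q2, q3),
        "iqr": iqr,
        "inner_fences": inner,
        "outer_fences": outer,
        "whiskers": (min(inside), max(inside)),
        "mild_outliers": mild,
        "extreme_outliers": extreme,
    }


noise_db = [85, 80, 88, 95, 115, 110, 105, 104, 89, 87, 96, 140, 75, 79, 99]
for key, value in box_plot_summary(noise_db).items():
    print(f"{key}: {value}")
```

For the noise data this gives Q1 = 85, Q2 = 95 (the 8th of the 15 ordered values), Q3 = 105, inner fences at 55 and 135, and outer fences at 25 and 165. The whiskers therefore run from 75 to 115, and 140 dB falls between the upper inner and outer fences, which makes it a mild outlier under the rules above.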
About the Shape of the Distribution
1. If the second quartile (median) is close to the center of the box and the two whiskers are approximately equal in length, then the distribution is symmetric.
2. If the right box is substantially larger than the left box and/or the right whisker is much longer than the left whisker, then the distribution is right-skewed.
3. If the left box is substantially larger than the right box and/or the left whisker is much longer than the right whisker, then the distribution is left-skewed.

Example: The following data give the noise level, measured in decibels, produced by 15 different machines in a very large manufacturing plant (a usual conversation between humans produces a noise level of about 75 dB):
85 80 88 95 115 110 105 104 89 87 96 140 75 79 99
Construct a box plot and examine whether the data set contains any outliers.

Solution: We arrange the data in ascending order and rank them.
Data  75  79  80  85  87  88  89  95  96  99 104 105 110 115 140
Rank   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
We then find the ranks of the quartiles \(Q_1\), \(Q_2\), and \(Q_3\). Thus, we have (n = 15):
Rank of \(Q_1\) = (25/100)(15) = 3.75 → 4
Rank of \(Q_2\) = (50/100)(15) = 7.5 → 8
Rank of \(Q_3\) = (75/100)(15) = 11.25 → 12
Therefore, \(Q_1 = 85\), \(Q_2 = 95\), and \(Q_3 = 105\).

The IQR is
IQR = \(Q_3 - Q_1\) = 105 − 85 = 20
To determine the fences, we need
1.5 × IQR = 1.5 × 20 = 30

Get in Touch With Us
Send us a message or visit us:
City of Batac, Ilocos Norte, Philippines
(63) 77-600-0459
Follow us for updates: facebook.com/MMSUofficial
www.mmsu.edu.ph