Organizing and Graphing Data - Chapter 2 - Statistics
Document Details

Uploaded by ComfortableHydra9348
Qassim University
Summary
This document introduces key concepts in statistics, covering data organization and graphing techniques. Explore frequency distributions and graphical representations of data, including constructing tables and interpreting different types of graphs. Key topics include qualitative and quantitative data analysis.
Full Transcript
1/18/2025

CHAPTER 2
ORGANIZING and GRAPHING DATA

2.1 ORGANIZING AND GRAPHING QUALITATIVE DATA
- Frequency Distributions
- Relative Frequency and Percentage Distributions
- Graphical Presentation of Qualitative Data
  - Bar Graphs
  - Pareto Charts
  - Pie Charts

2.1 (P-33) ORGANIZING and GRAPHING QUALITATIVE DATA

Definition of Raw data:
Raw data are collected data in which the information on each member is recorded in a random, unranked sequence.

Types of Data
- Qualitative: Words, Symbols or Letters: Junior, Senior, Single, Married, Sister, etc. Used with the Mode.
- Quantitative: Numbers: Age, Height, Weight, Price, etc.; marks of students in an exam or a quiz. Used with: Mean, Median and Mode.

Table 2.1 (P-33): Quantitative Data
Collected information on the ages in years of 50 students:

21  19  24  25  29  34  26  27  37  33
18  20  19  22  19  19  25  22  25  23
25  19  31  19  23  18  23  19  23  26
22  28  21  20  22  22  21  20  19  21
25  23  18  37  27  23  21  25  21  24

Table 2.2 (P-33): Qualitative Data
Collected information on the status of 50 students:

J   F   SO  SE  J   J   SE  J   J   J
F   F   J   F   F   F   SE  SO  SE  J
J   F   SE  SO  SO  F   J   F   SE  SE
SO  SE  J   SO  SO  J   J   SO  F   SO
SE  SE  F   SE  J   SO  F   J   SO  SO

F- Freshman, SO- Sophomore, J- Junior, SE- Senior

Tables 2.1 and 2.2 are also called ungrouped data.

2.1.2 (P-33) Frequency Distributions

Definition
A frequency distribution for qualitative data lists all categories and the number of elements that belong to each of the categories.

Example-2.1 (P-34): A sample of 30 persons who often consume donuts were asked what variety of donuts is their favorite. The following table gives their responses:

Glazed   Filled  Other    Plain   Glazed   Other
Frosted  Filled  Filled   Glazed  Other    Frosted
Glazed   Plain   Other    Glazed  Glazed   Filled
Frosted  Plain   Other    Other   Frosted  Filled
Filled   Other   Frosted  Glazed  Glazed   Filled

Q. Construct a frequency distribution table for these data.
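The tally asked for in Example-2.1 can also be done by machine. The following is a minimal sketch, assuming Python and its standard `collections.Counter`; the variable names (`responses`, `frequency`) are illustrative and do not come from the slides.

```python
from collections import Counter

# The 30 responses of Example-2.1, typed row by row.
responses = """
Glazed Filled Other Plain Glazed Other
Frosted Filled Filled Glazed Other Frosted
Glazed Plain Other Glazed Glazed Filled
Frosted Plain Other Other Frosted Filled
Filled Other Frosted Glazed Glazed Filled
""".split()

# Counter does the tallying: one count per category.
frequency = Counter(responses)

for variety in ["Glazed", "Filled", "Frosted", "Plain", "Other"]:
    print(f"{variety:<8} {frequency[variety]}")
print("Sum =", sum(frequency.values()))
```

Each category appears once in the output together with the number of responses that fall in it, which is what the definition of a frequency distribution for qualitative data requires.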
Solution: Frequency Distribution of Favorite Donut Variety

Donut Variety    Tally       Frequency (f)
Glazed           |||| |||    8
Filled           |||| ||     7
Frosted          ||||        5
Plain            |||         3
Other            |||| ||     7
                             Sum = 30

Here Donut Variety is the variable, each row is a category, f is the frequency column, and the sum of the frequencies is ∑f.

2.1.3 (P-35) Relative Frequency and Percentage Distributions

Calculating Relative Frequency of a Category
Relative frequency of a category = (Frequency of that category) / (Sum of all frequencies) = f / ∑f

Calculating Percentage
Percentage = (Relative frequency) X 100% = (f / ∑f) X 100%

Example-2.2 (P-35): Find the Relative Frequency and Percentage Distributions.

Relative Frequency and Percentage Distributions

Donut Variety    Relative Freq.    Percentage %
Glazed           8/30 = .267       (0.267) X 100 = 26.7
Filled           7/30 = .233       (0.233) X 100 = 23.3
Frosted          5/30 = .167       (0.167) X 100 = 16.7
Plain            3/30 = .100       (0.100) X 100 = 10.0
Other            7/30 = .233       (0.233) X 100 = 23.3
                 Sum = 1.000       Sum = 100.00%

2.1.4 (P-35) Graphical Presentation of Qualitative Data

Three types of graphs:
- Bar Graphs
- Pareto Charts
- Pie Charts

Definition: Bar graph
A graph made of bars whose heights represent the frequencies of respective categories is called a bar graph.

BAR GRAPH (P-36)
[Figure: bar graph of the frequency distribution of donut variety. Horizontal axis: Categories (Glazed, Filled, Frosted, Plain, Other). Vertical axis: Frequency.]

PARETO CHART (P-37)
Pareto Chart: A Pareto chart is a bar graph with bars arranged by their heights in descending order. To make a Pareto chart, arrange the bars according to their heights such that the bar with the largest height appears first on the left side, and then subsequent bars are arranged in descending order with the bar with the smallest height appearing last on the right side.

[Figure: Pareto chart of the frequency distribution of donut variety.]
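The relative frequency, percentage, and Pareto ordering described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the frequencies are copied from the donut table, the names `frequencies`, `relative`, `percentage`, and `pareto_order` are made up for the example, and ties between equal frequencies are left in table order.

```python
# Frequencies taken from the donut frequency distribution table.
frequencies = {"Glazed": 8, "Filled": 7, "Frosted": 5, "Plain": 3, "Other": 7}
total = sum(frequencies.values())  # sum of all frequencies

# Relative frequency = f / sum of f; percentage = relative frequency x 100.
relative = {k: round(f / total, 3) for k, f in frequencies.items()}
percentage = {k: round(100 * f / total, 1) for k, f in frequencies.items()}

# Pareto chart order: categories sorted by frequency, largest first.
pareto_order = sorted(frequencies, key=frequencies.get, reverse=True)

for k in frequencies:
    print(f"{k:<8} {relative[k]:.3f} {percentage[k]:.1f}")
print("Pareto order:", pareto_order)
```

The sorted list gives the left-to-right order of the bars in the Pareto chart; the bar heights themselves are unchanged.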
PIE CHART (P-38)

Pie Chart
A circle divided into portions that represent the relative frequencies or percentages of a population or a sample belonging to different categories is called a pie chart.

Angle size = Relative frequency X 360° = (f / ∑f) X 360°

Calculate the angle sizes for the Pie Chart

Donut Variety    Relative Freq.    Angle Size (deg.)
Glazed           8/30 = .267       (.267) X 360 = 96.12
Filled           7/30 = .233       (.233) X 360 = 83.88
Frosted          5/30 = .167       (.167) X 360 = 60.12
Plain            3/30 = .100       (.100) X 360 = 36.00
Other            7/30 = .233       (.233) X 360 = 83.88
                 Sum = 1.000       Sum = 360.00 deg.

[Figure: Pie chart for the percentage distribution.]

Extra Questions (P-38)

Example-1: A sample of 100 students were asked what they intend to do after graduation.
- 44% wanted to work for private companies.
- 16% wanted to work for the federal government.
- 23% wanted to work for the local government.
- 17% wanted to start their own business.
Construct a frequency distribution table.

Solution-1: Frequency Distribution Table

Type of Employment    Number of Students (f)
Private companies     44
Federal government    16
Local government      23
Own business          17
                      Sum = 100

Here Type of Employment is the variable, each row is a category, f is the frequency column, and the sum of the frequencies is ∑f.

Example-2:
A sample of 30 employees from large companies was selected, and these employees were asked how stressful their jobs were. The responses of these employees are recorded next, where very represents very stressful, somewhat means somewhat stressful, and none stands for not stressful at all.

Somewhat  None      Somewhat  Very      Very      None
Very      Somewhat  Somewhat  Very      Somewhat  Somewhat
Very      Somewhat  None      Very      None      Somewhat
Somewhat  Very      Somewhat  Somewhat  Very      None
Somewhat  Very      Very      Somewhat  None      Somewhat

Q. Construct a frequency distribution table for these data.

Solution-2:
(Table-A) Frequency Distribution of Stress on Job

Stress on Job    Tally             Frequency (f)
Very             |||| ||||         10
Somewhat         |||| |||| ||||    14
None             |||| |            6
                                   Sum = 30

Example 2-2
Determine the relative frequency and percentage for the data in Table-A.

Solution 2-2
Table B- Relative Frequency and Percentage Distributions of Stress on Job

Stress on Job    Relative Frequency    Percentage
Very             10/30 = 0.333         .333(100) = 33.3%
Somewhat         14/30 = 0.467         .467(100) = 46.7%
None             6/30 = 0.200          .200(100) = 20.0%
                 Sum = 1.000           Sum = 100.0%

[Figure: Bar graph for the frequency distribution of Table-A. Horizontal axis: Stress on Job (Very, Somewhat, None). Vertical axis: Frequency.]

NOTE: The bar graphs for relative frequency and percentage distributions can be drawn simply by marking the relative frequencies or percentages, instead of the frequencies, on the vertical axis.

Table C: Calculating Angle Sizes for the Pie Chart

Stress on Job    Relative Frequency    Angle Size (deg.)
Very             0.333                 360 x (.333) = 119.88
Somewhat         0.467                 360 x (.467) = 168.12
None             0.200                 360 x (.200) = 72.00
                 Sum = 1.00            Sum = 360.00 deg.

[Figure: Pie chart for the percentage distribution of Table B.]

2.2 (P-39) ORGANIZING AND GRAPHING QUANTITATIVE DATA
- Frequency Distributions
- Constructing Frequency Distribution Tables
- Relative and Percentage Distributions
- Graphing Grouped Data
  - Histograms
  - Polygons

2.2.1 (P-39) Frequency Distributions

Definition: A frequency distribution for quantitative data lists all the classes and the number of values that belong to each class.

Data presented in the form of a frequency distribution are called grouped data.

Table 2.6 Weekly Earnings of 100 employees of a Company

Weekly Earnings (Dollars)    Number of Employees (f)
801 to 1000                  4
1001 to 1200                 11
1201 to 1400                 39
1401 to 1600                 24
1601 to 1800                 16
1801 to 2000                 6

In Table 2.6, Weekly Earnings is the variable and f is the frequency column. The third class is 1201 to 1400 and the frequency of the third class is 39. The lower limit of the sixth class is 1801 and the upper limit of the sixth class is 2000.

NOTE: Table 2.6 is also called inclusive series or class intervals.
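The calculations behind Table B and Table C above can be sketched in Python. This is a hedged illustration, not part of the slides: it assumes the frequencies of Table-A, rounds the relative frequency to three decimals before multiplying (as the tables do), and uses the made-up names `frequencies` and `distribution_row`.

```python
# Frequencies from Table-A (stress on job).
frequencies = {"Very": 10, "Somewhat": 14, "None": 6}
total = sum(frequencies.values())

def distribution_row(f, n):
    """Return (relative frequency, percentage, pie-chart angle in degrees)."""
    rel = round(f / n, 3)        # relative frequency = f / sum of f
    pct = round(rel * 100, 1)    # percentage = relative frequency x 100
    angle = round(rel * 360, 2)  # angle size = relative frequency x 360
    return rel, pct, angle

rows = {category: distribution_row(f, total) for category, f in frequencies.items()}

for category, (rel, pct, angle) in rows.items():
    print(f"{category:<9} {rel:.3f} {pct:.1f}% {angle:.2f} deg")
```

Because the angles are computed from the rounded relative frequencies, they add up to 360 degrees only when the rounded relative frequencies add up to exactly 1, which is the case here.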
Frequency Distributions cont.

Definition:
The class boundary is given by the midpoint of the upper limit of one class and the lower limit of the next class. When consecutive class limits differ by 1, as in Table 2.6:
- Lower boundary = Lower limit - 0.5
- Upper boundary = Upper limit + 0.5

(P-40) Finding Class Width or Class Size
The difference between the lower limits of two consecutive classes gives the class width. The class width is also called the class size.

Width of a class = Lower limit of the next class - Lower limit of the current class
OR
Class width = Upper boundary - Lower boundary

(P-40) Calculating Class Midpoint or Mark
Class midpoint or mark = (Lower limit + Upper limit) / 2

Note: The lower-class limit of a class is the smallest data value that can go into the class. The upper-class limit of a class is the largest data value that can go into the class.

Table 2.7 Class Boundaries, Class Widths, and Class Midpoints for Table 2.6

Class Limits    Class Boundaries               Class Width    Class Midpoint
801 to 1000     800.5 to less than 1000.5      200            900.5
1001 to 1200    1000.5 to less than 1200.5     200            1100.5
1201 to 1400    1200.5 to less than 1400.5     200            1300.5
1401 to 1600    1400.5 to less than 1600.5     200            1500.5
1601 to 1800    1600.5 to less than 1800.5     200            1700.5
1801 to 2000    1800.5 to less than 2000.5     200            1900.5

2.2.2 (P-40) Constructing Frequency Distribution Tables

When you construct a frequency table:

Number of classes: We can take any number of classes from 5 to 20, or use Sturges' formula:
c = 1 + 3.3 x log(n)
where c is the number of classes, n is the number of observations in the data set, and log is the base-10 logarithm.

Rounding the class width: The approximate class width is usually rounded to a convenient number, which is then used as the class width. Note that rounding this number may slightly change the number of classes initially intended.
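The boundary, width, and midpoint rules of Table 2.7 and Sturges' formula can be checked with a short Python sketch. The function names `class_summary` and `sturges_classes` and the `gap` parameter are assumptions made for this illustration; `gap` is the distance between the upper limit of one class and the lower limit of the next (1 in Table 2.6).

```python
import math

# Class limits from Table 2.6 (weekly earnings in dollars).
class_limits = [(801, 1000), (1001, 1200), (1201, 1400),
                (1401, 1600), (1601, 1800), (1801, 2000)]

def class_summary(lower, upper, gap=1):
    """Return (lower boundary, upper boundary, class width, class midpoint)."""
    half = gap / 2
    lower_boundary = lower - half
    upper_boundary = upper + half
    width = upper_boundary - lower_boundary
    midpoint = (lower + upper) / 2
    return lower_boundary, upper_boundary, width, midpoint

def sturges_classes(n):
    """Sturges' formula with a base-10 logarithm: c = 1 + 3.3 * log(n)."""
    return 1 + 3.3 * math.log10(n)

for lower, upper in class_limits:
    print(lower, upper, class_summary(lower, upper))

# For the 100 employees of Table 2.6 the formula suggests about 7.6 classes.
print(sturges_classes(100))
```

Sturges' formula rarely returns a whole number, so in practice the result is rounded to a nearby integer, in line with the slide's guidance that any number of classes from 5 to 20 is acceptable.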
Class Width (Approximate):
Approximate class width = (Largest value - Smallest value) / Number of classes

Lower Limit of the First Class or the Starting Point
Any convenient number that is equal to or less than the smallest value in the data set can be used as the lower limit of the first class.

Example:
The following data give the total number of iPods® sold by a mail order company on each of 30 days. Construct a frequency distribution table.

 8  25  11  15  29  22  10   5  17  21
22  13  26  16  18  12   9  26  20  16
23  14  19  23  20  16  27  16  21  14

Solution:
Number of classes: Suppose we decide to group these data by using 5 classes of equal width.

Solution-ctd.
The minimum value is 5, and the maximum value of the data set is 29. Then,

Approximate width of each class = (29 - 5) / 5 = 4.8

Now we round this approximate width to a convenient number, say 5. The lower limit of the first class can be taken as 5 or any number less than 5. Suppose we take 5 as the lower limit of the first class.

Solution-ctd.
To find the upper limit of the first class, subtract one from the lower limit of the first class and add the class width (5 - 1 + 5 = 9). Then continue to add the class width to both limits to find the rest of the class limits. Then our classes will be
5 – 9, 10 – 14, 15 – 19, 20 – 24, and 25 – 29

Frequency Distribution for the Data on iPods Sold
[Table not preserved in this transcript.]

Example 2-3 (P-41)
The following table gives the value (in million dollars) of each of the 30 baseball teams as estimated by Forbes magazine (source: Forbes Magazine, April 13, 2015). Construct a frequency distribution table.

Values of Baseball Teams, 2015 (values in millions of $)

Team                           Value   Team                   Value
Arizona Diamondbacks           840     Milwaukee Brewers      875
Atlanta Braves                 1150    Minnesota Twins        895
Baltimore Orioles              1000    New York Mets          1350
Boston Red Sox                 2100    New York Yankees       3200
Chicago Cubs                   1800    Oakland Athletics      725
Chicago White Sox              975     Philadelphia Phillies  1250
Cincinnati Reds                885     Pittsburgh Pirates     900
Cleveland Indians              825     San Diego Padres       890
Colorado Rockies               855     San Francisco Giants   2000
Detroit Tigers                 1125    Seattle Mariners       1100
Houston Astros                 800     St. Louis Cardinals    1400
Kansas City Royals             700     Tampa Bay Rays         605
Los Angeles Angels of Anaheim  1300    Texas Rangers          1220
Los Angeles Dodgers            2400    Toronto Blue Jays      870
Miami Marlins                  650     Washington Nationals   1280

Solution:
In these data, the minimum value is 605, and the maximum value is 3200. Suppose we decide to group these data using six classes of equal width. Then,

Approximate width of each class = (3200 − 605) / 6 = 432.5

Now we round this approximate width to a convenient number, say 450. The lower limit of the first class can be taken as 605 or any number less than 605. Suppose we take 601 as the lower limit of the first class. Then our classes will be
601–1050, 1051–1500, 1501–1950, 1951–2400, 2401–2850, and 2851–3300

We record these six classes in the first column of Table 2.8.

Table 2.8 Frequency Distribution of the Values of Baseball Teams, 2015

Value of a team (in million $)    Tally                  Number of Teams (f)
601–1050                          |||| |||| |||| |       16
1051–1500                         |||| ||||              9
1501–1950                         |                      1
1951–2400                         |||                    3
2401–2850                                                0
2851–3300                         |                      1
                                                         N = ∑f = 30

2.2.3 (P-42) Relative Frequency and Percentage Distributions

Relative Frequency and Percentage Distributions
Relative frequency of a class = (Frequency of that class) / (Sum of all frequencies) = f / ∑f
Percentage = (Relative frequency) X 100%

Example 2-4 (P-43)
Calculate the relative frequencies and percentages for Table 2.8.

Solution 2-4
Table 2.9 Relative Frequency and Percentage Distributions for Table 2.8
Value of a team (in million $)    Class Boundaries               Relative Frequency    Percentage
601 – 1050                        600.5 to less than 1050.5      16/30 = .533          53.3
1051 – 1500                       1050.5 to less than 1500.5     9/30 = .300           30.0
1501 – 1950                       1500.5 to less than 1950.5     1/30 = .033           3.3
1951 – 2400                       1950.5 to less than 2400.5     3/30 = .100           10.0
2401 – 2850                       2400.5 to less than 2850.5     0/30 = .000           0.0
2851 – 3300                       2850.5 to less than 3300.5     1/30 = .033           3.3
                                                                 Sum = .999            Sum = 99.9%

2.2.4 (P-43) Graphing Grouped Data

Definition (P-44):
A histogram is a graph in which classes are marked on the horizontal axis and the frequencies, relative frequencies, or percentages are marked on the vertical axis. The frequencies, relative frequencies, or percentages are represented by the heights of the bars. In a histogram, the bars are drawn adjacent to each other.

Example
Table 2.9 gives the total home runs hit by all players of each of the 30 Major League Baseball teams during the 2002 season. Construct a frequency distribution table.

Table 2.9 Home Runs Hit by Major League Baseball Teams During the 2002 Season

Team                 Home Runs   Team               Home Runs
Anaheim              152         Milwaukee          139
Arizona              165         Minnesota          167
Atlanta              164         Montreal           162
Baltimore            165         New York Mets      160
Boston               177         New York Yankees   223
Chicago Cubs         200         Oakland            205
Chicago White Sox    217         Philadelphia       165
Cincinnati           169         Pittsburgh         142
Cleveland            192         St. Louis          175
Colorado             152         San Diego          136
Detroit              124         San Francisco      198
Florida              146         Seattle            152
Houston              167         Tampa Bay          133
Kansas City          140         Texas              230
Los Angeles          155         Toronto            187

Solution
Number of classes: Suppose we decide to group these data by using 5 classes of equal width.

Approximate width of each class = (230 - 124) / 5 = 21.2

Now we round this approximate width to a convenient number, say 22.

The lower limit of the first class can be taken as 124 or any number less than 124. Suppose we take 124 as the lower limit of the first class. Then our classes will be
124 – 145, 146 – 167, 168 – 189, 190 – 211, and 212 – 233

Table 2.10 Frequency Distribution for the Data of Table 2.9

Total Home Runs    Tally            f
124 – 145          |||| |           6
146 – 167          |||| |||| |||    13
168 – 189          ||||             4
190 – 211          ||||             4
212 – 233          |||              3
                                    ∑f = 30

Example
Calculate the relative frequencies and percentages for Table 2.10.

Solution 2-4
Table 2.11 Relative Frequency and Percentage Distributions for Table 2.10

Total Home Runs    Class Boundaries             Relative Frequency    Percentage
124 – 145          123.5 to less than 145.5     .200                  20.0
146 – 167          145.5 to less than 167.5     .433                  43.3
168 – 189          167.5 to less than 189.5     .133                  13.3
190 – 211          189.5 to less than 211.5     .133                  13.3
212 – 233          211.5 to less than 233.5     .100                  10.0
                                                Sum = .999            Sum = 99.9%

[Figure 2.3 Frequency histogram for Table 2.10. Horizontal axis: Classes (124-145, 146-167, 168-189, 190-211, 212-233). Vertical axis: Frequency.]

[Figure 2.4 Relative frequency histogram for Table 2.10. Horizontal axis: Classes (124-145, 146-167, 168-189, 190-211, 212-233). Vertical axis: Relative Frequency.]

Graphing Grouped Data cont. (P-45)

Definition: Polygon
A graph formed by joining the midpoints of the tops of successive bars in a histogram with straight lines is called a polygon.

[Figure: Frequency polygon for Table 2.8. Horizontal axis: Value (million $), marked at 375.5, 825.5, 1275.5, 1725.5, 2175.5, 2625.5, 3075.5, 3525.5. Vertical axis: Frequency.]

[Figure: Frequency polygon for Table 2.10.]

2.2.6 (P-49) CUMULATIVE FREQUENCY DISTRIBUTIONS

Definition:
A cumulative frequency distribution gives the total number of values that fall below the upper boundary of each class.

A cumulative frequency distribution is constructed for quantitative data only.

Example 2-7 (P-50)
Using the frequency distribution of Table 2.8, reproduced below, prepare a cumulative frequency distribution for the values of the baseball teams.

Value of a team (in million $)    No. of teams (f)
601 – 1050                        16
1051 – 1500                       9
1501 – 1950                       1
1951 – 2400                       3
2401 – 2850                       0
2851 – 3300                       1

Solution 2-7
Table 2.13 Cumulative Frequency Distribution of Values of Baseball Teams, 2015

Class Limits    Cumulative Frequency
601 – 1050      16
601 – 1500      16 + 9 = 25
601 – 1950      16 + 9 + 1 = 26
601 – 2400      16 + 9 + 1 + 3 = 29
601 – 2850      16 + 9 + 1 + 3 + 0 = 29
601 – 3300      16 + 9 + 1 + 3 + 0 + 1 = 30

CUMULATIVE FREQUENCY DISTRIBUTIONS cont.

Calculating Cumulative Relative Frequency and Cumulative Percentage (P-50)
Cumulative relative frequency = (Cumulative frequency of a class) / (Total observations in the data set)
Cumulative percentage = (Cumulative relative frequency) X 100

Table 2.14 (P-51) Cumulative Relative Frequency and Cumulative Percentage Distributions for Values of Baseball Teams, 2015

Class Limits    Cumulative Relative Frequency    Cumulative Percentage
601 – 1050      16/30 = .5333                    53.33
601 – 1500      25/30 = .8333                    83.33
601 – 1950      26/30 = .8667                    86.67
601 – 2400      29/30 = .9667                    96.67
601 – 2850      29/30 = .9667                    96.67
601 – 3300      30/30 = 1.000                    100.00

2.2.7 (P-51) SHAPES OF HISTOGRAMS
1. Symmetric
2. Skewed
3. Uniform or rectangular

[Figure 2.9 Symmetric histograms.]

[Figure 2.10 (a) A histogram skewed to the right. (b) A histogram skewed to the left.]

Note: A skewed histogram is nonsymmetric.

[Figure 2.11 A histogram with uniform distribution.]

[Figure 2.12 (a) and (b) Symmetric frequency curves. (c) Frequency curve skewed to the right. (d) Frequency curve skewed to the left.]

2.3 (P-55) STEM-AND-LEAF DISPLAYS

Definition
In a stem-and-leaf display of quantitative data, each value is divided into two portions – a stem and a leaf. The leaves for each stem are shown separately in a display.

Example 2-8 (P-55)
The following are the scores of 30 college students on a statistics test:

75  52  80  96  65  79  71  87  93  95
69  72  81  61  76  86  79  68  50  92
83  84  77  64  71  87  72  92  57  98

Construct a stem-and-leaf display.

Solution 2-8
To construct a stem-and-leaf display for these scores, we split each score into two parts. The first part contains the first digit, which is called the stem. The second part contains the second digit, which is called the leaf.

We observe from the data that the stems for all scores are 5, 6, 7, 8, and 9 because all the scores lie in the range 50 to 98.

Figure 2.15 Stem-and-leaf display.

Stems
5 | 2      (leaf for 52)
6 |
7 | 5      (leaf for 75)
8 |
9 |

After we have listed the stems, we read the leaves for all scores and record them next to the corresponding stems on the right side of the vertical line.

Figure 2.16 Stem-and-leaf display of test scores.

5 | 2 0 7
6 | 5 9 1 8 4
7 | 5 9 1 2 6 9 7 1 2
8 | 0 7 1 6 3 4 7
9 | 6 3 5 2 2 8

Figure 2.17 Ranked stem-and-leaf display of test scores.

5 | 0 2 7
6 | 1 4 5 8 9
7 | 1 1 2 2 5 6 7 9 9
8 | 0 1 3 4 6 7 7
9 | 2 2 3 5 6 8

Example 2-9 (P-57)
The following data are monthly rents paid by a sample of 30 households selected from a small city.

 880  1081   721  1075  1023   775  1235   750   965   960
1210   985  1231   932   850   825  1000   915  1191  1035
1151   630  1175   952  1100  1140   750  1140  1370  1280

Construct a stem-and-leaf display for these data.

Solution 2-9
Figure 2.18 Stem-and-leaf display of rents.

 6 | 30
 7 | 21 75 50 50
 8 | 80 50 25
 9 | 65 60 85 32 15 52
10 | 81 75 23 00 35
11 | 91 51 75 00 40 40
12 | 35 10 31 80
13 | 70

Example 2-10 (P-57)
The following stem-and-leaf display is prepared for the number of hours that 25 students spent working on computers during the last month.

0 | 6
1 | 1 7 9
2 | 2 6
3 | 2 4 7 8
4 | 1 5 6 9 9
5 | 3 6 8
6 | 2 4 4 5 7
7 |
8 | 5 6

Prepare a new stem-and-leaf display by grouping the stems.

Solution 2-10
Figure 2.19 Grouped stem-and-leaf display.

0–2 | 6 * 1 7 9 * 2 6
3–5 | 2 4 7 8 * 1 5 6 9 9 * 3 6 8
6–8 | 2 4 4 5 7 * * 5 6
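The ranked stem-and-leaf display of Example 2-8 (Figure 2.17) can be reproduced with a short Python sketch. It assumes two-digit scores, so that the stem is the tens digit and the leaf is the units digit; the function name `stem_and_leaf` is made up for this illustration.

```python
from collections import defaultdict

# The 30 test scores of Example 2-8.
scores = [75, 52, 80, 96, 65, 79, 71, 87, 93, 95,
          69, 72, 81, 61, 76, 86, 79, 68, 50, 92,
          83, 84, 77, 64, 71, 87, 72, 92, 57, 98]

def stem_and_leaf(values):
    """Map each stem (tens digit) to its leaves (units digits) in ranked order."""
    display = defaultdict(list)
    for value in sorted(values):       # sorting first gives ranked leaves
        stem, leaf = divmod(value, 10)  # e.g. 75 -> stem 7, leaf 5
        display[stem].append(leaf)
    return dict(display)

ranked = stem_and_leaf(scores)
for stem in sorted(ranked):
    print(stem, "|", " ".join(str(leaf) for leaf in ranked[stem]))
```

Dropping the `sorted` call inside the function would give the unranked display of Figure 2.16, with leaves in the order in which the scores were read.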