Chapter 2 - Describing Data - PowerPoint Presentation PDF
Carleton University
Gary Bazdell
Summary
This PowerPoint presentation covers the fundamentals of describing data: frequency tables and relative frequency tables for qualitative variables, bar charts and pie charts, frequency distributions for quantitative variables, histograms, frequency polygons, cumulative frequency distributions, and stem-and-leaf displays. Each topic is illustrated with worked examples and in-class exercises.
Full Transcript
CHAPTER 2
DESCRIBING DATA: FREQUENCY TABLES, FREQUENCY DISTRIBUTIONS, AND GRAPHIC PRESENTATION
PowerPoint Presentation Prepared by Gary Bazdell, Carleton University

Learning Objectives
LO2-1 Summarize qualitative variables with frequency and relative frequency tables.
LO2-2 Display a frequency table using a bar or pie chart.
LO2-3 Summarize quantitative variables with frequency and relative frequency distributions.
LO2-4 Display a frequency distribution using a histogram or frequency polygon.
LO2-5 Construct and interpret a cumulative frequency distribution.
LO2-6 Construct and describe a stem-and-leaf display.

CONSTRUCTING FREQUENCY TABLES (LO2-1)

Frequency Table
A frequency table is a grouping of qualitative data into mutually exclusive (distinctive) classes showing the number of observations in each class. The number of observations in each class is called the class frequency.

Example: Below is a frequency table for real estate listings in Halifax by type of dwelling.

Dwelling Type    Number of Listings
Apartment        58
House            26
Townhouse        14
Total            98

Relative Class Frequency
Class frequencies can be converted to relative class frequencies to show the fraction of the total number of observations in each class. A relative frequency captures the relationship between a class total and the total number of observations.

Example: Below is a relative frequency table for the real estate listings in Halifax from the previous slide.

Type         Number of Listings    Fraction    Relative Frequency    Percent
Apartment    58                    58/98       0.5918                59.18%
House        26                    26/98       0.2653                26.53%
Townhouse    14                    14/98       0.1429                14.29%
Total        98                    1           1.0000                100.00%

GRAPHIC PRESENTATION OF QUALITATIVE DATA (LO2-2)

Bar Chart
The most common graphic form to present a qualitative variable is a bar chart. A bar chart graphically describes a frequency table using a series of uniformly wide rectangles. The horizontal axis shows the classes corresponding to the variable of interest.
The vertical axis shows the frequency or relative frequency of each of the possible outcomes. A distinguishing characteristic of a bar chart is that there is a distance or gap between the bars.

Bar Chart Example: Below we see a bar chart for the Halifax real estate data.

[Bar chart: Number of Halifax Real Estate Listings by Dwelling Type. Horizontal axis: Dwelling Type (Apartment, House, Townhouse). Vertical axis: Number of Listings.]

Pie Chart
A pie chart is another type of chart that is useful for depicting qualitative information. A pie chart graphically describes a frequency table using a circle that is divided into slices. Each slice of the circle represents the relative frequency of a class as a percentage of the total number of observations.

Pie Chart Example: The table below shows a breakdown of lottery proceeds from the Ontario Lottery and Gaming Corporation (OLG).

Use of Profits          Percentage Share (%)
Prizes                  51.8
Province of Ontario     30.3
Retailers               7.1
Operating Expenses      8.4
Government of Canada    2.4
Total                   100.0

Pie Chart Example (cont'd): Below is the pie chart for the OLG use of profits.

Example: Web Page Navigation
A company is interested in how easy its web page design is to navigate. Two hundred randomly selected regular Internet users were asked to perform a search task on the web page and to rate the ease of navigation as poor, good, excellent, or awesome. The results are shown in the following table:

Rating       Frequency
Awesome      100
Excellent    60
Good         30
Poor         10

1. What level of measurement is used for ease of navigation?
2. Draw a bar chart for the survey results.
3. Draw a pie chart for the survey results.

Solution: Web Page Navigation
1. The data are measured on an ordinal scale. That is, the scale ranks the ease of navigation from a low of "poor" to a high of "awesome." Also, the interval between each rating is unknown, so it is impossible, for example, to conclude that a rating of good is twice the value of a poor rating.
2. We can construct a bar chart using the frequency table on the previous slide. The result is below.
3. To construct a pie chart, we can use the frequency table to compute the percentage of observations associated with each class. The table and pie chart are given below.

Rating       Frequency    Percent (%)
Awesome      100          50%
Excellent    60           30%
Good         30           15%
Poor         10           5%
Total        200          100%

In-Class Exercise
A social welfare NGO decided to plant fruit trees near a city park. Below is the number of each fruit plant to be planted.

Fruit Plant    Number
Mango          45
Pineapple      36
Strawberry     57
Apple          62
Total          200

(a) Are the data quantitative or qualitative? Why?
(b) What is the table called?
(c) Develop a bar chart to depict the information.
(d) Develop a pie chart using the relative frequencies.

CONSTRUCTING FREQUENCY DISTRIBUTIONS: QUANTITATIVE DATA (LO2-3)

Frequency Distribution
A frequency distribution is a grouping of quantitative data into mutually exclusive classes showing the number of observations in each class. We construct a frequency distribution by using the following steps:
1. Decide on the number of classes.
2. Determine the class interval or width.
3. Set the individual class limits.
4. Tally the observations into the classes.
5. Count the number of items in each class.

Example: S&P/TSX Composite Index
S&P/TSX composite index historical list volume data for 42 days from Oct 11, 2011 to Dec 7, 2011 are given in the table below. What are the highest and lowest volumes? Construct a frequency distribution for these data.
234,979,276    186,585,706    222,262,031    199,379,802    239,914,383    202,470,683
294,246,336    114,691,255    264,218,706    205,585,826    330,546,204    204,172,672
188,982,337    48,895,537     182,342,563    190,776,204    236,790,466    158,833,777
195,798,266    189,951,023    136,413,027    258,625,081    286,073,224    222,924,810
201,004,892    199,922,254    165,644,472    210,858,029    231,505,190    206,240,504
282,704,1      195,617,1      192,236,9      295,017,9      185,069,4      263,435,5

Lowest: 48,895,537. Highest: 330,546,204.

Solution: S&P/TSX Composite Index
Step 1: Decide on the number of classes.
A useful recipe to determine the number of classes ($k$) is the "2 to the $k$ rule." This rule suggests you select the smallest number ($k$) for the number of classes such that $2^k$ is greater than the number of observations ($n$). Since there were 42 days, $n = 42$. If we let $k = 5$, then $2^5 = 32$. Since 32 is less than 42, we need more than 5 classes. If we let $k = 6$, then $2^6 = 64$. Since 64 is greater than 42, the recommended number of classes is 6.

Solution: S&P/TSX Composite Index
Step 2: Determine the class interval or width.
Generally the class interval or class width is the same for all classes. The classes all taken together must cover at least the distance from the minimum value in the raw data up to the maximum value. That is,

$$ i \geq \frac{\text{Maximum value} - \text{Minimum value}}{k} $$

where $i$ is the class interval and $k$ is the number of classes. For our example,

$$ i \geq \frac{330{,}546{,}204 - 48{,}895{,}537}{6} \approx 46{,}941{,}778. $$

We then round up to some convenient number like a multiple of 10 or 100. For our example, we can round up to 50,000,000.

Solution: S&P/TSX Composite Index
Step 3: Set the individual class limits.
State clear class limits so you can put each observation into only one category; that is, avoid overlapping class limits. For our example, we can use the following class limits:

0 to under 50,000,000
50,000,000 to under 100,000,000
100,000,000 to under 150,000,000
150,000,000 to under 200,000,000
200,000,000 to under 250,000,000
250,000,000 to under 300,000,000
300,000,000 to under 350,000,000

Solution: S&P/TSX Composite Index
Step 4: Tally the list volumes into the classes.
When all list volumes are tallied, the table would appear as:

0 to under 50,000,000
50,000,000 to under 100,000,000
100,000,000 to under 150,000,000
150,000,000 to under 200,000,000
200,000,000 to under 250,000,000
250,000,000 to under 300,000,000
300,000,000 to under 350,000,000

Solution: S&P/TSX Composite Index
Step 5: Count the number of items in each class.
For our example, the frequency distribution is as follows:

List Volume                           Frequency
0 to under 50,000,000                 1
50,000,000 to under 100,000,000       0
100,000,000 to under 150,000,000      2
150,000,000 to under 200,000,000      14
200,000,000 to under 250,000,000      15
250,000,000 to under 300,000,000      8
300,000,000 to under 350,000,000      2

Notice that the frequency distribution allows us to observe that most observations are clustered between 150,000,000 and under 250,000,000.

In-Class Exercise
The profit earned, in dollars, for the first quarter of last year by the 10 distributors of a refrigerator company in Ottawa is given below:

$2,130  $3,250  $2,657  $4,000  $3,843  $5,000  $3,500  $6,500  $5,900  $4,567

(a) What are the values such as $2,130 and $3,250 called?
(b) Using $2,000 up to $2,500 as the first class, $2,500 up to $3,000 as the second class, and so forth, organize the profits into a frequency distribution.
(c) What are the numbers in the right column of your frequency distribution called?
(d) Describe the distribution of profits, based on the frequency distribution.

Class Intervals and Class Midpoints
We will use two other terms frequently: class midpoint and class interval.

Class midpoint
The midpoint is halfway between the lower limits of two consecutive classes. It is computed by adding the lower limits of consecutive classes and dividing the result by two.

Class interval
To determine the class interval, subtract the lower limit of the class from the lower limit of the next class. You can also determine the class interval by finding the difference between consecutive midpoints.
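The construction steps and the midpoint definition above can be sketched in Python. This is only an illustration and not part of the original slides: the function names are hypothetical, the starting lower limit of 0 and the rounded width of 50,000,000 are taken from the worked example, and upper limits are treated as exclusive ("to under").

```python
def number_of_classes(n):
    """Smallest k such that 2**k > n (the "2 to the k rule")."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k


def minimum_class_interval(minimum, maximum, k):
    """Lower bound on the class interval: (maximum - minimum) / k."""
    return (maximum - minimum) / k


def class_limits(start, width, maximum):
    """(lower, upper) pairs of equal width that cover the maximum value.

    The upper limit is exclusive, matching "lower to under upper".
    """
    limits = []
    lower = start
    while lower <= maximum:
        limits.append((lower, lower + width))
        lower += width
    return limits


def class_midpoints(limits):
    """Halfway between the lower limits of consecutive classes."""
    return [(lower + upper) / 2 for lower, upper in limits]


# Values from the S&P/TSX example: 42 observations, lowest and highest volume.
n = 42
lowest, highest = 48_895_537, 330_546_204

k = number_of_classes(n)
bound = minimum_class_interval(lowest, highest, k)
limits = class_limits(0, 50_000_000, highest)

print("recommended number of classes:", k)
print("class interval must be at least:", round(bound))
for (lower, upper), mid in zip(limits, class_midpoints(limits)):
    print(f"{lower:>11,} to under {upper:>11,}   midpoint {mid:,.0f}")
```

Starting the first class at 0 rather than at the lowest observation is why the sketch produces seven classes even though the rule recommends six; the rule is a guideline for a starting point, not a constraint.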
A Software Example
The following is a frequency distribution, produced by MegaStat, showing the List Volume of S&P/TSX composite index historical data (in millions).

List Volume                                           Cumulative
Lower      Upper   Midpoint   Width   Frequency   Percent    Frequency   Percent
0      <   50      25         50      1           2.38%      1           2.38%
50     <   100     75         50      0           0.00%      1           2.38%
100    <   150     125        50      2           4.76%      3           7.14%
150    <   200     175        50      14          33.33%     17          40.48%
200    <   250     225        50      15          35.71%     32          76.19%
250    <   300     275        50      8           19.05%     40          95.24%
300    <   350     325        50      2           4.76%      42          100.00%
                                      42          100.00%

In-Class Exercise
Jack had 83 customers in his bookstore last Sunday. The customers spent between $43.50 and $450. Jack wants to construct a frequency distribution of the amount spent by his customers for that day.
(a) How many classes would you use?
(b) What class interval would you suggest?
(c) What actual classes would you suggest?

Relative Frequency Distribution
A relative frequency distribution converts each frequency to a relative frequency. To convert a frequency distribution to a relative frequency distribution, each of the class frequencies is divided by the total number of observations.

Example: S&P/TSX Composite Index
The relative frequency distribution for the S&P/TSX composite index historical list volume data is given below.

List Volume                           Frequency    Relative Frequency    Found by
0 to under 50,000,000                 1            0.024                 1/42
50,000,000 to under 100,000,000       0            0.000                 0/42
100,000,000 to under 150,000,000      2            0.048                 2/42
150,000,000 to under 200,000,000      14           0.333                 14/42
200,000,000 to under 250,000,000      15           0.357                 15/42
250,000,000 to under 300,000,000      8            0.190                 8/42
300,000,000 to under 350,000,000      2            0.048                 2/42

In-Class Exercise
Refer to the table on the previous slide, which gives the relative frequency distribution for the List Volume of the S&P/TSX composite index historical data.
(a) How many list volumes fell in the 50,000,000 to under 100,000,000 class?
(b) What percent of the list volumes fell in the 200,000,000 to under 250,000,000 class?
(c) What percent of the list volumes were 250,000,000 or more?
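The relative and cumulative columns of the tables above follow mechanically from the class frequencies. The sketch below, which is not part of the original slides, recomputes them from the seven frequencies in the MegaStat output; the variable and function names are hypothetical.

```python
from itertools import accumulate

# Class lower limits (in millions) and frequencies from the MegaStat table.
lower_limits = [0, 50, 100, 150, 200, 250, 300]
frequencies = [1, 0, 2, 14, 15, 8, 2]


def relative_frequencies(freqs):
    """Divide each class frequency by the total number of observations."""
    total = sum(freqs)
    return [f / total for f in freqs]


total = sum(frequencies)
relative = relative_frequencies(frequencies)
cumulative = list(accumulate(frequencies))  # running total of frequencies
cumulative_percent = [100 * c / total for c in cumulative]

for lower, f, r, c, cp in zip(
    lower_limits, frequencies, relative, cumulative, cumulative_percent
):
    print(f"{lower:>3} to under {lower + 50:>3}  {f:>2}  {r:.3f}  {c:>2}  {cp:6.2f}%")
```

Printing the relative frequency to three decimals and the cumulative percent to two decimals mirrors the rounding used in the slide tables.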
GRAPHIC PRESENTATION OF A FREQUENCY DISTRIBUTION (LO2-4)

Graphic Presentation of a Frequency Distribution
We will consider three charts that will help portray a frequency distribution graphically. They are (i) the histogram, (ii) the frequency polygon, and (iii) the cumulative frequency polygon.

Histogram
A histogram for a frequency distribution based on quantitative data is similar to a bar chart showing the distribution of qualitative data. The classes are marked on the horizontal axis and the class frequencies on the vertical axis. The class frequencies are represented by the heights of the bars. One important difference between a histogram and a bar chart is that the bars in a histogram are drawn adjacent to each other to account for the fact that quantitative data may be continuous and not discrete like qualitative data.

Example: S&P/TSX Composite Index
Construct a histogram for the frequency distribution given below. What conclusions can you reach based on the information presented in the histogram?

List Volume                           Midpoint (in millions)    Frequency
0 to under 50,000,000                 25                        1
50,000,000 to under 100,000,000       75                        0
100,000,000 to under 150,000,000      125                       2
150,000,000 to under 200,000,000      175                       14
200,000,000 to under 250,000,000      225                       15
250,000,000 to under 300,000,000      275                       8
300,000,000 to under 350,000,000      325                       2

Solution: S&P/TSX Composite Index
A histogram for the S&P/TSX composite index historical data is given below.

[Histogram. Horizontal axis: List Volume (in millions), class midpoints 25, 75, 125, 175, 225, 275, 325. Vertical axis: Frequency.]

Solution: S&P/TSX Composite Index
Based on the above histogram, some observations we can make are as follows:
1. The lowest list volume is between 0 and under 50,000,000. The highest list volume is between 300,000,000 and under 350,000,000.
2. The class with the largest frequency is the 200,000,000 to under 250,000,000 class. A total of 15 of the 42 volumes are within this range.
3. Twenty-nine of the list volumes, or 69.05 percent, were between 150,000,000 and under 250,000,000.
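As a rough stand-in for the histogram, the class frequencies can be drawn as rows of characters, one row per class midpoint. This is a minimal text-only sketch and not how the slide's chart was produced; `text_histogram` is a hypothetical helper, and the bars run horizontally instead of vertically. It also rechecks the 69.05 percent figure in observation 3.

```python
# Class midpoints (in millions) and frequencies from the example table.
midpoints = [25, 75, 125, 175, 225, 275, 325]
frequencies = [1, 0, 2, 14, 15, 8, 2]


def text_histogram(mids, freqs, symbol="#"):
    """One line per class: the midpoint, then one symbol per observation."""
    return [f"{mid:>4} | {symbol * count}" for mid, count in zip(mids, freqs)]


for line in text_histogram(midpoints, frequencies):
    print(line)

# Share of observations in the two tallest classes (150 to under 250).
total = sum(frequencies)
in_range = frequencies[3] + frequencies[4]
share = 100 * in_range / total
print(f"{in_range} of {total} volumes ({share:.2f}%) are between 150 and under 250 million")
```

Because the rows are adjacent with no gaps, the output keeps the property that distinguishes a histogram from a bar chart.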
Frequency Polygon
A frequency polygon also shows the shape of a distribution and is similar to a histogram. It consists of line segments connecting the points formed by the intersections of the class midpoints and the class frequencies. The midpoint of each class is scaled on the X-axis and the class frequencies on the Y-axis. To complete the frequency polygon, midpoints are added to both ends of the X-axis to "anchor" the polygon at zero frequencies.

Example: Frequency Polygon
Construct a frequency polygon for the frequency distribution given below. What conclusions can you reach based on the information presented in the frequency polygon?

List Volume                           Midpoint (in millions)    Frequency
0 to under 50,000,000                 25                        1
50,000,000 to under 100,000,000       75                        0
100,000,000 to under 150,000,000      125                       2
150,000,000 to under 200,000,000      175                       14
200,000,000 to under 250,000,000      225                       15
250,000,000 to under 300,000,000      275                       8
300,000,000 to under 350,000,000      325                       2

Solution: Frequency Polygon
A frequency polygon for the S&P/TSX composite index historical data is given below.

Advantages
Histogram: Depicts each class as a rectangle, with the height of the rectangular bar representing the number in each class.
Frequency polygon: Allows us to compare directly two or more frequency distributions by constructing one on top of the other.

In-Class Exercise
The grade results of an exam are shown in the following frequency distribution.

Percentage (%)    Number of Students
50 to under 55    15
55 to under 60    10
60 to under 65    16
65 to under 70    8
70 to under 75    9

(a) Portray the grades as a histogram.
(b) Portray the grades as a relative frequency polygon.
(c) Summarize the important facets of the distribution (such as classes with the highest and lowest frequencies).

CUMULATIVE FREQUENCY DISTRIBUTIONS (LO2-5)

Cumulative Frequency Distribution
Suppose we wish to determine the number of observations that fall below or above a certain value.
We can approximate this count by developing a cumulative frequency distribution and portraying it graphically as a cumulative frequency polygon, or ogive. There are two types of cumulative frequency distributions:
1. Less-than cumulative frequency distribution.
2. More-than cumulative frequency distribution.

Example: S&P/TSX Composite Index
The frequency distribution of the list volumes from the S&P/TSX composite index historical data is given below. Construct a less-than cumulative frequency polygon. Fifty percent of the volumes observed were less than what amount?

List Volume                           Frequency
0 to under 50,000,000                 1
50,000,000 to under 100,000,000       0
100,000,000 to under 150,000,000      2
150,000,000 to under 200,000,000      14
200,000,000 to under 250,000,000      15
250,000,000 to under 300,000,000      8
300,000,000 to under 350,000,000      2

Solution: S&P/TSX Composite Index
The less-than cumulative frequency distribution for the S&P/TSX composite index historical list volume data is given below.

List Volume (in millions)    Frequency    Cumulative Frequency    Found by
0 to under 50                1            1                       1
50 to under 100              0            1                       1+0
100 to under 150             2            3                       1+0+2
150 to under 200             14           17                      1+0+2+14
200 to under 250             15           32                      1+0+2+14+15
250 to under 300             8            40                      1+0+2+14+15+8
300 to under 350             2            42                      1+0+2+14+15+8+2

Solution: S&P/TSX Composite Index
To begin the plotting, note that 1 observation was less than 50,000,000. Thus the first point is X = 50 and Y = 1. The coordinates for the next point are X = 100 and Y = 1. The rest of the points are plotted as follows:

List Volume (in millions)    Cumulative Frequency
Less than 50                 1
Less than 100                1
Less than 150                3
Less than 200                17
Less than 250                32
Less than 300                40
Less than 350                42

Solution: S&P/TSX Composite Index
The less-than cumulative frequency polygon for the S&P/TSX composite index historical data is given below. From this graph we can estimate visually that 50% of all volume measurements are below approximately 210,000,000.

Example: S&P/TSX Composite Index
The frequency distribution of the list volumes from the S&P/TSX composite index historical data is given below. Construct a more-than cumulative frequency polygon.
Eighty percent of the volumes observed were greater than what amount?

List Volume                           Frequency
0 to under 50,000,000                 1
50,000,000 to under 100,000,000       0
100,000,000 to under 150,000,000      2
150,000,000 to under 200,000,000      14
200,000,000 to under 250,000,000      15
250,000,000 to under 300,000,000      8
300,000,000 to under 350,000,000      2

Solution: S&P/TSX Composite Index
The more-than cumulative frequency distribution for the S&P/TSX composite index historical list volume data is given below.

List Volume (in millions)    Frequency    Cumulative Frequency
0 to under 50                1            42
50 to under 100              0            41
100 to under 150             2            41
150 to under 200             14           39
200 to under 250             15           25
250 to under 300             8            10
300 to under 350             2            2

Solution: S&P/TSX Composite Index
To begin the plotting, note that all 42 measurements were 0 or higher, so the first point is X = 0 and Y = 42. The coordinates for the next point are X = 50 and Y = 41. The rest of the points are plotted as follows:

List Volume (in millions)    Cumulative Frequency
0 or more                    42
50 or more                   41
100 or more                  41
150 or more                  39
200 or more                  25
250 or more                  10
300 or more                  2

Solution: S&P/TSX Composite Index
The more-than cumulative frequency polygon for the S&P/TSX composite index historical data is given below. From this graph we can estimate visually that 80% of all volume measurements are at or above approximately 170,000,000.

In-Class Exercise
A sample of the number of transactions performed per hour by 20 employees in a bank is given in the following table.

No. of Transactions per Hour    Number of Employees
4 to under 9                    4
9 to under 13                   7
13 to under 17                  6
17 to under 21                  3

(a) What is the table called?
(b) Develop a less-than and a more-than cumulative frequency distribution and portray the distributions in cumulative frequency polygons.
(c) Based on the cumulative frequency polygon, how many employees performed 12 transactions per hour or less? Half of the employees performed how many transactions per hour? Four employees performed how many or less?

STEM-AND-LEAF DISPLAYS (LO2-6)

Stem-and-Leaf Displays
A stem-and-leaf display is another way of graphically presenting a set of quantitative data.
Each numerical value is divided into two parts. The leading digit(s) becomes the stem and the trailing digit becomes the leaf. The stems are located along the vertical axis, and the leaf values are stacked against each other, in order from smallest to largest, along the horizontal axis. The digits themselves give a picture of the distribution. An advantage of the stem-and-leaf display over a frequency distribution is that we do not lose the identity of each observation.

Constructing a Stem-and-Leaf Display
To illustrate the construction of a stem-and-leaf display, suppose we observe the ages of eight randomly selected high school students. The ages we observe are 14, 16, 16, 15, 13, 17, 15, 15. The stem value is the leading digit or digits, in this case 1. The leaves are the trailing digits. The stem is placed to the left of a vertical line and the leaf values to the right.

1|46653755

Finally, we sort the values within each stem from smallest to largest.

1|34555667

Example
Listed in the table below is the number of 30-second radio advertising spots purchased by each of the 45 members of the Greater Hilltown Automobile Dealers Association last year. Organize the data into a stem-and-leaf display. Around what values do the number of advertising spots tend to cluster? What is the fewest number of spots purchased by a dealer? The largest number purchased?

96   93   88   118  128  95   113  96   108  148
156  139  142  94   105  125  155  155  112  127
117  120  112  135  132  111  125  107  139  136
119  97   87   119  133  125  120  103  113  124
138  94   103  102  143

Solution
From the data in the table, we note that the smallest number of spots purchased is 87. So we will make the first stem value 8. The largest number is 156, so we will have the stem values begin at 8 and continue to 15. The first number in the table is 96, which will have a stem value of 9 and a leaf value of 6. Moving across the top row, the second value is 93 and the third is 88. The final stem-and-leaf display is given below:

Stem    Leaf
8       78
9       3445667
10      233578
11      122337899
12      00455578
13      2356899
14      238
15      556
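The construction just described (split each value into stem and leaf, then sort the leaves within each stem) can be sketched in a few lines of Python. This is an illustration and not part of the original slides; `stem_and_leaf` is a hypothetical helper that assumes non-negative whole numbers and a one-digit leaf.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return display lines: stem on the left, sorted leaves on the right.

    Assumes non-negative integers; the leaf is the final digit and the
    stem is everything before it.
    """
    stems = defaultdict(list)
    for value in sorted(values):  # sorting first keeps the leaves in order
        stem, leaf = divmod(value, 10)
        stems[stem].append(leaf)

    lines = []
    # Walk every stem from smallest to largest, including empty ones.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    return lines


# Advertising spots purchased by the 45 dealers in the example.
spots = [
    96, 93, 88, 118, 128, 95, 113, 96, 108, 148,
    156, 139, 142, 94, 105, 125, 155, 155, 112, 127,
    117, 120, 112, 135, 132, 111, 125, 107, 139, 136,
    119, 97, 87, 119, 133, 125, 120, 103, 113, 124,
    138, 94, 103, 102, 143,
]

for line in stem_and_leaf(spots):
    print(line)
```

Stems with no observations are printed with an empty leaf list so that gaps in the data stay visible in the display.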