Frequency Distributions PDF
Document Details
Sara Asad
Summary
This document discusses frequency distributions, including how to organize and present data using frequency tables and graphs. It contains examples and learning checks related to the concepts.
Full Transcript
Frequency Distributions
Course Instructor: Sara Asad

One of the most common procedures for organizing a set of data is to place the scores in a frequency distribution. A frequency distribution is an organized tabulation of the number of individuals located in each category on the scale of measurement. A frequency distribution can be structured either as a table or as a graph, but in either case, the distribution presents the same two elements:

1. The set of categories that make up the original measurement scale.
2. A record of the frequency, or number of individuals in each category.

Frequency Distribution Tables

The simplest frequency distribution table presents the scale by listing the different measurement categories (X values) in a column from highest to lowest. Beside each X value, we indicate the frequency, or the number of times that particular measurement occurred in the data. It is customary to use an X as the column heading for the scores and an f as the column heading for the frequencies. An example of a frequency distribution table follows.

Scores: 8, 9, 8, 7, 10, 9, 6, 4, 9, 8, 7, 8, 10, 9, 8, 6, 9, 7, 8, 8

   X    f
  10    2
   9    5
   8    7
   7    3
   6    2
   5    0
   4    1

Notice that the X values in a frequency distribution table represent the scale of measurement, not the actual set of scores. You also should notice that the frequencies can be used to find the total number of scores in the distribution. By adding up the frequencies, you obtain the total number of individuals: Σf = N

Obtaining ΣX from a frequency distribution table

   X    f
   5    1
   4    2
   3    3
   2    3
   1    1

Each X value must be added as many times as its frequency indicates:

ΣX = 5 + 4 + 4 + 3 + 3 + 3 + 2 + 2 + 2 + 1 = 29

Caution: Doing calculations within the table works well for ΣX but can lead to errors for more complex formulas.

Proportion measures the fraction of the total group that is associated with each score. In general, the proportion associated with each score is

Proportion = p = f/N

Because proportions describe the frequency (f) in relation to the total number (N), they often are called relative frequencies.
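The table construction described above can be sketched in a few lines of Python. This is only an illustration, not part of the course material: the variable names are assumptions, the scores are the 20 values from the example, and the percentage column is simply the proportion multiplied by 100.

```python
from collections import Counter

# The 20 scores from the example table.
scores = [8, 9, 8, 7, 10, 9, 6, 4, 9, 8, 7, 8, 10, 9, 8, 6, 9, 7, 8, 8]

counts = Counter(scores)          # frequency of each observed score
n = sum(counts.values())          # sum of f equals N

# List every X from highest to lowest, including values with f = 0.
table = [(x, counts[x]) for x in range(max(scores), min(scores) - 1, -1)]

print(" X   f     p     %")
for x, f in table:
    p = f / n                     # proportion (relative frequency)
    print(f"{x:>2} {f:>3} {p:>5.2f} {100 * p:>4.0f}%")

# Sum of X from the table: each X counts f times.
sum_x = sum(x * f for x, f in table)
print("N =", n, " sum of X =", sum_x)
```

Computing ΣX as the sum of f times X gives the same result as adding the raw scores one by one, which is a quick way to check a table against the original data.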
In addition to using frequencies (f) and proportions (p), researchers often describe a distribution of scores with percentages.

Learning Check
Construct a frequency distribution table for the following set of scores:
Scores: 3, 2, 3, 2, 4, 1, 3, 3, 5

Learning Check
For this frequency distribution, how many individuals had a score of X = 2?

   X    f
   5    1
   4    2
   3    4
   2    3
   1    2

a) 1
b) 2
c) 3
d) 4

Learning Check
The following is a distribution of quiz scores. If a score of X = 2 or lower is failing, then how many individuals failed the quiz?

   X    f
   5    1
   4    2
   3    4
   2    3
   1    2

a) 2
b) 3
c) 5
d) 9

Learning Check
For the following frequency distribution, what is ΣX²?

   X    f
   4    1
   3    2
   2    2
   1    3
   0    1

a) 30
b) 45
c) 77
d) (17)² = 289

Learning Check
Find each value requested for the distribution of scores in the following table:

   X    f
   6    1
   5    2
   4    2
   3    4
   2    3
   1    2

n
ΣX
ΣX²
Include columns for proportions and percentages in your table.

Grouped Frequency Distribution Tables

When the scores are whole numbers, the total number of rows for a regular table can be obtained by finding the difference between the highest and the lowest scores and adding 1:

Rows = highest - lowest + 1

When a set of data covers a wide range of values, it is unreasonable to list all the individual scores in a frequency distribution table. Consider, for example, a set of exam scores that range from a low of X = 41 to a high of X = 96. These scores cover a range of more than 50 points. If we were to list all of the individual scores from X = 96 down to X = 41, it would take 56 rows to complete the frequency distribution table. Although this would organize the data, the table would be long and cumbersome.

Remember: The purpose for constructing a table is to obtain a relatively simple, organized picture of the data. This can be accomplished by grouping the scores into intervals and then listing the intervals in the table instead of listing each individual score.
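A minimal sketch of grouping in Python is shown here. Several things are assumptions made for illustration: the function name grouped_frequency, the interval width of 5, and the exam scores themselves, which are made up to span the range 41 to 96 mentioned above. The lowest interval starts at a multiple of the width, which is a common convention for grouped tables.

```python
def grouped_frequency(scores, width):
    """Return (bottom, top, f) rows, highest interval first.

    Assumes whole-number scores; the lowest interval starts at a
    multiple of the width (a convention assumed for this sketch).
    """
    bottom = (min(scores) // width) * width
    rows = []
    while bottom <= max(scores):
        top = bottom + width - 1
        f = sum(1 for x in scores if bottom <= x <= top)
        rows.append((bottom, top, f))
        bottom += width
    return rows[::-1]

# Made-up exam scores ranging from 41 to 96 (illustrative only).
scores = [82, 75, 88, 93, 53, 84, 87, 58, 72, 94, 69, 84, 61, 91,
          64, 87, 84, 70, 76, 89, 75, 80, 73, 78, 60, 41, 96]

# Rows needed for an ungrouped table: highest - lowest + 1.
rows_ungrouped = max(scores) - min(scores) + 1

table = grouped_frequency(scores, width=5)
print("ungrouped rows:", rows_ungrouped, " grouped rows:", len(table))
for bottom, top, f in table:
    print(f"{bottom}-{top}  {f}")
```

With these scores the ungrouped table would need 56 rows, while a width of 5 reduces it to 12 intervals. Every score falls in exactly one interval, so the frequencies still add up to N.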
In a grouped frequency distribution table, we present groups of scores rather than individual values. The groups, or intervals, are called class intervals. Several guidelines help in the construction of a grouped frequency distribution table:

1. The grouped frequency distribution table should have about 10 class intervals.
2. The width of each interval should be a relatively simple number. For example, 2, 5, 10, or 20 would be a good choice for the interval width.
3. The bottom score in each class interval should be a multiple of the width.
4. All intervals should be the same width. They should cover the range of scores completely with no gaps and no overlaps, so that any particular score belongs in exactly one interval.

You should note that after the scores have been placed in a grouped table, you lose information about the specific value for any individual score.

Real Limits and Frequency Distributions

The concept of real limits also applies to the class intervals of a grouped frequency distribution table. For example, a class interval of 40–49 contains scores from X = 40 to X = 49. These values are called the apparent limits of the interval because it appears that they form the upper and lower boundaries for the class interval. If you are measuring a continuous variable, however, a score of X = 40 is actually an interval from 39.5 to 40.5. Similarly, X = 49 is an interval from 48.5 to 49.5. Therefore, the real limits of the interval are 39.5 (the lower real limit) and 49.5 (the upper real limit). Notice that the next higher class interval is 50–59, which has a lower real limit of 49.5. Thus, the two intervals meet at the real limit 49.5, so there are no gaps in the scale. You also should notice that the width of each class interval becomes easier to understand when you consider the real limits of an interval: the interval 40–49 has a width of 49.5 - 39.5 = 10 points.

Learning Check
For this distribution, how many individuals had scores lower than X = 20?
     X      f
   24-25    2
   22-23    4
   20-21    6
   18-19    3
   16-17    1

a) 2
b) 3
c) 4
d) Cannot be determined

Learning Check
In a grouped frequency distribution one interval is listed as 20-24. Assuming that the scores are measuring a continuous variable, what is the width of this interval?
a) 3 points
b) 4 points
c) 5 points
d) 54 points

Learning Check
A set of scores ranges from a high of X = 48 to a low of X = 13. If these scores are placed in a grouped frequency distribution table with an interval width of 5 points, the bottom interval in the table would be
a) 13-18
b) 13-19
c) 10-15
d) 10-14

Learning Check
Using this frequency distribution table, how many individuals had a score of X = 73?

Frequency Distribution Graphs

A frequency distribution graph is basically a picture of the information available in a frequency distribution table. We consider several different types of graphs, but all start with two perpendicular lines called axes. The horizontal line is the X-axis, or the abscissa (ab-SIS-uh). The vertical line is the Y-axis, or the ordinate. The measurement scale (set of X values) is listed along the X-axis with values increasing from left to right. The frequencies are listed on the Y-axis with values increasing from bottom to top. As a general rule, the point where the two axes intersect should have a value of zero for both the scores and the frequencies. A final general rule is that the graph should be constructed so that its height (Y-axis) is approximately two-thirds to three-quarters of its length (X-axis).

When the data consist of numerical scores that have been measured on an interval or ratio scale, there are two options for constructing a frequency distribution graph. The two types of graphs are called histograms and polygons.

A bar graph is essentially the same as a histogram, except that spaces are left between adjacent bars. For a nominal scale, the space between bars emphasizes that the scale consists of separate, distinct categories.
For ordinal scales, separate bars are used because you cannot assume that the categories are all the same size.

Graphs for Population Distribution

When you can obtain an exact frequency for each score in a population, you can construct frequency distribution graphs that are exactly the same as the histograms, polygons, and bar graphs that are typically used for samples. For example, if a population is defined as a specific group of N = 50 people, we could easily determine how many have IQs of X = 110. However, if we were interested in the entire population of adults in the United States, it would be impossible to obtain an exact count of the number of people with an IQ of 110. Although it is still possible to construct graphs showing frequency distributions for extremely large populations, the graphs usually involve two special features: relative frequencies and smooth curves.

Relative Frequencies

Although you usually cannot find the absolute frequency for each score in a population, you very often can obtain relative frequencies. For example, no one knows the exact number of male and female human beings living in the USA because the exact numbers keep changing. However, based on past census data and general trends, we can estimate that the two numbers are very close, with women slightly outnumbering men. You can represent these relative frequencies in a bar graph by making the bar above female slightly taller than the bar above male.

Smooth Curves

When a population consists of numerical scores from an interval or a ratio scale, it is customary to draw the distribution with a smooth curve instead of the jagged, stepwise shapes that occur with histograms and polygons. The smooth curve indicates that you are not connecting a series of dots (real frequencies) but instead are showing the relative changes that occur from one score to the next. One commonly occurring population distribution is the normal curve.
The word normal refers to a specific shape that can be precisely defined by an equation. Less precisely, we can describe a normal distribution as being symmetrical, with the greatest frequency in the middle and relatively smaller frequencies as you move toward either extreme.

The Shape of a Frequency Distribution

Rather than drawing a complete frequency distribution graph, researchers often simply describe a distribution by listing its characteristics. There are three characteristics that completely describe any distribution: shape, central tendency, and variability.

Nearly all distributions can be classified as being either symmetrical or skewed. In a symmetrical distribution, it is possible to draw a vertical line through the middle so that one side of the distribution is a mirror image of the other. In a skewed distribution, the scores tend to pile up toward one end of the scale and taper off gradually at the other end. The section where the scores taper off toward one end of a distribution is called the tail of the distribution. A skewed distribution with the tail on the right-hand side is positively skewed because the tail points toward the positive (above-zero) end of the X-axis. If the tail points to the left, the distribution is negatively skewed.

Learning Check
A researcher records the gender and academic major for each student at a college basketball game. If the distribution of majors is shown in a frequency distribution graph, what type of graph should be used?

Learning Check
If the results from a research study are presented in a frequency distribution histogram, would it also be appropriate to show the same results in a polygon?

Learning Check
The seminar rooms in the library are identified by letters (A, B, C, and so on). A professor records the number of classes held in each room during the fall semester. If these values are presented in a frequency distribution graph, what kind of graph would be appropriate?
a) A histogram
b) A polygon
c) A histogram or a polygon
d) A bar graph

Learning Check
If a frequency distribution graph is drawn as a smooth curve, it is probably showing a __________ distribution.
a) Sample
b) Population
c) Skewed
d) Symmetrical

Learning Check
A group of quiz scores ranging from 4 to 9 are shown in a histogram. If the bars in the histogram gradually increase in height from left to right, what can you conclude about the set of quiz scores?
1. There are more high scores than there are low scores
2. There are more low scores than there are high scores
3. The height of the bar always increases as the scores increase
4. None of the above

Percentiles, Percentile Ranks, and Interpolation

Although the primary purpose of a frequency distribution is to provide a description of an entire set of scores, it also can be used to describe the position of an individual within the set. Because raw scores do not provide much information, it is desirable to transform them into a more meaningful form. One transformation that we consider changes raw scores into percentiles.

The rank or percentile rank of a particular score is defined as the percentage of individuals in the distribution with scores equal to or less than the particular value. When a score is identified by its percentile rank, the score is called a percentile.

Suppose, for example, that you have a score of X = 43 on an exam and that you know that exactly 60% of the class had scores of 43 or lower. Then your score X = 43 has a percentile rank of 60%, and your score would be called the 60th percentile. Notice that percentile rank refers to a percentage and that percentile refers to a score. Also notice that your rank or percentile describes your exact position within the distribution.

Cumulative Frequency and Cumulative Percentage

In the following frequency distribution table, we have included a cumulative frequency column headed by cf.
For each row, the cumulative frequency value is obtained by adding up the frequencies in and below that category. For example, the score X = 3 has a cumulative frequency of 14 because exactly 14 individuals had scores of X = 3 or less. The cumulative frequencies show the number of individuals located at or below each score.

To find percentiles, we must convert these frequencies into percentages. The resulting values are called cumulative percentages because they show the percentage of individuals who are accumulated as you move up the scale. They represent the percentage of individuals who are located in and below each category.

You must remember that the X values in the table are usually measurements of a continuous variable and, therefore, represent intervals on the scale of measurement. Notice that each cumulative percentage value is associated with the upper real limit of its interval.

Learning Check
What is the 95th percentile?

Learning Check
What is the percentile rank for X = 3.5?

Learning Check
In a distribution of exam scores, which of the following would be the highest score?
a) 20th percentile
b) 80th percentile
c) A score with a percentile rank of 15%
d) A score with a percentile rank of 75%

Learning Check
Following are three rows from a frequency distribution table. For this distribution, what is the 90th percentile?
a) X = 24.5
b) X = 25
c) X = 29
d) X = 29.5

Interpolation

It is possible to determine some percentiles and percentile ranks directly from a frequency distribution table, provided the percentiles are upper real limits and the ranks are percentages that appear in the table. However, there are many values that do not appear directly in the table, and it is impossible to determine these values precisely. Because these values are not specifically reported in the table, you cannot answer questions about them exactly. However, it is possible to estimate these intermediate values by using a procedure known as interpolation.
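The two ideas in this section, accumulating frequencies and interpolating between real limits, can be sketched in Python as follows. This is a hedged illustration rather than the course's own procedure: the function names are hypothetical, and the frequency table is an assumed example chosen only so that X = 3 has a cumulative frequency of 14, as stated above. The interpolation call uses real limits 6.5 and 7.5 with cumulative percentages 20% and 44%.

```python
from itertools import accumulate

# Assumed illustrative table (score -> frequency), N = 20.
freq = {1: 2, 2: 4, 3: 8, 4: 5, 5: 1}

def cumulative_table(freq):
    """Return (X, f, cf, c%) rows ordered from lowest to highest score."""
    xs = sorted(freq)
    n = sum(freq.values())
    cfs = accumulate(freq[x] for x in xs)   # running total of f
    return [(x, freq[x], cf, 100 * cf / n) for x, cf in zip(xs, cfs)]

def interpolate_rank(x, lower_limit, upper_limit, lower_pct, upper_pct):
    """Estimate the percentile rank of x by linear interpolation.

    The limits are the real limits of the interval containing x, and
    the percentages are the cumulative percentages at those limits.
    """
    fraction = (upper_limit - x) / (upper_limit - lower_limit)
    return upper_pct - fraction * (upper_pct - lower_pct)

rows = cumulative_table(freq)
for x, f, cf, pct in reversed(rows):        # print highest score first
    print(f"{x:>2} {f:>3} {cf:>3} {pct:>5.0f}%")

rank = interpolate_rank(7.0, 6.5, 7.5, 20.0, 44.0)
print("estimated percentile rank of X = 7.0:", rank)
```

Each cumulative percentage belongs to the upper real limit of its row, so in the assumed table the value for X = 3 describes the point X = 3.5 on the scale.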
Notice that X = 7.0 is located in the interval bounded by the real limits of 6.5 and 7.5. The cumulative percentages corresponding to these real limits are 20% and 44%, respectively. These values are shown in the following table:

            Scores (X)    Percentages
  Top          7.5            44%
               7.0             ?
  Bottom       6.5            20%

For interpolation problems, it is always helpful to create a table showing the range on both scales.

Step 1: Calculate the distance from the top of the interval to the intermediate score (7.5 - 7.0 = 0.5), the width of the interval on the score scale (7.5 - 6.5 = 1), and the width of the interval on the percentage scale (44% - 20% = 24 points).
Step 2: The score is located 0.5/1 of the way down from the top, so the same fraction of the percentage range is (0.5/1) × 24 = 12 points.
Step 3: For the percentages, the top of the interval is 44%, so 12 points down would be 44 - 12 = 32%.

This is the answer. A score of X = 7.0 corresponds to a percentile rank of 32%.

Learning Check
On a statistics exam, would you rather score at the 80th percentile or at the 20th percentile?

Stem and Leaf Display

In 1977, J.W. Tukey presented a technique for organizing data that provides a simple alternative to a grouped frequency distribution table or graph (Tukey, 1977). This technique, called a stem and leaf display, requires that each score be separated into two parts: The first digit (or digits) is called the stem, and the last digit is called the leaf. For example, X = 85 would be separated into a stem of 8 and a leaf of 5. Similarly, X = 42 would have a stem of 4 and a leaf of 2.

To construct a stem and leaf display for a set of data, the first step is to list all the stems in a column. The next step is to go through the data, one score at a time, and write the leaf for each score beside its stem. The number of leaves in the display shows the frequency associated with each stem.

It also should be clear that the stem and leaf display has one important advantage over a traditional grouped frequency distribution. Specifically, the stem and leaf display allows you to identify every individual score in the data.
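The construction steps for a stem and leaf display can be sketched in Python. The function name stem_and_leaf and the list of scores are assumptions made for illustration (the list includes the X = 85 and X = 42 examples from the text); the sketch assumes non-negative whole-number scores and lists stems from highest to lowest.

```python
from collections import defaultdict

def stem_and_leaf(scores):
    """Return (stem, leaves) pairs, highest stem first.

    The last digit of each score is the leaf and the remaining
    digits are the stem. Leaves appear in the order the scores
    are encountered.
    """
    leaves = defaultdict(list)
    for x in scores:
        stem, leaf = divmod(x, 10)      # 85 -> stem 8, leaf 5
        leaves[stem].append(leaf)
    stems = range(max(leaves), min(leaves) - 1, -1)
    return [(s, "".join(str(d) for d in leaves.get(s, []))) for s in stems]

# Made-up scores for illustration.
scores = [85, 42, 83, 74, 62, 93, 78, 71, 46, 81]

display = stem_and_leaf(scores)
for stem, leaf_string in display:
    print(f"{stem} | {leaf_string}")
```

Because every leaf is kept, each original score can be read back from the display by joining a stem with one of its leaves, and the number of leaves beside a stem is that stem's frequency.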
Learning Check
For the scores shown in the following stem and leaf display, what is the lowest score in the distribution?

  9 | 374
  8 | 945
  7 | 7042
  6 | 68
  5 | 14

a) 7
b) 15
c) 50
d) 51

Learning Check
For the scores shown in the following stem and leaf display, how many people had scores in the 70s?
a) 1
b) 2
c) 3
d) 4

Learning Check
Use a stem and leaf display to organize the following distribution of scores