Week 2 Frequency Distributions PDF
Skylar J. Laursen, MSc
Summary
This document presents a lesson on frequency distributions, covering frequency distribution tables, histograms, polygons, and bar graphs. It explains how to organize and describe data and how to calculate the measures of central tendency: the mean, the median, and the mode. The content focuses on the practical application of statistical techniques.
Full Transcript
Week 2
PSYC*1010(02) W25
Skylar J. Laursen, MSc

Frequency Distributions
After collecting data, the first task for a researcher is to organize and simplify the data so that it is possible to get a general overview of the results.
This is the goal of descriptive statistical techniques.
One method for simplifying and organizing data is to construct a frequency distribution.

Frequency Distributions
Frequency Distribution: an organized tabulation showing exactly how many individuals are located in each category on the scale of measurement.
Can be structured either as a table or as a graph, and presents the same two elements:
1. The set of categories that make up the original measurement scale
2. A record of the frequency, or number of individuals, in each category

Frequency Distribution Tables
A frequency distribution table consists of at least two columns:
1. The first lists the categories on the scale of measurement (X). Values are listed from highest to lowest, without skipping any.
2. The second lists the frequency (f). Tallies are determined for each value (how often each X value occurs in the data set). The sum of the frequencies should equal N.

Building a Frequency Distribution Table
X values: 5, 3, 4, 8, 5, 2, 8, 4, 2, 6, 8, 10
X    f

Frequency Distribution Tables
3. A third column can be used for the proportion (p) for each category: $p = f/N$. Because the proportions describe the frequency (f) in relation to the total number (N), they are often called relative frequencies.
4.
A fourth column can display the percentage of the distribution corresponding to each X value. The percentage is found by multiplying p by 100. The sum of the percentage column is 100%.

Building a Frequency Distribution Table
X values: 5, 3, 4, 8, 5, 2, 8, 4, 2, 6, 8, 10
X     f     p     %
10    1
9     0
8     3
7     0
6     1
5     2
4     2
3     1
2     2
1     0
Total N = 12
$p = f \div N$
$\% = p \times 100$

Grouped Frequency Distribution
Sometimes a set of scores covers a wide range of values.
A list of all the X values would be too long to be a "simple" presentation of the data.
Grouped Frequency Distribution: in a grouped table, the X column lists groups of scores, called class intervals, rather than individual values.

Grouped Frequency Distribution
The grouped frequency distribution table should have about 10 class intervals.
The width of each interval should be a relatively simple number such as 2, 5, 10, or 20.
The bottom score in each class interval should be a multiple of the width.
All intervals should be the same width.

Example: Grouped Frequency Distribution
X values: 102, 105, 102, 41, 96, 79, 70, 120, 86, 75, 64, 54
X            f
120 - 129    1
110 - 119    0
100 - 109    3
90 - 99      1
80 - 89      1
70 - 79      3
60 - 69      1
50 - 59      1
40 - 49      1
Total N = 12
Labels marked on the table: Class Interval, Bottom Score, Width (inclusive)

Frequency Distribution Graphs
X axis: Contains the score categories (X)
Y axis: Contains the frequencies
When the score categories consist of numerical scores from an interval or ratio scale, the graph should be either a histogram or a polygon.

Histograms
In a histogram, a bar is centered above each score (or class interval).
The height of the bar corresponds to the frequency.
The width extends to the real limits, so that adjacent bars touch.

[Figure: Frequency Distribution Histogram. x-axis: X values (1 to 10); y-axis: frequency (0 to 3)]

Polygons
In a polygon, a dot is centered above each score.
The height of the dot corresponds to the frequency.
A continuous line is drawn from dot to dot to connect the series of dots.
The graph is completed by drawing a line
down to the x-axis (zero frequency) at each end of the range of scores.

[Figure: Frequency Distribution Polygon. x-axis: X values (1 to 11); y-axis: frequency (0 to 3)]

Bar Graphs
Used when the score categories (X values) are measurements from a nominal or an ordinal scale.
A bar graph is just like a histogram, except that gaps or spaces are left between adjacent bars.
Nominal Scale: The space emphasizes that the scale consists of separate, distinct categories.
Ordinal Scale: Separate bars are used because you cannot assume that the categories are all the same size.

[Figure: Example: Bar Graph. x-axis: X values (birds, cats, dogs, hamsters, others); y-axis: frequency (0.0 to 10.0)]

Graphs for Population Distributions
When you can obtain an exact frequency for each score in a population, you can construct frequency distribution graphs that are exactly the same as the histograms, polygons, and bar graphs that are typically used for samples.
Many populations are so large that it is impossible to know the exact number of individuals (frequency) for any specific category.

Relative Frequencies
When the exact number of individuals is not known, population distributions can be shown using relative frequency instead of the absolute number of individuals for each category.

[Figure: Example: Relative Frequency, "Males and Females in Psychology". x-axis: X values (females, males); y-axis: frequency (0 to 100)]

Smooth Curve
If the scores in a population are measured on an interval or ratio scale, it is customary to present the distribution as a smooth curve rather than a jagged histogram or polygon.
The smooth curve emphasizes the fact that the distribution is not showing the exact frequency for each category.

Smooth Curve
Normal Curve: one commonly occurring population distribution.
The word normal refers to a specific shape that can be precisely defined by an equation.

[Figure: Example: Normal Distribution]

Describing Frequency Distributions
Researchers often simply describe a distribution by listing its characteristics.
Characteristics:
1.
Central Tendency: measures where the center of the distribution is located
2. Variability: measures the degree to which the scores are spread over a wide range or are clustered together
3. Shape

Shape
A graph shows the shape of the distribution.
Symmetrical: the left side of the graph is (roughly) a mirror image of the right side
Skewed: the scores tend to pile up toward one end of the scale and taper off gradually at the other end
Tail: the section where the scores taper off toward one end of a distribution

Positively and Negatively Skewed Distributions
Positively Skewed: the scores tend to pile up on the left side of the distribution, with the tail tapering off to the right
Example: Wealth among citizens
Negatively Skewed: the scores tend to pile up on the right side, and the tail points to the left
Example: Life expectancy

[Figure: Different Shapes for Distributions]

Stem-and-Leaf Displays
Stem-and-Leaf: provides an efficient method for obtaining and displaying a frequency distribution.
Each score is divided into a stem, consisting of the first digit or digits, and a leaf, consisting of the final digit.
Then, go through the list of scores, one at a time, and write the leaf for each score beside its stem.

Stem-and-Leaf Displays
The resulting display provides an organized picture of the entire distribution.
The number of leaves beside each stem corresponds to the frequency, and the individual leaves identify the individual scores.

Building a Stem-and-Leaf Display
X values: 25, 77, 38, 57, 52, 69, 64, 57, 44, 56, 52, 60, 39, 58, 58, 30, 50, 54, 51, 65
Stem and Leaf Display

Central Tendency
A statistical measure to determine a single score that defines the centre of a distribution.
Goal: to find the single score that is most typical or most representative of the entire group

Central Tendency
"Average" or "Typical"
This average value can be used to provide a simple description of an entire population or a sample.
Measures of central tendency are also useful for
making comparisons between groups of individuals or between sets of data.

Central Tendency
There is no single, standard procedure for determining central tendency.
The problem is that no single measure produces a central, representative value in every situation.
There can be problems defining the "centre" of a distribution.
To deal with these problems, statisticians have developed 3 different methods for measuring central tendency:
Mean
Median
Mode

The Mean
The sum of the scores divided by the number of scores
Population: $\mu = \frac{\Sigma X}{N}$
Sample: $M = \frac{\Sigma X}{n}$

Example (population): The final grades for a fourth year psychology course at U of G are listed below. Find the Mean.
Scores: 83, 99, 81, 92, 93, 74, 89, 81, 82, 88, 84, 88, 69, 90, 68, 87, 87, 80, 85, 83

Example (sample): Professor X wants to examine Psyc*1010 final grades at the U of G over the last 4 years. They ask 5 students from each year (1st year, 2nd year, 3rd year, 4th year) to provide their final grade. Find the Mean.
Scores: 75, 100, 52, 58, 82, 91, 73, 72, 60, 87, 74, 65, 80, 59, 90, 46, 60, 86, 85, 50

Alternative Definitions of the Mean
Dividing the total equally: Think of the mean as the amount each individual receives when the total ($\Sigma X$) is divided equally among all the individuals (N) in the distribution.
The mean is a balance point: Think of the mean as a balance point for the distribution. The total distance below the mean is the same as the total distance above the mean.

Balancing Frequency Distribution
X     f
10    1
9     0
8     2
7     0
6     0
5     1
4     2
3     1
2     2
1     3
Total N = 12
[Figure: the distribution drawn on a balance scale. x-axis: 1 to 10]

The Weighted Mean
Often it is necessary to combine two sets of scores and then find the overall mean for the combined group.
$\text{Weighted Mean} = \frac{\Sigma X_1 + \Sigma X_2}{n_1 + n_2}$
To calculate the overall mean, we need 2 values:
1. The overall sum of the scores for the combined group ($\Sigma X$), and
2.
The total number of scores in the combined group (n)

The Weighted Mean
Example 1: Two Samples with the same n
Professor L wants to examine the weighted mean of exam scores across sections 01 and 02 of Psyc*1010 at U of G. She collects a sample of 10 students from each section and has them report their exam grade. Calculate the weighted mean for the two samples.
Section 01: 87, 49, 78, 59, 66, 42, 59, 52, 69, 44
Section 02: 61, 54, 43, 48, 67, 84, 48, 70, 89, 65
$\Sigma X_1 = 605$, $M_1 = 60.50$
$\Sigma X_2 = 629$, $M_2 = 62.90$
Weighted Mean ($M_W$) = 61.70
When the two samples are the same size, the weighted mean will be halfway between the original two sample means.

The Weighted Mean
Unless there are the same number of scores for each group, the overall mean will not be halfway between the original two sample means.
When the samples are not the same size, one makes a larger contribution to the total group and therefore carries more weight in determining the overall mean.

The Weighted Mean
Example 2: Two Samples with different n's
Professor L wants to examine the weighted mean of exam scores across sections 01 and 03 of Psyc*1010 at U of G. She collects a sample of 15 students from section 01 and 5 students from section 03, and has them report their exam grade. Calculate the weighted mean for the two samples.
Section 01: 60, 63, 47, 72, 80, 59, 80, 65, 67, 62, 72, 62, 73, 54, 59
Section 03: 48, 39, 72, 38, 77
$\Sigma X_1 = 975$, $M_1 = 65.00$
$\Sigma X_2 = 274$, $M_2 = 54.80$
Weighted Mean ($M_W$) = $(975 + 274) \div (15 + 5) = 1249 \div 20 = 62.45$

Alternative Methods for Calculating the Mean
Frequency Table
Determine the number of scores, n, by adding the frequencies: $n = \Sigma f$
Find the sum of the scores, $\Sigma X$, by multiplying each X value by its frequency and adding the products: $\Sigma X = \Sigma fX$
Then $M = \frac{\Sigma X}{n}$
X     f
10    1
9     0
8     3
7     0
6     1
5     2
4     2
3     1
2     2
1     0

Characteristics of the Mean
In general, the characteristics of the mean result from the fact that every score in the distribution contributes to the value of the mean.
Specifically, every score adds to the total ($\Sigma X$) and every score contributes one point to the number of scores (n).
1. Changing the value of any score will change the mean.

Characteristics of the Mean
2. Adding a new score to a distribution, or removing an existing score, will usually change the mean. The exception is when the new score (or the removed score) is exactly equal to the mean.
Original X values: 5, 3, 4, 8, 5, 2, 8, 4, 2, 6, 8, 10
$M = \Sigma X \div n = 5.42$
Adding X values: 5, 3, 4, 8, 5, 2, 8, 4, 2, 6, 8, 10, 15
$M = 6.15$
Removing X values: 5, 3, 4, 8, 5, 2, 8, 4, 2, 6, 8 (the score of 10 is removed)
$M = 5$

Characteristics of the Mean
2. Adding a new score to a distribution, or removing an existing score, will usually change the mean. The exception is when the new score (or the removed score) is exactly equal to the mean.
Original X values: 5, 3, 4, 8, 5, 2, 8, 4, 2, 6, 8, 10
$M = \Sigma X \div n = (5 + 3 + 4 + 8 + 5 + 2 + 8 + 4 + 2 + 6 + 8 + 10) \div 12 = 65 \div 12 = 5.42$
Adding an X value equal to the mean: 5, 3, 4, 8, 5, 2, 8, 4, 2, 6, 8, 10, 5.42
$M = 5.42$

Characteristics of the Mean
3. If a constant value is added to every score in a distribution, the same constant will be added to the mean. Similarly, if you subtract a constant from every score, the same constant will be subtracted from the mean.
Original X values: 5, 3, 4, 8, 5, 2, 8, 4, 2, 6, 8, 10
$M = \Sigma X \div n = 5.42$
Adding 2 to each X value: 5, 3, 4, 8, 5, 2, 8, 4, 2, 6, 8, 10 (each + 2)
$M = \Sigma (X + 2) \div n = 7.42$

Characteristics of the Mean
4.
If every score in a distribution is multiplied by (or divided by) a constant value, the mean will change in the same way.
Original X values: 5, 3, 4, 8, 5, 2, 8, 4, 2, 6, 8, 10
$M = \Sigma X \div n = 5.42$
Multiplying each X value by 2: 5, 3, 4, 8, 5, 2, 8, 4, 2, 6, 8, 10 (each x 2)
$M = \Sigma (2X) \div n = 10.83$

The Median
Goal: To locate the midpoint of the distribution
If the scores in a distribution are listed in order from smallest to largest, the median is the midpoint of the list.
Defining the median as the midpoint of a distribution means that the scores are being divided into two equal-sized groups.
We are not locating the midpoint between the highest and lowest X values.

The Median
Calculating the Median:
1. With an odd number of scores, list the values in order; the median is the middle score in the list.
X values: 5, 3, 4, 8, 5, 2, 8, 4, 2, 6, 8
2. With an even number of scores, list the values in order; the median is halfway between the middle two scores.
X values: 61, 98, 75, 77, 66, 75, 70, 83, 52, 53

The Mode
The score or category that has the greatest frequency.
The only measure of central tendency that will always correspond to an actual score in the data.
The mean and median are both calculated values and often produce an answer that does not equal any score in the distribution.
X values: 5, 3, 4, 8, 5, 2, 8, 4, 2, 6, 8, 10
X     f
10    1
9     0
8     3
7     0
6     1
5     2
4     2
3     1
2     2
1     0

The Mode
Although a distribution will have only one mean and only one median, it is possible to have more than one mode.
Bimodal: A distribution with two modes
Multimodal: A distribution with more than two modes

[Figure: A Bimodal Distribution]

The Mean, the Median and the Mode
Mean: A "balance point"; the distances above the mean have the same total as the distances below the mean
Median: The middle of the distribution (in terms of scores)
Mode: The score/value that occurs most often

Selecting a Measure of Central Tendency
Extreme Scores or Skewed Distributions
When a distribution has a few extreme
scores (scores that are very different in value from most of the others), the mean may not be a good representative of the majority of the distribution.
Because it is relatively unaffected by extreme scores, the median is commonly used when reporting the average value for a skewed distribution.

[Figure: An Extreme Score]

Selecting a Measure of Central Tendency
Undetermined Values
Occasionally, you will encounter a situation in which an individual has an unknown or undetermined score.
This often occurs when you are measuring the number of errors (or amount of time) required for an individual to complete a task.
It is impossible to compute the mean for these data because of the undetermined value.
However, it is possible to determine the median.

Selecting a Measure of Central Tendency
Open-ended Distributions
A distribution is open-ended when there is no upper limit (or lower limit) for one of the categories.
It is impossible to compute a mean for these data because you cannot find $\Sigma X$.
You can find the median.

Selecting a Measure of Central Tendency
Ordinal Data
Many researchers believe that it is not appropriate to use the mean to describe central tendency for ordinal data.
When scores are measured on an ordinal scale, the median is always appropriate and is usually the preferred measure of central tendency.

Selecting a Measure of Central Tendency
When to use the Mode:
Nominal scale
The mode always identifies an actual score and is thus useful in describing discrete variables.
The mode gives an indication of the shape of the distribution as well as a measure of central tendency.

Reporting Measures of Central Tendency
Measures of central tendency are commonly used in behavioural sciences to summarize and describe the results of a research study.
These values may be reported in text describing the results, or presented in tables or graphs.
           Treatment    Control
Males      1.45         8.36
Females    3.83         14.77

Reporting Measures of Central Tendency
Graphs can also be used to report and compare measures of central tendency.
The means (or medians) are displayed using a line graph, histogram, or bar graph, depending on the scale of measurement used for the independent variable.

Reporting Measures of Central Tendency
The height of a graph should be approximately two-thirds to three-quarters of its length.
Normally, the zero point for both the x- and y-axis is at the point where the two axes intersect.
However, when a value of zero is part of the data, it is common to move the zero point so that the graph does not overlap the axes.

[Figure: Type of Graphs. Two panels: Mean Happiness by Number of Pets Owned (1 to 5), and Happiness by Type of Pet (bird, cat, dog, hamster, other); both y-axes run from 0 to 12]

Central Tendency and the Shape of the Distribution
Symmetrical Distribution: The right-hand side is a mirror image of the left-hand side.
The median is exactly at the centre because exactly half of the area in the graph will be on either side of the centre.
The mean is exactly at the centre because each score on the left side of the distribution is balanced by a corresponding score on the right.
If a symmetrical distribution has only one mode, it will also be in the centre of the distribution.

Measures of Central Tendency for Skewed Distributions
Skewed Distributions: There is a strong tendency for the mean, median, and mode to be located in predictably different positions (especially for continuous variables).
Positively Skewed: The most likely order of the 3 measures of central tendency from smallest to largest (left to right) is the mode, median, and mean.
Negatively Skewed: The most probable order is mean, median, and mode.

[Figure: Measures of Central Tendency for Skewed Distributions. Two panels: Positively Skewed, Negatively Skewed]
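
The three measures compared in this lecture can be checked with a few lines of Python. The sketch below is only an illustration: it uses the standard-library statistics module, the function names modes and central_tendency are made up for this example, and the skewed_scores list is a hypothetical positively skewed data set that does not appear in the slides. The lecture_scores list is the set of X values used in the frequency table examples.

```python
from collections import Counter
from statistics import mean, median


def modes(scores):
    """Return every score tied for the highest frequency, smallest first."""
    counts = Counter(scores)
    top = max(counts.values())
    return sorted(x for x, f in counts.items() if f == top)


def central_tendency(scores):
    """Return (mean, median, list of modes) for a list of numeric scores."""
    return mean(scores), median(scores), modes(scores)


# X values used in the lecture's frequency table examples.
lecture_scores = [5, 3, 4, 8, 5, 2, 8, 4, 2, 6, 8, 10]

# Hypothetical positively skewed scores (an assumption, not lecture data).
skewed_scores = [1, 1, 1, 2, 2, 3, 4, 10]

if __name__ == "__main__":
    for label, scores in [("lecture", lecture_scores), ("skewed", skewed_scores)]:
        m, md, mo = central_tendency(scores)
        print(f"{label}: mean={m:.2f} median={md} modes={mo}")
```

For the lecture data this gives a mean of about 5.42, a median of 5, and a mode of 8. For the hypothetical skewed data the mode (1) falls below the median (2), which falls below the mean (3), matching the order described for a positively skewed distribution.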