Summary

This document covers measures of variability: what variability is, why measuring it matters, and the main types of measures (range, interquartile range, sum of squares, variance, and standard deviation). It includes worked examples, such as calculating the standard deviation from raw scores and from a frequency table.

Full Transcript

Measures of Variability

What is variability?

The dispersion or spread of scores in a distribution.

[Figure: three normal distributions of anxiety scores (x) on an axis from 0 to 100, each marked with its mean ($\bar{X}$) and standard deviation (s).]

This graph displays three examples of normal distributions of raw data collected using a measure of anxiety. Each distribution has a mean (30, 50, 50) and each distribution has a standard deviation (s), which is a measure of the average amount each data point differs from the mean of the distribution.

Question: Why is it so important to measure how much variability there is in a population and/or a set of data?

Answer: Knowing the amount of variability in our data is ABSOLUTELY CRITICAL to understanding almost everything we do in this course.

Measures of variability can provide very useful information about a set of data

How satisfied are you with your Hamilton education? (from 1 = not at all satisfied to 13 = very satisfied)

              Sample #1   Sample #2
              1           5
              3           7
              5           7
              7           7
              9           7
              11          7
              13          9
$\Sigma X$    49          49
$\bar{X}$     7           7
s             4.32        1.15

The sum and the mean do not reveal how different these sets of data are. But when we examine the variability by calculating the standard deviation (s), we can see that these two samples are quite different from one another.

The data in sample 1 is highly variable compared to the data in sample 2. The standard deviation provides a measure of the AVERAGE amount that all the data points are from the mean of the data set. We say “On average, scores in Sample 1 deviate 4.32 scale units from the mean.”

Additional general info about variability & standard deviation

[Figure: distribution curve with the mean ($\bar{X}$) and the points one standard deviation below (-1 SD) and above (+1 SD) the mean marked.]

Let’s go back to our texting example to explore variability

x     f (college first-year students)   f (college seniors)
0     3                                  0
1     5                                  0
2     8                                  0
3     10                                 0
4     9                                  0
5     7                                  0
6     5                                  1
7     3                                  12
8     0                                  20
9     0                                  14
10    0                                  3

College first-year students: $\bar{X}$ = 3.46, s = 1.89
College seniors: $\bar{X}$ = 8.12, s = 0.92

[Figure: frequency histogram; x-axis: number of text messages (0 to 10), y-axis: frequency (0 to 20).]

Main goals: By the end of these 4 lectures, you should be able to:
1. Calculate the mean and standard deviation for a data set.
2. ‘Map’ the mean and standard deviation onto a distribution curve.
3. Convert the raw data into z scores.
4. Answer questions about how ‘unusual’ a particular score is within a distribution.

How is variability measured?
1. Range
2. Interquartile range (On your own, OYO)
3. Sum of squares (SS)
4. Variance ($s^2$)
5. Standard deviation (s)
Every single one of these matters A LOT

Range
- highest score - lowest score

Distribution #1   Distribution #2
1                 5
3                 7
5                 7
7                 7
9                 7
11                7
13                9

range: 13 - 1 = 12   range: 9 - 5 = 4

- Limitations?
  - Ignores a lot of information
  - Susceptible to extreme scores

Example: Number of absences/semester
Class size = 12
0 1 1 1 2 2 2 2 3 3 4 16
Range: 16 - 0 = 16
Does the range adequately characterize the variability in this distribution?

Interquartile Range (IQR) (On your own, OYO)
- Range of the middle 50% of scores
- Why better?

Student absences example: 0, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 16
IQR: 3 - 1 = 2

How to compute IQR when the number of scores is not easily divisible by 4 (On your own, OYO)

Data: Number of hours of exercise per week in a sample of 14 students (N = 14) not on an athletic team
0 1 1 2 2 3 3 4 6 7 8 10 12 14

Step 1: Find the median (Q2)
  Position #: $\frac{N + 1}{2} = \frac{14 + 1}{2} = 7.5$
  Average of scores in the 7th and 8th positions: $\frac{3 + 4}{2} = 3.5$, so Q2 = 3.5
Step 2: Find Q1: the middle of the bottom half of scores (Q1 = 2)
Step 3: Find Q3: the middle of the top half of scores (Q3 = 8)
Step 4: IQR = Q3 - Q1 = 8 - 2 = 6

Three really important measures of variability that we will use throughout this entire course
- How far (on average) each score is from the mean

                     Population parameter    Sample statistic
Sum of squares       SS                      SS
Variance             $\sigma^2$              $s^2$
Standard deviation   $\sigma$ (“sigma”)      s

Preview: where we are going this lecture
1. Select the survey question
2. Collect and organize data, calculate a *sum of squares* (SS)
   This example = sum of squares (definitional formula): $SS = \sum (X - \bar{X})^2$
3. Calculate the variance ($\sigma^2$ or $s^2$)
   This example = population variance (definitional form.): $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
4. Calculate the standard deviation ($\sigma$ or s)
   This example = sample std deviation (computational): $s = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N - 1}}$

Optimism Example

Survey question: How optimistic are you?
Scale: 1 (extremely pessimistic) to 6 (extremely optimistic)

Sample (N = 8): 1, 2, 3, 4, 5, 5, 6, 6
$\Sigma X = 32$
$\bar{X} = 4$

Note: If we want to know how far each individual score (X) is from the mean ($\bar{X}$), first, we need to calculate the mean.

THM: In these next slides, we are building our way to understanding the definitional formula for the sum of squares: $SS = \sum (X - \bar{X})^2$

Next: subtract the mean from each sample value

X    $\bar{X}$    $X - \bar{X}$
1    4            -3
2    4            -2
3    4            -1
4    4            0
5    4            1
5    4            1
6    4            2
6    4            2

Question: What happens if you add up all these deviations from the mean?

Answer: The sum is equal to zero. That’s not an accurate representation of the variability in this sample!

You can also think about it like this:

[Figure: optimism rating (y-axis, 1 to 6) for each participant (x-axis, 1 to 8), with each score's difference from the mean marked.]

The sum of the deviations below the mean equals the sum of the deviations above the mean (that’s the definition of the mean).

Reminder/big picture of what we are trying to do: Trying to come up with a value that describes how much variability there is in a sample.

Question: So what can we do instead?

Answer: square the deviations (then sum them)

X    $\bar{X}$    $X - \bar{X}$    $(X - \bar{X})^2$
1    4            -3               9
2    4            -2               4
3    4            -1               1
4    4            0                0
5    4            1                1
5    4            1                1
6    4            2                4
6    4            2                4

This is the equation for the sum of squares (definitional form.):
$SS = \sum (X - \bar{X})^2 = 24$

SIDENOTE: This is not an “of course!” kind of thing. Statisticians made a decision. Another option would have been to take the absolute value, but that cannot be “undone”… stay tuned.

Let’s pause here and chat a bit about the Sum of Squares

$SS = \sum (X - \bar{X})^2$
- SS = the sum of the squared deviations from the mean
- Can the SS ever be negative?
  - No! (because it is squared)
- Can the SS ever be zero?
  - Yes! (e.g., data set with all the same numbers)
- Drawback of SS
  - The value of the SS depends on sample size. (SS gets bigger as N increases)
  SIDENOTE: This is not a good thing for a statistical measure

Next: How can we fix this ever-increasing SS size issue?

Answer: Instead of using the sum of squares, calculate the average of the SS, which is called the variance ($\sigma^2$ or $s^2$).
- The average squared deviation from the mean

Population: $\sigma^2 = \frac{SS}{N}$
Sample: $s^2 = \frac{SS}{N - 1}$

Next slide, instead of “SS”, we use the formula for SS here. In this class, we’ll use the sample variance A LOT.

Pause: Why do we divide by N - 1 instead of N?

Dividing by N to calculate the sample variance results in an underestimate of the true population variance (so we don’t do that).

Good news, though: when calculating the sample variance, subtracting 1 from N (as a correction factor) makes the denominator a bit smaller, which makes the variance a bit bigger. This yields a better estimate of the population variance.

NOTE: This makes the sample variance ($s^2$) an unbiased estimate of the population variance. TBC…

Variance ($\sigma^2$ or $s^2$)
- The average squared deviation from the mean

Population: $\sigma^2 = \frac{SS}{N} = \frac{\sum (X - \mu)^2}{N}$
Sample: $s^2 = \frac{SS}{N - 1} = \frac{\sum (X - \bar{X})^2}{N - 1}$

For our optimism example (SS = 24, N = 8):
$s^2 = \frac{24}{8 - 1} = 3.429 \approx 3.43$

Question: How can we get this measure of variability into a value that is easier to interpret?

Drawback of the variance?
It’s in squared units, not the original units of the survey (How optimistic are you? 1 - 6), so it’s harder to interpret the meaning of this measure of variability.

Answer: Easy-peasy! Convert the variance (which is a squared value) “back” into the units used in the survey by taking the square root of the variance ($\sigma^2$ or $s^2$). When you do this step, you’ve just calculated the standard deviation ($\sigma$ or s).

How to calculate the standard deviation

Reminder: in our optimism example ($s^2$ = 3.429, N = 8):
$SD = \sqrt{3.429} = 1.85$

Remember to take intermediate steps to 3 decimal places!

WE MADE IT! Now, (finally) we can interpret the data with a value that ‘fits’ with the original scale (e.g., 1 - 6 optimism scale):

We conclude that: Scores (people’s responses) in the sample vary, on average, 1.85 scale points from the mean value of 4.

Recap of what we have covered so far:

[Figure: optimism rating for each of the 8 participants, with each score's difference from the mean marked.]

1. $\Sigma X = 32$, $\bar{X} = 4$
2. Sum of squares: SS = 24
3. Variance: $s^2 = \frac{24}{8 - 1} = 3.429 \approx 3.43$
4. Standard deviation: $s = \sqrt{3.429} = 1.85$

The mean of these 8 data points is 4, and the average deviation from the mean is 1.85 units. Overall, we would say there is high variability in this sample of participants.

PAUSE… It’s about to get a little confusing

For two reasons:
1. Using frequency tables
2. Introducing new formulas

On exams and in your homework, you will sometimes be asked to compute measures of variability (i.e., SS, variance, SD) from a frequency table (not just a list of numbers).

List of numbers (X): 1, 2, 3, 4, 5, 5, 6, 6

Frequency table:
X    f
1    1
2    1
3    1
4    1
5    2
6    2

In the examples that follow, I’ll show you how to use the frequency table when computing variability measures.

Definitional vs. Computational Formulas
- We have been discussing the “definitional formulas” that are used to calculate the sum of squares, variance, and st. dev.
- The def. formula allows you to “see” the math unfold (good).
- But the def. formula is also unwieldy and hard to work with (bad).
- Luckily, we can use the “computational formula” rather than the definitional formula to solve for the sum of squares.
- These formulas (CFs) are what you will use on your homework assignments and exams.

Definitional vs. Computational Formulas

Always use the computational formula instead of the definitional formula to solve for SS!

Definitional formula for SS: $SS = \sum (X - \bar{X})^2$
Computational formula for SS: $SS = \sum X^2 - \frac{(\sum X)^2}{N}$
(The two formulas are algebraically equivalent)

X    $X^2$
1    1
2    4
3    9
4    16
5    25
5    25
6    36
6    36

$\Sigma X = 32$
$\Sigma X^2 = 152$

What’s the difference between $(\Sigma X)^2$ and $\Sigma X^2$?
$(\Sigma X)^2 = (32)^2 = 1024$
$\Sigma X^2 = 152$

Compute the standard deviation (s) using the computational formula

$s = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N - 1}}$

Please Excuse My Dear Aunt Sally: Parentheses, Exponents, Multiplication/Division, Addition/Subtraction

Step by step directions: How to use a frequency table to generate the values needed to use the computational formula to calculate the sample standard deviation.

X    f    X*f    $X^2$    $X^2$*f
1    1    1      1        1
2    1    2      4        4
3    1    3      9        9
4    1    4      16       16
5    2    10     25       50
6    2    12     36       72

$\Sigma X = 32$ (sum of the X*f column)
$\Sigma X^2 = 152$ (sum of the $X^2$*f column)

To generate $(\Sigma X)^2$:
1. Multiply the x values by the frequency (f) that they occur
2. Then sum the fX values
3. Then square that sum
4. $(\Sigma X)^2 = (32)^2$

To generate $\Sigma X^2$:
1. First square the x values
2. Then multiply the square ($X^2$) by the frequency (f)
3. Then sum those values
4. $\Sigma X^2 = 152$

Last: plug your values into the computational formula and solve

$s = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N - 1}} = \sqrt{\frac{152 - \frac{(32)^2}{8}}{8 - 1}} = \sqrt{\frac{152 - \frac{1024}{8}}{7}} = \sqrt{\frac{152 - 128}{7}} = \sqrt{\frac{24}{7}} = \sqrt{3.429} = 1.85$

Remember to take intermediate steps to 3 decimal places and round the final answer to 2 decimal places.

How to say it like a scientist…

Scale: 1 (extremely pessimistic) to 6 (extremely optimistic)
Sample (N = 8): 1, 2, 3, 4, 5, 5, 6, 6

“On average, participants were moderately optimistic (mean = 4.00) with a standard deviation of 1.85 scale units. (report skew here after looking at the shape of the distribution).”

Does your description give a full picture of the data? In other words, have you reported a measure of central tendency and a measure of variability (in words and in numbers)? Is there skewness present? If so, be sure to report the median! In this example, with a sample size of 8, you could add to the statement above “The data were negatively skewed, with a median optimism rating of 4.5.”

Here is an exam practice problem

Imagine we asked 40 college students how many caffeinated drinks they consume per day. Here is a frequency distribution of the data:

# Drinks    f
0           6
1           5
2           13
3           8
4           5
5           3
6           0

$s = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N - 1}}$

Compute all of the following descriptive statistics that are appropriate: Mode, Median, Mean, Range, Standard Deviation.

The answers are on the next slide. Don’t peek until you solve it!

And here are the answers

PRO TIP: always.make.this.chart

[Chart not preserved in the transcript.]

Mean = 2.25 drinks

HEY! One of the great things about this class is that MUCH of the time, you can check your own work by entering the data into SPSS
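
The hand calculations can also be checked outside SPSS. What follows is a minimal Python sketch, not part of the original slides; the function names (`expand`, `definitional_ss`, `describe`) are made up for illustration. It assumes the data arrive as a frequency table, applies the computational formula with the sample (N - 1) denominator, and confirms that the result matches the definitional formula for SS.

```python
import math


def expand(freq_table):
    """Turn a {score: frequency} table into a sorted list of raw scores."""
    scores = []
    for x, f in sorted(freq_table.items()):
        scores.extend([x] * f)
    return scores


def definitional_ss(freq_table):
    """SS as the sum of squared deviations from the mean (definitional formula)."""
    scores = expand(freq_table)
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)


def describe(freq_table):
    """Descriptive statistics from a frequency table (sample formulas, N - 1)."""
    n = sum(freq_table.values())                             # N = sum of f
    sum_x = sum(x * f for x, f in freq_table.items())        # sum of X, via X * f
    sum_x2 = sum(x ** 2 * f for x, f in freq_table.items())  # sum of X^2, via X^2 * f

    ss = sum_x2 - sum_x ** 2 / n   # computational formula for SS
    variance = ss / (n - 1)        # sample variance
    sd = math.sqrt(variance)       # sample standard deviation

    scores = expand(freq_table)
    mid = n // 2
    median = scores[mid] if n % 2 else (scores[mid - 1] + scores[mid]) / 2

    top_f = max(freq_table.values())
    modes = [x for x, f in sorted(freq_table.items()) if f == top_f]

    return {
        "n": n, "sum_x": sum_x, "sum_x2": sum_x2,
        "mean": sum_x / n, "median": median, "modes": modes,
        "range": scores[-1] - scores[0],  # uses only scores that actually occur
        "ss": ss, "variance": variance, "sd": sd,
    }


# Frequency tables taken from the optimism example and the practice problem.
optimism = {1: 1, 2: 1, 3: 1, 4: 1, 5: 2, 6: 2}
drinks = {0: 6, 1: 5, 2: 13, 3: 8, 4: 5, 5: 3, 6: 0}

for name, table in [("optimism", optimism), ("drinks", drinks)]:
    stats = describe(table)
    # The two SS formulas are algebraically equivalent, so they should agree.
    assert math.isclose(stats["ss"], definitional_ss(table))
    print(name, {k: round(v, 2) if isinstance(v, float) else v
                 for k, v in stats.items()})
```

For the optimism data this reproduces SS = 24 and s = 1.85. For the practice problem it gives a mean of 2.25, a median and mode of 2, a range of 5, and s of about 1.45; treat these as values to compare against your own work rather than as the slide's official answer key.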