Chapter 3 & 4 PDF
UC San Diego
Summary
This document explains central tendency and variability in statistics, including concepts like mean, median, mode, and the empirical rule, with illustrative examples. The document also discusses different types of distributions.
Full Transcript
Chapter 3 Central Tendency

Outline
Central Tendency
Mean
Median
Mode
Describing Distributions

Central Tendency

Central tendency: a measure of the center of a distribution, e.g., mean, median, and mode

Scientists, therapists, educators, etc. often have questions about what occurs in the middle of a distribution
E.g., what is the average number of symptoms experienced by individuals diagnosed with antisocial personality disorder?
E.g., what is the most commonly occurring symptom of individuals with this disorder?
Measures of central tendency answer such questions

Mean

Mean: the sum of scores divided by the number of scores; also known as the average
Population mean = μ = Σx/N
μ = lowercase Greek letter mu
N = population size
Sample mean = M = Σx/N
N = sample size

E.g., a recruiter for a psychotherapy school claims that the mean hourly rate charged by their psychotherapists is $500/hour
Given the following sample of hourly rates: $100 $100 $100 $100 $2100
The mean is indeed $500, but the median and mode are only $100. The claim is accurate but misleading.

Advantage of the mean: Upon repeated sampling from a population, sample means give a more stable/consistent estimate of the population mean than sample modes or sample medians
I.e., the sample means will be more similar to each other than the sample medians or modes
Disadvantage of the mean: It is pulled toward extreme scores, so it may misrepresent the central tendency

Median

Median: the middle number in an odd-sized set of ordered numbers, or the average of the middle two numbers in an even-sized set of ordered numbers
E.g., calculate the median of the odd set: 1, 0, 5, 4, 6
(1) Order the numbers: 0, 1, 4, 5, 6
(2) Calculate the median location: (N + 1)/2 = (5 + 1)/2 = 3
(3) Count to the median: The 3rd number is the median, which is 4.
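The three median steps above (order, locate, count) can be sketched in Python. This is only an illustrative sketch: the function name `median_by_location` is made up for this example, and the even-N branch follows the definition of the median given above (average the two middle numbers).

```python
def median_by_location(scores):
    """Median via the (N + 1) / 2 location rule."""
    ordered = sorted(scores)             # step 1: order the numbers
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty set is undefined")
    location = (n + 1) / 2               # step 2: 1-based median location
    if location.is_integer():            # odd N: one middle score
        return ordered[int(location) - 1]
    lower = ordered[int(location) - 1]   # even N: location ends in .5,
    upper = ordered[int(location)]       # so average the two neighbors
    return (lower + upper) / 2


print(median_by_location([1, 0, 5, 4, 6]))  # 4
```

Python lists are 0-indexed, which is why the 1-based median location is shifted down by one before indexing.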
E.g., calculate the median of the even set: 2, 8, 0, 6, 4, 5
(1) Order the numbers: 0, 2, 4, 5, 6, 8
(2) Calculate the median location: (N + 1)/2 = (6 + 1)/2 = 3.5
(3) The middle two numbers are the 3rd and 4th numbers; the median is their average: (4 + 5)/2 = 4.5

E.g., a recruiter for a psychotherapy school claims that the median hourly rate charged by their psychotherapists is $500/hour
Given the following sample of hourly rates: $0 $50 $500 $500 $500
The median is indeed $500, but the mean is only $310. The claim is accurate but misleading.

Advantage of the median: It's not affected by extreme scores as much as the mean
Disadvantage of the median: It may misrepresent the central tendency

Mode

Mode: the most commonly occurring score or value
Calculate the mode for the following data:
1, 2, 2, 2, 3, 4: mode = 2
1, 2, 2, 3, 4, 4: modes = 2 and 4
blue, blue, pink, pink, gray, gray, gray: mode = gray

E.g., a recruiter for a psychotherapy school claims that the mode hourly rate charged by their psychotherapists is $500/hour
Given the following sample of hourly rates: $100 $125 $150 $500 $500
The mode is indeed $500, but the mean is only $275, and the median is $150. The claim is accurate but misleading.
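To see how the three recruiter claims differ, one can compute all three measures for each sample. The sketch below uses Python's standard `statistics` module and assumes Python 3.8 or later (for `multimode`); the dictionary keys and the helper name `summarize` are hypothetical labels chosen for this example.

```python
from statistics import mean, median, multimode

# Hourly rates from the three recruiter examples above (in dollars).
samples = {
    "mean claim": [100, 100, 100, 100, 2100],
    "median claim": [0, 50, 500, 500, 500],
    "mode claim": [100, 125, 150, 500, 500],
}


def summarize(rates):
    """Return the mean, median, and all modes of a list of scores."""
    return {
        "mean": mean(rates),
        "median": median(rates),
        "modes": multimode(rates),  # a list, since data can have several modes
    }


for label, rates in samples.items():
    print(label, summarize(rates))

# multimode also works on nominal data, where mean and median do not apply.
print(multimode(["blue", "blue", "pink", "pink", "gray", "gray", "gray"]))
```

In each sample only one of the three measures equals $500, which is why each claim is accurate but misleading.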
Advantages of the mode:
It's a score that actually occurred, which is not necessarily true for the mean and median
It can be applied to nominal data; the mean and median cannot
Disadvantage of the mode: It may misrepresent the central tendency

Describing Distributions

Graphed distributions vary in skew (asymmetry) and kurtosis (pointedness)
Positively skewed distribution: a distribution that trails off to the right
Negatively skewed distribution: a distribution that trails off to the left

The median is typically but not always between the mean and mode of skewed distributions

The most common distribution in the theory of statistics is the normal distribution; the frequencies of many behaviors are approximated by a normal distribution
Normal distribution: a theoretical distribution with data that are symmetrically distributed above and below the mean, median, and mode at the center of the distribution

A perfectly normal distribution has no skew, i.e., it's perfectly symmetrical and is unimodal (one mode), though kurtosis may vary

Actual distributions are often not represented by smooth curves

How do distributions, central tendencies, and scales of measurement relate to one another?
The mean is typically used to describe data that are normally distributed
The median is typically used to describe data that are skewed
The mode is typically used to describe data with modal distributions

Chapter 4 Variability

Outline
Range
Variance
Standard Deviation
The Empirical Rule

Range

Scores in a distribution vary; they are dispersed or spread out on the x axis
The mean of 9 and 11 is 10. The mean of 5 and 15 is also 10, but these two scores are more dispersed from the mean.
The spread or dispersion of scores is measured in multiple ways, e.g., by calculating the range
Range: the difference between the highest and lowest scores

The range is most informative for data sets without outliers
Outlier: an extreme score that falls substantially above or below most of the scores in a data set
E.g., the range of 1, 2, 3, 4, 5 is 5 - 1 = 4
E.g., the range of 1, 2, 3, 4, 100 is 100 - 1 = 99
The latter set includes an outlier (100), which produces a range that may give the false impression that there are scores in the 10s, 20s, 30s, etc.

Variance

Another measure of the dispersion of scores is variance
Variance: the average squared distance that scores deviate from their mean

E.g., a population with two scores, 9 and 11, has a mean of 10
9 deviates from 10 by -1; the squared deviation = (-1)² = 1
11 deviates from 10 by +1; the squared deviation = (+1)² = 1
The average squared deviation = (1 + 1)/2 = 1, hence population variance = 1 in this case

Definitional formulas for population and sample variances:
Population variance:
\[ \sigma^2 = \frac{\sum (x - \mu)^2}{N} \]
σ = lowercase Greek letter sigma
Sample variance:
\[ SD^2 = \frac{\sum (x - M)^2}{N - 1} \]
The numerators above are the sum of squares (SS)

Sum of squares (SS): the sum of the squared deviations of scores from their mean
So, the formulas above could also be expressed as: population variance = SS/N and sample variance = SS/(N - 1)

N - 1 in the sample variance denominator = the degrees of freedom for the sample variance
Degrees of freedom (df): the number of independent pieces of information that are free to vary, minus the number of mathematical restrictions
E.g., if three numbers add up to 10, and two of the numbers are 2 and 3, the third number must be 5; it is not free to vary. Thus, there are 2 degrees of freedom in this case.
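The range, sum of squares, and the two variance formulas above can be sketched in a few lines of Python. The function names are invented for this illustration; the only difference between the two variances is the denominator, N for a population and N - 1 (the degrees of freedom) for a sample.

```python
def value_range(scores):
    """Range: highest score minus lowest score."""
    return max(scores) - min(scores)


def sum_of_squares(scores):
    """SS: sum of squared deviations of the scores from their mean."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)


def population_variance(scores):
    """SS / N."""
    return sum_of_squares(scores) / len(scores)


def sample_variance(scores):
    """SS / (N - 1), where N - 1 is the degrees of freedom."""
    if len(scores) < 2:
        raise ValueError("sample variance needs at least two scores")
    return sum_of_squares(scores) / (len(scores) - 1)


print(value_range([1, 2, 3, 4, 100]))   # 99, inflated by the outlier
print(population_variance([9, 11]))     # 1.0, matching the worked example
print(sample_variance([9, 11]))         # 2.0, same SS divided by N - 1
```

Treating 9 and 11 as a sample instead of a population doubles the variance here, because SS = 2 is divided by 1 instead of 2.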
The definitional formulas are prone to rounding errors, so we use computational formulas:
Population variance:
\[ \sigma^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N} \]
Sample variance:
\[ SD^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N - 1} \]

Standard Deviation

Another measure of the dispersion of scores is the standard deviation
Standard deviation: roughly, the average distance that scores deviate from their mean
It's easier to think in terms of standard deviation (the average distance that scores deviate from their mean) than variance (the average squared distance that scores deviate from their mean)

The standard deviation is the square root of the variance; the variance is the square of the standard deviation
Population standard deviation:
\[ \sigma = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}} \]
Sample standard deviation:
\[ SD = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N - 1}} \]

Calculating the SD using data fabricated to simulate the finding that average faces are more attractive than non-average faces (Langlois & Roggman, 1990).

Each pic in Set 4 is an average of 4 faces
Each pic in Set 32 is an average of 32 faces
Higher rating = more attractive

Set 4    Rating    Set 32    Rating
Pic 1    1         Pic 9     4
Pic 2    5         Pic 10    5
Pic 3    1         Pic 11    5
Pic 4    5         Pic 12    4
Pic 5    4         Pic 13    3
Pic 6    2         Pic 14    3
Pic 7    2         Pic 15    3
Pic 8    4         Pic 16    5
Mean:    3         Mean:     4

Calculate SD for Set 4:
Values from the table above are entered below. Note there are 8 scores in Set 4, with Σx = 24 and Σx² = 92.
\[ SD = \sqrt{\frac{92 - \frac{(24)^2}{8}}{8 - 1}} \]
\[ SD = \sqrt{\frac{92 - 72}{7}} = \sqrt{\frac{20}{7}} \]
\[ SD = 1.69 \]
The Set 4 SD = 1.69.

The Empirical Rule

Because of the empirical rule, the standard deviation of normally distributed data tells us what percent of scores fall within a given distance of the mean
Empirical rule: the rule that for data that are normally distributed, approximately:
68% of data lie within 1 standard deviation of the mean
95% of data lie within 2 standard deviations of the mean
99.7% of data lie within 3 standard deviations of the mean

E.g., given a normally distributed population with M = 12, SD = 4, approximately:
68% of scores fall between 8 and 16
95% of scores fall between 4 and 20
99.7% of scores fall between 0 and 24
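As a check on the Set 4 calculation and the empirical-rule example, here is a minimal Python sketch. It assumes the computational formula for the sample standard deviation, with Σx² - (Σx)²/N as the sum of squares and N - 1 as the denominator; the names `sample_sd` and `empirical_intervals` are hypothetical and exist only for this illustration.

```python
import math


def sample_sd(scores):
    """Sample SD via the computational formula."""
    n = len(scores)
    sum_x = sum(scores)                    # Σx
    sum_x2 = sum(x * x for x in scores)    # Σx²
    ss = sum_x2 - sum_x ** 2 / n           # sum of squares
    return math.sqrt(ss / (n - 1))         # divide by df, then take the root


def empirical_intervals(mean, sd):
    """Approximate percent of normal data within 1, 2, and 3 SDs of the mean."""
    return {
        pct: (mean - k * sd, mean + k * sd)
        for k, pct in ((1, 68), (2, 95), (3, 99.7))
    }


set_4 = [1, 5, 1, 5, 4, 2, 2, 4]           # Set 4 ratings, Pic 1 to Pic 8
print(round(sample_sd(set_4), 2))          # 1.69
print(empirical_intervals(12, 4))
```

The intervals only describe data that are approximately normal; for skewed data the 68/95/99.7 percentages need not hold.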