Correlation & Pearson r Lecture PDF
Document Details
Uploaded by QuietGrossular1828
Plymouth State University
Summary
This document is a lecture on correlation and Pearson's r, a statistic that measures the linear relationship between two variables. Key topics include calculating and interpreting correlation coefficients and using scatterplots for visual representation. The lecture notes also explain how to determine statistical significance, introduce Spearman's rank correlation, and show correlation output from SPSS. It would be suitable for a statistics course focused on inferential methods.
Full Transcript
CORRELATIONS
Inferential Statistics

Overview
  - Correlation coefficients
  - Scatterplots
  - Calculating Pearson’s r
  - Interpreting correlation coefficients
  - Calculating & interpreting coefficient of determination
  - Determining statistical significance
  - Calculating Spearman’s correlation coefficient

Correlation
  - Reflects the degree of relation between variables
  - Calculation of correlation coefficient
      - Direction: + (positive) or – (negative)
      - Strength (i.e., magnitude): the further away from zero, the stronger the relation
      - Form of the relationship

Check yourself
Indicate whether the following statements suggest a positive or negative relationship:
  - High school students with lower IQs have lower GPAs
  - More densely populated areas have higher crime rates
  - Heavier automobiles yield poorer gas mileage
  - More anxious people willingly spend more time performing a simple repetitive task

Scatterplots

Correlation & Scatterplots
  Participant   Exam1 (X)   Exam2 (Y)
  1             100         95
  2             60          65
  3             75          80
  4             80          85
  5             65          60
  6             60          70
  7             85          80
  r = .91
  Benefits of scatterplot
  - Form of relation
  - Any possible outliers?
  - Rough guess of r

Correlation & Scatterplots
  Participant   Number of Arrests (X)   GPA (Y)
  1             0                       4.0
  2             5                       3.7
  3             10                      2.8
  4             20                      2.5
  5             30                      1.0
  r = -0.98
  [Scatterplot: GPA (Y) against number of times arrested (X)]

Pearson’s r
  Formula: $r = \dfrac{SP}{\sqrt{(SS_X)(SS_Y)}}$
  - SP = Sum of products (of deviations)
  - SS_X = Sum of Squares of X
  - SS_Y = Sum of Squares of Y

Pearson’s r: Calculating SP
  Definitional formula: $SP = \sum (X - M_X)(Y - M_Y)$
  1. Find X & Y deviations for each individual
  2. Find product of deviations for each individual
  3. Sum the products
  Computational formula: $SP = \sum XY - \dfrac{\sum X \sum Y}{n}$

Example #1: Calculating SP – Definitional Formula
  $SP = \sum (X - M_X)(Y - M_Y)$
  X    Y    X-M_X        Y-M_Y        (X-M_X)(Y-M_Y)
  2    2    2-3 = -1     2-4 = -2     (-1)(-2) = 2
  2    4    2-3 = -1     4-4 = 0      (-1)(0) = 0
  3    3    3-3 = 0      3-4 = -1     (0)(-1) = 0
  5    7    5-3 = 2      7-4 = 3      (2)(3) = 6
  ΣX = 12, ΣY = 16
  M_X = ΣX/n = 12/4 = 3
  M_Y = ΣY/n = 16/4 = 4
  Step 1: Find deviations for X and Y separately
  Step 2: Multiply the deviations from the mean
  Step 3: Sum the products: SP = 8

Example #1: Calculating SP – Computational Formula
  $SP = \sum XY - \dfrac{\sum X \sum Y}{n}$
  X    Y    XY
  2    2    4
  2    4    8
  3    3    9
  5    7    35
  ΣX = 12, ΣY = 16, ΣXY = 56
  M_X = ΣX/n = 12/4 = 3
  M_Y = ΣY/n = 16/4 = 4
  $SP = 56 - \dfrac{(12)(16)}{4} = 56 - 48 = 8$

Calculating Pearson’s r
  1. Calculate SP
     $SP = \sum (X - M_X)(Y - M_Y)$ or $SP = \sum XY - \dfrac{\sum X \sum Y}{n}$
  2. Calculate SS for X
     $SS_X = \sum (X - M_X)^2$ or $SS_X = \sum X^2 - \dfrac{(\sum X)^2}{n}$
  3. Calculate SS for Y
     $SS_Y = \sum (Y - M_Y)^2$ or $SS_Y = \sum Y^2 - \dfrac{(\sum Y)^2}{n}$
  4. Plug numbers into formula
     $r = \dfrac{SP}{\sqrt{(SS_X)(SS_Y)}}$

Example #1 - Answers: Calculating Pearson’s r
  X    Y    X-M_X   Y-M_Y   (X-M_X)(Y-M_Y)   XY   X²   Y²   (X-M_X)²   (Y-M_Y)²
  2    2    -1      -2      2                4    4    4    1          4
  2    4    -1      0       0                8    4    16   1          0
  3    3    0       -1      0                9    9    9    0          1
  5    7    2       3       6                35   25   49   4          9
  ΣX = 12, ΣY = 16, M_X = 12/4 = 3, M_Y = 16/4 = 4
  SP = 8
  $SS_X = 42 - \dfrac{(12)^2}{4} = 6$
  $SS_Y = 78 - \dfrac{(16)^2}{4} = 14$
  $r = \dfrac{8}{\sqrt{(6)(14)}} = .87$

Pearson’s r
  r = (covariability of X and Y) / (variability of X and Y separately)

Using Pearson’s r
  - Prediction
  - Validity
  - Reliability

Verbal Descriptions
  1) r = -.84 between total mileage & auto resale value
  2) r = -.35 between the number of days absent from school & performance on a math test
  3) r = -.05 between height & IQ
  4) r = .03 between anxiety level & college GPA
  5) r = .56 between age of schoolchildren & reading comprehension level

Interpreting correlations
  - Describe a relationship between 2 variables
  - Correlation does not equal causation
      - Directionality problem
      - Third-variable problem
  - Restricted range
      - Obscures relationship
      - [Scatterplot: GPA against SAT, annotated with r = +.70 and r = +.12]

Interpreting correlations
  - Outliers
      - Can have BIG impact on correlation coefficient

Interpreting correlations
  - Strength & Prediction
  - Coefficient of determination: r²
      - Proportion of variability in one variable that can be determined from the relationship with the other variable
      - If r = .60, then r² = .36 or 36%
      - 36% of the total variability in X is consistently associated with variability in Y
      - “Predicted” and “accounted for” variability

Mini-Review
  - Correlations
  - Calculation of Pearson’s r
      - Sum of product deviations: $SP = \sum XY - \dfrac{\sum X \sum Y}{n}$
      - $SS_X = \sum X^2 - \dfrac{(\sum X)^2}{n}$
      - $r = \dfrac{SP}{\sqrt{(SS_X)(SS_Y)}}$
  - Using Pearson’s r
  - Verbal descriptions
  - Interpretation of Pearson’s r

Example #2: Practice – Calculate Pearson’s r
  1. Calculate SP
  2. Calculate SS for X
  3. Calculate SS for Y
  4. Plug numbers into formula
  X    Y
  2    9
  1    10
  3    6
  0    8
  4    2

Example #2 - Answers
  X    Y    X-M_X       Y-M_Y        (X-M_X)(Y-M_Y)   XY   X²   Y²    (X-M_X)²   (Y-M_Y)²
  2    9    2-2 = 0     9-7 = 2      (0)(2) = 0       18   4    81    0          4
  1    10   1-2 = -1    10-7 = 3     (-1)(3) = -3     10   1    100   1          9
  3    6    3-2 = 1     6-7 = -1     (1)(-1) = -1     18   9    36    1          1
  0    8    0-2 = -2    8-7 = 1      (-2)(1) = -2     0    0    64    4          1
  4    2    4-2 = 2     2-7 = -5     (2)(-5) = -10    8    16   4     4          25
  ΣX = 10, ΣY = 35, ΣXY = 54
  M_X = ΣX/n = 10/5 = 2
  M_Y = ΣY/n = 35/5 = 7
  $SP = 54 - \dfrac{(10)(35)}{5} = -16$
  $SS_X = 30 - \dfrac{(10)^2}{5} = 10$
  $SS_Y = 285 - \dfrac{(35)^2}{5} = 40$
  $r = \dfrac{-16}{\sqrt{(10)(40)}} = -.80$

Hypothesis Testing
  - Making inferences based on sample information
  - Is it a statistically significant relationship? Or simply chance?

Conceptually - Degrees of freedom
  - Knowing M (the mean) restricts variability in the sample
  - 1 score will be dependent on the others
  - n = 5, ΣX = 20: X1 = 6, X2 = 4, X3 = 2, X4 = 5, X5 = 3
      - If we know the first 3 scores, X4 and X5 are still free to vary
      - If we know the first 4 scores, X5 is fixed (it must be 3)
  - With n = 5, there can be only 4 df

Correlations – Degrees of freedom
  - There are no degrees of freedom when our sample size is 2.
  - When there are only two points on a scatterplot, they will fit perfectly on a straight line.
  - [Scatterplot: Depression (Y) against Anxiety (X), two points]
  - Thus, for correlations df = n – 2

Using table to determine significance
  - Find degrees of freedom
      - Correlations: df = n – 2
  - Use level of significance (e.g., α = .05) for two-tailed test to find column in Table
  - Determine critical value
      - Value calculated r must equal or exceed to be significant
  - Compare calculated r with critical value
      - If calculated r is less than critical value = not significant

APA
  The correlation between hours watching television and amount of aggression is not significant, r(3) = -.80, p > .05.
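The four calculation steps and the table lookup can be sketched in Python for the Example #2 data (X = 2, 1, 3, 0, 4; Y = 9, 10, 6, 8, 2). This is a minimal sketch, not part of the lecture: the helper names `sum_of_products`, `sum_of_squares`, and `pearson_r` are hypothetical, and the critical value .878 (df = 3, α = .05, two-tailed) is assumed from a standard table of critical values for Pearson's r, since the lecture's own table is not reproduced here.

```python
from math import sqrt


def sum_of_products(xs, ys):
    """Computational formula: SP = sum(XY) - (sum(X) * sum(Y)) / n."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n


def sum_of_squares(values):
    """Computational formula: SS = sum(X^2) - (sum(X))^2 / n."""
    n = len(values)
    return sum(v * v for v in values) - sum(values) ** 2 / n


def pearson_r(xs, ys):
    """r = SP / sqrt(SS_X * SS_Y)."""
    return sum_of_products(xs, ys) / sqrt(sum_of_squares(xs) * sum_of_squares(ys))


# Example #2 data from the lecture.
x = [2, 1, 3, 0, 4]
y = [9, 10, 6, 8, 2]

r = pearson_r(x, y)
r_squared = r ** 2          # coefficient of determination
df = len(x) - 2             # degrees of freedom for a correlation

# Assumption: critical value taken from a standard table
# (df = 3, alpha = .05, two-tailed); not reproduced from the lecture.
CRITICAL_R = 0.878

# Significant only if |r| equals or exceeds the critical value.
significant = abs(r) >= CRITICAL_R

print(f"SP = {sum_of_products(x, y):.0f}")
print(f"SS_X = {sum_of_squares(x):.0f}, SS_Y = {sum_of_squares(y):.0f}")
print(f"r = {r:.2f}, r^2 = {r_squared:.2f}, df = {df}")
print("significant" if significant else "not significant")
```

With only five pairs, |r| = .80 falls short of the assumed critical value, which agrees with the APA statement that the correlation is not significant.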
Think about sample size

Spearman correlation
  Used when:
  - Ordinal data
  - If 1 variable is on a ratio scale, then change the scores for that variable into ranks
  $r_s = 1 - \dfrac{6 \sum D^2}{n(n^2 - 1)}$
  D = difference between a pair of ranks

Example #3: Spearman correlation
  1st race   2nd race
  4          3
  1          2
  9          8
  8          6
  3          5
  5          4
  6          7
  2          1
  7          9

Example #3: Spearman - Answers
  1st race   2nd race   D     D²
  4          3          1     1
  1          2          -1    1
  9          8          1     1
  8          6          2     4
  3          5          -2    4
  5          4          1     1
  6          7          -1    1
  2          1          1     1
  7          9          -2    4
  ΣD² = 18
  $r_s = 1 - \dfrac{6(18)}{9(9^2 - 1)} = 1 - \dfrac{108}{9(80)} = .85$

Example #4: Spearman Correlation
  Two movie raters each watched the same six movies. Is there a relationship between the raters’ rankings?
  Rater 1   Rater 2
  1         6
  2         4
  3         5
  4         3
  5         2
  6         1

Example #4: Spearman - answers
  Rater 1   Rater 2   D     D²
  1         6         -5    25
  2         4         -2    4
  3         5         -2    4
  4         3         1     1
  5         2         3     9
  6         1         5     25
  ΣD² = 68
  $r_s = 1 - \dfrac{6(68)}{6(6^2 - 1)} = 1 - \dfrac{408}{6(35)} = -.94$

Pearson r (from SPSS)
  Correlations
                                    anxiety   noise
  anxiety   Pearson Correlation     1         .869**
            Sig. (2-tailed)         .         .001
            N                       10        10
  noise     Pearson Correlation     .869**    1
            Sig. (2-tailed)         .001      .
            N                       10        10
  **. Correlation is significant at the 0.01 level (2-tailed).

Spearman rs (from SPSS)
  Correlations
                                                     anxiety   noise
  Spearman's rho   anxiety   Correlation Coefficient 1.000     .872**
                             Sig. (2-tailed)         .         .001
                             N                       10        10
                   noise     Correlation Coefficient .872**    1.000
                             Sig. (2-tailed)         .001      .
                             N                       10        10
  **. Correlation is significant at the 0.01 level (2-tailed).

Example #5: Pearson’s r
  Participant   Motivation (X)   Depression (Y)
  1             3                8
  2             6                4
  3             9                2
  4             2                2
  1. Sketch a scatterplot.
  2. Calculate the correlation coefficient.
  3. Determine if it is statistically significant at the .05 level for a 2-tailed test.
  4. Write an APA format conclusion.
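Looking back at the Spearman examples (#3 and #4), the rank-difference formula r_s = 1 - 6ΣD² / (n(n² - 1)) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name `spearman_rs` is hypothetical, the inputs are assumed to already be ranks, and this shortcut formula is only exact when there are no tied ranks (true of both examples). For Example #4 the result is negative, because 408/210 is greater than 1 and the two raters rank the movies in nearly opposite order.

```python
def spearman_rs(ranks_a, ranks_b):
    """Spearman's r_s from paired ranks, assuming no ties.

    r_s = 1 - 6 * sum(D^2) / (n * (n^2 - 1)),
    where D is the difference between each pair of ranks.
    """
    n = len(ranks_a)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Example #3: finishing ranks in two races (sum of D^2 = 18).
race_1 = [4, 1, 9, 8, 3, 5, 6, 2, 7]
race_2 = [3, 2, 8, 6, 5, 4, 7, 1, 9]

# Example #4: two raters ranking the same six movies (sum of D^2 = 68).
rater_1 = [1, 2, 3, 4, 5, 6]
rater_2 = [6, 4, 5, 3, 2, 1]

print(f"Example #3: r_s = {spearman_rs(race_1, race_2):.2f}")
print(f"Example #4: r_s = {spearman_rs(rater_1, rater_2):.2f}")
```

If one of the variables were measured on a ratio scale, its scores would first need to be converted to ranks, as the Spearman slide describes, before being passed to this function.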