Correlation & Pearson r Lecture PDF

Summary

This document is a lecture on correlation and Pearson's r, a statistic that measures the strength and direction of the linear relationship between two variables. Key topics include calculating and interpreting correlation coefficients, using scatterplots to visualize a relationship, the coefficient of determination, and Spearman's rank correlation. The lecture notes also explain how to determine statistical significance and show sample SPSS output. It is suited to a statistics course focused on inferential methods.

Full Transcript

CORRELATIONS
Inferential Statistics

Overview
- Correlation coefficients
- Scatterplots
- Calculating Pearson’s r
- Interpreting correlation coefficients
- Calculating & interpreting coefficient of determination
- Determining statistical significance
- Calculating Spearman’s correlation coefficient

Correlation
- Reflects the degree of relation between variables
- Calculation of correlation coefficient
  - Direction: + (positive) or - (negative)
  - Strength (i.e., magnitude): the further away from zero, the stronger the relation
- Form of the relationship

Check yourself
Indicate whether the following statements suggest a positive or negative relationship:
- High school students with lower IQs have lower GPAs
- More densely populated areas have higher crime rates
- Heavier automobiles yield poorer gas mileage
- More anxious people willingly spend more time performing a simple repetitive task

Scatterplots

Correlation & Scatterplots

Participant   Exam1 (X)   Exam2 (Y)
1             100         95
2             60          65
3             75          80
4             80          85
5             65          60
6             60          70
7             85          80

r = .91

Benefits of scatterplot
- Form of relation
- Any possible outliers?
- Rough guess of r

Correlation & Scatterplots

Participant   Number of Arrests (X)   GPA (Y)
1             0                       4.0
2             5                       3.7
3             10                      2.8
4             20                      2.5
5             30                      1.0

r = -0.98

[Scatterplot: GPA (Y) against number of times arrested (X)]

Pearson’s r
- Formula: $r = \frac{SP}{\sqrt{(SS_X)(SS_Y)}}$
  - SP = Sum of products (of deviations)
  - SS_X = Sum of Squares of X
  - SS_Y = Sum of Squares of Y

Pearson’s r: Calculating SP
- Definitional formula: $SP = \sum (X - M_X)(Y - M_Y)$
  1. Find X & Y deviations for each individual
  2. Find product of deviations for each individual
  3. Sum the products
- Computational formula: $SP = \sum XY - \frac{\sum X \sum Y}{n}$

Example #1: Calculating SP, Definitional Formula

$SP = \sum (X - M_X)(Y - M_Y)$

X   Y   X-MX         Y-MY         (X-MX)(Y-MY)
2   2   2-3 = -1     2-4 = -2     (-1)(-2) = 2
2   4   2-3 = -1     4-4 = 0      (-1)(0) = 0
3   3   3-3 = 0      3-4 = -1     (0)(-1) = 0
5   7   5-3 = 2      7-4 = 3      (2)(3) = 6

ΣX = 12, ΣY = 16
MX = ΣX/n = 12/4 = 3
MY = ΣY/n = 16/4 = 4

Step 1: Find deviations for X and Y separately
Step 2: Multiply the deviations from the mean
Step 3: Sum the products: SP = 2 + 0 + 0 + 6 = 8

Example #1: Calculating SP, Computational Formula

$SP = \sum XY - \frac{\sum X \sum Y}{n}$

X   Y   XY
2   2   4
2   4   8
3   3   9
5   7   35

ΣX = 12, ΣY = 16, ΣXY = 56
MX = ΣX/n = 12/4 = 3
MY = ΣY/n = 16/4 = 4

$SP = 56 - \frac{(12)(16)}{4} = 56 - 48 = 8$

Calculating Pearson’s r
1. Calculate SP
   $SP = \sum (X - M_X)(Y - M_Y)$ or $SP = \sum XY - \frac{\sum X \sum Y}{n}$
2. Calculate SS for X
   $SS_X = \sum (X - M_X)^2$ or $SS_X = \sum X^2 - \frac{(\sum X)^2}{n}$
3. Calculate SS for Y
   $SS_Y = \sum (Y - M_Y)^2$ or $SS_Y = \sum Y^2 - \frac{(\sum Y)^2}{n}$
4. Plug numbers into formula
   $r = \frac{SP}{\sqrt{(SS_X)(SS_Y)}}$

Data (from Example #1):

X   Y
2   2
2   4
3   3
5   7

Example #1 - Answers: Calculating Pearson’s r

X   Y   X-MX   Y-MY   (X-MX)(Y-MY)   XY   X^2   Y^2   (X-MX)^2   (Y-MY)^2
2   2   -1     -2     2              4    4     4     1          4
2   4   -1     0      0              8    4     16    1          0
3   3   0      -1     0              9    9     9     0          1
5   7   2      3      6              35   25    49    4          9

ΣX = 12, ΣY = 16, ΣXY = 56, ΣX^2 = 42, ΣY^2 = 78
MX = 12/4 = 3, MY = 16/4 = 4

$SP = 8$
$SS_X = 42 - \frac{(12)^2}{4} = 6$
$SS_Y = 78 - \frac{(16)^2}{4} = 14$
$r = \frac{8}{\sqrt{(6)(14)}} = .87$

Pearson’s r
- r = (covariability of X and Y) / (variability of X and Y separately)

Using Pearson’s r
- Prediction
- Validity
- Reliability

Verbal Descriptions
1) r = -.84 between total mileage & auto resale value
2) r = -.35 between the number of days absent from school & performance on a math test
3) r = -.05 between height & IQ
4) r = .03 between anxiety level & college GPA
5) r = .56 between age of schoolchildren & reading comprehension level

Interpreting correlations
- Describe a relationship between 2 variables
- Correlation does not equal causation
  - Directionality problem
  - Third-variable problem
- Restricted range
  - Obscures relationship
[Figure: scatterplots of GPA against SAT, labeled r = +.12 and r = +.70]

Interpreting correlations
- Outliers
  - Can have BIG impact on correlation coefficient

Interpreting correlations
- Strength & Prediction
- Coefficient of determination, $r^2$
  - Proportion of variability in one variable that can be determined from the relationship w/ the other variable
  - If r = .60, then $r^2$ = .36 or 36%
  - 36% of the total variability in X is consistently associated with variability in Y
  - “predicted” and “accounted for” variability

Mini-Review
- Correlations
- Calculation of Pearson’s r
  - Sum of product deviations: $SP = \sum XY - \frac{\sum X \sum Y}{n}$
  - $SS_X = \sum X^2 - \frac{(\sum X)^2}{n}$
  - $r = \frac{SP}{\sqrt{(SS_X)(SS_Y)}}$
- Using Pearson’s r
  - Verbal descriptions
- Interpretation of Pearson’s r

Example #2: Practice, Calculate Pearson’s r
1. Calculate SP
   $SP = \sum (X - M_X)(Y - M_Y)$ or $SP = \sum XY - \frac{\sum X \sum Y}{n}$
2. Calculate SS for X
   $SS_X = \sum (X - M_X)^2$ or $SS_X = \sum X^2 - \frac{(\sum X)^2}{n}$
3. Calculate SS for Y
   $SS_Y = \sum (Y - M_Y)^2$ or $SS_Y = \sum Y^2 - \frac{(\sum Y)^2}{n}$
4. Plug numbers into formula
   $r = \frac{SP}{\sqrt{(SS_X)(SS_Y)}}$

X   Y
2   9
1   10
3   6
0   8
4   2

Example #2 - Answers

X   Y    X-MX        Y-MY         (X-MX)(Y-MY)     XY   X^2   Y^2   (X-MX)^2   (Y-MY)^2
2   9    2-2 = 0     9-7 = 2      (0)(2) = 0       18   4     81    0          4
1   10   1-2 = -1    10-7 = 3     (-1)(3) = -3     10   1     100   1          9
3   6    3-2 = 1     6-7 = -1     (1)(-1) = -1     18   9     36    1          1
0   8    0-2 = -2    8-7 = 1      (-2)(1) = -2     0    0     64    4          1
4   2    4-2 = 2     2-7 = -5     (2)(-5) = -10    8    16    4     4          25

ΣX = 10, ΣY = 35, ΣXY = 54, ΣX^2 = 30, ΣY^2 = 285
MX = ΣX/n = 10/5 = 2
MY = ΣY/n = 35/5 = 7

$SP = 54 - \frac{(10)(35)}{5} = -16$
$SS_X = 30 - \frac{(10)^2}{5} = 10$
$SS_Y = 285 - \frac{(35)^2}{5} = 40$
$r = \frac{-16}{\sqrt{(10)(40)}} = -.80$

Hypothesis Testing
- Making inferences based on sample information
- Is it a statistically significant relationship? Or simply chance?

Conceptually: Degrees of freedom
- Knowing M (the mean) restricts variability in the sample
  - 1 score will be dependent on the others
- n = 5, ΣX = 20
- Scores: X1 = 6, X2 = 4, X3 = 2, X4 = 5, X5 = 3
- If we know the first 3 scores, the remaining two can still vary
- If we know the first 4 scores, the fifth is fixed: X5 = 20 - (6 + 4 + 2 + 5) = 3
- With n = 5, there can be only 4 df

Correlations: Degrees of freedom
- There are no degrees of freedom when our sample size is 2. When there are only two points on a scatterplot, they will fit perfectly on a straight line.
[Scatterplot: Depression (Y) against Anxiety (X), two points]
- Thus, for correlations df = n - 2

Using table to determine significance
- Find degrees of freedom
  - Correlations: df = n - 2
- Use level of significance (e.g., α = .05) for two-tailed test to find column in table
- Determine critical value
  - Value that the calculated r must equal or exceed to be significant
- Compare calculated r w/ critical value
  - If the calculated r is less than the critical value, it is not significant
- APA
  - The correlation between hours watching television and amount of aggression is not significant, r(3) = -.80, p > .05.
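The four calculation steps and the table lookup can be sketched in code. This is a minimal illustration that uses the computational formulas on the data from Example #1 and Example #2. The function names `pearson_r` and `is_significant` are made up for this sketch. The critical values are an assumption: they are copied from a standard two-tailed α = .05 table for df = 1 to 5 and do not appear in the lecture itself.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return (SP, SS_X, SS_Y, r) using the computational formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    # Step 1: SP = sum(XY) - (sum(X) * sum(Y)) / n
    sp = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    # Step 2: SS_X = sum(X^2) - (sum(X))^2 / n
    ss_x = sum(x * x for x in xs) - sum_x ** 2 / n
    # Step 3: SS_Y = sum(Y^2) - (sum(Y))^2 / n
    ss_y = sum(y * y for y in ys) - sum_y ** 2 / n
    # Step 4: r = SP / sqrt(SS_X * SS_Y)
    return sp, ss_x, ss_y, sp / sqrt(ss_x * ss_y)


# Assumed values: critical r for a two-tailed test at alpha = .05,
# keyed by df = n - 2 (taken from a standard table, not from the lecture).
CRITICAL_R_05 = {1: 0.997, 2: 0.950, 3: 0.878, 4: 0.811, 5: 0.754}


def is_significant(r, n):
    """Compare |r| with the critical value for df = n - 2."""
    return abs(r) >= CRITICAL_R_05[n - 2]


ex1 = pearson_r([2, 2, 3, 5], [2, 4, 3, 7])
ex2 = pearson_r([2, 1, 3, 0, 4], [9, 10, 6, 8, 2])

print(ex1)  # SP = 8, SS_X = 6, SS_Y = 14, r is about .87
print(ex2)  # SP = -16, SS_X = 10, SS_Y = 40, r = -.80
print(is_significant(ex2[3], n=5))  # False: |-.80| is below .878 at df = 3
```

The last line agrees with the APA statement for Example #2: with df = 3, r = -.80 does not reach the critical value, so the correlation is not significant at the .05 level.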
- Think about sample size

Spearman correlation
- Used when:
  - Ordinal data
  - If 1 variable is on a ratio scale, then change scores for that variable into ranks
- Formula: $r_s = 1 - \frac{6 \sum D^2}{n(n^2 - 1)}$
  - D = Difference between pair of ranks

Example #3: Spearman correlation

$r_s = 1 - \frac{6 \sum D^2}{n(n^2 - 1)}$

1st race   2nd race
4          3
1          2
9          8
8          6
3          5
5          4
6          7
2          1
7          9

Example #3: Spearman - Answers

1st race   2nd race   D    D^2
4          3          1    1
1          2          -1   1
9          8          1    1
8          6          2    4
3          5          -2   4
5          4          1    1
6          7          -1   1
2          1          1    1
7          9          -2   4

ΣD^2 = 18

$r_s = 1 - \frac{6(18)}{9(9^2 - 1)} = 1 - \frac{108}{9(80)} = .85$

Example #4: Spearman Correlation
- Two movie raters each watched the same six movies. Is there a relationship between the raters’ rankings?

Rater 1   Rater 2
1         6
2         4
3         5
4         3
5         2
6         1

Example #4: Spearman - Answers

Rater 1   Rater 2   D    D^2
1         6         -5   25
2         4         -2   4
3         5         -2   4
4         3         1    1
5         2         3    9
6         1         5    25

ΣD^2 = 68

$r_s = 1 - \frac{6(68)}{6(6^2 - 1)} = 1 - \frac{408}{6(35)} = -.94$

Pearson r (from SPSS)

Correlations
                                anxiety   noise
anxiety   Pearson Correlation   1         .869**
          Sig. (2-tailed)       .         .001
          N                     10        10
noise     Pearson Correlation   .869**    1
          Sig. (2-tailed)       .001      .
          N                     10        10
**. Correlation is significant at the 0.01 level (2-tailed).

Spearman rs (from SPSS)

Correlations
                                                  anxiety   noise
Spearman's rho   anxiety   Correlation Coefficient   1.000     .872**
                           Sig. (2-tailed)           .         .001
                           N                         10        10
                 noise     Correlation Coefficient   .872**    1.000
                           Sig. (2-tailed)           .001      .
                           N                         10        10
**. Correlation is significant at the 0.01 level (2-tailed).

Example #5: Pearson’s r

Participant   Motivation (X)   Depression (Y)
1             3                8
2             6                4
3             9                2
4             2                2

1. Sketch a scatterplot.
2. Calculate the correlation coefficient.
3. Determine if it is statistically significant at the .05 level for a 2-tailed test.
4. Write an APA format conclusion.
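The Spearman formula used in Example #3 and Example #4 can be sketched the same way. This is a minimal illustration, not a general-purpose routine: the function name `spearman_rs` is made up for this sketch, the inputs are assumed to already be ranks, and the shortcut formula is assumed to apply only when there are no tied ranks.

```python
def spearman_rs(ranks_a, ranks_b):
    """Return (sum of D^2, r_s) for two lists of ranks with no ties."""
    n = len(ranks_a)
    # D is the difference between each pair of ranks.
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    # r_s = 1 - 6 * sum(D^2) / (n * (n^2 - 1))
    return sum_d2, 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Example #3: finishing positions in the 1st and 2nd race
races = spearman_rs([4, 1, 9, 8, 3, 5, 6, 2, 7],
                    [3, 2, 8, 6, 5, 4, 7, 1, 9])

# Example #4: two raters ranking the same six movies
movies = spearman_rs([1, 2, 3, 4, 5, 6],
                     [6, 4, 5, 3, 2, 1])

print(races)   # sum of D^2 = 18, r_s is about .85
print(movies)  # sum of D^2 = 68, r_s is about -.94
```

The negative result for the movie raters reflects that their rankings run in nearly opposite order.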
