Correlation & Rank Correlation PDF

Summary

This document is a collection of lecture slides on correlation and rank correlation, discussing linear relationships between paired data, scatter plots, hypothesis testing, and coefficients of determination. 

Full Transcript

Slide 1 Correlation & Rank Correlation
Created by Erin Hodgess, Houston, Texas
Copyright © 2004 Pearson Education, Inc.

Slide 2 Lecture Objectives
At the end of this lecture, students should be able to:
I. Understand the concept of a linear relationship between paired quantitative data.
II. Use a scatter plot to assess whether a linear relationship exists and its direction.
III. Conduct a formal hypothesis test to find the value and assess the strength of the linear correlation (Karl Pearson) and rank correlation (Spearman) coefficients using JMP.
IV. Compute and interpret the coefficient of determination.
Reference: Chapters 10 & 13, sections 10.1 and 13.6

Slide 3
Is there a relationship between pulse rates and systolic blood pressure of women?

Slide 4 Paired Data
Overview
o Is there a relationship?
o If so, what is the strength of the relationship?

Slide 5 Correlation
Definition
A correlation exists between two variables when one of them is related to the other in some way.

Slide 6 Scatterplot
Definition
A scatterplot (or scatter diagram) is a graph in which the paired (x, y) sample data are plotted with a horizontal x-axis and a vertical y-axis. Each individual (x, y) pair is plotted as a single point.

Slide 7 Scatter Diagram of Paired Data
[Figure: scatter diagram with Y on the vertical axis]
Slide 8 Scatter plots illustrating different correlation structures
[Figure: six scatter plot panels titled "Perfect positive correlation", "Perfect negative correlation", "Strong negative correlation", "Quadratic function", "Random values", and "No correlation"]

Slide 9 JMP Example: Scatterplot
Examine the relationship between blood sugar level Y and BMI in the file diabetes.jmp. Is there a linear relationship?

Slide 10 JMP Fit Y by X: Scatterplot
There is a positive linear relationship between the blood sugar level Y and BMI variables.

Slide 11 JMP Example: Scatterplot
Fourteen different second-year medical students took blood pressure measurements of the same patient, and the results are listed below. Is there a correlation between systolic and diastolic values (mmHg)?

Systolic (mmHg)   138  130  135  140  120  125  120  130  130  144  143  140  130  150
Diastolic (mmHg)   82   91  100  100   80   90   80   80   80   98  105   85   70  100

Slide 12 JMP Example: Scatterplot

Slide 13 Linear Correlation Coefficient
Definition
The linear correlation coefficient r measures the strength of the linear relationship between paired x and y values in a sample.

Slide 14 Strength of Linear Relationship

Slide 15 Linear Correlation Coefficient
Assumptions
1. The sample of paired data (x, y) is a random sample.
2. The pairs of (x, y) data have a bivariate normal distribution.
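As a rough illustration of the Slide 13 definition, the following Python sketch computes a sample linear correlation coefficient for the Slide 11 blood pressure readings. It is not part of the original slides, which use JMP. The function name `pearson_r` is hypothetical, and the deviation-from-the-mean form of r used here is a standard equivalent of the definition, assumed rather than quoted from the lecture.

```python
from math import sqrt

# Blood pressure readings from Slide 11 (mmHg), one pair per student.
systolic = [138, 130, 135, 140, 120, 125, 120, 130, 130, 144, 143, 140, 130, 150]
diastolic = [82, 91, 100, 100, 80, 90, 80, 80, 80, 98, 105, 85, 70, 100]


def pearson_r(xs, ys):
    """Sample linear correlation coefficient from deviations about the means.

    Hypothetical helper for illustration; not taken from the slides.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares of the deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)


r_bp = pearson_r(systolic, diastolic)
print(f"r = {r_bp:.3f}")
```

A positive value of `r_bp` would suggest that higher systolic readings tend to go with higher diastolic readings. Whether that correlation is significant is a separate question that needs a hypothesis test.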
Slide 16 Linear Correlation Coefficient Notations
n = number of pairs of data presented
$\sum$ = denotes the addition of the items indicated
$\sum x$ = denotes the sum of all x-values
$\sum x^2$ = indicates that each x-value should be squared and then those squares added
$(\sum x)^2$ = indicates that the x-values should be added and the total then squared
$\sum xy$ = indicates that each x-value should first be multiplied by its corresponding y-value; after obtaining all such products, find their sum
r = represents the linear correlation coefficient for a sample
$\rho$ = represents the linear correlation coefficient for a population

Slide 17 Example: Calculating r
Data
x  1  1  3  5
y  2  8  6  4

Slide 18 Calculating r

Slide 19 Calculating r
Data
x  1  1  3  5
y  2  8  6  4

$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n(\sum x^2) - (\sum x)^2}\,\sqrt{n(\sum y^2) - (\sum y)^2}}$$

$$r = \frac{4(48) - (10)(20)}{\sqrt{4(36) - (10)^2}\,\sqrt{4(120) - (20)^2}}$$

$$r = \frac{-8}{59.329} = -0.135$$

Slide 20 Linear Correlation Coefficient r
Properties
1. $-1 \le r \le 1$
2. The value of r does not change if all values of either variable are converted to a different scale.
3. The value of r is not affected by the choice of x and y: interchange x and y and the value of r will not change.
4. r measures the strength of a linear relationship.

Slide 21 Explained Variation
Coefficient of determination
Interpreting $r^2$
The value of $r^2$ is the proportion of the variation in y that is explained by the linear relationship between x and y. This value ($r^2$) is called the coefficient of determination.

Slide 22 Example for $r^2$: Boats and Manatees
Using the boat/manatee data in Table 9-1, we have found that the value of the linear correlation coefficient is r = 0.922. What proportion of the variation in manatee deaths can be explained by the variation in the number of boat registrations? With r = 0.922, we get $r^2 = 0.850$.
We conclude that 0.850 (or about 85%) of the variation in manatee deaths can be explained by the linear relationship between the number of boat registrations and the number of manatee deaths from boats. This implies that about 15% of the variation in manatee deaths cannot be explained by the number of boat registrations.

Slide 23 Linear Correlation Coefficient
Formal Hypothesis Testing
We wish to determine whether there is a significant linear correlation between two variables. Let
$H_0: \rho = 0$ (no significant linear correlation)
$H_1: \rho \ne 0$ (significant linear correlation)

Slide 24 JMP Example: Pearson Linear Correlation
Examine the association between the variables "Y" and "BMI" using the "diabetes.jmp" dataset in JMP.
1. What is the strength of the relationship?
2. What is the direction of the relationship?
3. Does a significant linear relationship exist between the two variables?
4. What proportion of the variation in Y is explained by the relationship?

Slide 25 JMP Output: Pearson Correlation
Analyze → Multivariate Methods → Multivariate. From Multivariate → Correlation Probability.

Multivariate Correlations (Pearson correlation coefficient "r")
        Y       BMI
Y     1.0000  0.5865
BMI   0.5865  1.0000

Correlation Probability
Y BMI P value of Y
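The hand calculation on Slide 19 and the coefficient of determination on Slide 22 can be checked with a short Python sketch. This is an illustration only and is not part of the original slides, which use JMP. The function name `pearson_r_sums` is hypothetical; it applies the sums-based formula from Slide 19 to the four data pairs given there, then squares the boat/manatee value r = 0.922 quoted on Slide 22.

```python
from math import sqrt


def pearson_r_sums(xs, ys):
    """Linear correlation coefficient via the sums-based formula on Slide 19.

    Hypothetical helper for illustration; not taken from the slides.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    if denominator == 0:
        raise ValueError("r is undefined when one variable is constant")
    return numerator / denominator


# Data from Slides 17 and 19.
x = [1, 1, 3, 5]
y = [2, 8, 6, 4]
r = pearson_r_sums(x, y)
print(f"r = {r:.3f}")  # prints r = -0.135, matching Slide 19

# Coefficient of determination for the value quoted on Slide 22.
r_manatee = 0.922
r2_manatee = r_manatee ** 2
print(f"r^2 = {r2_manatee:.3f}")  # prints r^2 = 0.850
```

For the Slide 19 data, r squared is under 0.02, so almost none of the variation in y is explained by a linear relationship with x. In the boat/manatee example about 85% is explained.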
