CHS 729 Week 6: Covariance and Correlation PowerPoint Slides PDF

Summary

This document presents slides from CHS 729 Week 6, covering covariance, correlation (including Pearson's coefficient), conceptual models, and the anatomy of a research paper's methods section. The slides use the R programming language to demonstrate several concepts, and also cover data visualization and non-linear associations.

Full Transcript

CHS 729 Week 6: Covariance and Correlation; Path Diagrams / Conceptual Models; Anatomy of a Methods Section

How 2 Variables "Vary Together"
Often, we want to know the relationship (or association) between two variables. When we have continuous variables X and Y, it can be of interest to understand how they "vary together." In other words, how are changes in X associated with changes in Y (and vice versa)? For example, we may understand that as a person gets older (X = age), they develop more gray hairs (Y = amount of gray hair): these variables "vary together."

Covariance
Covariance is a measure of how X and Y vary together. If larger values of X are associated with larger values of Y, we refer to this as positive covariance. If larger values of X are associated with smaller values of Y, we refer to this as negative covariance. Understanding how two variables "vary together" is a fundamental task in statistics, as it can tell us many things, including:
- How exposure to an intervention is associated with changes in an outcome
- How exposure to a risk/protective factor is associated with an outcome
- How risk and protective factors are associated with one another

Measuring Covariance
For each participant i in our sample, we calculate the distance of their values x_i and y_i from the mean values of X and Y, then take the average (the sample formula divides by n - 1):

    Cov(X, Y) = (1 / (n - 1)) * sum over i of (x_i - x̄)(y_i - ȳ)

The term (x_i - x̄) contains two important pieces of information:
- How far participant i's value of X is from the sample mean
- Whether participant i's value of X is greater than or less than the sample mean

If participant i's values of X and Y are both greater than, or both less than, their respective sample means, they contribute a positive value to the total covariance. If participant i's values of X and Y are on opposite sides of their respective means, they contribute a negative value to the total covariance.
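The per-participant calculation described above can be sketched directly in R and checked against the built-in cov() function. The vectors x and y below are toy data for illustration, not values from the slides:

```r
# Toy data for illustration (not from the slides)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 7)

# Each participant i contributes (x_i - mean(x)) * (y_i - mean(y));
# summing and dividing by n - 1 gives the sample covariance
manual_cov <- sum((x - mean(x)) * (y - mean(y))) / (length(x) - 1)

manual_cov   # 2.5
cov(x, y)    # 2.5, matching the manual calculation
```

Note that a participant whose x and y both sit above (or both below) their means contributes a positive product, while a participant on opposite sides of the two means contributes a negative product, exactly as described above.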
Thinking Intuitively About What Covariance Tells Us
Positive covariance indicates that (1) values of X greater than the sample mean are associated with values of Y greater than the sample mean, and (2) values of X less than the sample mean are associated with values of Y less than the sample mean. Negative covariance indicates that (1) values of X greater than the sample mean are associated with values of Y less than the sample mean, and (2) values of X less than the sample mean are associated with values of Y greater than the sample mean.

Covariance is a Non-Standardized Measure of Association
Covariance is a measure of association. It does not explain why X and Y may vary together; it only reflects that they do! Further, it is not standardized: simply changing the unit of a variable changes the magnitude of the measured covariance. For example, if we measured the covariance between the weight of a vehicle in pounds and its miles per gallon, we would get a totally different value if we instead measured weight in grams (even though the information is identical).

Measuring Covariance in R
We can measure covariance in R using the cov() function. We provide two arguments, each a vector representing one of the two variables. The function returns the value of the covariance, which can take any value from negative infinity to positive infinity. Here, we see that there is negative covariance between the fuel efficiency of vehicles (mpg) and their horsepower (hp).

Visualizing Covariance with Scatterplots
The ggplot2 library is really useful for making visualizations. Here we can see that points appear to follow a pattern from top-left to bottom-right, indicating negative covariance.

Modifying Appearance
ggplot2 has many built-in themes that let you change the appearance of plots. These can also be customized as you get more familiar with the package.
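A minimal sketch of these points, using the mtcars dataset that ships with base R (in mtcars, wt is vehicle weight in 1000s of pounds, which makes the unit-change point easy to demonstrate):

```r
# mtcars ships with base R: mpg = fuel efficiency, hp = horsepower,
# wt = vehicle weight in 1000s of pounds
data(mtcars)

# Negative covariance: more horsepower goes with lower fuel efficiency
cov(mtcars$mpg, mtcars$hp)

# Covariance is not standardized: converting weight from 1000-lb units
# to pounds multiplies the covariance by 1000, even though the
# information is identical
cov(mtcars$wt, mtcars$mpg)
cov(mtcars$wt * 1000, mtcars$mpg)
```

The scatterplot described on the slide can be drawn with `library(ggplot2)` followed by `ggplot(mtcars, aes(x = hp, y = mpg)) + geom_point()`, and restyled by appending a built-in theme such as `theme_minimal()`.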
Covariance Matrix
It can be useful to calculate the covariance between all variables in your dataset to create what is called a covariance matrix. Each value represents the covariance between the row and column variables. The main diagonal (top left to bottom right) shows the variance of each variable, as the row and column variables are the same. Simply supply the data frame to the cov() function.

Correlation
Covariance is useful but difficult to interpret because it is not standardized. Correlation is the standardized version of covariance, and for continuous, normally distributed variables it is calculated like so:

    Corr(X, Y) = Cov(X, Y) / (σ_X * σ_Y)

where σ_X and σ_Y are the standard deviations of X and Y. This always results in a value between -1 and 1.

Pearson's Correlation Coefficient
When we have sample data and want to calculate the correlation between normally distributed variables X and Y, we can compute Pearson's correlation coefficient. It is easy to make this calculation in R.

Interpreting Pearson's Coefficient
Pearson's coefficient is a measure of linearity. In other words, it captures whether there exists a linear function (y = mx + b) that captures the covariance between X and Y, and how well the data fit this function. A correlation of 1 means that there exists a linear function y = mx + b with positive slope that perfectly describes the relationship of X and Y; a correlation of -1 means that there exists such a function with negative slope.

Visualizing Perfect Correlation
[Two scatterplots: one showing a correlation of 1, one showing a correlation of -1.] Perfect correlation doesn't happen by chance very often (ever). Here we see positive correlation. While we do not fit a linear function to calculate correlation, it is useful for visualization: the line represents a "best fit," and the correlation score represents how well the data "fit" this line. In this case, r = 0.659.

Interpreting Correlation
There is no objective rule for interpreting the strength of a correlation score.
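The covariance matrix and the correlation calculation can both be sketched with the same mtcars data (the choice of mpg, hp, and wt as the columns is illustrative):

```r
data(mtcars)

# Covariance matrix: supply a data frame (here, three chosen columns);
# the diagonal entries are each variable's variance
cm <- cov(mtcars[, c("mpg", "hp", "wt")])
cm["mpg", "mpg"]   # same as var(mtcars$mpg)

# Pearson's correlation via cor(): always between -1 and 1
r <- cor(mtcars$mpg, mtcars$hp)

# Equivalent to covariance divided by the product of the standard deviations
r_manual <- cov(mtcars$mpg, mtcars$hp) / (sd(mtcars$mpg) * sd(mtcars$hp))

# Unlike covariance, correlation is unchanged by rescaling a variable
cor(mtcars$mpg, mtcars$hp * 1000)
```

Because the standard deviations in the denominator absorb any change of units, the rescaled call on the last line returns the same value as r, which is exactly the standardization property that plain covariance lacks.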
It depends on context, the methods being applied, and the nature of the variables. There are many hand-wavy rules of thumb for interpretation.
