Research Design and Statistics Lecture 3 - Bivariate Correlation and Regression PDF
Summary
This document presents a lecture on bivariate correlation and regression, detailing concepts like variance, covariance, and correlation coefficients. It also includes a decision tree flowchart for choosing the appropriate statistical test based on several factors, such as data type and number of predictors.
Full Transcript
Research Design and Statistics [RDS] Lecture 3: Correlation and Simple Regression

What we will do today
- Variance
- Covariance
- Correlation (as a model): parametric, non-parametric, assumptions
- Simple regression (as a model)

Correlation [Chapter 8, Andy Field, 5th Ed.]
The general approach is that our outcomes can be predicted by a model, and what remains is the error. For correlation, the model is that our outcome is modelled by scaling (multiplying by a constant) another variable:

outcomeᵢ = b₁Xᵢ + errorᵢ

This equation means "the outcome for an entity is predicted from their score on the predictor variable plus some error". The model is described by a parameter, b₁, which in this context represents the relationship between the predictor variable (X) and the outcome.

Decision tree - our learning framework
[Flowchart slide: choose a statistical test by answering six questions.]
1. What sort of measurement? CONTINUOUS (CONT) or CATEGORICAL (CAT)
2. How many predictor variables? ONE or TWO (or more)
3. What type of predictor variable? CONT, CAT or BOTH
4. How many levels of the categorical predictor? TWO or MORE THAN TWO
5. Same (S) or Different (D) participants for each predictor level?
6. Meets assumptions for parametric tests? YES or NO
The branches end at: Pearson correlation or regression, independent t-test, dependent t-test, one-way independent ANOVA, one-way repeated measures ANOVA, factorial ANOVA, factorial repeated measures ANOVA, factorial mixed ANOVA, ANCOVA, multiple regression, logistic regression, log-linear analysis and the chi-squared test (parametric and categorical-outcome routes), with Spearman, Mann-Whitney, Wilcoxon, Kruskal-Wallis and Friedman as the non-parametric alternatives.

Variance
We have already looked at dispersion in terms of the standard deviation, and we can look at the relationship between the standard deviation and variance; it is a simple relationship. Variance is a feature of the outcome measurements we acquire that we want to be able to predict with a model that captures the effect of the predictor variables we have manipulated or measured. Today we will be looking at the case where we have measured a value of an outcome and a predictor variable for each individual.

Variance
This is the expression for variance:

s² = Σ(xᵢ − x̄)² / (N − 1)

Note that it is the standard deviation squared. That means it captures the average of the squared differences of the outcome values from the mean of all outcomes.

Covariance
This is the expression for covariance:

cov(x, y) = Σ(xᵢ − x̄)(yᵢ − ȳ) / (N − 1)

Note how similar it is to the variance.

An example (visits to the pub vs. exam score; mean visits = 2.27, mean exam score = 64.27)

| participant | visits to the pub | mean difference | squared difference | exam score | mean difference | squared difference |
|---|---|---|---|---|---|---|
| 1 | 0 | -2.27 | 5.14 | 55 | -9.27 | 85.87 |
| 2 | 0 | -2.27 | 5.14 | 48 | -16.27 | 264.60 |
| 3 | 1 | -1.27 | 1.60 | 58 | -6.27 | 39.27 |
| 4 | 1 | -1.27 | 1.60 | 55 | -9.27 | 85.87 |
| 5 | 1 | -1.27 | 1.60 | 62 | -2.27 | 5.14 |
| 6 | 2 | -0.27 | 0.07 | 68 | 3.73 | 13.94 |
| 7 | 2 | -0.27 | 0.07 | 65 | 0.73 | 0.54 |
| 8 | 2 | -0.27 | 0.07 | 62 | -2.27 | 5.14 |
| 9 | 2 | -0.27 | 0.07 | 58 | -6.27 | 39.27 |
| 10 | 2 | -0.27 | 0.07 | 68 | 3.73 | 13.94 |
| 11 | 3 | 0.73 | 0.54 | 62 | -2.27 | 5.14 |
| 12 | 3 | 0.73 | 0.54 | 75 | 10.73 | 115.20 |
| 13 | 4 | 1.73 | 3.00 | 85 | 20.73 | 429.87 |
| 14 | 5 | 2.73 | 7.47 | 68 | 3.73 | 13.94 |
| 15 | 6 | 3.73 | 13.94 | 75 | 10.73 | 115.20 |
| Sum | | 0.00 | 40.93 | | 0.00 | 1,232.93 |
| Variance | | | 2.92 | | | 88.07 |
| Stdev | | | 1.71 | | | 9.38 |

Covariance - what we do
1. Calculate the error between the mean and each subject's score for the first variable (x).
2. Calculate the error between the mean and their score for the second variable (y).
3. Multiply these error values.
4. Add these values and you get the sum of the product deviations.
5. The covariance is the average of the product deviations (see the sketch below).
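The following is a minimal Python sketch (not part of the original slides) of the variance and covariance steps above, using the pub-visits / exam-score data from the worked example; the variable names are my own.

```python
# Minimal sketch of the variance/covariance calculation, using the
# pub-visits / exam-score data from the worked example (n = 15).
visits = [0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 4, 5, 6]
exam   = [55, 48, 58, 55, 62, 68, 65, 62, 58, 68, 62, 75, 85, 68, 75]

n = len(visits)
mean_x = sum(visits) / n   # ≈ 2.27
mean_y = sum(exam) / n     # ≈ 64.27

# Steps 1 and 2: deviation of each score from its variable's mean
dev_x = [x - mean_x for x in visits]
dev_y = [y - mean_y for y in exam]

# Variance: average squared deviation (dividing by n - 1 for a sample)
var_x = sum(d * d for d in dev_x) / (n - 1)   # ≈ 2.92
var_y = sum(d * d for d in dev_y) / (n - 1)   # ≈ 88.07

# Steps 3-5: multiply the paired deviations, sum them, and average them
cov_xy = sum(dx * dy for dx, dy in zip(dev_x, dev_y)) / (n - 1)

print(var_x ** 0.5, var_y ** 0.5, cov_xy)   # stdevs ≈ 1.71 and 9.38
```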
Covariance
This is the expression for covariance (given above). The covariance will be large and positive when values below the mean on one variable tend to occur with values below the mean on the other (and values above the mean with values above the mean).

Revisit our example (the pub-visits / exam-score data above).

Standardizing Covariance
Problem: covariance is dependent upon the units of measurement, so we need to STANDARDISE it. We standardise by dividing by the product of the standard deviations of both variables: r = cov(x, y) / (sₓ·s_y). The standardized version of the covariance is the CORRELATION COEFFICIENT, or Pearson's r.

Pearson Correlation Coefficient
- It varies between -1 and +1 (the sign gives the direction of the relationship); 0 = no relationship.
- It measures the strength of the relationship between one variable and another, hence its use in calculating effect size:
  ±.1 = small effect
  ±.3 = medium effect
  ±.5 = large effect
- As sample size increases, the value of r at which a significant result occurs decreases.

Correlation
For correlation the model is, again, outcomeᵢ = b₁Xᵢ + errorᵢ: the outcome for an entity is predicted from their score on the predictor variable plus some error, and b₁ represents the relationship between the predictor (X) and the outcome. We have now learned that the correlation coefficient gives the ratio of the covariance to a measure of variance (the product of the two standard deviations).

Examples of correlations
[Slide with example scatterplots.]

Coefficient of Determination, r²
r-squared can be used to calculate the amount of shared variance:
- r = .1, r² = .01 (1%)
- r = .3, r² = .09 (9%)
- r = .5, r² = .25 (25%)
- r = .9, r² = .81 (81%)
It gives you the true strength of the correlation, but without an indication of its direction.

Decision tree - our learning framework
[Decision-tree flowchart repeated; see above.]

Different types of correlation (non-parametric data; a code sketch follows below)
- Spearman's ρ (rho, r_s)
  ○ Used when variables are not normally distributed and the measures are on an ordinal scale (e.g. grades).
  ○ Works by first ranking the data (numbers converted into ranks) and then running Pearson's r on the ranked data.
- Kendall's τ (tau)
  ○ For small datasets with many tied ranks.
  ○ A better estimate of the correlation in the population than Spearman's ρ.
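A short sketch (not in the original slides) of the three coefficients just described, assuming SciPy is available; it reuses the pub-visits / exam-score data from earlier.

```python
# Pearson's r (standardized covariance), Spearman's rho (Pearson's r on
# ranked data) and Kendall's tau, assuming SciPy is installed.
from scipy import stats

visits = [0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 4, 5, 6]
exam   = [55, 48, 58, 55, 62, 68, 65, 62, 58, 68, 62, 75, 85, 68, 75]

r, p = stats.pearsonr(visits, exam)           # parametric
rho, p_rho = stats.spearmanr(visits, exam)    # non-parametric, rank-based
tau, p_tau = stats.kendalltau(visits, exam)   # non-parametric, handles many tied ranks

# r squared gives the proportion of shared variance (coefficient of determination)
print(f"r = {r:.2f}, r^2 = {r * r:.2f}, rho = {rho:.2f}, tau = {tau:.2f}")
```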
Decision tree - our learning framework
[Decision-tree flowchart repeated; see above.]

Simple Regression [Chapter 9, Field, 5th Ed.]
- Regression is a way of predicting things that you have not measured.
- Predicting an outcome variable from one predictor variable, OR predicting a dependent variable from one independent variable.
- Used to create a linear model of the relationship between two variables.

Features of the model for simple regression analysis
The straight line used for the model has two parameters:
- The gradient, describing how the outcome changes for a unit increment of the predictor.
- The intercept (of the vertical axis), which tells us the value of the outcome variable when the predictor is zero.

Regression: An Example
Album salesᵢ = b₀ + b₁ × Advertising budgetᵢ + errorᵢ
- Outcome variable: album sales. Predictor variable: advertising budget.
- b₀ = intercept; b₁ = regression coefficient for the predictor, giving the direction and strength of the relationship between advertising budget and album sales.
- Error: album sales not explained by the advertising budget.

Is the model any good?
The model sum of squares is SS_M = SS_T − SS_R. The closer the sum of squares of the model is to the total sum of squares of the data, the better the model accounts for the data, and the smaller the residual sum of squares must be.

Capturing how good the model is with R²
- R² = SS_M / SS_T: the proportion of variance accounted for by the regression model.
- It is the Pearson correlation coefficient squared, i.e. the coefficient of determination.
- It captures the overall fit of the model (reported in the model summary).
- Adjusted R² indicates how well R² generalizes to the population.

Testing the Model: F-ratio
- The F-ratio tests whether the line (the fitted model) is better than the mean, i.e. whether the overall model is a good fit.
- Sums of squares are total values; they can be expressed as averages, called mean squares (MS).
- MS_M = SS_M / df_M, where df_M = the number of variables in the model.
- MS_R = SS_R / df_R, where df_R = the number of observations minus the number of model parameters.
- F = MS_M / MS_R (a code sketch appears after the assumptions below).

Model parameters
- Line coefficients: intercept = b₀; beta (slope) = b₁.
- b₁ = the change in the outcome associated with a unit change in the predictor (independent) variable.
- The standard error indicates how far off you would be, on average, if you were to use the model to predict scores on the dependent variable.

Assumptions
- Variable type: the outcome must be continuous; predictors can be continuous or dichotomous.
- Non-zero variance: predictors must not have zero variance.
- Independence: all values of the outcome should come from a different person.
- Linearity: the relationship we model is, in reality, linear.

Assumptions (continued)
- Homoscedasticity: for each value of the predictors, the variance of the error term should be constant.
- Independent errors: for any pair of observations, the error terms should be uncorrelated (see the Durbin-Watson test).
- Normally-distributed errors.

Homoscedasticity
[Slide with example residual plots labelled GOOD and BAD.]

Normality of Errors
[Slide with example plots labelled GOOD and BAD.]
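Below is a hedged Python sketch (not from the slides) tying together the regression quantities and assumption checks above: it fits a line, recovers b₀ and b₁, computes SS_T, SS_R, SS_M, R² and the F-ratio, and a Durbin-Watson statistic for the independent-errors assumption. The advertising/sales numbers are invented stand-ins for the album-sales example, and NumPy/SciPy are assumed to be available.

```python
# Simple regression sketch: b0, b1, sums of squares, R^2, F-ratio and a
# Durbin-Watson check. The data are invented stand-ins for the
# advertising-budget / album-sales example on the slides.
import numpy as np
from scipy import stats

adverts = np.array([10., 40., 25., 80., 60., 35., 90., 15., 70., 55.])
sales   = np.array([33., 70., 52., 110., 95., 60., 130., 40., 100., 85.])

fit = stats.linregress(adverts, sales)        # fit.intercept = b0, fit.slope = b1
predicted = fit.intercept + fit.slope * adverts
residuals = sales - predicted                 # error not explained by the predictor

n, k = len(sales), 1                          # k = number of predictors in the model
ss_t = np.sum((sales - sales.mean()) ** 2)    # total sum of squares
ss_r = np.sum(residuals ** 2)                 # residual sum of squares
ss_m = ss_t - ss_r                            # model sum of squares (SSM = SST - SSR)
r_squared = ss_m / ss_t                       # proportion of variance explained

ms_m = ss_m / k                               # mean squares = SS / df
ms_r = ss_r / (n - k - 1)
f_ratio = ms_m / ms_r                         # is the line better than the mean?

# Durbin-Watson statistic for the independent-errors assumption
# (values near 2 suggest uncorrelated residuals).
dw = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)

print(f"b0 = {fit.intercept:.2f}, b1 = {fit.slope:.2f}")
print(f"R^2 = {r_squared:.3f}, F = {f_ratio:.2f}, DW = {dw:.2f}")
```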
Does correlation mean causation?
Correlations DO NOT imply causation, even if they make sense (e.g. the advertising-budget example). Spurious correlations can occur when an unknown third variable could drive the effect.
[Scatterplot: exam score against visits to the pub.]

Decision tree - our learning framework
[Decision-tree flowchart repeated; see above.]

Next week
Decision tree - our learning framework
[Decision-tree flowchart repeated; see above.]