3003PSY Survey Design and Analysis in Psychology - Regression Assumptions PDF

Summary

This Griffith University mini-lecture covers regression assumptions, focusing specifically on residuals. The lecture explains how to interpret residuals and address violations. It touches on normality, homoscedasticity, and independence of residuals in the context of regression analysis.

Full Transcript

3003PSY Survey Design and Analysis in Psychology
TESTS OF ASSUMPTIONS

REGRESSION ASSUMPTIONS
• Regression assumptions are actually made about the residuals (i.e., the e in Y = b0 + b1x1 + e)
• We can't do anything directly to the error, as it is what is left over after the line (i.e., predicted Y) has been fit
• Regression will still "work" for nonlinear relationships, but interpretations of parameters at face value may be meaningless

• Normality
  • residuals (errors) are normally distributed with mean = 0
  • often described as centred around 0 (zero)
• Homoscedasticity
  • constant variance of residuals (across predicted scores)
• Independence of errors
  • the residuals are uncorrelated with Y
• Linearity of the relationship

If we want the benefits of regression to answer our RQ, we have to pay the costs of ensuring that our data meet the underlying assumptions.

RECAP OF HOW MR PARTITIONS SCORES
• We predict the actual (Y) score from the Xk predictors: what Y "should" be
• A predicted score (Y′) is derived from this
• Unless the correlation is perfect, Y ≠ Y′
  • Y - Y′ = e
• e = error/residual = whatever is left over in Y that Y′ doesn't account for
• Y = Y′ + e
• Y = Predicted + Residual

• Each person "has":
  • an actual (Y) score
  • a predicted Y′ score
  • a residual "score"
• If the regression equation has underestimated the actual score, the residual will be positive
• If instead the true score was overestimated, the residual will be negative

GRE_Q dataset: first 4 cases (GRE_Q as predictor)

Stats_Exam (Y)   Predicted (Y′)   Residual (e)   Zpredict   Zresid
65               76.26828         -11.26828      1.34171    -1.17814
73               74.91639          -1.91639      1.13125     -.20037
85               74.24045          10.75955      1.02602     1.12495
80               74.24045           5.75955      1.02602      .60218

The assumptions are about the residuals, but we use the standardised scores and regress the standardised residuals on the standardised predicted scores. We don't need to do this separately: we add it to the multiple regression syntax. We cover this in the tutorials.

WHY RESIDUALS "MATTER"
• The residuals can take on distinctive patterns if there is something systematic and amiss
• If there were only true score and random error in our data, and we hadn't omitted any important predictors of our outcome, we would expect nice, neat, normally distributed residuals centred around zero
• Anything else means that there might be a problem

NORMALITY
The assumptions are about the residuals, but we use the standardised scores.

HOMOSCEDASTICITY: A STYLISED FIGURE
• When we examine our residuals against predicted scores
• And we have random error (i.e., error is independent of association with the criterion)
• And we haven't left out important predictors
• We should have a rectangular shape
• We have met the assumption of homoscedasticity

• The residuals should be evenly scattered above and below zero
• The range of residuals around zero should be narrow
• The larger this range, the worse the prediction

• If, however, our residuals look like a funnel or a fan, this suggests that we have not met the assumption of homoscedasticity
• The distribution of residuals across the range of predicted values of Y is not even

Why is prediction worse here than here?
• Such a pattern suggests that there is worse prediction at low predicted values of Y than at high predicted values of Y
• Notice the greater variability of residuals at low predicted scores on Y relative to the lower variability at high predicted scores
• This often indicates skew in one or more predictors, but not always

[Figure: Interpreting residuals: homoscedasticity]

HOW DO WE ADDRESS VIOLATIONS?
• The following topic will deal with the situation in which inspection of the residual plots suggests that one or more regression assumptions have been violated
• Short of using a different statistical procedure entirely, we have 2 courses of action open to us:
  1. We may find one or more outlying data points on one or more variables that are unduly influencing the regression analysis and leading to potentially erroneous conclusions
  2. One or more predictors may deviate greatly from normality (skewness and kurtosis). Again, this may lead to errors.
     a) In this case, we generally apply one or more transformations of the data (in turn, not concurrently).
     b) We replace the actual variable with the transformed one in the regression.
     c) Certain transformations may help alleviate skewness (caution, hit and miss!)

Outliers and transformations are covered in the next topic and in the tutorials.

SUMMARY
• Regression is a powerful analytic approach that requires consideration of underlying assumptions
• Assumptions are made with regard to the residuals
• We assume normality of residuals, homoscedasticity, independence of residuals, and linearity
• These assumptions are tested by requesting additional information in our syntax
• Violations of assumptions may reflect skew, outliers, and/or non-linear associations
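
As a rough illustration of the partition Y = Y′ + e and of the residual checks described in this lecture, here is a minimal Python sketch. It is not the syntax used in the tutorials. The data are simulated (they are not the GRE_Q dataset), and the helper names fit_line, partition, standardise and residual_spread_by_half are hypothetical. The standardised values are plain z-scores, so they may differ slightly from the Zpredict and Zresid columns that a statistics package reports.

```python
import random
import statistics


def fit_line(x, y):
    """Least-squares estimates (b0, b1) for Y = b0 + b1*x + e."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1


def partition(x, y):
    """Split each actual score into a predicted score (Y') and a residual (e)."""
    b0, b1 = fit_line(x, y)
    predicted = [b0 + b1 * xi for xi in x]
    residual = [yi - pi for yi, pi in zip(y, predicted)]
    return predicted, residual


def standardise(values):
    """Plain z-scores: subtract the mean, divide by the sample SD."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [(v - m) / s for v in values]


def residual_spread_by_half(z_pred, z_resid):
    """SD of residuals for cases at or below vs. above the median predicted score.

    Similar values are consistent with a rectangular residual plot;
    very different values hint at a funnel or fan shape.
    """
    cut = statistics.median(z_pred)
    low = [r for p, r in zip(z_pred, z_resid) if p <= cut]
    high = [r for p, r in zip(z_pred, z_resid) if p > cut]
    return statistics.stdev(low), statistics.stdev(high)


# Simulated (hypothetical) data: one predictor plus random, constant-variance error.
rng = random.Random(0)
x = [rng.uniform(400, 800) for _ in range(200)]
y = [30 + 0.06 * xi + rng.gauss(0, 8) for xi in x]

predicted, residual = partition(x, y)
z_pred = standardise(predicted)
z_resid = standardise(residual)
sd_low, sd_high = residual_spread_by_half(z_pred, z_resid)

print("mean residual:", statistics.mean(residual))
print("residual SD (low predicted, high predicted):", sd_low, sd_high)
```

A positive residual means the equation underestimated that case, and a negative one means it overestimated. Comparing the two SDs is only a crude numerical stand-in for inspecting the residual plot, not a formal test.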
