Tutorial: Multiple Regression (Transcript)


Summary

This document is a transcript of a tutorial on multiple regression, focusing on concepts and applications in statistics and data analysis. It includes examples and an explanation of the regression equation. It's aimed at an undergraduate level.

Full Transcript


SPEAKER 0: Dr David Riley here, and today I'm going to be taking you through multiple regression. Last week we gave you a gentle introduction to multiple regression. We introduced, of course, bivariate regression, which involves a single predictor, and we mentioned last week that it would be possible to extend that to include multiple predictors. This week you'll actually get some practise with it, and we'll be using the tutorial data set again, the attitudes to statistics, and I'll give you a go at actually interpreting a multiple regression.

Multiple regression, like bivariate regression, involves the calculation and prediction of scores, but instead we use multiple predictors. All things being equal, if we have multiple predictors we're going to get a better estimate of what a participant's score will be. So instead of running a regression for each predictor separately, we can run one regression with multiple predictor variables.

So remember the equation: y-hat = b0 + b1X1, where b0 is your intercept and b1 is your slope coefficient, multiplied by predictor one. Well, for multiple regression we just extend it. You can see here that we're daisy-chaining as many extra predictors as we need. In this example we've got three predictors: y-hat = b0 + b1X1 + b2X2 + b3X3. Your constant remains b0; that's the value of Y when the predictors are equal to zero. For predictor one we've got the slope coefficient b1 and X1, for predictor two b2 and X2, and for predictor three b3 and X3. But there's no reason why we couldn't have a fourth, a fifth or even a sixth predictor, depending on the types of variables we've analysed and our theoretical justification for including them.

So I'm going to give you an example of multiple regression, because sometimes when you've got a concrete example it's easier to understand. Imagine we have this research question: what factors predict more positive attitudes towards video games? Well, we could have some predictors here based on our reading of the literature: that there would be age effects, whether someone plays video games at all, and in particular the amount of time someone spends playing video games; that might be a predictor of more positive attitudes. The outcome variable, the dependent variable here, is still attitudes towards video games.

If we're going to pop this into SPSS, we have some syntax here: REGRESSION, and in that first line we tell SPSS what variables we're going to include. So we're going to include video game attitudes, age, whether or not you're a gamer (dummy coded), and the amount of time spent video gaming. We'll go through all this syntax in greater detail later on in the tutorial, but for now just understand that we're identifying all the variables we include in our analysis in the very first line of that syntax (a sketch of what that syntax might look like is shown below).

So we might get some output like this. We can see here our R squared, which is the proportion of variance in the outcome that can be accounted for by all the predictor variables combined. It might be age accounting for most of this, or it could be the amount of time, or it could be the dummy code for whether you are a gamer, but when we add them all together this is the combined variance they explain.
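Here is a sketch of what that first piece of syntax might look like. The variable names (vg_attitudes, age, gamer, vg_hours) are placeholders for illustration, since the recording describes the variables rather than naming them:

    * Multiple regression for the video game example (placeholder variable names).
    * The /VARIABLES line lists every variable in the analysis, /DEPENDENT names the outcome,
      /METHOD=ENTER enters all remaining listed variables as predictors, and ZPP requests
      the zero-order, partial and part (semi-partial) correlations discussed later.
    REGRESSION
      /VARIABLES=vg_attitudes age gamer vg_hours
      /STATISTICS=DEFAULTS ZPP
      /DEPENDENT=vg_attitudes
      /METHOD=ENTER.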
SPEAKER 0: At this stage, though, we don't know which, if any, of the predictors are significant. You can also see on the right that we've got a pie chart, which shows how much of the variance is accounted for by age, by whether or not you're a gamer, and by the amount of time spent, and you can see we've got the remaining 22.7% which isn't accounted for; it's unexplained variance. We can see that collectively all of our predictors do explain a significant amount of variance, because this p value is less than 0.05; in fact it's actually less than 0.001.

So there we have our coefficients table, and that gives us more information on the calculation of the regression equation. We've got our unstandardised coefficients here, and you can see on the left that we've actually listed the variable names. Now, age is obviously going to be in different units (it's in years), whereas the amount of time spent video gaming might be in hours or minutes, probably hours. The beauty of the regression equation is that it doesn't matter: it adjusts these b values to compensate for whatever units of measurement we're using. But when we want to compare things like age and amount of time on a standardised unit, we have this column here, the standardised coefficients or the beta weights; that little symbol there that looks like a B with a funny tail is actually the Greek letter beta. This places all the predictors on the same scale, which makes it a lot easier to compare them. A larger beta weight indicates a stronger predictor.

So having a look at the three variables, age, gamer and time spent video gaming, which of these is the strongest predictor? Can you see it? The amount of time spent video gaming is the strongest predictor, followed by whether or not you're a gamer. When we look at age, although there definitely is an effect there, it's relatively small.

We also have this column here for statistical significance. In the statistical significance column we can actually test the null hypothesis that there is no association between each of these predictors and the outcome. So, for example, for age, although there was a negative correlation here, it fails the test of statistical significance because the p value is greater than 0.05. So we would retain the null hypothesis and say that, when we're evaluating the contributions of age, gamer and time spent video gaming, age doesn't make a significant unique contribution. We can see that gamer is less than 0.05, as is time spent video gaming.
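For reference, since the slide does not spell it out, the standardised beta weight is just the unstandardised slope rescaled by the standard deviations of the predictor and the outcome, which is what puts every predictor on the same scale:

    beta_j = b_j * (SD of X_j / SD of Y)

So a beta of, say, 0.40 would mean that a one standard deviation increase in that predictor is associated with a 0.40 standard deviation change in the outcome, holding the other predictors constant.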
SPEAKER 0: We also have this extra column here, zero-order, partial and part, which we'll go through later in the tutorial. If you want to generate that in SPSS, you just add ZPP to your syntax. The ZPP, or zero-order, partial and semi-partial column, gives us additional information about the correlations in our model. The zero-order is the bivariate Pearson's r correlation between the predictor and the outcome variable. The beauty of this is that it gives you the same output as if you'd just gone to the Correlate menu and run a bivariate correlation, so it's not taking into account the contribution of the other variables in the regression equation. The partial, on the other hand, is the correlation between the predictor and the outcome with the effects of all the other predictors removed, or partialled out. We don't generally use this, so unless you have a specific reason for reporting it, I would omit it. We do pay attention to the semi-partials. In fact we actually square them, and so we call them the squared semi-partials. The semi-partial is the correlation between the predictor and the outcome with all the other predictors partialled out, and we square it to give the percentage of variance explained, the unique effect of the predictor. So, for example, if I take that bottom one there, which was 0.269, and square it, let's get our trusty calculator: 0.269 squared accounts for about 7%.

All right, so just a refresher on the theory here. The partial correlation is the correlation between the predictor and the outcome variable with the effect of all the other predictors removed; we actually take them out. The semi-partial, which is what we're really interested in, is the correlation between the predictor and the outcome with the effect of the other predictors removed only from the focal predictor. A lot of jargon, but basically it gives the unique percentage of variance, the unique effect, for example, of being a gamer or of the amount of time spent gaming. So what is the difference between the partial and the semi-partial? With the partial, here on the left, the effect of the other predictors, for example age, is removed from both the focal predictor and the outcome. But with the semi-partial, only the overlap between X1 and the other predictors is removed, so it basically gives you the unique effect of that particular predictor.

So we can see here we've got the zero-order. Even though our age correlation here on the left, in the top row, is not statistically significant, the zero-order actually does correlate fairly strongly with the outcome; there is a negative correlation between age and video gaming. But when we consider the effect of the other variables, they overlap quite considerably. That's why you're seeing different values for the zero-order compared to the squared semi-partials here on the right: the unique contribution of age is actually much, much smaller.

Just remember, the zero-order is the Pearson correlation between each predictor and the dependent variable, without taking into consideration the contribution, the variance explained, of the other predictors. So we can see we've got a negative correlation between age and video game attitudes. We have a positive correlation between being a gamer and your video game attitudes, which kind of makes sense: if you didn't like gaming and had a negative attitude, you probably wouldn't be a gamer. And there's a positive correlation with the amount of time spent.
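To make that squaring step concrete, using the value read out above, and the standard identity that a squared semi-partial equals the increase in R squared when that predictor is entered last:

    sr^2 (time spent gaming) = 0.269^2 ≈ 0.072, i.e. about 7% of the variance is unique to that predictor
    in general: sr^2(j) = R^2(all predictors) - R^2(all predictors except j)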
SPEAKER 0: But when we consider all three of those predictors and throw them into the regression equation, the unique contributions here, the squared semi-partials, are a whole lot smaller, because some of these predictors overlap with each other. That squared semi-partials column gives the unique variance accounted for by each predictor. So if we take age here, negative 0.116, and square it, that's about 1.34% of the variance, so really tiny. If we have a look at being a gamer and the amount of time spent, you can see they represent much larger values. So we can see age, gamer and time spent video gaming, and the remainder is actually shared variance across age, being a gamer and the amount of time. The amount of variance that's shared across these three is actually quite considerable, whereas the unique contribution of each one is relatively small.

So hopefully that's a good refresher of multiple regression. In a moment we're going to show you how to actually run a multiple regression in SPSS.

Now let's have a look at the tutorial data file, which is attitudes to statistics. If we click on the Variable View, we can see we've got our individual items as well as the variables we created using the Transform command. We've got attitudes to statistics as a field, we've got attitudes to doing a statistics course, we've got a variable called hedonic, we've got maths BC, whether you did basic or advanced maths, we have age, which is a continuous variable of age in years, and we also have gender, which is dummy coded.

So we now need to open a new window to record some syntax. I'm going to select the syntax here in the workbook and copy it. Going to SPSS, we open a new window, go to Syntax, and I've pasted that in. Let's take it through nice and slowly. The very first part of the REGRESSION command specifies the variables that are involved. So we're going to be looking at attitudes to a statistics course, we're going to be looking at gender, which we've dummy coded, we're going to look at age, and also whether you've done basic or advanced mathematics, maths BC.

Now, SPSS is kind of dumb: you've got to drive it and tell it what to do. So we've actually got to specify in this third line what the dependent variable is. It could have been any of those variables; you just have to make sure that your dependent variable is also included in your line of variables. So /DEP equals course, and then that /ENTER tells SPSS which of the remaining variables go into the regression equation. Unless I list a specific set after /ENTER, it will add any other variables that are left in the variables list.

The other line we need to have a look at is this one here. By default we want the basics, your R squared and the coefficients, so we just do /STATISTICS equals DEFAULTS. We also add ZPP, which gives us our zero-order, partial and semi-partial correlations. I'm going to run that now; click on play.
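Putting those pieces together, the syntax block being described might look something like this (the variable names course, female, age and mathsBC follow the description in the recording; the actual names in the data file may differ):

    * Standard multiple regression: attitudes to a statistics course predicted
      from gender, age and maths background. DEFAULTS gives R squared, the ANOVA
      table and the coefficients; ZPP adds the zero-order, partial and part
      (semi-partial) correlations.
    REGRESSION
      /VARIABLES=course female age mathsBC
      /STATISTICS=DEFAULTS ZPP
      /DEPENDENT=course
      /METHOD=ENTER.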
SPEAKER 0: We're going to have a look at the output now. The very first table tells us what variables were entered into the regression equation: maths BC, age and female. Note that we couldn't enter the variable course, because that's our dependent variable. We have our R squared in the model summary here. If I have a look at this column, it tells us that overall, maths BC, age and being dummy coded as male or female explain 15.8 per cent of the variance in the outcome, about 16%. We know, too, that our overall model here is statistically significant, because the significance for the ANOVA is actually less than alpha, 0.05. That says that overall these three predictors do explain some variance in the model, but we have no guarantee about which of the predictors is significant.

Now, the final table here is our coefficients. We can see on the left we've got our regression equation: the intercept, and this is the slope for female, this is the slope for age, and this is the slope for maths BC. And then we've got the values we want to use to test the null hypothesis. So is there a significant effect of gender, a unique effect of gender? We can see that this one here is actually greater than 0.05, so there's no unique contribution of gender; the same with age, and the same with maths BC.

So what might be going on here? We can see now that our individual predictors do not make a significant unique contribution, yet our bivariate correlations would suggest they do, and we do have an overall significant model in our ANOVA. So what might be going on? Well, there are a few things we need to evaluate. The first is the combined explanation of all of these predictors, maths BC, age and gender. Overall they clearly do explain a fair degree of variance, approximately 16%, so the overall model was significant. But the individual predictors are not significant in their own right. What's happening is that we've got some overlap between being female and, for example, maths BC; males are generally more likely to have studied maths BC, so some of that variance is overlapping, and for this sample we also have a bit of an overlap with age. So when we consider each of these three predictors individually, we're not seeing a significant unique contribution, but collectively they do make a significant contribution. It's just that none is strong enough in its own right to emerge as a statistically significant predictor.

Let's go through our workbook now: steps for interpreting a standard multiple regression. The first thing to examine in the output is the overall F ratio. If this is significant, it means that the predictor variables can be used in combination to predict the criterion, or dependent, variable. That is clearly met here. We've shown you how to get R squared, which in this case is 15.8 per cent. And then we come to the individual predictors: the b weights, which remember are our unstandardised coefficients, the standardised coefficients, and then our tests of statistical significance. So even though none of these is individually significant (age is fairly close), it might partially be down to our low sample size here; we've only got, I think, about 68 participants.
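As a quick sanity check on that overall F ratio, here is a rough calculation using the figures mentioned in the recording (an R squared of about 0.158, roughly 68 participants, three predictors), so the numbers are approximate:

    F = (R^2 / k) / ((1 - R^2) / (n - k - 1))
      = (0.158 / 3) / (0.842 / 64)
      ≈ 4.0, with df = (3, 64)

That is comfortably past the 0.05 critical value of roughly 2.75, which is consistent with the significant ANOVA in the output.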
SPEAKER 0: If we had a bigger sample, these might well be statistically significant. So we haven't got enough evidence to reject the null hypothesis and say that age makes a unique contribution, but there is certainly a shared contribution of age, maths BC and gender here that does explain our model.

And how much of the variance? Well, we can have a look at the zero-order correlations and then the squared semi-partials, the sr squared. Let's compare them. I'm going to take gender first: our r is negative 0.235, and if we were to square that, it explains about 5.5% of the variance. Now let's see how much unique variance gender explains: the semi-partial is 0.154, and squaring that gives only about 2.4% of the variance. That means its unique variance is much lower than its zero-order, and it must be sharing some variance with the other predictors. It's the same for age: the zero-order is 0.278, and 0.278 squared is about 7.7%. Now let's have a look at its unique variance: 0.223 squared is only about 5% unique variance. That's still a meaningful amount, and had this been a larger sample size we might well have found it to be statistically significant. Now have a look at the third predictor here: the semi-partial is 0.216, and 0.216 squared is about 4.6%, or 4.7% with rounding. So had this been a much larger sample size, we might well have found that the test of statistical significance showed a unique contribution.

That's one of the things we have to be mindful of when we're running an experiment: that we have adequate statistical power to detect a meaningful effect. The more predictors we add to the regression equation, the more the variables are going to be sharing variance. So if you want to have statistical power, you also need to consider the sample size you're collecting and how many predictors you have. Otherwise you could throw in as many predictors as you like, but if you do that you'll often lack the statistical power to find a unique contribution, like the effect of age or maths BC here. So that's the semi-partials.

In next week's tutorial we're going to show you how to check your data for problematic data points and violations of the assumptions of multiple regression. We actually haven't done that yet, and it would generally be the first thing you do before you take a peek at your data. But we did want to expose you to multiple regression so you can practise this statistical technique for your assignment. So regardless of which predictors you have chosen for the assignment, you can actually enter those predictors once you have a data file and then test whether they make a significant unique contribution, just as we've done today with our multiple regression on gender, age and maths BC.
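As a closing footnote to the squared semi-partials discussed above, here is how those rounded figures fit together on the decomposition the tutorial describes (approximate, since the values are read off the output):

    unique variance ≈ 2.4% (gender) + 5.0% (age) + 4.7% (maths BC) ≈ 12.1%
    shared variance ≈ R^2 - sum of sr^2 ≈ 15.8% - 12.1% ≈ 3.7%

So roughly 4% of the explained variance is shared among the three predictors rather than unique to any one of them, which is the overlap that stops each predictor from reaching significance on its own.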
