Regression Diagnostics Transcript PDF
Uploaded by MesmerizedPeridot
Griffith University
Summary
This transcript discusses checking for multivariate outliers using Mahalanobis distance. It explains how to identify and deal with multivariate outliers, which can help when the residuals show problems with normality or heteroscedasticity. The video closes by noting that the next video covers checking the normality of variables.
Full Transcript
SPEAKER 0 Hello, everyone. In this video we are going to be checking for multivariate outliers using Mahalanobis distance. This is another thing you can do when you have issues with your residuals in terms of normality or heteroscedasticity; it might help bring things together a bit more. A multivariate outlier is a data point that is outlying because of a combination of scores rather than any individual score. So, for instance, a 17-year-old making $100,000 a year might be a multivariate outlier, not because it's unheard of to be 17 or because earning $100,000 a year is incredibly unusual, but because the combination of those two factors makes it a bit unusual. What Mahalanobis distance does is provide us with a value for the 10 most extreme data points. Each of those values signifies how distant a data point is from the centre of the multidimensional normal distribution that the predictors would create; the larger the value, the further away the point is. The value we are given is evaluated against a chi-square distribution, so, just like you would with a t-test, what we're going to do is compare the obtained value with a critical value to judge whether we have any multivariate outliers. If our obtained value is larger than our critical value, that case would be considered a multivariate outlier. So let's run through the steps involved in this now. Procedurally, it's pretty simple.

So here we have the regression syntax that we've been using so far. We have our variables: maths and age as our predictors, and our defined dependent variable. We've asked for all the things that we want, including the things we need to check our assumptions. To get Mahalanobis distance, it's just a matter of adding a final bit to your residuals subcommand. So, at the end of the residuals line here, just type in OUTLIERS and specify MAHAL, that is, OUTLIERS(MAHAL). That's it. So let's run it and see what we get. Okay, so we can see here we have our standard output: the model summary, the ANOVA table, and the coefficients table with the zero-order, partial and part correlations. Then at the bottom here we have something new.
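To make "how distant a data point is from the centre" concrete, here is a minimal Python sketch of the squared Mahalanobis distance, D² = (x − x̄)ᵀ S⁻¹ (x − x̄), computed for every case from the predictor scores only, where x̄ is the vector of predictor means and S is the sample covariance matrix (n − 1 denominator). This is not SPSS's internal code: the function names (`mean_and_covariance`, `solve`, `mahalanobis_d2`, `most_extreme`) and the age/income numbers are hypothetical, chosen to echo the 17-year-old example. It sticks to the standard library, so it solves a small linear system instead of explicitly inverting S.

```python
from typing import List, Sequence, Tuple


def mean_and_covariance(rows: Sequence[Sequence[float]]) -> Tuple[List[float], List[List[float]]]:
    """Column means and sample covariance matrix (n - 1 denominator)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
            for b in range(p)] for a in range(p)]
    return means, cov


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, a[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def mahalanobis_d2(rows: Sequence[Sequence[float]]) -> List[float]:
    """Squared Mahalanobis distance of every case from the centroid of the predictors."""
    means, cov = mean_and_covariance(rows)
    d2 = []
    for r in rows:
        diff = [r[j] - means[j] for j in range(len(means))]
        z = solve(cov, diff)  # z = S^-1 (x - mean), without forming the inverse
        d2.append(sum(d * w for d, w in zip(diff, z)))
    return d2


def most_extreme(rows: Sequence[Sequence[float]], k: int = 10) -> List[Tuple[int, float]]:
    """The k cases with the largest D^2, most extreme first (mirrors the 10-case table)."""
    d2 = mahalanobis_d2(rows)
    order = sorted(range(len(d2)), key=lambda i: d2[i], reverse=True)
    return [(i, d2[i]) for i in order[:k]]


# Hypothetical (age, income) cases; the last one mirrors the 17-year-old
# earning $100,000 a year from the example.
cases = [(25, 40000), (32, 55000), (45, 80000), (51, 90000), (38, 62000),
         (29, 47000), (60, 95000), (22, 30000), (41, 70000), (17, 100000)]
for idx, d2 in most_extreme(cases, k=3):
    print(idx, round(d2, 2))
```

With these made-up numbers the 17-year-old (index 9) comes out on top, at roughly 7.91. The printed values play the role of the obtained values in the output table, which still have to be compared with a chi-square critical value.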
So this is the Mahalanobis distance. You can see it has listed the 10 most extreme points, starting with the most extreme one and going down. What we're interested in is this value here under Statistic. This is our obtained value, which we're going to compare to a critical value from a chi-square table. The chi-square table will be on the Griffith course site for you to access at any time; you can even download it and use it when you need it. When you get to your assignment, you'll have to do it this way. Now, our highest value in this case is 9.53, so we go to our chi-square table and find the critical value. For Mahalanobis distance we use the .001 probability level, and we just track across from the degrees of freedom to the corresponding value. The degrees of freedom for a multivariate outlier check correspond simply to the number of predictors. In this case we have two predictors, age and maths, so we have two degrees of freedom. So we just go across from 2 all the way over to .001, and here we have our critical value (13.82). If our obtained value is larger than this, we consider that case a multivariate outlier. In our case, our largest value was 9.53, so we would simply say we do not have any multivariate outliers and move on. However, if in your assignment you do find one, things are different. Multivariate outliers are problematic enough that they tend to have some sort of effect, not to be dramatic, that we might not otherwise discover, because we can't really see exactly what the effect is. So when you find one, just remove it from the data set and rerun the analysis, and then say so in your results section, for example: "We found a couple of outliers according to Mahalanobis distance and removed them; this is the effect that had." Okay, that is it for multivariate outliers. In the next video we will be looking at normality of variables. So we've included
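The decision rule described here (largest obtained value versus the chi-square critical value at .001, with degrees of freedom equal to the number of predictors) can also be sketched in Python. This is a minimal illustration, assuming integer degrees of freedom: it computes the chi-square upper-tail probability from the standard closed-form recurrence and finds the critical value by bisection instead of reading a printed table. If SciPy is available, `scipy.stats.chi2.ppf(1 - alpha, df)` should give the same number. The helper names (`chi2_sf`, `chi2_critical`, `flag_multivariate_outliers`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        q, k = math.exp(-half), 2            # Q_2(x) = exp(-x/2)
    else:
        q, k = math.erfc(math.sqrt(half)), 1  # Q_1(x) = erfc(sqrt(x/2))
    while k < df:
        # Recurrence: Q_{k+2}(x) = Q_k(x) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
        q += math.exp((k / 2.0) * math.log(half) - half - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return q


def chi2_critical(df: int, alpha: float = 0.001) -> float:
    """Critical value c with P(X > c) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:  # widen the bracket until it contains c
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid  # tail still too heavy: c lies above mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def flag_multivariate_outliers(d2_values: Sequence[float], n_predictors: int,
                               alpha: float = 0.001) -> Tuple[float, List[int]]:
    """Critical value (df = number of predictors) and indices of cases that exceed it."""
    crit = chi2_critical(n_predictors, alpha)
    return crit, [i for i, d2 in enumerate(d2_values) if d2 > crit]


# The largest obtained value in the example output (9.53), two predictors (age and maths).
crit, flagged = flag_multivariate_outliers([9.53], n_predictors=2)
print(round(crit, 2), flagged)
```

This prints a critical value of 13.82 and an empty list, which matches the conclusion of no multivariate outliers. Any indices that do come back are the cases to remove before rerunning the regression.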