PSY 2500 Lecture 16: Correlational Research
Summary
This document contains lecture notes for PSY 2500, a psychology course on research methods. The lecture covers correlational research designs as part of the quantitative research strategies: the direction, form, and strength of relationships between variables, how correlational designs differ from experimental and differential designs, and why correlation does not establish cause and effect.
Full Transcript
PSY 2500 - Lecture 16

Welcome back. Today we're going to start chapter 12. You should have noted in the syllabus and the schedule that chapter 12 is covered before chapter 11. That is intentional, because chapter 12 is on correlational research designs, which is a much simpler concept than chapter 11, which is on factorial designs. So, we're covering this first before getting into more complex designs.

Now, just as a reminder, the five quantitative research strategies are descriptive, correlational, experimental, quasi-experimental, and non-experimental. We've covered experimental, which looks for causal relationships between variables, specifically cause-and-effect relationships, using the concepts of manipulation of an independent variable and control of extraneous variables. We just finished covering quasi-experimental and non-experimental designs. Quasi-experimental designs include pre-post designs and non-equivalent group designs, whereas non-experimental designs look at the relationship between two groups or treatment conditions. Quasi-experimental and non-experimental research strategies are very similar. There's also the descriptive research strategy, which looks for a snapshot, or to describe individual variables as they exist in a given population at a given time. And then today, we're covering the correlational research strategy, which looks for natural relationships between, typically anyway, two variables. So, this will be our focus. However, we will also talk about correlational designs towards the end of the lecture that involve more than two variables.

Just to review quasi-experimental and non-experimental strategies: you cannot, with either of these research strategies, establish unambiguous cause-and-effect relationships. That's because, inherently built into the designs of quasi-experimental and non-experimental strategies, confounding variables cannot be completely eliminated, so they don't have quite the rigor of experimental designs. Quasi-experimental designs attempt to minimize threats to internal validity, or in other words, attempt to minimize confounds and alternative explanations. That's why quasi-experiments are considered almost experiments: they approach the rigor of experimental research. Non-experimental designs are very similar; however, they typically do not minimize threats to internal validity, and so have less rigor than quasi-experimental designs. Thus, the difference between these two approaches is the degree to which they limit confounding variables and control for threats to internal validity, whereby quasi-experimental designs make more of an effort to account for confounding variables and threats to internal validity than non-experimental designs do.

Specifically, we covered nonequivalent group designs, pre-post designs, and developmental research designs. There were three types of nonequivalent group designs, which are between-subjects designs that are either non-experimental or quasi-experimental. The first we covered is called the differential research design. This is non-experimental. The differential research design divides participants up into different groups based on a preexisting characteristic or trait, such as male versus female, and then measures a dependent variable. The posttest-only nonequivalent control group design is also considered non-experimental.
As the name suggests, this involves looking at two nonequivalent groups, such as two different schools, where one has, let's say, implemented a program and the other has not, and involves measuring both of these schools only once and comparing them on your dependent variable. There are, of course, many potential differences between two nonequivalent groups, so confounding variables are always a threat to this design. Finally, the pretest-posttest nonequivalent control group design is considered quasi-experimental. Here, using the same example as previously, let's say we're looking at two nonequivalent groups, such as two different schools. We measure both groups prior to the implementation of a program, so we have a pretest measurement for both. One school gets some kind of treatment or new program, such as a stop-bullying program. We wait a certain amount of time, and both schools are measured a second time. So, we have a treatment group and a nonequivalent control group.

Pre-post designs are within-subjects quasi-experimental or non-experimental designs. We covered two specifically. The one-group pretest-posttest design is non-experimental. It involves testing one group of individuals prior to some type of treatment, and testing the same group once thereafter. For example, you could have one group of participants report their feelings about a political candidate, have them watch a political campaign ad, and then ask them again how their feelings about that candidate may have changed. This is a commonly used example of a one-group pretest-posttest design. A time series design is considered quasi-experimental. It involves a minimum of three measurements per phase, and it involves two phases: a minimum of three observations of the same individuals under the same conditions before implementing your treatment, and a minimum of three observations of the same individuals under the same conditions after implementing your treatment.

Finally, developmental research designs. There is the cross-sectional design, which involves looking at different age groups, and is very similar to a differential research design. But it's not called a differential research design, because age is the variable of interest; it's called a cross-sectional design. For example, measuring a group of five-year-olds, a group of ten-year-olds, and a group of fifteen-year-olds, where age is your quasi-independent variable, is an example of a cross-sectional design. A longitudinal design is similar either to a one-group pretest-posttest design or to a time series design, depending on how often you bring individuals back for follow-ups. It involves testing the same group of individuals multiple times over a prolonged period. Finally, the cross-sectional longitudinal design is a cross between these first two developmental research designs. It involves taking multiple different age groups and following all of them over time.

That brings us to correlational research. The correlational research strategy involves designs that describe the relationship between variables and measure the relationship's strength. Essentially, correlational research establishes whether a relationship exists, so we're looking at trends, not causal effects. It also describes the nature of the relationship: is it positive versus negative, for example. Correlational research, however, does not explain the relationship. It also does not manipulate or control variables.
Now, the data from correlational designs involve two or more measurements, at least one measurement per variable, typically from the same person. We'll first talk about two variables only, because this is the most basic and most common type of correlational research. So, for example, you could measure IQ and creativity. I could give everyone in the class an IQ test, like an intelligence inventory of some sort, and also a creativity test. Now, let's say each of these two tests gives them a score from zero to 100, just for argument's sake. I could then look at whether IQ is related to creativity, or in other words, as IQ goes up, do you also see an increase in creativity? So, the goal in correlational research is to identify patterns, and to measure the strength of a relationship, or in other words, the strength of that pattern.

A commonly used phrase here within the sciences is "correlation does not equal causation". This example provides an illustration of that. What we have here is each country as a unit of measure, and for each of these countries, we have a measure of chocolate consumption in kilograms per year per capita on the x axis, and a measure of Nobel laureates per 10 million people in the population on the y axis. Now, our value of r, our correlation coefficient, is .791. We'll get into the statistics shortly, but that's a very strong correlation. And it's very unlikely to be an error, because you can see our p value is less than .0001. What this suggests, if you didn't realize that correlation does not equal causation, is that eating chocolate makes you very, very smart, because countries that eat more chocolate produce more Nobel laureates. However, this is clearly not the case, as much as we would love it to be (naturally, since chocolate is fantastic), but is what's called a spurious correlation. What is likely happening here is that this is either a correlation by chance, or there is something else going on, like a third-variable issue. So, for Switzerland, Sweden, Germany, the United Kingdom, Norway, Austria, Ireland, and so on, it's perhaps not the consumption of chocolate that is leading to more Nobel laureates; perhaps it's just coincidental that these countries enjoy their chocolate but also have very good education programs. So, correlation does not equal causation.

Here are some other examples. These aren't scatterplots, just to be clear; these are just looking at how two data series track one another. In the top graph, the green line is US spending on science, space, and technology, and the red line is suicides by hanging, strangulation, and suffocation. It appears from this graph that these two things are very highly related: as one goes up, the other tends to go up. But this is again called a spurious correlation, something that numerically appears to be related, but is unlikely to be causally related. In the bottom example, the green line is per capita consumption of cheese in the United States, whereas the blue line is the number of people who die by becoming tangled in their bedsheets. Again, very unlikely to be causally related. Although perhaps eating too much cheese makes it easy to become tangled in your bedsheets, who knows, but it's probably unlikely.
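To make the "correlation by chance" idea concrete, here is a minimal Python sketch (not from the lecture; all data are simulated): if you test enough pairs of completely unrelated variables, roughly 5% of them come out "statistically significant" at p < .05 by chance alone.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Correlate many pairs of variables that are independent by construction:
# about 5% of the tests come out "significant" at p < .05 anyway.
false_hits = 0
trials = 1000
for _ in range(trials):
    x = rng.normal(size=30)   # 30 simulated "countries"
    y = rng.normal(size=30)   # unrelated to x by construction
    _, p = stats.pearsonr(x, y)
    false_hits += p < 0.05
print(f"{false_hits} of {trials} unrelated pairs were 'significant'")
```

This is one route to a spurious correlation; the chocolate-and-Nobel-laureates example more likely reflects the other route, a third variable such as education.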
In most correlational studies, we measure two variables, although we can measure more than two, as I mentioned. Usually this gives us two scores, one that we mark on the x axis and one that we mark on the y axis, and these two scores are from the same individual. Sometimes you measure more than two variables, or a variable in more than two different ways, and in this case, multiple scores are often grouped into pairs for the purposes of evaluation. To be clear, we're looking at two scores from one source. The dot on the scatterplot indicates the source, and it is located at the intersection of the x and y values for those two variables. That source could be a person, as is most common, or the dot could represent a family, a group, an organization, a country, etc., that is treated as if it is a single individual. For example, we could look at the correlation between parent and child IQ: we could plot parent IQ on the x axis, child IQ on the y axis, and then each dot would be where those two intersect for each family. Another example: we could look at the correlation between country exports and wealth. Again, we would have a dot per country, where number of exports could go on the x axis and country wealth could go on the y axis.

Just to remind you from our previous discussion on this topic, this is what a scatterplot looks like. This is a way of illustrating your correlations. Here we have persons A, B, C, D, and E, and for each of these individuals, we have two scores, an x score and a y score. This is obviously just illustrative, but we've got scores for our x axis, which is one variable, and scores for our y axis, which is a second variable. Again, the x axis is the horizontal line and the y axis is the vertical line. With the x axis, scores always increase going from left to right. With the y axis, scores always increase going from bottom to top. So, the higher the dot, the higher the y score, and the further to the right the dot, the higher the x score. And again, each dot is one source. Here it's people: we have two variables measured per person. Person A is on the far left, so their score for the x axis is one and their score for the y axis is three. If I were to draw a horizontal line out to the right from the score three on the y axis, and also draw a vertical line up from the score one on the x axis, the dot for A is where those two lines would intersect. You then do the same thing for all other individuals in your particular data set, and you look at the pattern of the data.

Here we have an example of a positive correlation, a positive linear correlation specifically. If I were to calculate and draw the slope of the line that would best represent all of these points, that line would slope upwards towards the right. Essentially, what this tells us is that as one variable goes up, the other variable tends to go up. So, there is a positive correlation.

So, how do you measure these relationships? We use the value of r. This is called the correlation coefficient, and it's denoted by a lowercase r. The correlation coefficient is a numerical value that describes the relationship between two variables via three characteristics: the direction of the relationship, the form of the relationship, and the strength or consistency of that relationship. We'll go into each of these concepts in a moment independently, but for direction, we're talking positive or negative; for form, we're talking about whether the data cluster around a straight line or a curved line; and for strength or consistency, we're talking about how predictable that relationship is.
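Here is a minimal Python sketch of this kind of scatterplot. Only person A's pair of scores (x = 1, y = 3) comes from the lecture; the scores for persons B through E are made up for illustration, as is the choice of scipy's pearsonr as the tool.

```python
import matplotlib.pyplot as plt
from scipy import stats

# Two scores per person: person A's pair (1, 3) is from the lecture,
# the other pairs are hypothetical.
people = ["A", "B", "C", "D", "E"]
x = [1, 3, 4, 5, 6]   # variable X, increases left to right
y = [3, 4, 6, 5, 8]   # variable Y, increases bottom to top

# Each dot sits where that person's x and y scores intersect.
plt.scatter(x, y)
for label, xi, yi in zip(people, x, y):
    plt.annotate(label, (xi, yi))
plt.xlabel("Variable X")
plt.ylabel("Variable Y")

# The correlation coefficient r summarizes the linear trend.
r, p = stats.pearsonr(x, y)
print(f"r = {r:.2f}")   # positive r: as X goes up, Y tends to go up
plt.show()
```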
Correlation coefficients range from positive one to negative one, where a value of one, whether positive or negative, refers to a perfect linear relationship. To be clear, positive and negative just indicate the direction of the relationship, and the numerical value of r, when you ignore the sign, tells you about the strength. So, a value of r equal to one means the relationship is 100% predictable: as one variable goes up, the other goes up an equal amount. In reality, this almost never happens, so more often you'll see a "point something" score. Again, when looking at the strength or consistency of a relationship, you want to look at the numerical value alone. So, if I have a value of r = -.9, and a separate correlation where the r value is +.6, that negative value is a stronger relationship than the positive value I just mentioned.

So, as I just mentioned, the correlation coefficient measures the direction, form, and strength of the relationship. That first component, direction, refers to the sign of the correlation coefficient, or the sign of the value of r, whether it is positive or negative. And that sign, plus or minus, simply indicates the direction of the relationship. If it is a positive sign, that means you have a positive relationship: as variable X increases, variable Y also increases. Your variables are thus related, or trending, in the same direction, and that yields a positive value of r. An example of a positive relationship would be the relationship between height and weight. Although not a perfectly 100% predictable relationship, tall people tend to weigh more because they have more skeletal and muscle mass. So, if one variable goes up, the other also tends to go up. When you have a negative sign for a value of r, that indicates a negative relationship. This means that as one variable, such as variable X, increases, the other variable, Y, decreases: they are inversely related, or they go in opposite directions. This yields a negative value of r. An example of this would be speed and accuracy. The faster someone is at trying to hit a target, for example, the less likely they are to accurately hit said target. The direction of the relationship is shown on your scatterplot. As you can see in the top example, a positive relationship slopes upward towards the right, because as one variable goes up, the other tends to increase as well, whereas a negative relationship slopes downward towards the right, because as one goes up, the other tends to go down.

The form of the relationship refers to whether the consistent, predictable relationship between two variables clusters around a straight line or a curved line. When the data points cluster around a straight line, this is called a linear relationship. In other words, if you were to calculate the line of best fit, you could find a line that basically approximates a straight line, sloping up to the right if the relationship is positive or down to the right if it's negative. Most often we are measuring a linear relationship, so this is the most common, and we measure it using a specific correlation coefficient test called Pearson's r, or Pearson's correlation coefficient. This is a different statistical test than what we use to measure a monotonic relationship.
A monotonic relationship, also referred to as a curvilinear relationship, is when changes in one variable tend to accompany changes in another variable, but the amount is not constant. So, you're not seeing points clustering around a straight line. Instead, it's a curvilinear relationship, whereby the points cluster around a curved line. This curved line is still one-directional, meaning it will still be consistently positive or consistently negative, so it can slope upward towards the right or downward towards the right. When a monotonic relationship is expected, we would measure it using Spearman's r, or Spearman's correlation coefficient. An example of this would be the relationship between practice and skill. At first, the more you practice some new task, let's say learning to play guitar, the better you get, and that's a pretty quick relationship: a little bit of practice leads to a big improvement when you're first learning a skill. But after a while, you get pretty good at it, and the same amount of practice yields much less of an increase in skill. So the relationship tapers off after a point, such that you will need significantly more practice to see a comparable increase in skill after a certain amount of time. The variables are still related; just how strong that relationship is will vary.

And finally, the strength or consistency of the relationship: this refers to the degree of consistency as measured by the correlation coefficient, or in other words, the value of r. The value of r, the number alone, ranges from zero to one; when including the sign, it goes from positive one to negative one. But remember, the value is what tells us about the strength of the relationship, or in other words, how predictable that relationship is. The sign, positive or negative, only indicates the direction. So, the value of r tells us how much the relationship deviates from 100% predictable. If you have a value of r that is either positive or negative but equals one, that means the relationship is 100% predictable: you have a perfect relationship. If, however, you have a value of r that is zero or close to zero, that means there is no relationship. So basically, the value of r, not looking at the sign, describes the consistency and strength of the observed trend. However, regardless of how strong the relationship is, it never equals a cause-and-effect relationship, because it doesn't control for confounds in any way. So, there are always alternative explanations possible when looking at a correlational design.
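To see the Pearson-versus-Spearman distinction in code, here is a small sketch using made-up practice-and-skill data that rise quickly and then taper off, the monotonic pattern described above; the square-root curve is my own assumption, purely for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical practice/skill data: big early gains, diminishing returns.
practice = np.arange(1, 21)   # hours of practice
skill = np.sqrt(practice)     # monotonic but curved, not linear

pearson_r, _ = stats.pearsonr(practice, skill)
spearman_r, _ = stats.spearmanr(practice, skill)

# Spearman's r works on ranks, so any consistently increasing
# relationship scores a perfect 1.0 even when the line is curved.
print(f"Pearson r  = {pearson_r:.3f}")   # high, but below 1
print(f"Spearman r = {spearman_r:.3f}")  # exactly 1.000
```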
So, what do these relationships look like? Here we have a scatterplot, which indicates the direction and strength of a relationship. We have our x axis on the horizontal line and our y axis on the vertical line. If we were to figure out the line that best fits all of those points, we could see that it would slope upward towards the right, and therefore this illustrates a strong positive relationship. It would also be indicative of a linear relationship, because the points roughly cluster around a straight line. But what about this example here? Now, if we just look at the points that are within the darker oval, it still looks like a very strong relationship. In fact, it still looks like a very strong positive linear relationship. But what about the two additional points found between the dark oval and the wider dotted blue circle?

Those two additional points do not conform to the otherwise observed pattern. Those specific points are referred to as outliers. An outlier is any point within a scatterplot that does not conform to the observed trend and falls far outside the norm of the rest of the sample. Outliers can be problematic, because including these two data points in your correlation coefficient will make it appear that the relationship is weaker than perhaps it really is. It's possible that these outliers reflect true data, or that something else is going on. For example, let's say that our y axis is happiness. The point on the bottom could be someone who recently lost someone close to them, whereas the point on the top could be someone who recently won the lottery. These are events that are outside normal behavior and events. And if you dig a little bit deeper as a researcher into what's going on behind these outliers, you may actually have reason to exclude them from your analyses, in order to better illustrate the true relationship. But you must always have a valid reason for excluding data points before you actually go ahead and do it. You can't just exclude outliers because they don't conform to your hypothesis. If you do have a good reason to delete outliers from your data set, then more power to you; just make sure that you're not doing it willy-nilly just because you feel like it. You have to be justified.

Here's an example of a strong negative relationship. Again, you can see the linear relationship, whereby the points roughly cluster around a straight line that slopes, in this case, downward towards the right: as one variable goes up, the other variable tends to go down. And this is what no real relationship looks like in terms of a scatterplot. You can easily see in this illustration that there is no real trend in the data. The points are kind of all over the place and don't conform to any kind of pattern.

Now, based on the information you just received, you can probably guess which of the scatterplots shown above, plots A, B, C, and D, corresponds to each of these r values. So, where r = -.23, do you think in your head: is this A, B, C, or D? We're looking for one that slopes downward in a negative direction, so downward towards the right. If you answered A, that is correct. Note that this is a relatively weak correlation, .23, so the correlation coefficient isn't huge, which is why a lot of the points in plot A are not very closely clustered together. But there is still some evidence of a trend, albeit a weak one, whereby as scores for one variable go up, scores for the other variable go down. The remaining three values are all positive, so we should be looking for plots that slope upward towards the right. Looking at plots B, C, and D, which would you think is r = .33? The answer is B. You can see again that this is a relatively weak relationship, but there is still evidence that if you were to figure out the average slope of all these points, it would be a line that slopes upward towards the right. What about .4? Well, we should be looking for points that cluster a little bit more tightly together, but still slope upward towards the right; the answer here is C. And finally, a moderate relationship, r = +.61: that would be D.
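You can reproduce the feel of this matching exercise yourself. The sketch below (my own illustration, not lecture code) draws samples with the four r values just discussed; plotting each sample would show the clustering getting tighter as the value of r grows.

```python
import numpy as np

rng = np.random.default_rng(1)

# Draw 200 (x, y) pairs with a chosen true correlation rho and check
# the sample r: the higher |rho|, the tighter the cluster on a plot.
for rho in [-0.23, 0.33, 0.40, 0.61]:
    cov = [[1.0, rho], [rho, 1.0]]
    xy = rng.multivariate_normal([0.0, 0.0], cov, size=200)
    r = np.corrcoef(xy[:, 0], xy[:, 1])[0, 1]
    print(f"true rho = {rho:+.2f}   sample r = {r:+.2f}")
```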
So, as you can see, as the value of r gets higher, the points on your scatterplot get more and more tightly clustered together and more closely approximate a straight line. What about these examples? These ones are a little trickier. Which of these do you think would be r = -.85? We're looking for a negative relationship, meaning if we were to calculate the slope, it should be a line sloping downward towards the right, and .85 is reasonably high. The answer here would be B. Now, what about a negative relationship of .75? Interestingly enough, the answer here is D. This may be confusing you somewhat, because it looks like D has points that are more closely clustered around that invisible line than plot B, yet plot B has a higher value of r. So how does that make sense? Looking at plot D, you can see that there are some outliers: some points that are not really following the trend and are not clustering as tightly around that invisible line as the other points are. So, the example of plot B versus plot D illustrates how outliers can have an impact on the apparent effect size of your relationship. Although the number makes it appear as if B is a stronger relationship than D, this is really because of the presence of these outliers. If I were to remove those couple of outliers from plot D, I would be showing a stronger relationship than I see in plot B. However, as I mentioned before, you have to have a very good reason before you can just start removing data points from your data, and so, assuming we don't have that justification here, we would just have to put up with it. Now, what about -.29? The answer there would be A, a very weak relationship; but if you were trying to figure out what that line would be, it would indeed slope downward towards the right. That leaves us finally with r = .75, and the answer here would be C.

So, we've been talking about the correlational research strategy. Previously, we talked about the experimental research strategy, and the differential research design, which falls under the category of the non-experimental research strategy. It's important, however, to again emphasize the differences between these three approaches. An experimental design has the goal of establishing a cause-and-effect relationship between variables. This is done through manipulation of an independent variable, measurement of a dependent variable, and control of extraneous variables. So, for example, we could look at testosterone and self-esteem using an experimental strategy by manipulating testosterone level, let's say by randomly allocating participants into groups, where one group receives AndroGel, which is a synthetic testosterone, and the other receives a placebo gel, and then measuring our dependent variable, which is self-esteem, in both groups. We could also look at this exact same question with a correlational design. The goal of a correlational design is to demonstrate the existence of a relationship. The correlational design does not explain, manipulate, or control variables. For example, if we were looking at that question using a correlational design, we would simply take a group of people, measure their testosterone and also measure their self-esteem, and see if they are related. Finally, there is the differential research design, which again is non-experimental, the goal of which is to establish that a relationship is present.
We establish this existence by comparing differences between groups: one variable, a preexisting trait, is used to create groups, and a second variable, which is a dependent variable, is measured across groups. So, to look at the same sort of research question using a non-experimental differential research design, we could first measure our group of participants for testosterone level, and then place them into a high-testosterone group or a low-testosterone group, depending on what that measurement revealed. Then we would measure self-esteem in both groups and compare. So, you can see that we can get at similar research questions using each of these approaches, but that these strategies differ in how they investigate these measures.

So, going back to correlational research, what would you predict here? Researchers find a negative relationship between GPA and number of hours spent playing video games among high school boys. Based on that trend, what grades would you predict for boys playing above-average hours of video games? Since we found a negative relationship, that means as one variable goes up, the other goes down. So, if boys, based on this research, are playing above-average hours of video games, a high number of video game hours, that means their GPA, if it follows this trend, would be expected to be lower.

You can also use correlational designs when you have non-numerical scores. As you know already, sometimes you have nominal data, for example; sometimes scores do not have an inherent numerical value. One example using two non-numerical scores would be looking at the relationship between gender and success in a problem-solving task. These are two non-numerical dichotomous variables: gender (male/female) and problem-solving outcome (succeed/fail). We'll come back to this example when we talk about two non-numerical variables. If, however, you have one non-numerical variable in your study, then you have two options. Option one: you could convert the non-numerical score into a number, for example, male = 0 and female = 1. Then you can compute Pearson's correlation coefficient, looking for a linear relationship, except here it's called a point-biserial correlation. It's called a point-biserial correlation in recognition that one of the variables is a non-numerical variable that was converted into a number. Now, when running a point-biserial correlation, the sign, positive versus negative, is effectively meaningless. It's meaningless because you arbitrarily assigned numbers to the categories of your non-numerical variable; see the sketch below. Although this is possible, it's more common to instead use option two, which is to switch to a non-experimental differential research design. When using this option, your non-numerical scores are used to separate participants into groups. Basically, whatever your non-numerical variable is becomes the trait used to categorize your participants: male participants in one group, female participants in the other. Then you would run your standard t-tests and ANOVAs, as discussed previously.
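Here is a minimal sketch of option one, the point-biserial correlation; the scores are invented for illustration. scipy's pointbiserialr is just Pearson's r computed with one dichotomous variable, which is the equivalence the lecture describes.

```python
import numpy as np
from scipy import stats

# Hypothetical data: gender coded arbitrarily (male = 0, female = 1)
# and a numerical problem-solving score for each participant.
gender = np.array([0, 0, 0, 0, 1, 1, 1, 1])
score = np.array([62, 70, 58, 66, 75, 80, 72, 77])

# The sign of r depends entirely on the arbitrary 0/1 coding,
# so only the magnitude is meaningful here.
r, p = stats.pointbiserialr(gender, score)
print(f"point-biserial r = {r:.2f}, p = {p:.3f}")
```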
When both of your variables are non-numerical, you still have two options. Using the example from earlier, where we're looking at the relationship between gender and success in a problem-solving task, you could organize your data into a matrix. Here, one variable forms the rows and the other variable forms the columns. So, you have success in the problem-solving task, the outcome (win/loss), forming the columns, and gender (male/female) forming the rows. Then, the value in each of the cells represents the count for that category: the number of males who won was 12, the number of males who lost was 8, the number of females who won was 17, and the number of females who lost was 3. Then you can compute a chi-square and compare the proportions of winners versus losers for male and female participants. This is most commonly what is done when you have two non-numerical variables. But you do have an alternative option: you could convert both non-numerical scores into numbers. This can only be done when there are only two categories per variable. So, for example, male = 0 and female = 1, loss = 0 and win = 1, and then, running Pearson's r, or Pearson's correlation coefficient, you can compute the correlation, although in this case, because you arbitrarily assigned two sets of numbers, you call it a phi coefficient. Here, the concept of a linear versus curvilinear relationship is meaningless, because the relationship will be linear since you applied two arbitrary dichotomous codings (0 and 1). The sign will also be meaningless, because the sign depends entirely on which number you assigned to each category. So, we use the phi coefficient to measure the strength of the relationship only, and nothing else.
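Here is a sketch of the chi-square option using the exact counts from the lecture's 2x2 matrix, plus the phi coefficient computed from the same table; the use of scipy's chi2_contingency is my choice of tool, not something named in the lecture.

```python
import numpy as np
from scipy import stats

# Rows: gender (male, female); columns: outcome (won, lost).
table = np.array([[12, 8],
                  [17, 3]])

# Chi-square compares the proportions of winners vs. losers by gender.
chi2, p, dof, expected = stats.chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")

# Phi coefficient for a 2x2 table: strength only (sign and form are
# meaningless here), conventionally from the uncorrected chi-square.
phi = np.sqrt(stats.chi2_contingency(table, correction=False)[0] / table.sum())
print(f"phi = {phi:.2f}")
```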
So, what are the applications of correlational designs? They are most useful in prediction: basically, using one variable to predict another variable or variables. When you have a consistent relationship, that means that one variable predicts another. And this is commonly used in a series of applied situations, such as warning signs of suicide, relapse to drug taking, predictors of longevity, and so on. All of these so-called predictors are things that are correlated with something else. Now, within a correlational design, both variables that you're looking at are technically equal. That being said, it is typical for one to be used to predict the other: often you know something about one variable, and you're trying to see how its relationship with another variable can predict it.

A different way to investigate this same type of relationship is a different statistical technique called regression. Regression is the statistical process of using one or more variables to predict another. The variables being used to predict the other variable are called predictor variables, or predictors, and the variable they are trying to predict is called the criterion variable. So, a predictor variable is used to predict another; this is typically the variable we know more about, and we're trying to use knowledge about this predictor to project the criterion variable. The criterion variable is predicted by the predictor variable or variables, as we'll see in a moment. When you use one predictor to predict one criterion variable, this is just a different way of running a correlational test. Please do note the change in terminology, however. We no longer have a dependent variable: if you're using a correlational design, you do not have a dependent variable, you do not have an independent variable, and you do not have a quasi-independent variable. Instead, you have a predictor and a criterion variable. In some cases, you may have more than one predictor variable, but you always have one criterion variable. It is important to identify your predictor or predictors and your criterion variable if you are planning to use a correlational design for your final project.

Now, an example of this: let's say you have to take the Graduate Record Examination. The GRE is a standardized test for individuals hoping to apply to graduate school. Graduate programs make prospective students take the GRE because they use it as a predictor of grade point average in graduate school. So, in this example, your score on the GRE is your predictor variable, and your GPA in graduate school is your criterion variable, the thing you're trying to predict. The criterion variable is the variable of interest.
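As a sketch of one predictor and one criterion variable, here is a simple regression on made-up GRE and graduate GPA numbers (the variables come from the lecture; the data, and the GRE score of 312 used for the prediction, are invented).

```python
import numpy as np
from scipy import stats

# Hypothetical data: GRE score (predictor) and graduate GPA (criterion).
gre = np.array([290, 300, 305, 310, 315, 320, 325, 330])
gpa = np.array([3.0, 3.2, 3.1, 3.4, 3.5, 3.6, 3.5, 3.8])

# Regression with one predictor is another way of running a
# correlational test; the fitted line lets us predict the criterion.
fit = stats.linregress(gre, gpa)
print(f"r = {fit.rvalue:.2f}")

predicted = fit.slope * 312 + fit.intercept   # predict GPA for GRE = 312
print(f"predicted GPA for a GRE of 312: {predicted:.2f}")
```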
We also use correlational strategies to determine the reliability and validity of a measure. Just to remind you, reliability refers to the consistency and stability of measurements, whereas validity refers to whether or not the measurement procedure measures what it claims to. Both are defined by relationships and established using correlations. For example, test-retest reliability is the relationship between the original scores and follow-up measurements of the same construct using the same measure. Concurrent validity is when test scores are strongly related to an established test measuring the same construct. So, we determine the reliability and validity of a measure using correlations.

When it comes to evaluating theories, theories tend to generate questions about relationships. For example, there are twin studies. Twin studies are when you look at identical twins, oftentimes when they're reared in different environments, and that way you can try to gauge which aspects of behavior are nature versus nurture. One such example looked at nature versus nurture when it came to smoking. Kendler et al. in 2000 did indeed find a small correlation in tobacco use among twins separated at birth, which suggests that genetics plays a small role in tobacco use. However, they found a much larger correlation among twins that were raised together, which suggests that the environment plays a larger role than genetics when it comes to smoking-related behaviors. Thus, correlations can address theoretical issues, and this is a common application of correlational data: basically, to see if the trends predicted by a theory exist in the wider population. The Weinstein article that you're meant to have read by now showcases this exact kind of idea.

So, how do you interpret your correlation coefficient? The numerical value ranges from zero to one, where positive or negative one indicates a perfect, 100% predictable relationship. The direction refers to the sign: positive is a positive relationship, so as one of your two variables goes up, the other also goes up, and negative is a negative relationship, so as one goes up, the other goes down. There's also something called the coefficient of determination. We've talked about the correlation coefficient, which is r; the coefficient of determination is the squared value of the correlation coefficient, so it's r squared. The coefficient of determination tells you the amount of variability in one variable that is predicted by its relationship with the other variable. So, for example, let's say we have a correlation between IQ and GPA where r = .80. If we were to square that to figure out our coefficient of determination, that means r squared = .64. That means that 64% of the differences in GPA are predicted by IQ, or in other words, 64% of the variance in GPA is explained by people's IQ.

There's also your statistical significance, also known as your p value. We've been talking about measures of effect size; now we're going to talk about measures of statistical significance, and the p value is what we commonly use. When we have something that is statistically significant, meaning that your r value has a p of less than .05, that significance just means that the result is unlikely to have been produced by random variation. In other words, it means your results are unlikely to be an error, an error in terms of random variation, that is; you could still, of course, be wrong. Now, when you have a larger sample, you're more likely to reveal a so-called real relationship, because remember the law of large numbers: the larger the sample, the more closely the sample means approximate those of the general population. You can, however, have a p value that is statistically significant despite having a weak correlation. These are two different concepts, basically. Your correlation coefficient is telling you how big or how predictable the effect is, whereas your p value is telling you how likely it is that the result was produced by random variation. So, you can have statistically significant correlations that are also very small, because significance does not mean it's a strong correlation.

There are some guidelines with respect to what is considered a small versus medium versus large effect when it comes to correlations. It also depends on whether you're looking at behavioral versus non-behavioral research. A small relationship would be an r value of around .1 or lower, or an r squared of .01, which means that your coefficient of determination is explaining 1% of the variance. A medium effect would be an r of .3, or an r squared of .09, so 9% of the variance explained. And a large effect would be an r of .5, or an r squared of .25, which is 25% of the variance explained, that is, the variance in your criterion variable explained by your predictor variable. Now, these guidelines are with respect to behavioral research, and that's because large individual differences make the ability to predict even a small amount of the differences in behavior a relatively large accomplishment. When looking at correlations in non-behavioral data, a much larger r would be required to be considered a medium or large effect. For example, test-retest reliability should be above r = .8.
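The significance-versus-strength distinction is easy to demonstrate: with a large enough sample, even a tiny correlation is statistically significant. A small simulated sketch (my own illustration, not lecture data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# A very weak true relationship measured in a very large sample.
n = 10_000
x = rng.normal(size=n)
y = 0.05 * x + rng.normal(size=n)   # true r is only about .05

r, p = stats.pearsonr(x, y)
print(f"n = {n}: r = {r:.3f}, p = {p:.2e}")
# The p value is tiny (significant), yet r and r-squared are tiny too:
# significance does not mean strength.
print(f"r-squared = {r**2:.4f}")
```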
There are some strengths of the correlational strategy. It's very useful in preliminary work: perhaps there's no research to date on the relationship between two variables. Well, it would be quite simple to just record what exists and see if there is any evidence in the general population of a trend that might suggest a relationship. So, oftentimes correlational research is the first step in a new research area. It also identifies relationships for further investigation: perhaps you find something using a correlation, and then you can investigate the variables further using other research strategies; it could lead you to experimental designs, for example. Similarly, correlational work can be used to investigate variables that are impossible or unethical to manipulate. Another strength of the correlational research strategy is that it has very high external validity.

There are, of course, weaknesses of correlational designs. They offer no clear, unambiguous explanation, unlike experiments, because you cannot infer cause-and-effect relationships from correlational results: correlation does not equal causation. Also, despite having good external validity, or in other words good generalizability, they have very low internal validity. That's because the presence of confounds is common. There are two major sources of confounds, or two major limitations, of the correlational research strategy. The first is the third-variable problem, and the second is the directionality problem, both of which are concepts we've covered previously. Just to remind you, the third-variable problem is when there is a confound present that is actually explaining the perceived relationship. Let's say you find a correlation between variable A and variable B. Although it appears that variable A is acting on variable B, it could actually be that variable A is related to a third variable X, which is related to variable B, and that variable A and variable B are not directly related to each other at all. This is the third-variable problem. The directionality problem is when you do not know which is the cause and which is the effect. Perhaps you see a correlation between A and B, but you don't know if A is causing B, or if B is causing A. These are the two major limitations that the experimental research strategy attempts to overcome.

Also, correlational research is very easy to misinterpret, in particular by media outlets and so on. So, let's start with your research. You have a conclusion: A is correlated with B, given C, assuming D and E, under certain conditions, and so on. Which, of course, gets translated by the university PR office. Every university has a public relations office or media office of some sort, and they are responsible for releasing press releases. So, the media release reads "Scientists find potential link between A and B", with "under certain conditions" in the small print. That gets picked up by news wire organizations: "A causes B, say scientists", which is read by people on the internet: "Scientists out to kill us again". Then it's noticed by cable news: "A causes B all the time. What will this mean for Obama?" Then it comes to the local Eyewitness News: "A: the killer among us. What you don't know about A can kill you." And finally, next thing you know, you've got your grandma wearing a tin hat to ward off A. Obviously, this is a bit of an exaggerated illustration, courtesy of what's called PHD Comics, but it does get the point across that oftentimes people who don't have training in research methods don't understand the difference between correlational and cause-and-effect relationships, which can lead them to misinterpret the results of scientists. So, always think critically when reading things from the media.
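The third-variable problem is also easy to simulate. In this sketch (invented data), a hidden variable X drives both A and B; A and B then correlate strongly even though, by construction, neither causes the other.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# A hidden third variable drives both observed variables.
x = rng.normal(size=500)                    # e.g., education quality
a = x + rng.normal(scale=0.5, size=500)     # e.g., chocolate consumption
b = x + rng.normal(scale=0.5, size=500)     # e.g., Nobel laureates

# A and B were generated with no direct link, yet they correlate.
r, p = stats.pearsonr(a, b)
print(f"r(A, B) = {r:.2f}, p = {p:.1e}")    # roughly .80, highly significant
```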
Now, Weinstein, as you all know from having read the article, was interested in how health studies are often guided by theories of health behaviors. Weinstein argued that rigorous theory testing is needed within the health behavior fields. This argument was made because of the excessive reliance on correlational designs that Weinstein noted: basically, Weinstein noted that there was no experimental data, or only limited experimental data, when it came to the theories that guide health behaviors. Perceptions receive the most attention in health behavior theories; perceptions refers to beliefs, attitudes, intentions, and so on. But when we rely almost exclusively on correlational designs to test theories of health behavior, there's the potential for a directionality problem: do perceptions cause behavior, or could behavior be causing perceptions? There's also a third-variable problem: could prior experience be playing a role? What about the present state of current events, for example, the swine flu scare, or a bird flu scare, or whatever the current flu going around may be? Also, could intentions mediate the relationship between perception and behavior? So, Weinstein argued that correlations overestimate the ability of cognitive-oriented theories to explain behaviors, but Weinstein does concede that too much control equals a loss of external validity. So, Weinstein does at least acknowledge that there is that tradeoff between your internal validity and your external validity. And while correlational designs do have high external validity, they lose internal validity, and of course internal validity is very important.

So, Weinstein used what's called a path analysis. Path analyses are used to represent alternative models of causal links, in this example between perceptions and behavior. The subscripts 0, 1, and 2 refer to successive times of measurement, and the lowercase letters are standardized path coefficients. You don't really need to know what that means for now, but this basically illustrates how the path between perception and behavior can effectively change. We have perception 1 acting on behavior 2, for instance, a very straightforward path in example A, which is commonly assumed to be the case. But in example B, you have a behavior acting on a perception and also acting on a second behavior. In example C, we have perception 1 acting on behavior 2, and we have behavior 1 acting on perception 1 and also acting on behavior 2, and things can get even more complex than that. Looking at D, we've got perception 0 acting on perception 1, perception 0 acting on behavior 1, behavior 1 acting on perception 1, both behavior 1 and perception 1 acting on behavior 2, and so on. The point Weinstein is trying to make is that there is a conundrum: not controlling for prior behavior leads to overstating the effects of perceptions on behavior, but controlling for prior behavior may underestimate these effects. And so, based on the possibility of mediating relationships between perceptions and behavior, the confidence currently held in existing theories of health behaviors is likely unwarranted. Weinstein argues that more experiments and quasi-experiments are definitely needed to test these theories, and that cause-and-effect information will be key to developing health behavior treatments.

What about when you have more than two variables? I did mention earlier on that there are cases where you can look at correlational designs that have more than one predictor variable. Most of the time, we look at how one variable is related to a second variable in a correlational design; usually there are only two variables. But as you know from our multiple discussions on confounds, one variable is usually related to multiple other variables. We know GPA is related to IQ, but it's also related to motivation, tenacity, parental influence, and a host of other things. Multiple regression is a statistical technique that uses two or more predictor variables to predict one criterion variable. So, for example, let's say our criterion variable is GPA, grade point average. We could look at how motivation and IQ predict GPA. In this example, motivation and IQ are predictors, and GPA is our criterion variable. Multiple regression allows you to examine the relationship between two variables while controlling for the influence of other variables, for example, potential confounding variables. You can also enter predictors one at a time to see the independent contribution of each. So, multiple regression models will tell you about the influence of all of your predictors together, and the influence of each of your predictors independently while controlling for the influence of the other predictors, in predicting your criterion variable. I know that's a bit of a mouthful, but basically, you can see how multiple variables working together can explain differences between people in your criterion variable. Now, remember, predictor variables only predict. So even if you're trying to control for potential confounds by using multiple regression and looking at multiple predictors at once, you're still not explaining your relationship. It is, in that respect, describing the relationship only.
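Here is a minimal multiple regression sketch for the motivation-and-IQ-predicting-GPA example; the data are simulated, and the use of statsmodels is my choice of tool, not the course's.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)

# Simulated data: IQ and motivation (predictors) and GPA (criterion).
n = 50
iq = rng.normal(100, 15, size=n)
motivation = rng.normal(5, 2, size=n)
gpa = 0.02 * iq + 0.15 * motivation + rng.normal(0, 0.3, size=n)

# Fit GPA on both predictors at once: the output reports each
# predictor's contribution while controlling for the other,
# plus the overall R-squared for the model as a whole.
X = sm.add_constant(np.column_stack([iq, motivation]))
model = sm.OLS(gpa, X).fit()
print(model.params)      # intercept and the two regression coefficients
print(model.rsquared)    # variance in GPA explained by both predictors
```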
So, to review: with correlational research, we examine and measure the strength of relationships between variables. Usually, this involves two variables only, and in these cases, a scatterplot is used to illustrate the characteristics of the relationship, specifically the direction, form, and strength or degree of consistency of that relationship, which is also measured using the correlation coefficient, r. Correlational research is useful for prediction, for establishing validity and reliability, and of course for evaluating theories, although it should not be used exclusively to evaluate theories, as the Weinstein article points out. The major disadvantage of correlational designs is that they cannot determine causation, because of the constant threats of the third-variable problem and the directionality problem. As before, if you have any questions, please post them to the forum in Moodle or send me an email.