CRITICAL READING: CORNELL NOTES
Probability Testing [2]
Name:    Date: 14 September 2023    Section: Lecture 2    Period:
Questions/Main Ideas/Vocabulary | Notes/Answers/Definitions/Examples/Sentences

Sampling Distribution
If we repeat this process over and over, we can obtain the sampling distribution. This shows us that the most likely result is a correlation close to zero. In a way, this is a bit like testing the null hypothesis, because we can see the relative probability of obtaining correlations of different strengths when the true population correlation is equal to zero.

NHST
In reality, we don't test the null hypothesis using simulations. We can use the well-known properties of sampling distributions to determine the likelihood of obtaining a particular test statistic if the null hypothesis were true.

Type I Error (A False Positive)
If we set our α to .05, we are accepting that, when the true population effect is zero, 5% of the time we will still obtain a significant result simply due to chance.

Type II Error (A False Negative)
The opposite is also possible: there may actually be an effect in the population, but due to sampling error we may fail to find it.

Sampling Error
Any time we take a sample from a population, there is going to be sampling error. Because of this, anything we observe in our sample could be a false positive or a false negative.
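To make the simulation idea described under Sampling Distribution concrete, here is a minimal Python sketch (not part of the original notes; the sample size, number of simulated studies and the "observed" correlation are made-up illustrative values). It repeatedly samples from a population whose true correlation is zero and looks at how the resulting sample correlations are distributed.

```python
import numpy as np

rng = np.random.default_rng(42)

n = 30            # sample size per simulated study (illustrative)
n_sims = 10_000   # number of simulated studies

# Draw x and y independently, so the true population correlation is exactly zero.
null_rs = np.empty(n_sims)
for i in range(n_sims):
    x = rng.normal(size=n)
    y = rng.normal(size=n)
    null_rs[i] = np.corrcoef(x, y)[0, 1]

# Most simulated correlations cluster near zero...
print(f"mean |r| under the null: {np.abs(null_rs).mean():.3f}")

# ...but sampling error alone still produces some sizeable correlations.
# The proportion of null correlations at least as extreme as an observed value
# is an empirical two-tailed p-value.
observed_r = 0.35  # hypothetical correlation from a single study
empirical_p = np.mean(np.abs(null_rs) >= abs(observed_r))
print(f"P(|r| >= {observed_r} if the null is true) = {empirical_p:.3f}")

# With alpha = .05, about 5% of these null studies would still look "significant":
# these are Type I errors (false positives).
crit = np.quantile(np.abs(null_rs), 0.95)
print(f"5% of null samples give |r| above {crit:.2f}")
```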
Replication Crisis
The replication crisis in psychology is arguably due in part to a lack of understanding of NHST and p-values, and of the fact that error is an inherent part of the framework. Failure to replicate is in part simply due to sampling error. If a paper gets published and the underlying effect is a false positive, then it is unsurprising that subsequent studies cannot replicate the finding. This is compounded by the bias toward publishing research with significant results: it is hard to get a study with a null result published, so a Type I error (a false positive) has a better chance of being published than a correct rejection.

Improving the Overall Reliability of Psychological Research
One obvious thing to do is to spend more time in research methods classes teaching students more effectively how null hypothesis significance testing works and what p-values can and cannot tell us. The better we understand these fundamental concepts, the better we can interpret our own findings and those of other researchers.

Don't Just Rely on p-values, Include Effect Sizes
p < .05. Great, it is significant, therefore we have found something interesting and important, right? Not necessarily. We should always include some form of effect size when reporting our findings; this tells us about the relative strength of the effect we have observed. Just because something is statistically significant doesn't mean it is psychologically significant: as sample size increases, even small effects can become significant. For a t-test, Cohen's d is a nicely interpretable effect size, as it expresses the difference between two means in standard deviations. r values and R² values (such as from a regression) are also easy to interpret and to report, and it is possible to convert just about any test statistic into an r value (a worked sketch appears at the end of these notes).

Where Possible, Report Exact p-values
It has been suggested that reporting p-values only in terms of the alpha criterion reinforces the idea that meeting this criterion is the be-all and end-all of statistical decision making. APA guidelines now suggest we report exact p-values where possible: p = .049 and p = .011 are both significant at an alpha of .05, but one of them is only just significant. Sometimes reporting exact p-values isn't easy, especially when you're summarising data in a table; in that case, you might report against a range of alpha criteria such as p < .05, p < .01 and p < .001.

Using Appropriate Analyses
The most widely used forms of analysis are parametric tests, which include t-tests, ANOVA, regression and correlation. Parametric tests rely upon a number of assumptions about the properties of distributions, such as normality and homogeneity of variance. If your data violate more than one of these assumptions, then you should employ more appropriate analyses such as nonparametric tests. If you don't, you run the risk of a false positive or a false negative.

Apply Appropriate Corrections When Making Multiple Comparisons
When we set an alpha criterion of .05, we are accepting a 5% chance that what we observe is a false positive (under the assumption that the null hypothesis is true). If the probability of observing a false positive in a single test is 5%, it follows that when we run multiple tests, the probability of observing a false positive in at least one of them increases. This is particularly problematic for ANOVA, where we perform post-hoc multiple comparisons. One solution is to apply a correction to the p-values. A commonly applied correction is the Bonferroni correction, which multiplies each p-value by the number of comparisons being made. The Holm correction works stepwise instead: it multiplies the smallest p-value by the number of comparisons, the next smallest by the number of comparisons minus 1, and so on. The Holm correction is probably the preferred approach.
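To illustrate the two corrections just described, here is a small Python sketch (not from the original notes; the post-hoc p-values are made up) that applies both adjustments by hand. In practice a library routine such as statsmodels' multipletests does the same job.

```python
import numpy as np

def bonferroni(pvals):
    """Bonferroni: multiply every p-value by the number of comparisons (capped at 1)."""
    p = np.asarray(pvals, dtype=float)
    return np.minimum(p * p.size, 1.0)

def holm(pvals):
    """Holm: multiply the smallest p by m, the next smallest by m - 1, and so on,
    then take a running maximum so adjusted p-values never decrease."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    adjusted = p[order] * (m - np.arange(m))        # multipliers m, m-1, ..., 1
    adjusted = np.minimum(np.maximum.accumulate(adjusted), 1.0)
    out = np.empty(m)
    out[order] = adjusted                           # restore the original order
    return out

raw = [0.010, 0.020, 0.030, 0.040]                  # hypothetical post-hoc p-values
print("raw:       ", raw)
print("bonferroni:", bonferroni(raw))               # [0.04, 0.08, 0.12, 0.16]
print("holm:      ", holm(raw))                     # [0.04, 0.06, 0.06, 0.06]
# With alpha = .05, only the first comparison survives either correction here,
# whereas all four raw p-values would have looked "significant" on their own.
```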
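Finally, returning to the earlier advice on effect sizes: the sketch below (again not from the lecture; the data are simulated purely for illustration) reports Cohen's d alongside an independent-samples t-test and converts the t statistic into an r value using r = sqrt(t² / (t² + df)).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Two hypothetical groups (e.g. control vs intervention); the values are made up.
group_a = rng.normal(loc=10.0, scale=2.0, size=40)
group_b = rng.normal(loc=11.0, scale=2.0, size=40)

t, p = stats.ttest_ind(group_a, group_b)

# Cohen's d: the difference between the two means in pooled-standard-deviation units.
n1, n2 = len(group_a), len(group_b)
pooled_sd = np.sqrt(((n1 - 1) * group_a.var(ddof=1) + (n2 - 1) * group_b.var(ddof=1))
                    / (n1 + n2 - 2))
d = (group_b.mean() - group_a.mean()) / pooled_sd

# Converting the t statistic into an r-type effect size.
df = n1 + n2 - 2
r = np.sqrt(t**2 / (t**2 + df))

print(f"t({df}) = {t:.2f}, p = {p:.3f}, d = {d:.2f}, r = {r:.2f}")
```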