Lecture 1 and 2 Revision Notes and FAQs
Lecture 1 and 2 revision – basic concepts

Key terms and revision

Null hypothesis
The null hypothesis (denoted H0) is a statement that there is no effect or no difference between two variables or groups. It is the default or starting position that researchers aim to disprove through their study. Some key points about the null hypothesis:
- It represents the idea that any observed difference or relationship in the sample data is due to chance alone, and not due to a real effect in the population.
- It is usually stated as an equality, such as "there is no difference in mean test scores between the control and treatment groups" or "the correlation between variables X and Y is zero."
- Rejecting the null hypothesis means there is sufficient evidence to conclude there is a real effect or relationship in the population.
- The alternative hypothesis (H1 or Ha) represents the idea that there is an effect or relationship in the population.

Variance
Variance is a statistical measure that quantifies the spread or dispersion of values within a dataset. It helps us understand how much individual data points deviate from the central tendency (such as the mean). There are two main types of variance: sample variance and population variance.

Why do we care? Why do we do stats in psychology?
This brings up the idea of variance. Imagine you are a physicist measuring the speed of light: the speed of light is always the same value, no matter what time of day you measure it. Some sciences can measure precisely what they are talking about, but others can't. Psychology deals with humans, and that introduces variance. The basic premise is that we claim our treatment has an effect, but is the observed effect due to natural variation over time, or due to our intervention? We are always talking about variance: t-tests, ANOVA, regression, etc.

Population variance and sample variance
- Calculate sample variance when your dataset represents a sample taken from a larger population of interest.
- Calculate population variance when your dataset represents the entire population (i.e., every value you're interested in).

Notation
- Σ = sum of
- n = sample size
- Xi = the i-th score in the sample
- μ = population mean
- M = sample mean
- i = index that runs over every person (observation) in the sample

The unbiased estimate of the population variance divides by n − 1.

Estimating the population variance (σ²)
Take each score minus the population mean (Xi − μ), square it, sum the squared deviations across the sample, and divide by the sample size.

Three different estimators of the population variance (read from left to right):
$$\hat{\sigma}^2_{(1)} = \frac{\sum_{i=1}^{n}(X_i - \mu)^2}{n}, \qquad \hat{\sigma}^2_{(2)} = \frac{\sum_{i=1}^{n}(X_i - M)^2}{n}, \qquad \hat{\sigma}^2_{(3)} = \frac{\sum_{i=1}^{n}(X_i - M)^2}{n - 1}$$
(1) Assumes we know the population mean, which generally we don't. Its sampling distribution is the most accurate: the average of the estimates is the actual population variance, with fewer wild estimates.
(2) Underestimates the population variance: its sampling distribution has lots of estimates near zero.
(3) On average, very close to the true population variance, because the bias is corrected for, although there are some wild individual estimates.
The squared deviations from the sample mean are, on average, too small; dividing by n − 1 corrects for that by making the result bigger. The n − 1 correction has a larger proportional effect when n is small. When n is larger, the sample mean is likely to be a good estimator of the population mean. The worked examples and the simulation sketch below compare the three estimators.
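As a check on these claims, here is a minimal Python simulation sketch (my own illustration, not from the lecture slides; numpy is assumed to be installed) that builds the sampling distributions of the three estimators using the same setup as the worked examples below: 10,000 samples of n = 3 from a normal population with mean 10 and variance 1.

```python
# Simulation sketch of the three estimators of the population variance.
import numpy as np

rng = np.random.default_rng(42)
mu, sigma2, n, n_samples = 10.0, 1.0, 3, 10_000

samples = rng.normal(loc=mu, scale=np.sqrt(sigma2), size=(n_samples, n))
M = samples.mean(axis=1, keepdims=True)                 # sample mean of each sample

est1 = ((samples - mu) ** 2).sum(axis=1) / n            # (1) known population mean, divide by n
est2 = ((samples - M) ** 2).sum(axis=1) / n             # (2) sample mean, divide by n (biased)
est3 = ((samples - M) ** 2).sum(axis=1) / (n - 1)       # (3) sample mean, divide by n - 1 (unbiased)

for name, est in [("(1) known mu, /n", est1),
                  ("(2) sample mean, /n", est2),
                  ("(3) sample mean, /(n-1)", est3)]:
    print(f"{name}: average estimate = {est.mean():.2f}")
# Expected averages: roughly 1.00, 0.67 and 1.00, matching the sampling
# distributions described in the lecture.
```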
Example: random samples of n = 3 from a normally distributed population with a mean of 10 and a variance of 1.

Example 1: unbiased estimator using the population mean

Xi (sample score) | Xi − μ (sample score − population mean) | (Xi − μ)²
10 | 10 − 10 = 0 | 0
11 | 11 − 10 = 1 | 1
11 | 11 − 10 = 1 | 1

The population variance estimate for this sample is low, (0 + 1 + 1)/3 ≈ 0.67, but this alone doesn't allow us to say there is a bias. To check, build a sampling distribution: repeat the process many times to build up the estimates into a sampling distribution like the chart shown in the lecture. The population variance is 1, and on average the estimates equal 1, so this is an unbiased estimator, even though there are lots of individual samples whose estimates are less than 1 (underestimates). When we repeated this 10,000 times, the average estimate was almost exactly 1. Bias is a property of the estimator, not of any single sample.

Example 2: biased estimator using the sample mean
We take each sample score minus the sample mean (M = 10.67) instead of the population mean.

Xi | Xi − M | (Xi − M)²
10 | 10 − 10.67 = −0.67 | 0.44
11 | 11 − 10.67 = 0.33 | 0.11
11 | 11 − 10.67 = 0.33 | 0.11

Repeat the process 10,000 times to calculate the sampling distribution. The average estimate of the population variance is about 0.67, while the actual population variance is 1. This is evidence that the estimator is biased towards underestimating the population variance.

Example 3: unbiased estimator using the sample mean
Note: n = 3, so the denominator is 3 − 1 = 2 (on the bottom line). For this sample the estimate is (0.44 + 0.11 + 0.11)/2 ≈ 0.33, but across repeated samples the average estimate compares nicely to the true value of 1: this estimator isn't biased.

Power
Why do we need a test to have 80% statistical power? It means we have an 80% chance of rejecting the null hypothesis, given that the null hypothesis is false and given:
- a particular sample size
- a particular (assumed) effect size
- a particular alpha level (often .05, the probability of rejecting a true null hypothesis)
- other considerations, including whether the assumptions of the test are satisfied.
With 80% power, if the true effect size is X, then in 80% of studies we would expect to detect a significant effect at the chosen alpha level (e.g. p < .05). The other 20% would fail to detect the effect. (A simulation sketch illustrating what 80% power means appears below, after the central limit theorem notes.)
In summary, 80% power is a widely accepted standard that balances the risks of false positives and false negatives while keeping sample sizes feasible. It provides a good rule of thumb for designing well-powered experiments that support reliable conclusions.

Important notes
1. We don't have great intuitions about sample size as it relates to power.
2. Psychologists often use sample sizes that are far too small.
3. Many studies have fewer than 20 observations per group, while studying effects likely to be considerably smaller than sex differences in weight (which required 46 per group).

Central limit theorem (FYI only)
The central limit theorem says that the sampling distribution of the mean will be approximately normally distributed, as long as the sample size is large enough. Regardless of whether the population has a normal, Poisson, binomial, or any other distribution, the sampling distribution of the mean will be approximately normal. If you draw random samples of size n from a population with any distribution, the distribution of the sample means (the sampling distribution of the mean) approaches a normal distribution as n increases, with a mean equal to the population mean (μ) and a standard deviation equal to the population standard deviation (σ) divided by the square root of the sample size (√n).
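A small demonstration sketch of the central limit theorem (my own illustration, not from the lecture; numpy assumed): sample means drawn from a clearly non-normal (exponential) population still have a mean near μ and a standard deviation near σ/√n.

```python
# Central limit theorem demonstration with a non-normal population.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.0, 1.0            # an exponential(1) population has mean 1 and SD 1
n, n_samples = 50, 10_000

# Draw 10,000 samples of size n and keep each sample's mean.
sample_means = rng.exponential(scale=1.0, size=(n_samples, n)).mean(axis=1)

print(f"Mean of sample means: {sample_means.mean():.3f}  (population mean = {mu})")
print(f"SD of sample means:   {sample_means.std(ddof=1):.3f}  (sigma/sqrt(n) = {sigma / np.sqrt(n):.3f})")
```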
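This is the power simulation sketch referred to in the power section above. The effect size (d = 0.5) and group size (n = 64 per group) are illustrative assumptions, not values from the lecture; under those assumptions roughly 80% of simulated studies reject H0 at α = .05 (numpy and scipy assumed).

```python
# Simulation sketch of what "80% power" means for a two-group design.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
d, n, alpha, n_sims = 0.5, 64, 0.05, 10_000   # assumed effect size and per-group n

rejections = 0
for _ in range(n_sims):
    control = rng.normal(loc=0.0, scale=1.0, size=n)    # null group
    treatment = rng.normal(loc=d, scale=1.0, size=n)    # true standardised effect = d
    t, p = stats.ttest_ind(treatment, control)          # independent-samples t-test
    rejections += p < alpha

print(f"Estimated power: {rejections / n_sims:.2f}")     # approximately 0.80
```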
Why do we need significance testing, and what is its purpose?
Significance tests can never prove or disprove theories. They provide probabilistic information and at most can corroborate theories.
p is the probability of getting our observed result, or a more extreme result, if the null hypothesis is true. The p-value measures how likely it is that the observed data could have occurred if the null hypothesis (which typically states there is no effect or difference) is true.
- A very small p-value (typically less than 0.05) indicates that the observed data would be very unlikely if the null hypothesis were true. This suggests the null hypothesis should be rejected in favour of the alternative hypothesis.
- A larger p-value (greater than 0.05) indicates the observed data are reasonably likely to have occurred even if the null hypothesis is true. This means the null hypothesis should not be rejected.

A review of statistical inference issues

Why is the replication crisis centred on psychology? Gelman (2016) mentions five reasons:
1. Sophistication. Psychology focused on the concepts of validity and reliability before many other fields of study did. Because the focus on these key concepts developed early, the field is more open to criticism than other fields.
2. Openness. Psychology is an open field in which sharing data is common, so it is easier to find mistakes.
3. Overconfidence deriving from research designs. Clean designs lead to overconfidence.
4. Involvement of some prominent academics. A few leading findings and academics were dragged into the replication crisis, and the whole field was taken with them.
5. The general interest of psychology. Its findings are of interest to the public, so they attract more attention.

Some routes to replication issues
1. Outright fraud (rare).
2. P-hacking, data dredging, data snooping, fishing expeditions (likely rarer than is commonly believed): for example, an individual going hunting for the smallest p-value in a data set and then publishing results from it (p-hacking).
3. "The garden of forking paths" (likely more common than is generally realised). The problem can arise when researchers make a series of decisions during data analysis without pre-specifying an analysis plan. This can inflate the false-positive rate, even in the absence of intentional p-hacking or fishing expeditions (see the simulation sketch after this list). For example, it is well known in the published literature that there is a correlation between eating breakfast and school performance. You, as a researcher, run this study but find no significant relationship for this hypothesis; however, you do find one for maths. You then publish an article about the relationship between eating breakfast and maths performance.
- This is overcome by preregistering hypotheses.
How to fix the problem: pre-registration of hypotheses and methods.
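A simulation sketch of the false-positive inflation point (my own illustration, not from the lecture; numpy and scipy assumed): when the null is true for every one of five outcomes, reporting whichever outcome gives the smallest p-value rejects H0 far more often than the nominal 5%.

```python
# Sketch: analytic flexibility (picking the "best" of several outcomes)
# inflates the false-positive rate even though every null is true.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, n_outcomes, alpha, n_sims = 30, 5, 0.05, 10_000

false_positives = 0
for _ in range(n_sims):
    group_a = rng.normal(size=(n, n_outcomes))   # no true effect on any outcome
    group_b = rng.normal(size=(n, n_outcomes))
    p_values = stats.ttest_ind(group_a, group_b, axis=0).pvalue
    false_positives += p_values.min() < alpha    # report only the smallest p-value

print(f"False-positive rate with 5 outcomes: {false_positives / n_sims:.2f}")
# Roughly 0.23, rather than the nominal 0.05.
```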
Objective of confidence intervals and the population mean
A confidence interval is a range of values that describes the uncertainty surrounding an estimate. We indicate a confidence interval by its endpoints; for example, the 95% confidence interval for the number of people, of all ages, in poverty in the United States in 1995 (based on the March 1996 Current Population Survey) is "35,534,124 to 37,315,094". A confidence interval is also itself an estimate.

Interpretation: we are 95% confident that the population mean is between 35,534,124 and 37,315,094.

Advantages of confidence intervals
- They give us the set of values that, had any of them been your null hypothesis, would not have been rejected (values outside the interval would have been rejected).
- They tell us about the precision of our estimate.
Advantage of p-values
- They give a clear indication of the evidence against the null hypothesis.
Make sure you report both.

Confidence intervals for the population mean (it is unlikely that we would know sigma, the population SD)
CI = mean ± margin of error

When we know the population SD σ (sigma)
When we know σ, the formula for a (1 − α) × 100% confidence interval for the population mean μ is:
$$M \pm z_{\frac{\alpha}{2}}\frac{\sigma}{\sqrt{n}}$$
This is a 95% confidence interval when α = .05.
- M is the sample mean.
- $z_{\frac{\alpha}{2}}$ is the critical value that cuts off the upper α/2 tail of the standard normal distribution (what's remaining in the tail).
- $\frac{\sigma}{\sqrt{n}}$ is the population standard deviation σ divided by the square root of the sample size n; it is also called the standard error of the mean.
- α = .05 gives the 95% CI.
Don't always assume that the distribution is normal; if it isn't, you can't always use the 1.96 critical value.

Worked example where σ is known (it is really unlikely that we would know sigma, the population SD)
We need to know the value of z to calculate the 95% CI. You study 100 people who do an IQ test after ingesting MDMA. They score 95.2 on average. You decide to treat the population standard deviation as a known value, σ = 15.
$$95.2 \pm 1.96\ \times \frac{15}{\sqrt{100}}$$
How to read a CI: 95% CI (92.26, 98.14), i.e. the confidence interval spans 92.26 to 98.14.

When we don't know the population SD σ
If you're randomly sampling from a normally distributed population, it can be shown that:
$$t = \frac{M - \mu}{s/\sqrt{n}} \sim t(n - 1)$$
Translated: t is the sample mean M minus the population mean μ, divided by the estimated standard error of the mean, which is the sample standard deviation s divided by the square root of the sample size n. t has a t-distribution with (n − 1) degrees of freedom.
The t-distribution is not the normal distribution: it has n − 1 degrees of freedom and fatter tails. E.g. if the sample size is n = 3, then df = 2.

The CI formula when we don't know σ
When we don't know σ, the formula for a (1 − α) × 100% confidence interval for the population mean μ is:
$$M \pm t_{\frac{\alpha}{2}}\frac{s}{\sqrt{n}}$$
- M is the sample mean.
- $t_{\frac{\alpha}{2}}$ is the critical value in the upper tail of a t-distribution with degrees of freedom equal to the sample size n minus 1.
- $\frac{s}{\sqrt{n}}$ is the sample standard deviation s divided by the square root of the sample size n; it is also called the estimated standard error of the mean.
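A short Python sketch of the two CI formulas above (numpy and scipy assumed). It uses the lecture's MDMA example for the z-based interval; for the t-based interval, the sample standard deviation s = 15 is an assumed value for illustration, since the notes do not give one.

```python
# Confidence intervals for a population mean: known sigma (z) vs unknown sigma (t).
import numpy as np
from scipy import stats

n, M, sigma = 100, 95.2, 15.0

# Known sigma: M +/- z_{alpha/2} * sigma / sqrt(n)
z_crit = stats.norm.ppf(0.975)                     # 1.96 for a 95% CI
z_moe = z_crit * sigma / np.sqrt(n)
print(f"z-based 95% CI: ({M - z_moe:.2f}, {M + z_moe:.2f})")   # (92.26, 98.14)

# Unknown sigma: M +/- t_{alpha/2, n-1} * s / sqrt(n)
s = 15.0                                           # assumed sample SD for illustration
t_crit = stats.t.ppf(0.975, df=n - 1)              # about 1.984 with 99 df (fatter tails than z)
t_moe = t_crit * s / np.sqrt(n)
print(f"t-based 95% CI: ({M - t_moe:.2f}, {M + t_moe:.2f})")
```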
Research design

Basic vs applied research
- Basic research: principles of behaviour and mental processes, e.g. leads to concepts such as the attenuation model.
- Applied research: direct relevance to the real world, e.g. testing the effect of mobile phones on driving performance.

Laboratory vs field research
- Lab: greater control; connected to basic research.
- Field research: realistically resembles the environment we are generalising to; connects to applied research.

Quantitative research: most research; uses data and numbers.
Qualitative research: interviews, case studies, narratives and results.

Experimental research
- Between-subjects designs: differences between groups might arise that are not because of the manipulation, so use random assignment to groups with a large sample size, or matching.
- Within-subjects designs: all participants are exposed to the IV at all levels (like trying all dosages of the drug); the concern is sequencing effects and the solution is counterbalancing. The participant is included in all groups and conditions.
- Single-factor design: one IV with two or more levels (e.g. placebo and treatment at different dosages); analysed with t-tests or one-way ANOVA.
- Factorial designs: more than one IV (need to consider the interaction term); analysed with factorial ANOVA.

Correlational research
- Correlational research examines relationships between naturally occurring variables, while experimental research involves manipulation of variables.
- Regression: predicting a variable from other variables.
- Good when manipulation cannot be done for practical or ethical reasons; good ecological validity.
- Correlation does not equal causation.
- Regression and correlation are linked; regression, ANOVA and t-tests are all linear models.

Fitting data to models
Data = model (a representation of reality) + residual (what the model doesn't capture)
Residual = data − model
We want models that fit the data well and that are simple (parsimonious), in preference to a model that fits the data well but is more complex. (A small numeric sketch appears at the end of these notes.)

Assessing the fit of the model
- Examine the residuals (as in the table shown in the lecture).
- Use a summary measure (such as the percentage of variance accounted for, or other more complex measures we'll discuss this semester).
- Use a statistical test (null hypothesis tests are still often used in this context).

Multivariate models can be represented in matrix form: Y = Xb + e.
What is a matrix? In this form, Y is a vector (a matrix with one column) of outcome scores, X is the design matrix of predictor values, b is the vector of model coefficients (b0 = intercept), and e is the vector of residuals.
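A minimal sketch of "data = model + residual" using a simple linear model written in the matrix form Y = Xb + e. The toy data below are invented for illustration; only numpy is assumed.

```python
# Data = model + residual, illustrated with a simple linear regression.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])        # predictor
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])        # data (toy values)

X = np.column_stack([np.ones_like(x), x])       # design matrix: intercept column + predictor
b, *_ = np.linalg.lstsq(X, y, rcond=None)       # least-squares estimates of b0 (intercept) and b1 (slope)

model = X @ b                                   # the model's predicted values
residual = y - model                            # residual = data - model

r_squared = 1 - residual.var() / y.var()        # proportion of variance accounted for
print(f"b0 (intercept) = {b[0]:.2f}, b1 (slope) = {b[1]:.2f}, R^2 = {r_squared:.3f}")
```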