Lecture 1 - Intro to Stats PDF
Uploaded by CommodiousApostrophe6548
University of York
Summary
This lecture introduces the fundamentals of statistics and research design, including different statistical tests and a decision tree to select the correct approach. It also discusses types of variables and measurement scales.
Research Design and Statistics [RDS]

Who am I and what do I do?
Name: Tony Morland (call me Tony)
Email: [email protected]
Research: Structural, functional and chemical properties of the brain underpinning human vision in health and disease
Teaching: MSc Research Design and Statistics, Topics in Cognitive Neuroscience, Project Supervision

Learning Outcomes
Lectures
- Basic stats theory and application
- Understand different research designs
- Focus on understanding when and why different methods are applied
- Develop the ability to interpret empirical studies (including your own)
Practicals
- Carry out relevant statistical tests using SPSS
- Interpret empirical data
- PLEASE WATCH THE VIDEOS TO HELP YOU DO THE PRACTICALS
Overall
- To develop the confidence to seek out the best way to analyze your data, even if that means using an approach you are not yet familiar with

Design and Statistics
1. We have a question or hypothesis about a population
2. Propose a study to gather data
3. Design the study so that it gains the most valuable information about the hypothesis
4. Collect data
5. Use statistics to test the hypothesis based on a model of the data
6. Examine and interpret the results

Learning Framework
- More often than not, we want to know which statistical approach to use and be secure in the knowledge that the test is appropriate
- The most straightforward way I have found to do this is to answer a series of questions that take a path through a decision tree
- The destination in the decision tree is the appropriate statistical approach to use
- The decision tree therefore guides you to the appropriate inferential statistics

Decision tree - our learning framework
The questions, answered in order:
1. What sort of outcome measurement? CONTINUOUS (CONT) or CATEGORICAL (CAT)
2. How many predictor variables? ONE or TWO (or more)
3. What type of predictor variable? CONT, CAT or BOTH
4. How many levels of the categorical predictor? TWO or MORE THAN TWO
5. Same (S) or Different (D) participants for each predictor level? S, D or BOTH
6. Does the outcome meet the assumptions for parametric tests? YES or NO
The destinations (the parametric test if YES at question 6; the non-parametric alternative, where the tree gives one, if NO):
- CONT outcome, one CONT predictor: Pearson correlation or regression (NO: Spearman correlation)
- CONT outcome, one CAT predictor with two levels, S: dependent t-test (NO: Wilcoxon signed-rank test)
- CONT outcome, one CAT predictor with two levels, D: independent t-test (NO: Mann-Whitney test)
- CONT outcome, one CAT predictor with more than two levels, S: one-way repeated measures ANOVA (NO: Friedman's ANOVA)
- CONT outcome, one CAT predictor with more than two levels, D: one-way independent ANOVA (NO: Kruskal-Wallis test)
- CONT outcome, two or more CONT predictors: multiple regression
- CONT outcome, two or more CAT predictors, S: factorial repeated measures ANOVA
- CONT outcome, two or more CAT predictors, D: independent factorial ANOVA
- CONT outcome, two or more CAT predictors, BOTH: factorial mixed ANOVA
- CONT outcome, two or more predictors of BOTH types, D: ANCOVA
- CAT outcome, one CAT predictor, D: chi-squared test
- CAT outcome, one CONT predictor: logistic regression
- CAT outcome, two or more CAT predictors: log-linear analysis
- CAT outcome, two or more CONT predictors, or predictors of BOTH types: logistic regression

Analogy
- Think of the statistical tests in our framework as tools
- We then have to know what dictates when to use each tool
- If we have a nail, let's make sure we use a hammer
- If we have a screw, we'll use a screwdriver
- If we have a bolt, a spanner is best

There are two very important questions in the framework that need to be answered, and these are what we will cover in this and the next lecture:
- What type of outcome measurement do we have?
- How are those outcomes distributed, and does the distribution allow parametric statistics to be applied?

Question: What type of outcome measurement? (question 1 of the decision tree)

Measuring and measurements
- We start with a question (hypothesis)
- We measure answers (outcomes) that inform on our question
- We can obtain multiple outcomes from the same people
- We can obtain outcomes under different conditions, groups or both
We specify what we measure, and under what conditions we measure it, in the design of the experiment or study (we will come back to design later).

Types of outcomes we measure
- Ratio
- Interval
- Ordinal
- Nominal

Continuous variables
Continuous: there is an infinite number of possible values these variables can take on; entities get a distinct score.
○ Interval variable: equal intervals on the variable represent equal differences in the property being measured, e.g. the difference between 600 ms and 800 ms is equivalent to the difference between 1300 ms and 1500 ms.
○ Ratio variable: the same as an interval variable, but it also has a true zero point (0.0 means none of the property), so ratios of values are meaningful, e.g. participant height or weight.
Note: here we use the term 'variables' because both outcomes and predictors are variables, and, as we will see, it is not only the type of outcome but also the type of predictor variable that influences our choice of statistical test.

Categorical variables
A variable that cannot take on all values within the limits of the variable; entities are divided into distinct categories.
Nominal variable: there are two or more categories, e.g. whether someone is an omnivore, vegetarian, vegan or fruitarian.
Ordinal variable: the categories have a logical, incremental order, e.g. whether people got a fail, a pass, a merit or a distinction in their exam, or a response on a Likert scale*.
*Outcomes measured on a Likert scale are often treated as continuous after the distribution of the data has been inspected, and some argue that the divisions on the scale are equal.

A thing about outcomes: measurement error
- The discrepancy between the actual value we're trying to measure and the number we use to represent that value.
- The values have to have the same meaning over time and across situations.
VALIDITY (whether an instrument measures what it set out to measure)
RELIABILITY (the ability of a measure to produce the same results under the same conditions)
- Test-retest reliability (the ability of a measure to produce consistent results when the same entities are tested at two different points in time)

Types of variation
Systematic variation
- Differences in performance created by a specific experimental manipulation.
'Unsystematic' variation
- Differences in performance created by unknown factors.
- Age*, gender*, IQ*, time of day*, measurement error, etc.
Randomization (and other approaches)
- Minimizes unsystematic variation.
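Randomization, the last point above, can be sketched in a few lines of Python. This is a minimal illustration of simple random allocation, assuming a design with different participants in each condition; the function name `randomly_allocate`, the participant IDs and the condition labels are hypothetical and not part of the lecture or of SPSS.

```python
import random


def randomly_allocate(participants, conditions, seed=None):
    """Randomly allocate participants to conditions in (near-)equal numbers.

    Hypothetical helper for a between-participants (Different participants)
    design. Shuffling first means unknown factors (age, IQ, time of day, ...)
    end up spread across conditions by chance rather than systematically.
    """
    rng = random.Random(seed)        # seeded generator so the allocation can be reproduced
    shuffled = list(participants)    # copy so the caller's list is left unchanged
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for i, person in enumerate(shuffled):
        # deal participants out in turn, like cards, to keep group sizes balanced
        groups[conditions[i % len(conditions)]].append(person)
    return groups


if __name__ == "__main__":
    ids = [f"P{n:02d}" for n in range(1, 21)]   # 20 hypothetical participant IDs
    allocation = randomly_allocate(ids, ["placebo", "drug"], seed=2024)
    for condition, members in allocation.items():
        print(condition, len(members), members)
```

Dealing the shuffled list out in turn keeps the group sizes as equal as possible, and fixing the seed lets the same allocation be regenerated later.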
Note on unsystematic variation: the starred factors (age, gender, IQ, time of day) could, of course, be controlled for.

Nomenclature of variables in design
Independent variable (factors)
○ The hypothesised cause
○ A predictor variable
○ A manipulated variable (in experiments)
Dependent variable (measures)
○ The proposed effect
○ An outcome variable
○ Measured, not manipulated (in experiments)
In the textbook and here, we will largely use the predictor/outcome terms, because it is useful to think in those terms.

Inferential statistics: Null Hypothesis Significance Testing
Two hypotheses:
(1) The null hypothesis is that there is no effect of the predictor variable on the outcome variable.
(2) The alternative hypothesis is that there is an effect of the predictor variable on the outcome variable.
Null Hypothesis Significance Testing computes the probability of obtaining a result at least as extreme as the one observed if the null hypothesis were true. THIS IS REFERRED TO AS THE P-VALUE. It is not the probability that the null hypothesis is true.
Chapter 2 in Field 5th Edition

Null Hypothesis Significance Testing: An Example
Two hypotheses:
(1) The null hypothesis is that there is no effect of the group (predictor variable) on the outcome variable.
(2) The alternative hypothesis is that there is an effect of the group on the outcome variable.
Null Hypothesis Significance Testing computes a test statistic and then the probability of a statistic at least that large arising by chance alone, assuming the null hypothesis is true.
Chapter 2 in Field 5th Edition

Directional vs non-directional hypotheses
Two hypotheses:
(1) The null hypothesis is that there is no effect of the group (predictor variable) on the outcome variable.
(2) The non-directional alternative hypothesis is that there is an effect of the group on the outcome variable
OR
(3) The directional alternative hypothesis is that the mean of the outcome variable for group 1 is larger than the mean for group 2.

A pause to think about how the statistics can fail you
Well, not really fail you, but how they aren't perfect (just like us).

Issues with Null Hypothesis Significance Testing (NHST)
It's all about pitfalls of interpretation
(see Field Chapter 3)
- Misconception 1: A significant result means that the effect is important
- Misconception 2: A non-significant result means that the null hypothesis is true
- Misconception 3: A significant result means that the null hypothesis is false

Issues with Null Hypothesis Significance Testing (NHST)
The problem of all-or-nothing thinking (see Field Chapter 3)
a. The evidence is equivocal; we need more research.
b. All the mean differences show a positive effect of antiSTATic; therefore, we have consistent evidence that antiSTATic works.
c. Four of the studies show a significant result (p < 0.05), but the other six do not. Therefore, the studies are inconclusive: some suggest that antiSTATic is better than placebo, but others suggest there's no difference. The fact that more than half of the studies showed no significant effect means that antiSTATic is not (on balance) more successful in reducing anxiety than the control.
d. I want to go for C, but I have a feeling it's a trick question.
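Misconceptions 2 and 3 are easier to avoid once it is clear what a p-value measures: how likely a difference at least as large as the observed one would be by chance alone if the null hypothesis were true. A permutation test makes that logic concrete. This is a swapped-in technique for illustration, not the t-test SPSS would run; the function name `permutation_p_value` and the anxiety scores are hypothetical.

```python
import random
from statistics import fmean


def permutation_p_value(group1, group2, n_permutations=10_000, seed=None):
    """Two-sided permutation test for a difference between two group means.

    Estimates the probability of a mean difference at least as extreme as the
    observed one *assuming the null hypothesis is true*, i.e. assuming the
    group labels make no difference and could be swapped freely.
    """
    if not group1 or not group2:
        raise ValueError("both groups need at least one score")
    rng = random.Random(seed)
    observed = abs(fmean(group1) - fmean(group2))
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)                      # relabel the scores at random
        diff = abs(fmean(pooled[:n1]) - fmean(pooled[n1:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # +1 counts the observed labelling itself, so the estimate is never exactly 0
    return (at_least_as_extreme + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Hypothetical anxiety scores for two groups of 8 participants
    placebo = [12, 15, 14, 17, 13, 16, 15, 14]
    antistatic = [11, 12, 10, 14, 12, 11, 13, 12]
    print("p =", permutation_p_value(placebo, antistatic, seed=1))
```

A small p here says the observed difference would be unusual if the labels did not matter; it does not say how large or important the effect is (Misconception 1), and a large p does not show the null is true (Misconception 2).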
Issues with Null Hypothesis Significance Testing (NHST)
p-Hacking and HARKing (see Field Chapter 3)
- These are researcher degrees of freedom: choices made after the results are in and some analysis has been done
- p-hacking refers to selectively reporting significant results
- HARKing is Hypothesising After the Results are Known
- p-hacking and HARKing are often used in combination

All is not lost: EMBERS (as Field puts it in Chapter 3)
- Effect sizes
- Meta-analysis
- Bayesian Estimation
- Registration
- Sense

EMBERS (as Field puts it in Chapter 3)
E is for Effect sizes
- There are quite a few measures of effect size
- Get used to using them and understanding how studies can be compared on the basis of effect size
- A brief example: Cohen's d, the difference between two group means divided by a standard deviation (e.g. the pooled standard deviation)

EMBERS (as Field puts it in Chapter 3)
M is for Meta-analysis
- Bringing together multiple studies to get a more realistic idea of the effect
- Can assess effect sizes
- Rules of thumb for effect sizes

EMBERS (as Field puts it in Chapter 3)
M is for Meta-analysis
- Funnel plots: plot each study's effect size against its sample size to reveal bias in the published studies

EMBERS (as Field puts it in Chapter 3)
B is for Bayes
- Bayesian approaches capture the probability of the data given the hypothesis and given the null hypothesis
- The Bayes factor (the ratio of these two probabilities) is now quite often computed and reported alongside conventional NHST analyses (and effect sizes)
- Science is moving in the right direction

EMBERS (as Field puts it in Chapter 3)
R is for Registration
- Telling people what you are doing before you do it
- Telling people how you intend to analyze the data
- Largely limits researcher degrees of freedom (HARKing, p-hacking)
- A peer-reviewed registered study can be published whatever the outcome
- The scientific record is therefore less biased towards positive findings

EMBERS (as Field puts it in Chapter 3)
S is for Sense
- Knowing what you have done in the context of NHST
- Understanding the outcomes
- Adopting measures to reduce researcher degrees of freedom

End of Part 1
Part 2 to come after the break

The Descriptive Statistics of Outcomes
How are the data distributed, and how can we assess the distribution?

What distribution is needed for parametric tests?
The Normal Distribution
- The 'bell curve'
- Symmetrical
- Two parameters: the mean (central tendency) and the standard deviation (dispersion)

Let's take a look at the parameters
- Defined by two parameters: the mean (μ) and the standard deviation (σ).
- Many statistical tests (parametric tests) cannot be used if the data are not normally distributed.

Central tendency: Mean
- The sum of scores divided by the number of scores.
- The value from which the scores deviate least in terms of squared deviations (it has the least error).
- The mean is a good measure of central tendency for roughly symmetric distributions, but it can be misleading in skewed distributions, since it can be greatly influenced by scores in the tail.

Central tendency: Median
- The middle score when the scores are ordered (with an even number of scores, the average of the two middle scores).
- The middle of a distribution: half the scores are above the median and half are below it.
- Relatively unaffected by extreme scores or skewed distributions, and can be used with ordinal, interval and ratio data.
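The different sensitivity of the mean and the median to scores in the tail can be checked with Python's standard `statistics` module. A minimal sketch; the helper name `summarise_centre` and the reaction times are hypothetical.

```python
from statistics import mean, median


def summarise_centre(scores):
    """Return the (mean, median) of a list of scores."""
    return mean(scores), median(scores)


if __name__ == "__main__":
    # Hypothetical reaction times (ms); the last participant responded very slowly
    with_outlier = [520, 540, 555, 560, 575, 590, 2400]
    without_outlier = with_outlier[:-1]

    print(summarise_centre(with_outlier))     # mean 820, median 560
    print(summarise_centre(without_outlier))  # mean about 556.7, median 557.5
```

One slow response drags the mean up by more than 260 ms, while the median moves by only 2.5 ms.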
Central tendency: Mode
- The most frequently occurring score in a distribution; a score that actually occurred
- It is the only measure of central tendency that can be used with nominal data
- It is greatly subject to sample fluctuations and is therefore not recommended as the only measure of central tendency
- Many distributions have more than one mode

Symmetry and Skew
- For symmetrical (unimodal) distributions, the mean, median and mode are identical
- For a positively skewed distribution, usually the mean is greater than the median, which is greater than the mode
- For a negatively skewed distribution, usually the mode is greater than the median, which is greater than the mean

Shape: Kurtosis
- Kurtosis means bulge or bend in Greek
- Lepto is a prefix meaning thin: a leptokurtic distribution has a sharper peak and heavier tails than the normal distribution
- Platy is a prefix meaning flat or wide (think plateau): a platykurtic distribution is flatter, with lighter tails

How do we know what is normal?
Section 6.10 in Field (5th ed.)
- Kolmogorov-Smirnov test
- Shapiro-Wilk test
(Useful, but to be used with caution)
- Plot your data, because this helps inform the decisions you make with respect to normality.
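The mode, skew and kurtosis can also be put into numbers. A minimal Python sketch, assuming the simple moment-based formulas for skewness (g1) and excess kurtosis (g2); SPSS and other packages report small-sample adjusted versions, so their values may differ. The function name `shape_summary` and the scores are hypothetical, and these numbers complement, rather than replace, plotting the data.

```python
from statistics import fmean, median, multimode


def shape_summary(scores):
    """Mode(s), moment-based skewness (g1) and excess kurtosis (g2) of a list of scores."""
    n = len(scores)
    m = fmean(scores)
    m2 = sum((x - m) ** 2 for x in scores) / n   # 2nd central moment (divides by n, not n - 1)
    if m2 == 0:
        raise ValueError("all scores are identical; shape is undefined")
    m3 = sum((x - m) ** 3 for x in scores) / n
    m4 = sum((x - m) ** 4 for x in scores) / n
    g1 = m3 / m2 ** 1.5        # > 0: positive skew (long right tail); < 0: negative skew
    g2 = m4 / m2 ** 2 - 3.0    # > 0: leptokurtic; < 0: platykurtic; normal distribution = 0
    return {"modes": multimode(scores), "skewness": g1, "excess_kurtosis": g2}


if __name__ == "__main__":
    scores = [1, 2, 2, 3, 3, 3, 4, 4, 10]        # hypothetical, with one high score
    summary = shape_summary(scores)
    print("modes:", summary["modes"])            # [3]
    print("mean:", fmean(scores), "median:", median(scores))
    print("skewness:", summary["skewness"], "excess kurtosis:", summary["excess_kurtosis"])
```

For these scores the mean is above the median, which equals the single mode, matching the ordering described for a positively skewed distribution.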
Field provides quite a balanced account of the pros and cons of using graphs and numbers to evaluate normality in Chapter 6, section 6.10.

What we have done
1. Looked at how design and inferential statistics play a role in hypothesis testing
2. Saw how a framework can be used to determine the appropriate statistical approach
3. Understood that outcomes come in different types, and those types determine how we can apply statistical analyses [referring to the framework]
4. Understood the logic of Null Hypothesis Significance Testing
5. Understood some of the pitfalls of NHST and how they can be overcome
6. Understood how the normality of outcomes allows parametric statistics to be used
7. Understood that descriptive statistics, and some tests, allow the normality of outcomes to be assessed

Practical Week 1: Descriptive Statistics
- Getting to know the SPSS environment and how to enter data appropriately
- Learning how to compute descriptive statistics
- Learning how to plot data
- Learning how to test assumptions for parametric tests

What we will do next week
- More flesh on the bones of descriptive statistics
- Greater understanding of hypothesis testing
- More on how to graph and display data effectively