Inferential Statistics PDF
ARU
Dr Jason Hodgson
Summary
This document provides an overview of inferential statistics, explaining various statistical tests used for data analysis and hypothesis testing.
Inferential Statistics
Dr Jason Hodgson
[email protected]
Sci 012, Week 10

Learning Outcomes
1. Be able to draw reliable conclusions about samples taken from large populations
2. Be able to compare different populations
3. Be able to describe and explain when to use various inferential statistical methods
4. Be able to perform the calculations involved in t-tests and Chi-squared tests

Inferential Statistics
Inferential statistics uses samples to infer properties of the population. It is a tool for understanding the underlying nature of the phenomena we are interested in. It allows us to:
- test hypotheses about the population
- infer the relationships between variables
- make predictions

Hypothesis Testing
Hypothesis testing is a formalised method for understanding how the world works. It compares a hypothesis of interest (a statement about how you think the world might work) to one or more alternative hypotheses.
- The hypothesis of interest states that there is a given relationship between the independent variable and the dependent variable.
- The null hypothesis states that there is no relationship between the independent variable and the dependent variable.
[Figure: relationship between variables]

Hypothesis testing calculates the probability of obtaining data at least as extreme as those observed if chance alone (the null hypothesis) were operating: the P-value.
The researcher determines ahead of time the probability required to reject the null hypothesis: the alpha value. Common alpha values are 0.05 and 0.01. These state that you will reject the null hypothesis (and therefore support the hypothesis of interest) if data this extreme would occur by chance less than 5% or 1% of the time, respectively.
If the P-value is less than the alpha value, then the null hypothesis is rejected and the alternative hypothesis is supported.

Type 1 error (false positive): rejecting the null hypothesis when it is true.
Type 2 error (false negative): failing to reject a null hypothesis that is false.
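The reject/fail-to-reject decision rule described above can be sketched in a few lines of Python (a minimal illustration; the function name `decide` is ours, not part of any statistics library):

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the standard decision rule: reject H0 when P < alpha.

    Note: a small P-value only controls the type 1 (false positive)
    error rate; failing to reject H0 is not proof that H0 is true.
    """
    return "reject H0" if p_value < alpha else "fail to reject H0"

print(decide(0.02))        # P = 0.02 < 0.05: reject H0
print(decide(0.02, 0.01))  # P = 0.02 > 0.01: the stricter alpha fails to reject
```

Lowering alpha from 0.05 to 0.01 reduces the chance of a type 1 error but, for a fixed sample size, makes type 2 errors more likely.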
The probability of a type 1 error depends on the chosen alpha value. The ability to avoid type 2 errors (statistical power) depends on the sample size and the effect size.

Tests of Significance

Parametric Tests
Parametric tests compare means and variances. They should only be used when the data are normally distributed. The Kolmogorov-Smirnov test can be used to assess whether parametric statistics can be used. Parametric tests are more powerful than non-parametric tests.

Commonly Used Parametric Tests

T-tests
- Used to compare sample means between two groups
- Dependent variable is interval or ratio; independent variable is nominal or ordinal with two groups
- Associated plots: bar plot, box plot, violin plot

ANOVA (Analysis of Variance)
- Looks at the variation within each sample and compares it with the variance between the samples
- Particularly useful when comparing more than two groups
- Dependent variable is interval or ratio; independent variable is nominal or ordinal with more than two groups
- Associated plots: bar plot, box plot, violin plot

Correlation
- Used to measure the dependence between two variables
- Dependent and independent variables are interval or ratio
- Associated plot: scatter plot

Regression
- Used to measure the dependence between two or more variables
- Dependent variable is interval or ratio; independent variables can be of any type
- Associated plots: scatter plot, box plot

Non-Parametric Tests
When data cannot satisfy the requirements for analysis using a parametric test, we need to use a non-parametric test. In general, non-parametric tests compare medians. Rather than comparing the values of the raw data, the tests put the data in ranks and compare the ranks.

T-Tests
Biological systems are complex, with many confounding factors all playing a role.
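The rank-based idea behind non-parametric tests can be illustrated with a short pure-Python sketch (`rank_data` is our own helper, not a library function). Tied values share the average of the rank positions they occupy, which is the convention used by rank-based tests such as Mann-Whitney U:

```python
def rank_data(values):
    """Return the rank of each value (1 = smallest).

    Tied values receive the average of the positions they would
    occupy, the standard convention for rank-based tests.
    """
    ordered = sorted(values)

    def average_rank(v):
        first = ordered.index(v) + 1    # first position of v (1-based)
        count = ordered.count(v)        # number of tied copies
        return first + (count - 1) / 2  # midpoint of the tied block

    return [average_rank(v) for v in values]

print(rank_data([12, 7, 7, 30]))  # [3.0, 1.5, 1.5, 4.0]
```

A non-parametric test then compares these ranks between groups rather than the raw measurements, which is why it makes no assumption of normality.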
Experimental design involves controlling conditions and introducing a controlled independent variable while measuring its effect on a dependent variable. We must measure whether any difference between the test and the control is significant or due to chance.
T-tests are used for parametric data to compare one or two samples. They test the probability that the samples come from a single population with a single mean value.

T Test Tables
T tables give the significance level of the difference between two means for a given sample size and t-score. If the calculated value of t is larger than the value in the table for a chosen significance level, the null hypothesis is rejected.

Example
We wish to compare the masses of two samples of 10 eggs from different chicken species. The null hypothesis is that there is no difference between the masses.
A t test on the two samples gives a t score of 2.62, written as t = 2.62.
The two samples of 10 eggs give 18 degrees of freedom (10 + 10 - 2). A chosen significance level of 5% (P = 0.05) gives a critical t value of t = 2.10.
The calculated t score of 2.62 is larger than the critical t value, therefore the difference can be considered significant.

One-tailed vs Two-tailed
A one-tailed test is used when the hypothesis is explicit about the direction of the effect of the independent variable on the dependent variable. A two-tailed test is used when the hypothesis is not explicit about the direction of the effect.
- One-tailed: Experimental drug X increases the amount of white blood cells in the patient.
- Two-tailed: Experimental drug X changes the amount of white blood cells in the patient (it might increase or decrease).
One-tailed vs Two-tailed
Under a normal distribution, 95% of observations lie within 1.96 standard deviations of the mean. The remaining 5% are equally divided between the two tails of the distribution.

The One-Sample T Test
The one-sample t test compares the mean of a single sample with a fixed value, for example the population mean. The t value is the number of standard errors that the sample mean is away from the population mean.

The Paired T Test
The paired t test compares the means of a variable in the same sample under different conditions or at two different times. It is calculated by dividing the mean of the differences between each pair by the standard error of the differences.

The Unpaired T Test
The unpaired t test compares the means of the same variable in two different samples. It is calculated by dividing the difference between the means of the two samples by the standard error of the difference.

Analysis of Variance
ANOVA is used to compare multiple samples. Comparing three or more means could be done using multiple t tests; however, each t test performed at an alpha value of 0.05 carries a 5% chance of a type 1 error, and when performing multiple t tests these error probabilities accumulate across the calculations. Because ANOVA is a single test, the problems of multiple testing do not apply.
ANOVA compares the variability between samples with the variability within samples.
- One-way ANOVA: used when we are comparing means from more than two samples.
- Repeated measures ANOVA: used when there are repeated measurements on the same sampling unit (paired).

The F Value
The ANOVA test statistic, F, is the ratio of the mean between-sample variance to the mean within-sample variance.

The Chi-Squared Test
Chi-squared is a measure of the difference between actual and expected frequencies. It is used for categorical (nominal) data, e.g. the hypothesis that smokers are more likely to have cancer than non-smokers.
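The F ratio described above (mean between-sample variance over mean within-sample variance) can be computed directly. This is an illustrative pure-Python sketch of the one-way ANOVA statistic, not library code:

```python
from statistics import mean

def f_statistic(groups):
    """One-way ANOVA F statistic for a list of samples (groups)."""
    k = len(groups)                  # number of groups
    n = sum(len(g) for g in groups)  # total observations
    grand_mean = mean(x for g in groups for x in g)
    # Between-group sum of squares (k - 1 degrees of freedom)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares (n - k degrees of freedom)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Well-separated groups give a large F; identical groups give F = 0.
print(f_statistic([[1, 2, 3], [7, 8, 9]]))  # 54.0
```

A large F means the variability between group means is large relative to the scatter within groups, which is evidence against the null hypothesis that all groups share one mean.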
The frequency of an event is the number of times that it occurs. The expected frequency is the frequency we would see if there were no difference between the sets of results (the null hypothesis).

Calculating Chi-Squared
Chi-squared is calculated by summing, over all categories, the squared difference between the observed and expected frequencies divided by the expected frequency.

Calculating the Significance Level
The significance level for chi-squared depends on the degrees of freedom:
df = (number of rows - 1) x (number of columns - 1)
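The chi-squared calculation and the degrees-of-freedom formula above can be combined into one short sketch (pure Python; `chi_squared` is our own illustrative function). Expected frequencies come from the row and column totals under the null hypothesis of no association:

```python
def chi_squared(observed):
    """Chi-squared statistic and df for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if rows and columns are independent (H0)
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (obs - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df

# Hypothetical 2x2 smoking/cancer table (rows: smoker / non-smoker,
# columns: cancer / no cancer); df = (2 - 1)(2 - 1) = 1.
print(chi_squared([[20, 10], [10, 20]]))
```

The resulting chi-squared value is then compared against the critical value for the computed degrees of freedom at the chosen alpha level.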