L15 Chi-Square PDF
Summary
This document provides lecture notes on chi-square tests, focusing on statistical analysis. It includes practice questions on mitosis and meiosis, and also discusses Mendel's monohybrid crosses with examples related to observed and expected ratios.
Full Transcript
Lecture 15: Chi Square
Additional readings: On LEA

Practice questions L10-11 – Mitosis & Meiosis
Question 1: diploids (2n)
Question 2: sister chromatids
Question 3: in meiosis, sister chromatids are separated in anaphase 1 + the mitosis with tetrads + homologous chromosomes and phase 2 in meiosis
Question 4: interphase
Question 5: metaphase 1
Question 6: diploid (2n)
Question 7: 2²³
Question 8: stressful, changing, not adapted
Question 9: Observed frequencies in Mendel’s monohybrid crosses

Observed frequencies in Mendel’s monohybrid crosses
- The observed ratios are close to 3:1, but not exactly 3:1. Why is this?
- If you flip a coin 100 times, it is unlikely that you observe exactly 50/50 heads/tails.
- Data always include variation due to chance.
- Are Mendel’s observed ratios close enough? How do we interpret this variability?

Statistical tests
- Statistical tests help us interpret data by giving us the probability of observing specific data if our hypothesis is true.
- We compare our observed data to our expected data (based on our hypothesis), and ask what the probability is that any deviation from expectations is due to chance alone.
- E.g., if the gene for flower colour really is governed by complete dominance (3:1), what is the probability of observing 705 purple : 224 white (3.15 : 1)?
- If the probability is high enough (a subjective threshold), then our data support our hypothesis, and we attribute the deviation to chance alone.
- However, if the probability does not pass our critical probability, then there must be another explanation besides chance that produces the deviations in our data.

Statistical tests (continued)
- The chi-squared test will tell you whether the difference between your observed data and your expected data is due to chance alone, or whether another explanation is required.
- Let’s take Mendel’s pod colour as an example. Mendel observed a total of 580 plants (428 + 152 = 580). For a 3:1 ratio, we would expect 435 : 145.

          Observed   Expected
  Green   428        435
  Yellow  152        145

- What is the probability that these deviations are due to chance alone?
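The expected counts for the pod-colour example follow directly from the 3:1 hypothesis; a minimal Python sketch of that arithmetic (variable names are ours, not from the notes):

```python
# Expected counts under a 3:1 (green : yellow) hypothesis for Mendel's pod-colour data.
observed = {"green": 428, "yellow": 152}
total = sum(observed.values())        # 580 plants in total

expected = {
    "green": total * 3 / 4,           # 3/4 of 580 = 435
    "yellow": total * 1 / 4,          # 1/4 of 580 = 145
}

print(total)     # 580
print(expected)  # {'green': 435.0, 'yellow': 145.0}
```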
We must formulate this as a HYPOTHESIS.

Hypothesis testing
- Statistical tests often compare two groups of data (data from two treatments in an experiment, two populations, or, in this case, observed data against expected data).
- When you do a statistical test, you always have two competing hypotheses:
  1. Null hypothesis (H0)
  2. Alternative hypothesis (HA)
- These hypotheses are the possible outcomes of your test.

H0: the null hypothesis
- There is no statistically significant difference between the two groups of data.
- It is probable that any deviations between the two groups are due to chance alone.
- This is the hypothesis that is actually tested: we will either reject, or fail to reject, H0.

HA: the alternative hypothesis
- There is a statistically significant difference between the two groups.
- It is improbable that chance alone can explain the deviations; therefore another explanation is required.
- Rejecting H0 never “proves” HA. You never “accept” the alternative hypothesis; instead, we say that our data support HA.

Hypothesis testing (continued)
- In science it is almost impossible to prove anything true. We can, however, prove things to be false!
- There are many potential hypotheses that could describe a phenomenon. As scientists, we try to eliminate competing hypotheses by proving them false, and the remaining hypothesis survives (for now) as the most likely explanation.
- For example, to test whether there is a difference between two groups, we start with the null hypothesis. If we reject our null hypothesis, then our data support the alternative hypothesis.
- For this reason, H0 and HA must be mutually exclusive and cover all possible outcomes of our statistical test (e.g., the groups of data are either statistically different from each other, or they aren’t).
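The reject / fail-to-reject logic above (with the p ≤ 0.05 convention used later in these notes) can be sketched as a small helper; the function name and default alpha are illustrative, not from the notes:

```python
# Hypothetical helper illustrating the H0 decision rule.
# Note the asymmetry: we "reject" or "fail to reject" H0; we never "accept" HA.
def decide(p_value, alpha=0.05):
    if p_value <= alpha:
        return "reject H0 (data support HA)"
    return "fail to reject H0"

print(decide(0.03))  # reject H0 (data support HA)
print(decide(0.40))  # fail to reject H0
```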
Hypothesis testing (test statistics)
- A statistical test is used to calculate a test statistic, which is used to determine whether to reject or fail to reject H0.
- We ask: what is the probability of obtaining a certain value of our test statistic?

The chi-squared statistical test
- Many statistical tests have been developed to be applied to different kinds of data sets.
- The chi-squared test is used for frequency data (discrete counts).
- We can use this test to determine whether the data fit the predicted theoretical outcome.

Chi-squared test statistic – “goodness of fit”
- The chi-squared test statistic is calculated as the squared difference between your observed and expected data, divided by your expected data, summed across all groups:

  X² = Σ (observed − expected)² / expected

- The larger the difference between the observed and expected data, the larger the X² value.
  X² = 0: perfect fit (no difference).
  X² large: poor fit (there is likely a difference).
- Why squared? We are only interested in the magnitude of the difference.
- Dividing by E standardizes the deviation (as a proportion of E).
- How large should X² be before we reject H0?

Statistical significance
- How large should X² be before we reject H0?
- All of hypothesis testing is based on the probabilities of certain phenomena being true, given our observations.
- When we say that something is “statistically significant”, it means it has passed our threshold of probability (significance level).
- The convention is to use 95% confidence. This means that we reject H0 when we are 95% sure that there is a difference.
- The significance level, or alpha level, is the probability of making the wrong decision to reject H0 when the null hypothesis is true (α = 0.05).

Statistical significance (continued)
- We compare our calculated X² to a theoretical distribution of X² test statistics, which tells us the probability of obtaining our X².
- Compare X² calc. to X² crit. at a given significance level (α = 0.05):
  X² calc. ≥ X² crit. → reject H0.
  X² calc.
  < X² crit. → fail to reject H0.
- When you observe a X² calc. larger than the X² crit., your probability of rejecting a null hypothesis when it is actually true is less than 5%.
- In addition to comparing X² calc. to X² crit., we can obtain the actual probability (p value) of obtaining our results by chance alone:
  p ≤ 0.05 → reject H0.
  p > 0.05 → fail to reject H0.

Degrees of freedom
- A statistical term that indicates how many of your values have the freedom to vary within your calculation.
- A somewhat complicated concept that you will revisit in statistics; we do not need to go deeper in this course.
- For chi-squared, the degrees of freedom equal n − 1, where n is the total number of categories:
  d.f. = (# rows) − 1

          Observed   Expected
  Green   428        435
  Yellow  152        145

Probability
- The probability of obtaining a given X² changes depending on the degrees of freedom.
  (Figure: X² distributions.)

Coin example
- You toss a coin 100 times and want to determine statistically whether the coin is fair (maybe it is unevenly weighted, which increases the probability of heads/tails).

Step 1: Specify the statistical hypotheses.
  H0: The coin is fair, P(Heads) = P(Tails) = 0.5.
  HA: The coin is not fair, P(Heads) ≠ P(Tails).

Step 2: Set up a X² table and calculate X².

  Outcome   Obs. Num.   Exp. Num.   (O − E)²/E
  Heads     60          50          (60 − 50)²/50 = 2
  Tails     40          50          (40 − 50)²/50 = 2

  X² = 4

Step 3: Determine the appropriate degrees of freedom.
  d.f. = (no. of rows − 1) = 2 − 1 = 1

Step 4A: Determine the critical X² from a known distribution at significance probability α = 0.05.

Step 4B: Determine the probability (p value) associated with your calculated X² (4): 0.05 > p > 0.02.

Step 5: Revisit your hypotheses.
  If X² calc. > X² crit., reject H0: 4 > 3.841.
  Therefore, we reject H0, meaning that the observed coin tosses DO differ significantly from the expected 50:50 heads-to-tails ratio (p < 0.05).
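The worked coin example can be reproduced in a few lines of Python. For df = 1 the p value can be computed exactly with the complementary error function, p = erfc(√(X²/2)); the variable names and the hard-coded critical value (read from a standard X² table) are our own additions:

```python
import math

# Chi-squared goodness-of-fit for the coin example: 60 heads, 40 tails,
# expected 50:50 under H0 ("the coin is fair").
observed = [60, 40]
expected = [50, 50]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi2)  # 4.0  ->  (60-50)^2/50 + (40-50)^2/50 = 2 + 2

# For df = 1, P(X^2 >= x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))
print(round(p_value, 4))  # 0.0455, consistent with 0.05 > p > 0.02

CHI2_CRIT = 3.841  # critical X^2 at alpha = 0.05 with df = 1 (from a X^2 table)
print("reject H0" if chi2 >= CHI2_CRIT else "fail to reject H0")  # reject H0
```

Running the same code on Mendel's pod data (observed 428/152, expected 435/145) gives X² ≈ 0.45, far below 3.841, so there we would fail to reject H0.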