Introduction to Analysis of Variance (ANOVA) - Statistics Notes
Document Details
Uploaded by FortunateFermium
University of Santo Tomas
Summary
This presentation introduces Analysis of Variance (ANOVA), a statistical method used to evaluate mean differences between two or more treatments or populations. It covers key concepts such as the F-ratio, hypothesis testing, and different types of designs. ANOVA is a powerful tool used for analyzing variance and making comparisons in statistical analysis.
Full Transcript
INTRODUCTION TO ANALYSIS OF VARIANCE

OVERVIEW OF ANALYSIS OF VARIANCE
Analysis of variance (ANOVA) is an inferential hypothesis-testing procedure used to evaluate mean differences between two or more treatments or populations.

ANOVA and t tests both use sample data to test hypotheses about population means, but t tests are limited to comparing only two treatments. ANOVA can compare two or more at once, and thus provides more flexibility in designing experiments and interpreting results.

The goal of ANOVA is to determine whether the mean differences observed among the samples provide enough evidence to conclude that there are mean differences among the populations (i.e., whether we should reject H0).

Terminology in ANOVA
Factor: A variable that designates the groups being compared.
⚬ Independent variable: A variable manipulated to create the treatment conditions in an experiment.
⚬ Quasi-independent variable: A nonmanipulated variable used to designate groups.

Levels: The individual groups or treatment conditions that make up a factor. Example:
⚬ Factor 1: Therapy technique
⚬ Factor 2: Time
■ Levels: Each group is tested at three different times (repeated measures).

Single-factor design: A design that uses only one independent (or quasi-independent) variable.
Independent-measures design: A design that uses separate groups of participants for each treatment condition.
Two-factor design (factorial design): A design that combines two different factors.

Statistical Hypotheses for ANOVA
The null hypothesis (H0) states that there is no treatment effect and all population means are the same. The alternative hypothesis (H1) states that there is a treatment effect: there is a real, significant difference between population means. (Any difference between any two of the populations under study can serve as the alternative.)
Type I Errors and Multiple-Hypothesis Tests
Although t tests can be used to compare mean differences, ANOVA is advantageous for comparing multiple mean differences at once, because it avoids the risk of an inflated Type I error rate. This risk increases with the number of tests used to compare pairs of the multiple treatments being studied.

The testwise alpha level is the alpha level you select for each individual hypothesis test. The experimentwise alpha level is the total probability of a Type I error accumulated from all of the separate tests in the experiment. As the number of tests increases, so does the experimentwise alpha level.

Example: An experiment uses 3 treatments. You would need 3 separate t tests to compare all the mean differences. If all tests use α = .05, then each test has a 5% risk of a Type I error. These risks accumulate to produce an inflated experimentwise alpha level. ANOVA can compare all the treatments at once and avoid this problem.

THE LOGIC OF ANALYSIS OF VARIANCE
The first step in ANOVA is to determine the total variability for the entire data set, by combining all the scores from the separate samples to obtain a general measure of variability. The next step is to break down, or analyze, the components of that total variability.

ANOVA breaks the total variability into two basic components: between-treatments variance and within-treatments variance.

Between- and Within-Treatments Variance
Between-treatments variance is a measure of the overall difference between treatment conditions. These differences may be caused by sampling error (unsystematic error) or by treatment effects.

Within-treatments variance measures the variation of scores within each treatment condition. These differences are random and unsystematic, and they occur even when there are no treatment effects causing the scores to differ.
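The accumulation of Type I error risk described above can be sketched numerically. This snippet is an illustration, not part of the original slides; it assumes the separate tests are independent, so the chance of at least one Type I error across c tests is 1 − (1 − α)^c.

```python
# Sketch (assumes independent tests): the experimentwise alpha level
# grows with the number of separate hypothesis tests performed.
def experimentwise_alpha(alpha, n_tests):
    """Probability of at least one Type I error across n_tests tests."""
    return 1 - (1 - alpha) ** n_tests

# Three treatments require three pairwise t tests at testwise alpha = .05:
print(round(experimentwise_alpha(0.05, 3), 4))  # -> 0.1426, well above .05
```

With ten tests the experimentwise risk already exceeds 40%, which is why a single ANOVA is preferred over many pairwise t tests.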
Between- and Within-Treatments Variance (Example)
The table below shows hypothetical data from an experiment examining driving performance under three telephone conditions. Scores in the no-phone condition are much higher (M = 4) than those in the hand-held condition (M = 1); this indicates that there is variance between treatments. There is also variance within treatments, since not all the scores in the no-phone condition are equal.

The F-Ratio: The Test Statistic for ANOVA
Recall the structure of the t statistic: the obtained difference between sample means divided by the standard error (the difference expected by chance). The test statistic for ANOVA is very similar, but ANOVA uses variance to define and measure the differences among two or more sample means at once. This test statistic is called the F-ratio.

The F-ratio compares the between-treatments and within-treatments variances. For independent-measures ANOVA it has this structure:

F = variance between treatments / variance within treatments

A large value for the F-ratio indicates that the sample mean differences (numerator) are larger than would be expected if there were no treatment effects (denominator).

If there are no treatment effects, the differences between treatments are entirely caused by random factors. Both the numerator and the denominator then measure random differences and should be roughly the same size, so the F-ratio should be around 1.00.

When there is a treatment effect, the combination of systematic and random differences in the numerator should be larger than the random differences alone in the denominator, so the F-ratio should be larger than 1.00.

Thus, for the F-ratio, H0 states that there is no treatment effect and the F-ratio is near 1.00. In contrast, H1 states that there is a treatment effect and the resulting F-ratio is large in value.
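The claim that the F-ratio hovers around 1.00 when H0 is true can be illustrated with a quick simulation. This is not from the slides; it assumes numpy and scipy are available, and draws every group from the same population so any differences are purely random.

```python
import numpy as np
from scipy import stats

# Simulate many 3-group "experiments" (n = 5 per group) drawn from the
# SAME normal population, i.e. no treatment effect at all.
rng = np.random.default_rng(0)

f_values = []
for _ in range(2000):
    groups = [rng.normal(loc=50, scale=10, size=5) for _ in range(3)]
    f_values.append(stats.f_oneway(*groups).statistic)

# With no treatment effect, the typical F is near 1.00. (The exact
# long-run mean of an F(2, 12) distribution is df2 / (df2 - 2) = 1.2.)
print(round(float(np.mean(f_values)), 2))
```

Most simulated F-ratios land near 1.00, and only about 5% exceed the α = .05 critical value, exactly as the logic above predicts.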
Because the denominator of the F-ratio measures only random and unsystematic variability, it is called the error term. The error term provides a measure of the variance caused by random, unsystematic differences.

ANOVA NOTATION AND FORMULAS
Variables in ANOVA are denoted differently from the notation discussed so far:
k = number of treatment conditions (levels of a factor)
n = number of scores in each treatment
N = total number of scores in the entire study (with equal samples, N = kn)
T = treatment total; the sum of the scores for each treatment condition (T = ΣX)
G = grand total; the sum of all scores in the study
SS = sum of squares
M = mean

ANOVA makes use of many calculations and formulas, built from two basic forms:

F-ratio: F = MSbetween / MSwithin
Sample variance: s² = SS / df

We need to compute the between- and within-treatments variances. We also need to compute the SS and df for the variances, and for the total study. The entire process of ANOVA requires nine calculations. (A table in the original slides shows the structure and sequence of these calculations.)

Analysis of Sum of Squares (SS)
ANOVA requires that we first compute a total SS and then partition this value into two components: between treatments and within treatments.

The total sum of squares (SStotal) is the sum of squares for the entire set of N scores:
SStotal = ΣX² − G²/N

The within-treatments sum of squares (SSwithin) is the sum of the SS values inside each treatment:
SSwithin = ΣSSinside each treatment

The between-treatments sum of squares (SSbetween) is found simply by subtracting SSwithin from SStotal. It can also be obtained with the computational formula, which uses the treatment totals (T) instead of the treatment means:
SSbetween = Σ(T²/n) − G²/N
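The SS partition above can be sketched in a few lines. The scores below are made up for illustration (they are not the textbook's data); the point is that the partition SStotal = SSbetween + SSwithin holds exactly.

```python
import numpy as np

# Made-up data: k = 3 treatments with n = 4 scores each.
data = [np.array([1.0, 2, 3, 2]),    # treatment 1
        np.array([4.0, 5, 6, 5]),    # treatment 2
        np.array([7.0, 8, 9, 8])]    # treatment 3

all_scores = np.concatenate(data)
G, N = all_scores.sum(), all_scores.size            # grand total, total N

ss_total = (all_scores ** 2).sum() - G ** 2 / N     # SStotal = sum(X^2) - G^2/N
ss_within = sum(((x - x.mean()) ** 2).sum() for x in data)
ss_between = sum(x.sum() ** 2 / x.size for x in data) - G ** 2 / N

# The partition must hold exactly: SStotal = SSbetween + SSwithin.
print(ss_total, ss_between + ss_within)  # -> 78.0 78.0
```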
Analysis of Degrees of Freedom (df)
The analysis of degrees of freedom (df) follows the same pattern as the analysis of SS: find the df for the total set of N scores, then partition this value into df between treatments and df within treatments.

The total df (dftotal) is the df for the entire set of N scores: dftotal = N − 1.

The within-treatments df (dfwithin) is computed by taking the number of scores in each treatment, subtracting 1 from each n, and adding the results. To simplify, subtract k (1 for each treatment) from N: dfwithin = N − k.

The between-treatments df (dfbetween) is obtained by simply counting the number of treatments (k) and subtracting 1: dfbetween = k − 1.

Analysis of SS and df: Reminders
Keep in mind the meaning of the labels (subscripts) used for each value:
Total: the entire set of scores, across all treatment conditions.
Within: differences inside each separate condition.
Between: differences between conditions.

Calculation of Variances (MS) and the F-Ratio
In ANOVA, the term mean square (i.e., mean of squared deviations, or MS) is used in place of the term variance. The formulas from earlier stay the same, with MS referring to the between- or within-treatments variance:

MSbetween = SSbetween / dfbetween
MSwithin = SSwithin / dfwithin

The F-ratio simply compares MSbetween and MSwithin:

F = MSbetween / MSwithin

An ANOVA summary table organizes the results of the analysis in one table. This format is no longer used in published reports, but it is still a concise method for presenting ANOVA results.

HYPOTHESIS TESTING & EFFECT SIZE WITH ANOVA
If the null hypothesis is false, the F-ratio should be greater than 1.00. The problem now is to define precisely which values are “around 1.00” and which are “much greater than 1.00.” To answer this, we need to examine the distribution of F-ratios.
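Putting the SS, df, MS, and F-ratio steps together gives the full computation. The sketch below uses made-up scores (not the textbook's data) and cross-checks the hand-built F-ratio against scipy.stats.f_oneway, assuming scipy is available.

```python
import numpy as np
from scipy import stats

# Made-up data: k = 3 treatments, n = 4 scores each.
groups = [np.array([2.0, 4, 3, 3]),
          np.array([6.0, 8, 7, 7]),
          np.array([4.0, 6, 5, 5])]

k = len(groups)
N = sum(g.size for g in groups)
grand_mean = np.concatenate(groups).mean()

ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

ms_between = ss_between / (k - 1)    # variance between treatments
ms_within = ss_within / (N - k)      # variance within treatments (error term)
F = ms_between / ms_within

# scipy's one-way ANOVA should produce the identical F-ratio.
print(F, stats.f_oneway(*groups).statistic)  # -> 24.0 24.0
```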
Distribution of F-Ratios
The distribution of F-ratios is cut off at 0 (since all variances are positive) and then tapers off to the right. The shape of the distribution depends on the df of the two MS values: the larger the df, the more accurate the estimate of the population variance, and the more closely the F-ratios cluster around 1.00.

(Figure: The distribution of F-ratios with df = 2, 12. Of all the values in the distribution, only 5% are larger than F = 3.88 and only 1% are larger than F = 6.93.)

The F Distribution Table
The F distribution table, much like the t distribution table, shows the critical values for F. To use the table, you must know the df values for the F-ratio (numerator and denominator) and the alpha level for the hypothesis test. The top of the table gives the df values for the numerator; the denominator df values are in the leftmost column.

Example: The numerator of the F-ratio (between treatments) has df = 2, and the denominator (within treatments) has df = 12. This F-ratio is thus said to have df = 2, 12. On the F table, the critical F value in regular text is the critical value for α = .05, and the one in bold is the critical value for α = .01.

Example of Hypothesis Testing & Effect Size
Hypothesis testing with ANOVA follows the same 4-step procedure, but with the computations we just discussed.

Example: A study compared 3 study strategies: rereading the material to be tested, answering comprehension questions, and generating and answering one's own questions. Which strategy can improve exam performance?

Step 0: Compute the summary statistics. This is not a necessary step in hypothesis testing in general, but it simplifies the succeeding ANOVA computations.

Step 1: State the hypotheses and select an alpha level. For this test, we will use α = .05.
H0: μ1 = μ2 = μ3 (There is no treatment effect.)
H1: At least one of the treatment means is different. (There is evidence of a treatment effect.)

Step 2: Locate the critical region. We first must determine the degrees of freedom for MSbetween and MSwithin (numerator and denominator):
dftotal = N − 1 = 18 − 1 = 17
dfbetween = k − 1 = 3 − 1 = 2
dfwithin = Σdfinside each treatment = 5 + 5 + 5 = 15

The distribution of F-ratios with df = 2, 15 is graphed below. The critical F value for α = .05 is 3.68.

Step 3: Compute the F-ratio.
1. Analyze the SS to obtain SSbetween and SSwithin.
2. Use the SS values and df values to calculate the two variances, MSbetween and MSwithin.
3. Use the two MS values to compute the F-ratio.

Step 3.1: Analyze the SS. SStotal is the SS for the total set of N = 18 scores. SSwithin combines the SS values from each treatment condition. SSbetween is found by subtracting SSwithin from SStotal.

Step 3.2: Calculate the mean squares (MSbetween = SSbetween/dfbetween and MSwithin = SSwithin/dfwithin).

Step 3.3: Calculate the F-ratio (F = MSbetween/MSwithin).

Step 4: Make a decision. The F value we obtained, F = 7.16, is in the critical region (refer to the graph in Step 2). It is very unlikely (p < .05) that we would obtain a value this large if H0 were true. Therefore, we reject H0 and conclude that there is a significant treatment effect.

All of the components of the analysis (SS, df, MS, F) can be presented together in one summary table.
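The key numbers quoted in this example can be reproduced with scipy (a sketch, assuming scipy is available): the α = .05 critical value for df = 2, 15, the p-value for the obtained F = 7.16, and the effect size η² recovered from F and its df (η² = 0.488 in the APA-style report that follows).

```python
from scipy import stats

# Critical value and p-value for the worked example, df = 2, 15.
critical_F = stats.f.ppf(0.95, 2, 15)   # -> about 3.68
p_value = stats.f.sf(7.16, 2, 15)       # -> about 0.007

# eta^2 = SSbetween / SStotal; equivalently F*df1 / (F*df1 + df2).
eta_squared = (7.16 * 2) / (7.16 * 2 + 15)

print(round(critical_F, 2), round(p_value, 3), round(eta_squared, 3))
# -> 3.68 0.007 0.488
```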
Measuring Effect Size for ANOVA
For ANOVA, the simplest and most direct way to measure effect size is to compute the percentage of variance accounted for (η²). Like r² for t tests, this measures how much of the variability in the scores is accounted for by the differences between treatments:

η² = SSbetween / SStotal

For the earlier example (comparing the 3 study techniques), η² = 0.488.

Reporting the Results of ANOVA (APA)
“The means and standard deviations are presented in Table 1. The analysis of variance indicates that there are significant differences among the three strategies for studying, F(2, 15) = 7.16, p < .05 [or p = .007], η² = 0.488.”

An Example with Unequal Sample Sizes
Generally, ANOVA is most accurate for examining data with equal sample sizes. However, there are circumstances in which it is impossible or impractical to have an equal number of subjects in every treatment condition. In these cases, ANOVA is still valid, especially when the samples are relatively large and the discrepancy between sample sizes is not extreme.

Example: A study investigates the amount of homework required by three different academic majors: Biology, English, and Psychology. The researcher randomly selects one course that each student is currently taking and asks the student to record the amount of out-of-class work required each week for that course. The researcher used all of the volunteer participants, which resulted in unequal sample sizes.

Step 0: Compute the summary statistics.

Step 1: State the hypotheses and select an alpha level. We will use α = .05.
H0: μ1 = μ2 = μ3 (There is no treatment effect.)
H1: At least one population is different. (There is evidence of a treatment effect.)

Step 2: Locate the critical region. The F-ratio for these data has df = 2, 17.
With α = .05, the critical value is 3.59.

Step 3.1: Analyze the SS.

Step 3.2–3.3: Calculate the mean squares and the F-ratio.

Step 4: Make a decision. Because the obtained F-ratio is not in the critical region, we fail to reject the null hypothesis and conclude that there are no significant differences among the three populations of students in the average amount of homework per week.

Assumptions for the Independent-Measures ANOVA
The independent-measures ANOVA requires the same three assumptions as the independent-measures t hypothesis test:
1. Observations within each sample must be independent.
2. The populations from which the samples are selected must be normal.
3. There must be homogeneity of variance.

POST HOC TESTS
A significant F-ratio (rejecting H0) simply indicates that there is at least one statistically significant mean difference; it does not tell exactly which means are significantly different and which are not. Post hoc tests resolve this problem.

Post hoc tests (or posttests) are additional hypothesis tests done after an ANOVA to determine exactly which mean differences are significant and which are not. They are performed if you reject H0 and there are 3 or more treatments (k ≥ 3).

Posttests and Type I Errors
Post hoc tests allow you to go back through your data and make pairwise comparisons, in which you compare individual treatments two at a time. As established earlier, conducting separate hypothesis tests for each pair of treatments can result in an inflated experimentwise alpha level. The two following tests are methods of controlling the Type I error risk in post hoc tests.

Tukey's Honestly Significant Difference (HSD) Test
Tukey's HSD test is commonly used in psychological research.
This test computes the honestly significant difference (HSD): the minimum difference between treatment means that is necessary for significance. If a mean difference exceeds the HSD, you conclude that there is a significant difference.

The formula for Tukey's HSD test is:

HSD = q · √(MSwithin / n)

where
q = the Studentized range statistic*
MSwithin = the within-treatments mean square
n = the number of scores in each treatment

* To find q, you must know the values of k, dfwithin (the df for the error term), and the alpha level, and look up the critical value in the q table.

Example: Refer to the summary statistics below. Using these data, you will find that q = 3.67 and HSD = 3.63.

Example (cont.): The mean difference between any two samples must be at least 3.63 to be significant.
Treatment A is significantly different from treatment B (MA − MB = 4.00).
Treatment A is significantly different from treatment C (MA − MC = 5.00).
Treatment B is not significantly different from treatment C (MB − MC = 1.00).

The Scheffé Test
The Scheffé test is one of the post hoc tests with the smallest risk of a Type I error. It uses an F-ratio to evaluate the significance of the difference between two treatments. The numerator of the F-ratio (MSbetween) is calculated using only the two treatments you want to compare. The denominator is the same MSwithin that was used for the overall ANOVA.

The “safety factor” of the Scheffé test comes from two considerations:
Although you are comparing only two treatments, the Scheffé test uses the value of k from the original experiment to compute dfbetween (k − 1).
The critical value for the Scheffé F-ratio is the same one used to evaluate the F-ratio for the overall ANOVA.

Example: Refer to the summary statistics below.
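Before working through the Scheffé example, here is a sketch of the Tukey HSD computation just described. It assumes scipy ≥ 1.7 (which provides the studentized_range distribution in place of a printed q table); the MSwithin value is hypothetical, chosen so the numbers resemble the HSD example (k = 3, n = 6, dfwithin = 15, α = .05).

```python
import math
from scipy import stats

k, n, df_within = 3, 6, 15
ms_within = 5.87   # hypothetical MSwithin, not given in the transcript

# q from the studentized range distribution (replaces the q table lookup).
q = stats.studentized_range.ppf(0.95, k, df_within)

# HSD = q * sqrt(MSwithin / n)
hsd = q * math.sqrt(ms_within / n)
print(round(q, 2), round(hsd, 2))  # q matches the table value 3.67
```

Any pair of treatment means whose difference exceeds this HSD would be declared significantly different.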
Start with the smallest mean difference, which compares treatment B (T = 54, n = 6) and treatment C (T = 60, n = 6).

First, compute SSbetween, then find MSbetween and F:
G = 54 + 60 = 114
N = 6 + 6 = 12
dfbetween = 3 − 1 = 2 (using k = 3 from the original experiment)
SSbetween = Σ(T²/n) − G²/N = (54²/6 + 60²/6) − 114²/12 = 1086 − 1083 = 3
MSbetween = SSbetween/dfbetween = 3/2 = 1.5
F = MSbetween/MSwithin

With df = 2, 15 and α = .05, the critical value for F is 3.68. Our obtained F-ratio is not in the critical region, so we conclude that there is no significant difference between treatment B and treatment C. Repeat the same procedure for the second-largest mean difference (treatments A and B) and for the last pair (treatments A and C).

MORE ABOUT ANOVA
ANOVA computations are relatively complex and can seem overwhelming, but shifting your attention back to the conceptual goal of ANOVA can help you recall the general purpose of the analysis.

A Conceptual View of ANOVA
Example: Predict the values of MSbetween and the F-ratio using the data presented.
Hint 1: SSbetween and MSbetween measure how much difference there is between treatment conditions.
Hint 2: Compare the means or totals (T) for each treatment.

Conceptually, the numerator of the F-ratio always measures the difference between treatments. The example showed an extreme set of scores with zero difference in T values, and thus zero difference in means. This should help you predict that, most likely, there would be no significant difference between the treatments.

Example: An experiment uses two separate samples to evaluate the mean difference between two treatments. The results are summarized here:

Examine Experiment A first.
Between-treatments variance: There is a clear 4-point difference between the means of the two treatments.
Within-treatments variance: In both treatments, most of the scores are close in value to (or clustered around) the mean.

Let's now consider the data from Experiment B.
Between-treatments variance: There is also a clear 4-point difference between the treatment means.
Within-treatments variance: The scores in each treatment are scattered, indicating large variance.

For Experiment A, we would find that the F-ratio is large enough to conclude a significant difference between treatments. For Experiment B, the large variance within treatments overwhelms the small difference between them, and the F-ratio confirms that there is no significant difference between treatments.

The general point of the examples is to help you see what happens when you perform an ANOVA:
1. The numerator of the F-ratio (MSbetween) measures the difference between treatment means. Bigger mean differences produce a bigger F-ratio.
2. The denominator of the F-ratio (MSwithin) measures the variance of the scores inside each treatment. Larger sample variance produces a smaller F-ratio.

Additionally:
1. The number of scores in the samples also influences the outcome of ANOVA: if other factors are held constant, a greater sample size increases the likelihood of rejecting H0 (but does not affect the measured effect size).
2. Problems with high variance can often be minimized by transforming the original scores to ranks and then conducting a Kruskal-Wallis test.

MSwithin and Pooled Variance
In both the t statistic and the F-ratio, the variances from the separate samples are pooled together to create one average value for sample variance. For the independent-measures t statistic, we used the pooled variance:

s²p = (SS1 + SS2) / (df1 + df2)

In ANOVA, we combine two or more samples to calculate:

MSwithin = SSwithin / dfwithin = ΣSS / Σdf

Both the pooled variance and the mean square simply add the SS values and divide by the sum of the df values. The result is an average of all the different sample variances.
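The "add the SS values, divide by the sum of the df values" rule can be shown in a few lines. The SS and df values here are made up for illustration; the point is that pooled variance for two samples and MSwithin for k samples are the same computation.

```python
# Hypothetical SS values for three samples, each with n = 6 (so df = 5).
ss = [24.0, 30.0, 18.0]
df = [5, 5, 5]

# Pooled variance generalized to k samples = MSwithin.
ms_within = sum(ss) / sum(df)
print(ms_within)  # -> 4.8

# With only the first two samples, this reduces to the familiar
# two-sample pooled variance (SS1 + SS2) / (df1 + df2).
pooled_two = (ss[0] + ss[1]) / (df[0] + df[1])
print(pooled_two)  # -> 5.4
```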
Relationship between ANOVA and t Tests
When evaluating the mean difference in an independent-measures study comparing only two treatments, you can use either an independent-measures t test or ANOVA; it makes no difference which you choose. The two methods use many similar calculations and are closely related in other respects.

The basic relationship between t statistics and F-ratios can be stated in an equation:

F = t²

The t statistic compares distances: the distance between two sample means (numerator) and the standard error (denominator). The F-ratio compares variances, and variance is a measure of squared distance.

Other points to consider:
You will be testing the same hypotheses whether you choose a t test or an ANOVA.
The degrees of freedom for the t statistic and the df for the denominator of the F-ratio (dfwithin) are identical.
The distribution of t and the distribution of F-ratios match perfectly if you take into consideration the relationship F = t². Consider the following example.

If each t value is squared, all negative values become positive. As a result, the whole left-hand side of the t distribution is flipped over to the positive side. This creates an asymmetrical, positively skewed distribution: the F distribution. For α = .05, the critical value for t is ±2.101. When these boundaries are squared, 2.101² = 4.41.

THANKS FOR LISTENING!
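As a closing check, the F = t² relationship and the critical values quoted above (t = ±2.101 and its square 4.41, for a two-treatment study with df = 18) can be verified numerically, assuming scipy is available.

```python
from scipy import stats

# Two-tailed t critical value at alpha = .05 with df = 18, and the
# F critical value at alpha = .05 with df = 1, 18.
t_crit = stats.t.ppf(0.975, 18)
f_crit = stats.f.ppf(0.95, 1, 18)

# Squaring the t critical value reproduces the F critical value exactly.
print(round(t_crit, 3), round(t_crit ** 2, 2), round(f_crit, 2))
# -> 2.101 4.41 4.41
```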