Analysis of Variance (ANOVA) PDF
Korea University Business School
Kyung Sam Park
Summary
This document is a lecture or presentation on Analysis of Variance (ANOVA). It covers topics including F-distribution, F-tests, one-way and two-way ANOVA, and examples.
Full Transcript
Analysis of Variance (ANOVA) (Chapter 12)
Kyung Sam Park, Professor of LSOM, Korea University Business School

Contents
- F-distribution: ratio of two group variances
- F-test: two-group variance difference test
- ANOVA tests
  - One-way ANOVA (one-factor ANOVA)
  - Two-way ANOVA (two-factor ANOVA)
    - Without replication
    - With replication

F-Distribution
Ratio of two group variances:

$$F(n_1 - 1,\ n_2 - 1) = \frac{s_1^2}{s_2^2}, \qquad s_1^2 = \frac{\sum_{i=1}^{n_1} (X_{1i} - \bar{X}_1)^2}{n_1 - 1}, \qquad s_2^2 = \frac{\sum_{i=1}^{n_2} (X_{2i} - \bar{X}_2)^2}{n_2 - 1}$$

where $s_1^2, s_2^2$ are the variances of groups 1 and 2, and $n_1, n_2$ are the numbers of data points in groups 1 and 2.

Major characteristics:
1) There is a family of F distributions: a particular member of the family is determined by two parameters, the degrees of freedom in the numerator and the degrees of freedom in the denominator.
2) The F distribution cannot be negative: the smallest value of F is 0.
3) It is positively skewed.
4) It is asymptotic: the curve approaches the horizontal axis as F increases but never touches it.

F-Test
Independent two-group variance difference test.

Woman Man
71 87 78 91 78 77 91 47 85 77 74 63 91 78 81 78 67 88 82 84 96 90 92 91 93 81 82 81 63 58 95 58 79 87 96 92 89 92 40 90 94 89 87 96 80

F-Test (continued)
Hypothesis: $H_0: \sigma_1^2 = \sigma_2^2$, $H_1: \sigma_1^2 \neq \sigma_2^2$

Test statistic:

$$F(n_1 - 1,\ n_2 - 1) = \frac{s_1^2}{s_2^2} = \frac{49.76}{206.10} = 0.24$$

Excel output:

                 Woman      Man
Means            87.41176   77.60714
Variances        49.75735   206.0992
No. of data      17         28
df               16         27
F-statistic      0.241424
p-value          0.002249
Critical value   0.451978

[Figure: density of the F distribution under $H_0: \sigma_1^2 = \sigma_2^2$; horizontal axis F (ticks at 0, 1, 4), vertical axis probability; the tail area at F = 0.24 is 0.0022.]

Therefore, H1 is accepted since the p-value < 1%.

F-Test (continued)
Hypothesis: $H_0: \sigma_1^2 = \sigma_2^2$, $H_1: \sigma_1^2 \neq \sigma_2^2$

The test concept: if the F ratio is near unity, H0 will be accepted; otherwise it may be rejected. The idea behind the test for variances is that if H0 is acceptable, then the ratio of the two sample variances will be approximately 1. Otherwise, the ratio will be much larger or smaller than 1.
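As a rough illustration, the variance-ratio test can be reproduced from the summary statistics in the Excel output. This is a minimal sketch, assuming Python with SciPy is available (the lecture itself uses Excel). The variable names are illustrative, and the choice of a lower-tail p-value and a 5% lower-tail critical value is an assumption made to match the reported figures.

```python
from scipy import stats

# Summary statistics copied from the Excel output in the lecture.
var_woman, n_woman = 49.75735, 17
var_man, n_man = 206.0992, 28

df1, df2 = n_woman - 1, n_man - 1      # 16 and 27 degrees of freedom
f_stat = var_woman / var_man           # ratio of the two sample variances

# F is below 1 here, so the relevant tail is the lower one
# (assumption: this is the one-tailed p-value that Excel reports).
p_lower = stats.f.cdf(f_stat, df1, df2)

# Lower-tail critical value at an assumed 5% significance level.
f_crit = stats.f.ppf(0.05, df1, df2)

print(f"F = {f_stat:.6f}, p = {p_lower:.6f}, critical value = {f_crit:.6f}")
```

Because F falls below the lower critical value and the p-value is under 1%, the sketch reaches the same decision as the slide.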
The F distribution provides a decision rule that tells us when the departure from 1 is too large to have happened by chance: depending on the significance level $\alpha$, H0 is accepted or rejected.

Revisiting t-tests ($H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \neq \mu_2$) for the case of independent data:
- If H0 ($\sigma_1^2 = \sigma_2^2$) is accepted, use the t-test assuming equal variance.
- If H1 ($\sigma_1^2 \neq \sigma_2^2$) is accepted, use the t-test assuming unequal variance.

Comments on two group variances
Possible cases of two group variances:

Case   Group A   Group B   F-test result
1      Small     Small     H0 accepted
2      Big       Big       H0 accepted
3      Big       Small     H1 accepted
4      Small     Big       H1 accepted

Cases 1 & 2 (the equal variance case): do the t-test assuming equal variance.
Cases 3 & 4 (the unequal variance case): either do the t-test assuming unequal variance, or do not do it at all. To decide, consider why the variance is "Big" or "Small".

Comments (continued)
Consider Case 3:

Case   Group A   Group B   F-test result
3      Big       Small     H1 accepted

If the datasets represent a medication effect (i.e., how long recovery takes after taking the medication), Group A may be a "heterogeneous" group and Group B a "homogeneous" group in terms of individual difference factors such as age. In this case, there might be a fairness issue (or sampling problem) in comparing the two group means. If so, do not use the t-test assuming unequal variance; instead, gather new datasets for which the F-test accepts equal variance (H0).
In addition, consider the case of paired data: fairness is already guaranteed, so there is no problem.

Comments (continued)
Again, consider Case 3:

Case   Group A   Group B   F-test result
3      Big       Small     H1 accepted

If the datasets represent test scores, with Group A the "men" group and Group B the "women" group, is there a fairness issue (or sampling problem) in comparing the two group means? If not, use the t-test assuming unequal variance to compare the two group means.
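The mechanical part of this procedure (run the F-test first, then the matching t-test) can be sketched as follows. This is only an illustration under stated assumptions: it uses SciPy's `ttest_ind_from_stats` on the Woman/Man summary statistics from the Excel output, a 5% significance level, and a two-sided p-value for the variance test, none of which are prescribed by the lecture. It does not address the fairness question, which still has to be judged by the analyst.

```python
import math
from scipy import stats

# Summary statistics from the lecture's Woman/Man example.
mean_w, var_w, n_w = 87.41176, 49.75735, 17
mean_m, var_m, n_m = 77.60714, 206.0992, 28
alpha = 0.05  # assumed significance level for the variance test

# Step 1: F-test for equal variances (two-sided p-value, an assumption;
# the Excel output in the lecture reports the one-tailed value).
f_stat = var_w / var_m
tail = stats.f.cdf(f_stat, n_w - 1, n_m - 1)
p_var = 2 * min(tail, 1 - tail)

# Step 2: choose the t-test that matches the F-test result.
equal_var = bool(p_var >= alpha)  # False selects the unequal-variance t-test
t_stat, p_mean = stats.ttest_ind_from_stats(
    mean_w, math.sqrt(var_w), n_w,
    mean_m, math.sqrt(var_m), n_m,
    equal_var=equal_var,
)

print(f"variance test p = {p_var:.4f}, equal_var = {equal_var}")
print(f"t = {t_stat:.3f}, p = {p_mean:.4f}")
```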
Conclusion: Decide appropriately whether or not to use the t-test assuming unequal variance, after running the F-test.

One-way ANOVA
Example: comparing three means
$H_0: \mu_1 = \mu_2 = \mu_3$
H1: Not all the means are equal.

         Method A   Method B   Method C
         55         66         47
         54         76         51
         59         67         46
         56         71         48
Means:   56         70         48         (overall mean: 58)

TSS (1,082) is divided into 992 and 90.

Total variation (TSS: total sum of squares):
$$(55 - 58)^2 + (54 - 58)^2 + \cdots + (48 - 58)^2 = 1{,}082$$
Treatment variation (SST: sum of squares for treatment):
$$4(56 - 58)^2 + 4(70 - 58)^2 + 4(48 - 58)^2 = 992$$
Random variation (SSE: sum of squares for error):
$$(55 - 56)^2 + (54 - 56)^2 + \cdots + (48 - 48)^2 = 90$$

Note that TSS = SST + SSE.
SST = how much the data differ across the three columns = how much the three column means differ.
SSE = how much the data differ within the same column.
Therefore, the larger SST is relative to SSE, the stronger the evidence that the three means differ.

One-way ANOVA (continued)
ANOVA table:

Variation   Sum of squares   Degrees of freedom   Mean square (variance)   F
Treatment   SST              k - 1                SST/(k - 1) = MST        MST/MSE
Residual    SSE              n - k                SSE/(n - k) = MSE
Total       TSS              n - 1

where k = no. of different methods, n = no. of data.

Excel output:

Variation   Sum of squares   df   Mean square (variance)   F      p-value
Treatment   992              2    992/2                    49.6   1.38E-05
Residual    90               9    90/9
Total       1,082            11

The p-value < 1%, so H1 may be accepted.
Note: ANOVA is an overall test.

One-way ANOVA (continued)
Hypothesis: $H_0: \mu_1 = \mu_2 = \mu_3$; H1: Not all the means are equal.
H1 implies various specific cases:
(1) $\mu_1 \neq \mu_2$, $\mu_2 = \mu_3$
(2) $\mu_1 = \mu_2$, $\mu_2 \neq \mu_3$
…
(6) $\mu_1 \neq \mu_2 \neq \mu_3$
If we want to know which case is correct, run pairwise t-tests.

One-way ANOVA: Another example
Relationship between the student scores and the course evaluation: does a higher score give a better evaluation?
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
H1: Not all the means are equal.
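Returning briefly to the first example (Methods A, B, C), the decomposition TSS = SST + SSE and the resulting F ratio can be checked with a short script. This is a minimal sketch, assuming Python with NumPy and SciPy rather than the Excel tool used in the lecture. The variable names are illustrative, and `scipy.stats.f_oneway` is used only as a cross-check.

```python
import numpy as np
from scipy import stats

# Data from the Method A / B / C example.
groups = {
    "A": np.array([55, 54, 59, 56], dtype=float),
    "B": np.array([66, 76, 67, 71], dtype=float),
    "C": np.array([47, 51, 46, 48], dtype=float),
}

all_data = np.concatenate(list(groups.values()))
grand_mean = all_data.mean()
k, n = len(groups), all_data.size

# Total, treatment (between-column) and error (within-column) variation.
tss = ((all_data - grand_mean) ** 2).sum()
sst = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups.values())
sse = sum(((g - g.mean()) ** 2).sum() for g in groups.values())

# Mean squares, F ratio and upper-tail p-value.
mst, mse = sst / (k - 1), sse / (n - k)
f_stat = mst / mse
p_value = stats.f.sf(f_stat, k - 1, n - k)

# Cross-check against SciPy's built-in one-way ANOVA.
f_check, p_check = stats.f_oneway(*groups.values())

print(f"TSS = {tss:.0f}, SST = {sst:.0f}, SSE = {sse:.0f}")
print(f"F = {f_stat:.1f}, p = {p_value:.2e}")
```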
           Excellent   Good    Fair    Poor    Total
           94          75      70      68
           90          68      73      70
           85          77      76      72
           80          83      78      65
                       88      80      74
                               68      65
                               65
Size (n)   4           5       7       6       22
Mean       87.25       78.20   72.86   69.00   75.64

Variation   Sum of squares   df   Mean square   F ratio   P-value
Treatment   890.6838         3    296.8946      8.990     0.00074
Residual    594.4071         18   33.02262
Total       1485.091         21

One-way ANOVA (continued)
Relationship between the student scores and the course evaluations
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
H1: Not all the means are equal.

        Excellent   Good    Fair    Poor
Mean    87.25       78.20   72.86   69.00

[Figure: chart of the mean course evaluation score (vertical axis, 0 to 100) by student score group (Excellent, Good, Fair, Poor).]

Since H1 is accepted (so the four means are not all equal), we can say there is a strong relationship between the two; more specifically, the higher the student score, the higher the course evaluation score.
What if H0 is accepted?

Comments
Different graphs may be obtained for other datasets:

[Figures: charts of the mean course evaluation score (vertical axis, 0 to 100) by student score group (Excellent, Good, Fair, Poor) for other possible datasets.]

- H1 would be accepted, but the means fluctuate with no tendency.
- H1 would be accepted. If so, the lower the student score, the better the course evaluation score.
- H0 would be accepted. The two are independent.

Two-way ANOVA without replication
Comparing mean travel times from two factors: What about routes? What about drivers?
Treatment = Route, Block = Driver

Driver   Route 1   Route 2   Route 3   Route 4   Mean
A        18        20        20        22        20
B        21        22        24        24        22.75
C        20        23        25        23        22.75
D        25        21        28        25        24.75
E        26        24        28        25        25.75
Mean     22        22        25        23.8      23.2

Total variation (TSS): 139.2
SST (Route): $5(22 - 23.2)^2 + 5(22 - 23.2)^2 + 5(25 - 23.2)^2 + 5(23.8 - 23.2)^2 = 32.4$
SSB (Driver): $4(20 - 23.2)^2 + 4(22.75 - 23.2)^2 + \cdots + 4(25.75 - 23.2)^2 = 78.2$
SSE (Random): $TSS - (SST + SSB) = 139.2 - (32.4 + 78.2) = 28.6$

Two-way ANOVA without replication
ANOVA table:

Variation   Sum of squares   df               Mean square                 F
Treatment   SST              k - 1            SST/(k - 1) = MST           MST/MSE
Block       SSB              b - 1            SSB/(b - 1) = MSB           MSB/MSE
Residual    SSE              (k - 1)(b - 1)   SSE/(n - k - b + 1) = MSE
Total       TSS              n - 1

where k = no. of treatments (routes), b = no. of blocks (drivers), n = no. of data.

Excel output:

Variation   Sum of squares   df   Mean square   F      P-value
Treatment   32.4             3    10.8          4.53   0.02407
Block       78.2             4    19.55         8.20   0.00199
Residual    28.6             12   2.383
Total       139.2            19

At $\alpha$ = 5%, the five drivers' means differ significantly, and so do the four routes' means.

Two-way ANOVA with replication
Three sets of null and alternative hypotheses: What about routes? What about drivers? What about the interaction effect?

Driver   R1   R2   R3   R4
A        18   20   20   22
         16   19   22   24
         15   21   23   25
B        21   22   24   23
         20   23   25   24
         19   24   26   22
C        20   23   25   23
         18   24   26   24
         18   22   27   22
D        25   21   28   25
         27   22   29   26
         28   23   30   27
E        26   24   28   25
         29   21   29   23
         30   20   28   22

Two-way ANOVA with replication
ANOVA table (Excel output):

Variation     Sum of squares   df   Mean square   F ratio   P-value
Row           243.07           4    60.77         36.83     6.51E-13
Column        165.4            3    55.13         33.41     5.57E-11
Interaction   224.27           12   18.69         11.33     2.14E-09
Residual      66               40   1.65
Total         698.73           59

Cell means:

Mean   R1     R2     R3     R4
A      16.3   20.0   21.7   23.7
B      20.0   23.0   25.0   23.0
C      18.7   23.0   26.0   23.0
D      26.7   22.0   29.0   26.0
E      28.3   21.7   28.3   23.3

[Figure: chart of the cell means (vertical axis, 0.0 to 35.0) by driver (A to E), one series per route (R1 to R4).]

Interaction effect?
Two-way ANOVA with replication

[Figure: three panels, each plotting mean travel time against route (R1, R2) with one line per driver (A, B). Panel titles: "No interaction effect", "Strong (or certain) interaction effect", "Weak interaction effect".]
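The two-way decompositions in the route/driver examples can be reproduced with a short script. This is a minimal sketch, assuming Python with NumPy and SciPy rather than Excel. The helper `two_way_anova` and its return format are hypothetical names introduced here for illustration. With one observation per cell, the sketch treats the interaction sum of squares as the residual, which corresponds to the "without replication" table.

```python
import numpy as np
from scipy import stats

def two_way_anova(data):
    """Two-way ANOVA for an array shaped (rows, columns, replicates).

    Returns {source: (SS, df, F, p)}. With one replicate per cell the
    interaction cannot be separated from error, so it serves as the residual.
    """
    data = np.asarray(data, dtype=float)
    r, c, m = data.shape
    grand = data.mean()
    cell_means = data.mean(axis=2)

    ss_row = c * m * ((data.mean(axis=(1, 2)) - grand) ** 2).sum()
    ss_col = r * m * ((data.mean(axis=(0, 2)) - grand) ** 2).sum()
    ss_cells = m * ((cell_means - grand) ** 2).sum()
    ss_inter = ss_cells - ss_row - ss_col
    ss_within = ((data - cell_means[:, :, None]) ** 2).sum()

    effects = {"row": (ss_row, r - 1), "column": (ss_col, c - 1)}
    if m > 1:
        effects["interaction"] = (ss_inter, (r - 1) * (c - 1))
        ss_err, df_err = ss_within, r * c * (m - 1)
    else:
        ss_err, df_err = ss_inter, (r - 1) * (c - 1)

    mse = ss_err / df_err
    table = {}
    for name, (ss, df) in effects.items():
        f = (ss / df) / mse
        table[name] = (ss, df, f, stats.f.sf(f, df, df_err))
    table["residual"] = (ss_err, df_err, None, None)
    return table

# Without replication: rows are drivers A-E, columns are routes 1-4.
no_rep = np.array([
    [18, 20, 20, 22],
    [21, 22, 24, 24],
    [20, 23, 25, 23],
    [25, 21, 28, 25],
    [26, 24, 28, 25],
])[:, :, None]

# With replication: entered as (driver, replicate, route), then
# transposed to (driver, route, replicate).
with_rep = np.array([
    [[18, 20, 20, 22], [16, 19, 22, 24], [15, 21, 23, 25]],  # A
    [[21, 22, 24, 23], [20, 23, 25, 24], [19, 24, 26, 22]],  # B
    [[20, 23, 25, 23], [18, 24, 26, 24], [18, 22, 27, 22]],  # C
    [[25, 21, 28, 25], [27, 22, 29, 26], [28, 23, 30, 27]],  # D
    [[26, 24, 28, 25], [29, 21, 29, 23], [30, 20, 28, 22]],  # E
]).transpose(0, 2, 1)

no_rep_table = two_way_anova(no_rep)
with_rep_table = two_way_anova(with_rep)

for label, table in [("without", no_rep_table), ("with", with_rep_table)]:
    print(f"Two-way ANOVA {label} replication")
    for source, (ss, df, f, p) in table.items():
        tail = "" if f is None else f"  F = {f:.2f}  p = {p:.3g}"
        print(f"  {source:<12} SS = {ss:8.2f}  df = {df:2d}{tail}")
```

Here "row" corresponds to drivers and "column" to routes, matching the Row and Column labels of the Excel output for the replicated design.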