Statistical Inference - Two Categorical Variables PDF

Summary

This document explains the chi-square test for independence, a statistical method used to determine if there's a relationship between two categorical variables. It presents a two-way table example of sickle cell trait and resistance to malaria, and outlines the steps to calculate the expected counts and the chi-square test statistic.

Full Transcript

Statistical Inference – Two Categorical Variables
Chi-Square Test for Independence

Test for independence; how to construct a two-way table

1. Is there a relationship between the two categorical variables, such that the category into which individuals fall for one variable seems to depend on the category they are in for the other variable?
⇒ Chi-square test for independence

Constructing a two-way contingency table

Sickle-cell trait is widespread in some peoples of African lineage. Why is it so prevalent when it is detrimental to health? One possible explanation is a link between being a sickle-cell carrier and protection against malaria. 543 African children were checked for the trait and for malaria. 36 out of the 136 who showed sickle-cell trait were heavily infected with malaria. Among the 407 without sickle-cell trait, 152 were heavily infected with malaria.

a) Construct a two-way table of counts for a possible relationship between the presence of sickle-cell trait and resistance to malaria.

Observed                Malaria   No malaria   TOTAL
Sickle-cell trait            36          100     136
No sickle-cell trait        152          255     407
TOTAL                       188          355     543

b) Do the data provide evidence that there is a relationship between the presence of sickle-cell trait and resistance to malaria? Perform a suitable statistical test at α = 0.05.

Step 1 (Chi-Square Test for Independence)

The four steps in carrying out a significance test (always!):
Step 1: State the null and alternative hypotheses.
Step 2: Check conditions and then calculate the test statistic.
Step 3: Find the P-value using the appropriate distribution.
Step 4: Make a decision and state your conclusion in the context of the specific setting of the test.

Step 1: Determine null and alternative hypotheses
H0: The two categorical variables are not related (are NOT dependent on each other).
Ha: The two variables are related (ARE dependent on each other).

In the sickle cell and malaria example,
H0: There is NO relationship between sickle-cell trait and incidence of malaria.
Ha: There IS a relationship between sickle-cell trait and incidence of malaria.

Calculating Expected Counts under H0 (variables not related)

To test H0, we compare the observed counts in the table (the original data) with the expected counts (the counts we would expect if H0 were true). If the observed counts are far from the expected counts, that is evidence against H0 and in favor of a real relationship between the two variables.

Obtaining the expected counts (each expected count is row total × column total / grand total):

                        Variable Y
Variable X     Category 1      Category 2      TOTAL
Category A     (TA × T1)/T     (TA × T2)/T     TA
Category B     (TB × T1)/T     (TB × T2)/T     TB
TOTAL          T1              T2              T

Example: Calculating Expected Counts under H0 (variables not related)

In the sickle cell and malaria example, start with the first cell:

Expected counts         Malaria   No malaria   TOTAL
Sickle-cell trait         47.09                  136
No sickle-cell trait                             407
TOTAL                       188          355     543

The overall proportion of all 543 individuals who have the sickle-cell (SC) trait is 136/543 ⇒ 25.05%.
Under H0 (variables not related), we expect 25.05% of the malaria group to have the SC trait.
So the expected count of children in the malaria group (188 children) who have the SC trait is 25.05% of 188 = 188 × (136/543) = 47.09.

Complete the rest in a similar way:

Observed (Expected)     Malaria          No malaria       TOTAL
Sickle-cell trait        36 (47.09)      100 (88.91)        136
No sickle-cell trait    152 (140.91)     255 (266.09)       407
TOTAL                   188              355                543

Step 2a, 2b (Chi-Square Test for Two-Way Tables)

Step 2a: Necessary Conditions
a) The sample should be a random sample from the population.
b) Guidelines for a large enough sample: all expected counts should be at least 1, and at least 80% of the cells should have an expected count ≥ 5.

In the sickle cell and malaria example, we are willing to assume that the data come from a random sample, and all expected counts are well above 5, so the conditions are met.

Step 2b: Calculate the Test Statistic

$$\chi^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E}$$

Here, the sum is over all cells in the table (O = observed count, E = expected count).

The chi-square statistic measures the difference between the observed counts and the counts that would be expected if there were no relationship (H0). So a larger difference between O and E ⇒ a larger test statistic (one-sided) ⇒ stronger evidence of a relationship.

Again in the sickle cell and malaria example, summing over the four cells of the Observed (Expected) table gives χ² ≈ 5.33.

Step 3 (Chi-Square Test for Two-Way Tables)

Step 3: Find the P-value using the appropriate distribution
Use the chi-square distribution (Table D in Moore). This is a family of distributions that take only positive values and are skewed to the right. A specific chi-square distribution is specified by giving its degrees of freedom.

Degrees of freedom: df = (Rows - 1)(Columns - 1) = (r - 1)(c - 1). Here, df does not involve the sample size.

P-value = the probability that the chi-square test statistic would be as large as or larger than the observed value if the null hypothesis were true (ALWAYS ONE-SIDED).

In the sickle cell and malaria example, df = (2 - 1)(2 - 1) = 1, and the P-value is between 0.02 and 0.025.

Step 4 (Chi-Square Test for Two-Way Tables)

Step 4: Decision and Conclusion
Whether or not the result is statistically significant is based on the P-value and the chosen level of significance α:
If P-value ≤ α → reject the null hypothesis H0, concluding that there is significant evidence for the alternative that there is a relationship between the variables.
If P-value > α → cannot reject the null hypothesis, concluding that there is not enough evidence to support the alternative hypothesis.

In the sickle cell and malaria example: using significance level α = 0.05, the P-value is between 0.02 and 0.025, i.e., P-value < 0.05.
Decision: We reject H0.
Conclusion: The sample provides evidence at the 5% significance level that there is a relationship between sickle-cell trait and incidence of malaria.
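
The four steps for the sickle cell and malaria example can be retraced in a few lines of code. The following is a minimal sketch in Python using only the standard library, not part of the original slides. The function names (`expected_counts`, `chi_square_statistic`, `p_value_df1`) are hypothetical helpers introduced for illustration. The P-value shortcut via `math.erfc` is an assumption that holds only for df = 1, which is the case for a 2 × 2 table; for larger tables a chi-square table or a statistics library would be needed instead.

```python
import math

# Observed counts from the two-way table in the transcript.
# Rows: sickle-cell trait, no sickle-cell trait.
# Columns: malaria, no malaria.
observed = [[36, 100],
            [152, 255]]


def expected_counts(table):
    """Expected count per cell under H0: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_statistic(table, expected):
    """Sum of (O - E)^2 / E over all cells of the table."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


def p_value_df1(statistic):
    """Upper-tail chi-square probability. Valid ONLY for df = 1 (assumption)."""
    return math.erfc(math.sqrt(statistic / 2))


# Step 2: expected counts and test statistic.
expected = expected_counts(observed)
statistic = chi_square_statistic(observed, expected)

# Step 3: degrees of freedom and P-value.
df = (len(observed) - 1) * (len(observed[0]) - 1)
p_value = p_value_df1(statistic)

# Step 4: decision at the chosen significance level.
alpha = 0.05
decision = "reject H0" if p_value <= alpha else "cannot reject H0"

print("expected counts:", [[round(e, 2) for e in row] for row in expected])
print("chi-square statistic:", round(statistic, 2), "with df =", df)
print("P-value:", round(p_value, 3))
print("decision:", decision)
```

Run as written, the expected counts round to the values in the Observed (Expected) table, and the P-value falls between 0.02 and 0.025, in line with the transcript. If SciPy is available, `scipy.stats.chi2_contingency` performs the same test for tables of any size; note that by default it applies a continuity correction to 2 × 2 tables, so `correction=False` would be needed to reproduce the uncorrected statistic used here.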
