Document Details

Trinity University of Asia

Caesar Franz C. Ruiz, LPT, M.Sc.

Tags

chi-square tests, statistics, biology, homogeneity tests

Summary

This document is a presentation on chi-square tests, focusing on their properties, applications, and different types, such as homogeneity and association tests, including examples and exercises.

Full Transcript

Chi Square Tests

Caesar Franz C. Ruiz, LPT, M.Sc.
CASE - BIOLOGY

Learning Outcomes

At the end of the session, the students will be able to:
1. Describe the properties of the chi-square distribution
2. Differentiate the chi-square test of homogeneity of proportions and the chi-square test of association
3. Interpret the computed value of the chi-square test
4. Discuss the requirements for a valid use of the chi-square test

The Chi-Square Test Statistic

Used when the variables of interest are qualitative variables with mutually exclusive and collectively exhaustive categories.
The quantitative data used are the frequencies associated with each category of the variables under study: observed frequencies and expected frequencies.
Compares the observed frequency of elements falling in different categories with the frequency expected if the null hypothesis were true.
Large differences between the observed and expected frequencies lead to the rejection of the null hypothesis.

Chi Square Test (χ² Test)

Three types:
Test of Homogeneity
Test of Association
Goodness of Fit Test
The test statistic follows the chi-square distribution.

Chi-square distribution: characteristics

The lower the degrees of freedom (df), the more positively skewed the distribution.
The greater the degrees of freedom, the more symmetrical the distribution. As the df increases, the curve approaches the shape of a normal distribution.
The mean of a chi-square distribution always equals its degrees of freedom.
The total area under each curve is equal to 1 (that is, full probability).
Chi-square values range from zero to positive infinity.

Applicability of the Chi-Square Test

Applicable to data in a contingency table only if the expected frequencies are sufficiently large.
For a 2 x 2 table, the requirement is for all expected frequencies to be greater than or equal to 5.
* If this requirement is not satisfied, Fisher's Exact Probability Test should be used.
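The distribution characteristics listed in this part of the session can be checked numerically. The following is a minimal sketch, not part of the original slides. It relies on the standard fact that a chi-square variable with df degrees of freedom is the sum of df squared independent standard normal values. The function names, the number of draws, and the seed are illustrative assumptions.

```python
import math
import random

def simulate_chi_square(df, n_draws=20000, seed=0):
    """Draw chi-square values as sums of df squared standard normals."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
            for _ in range(n_draws)]

def mean_and_skewness(values):
    """Return the sample mean and the (population-style) sample skewness."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    skew = sum(((v - mean) / sd) ** 3 for v in values) / n
    return mean, skew

# df -> (sample mean, sample skewness, smallest simulated value)
results = {}
for df in (1, 4, 16):
    draws = simulate_chi_square(df)
    mean, skew = mean_and_skewness(draws)
    results[df] = (mean, skew, min(draws))
    print(f"df = {df:2d}: mean = {mean:.2f}, skewness = {skew:.2f}, "
          f"minimum = {min(draws):.4f}")
```

In a run of this sketch, the sample mean should sit close to df, the skewness should stay positive and shrink as df grows, and no simulated value should be negative, in line with the characteristics listed above.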
Two requirements for larger tables

1) All expected frequencies should be greater than or equal to 1.
2) Not more than 20% of the cells should have expected frequencies less than 5.
* If these are not met, merge adjacent categories to increase the expected frequencies in the various cells.

Chi Square Test of Homogeneity

χ² Test of Homogeneity

Used to find whether two or more populations have the same proportions for the different categories of another variable.
When there are only two populations and the variable of interest has only two categories, the χ² Test of Homogeneity may be used interchangeably with the two-sided z-test for two proportions.

DATA LAY-OUT: Contingency table

Rows represent the categories of one variable.
Columns represent the categories of the other variable.
Entries are the observed frequencies.
Row and column totals are referred to as the marginals.

Characteristics of the χ² test of homogeneity

1. Two or more populations are identified in advance, and an independent sample is drawn from each.
2. Sample subjects or objects are placed in appropriate categories of the variable of interest.
3. The calculation of expected cell frequencies is based on the rationale that if the populations are homogeneous, as stated in the null hypothesis, the best estimate of the probability that a subject or object will fall into a particular category of the variable of interest can be obtained by pooling the sample data.
4. The hypotheses and conclusions are stated in terms of homogeneity (with respect to the variable of interest) of populations.

Chi Square Test: Steps in hypothesis testing

1. State the null (H0) and alternative (H1 or Ha) hypotheses
2. State the level of significance
3. Select the appropriate test statistic
4. Determine the critical region
5. Compute the test statistic
6. Make a statistical decision
7. Draw conclusions about the population

Example

Kodama et al.
(1991) studied the relationship between age and several prognostic factors in squamous cell carcinoma of the cervix. The following data were collected.

Age Group (years) | Number of Patients | Large Cell Nonkeratinizing Cell Type | Keratinizing Cell Type | Small Cell Nonkeratinizing Cell Type
30-39 | 34 | 18 | 7 | 9
40-49 | 97 | 56 | 29 | 12
50-59 | 144 | 83 | 38 | 23
60-69 | 105 | 62 | 25 | 18
Total | 380 | 219 | 99 | 62

May we conclude that the populations represented by the four age-group samples are not homogeneous with respect to cell type? Use α = 0.05.

Example

Objective: To determine homogeneity
Population: Patients with squamous cell carcinoma of the cervix
Variables: Age group and cell type
Test: Chi square test of homogeneity

Steps in hypothesis testing (Manual computation)

1. Null and alternative hypothesis
H0: The four populations are homogeneous with respect to cell type.
Ha: The four populations are not homogeneous with respect to cell type.
2. Level of significance: α = 0.05
3. Test statistic: $\chi^2 = \sum \frac{(O - E)^2}{E}$
4. Critical region: df = (r - 1)(c - 1) = (4 - 1)(3 - 1) = (3)(2) = 6; reject H0 if χ² > 12.592
5. Computations:

Observed and Expected Frequencies (expected frequencies in parentheses)

Age Group (years) | Number of Patients | Large Cell Nonkeratinizing Cell Type | Keratinizing Cell Type | Small Cell Nonkeratinizing Cell Type
30-39 | 34 | 18 (19.59) | 7 (8.86) | 9 (5.55)
40-49 | 97 | 56 (55.90) | 29 (25.27) | 12 (15.83)
50-59 | 144 | 83 (82.99) | 38 (37.52) | 23 (23.49)
60-69 | 105 | 62 (60.51) | 25 (27.36) | 18 (17.13)
Total | 380 | 219 | 99 | 62

$\chi^2 = \frac{(18 - 19.59)^2}{19.59} + \frac{(7 - 8.86)^2}{8.86} + \frac{(9 - 5.55)^2}{5.55} + \cdots + \frac{(18 - 17.13)^2}{17.13} = 4.444$

Expected frequencies (E)

Age Group (years) | Large Cell Nonkeratinizing Cell Type | Keratinizing Cell Type | Small Cell Nonkeratinizing Cell Type
30-39 | 19.59474 | 8.857895 | 5.547368
40-49 | 55.90263 | 25.27105 | 15.82632
50-59 | 82.98947 | 37.51579 | 23.49474
60-69 | 60.51316 | 27.35526 | 17.13158

O - E

Age Group (years) | Large Cell Nonkeratinizing Cell Type | Keratinizing Cell Type | Small Cell Nonkeratinizing Cell Type
30-39 | -1.59474 | -1.85789 | 3.452632
40-49 | 0.097368 | 3.728947 | -3.82632
50-59 | 0.010526 | 0.484211 | -0.49474
60-69 | 1.486842 | -2.35526 | 0.868421

(O - E)²

Age Group (years) | Large Cell Nonkeratinizing Cell Type | Keratinizing Cell Type | Small Cell Nonkeratinizing Cell Type
30-39 | 2.543186 | 3.451773 | 11.92066
40-49 | 0.009481 | 13.90505 | 14.64069
50-59 | 0.000111 | 0.23446 | 0.244765
60-69 | 2.210699 | 5.547265 | 0.754155

(O - E)²/E

Age Group (years) | Large Cell Nonkeratinizing Cell Type | Keratinizing Cell Type | Small Cell Nonkeratinizing Cell Type
30-39 | 0.129789 | 0.389683 | 2.148886
40-49 | 0.00017 | 0.550236 | 0.925085
50-59 | 1.34E-06 | 0.00625 | 0.010418
60-69 | 0.036533 | 0.202786 | 0.044021
SUM | 0.166493 | 1.148955 | 3.128411

Total: χ² = 4.44386

6. Statistical decision: Since 4.444, the computed χ² value, is less than the critical value of 12.592, do not reject the null hypothesis.
7. Conclusion: There is insufficient evidence to conclude that the four populations are not homogeneous with respect to cell type.

Steps in hypothesis testing (using computer output)

1. Null and alternative hypothesis
H0: The four populations are homogeneous with respect to cell type.
Ha: The four populations are not homogeneous with respect to cell type.
2. Level of significance: α = 0.05
3. Test statistic: Chi-square Test of Homogeneity
4. Critical Region: N/A
5. Computations: N/A
6. Statistical decision: Since the p-value = 0.6168 is more than α = 0.05, do not reject the null hypothesis.
7. Conclusion: There is insufficient evidence to conclude that the four populations are not homogeneous with respect to cell type.
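The manual computation in the Kodama et al. example can be retraced in a few lines of code. This is a minimal sketch rather than the software output used in the session. The function names are illustrative assumptions, and the p-value helper uses a closed-form expression for the chi-square upper tail that is valid only when df is even (here df = 6).

```python
import math

# Observed counts from the Kodama et al. example.
# Rows: age groups 30-39, 40-49, 50-59, 60-69.
# Columns: the three cell types, in the order used in the table.
observed = [
    [18, 7, 9],
    [56, 29, 12],
    [83, 38, 23],
    [62, 25, 18],
]

def expected_frequencies(table):
    """E = (row total x column total) / grand total, i.e. pooled proportions."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square_statistic(table):
    """Sum of (O - E)^2 / E over every cell of the table."""
    expected = expected_frequencies(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )

def p_value_even_df(chi2, df):
    """Upper-tail chi-square probability (closed form, even df only)."""
    if df % 2 != 0:
        raise ValueError("this closed form only handles even df")
    half = chi2 / 2
    return math.exp(-half) * sum(
        half ** i / math.factorial(i) for i in range(df // 2)
    )

chi2 = chi_square_statistic(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)
p_value = p_value_even_df(chi2, df)
critical_value = 12.592  # tabulated value for df = 6, alpha = 0.05

print(f"chi-square = {chi2:.3f}, df = {df}, p-value = {p_value:.4f}")
print("reject H0" if chi2 > critical_value else "do not reject H0")
```

The sketch should agree with the worked example: a statistic of about 4.444 on 6 degrees of freedom and a p-value of about 0.6168, so the null hypothesis of homogeneity is not rejected under either the critical-value or the p-value approach.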
Exercise

In a telephone survey conducted by Garcha (1990), respondents were asked to indicate their level of agreement with the statement "Cigarette smoking should be banned in public places." The results were as follows:

Level of Agreement

Gender | Strongly Agree | Agree | Neutral | Disagree | Strongly Disagree
Female | 40 | 38 | 16 | 37 | 5
Male | 16 | 25 | 11 | 25 | 11

Can we conclude on the basis of these data that males and females differ with respect to their levels of agreement on the banning of cigarette smoking in public places? Let α = 0.05.

Exercise

Objective: To determine homogeneity
Population: Respondents of a telephone survey
Variables: Gender and level of agreement
Test: Chi square test of homogeneity

Steps in hypothesis testing

1. State the null (H0) and alternative (H1 or Ha) hypotheses
2. State the level of significance
3. Select the appropriate test statistic
4. Determine the critical region
5. Compute the test statistic
6. Make a statistical decision
7. Draw conclusions about the population

Steps in hypothesis testing (using computer output)

1. Null and alternative hypothesis
H0: Males and females do not differ with respect to their levels of agreement on the banning of cigarette smoking in public places.
Ha: Males and females differ with respect to their levels of agreement on the banning of cigarette smoking in public places.
2. Level of significance: α = 0.05
3. Test statistic: Chi-square Test of Homogeneity
4. Critical Region: N/A
5. Computations: N/A
6. Statistical decision: Since the p-value = 0.07265 is more than α = 0.05, do not reject the null hypothesis.
7. Conclusion: We do not have sufficient evidence to conclude that males and females differ with respect to their levels of agreement on the banning of cigarette smoking in public places.
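Before relying on the p-value for this exercise, the two applicability requirements for tables larger than 2 x 2 (all expected frequencies at least 1, and no more than 20% of cells with expected frequencies below 5) can be checked from the data. The sketch below is illustrative only. The helper names are assumptions, and the p-value uses a closed form for the chi-square upper tail that holds specifically for df = 4.

```python
import math

# Observed counts from the Garcha (1990) exercise.
# Columns: Strongly Agree, Agree, Neutral, Disagree, Strongly Disagree.
observed = [
    [40, 38, 16, 37, 5],   # Female
    [16, 25, 11, 25, 11],  # Male
]

def expected_frequencies(table):
    """E = (row total x column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def meets_large_table_rules(expected):
    """Check the two requirements stated for tables larger than 2 x 2."""
    cells = [e for row in expected for e in row]
    all_at_least_one = min(cells) >= 1
    share_below_five = sum(e < 5 for e in cells) / len(cells)
    return all_at_least_one and share_below_five <= 0.20

expected = expected_frequencies(observed)
rules_ok = meets_large_table_rules(expected)

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Closed-form upper tail of the chi-square distribution for df = 4 only.
p_value = math.exp(-chi2 / 2) * (1 + chi2 / 2)

print(f"requirements met: {rules_ok}")
print(f"chi-square = {chi2:.3f}, df = {df}, p-value = {p_value:.5f}")
```

The smallest expected frequency in this table is about 6.29 (males who strongly disagree), so both requirements hold and no categories need to be merged. The p-value from the sketch should be close to the 0.07265 quoted in the exercise.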
Chi Square Test of Association

Used to determine if there is a relationship or an association between two qualitative variables.
No association = independent.
A quantitative variable may be used, but the data need to be transformed into nominal form before χ² can be applied.
Deals with one random sample from a single population.
Each person has data on each of the two variables.
Each variable has mutually exclusive and exhaustive categories (≥ 2).
Each observation in the sample can be classified into one and only one category of each variable.

Chi Square Test of Association: Null and alternative hypothesis

H0: There is no association between the first and second variable.
    The first variable is not associated with the second variable.
    The first and second variable are independent.
Ha: There is an association between the first and second variable.
    The first and second variable are not independent.
    The first and second variable are associated.

Data Structure

2 variables with at least 2 categories each
Categories of one variable as rows; categories of the other variable as columns

2 x 2 table

Variable 1 | Variable 2: Category A | Variable 2: Category B | Total
Category A |  |  | r1
Category B |  |  | r2
Total | c1 | c2 | n

2 x 3 table

Variable 1 | Variable 2: Category A | Variable 2: Category B | Variable 2: Category C | Total
Category A |  |  |  | r1
Category B |  |  |  | r2
Total | c1 | c2 | c3 | n

4 x 3 table

Variable 1 | Variable 2: Category A | Variable 2: Category B | Variable 2: Category C | Total
Category A |  |  |  | r1
Category B |  |  |  | r2
Category C |  |  |  | r3
Category D |  |  |  | r4
Total | c1 | c2 | c3 | n

Chi Square Test of Association: Test Statistic

$\chi^2 = \sum \frac{(O - E)^2}{E}$

Where
O : Observed frequency
E : Expected frequency = $\frac{\text{row total} \times \text{column total}}{\text{grand total}}$
df = degrees of freedom = (r - 1)(c - 1)

Expected Frequencies

Frequency to be expected if the null hypothesis were true, that is, if there is no association between the two variables.

Variable 1 | Variable 2: Category A | Variable 2: Category B | Total
Category A | $EF = \frac{r_1 \times c_1}{n}$ | $EF = \frac{r_1 \times c_2}{n}$ | $r_1$
Category B | $EF = \frac{r_2 \times c_1}{n}$ | $EF = \frac{r_2 \times c_2}{n}$ | $r_2$
Total | $c_1$ | $c_2$ | $n$

The characteristics of a chi square test of independence (association) that distinguish it from other chi-square tests are as follows:
1. A single sample is selected from a population of interest, and the subjects or objects are cross-classified on the basis of the two variables of interest.
2. The rationale for calculating expected cell frequencies is based on the probability law, which states that if two events (here the two criteria of classification) are independent, the probability of their joint occurrence is equal to the product of their individual probabilities.
3. The hypotheses and conclusions are stated in terms of the independence (or lack of independence) of two variables.

Example

In a study to determine the effects of a 5-day smoking cessation program on the smoking behavior of selected high school students in Manila, the following data were obtained:

Type of School | Smokers | Non-smokers | Total
Private | 59 | 98 | 157
Public | 115 | 562 | 677
Total | 174 | 660 | 834

Is there evidence that smoking status is associated with type of school? Use α = 0.01.

Example

Objective: To determine association
Population: High school students in Manila
Variables: Type of school and smoking status
Test: Chi square test of association

Hypothesis Testing

1. Null and alternative hypothesis
H0: There is no association between smoking status and type of school.
Ha: There is an association between smoking status and type of school.
2. Level of significance: α = 0.01
3. Test statistic: Chi-square Test of Association
4. Critical Region: N/A
5. Computations: N/A
6. Statistical decision: Since the p-value < 0.0000001 is less than α = 0.01, reject the null hypothesis.
7. Conclusion: There is an association between smoking status and type of school.

SEAMEO TROPMED Philippines Regional Centre for Public Health, Hospital Administration, Environmental and Occupational Health; College of Public Health, University of the Philippines Manila
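For a 2 x 2 table such as the type-of-school example, the chi-square statistic has 1 degree of freedom, and its arithmetic is the same as for a 2 x 2 homogeneity table, where the session noted that the test can be used interchangeably with the z-test for two proportions. The sketch below illustrates this numerically. It assumes the uncorrected Pearson statistic (the slides do not say whether the software applied a continuity correction), and the variable names are illustrative.

```python
import math

# Observed counts from the smoking example.
# Rows: Private, Public; columns: Smokers, Non-smokers.
observed = [[59, 98], [115, 562]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Pearson chi-square from expected counts (no continuity correction).
chi2 = 0.0
for i in range(2):
    for j in range(2):
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (observed[i][j] - expected) ** 2 / expected

# Pooled z statistic comparing the proportion of smokers in the two rows.
p_private = observed[0][0] / row_totals[0]
p_public = observed[1][0] / row_totals[1]
p_pooled = col_totals[0] / n
standard_error = math.sqrt(
    p_pooled * (1 - p_pooled) * (1 / row_totals[0] + 1 / row_totals[1])
)
z = (p_private - p_public) / standard_error

# With df = 1, the chi-square upper tail equals the two-sided normal tail.
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"chi-square = {chi2:.3f}, z squared = {z ** 2:.3f}")
print(f"p-value = {p_value:.2e}")
```

Under these assumptions the two statistics coincide (z squared equals chi-square), and the p-value falls below 0.0000001, in agreement with the statistical decision reported for the example.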
Exercise

According to Silver and Aiello (2002), falls are of major concern among polio survivors. Researchers wanted to determine the impact of a fall on lifestyle changes. The table below shows the results of a study of 233 polio survivors on whether fear of falling resulted in lifestyle changes. Use α = 0.05.

Made Lifestyle Changes Because of Fear of Falling

Fall Status | Yes | No | Total
Fallers | 131 | 52 | 183
Nonfallers | 14 | 36 | 50
Total | 145 | 88 | 233

Objective: To determine association
Population: Polio survivors
Variables: Fall status and lifestyle changes because of fear of falling
Test: Chi square test of association

Steps in hypothesis testing

1. State the null (H0) and alternative (H1 or Ha) hypotheses
2. State the level of significance
3. Select the appropriate test statistic
4. Determine the critical region
5. Compute the test statistic
6. Make a statistical decision
7. Draw conclusions about the population

Hypothesis Testing

1. Null and alternative hypothesis
H0: There is no association between fall status and lifestyle changes because of fear of falling.
Ha: There is an association between fall status and lifestyle changes because of fear of falling.
2. Level of significance: α = 0.05
3. Test statistic: Chi-square Test of Association
4. Critical Region: N/A
5. Computations: N/A
6. Statistical decision: Since the p-value =
