Module 4.3_Lecture_STAT 165 Chapter B45 PDF
University of the Philippines Los Baños
JRSReyes
Summary
This document is a lecture or tutorial related to statistical analysis and contingency tables. The content discusses tests for independence, probability, relative risk, odds ratios, and examples for categorical data analysis using tables.
Full Transcript
categorical data analysis
STAT 165 Chapter B: Contingency Tables
4. Test of Independence and Measure of Association

Illustration 1
It has been suggested that men and women demonstrate different approaches to purchasing behavior when shopping for clothing for various reasons (Koca and Koc, 2016). In this regard, it is of interest to determine if the preference for branded clothes (better quality: agree, disagree) is associated with a person’s sex (women, men).

Brand name products are better quality
Sex      Agree   Disagree
Women    121     65
Men      157     39

STAT 165. Categorical Data Analysis | JRSReyes (2024) | INSTAT. UP Los Baños

Independence
- analyzing association among variables is at the heart of multivariate statistical analysis
- two variables are statistically independent:
  - (intuitively) if the conditional distribution of categorical variable B is constant across the levels of categorical variable A (and vice versa)

Examples:
1. Conditional distribution of B within each level of A (in %)

   A      b1    b2    b3
   a1     25    40    35
   a2     25    40    35

2. Counts, with the conditional distribution of A within each level of B (in %)

   A      b1     b2     b3     Total
   a1     115    184    161    460
   a2     70     112    98     280
   Total  185    296    259    740

   A      b1      b2      b3
   a1     62.16   62.16   62.16
   a2     37.84   37.84   37.84

Independence
- two variables are statistically independent:
  - (statistically) if the joint distribution is equal to the product of their marginal distributions

Example:

   A      b1     b2     b3     Total
   a1     115    184    161    460
   a2     70     112    98     280
   Total  185    296    259    740

For cell (a1, b1): $115/740 = (460/740)(185/740)$, that is, 0.1554 = 0.1554 ✓; this must be true for all cells.

Test of Independence
- Null and Alternative Hypotheses
  - Ho: Variables A and B are independent
  - Ha: Variables A and B are associated

Example:
Ho: Preference for branded clothes is independent of a person’s sex.
Ha: Preference for branded clothes is associated with a person’s sex.
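The two definitions of independence above can be checked numerically. The following is a minimal R sketch, using the 2 x 3 counts from the second example; the object names (`tab`, `joint`, `product`, `col_pct`) are illustrative and not part of the lecture.

```r
# Counts from the independence example (rows a1, a2; columns b1, b2, b3)
tab <- matrix(c(115, 184, 161,
                 70, 112,  98),
              nrow = 2, byrow = TRUE,
              dimnames = list(A = c("a1", "a2"), B = c("b1", "b2", "b3")))

n <- sum(tab)
joint   <- tab / n                                   # joint proportions p_ij
product <- outer(rowSums(tab), colSums(tab)) / n^2   # p_i. * p_.j for every cell

# Conditional distribution of A within each level of B, in percent
col_pct <- round(100 * prop.table(tab, margin = 2), 2)

# Independence holds if the joint distribution equals the product in all cells
independent <- isTRUE(all.equal(joint, product, check.attributes = FALSE))

print(col_pct)
print(independent)
```

With real data the equality is never exact, which is why a formal test of independence is needed.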
Test of Independence
- for large samples
  - for nominal variables
    - Pearson’s Chi-Square test
    - Likelihood-ratio Chi-Square test
  - for complex survey design
    - Rao-Scott Chi-Square test
  - for ordinal variables
    - Mantel-Haenszel test for linear association
- for small samples
  - Fisher’s Exact test for 2 x 2 tables

Test of Independence: nominal by nominal
- for large samples
  - Pearson’s Chi-Square test (Karl Pearson)

    $\chi^2_c = \sum_{i=1}^{a} \sum_{j=1}^{b} \frac{(n_{ij} - E_{ij})^2}{E_{ij}}$, where $E_{ij} = \frac{n_{i.}\, n_{.j}}{n_{..}}$

Assumptions:
1. large samples taken using simple random sampling;
2. all expected frequencies are greater than or equal to 1; and
3. at most 20% of the expected frequencies are less than 5

Test of Independence: nominal by nominal
- for large samples
  - Likelihood Ratio Chi-Square test

    $G^2_c = 2 \sum_{i} \sum_{j} n_{ij} \ln\left(\frac{n_{ij}}{E_{ij}}\right)$

    - alternative test to Pearson’s Chi-Square test
    - based on the maximum likelihood ratio
    - similar assumptions to Pearson’s Chi-Square test

Test of Independence: nominal by nominal
Example: Test if the preference for branded clothes is associated with a person’s sex.

    Pearson's Chi-squared test
    data: .Table
    X-squared = 10.908, df = 1, p-value = 0.0009577

    Expected Counts
         Branded
    Sex         1         2
      1  135.3613  50.63874
      2  142.6387  53.36126

    Statistics:
                           X^2  df      P(> X^2)
    Likelihood Ratio  10.98264   1  0.0009196942
    Pearson           10.90757   1  0.0009577232

There is sufficient evidence to say that the preference for branded clothes is associated with a person’s sex (p-value = 0.0009).
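The output above can be reproduced in base R. This is a minimal sketch, assuming the counts from Illustration 1 are typed in directly; `chisq.test()` is called with `correct = FALSE` because the reported Pearson statistic carries no continuity correction, and the likelihood-ratio statistic is computed by hand from the formula given earlier. The object names are illustrative.

```r
# Illustration 1 counts: rows = sex, columns = "brand name products are better quality"
tab <- matrix(c(121, 65,
                157, 39),
              nrow = 2, byrow = TRUE,
              dimnames = list(Sex = c("Women", "Men"),
                              Branded = c("Agree", "Disagree")))

# Pearson chi-square test without Yates' continuity correction
pearson  <- chisq.test(tab, correct = FALSE)
expected <- pearson$expected                      # E_ij = n_i. * n_.j / n..

# Likelihood-ratio statistic: G^2 = 2 * sum(n_ij * log(n_ij / E_ij))
G2   <- 2 * sum(tab * log(tab / expected))
df   <- unname(pearson$parameter)
p_G2 <- pchisq(G2, df = df, lower.tail = FALSE)

# Check the expected-frequency assumptions listed for the Pearson test
all_at_least_1 <- all(expected >= 1)
share_below_5  <- mean(expected < 5)              # should be at most 0.20

print(pearson)
print(c(G2 = G2, df = df, p = p_G2))
```

Both statistics are close to each other here, as expected for a sample of n = 382.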
Test of Independence: nominal by nominal
- Pearson’s Chi-Square and Likelihood Ratio Chi-Square tests may have the same value when n is large; however, there are cases where they provide different test statistic values but still yield the same conclusion

Test of Independence: nominal by nominal
- Analysis of Residuals
  - identifies which cells have contributed to the significant Chi-Square statistic
  - standardized residual: $e_{ij} = \frac{n_{ij} - E_{ij}}{\sqrt{E_{ij}}}$
  - a larger $|e_{ij}|$ entails a larger contribution to the Chi-Square statistic
  - adjusted residual: $d_{ij} = \frac{e_{ij}}{\sqrt{(1 - p_{i.})(1 - p_{.j})}}$
  - if $|d_{ij}| > Z_{\alpha/2}$, the residual is significant

Example: Brand name products are better quality

Sex     Response   n_ij   E_ij       e_ij      d_ij
Women   Agree      121    135.3613   -1.2344   -3.3027
Women   Disagree   65     50.6387    2.0181    3.3027
Men     Agree      157    142.6387   1.2025    3.3027
Men     Disagree   39     53.3613    -1.9660   -3.3027

- All four cells contributed to the significant test statistic
- Large positive: Men, Agree (more men agreed than the hypothesis of independence predicts)
- Large negative: Men, Disagree (fewer men disagreed than the hypothesis of independence predicts)
- $OR_{Women|Men}$ = 0.4624

Test of Independence: nominal by nominal
- Analysis of Residuals
  - the sum of the squared standardized residuals ($e_{ij}^2$) is equal to Pearson’s Chi-Square test statistic:
    $(-1.2344)^2 + 2.0181^2 + 1.2025^2 + (-1.9660)^2 = 10.9075$

What if the data were collected from samples that were selected using a complex design (involving stratification, clustering, and unequal probability sampling instead of simple random sampling)?
The effects of stratification, clustering, and unequal selection probabilities invalidate Pearson’s Chi-Square test.

Test of Independence: nominal by nominal
- for large samples (complex survey design)
  - Rao-Scott Chi-Square test
    - a design-adjusted version of Pearson’s Chi-Square test
    - $n_i$ from Pearson’s Chi-Square test is replaced by a design-adjusted count based on the average generalized design effect for the ith row

Example: It is of interest to know if achieving the school-wide growth target (yes, no) is associated with school type (elementary, middle, high school). The data sets contain information for sampled California schools with at least 100 students and various probability samples of the data (two-stage cluster sample of schools, stratified by school type, within districts).

    Pearson's X^2: Rao & Scott adjustment
    X-squared = 11.9409, df = 2, p-value = 0.005553

There is sufficient evidence to say that there is an association between achieving the school-wide growth target (yes, no) and school type (elementary, middle, high school) (p-value = 0.0056).

Illustration 2
Consider the records obtained from hospitals. A mother’s alcohol consumption is the average number of drinks per day and a child’s malformation is congenital sex organ malformation.

Mother’s Alcohol    Child’s Malformation
Consumption         Absent    Present
0                   17066     48
...
>6                  37        1

    Pearson's Chi-squared test
    data: .Table
    X-squared = 12.1, df = 4, p-value = 0.02

    Likelihood Ratio Chi-Square test: P(> X^2) = 0.19

The two tests lead to different conclusions because both ignore the ordinality of the mother’s alcohol consumption. When variables are ordinal, a trend association is common.
Test of Independence: ordinal by ordinal
- Linear Trend tests
  - test statistics that use the ordinality by treating ordinal variables as quantitative rather than qualitative (nominal scale)
  - provide greater power for ordinal variables
  - as the level of X increases, responses on Y tend to increase toward higher levels, or responses on Y tend to decrease toward lower levels
    - Mantel-Haenszel test for linear association (I x J)
    - Cochran-Armitage Trend test (I x 2)
    - Wilcoxon Rank Sum or Mann-Whitney test (2 x J) [STAT 101]

Test of Independence: ordinal by ordinal
- Linear Trend tests
  - Mantel-Haenszel test for linear association (I x J) (Nathan Mantel, William Haenszel)

    $M^2_c = (n - 1)\, r^2$, where

    $r = \frac{\sum_{i=1}^{a} \sum_{j=1}^{b} u_i v_j n_{ij} - \frac{\left(\sum_{i=1}^{a} u_i n_{i.}\right)\left(\sum_{j=1}^{b} v_j n_{.j}\right)}{n}}{\sqrt{\left[\sum_{i=1}^{a} u_i^2 n_{i.} - \frac{\left(\sum_{i=1}^{a} u_i n_{i.}\right)^2}{n}\right]\left[\sum_{j=1}^{b} v_j^2 n_{.j} - \frac{\left(\sum_{j=1}^{b} v_j n_{.j}\right)^2}{n}\right]}}$

    - r is the Pearson product-moment correlation between the row and column variables, computed from the assigned row scores $u_i$ and column scores $v_j$ (e.g., category midpoints)
    - $M^2_c$ has an asymptotic chi-square distribution with df = 1

Test of Independence: ordinal by ordinal
- Linear Trend tests
  - Mantel-Haenszel test for linear association (I x J)
    - its square root follows a standard normal distribution (one-sided, Ha: ρ > 0)
    - treats both variables as ordinal
    - can be used when one variable is nominal but has only two categories
    - like Pearson’s Chi-Square and Likelihood Ratio Chi-Square tests, it does not distinguish between response and explanatory variables (symmetric)
Test of Independence: ordinal by ordinal
- Linear Trend tests
  - Mantel-Haenszel test for linear association (I x J)

Example:

Mother’s Alcohol    Child’s Malformation
Consumption         Absent    Present
0                   17066     48
...
>6                  37        1

    Mantel-Haenszel Chi-Square
    X-squared = 6.5699, df = 1, p-value = 0.01037

There is sufficient evidence to say that there is an association between a mother’s alcohol consumption and a child’s malformation.

Note: Cochran-Armitage Trend test is also applicable

Test of Independence
- for small samples
  - Fisher’s Exact test for 2 x 2 tables (Ronald Fisher)

    $P(n_{11}) = \frac{\binom{n_{1.}}{n_{11}} \binom{n_{2.}}{n_{21}}}{\binom{n_{..}}{n_{.1}}} = \frac{n_{1.}!\, n_{2.}!\, n_{.1}!\, n_{.2}!}{n_{..}!\, n_{11}!\, n_{12}!\, n_{21}!\, n_{22}!}$

    where the p-value is the sum of the probabilities of all tables whose probability is at most that of the observed $n_{11}$
    - independence corresponds to OR = 1 (Ho: θ = 1)
    - because the test is conservative, using a mid p-value is recommended

Test of Independence
- for small samples
  - Fisher’s Exact test for 2 x 2 tables

Example: From Fisher’s tea tasting experiment, test if there is an association between Fisher’s colleague’s guess and the actual order of pouring.

                Guess Poured First
Poured First    Milk    Tea    Total
Milk            3       1      4
Tea             1       3      4
Total           4       4      8

Test of Independence
- for small samples
  - Fisher’s Exact test for 2 x 2 tables

Example:

    Fisher's Exact Test for Count Data
    data: .Table
    p-value = 0.4857
    alternative hypothesis: true odds ratio is not equal to 1
    95 percent confidence interval:
     0.2117329 621.9337505

There is insufficient evidence to say that there is an association between Fisher’s colleague’s guess and the actual order of pouring (p-value = 0.4857).
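To show where the Fisher p-value of 0.4857 comes from, here is a small R sketch that enumerates the hypergeometric probabilities for the tea tasting table and then compares the hand-computed two-sided p-value with `fisher.test()`. The object names are illustrative; the tiny tolerance factor in the comparison is an assumption added only to guard against floating-point ties.

```r
# Tea tasting table: rows = actual order of pouring, columns = guess
tea <- matrix(c(3, 1,
                1, 3),
              nrow = 2, byrow = TRUE,
              dimnames = list(Poured = c("Milk", "Tea"),
                              Guess  = c("Milk", "Tea")))

# With all margins fixed at 4, n11 can be 0, 1, 2, 3, or 4
n11  <- 0:4
prob <- dhyper(n11, m = 4, n = 4, k = 4)   # P(n11) under independence

# Two-sided p-value: sum every probability that is at most that of the observed n11 = 3
p_obs    <- dhyper(3, m = 4, n = 4, k = 4)
p_manual <- sum(prob[prob <= p_obs * (1 + 1e-7)])

# Same test using the built-in function
p_fisher <- fisher.test(tea)$p.value

print(round(prob, 4))
print(c(manual = p_manual, fisher.test = p_fisher))
```

Here the probabilities for n11 = 0, 1, 3, and 4 are included in the sum, while the most likely table (n11 = 2) is excluded.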
Illustration 3

Brand name products are better quality
Sex      Agree   Disagree
Women    121     65
Men      157     39

Based on the statistical results, there is sufficient evidence to say that preference for branded clothes is associated with a person’s sex. Now, how can we determine the strength of their association?

Measure of Association: Nominal
- Measures based on Odds Ratio
  - Yule’s Q (see discussion last module)
  - Yule’s Y
- Chi-Square based measures
  - Pearson’s phi, φ
  - Pearson’s contingency coefficient, C
  - Cramer’s V
- Proportional Reduction in Error (PRE) based measures
  - Lambda or Guttman’s Coefficient of Predictability, λ
  - Goodman and Kruskal’s tau, τ
  - The Uncertainty Coefficient, U

Measure of Association: Nominal (Chi-Square Based)
- Pearson’s phi, φ
  - for two variables with two levels each

    $\phi = \sqrt{\frac{\chi^2_c}{n}}$, $0 \le \phi \le \sqrt{q - 1}$, $q = \min(a, b)$

- Pearson’s contingency coefficient, C
  - for two variables with more than 2 levels

    $C = \sqrt{\frac{\chi^2_c}{\chi^2_c + n}}$, $0 \le C < \ldots$

…
  - γ > 0: positive monotonic association
  - γ < 0: negative monotonic association

| Coefficient |    Characterization
0.01 to 0.09      weak association
0.10 to 0.29      moderate association
0.30 and above    strong association
(might vary according to discipline)

Measure of Association: Ordinal
- Goodman and Kruskal’s Gamma, γ

Example:

Class       A     B     C
Low (L)     10    1     1
Med (M)     3     4     5
High (H)    3     7     2

nc = 222, nd = 85

    Mantel-Haenszel Chi-Square
    X-squared = 4.2, df = 1, p-value = 0.04042

With a p-value = 0.0404, there is sufficient evidence to say that wage classification and job satisfaction have a significant monotonic association. The monotonic association between wage classification and job satisfaction is strong and positive.
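The quantities in the gamma example can be recomputed directly from the 3 x 3 table. The R sketch below counts concordant and discordant pairs, computes gamma using the standard definition $\gamma = (n_c - n_d)/(n_c + n_d)$, and computes the Mantel-Haenszel statistic $M^2 = (n - 1) r^2$. The equally spaced scores 1, 2, 3 for both variables are an assumption (the slide does not state the scores used), and `count_pairs` is a hypothetical helper written for this illustration.

```r
# Wage classification by job satisfaction table from the gamma example
tab <- matrix(c(10, 1, 1,
                 3, 4, 5,
                 3, 7, 2),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("Low", "Med", "High"), c("A", "B", "C")))

# Hypothetical helper: count concordant (nc) and discordant (nd) pairs
count_pairs <- function(x) {
  nc <- 0
  nd <- 0
  I <- nrow(x)
  J <- ncol(x)
  for (i in seq_len(I)) {
    for (j in seq_len(J)) {
      below <- seq_len(I) > i
      nc <- nc + x[i, j] * sum(x[below, seq_len(J) > j])  # cells below and to the right
      nd <- nd + x[i, j] * sum(x[below, seq_len(J) < j])  # cells below and to the left
    }
  }
  c(nc = nc, nd = nd)
}

pairs     <- count_pairs(tab)
gamma_hat <- (pairs[["nc"]] - pairs[["nd"]]) / (pairs[["nc"]] + pairs[["nd"]])

# Mantel-Haenszel linear-association statistic with assumed integer scores
n <- sum(tab)
u <- seq_len(nrow(tab))                  # assumed row scores 1, 2, 3
v <- seq_len(ncol(tab))                  # assumed column scores 1, 2, 3
row_tot <- rowSums(tab)
col_tot <- colSums(tab)
cov_uv <- sum(outer(u, v) * tab) - sum(u * row_tot) * sum(v * col_tot) / n
ss_u   <- sum(u^2 * row_tot) - sum(u * row_tot)^2 / n
ss_v   <- sum(v^2 * col_tot) - sum(v * col_tot)^2 / n
r      <- cov_uv / sqrt(ss_u * ss_v)
M2     <- (n - 1) * r^2
p_M2   <- pchisq(M2, df = 1, lower.tail = FALSE)

print(pairs)
print(c(gamma = gamma_hat, r = r, M2 = M2, p = p_M2))
```

With these scores the sketch returns nc = 222, nd = 85, and M² = 4.2, in line with the values on the slide; gamma is about 0.45, which falls in the "strong" band of the characterization table.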
Measure of Association: Ordinal
- Kendall’s Tau-b, τ_b (Maurice Kendall)

  $\tau_b = \frac{n_c - n_d}{\sqrt{(n_c + n_d + n_a)(n_c + n_d + n_b)}}$, $-1 \le \tau_b \le 1$

  where $n_a$ = number of pairs tied on A but not B
        $n_b$ = number of pairs tied on B but not A
  - uses a correction for tied pairs
  - symmetric measure of association
  - typically lower in value than γ (more conservative)
  - used for square tables (same number of rows and columns)

Measure of Association: Ordinal
- Kendall and Stuart’s Tau-c, τ_c

  $\tau_c = \frac{2q\,(n_c - n_d)}{n^2 (q - 1)}$, $-1 \le \tau_c \le 1$

  - uses a correction for tied pairs and makes adjustments for table size
  - used for largely unequal marginal frequencies
  - symmetric measure of association
  - used for rectangular tables (which do not have major diagonals)
  - when a table is square, the results are close to those of τ_b

Measure of Association: Ordinal
- τ_b and τ_c
  - τ = 0: independent (in ordinal sequence)
  - τ > 0: positive monotonic association
  - τ < 0: negative monotonic association

| Coefficient |    Characterization
0.01 to 0.09      weak association
0.10 to 0.29      moderate association
0.30 and above    strong association
(might vary according to discipline)

Measure of Association: Ordinal
- Somers’ d coefficient, $d_{B|A}$

  $d_{B|A} = \frac{n_c - n_d}{n_c + n_d + n_a}$, $-1 \le d_{B|A} \le 1$

  - asymmetric measure of association
- Wilson’s e coefficient, $e_{B|A}$

  $e_{B|A} = \frac{n_c - n_d}{n_c + n_d + n_a + n_b}$, $-1 \le e_{B|A} \le 1$

  - asymmetric measure of association

Other Measures of Association: Categorical Variable with Quantitative
- Eta Coefficient, η
- Point-Biserial Correlation Coefficient, $r_{pb}$
- Biserial Correlation Coefficient, $r_b$
- Tetrachoric Correlation Coefficient, $r_{tet}$
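Returning to the ordinal measures, the tie counts that τ_b needs can be obtained from the margins of a table. The R sketch below reuses the wage classification by job satisfaction table together with the slide’s nc = 222 and nd = 85. Treating the rows as variable A and the columns as variable B is an assumption, as is the use of n² in the τ_c denominator (the usual Stuart form).

```r
# Wage classification by job satisfaction table (rows taken as A, columns as B)
tab <- matrix(c(10, 1, 1,
                 3, 4, 5,
                 3, 7, 2),
              nrow = 3, byrow = TRUE)

n  <- sum(tab)
nc <- 222                                 # concordant pairs, from the slide
nd <- 85                                  # discordant pairs, from the slide

# Pairs tied on both variables lie in the same cell
tied_both <- sum(choose(tab, 2))
na <- sum(choose(rowSums(tab), 2)) - tied_both   # tied on A (rows) but not on B
nb <- sum(choose(colSums(tab), 2)) - tied_both   # tied on B (columns) but not on A

# Every pair is concordant, discordant, or tied in exactly one of three ways
all_pairs_accounted <- (nc + nd + na + nb + tied_both) == choose(n, 2)

tau_b <- (nc - nd) / sqrt((nc + nd + na) * (nc + nd + nb))
q     <- min(dim(tab))
tau_c <- 2 * q * (nc - nd) / (n^2 * (q - 1))

print(c(na = na, nb = nb, tau_b = tau_b, tau_c = tau_c))
```

Both coefficients come out near 0.32, below the gamma value for the same table, which illustrates the note that τ_b is typically more conservative than γ.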
Other Measures of Association: Categorical Variable with Quantitative
- Eta Coefficient, η
  - can be obtained through the analysis of variance (ANOVA)
  - statistically significant association if the coefficient has a value above 0.20
  - asymmetric measure of association
  - Y is at scale/interval level and X is at nominal/ordinal level
  - it does not require the assumption of a linear relationship
  - its square has the same interpretation as R-Squared

Other Measures of Association: Categorical Variable with Quantitative
- Eta Coefficient, η
  Assumptions:
  1. the data must be nonlinear or curvilinear (scatter plot);
  2. the data must be asymmetric (histogram); and
  3. there must be independence of observations; hence, there is no relationship between groups created by the categories of the categorical variable or between the observations in each group (category)

Other Measures of Association: Categorical Variable with Quantitative
- Point-Biserial Correlation Coefficient, $r_{pb}$

  $r_{pb} = \left(\frac{\bar{Y}_1 - \bar{Y}_0}{s_Y}\right) \sqrt{\frac{n\, p_0 (1 - p_0)}{n - 1}}$

  test statistic: $t_{pb} = \frac{r_{pb} \sqrt{n - 2}}{\sqrt{1 - r_{pb}^2}}$

  - a special case of the Pearson product-moment correlation
  - asymmetric measure of association
  - Y is quantitative and X is a binary response (without natural ordering, with values 0 and 1)

Other Measures of Association: Categorical Variable with Quantitative
- Point-Biserial Correlation Coefficient, $r_{pb}$
  Assumptions:
  1. there should be no outliers for the continuous variable in each category of the dichotomous variable (boxplot); and
  2. the quantitative variable within each group created by the binary variable is normally distributed (Shapiro-Wilk test) with equal variances (Levene’s test)
Other Measures of Association: Categorical Variable with Quantitative
- Biserial Correlation Coefficient, $r_b$

  $r_b = \left(\frac{r_{pb}}{h}\right) \sqrt{p_0 (1 - p_0)}$, where $h = \frac{e^{-u^2/2}}{\sqrt{2\pi}}$ and $P(Z \ge u) = p_1$

  test statistic: …

  - an estimate of the Pearson product-moment correlation constructed from the point-biserial correlation (always has higher values than $r_{pb}$)
  - asymmetric measure of association
  - Y is quantitative and X has been dichotomized (with natural ordering)

Other Measures of Association: Categorical Variable with Quantitative
- Biserial Correlation Coefficient, $r_b$
  Assumptions:
  1. the quantitative variable is normally distributed (Shapiro-Wilk test); and
  2. the dichotomous variable is an artificial dichotomy (i.e., a dichotomy which has a hypothetical quantitative variable underlying it that follows a normal distribution)

Other Measures of Association: Categorical Variable with Quantitative
- Tetrachoric Correlation Coefficient, $r_{tet}$
  - symmetric measure of association
  - X and Y have been dichotomized (with natural ordering)
  Assumptions: the dichotomous variables are artificial dichotomies (i.e., with underlying quantitative variables which follow a bivariate normal distribution)

Other Measures of Association: Categorical Variable with Quantitative
- Tetrachoric Correlation Coefficient, $r_{tet}$
  - if the two variables have been polychotomized (at ordinal scale), use the Polychoric Correlation Coefficient instead
Other Measures of Association: Categorical Variable with Quantitative
- Interpretation

| Coefficient |    Characterization
0.01 to 0.20      very weak association
0.21 to 0.40      weak association
0.41 to 0.60      moderate association
0.61 to 0.80      strong association
0.81 to 0.99      very strong association
(for all other measures of association; might vary according to discipline)

Other Measures of Association: Categorical Variable with Quantitative
Activity B.4.4: It is of interest to know if the final standing of the statistics majors in the Stat Theory Course (A) is associated with their final standing in the Computational Statistics Course (B). A random sample of statistics majors was selected and the following data were obtained:

Score     Passed   Failed
Passed    9        15
Failed    13       10

It is known that the final standings in A and B follow a bivariate normal distribution. Compute and interpret $r_{tet}$.

Other Measures of Association: Categorical Variable with Quantitative
Activity B.4.5: It is of interest to know if an association exists between the sex of the student and his/her UPCAT grade admission test score. Using the data from a random sample of students, the following results were obtained.

    Eta Coefficient Test
    nominal by interval
    eta      0.261
    n = 30865

1. Is the eta coefficient valid to answer the objective of the problem?
2. Is there a significant association between the sex of the student and his/her UPCAT grade admission test score?
3. What is the strength of association between the two variables?
4. Compute and interpret

categorical data analysis
STAT 165 Chapter B: Contingency Tables
5. Association in Three-Way Tables

Illustration 1
A university is being accused of sex bias during admission to graduate school.
The data on the sampled applicants were summarized using a three-way contingency table.

                                   Results
Discipline              Sex       Admitted   Not Admitted
Natural Sciences (NS)   Female    0          16
                        Male      4          139
Social Sciences (SS)    Female    53         414
                        Male      11         37
Total                   Female    53         430
                        Male      15         176

Illustration 1
- the choice of the predictor and the control variables is important in any research
- in determining the relationship between a response variable Y and an explanatory variable X, covariates Z that can influence the relationship should be controlled
- in the example, Y is the admission result, X is the sex of the applicant, and Z is the chosen discipline

Three-Way Contingency Tables
- display counts for three variables: X, Y, and Z
- Partial Tables
  - display the XY relationship at fixed levels of Z
  - show the effect of X on Y while controlling for Z
  - two-way cross-sectional slices of a three-way table that cross-classify X and Y at separate levels of Z
- Marginal Table
  - the two-way contingency table that results from combining the partial tables
  - a two-way table relating X and Y
  - contains no information about Z

Three-Way Contingency Tables

                                   Results
Discipline              Sex       Admitted   Not Admitted
Natural Sciences (NS)   Female    0          16             (partial table)
                        Male      4          139
Social Sciences (SS)    Female    53         414            (partial table)
                        Male      11         37
Total                   Female    53         430            (marginal table)
                        Male      15         176
Three-Way Contingency Tables
- Conditional Association
  - the effect of X on Y conditional on fixing Z at some level

Example: Using the partial tables
  - among applicants who chose natural sciences, admission was 2.80% - 0.00% = 2.80 percentage points more frequent for males than for females
  - among applicants who chose social sciences, admission was 22.92% - 11.35% = 11.57 percentage points more frequent for males than for females
  - hence, controlling for the applicant’s chosen discipline by keeping it fixed, the percentage of admission was higher for males than for females

Three-Way Contingency Tables
- Conditional Association

Example: Using the marginal table
  - ignoring the applicants’ chosen discipline, the percentage of admission was lower for males (7.85%) than for females (10.97%)

The example illustrates Simpson’s Paradox
  - this happens when the marginal association has a different (reversed) direction from the conditional associations

Three-Way Contingency Tables
- Conditional Odds Ratio
  - the odds ratio between X and Y at fixed levels of Z

Example:
  - Estimated OR with NS as discipline: the sample odds of admission to graduate school for NS applicants were lower for females than for males.
  - Estimated OR with SS as discipline: the sample odds of admission to graduate school for SS applicants were lower for females than for males.

Three-Way Contingency Tables
- Marginal Odds Ratio
  - the odds ratio between X and Y in which Z is ignored rather than controlled

Example: The sample odds of admission to graduate school were 45% higher for females than for males. This illustrates Simpson’s Paradox.
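The conditional and marginal odds ratios discussed above can be computed from the three-way table. This is a minimal R sketch, assuming the admission counts are entered as a 2 x 2 x 2 array with females in the first row and "admitted" in the first column; `odds_ratio` is a hypothetical helper written for this illustration.

```r
# Admission data: Sex x Result x Discipline (array() fills column by column)
adm <- array(c(0, 4, 16, 139,       # Natural Sciences: Female, Male by Admitted, Not
               53, 11, 414, 37),    # Social Sciences
             dim = c(2, 2, 2),
             dimnames = list(Sex = c("Female", "Male"),
                             Result = c("Admitted", "Not admitted"),
                             Discipline = c("NS", "SS")))

# Hypothetical helper: sample odds ratio of a 2 x 2 table (row 1 relative to row 2)
odds_ratio <- function(t) (t[1, 1] * t[2, 2]) / (t[1, 2] * t[2, 1])

# Conditional odds ratios: one per partial table (level of Discipline)
conditional_or <- apply(adm, 3, odds_ratio)

# Marginal table: collapse over Discipline, then compute the marginal odds ratio
marginal    <- apply(adm, c(1, 2), sum)
marginal_or <- odds_ratio(marginal)

# Marginal admission percentages by sex
admit_pct <- 100 * marginal[, "Admitted"] / rowSums(marginal)

print(conditional_or)
print(marginal_or)
print(round(admit_pct, 2))
```

Both conditional odds ratios are below 1 (females less likely to be admitted within each discipline), while the marginal odds ratio is about 1.45, which is the reversal that Simpson’s Paradox describes.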
Three-Way Contingency Tables
- Conditional Independence, given Z
  - if X and Y are independent in each partial table (the conditional odds ratio at every level of Z is equal to 1.0)
- Marginal Independence
  - if the marginal odds ratio is equal to 1.0

Note: Conditional independence of X and Y does not imply marginal independence

Three-Way Contingency Tables
- Homogeneous Association
  - occurs when the conditional odds ratios are equal across all K levels of Z (where K is the number of categories in Z), with X and Y both binary
  - when there is a homogeneous XY association, there is also a homogeneous XZ association and a homogeneous YZ association
  - this means that there is no interaction between any two variables in their effects on the third variable

Three-Way Contingency Tables
Activity B.5.1: Consider the table about the response, drug treatment, and clinic of the patients.

                       Response
Clinic   Treatment    Success   Failure
1        A            18        12
         B            12        8
2        A            2         8
         B            8         32
Total    A            20        20
         B            20        40

1. Compute the conditional odds ratios between treatment and response at the two levels of the clinic.
2. Given the clinic, are the response and treatment conditionally independent?
3. Compute the marginal odds ratio. Is there marginal independence between treatment and response?

Three-Way Contingency Tables
- Homogeneous Association

Example: For X = smoking (yes, no), Y = lung cancer (yes, no), and Z = age (65). Smoking has a weak effect on lung cancer for young people, but the effect strengthens considerably with age.
  - when there is no homogeneous association, the conditional odds ratio for any pair of variables changes across levels of the third variable
Three-Way Contingency Tables
- Statistical Inference
  - Cochran-Mantel-Haenszel test (CMH test) (William Cochran)
    - tests the conditional independence between X and Y, controlling for Z (compares the odds ratios of several two-way contingency tables)
    - used to analyze retrospective studies of Y that compare two groups X and adjust for a control variable Z
    - Ho: X and Y are conditionally independent, given Z

Three-Way Contingency Tables
- Statistical Inference
  - Cochran-Mantel-Haenszel test (CMH test)
    Assumptions:
    1. observations are independent of each other; and
    2. all observations are identically distributed
    - if some conditional odds ratios are greater than 1 and some are less than 1, the test is not appropriate to use

Three-Way Contingency Tables
- Statistical Inference
  - Estimate of the Common Odds Ratio, θ
    - Mantel-Haenszel Estimator

Note: the estimate of its standard error is complex; however, most statistical software generates a confidence interval for θ

Three-Way Contingency Tables
- Statistical Inference

Example: Test if conditional independence exists between the sex of the applicants and the admission result to graduate school (no sex discrimination), controlling for the chosen discipline.

    Mantel-Haenszel chi-squared test with continuity correction
    Mantel-Haenszel X-squared = 4.779, df = 1, p-value = 0.02881
    alternative hypothesis: true common odds ratio is not equal to 1
    95 percent confidence interval:
     0.1991173 0.8519141
    sample estimates:
    common odds ratio
            0.4118627

There is sufficient evidence to say that the sex of the applicant and the admission result are not conditionally independent, controlling for the chosen discipline (p-value = 0.0288).
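The CMH output above can be reproduced with base R’s `mantelhaen.test()`, which applies the continuity correction by default for 2 x 2 x K tables. The sketch below also recomputes the common odds ratio by hand using the usual Mantel-Haenszel form, $\hat{\theta}_{MH} = \sum_k (n_{11k} n_{22k} / n_{..k}) \, / \, \sum_k (n_{12k} n_{21k} / n_{..k})$; this formula is the standard definition and is stated here as an assumption, since the slide’s own expression did not survive in the transcript.

```r
# Admission data: Sex x Result x Discipline (array() fills column by column)
adm <- array(c(0, 4, 16, 139,       # Natural Sciences
               53, 11, 414, 37),    # Social Sciences
             dim = c(2, 2, 2),
             dimnames = list(Sex = c("Female", "Male"),
                             Result = c("Admitted", "Not admitted"),
                             Discipline = c("NS", "SS")))

# CMH test of conditional independence of Sex and Result, given Discipline
cmh <- mantelhaen.test(adm)

# Mantel-Haenszel common odds ratio computed by hand (standard form, assumed)
n_k   <- apply(adm, 3, sum)                       # stratum sample sizes
or_mh <- sum(adm[1, 1, ] * adm[2, 2, ] / n_k) /
         sum(adm[1, 2, ] * adm[2, 1, ] / n_k)

print(cmh)
print(or_mh)
```

The hand-computed value agrees with the "common odds ratio" reported by the function, and the confidence interval excludes 1, consistent with rejecting conditional independence.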
Three-Way Contingency Tables
- Statistical Inference
  - Estimate of the Common Odds Ratio, θ
    - if a homogeneous association exists, the Mantel-Haenszel estimator is useful as an estimate of θ; otherwise, it can be used as a summary statistic of the K conditional (partial) associations, provided that the odds ratios have the same direction

Three-Way Contingency Tables
- Statistical Inference
  - Breslow-Day test
    - used to test for homogeneous association
    - a chi-squared test comparing the observed counts to estimated expected frequencies that have a common odds ratio
    - it works only for large samples
    - Ho: the conditional odds ratios are equal across the levels of Z (homogeneous association)

Three-Way Contingency Tables
- Statistical Inference
  - Breslow-Day test
    Assumptions:
    1. samples should be relatively large for each group (level of Z); and
    2. at least 80% of the expected cell counts should be greater than 5
    - if conditional independence is rejected with the CMH test, a test for homogeneous association should still be done

R Software: R Commander
- Pearson Chi-Square test / Fisher’s Exact test
  - Statistics > Contingency Tables > Two-way table… or Enter and analyze two-way table…

Note: In the Statistics tab, check the following: (i) Chi-square test of independence or Fisher’s exact test and (ii) Print expected frequencies (to check the Chi-square assumptions about the expected frequencies)

R Software: R Software
- Likelihood Ratio Chi-Square test
  - library(MASS)
  - data