Quantitative Research Methods Lecture 8 PDF
Document Details
Toronto Metropolitan University
2024
Michael E. Campbell
Related
- Quantitative Research Methods in Political Science Lecture 3 PDF
- Quantitative Research Methods In Political Science Lecture 4 PDF
- Quantitative Research Methods In Political Science Lecture 5 (10/03/2024) PDF
- Quantitative Research Methods in Political Science Lecture 6 PDF
- Quantitative Research Methods In Political Science Lecture 8 PDF
Summary
This lecture presents quantitative research methods in political science, focusing on measures of association for nominal and ordinal variables. It covers bivariate association and explores the relationship between job satisfaction and productivity among factory workers, and the relationship between student club membership and academic achievement.
Full Transcript
Quantitative Research Methods in Political Science
Lecture 8: Measures of Association for Nominal and Ordinal Variables
Course Instructor: Michael E. Campbell
Course Number: PSCI 2702 (A)
Date: 11/07/2024

What are Measures of Association?
Measures of association "provide information about the strength and, where appropriate, the direction of relationships between variables in our data set – information that is more directly relevant for assessing the importance of relationships and testing the validity of our theories" (Healey, Donoghue, and Prus 2023, 250). Measures of association are among our most powerful tools for documenting cause-and-effect relationships. They can also help us make predictions – i.e., if variables are related, we can predict the score on one variable based on the score of another.

Bivariate Association
Two variables are associated if "the distribution of one of them changes under the various categories or scores of the other" (Healey, Donoghue, and Prus 2023, 252). For example, here we have a bivariate table of 173 factory workers. If we look at the columns, we can determine whether association between the variables exists: the columns show the pattern of scores on Y for each score on X.

Bivariate Association Cont’d
This table displays the relationship between productivity and job satisfaction, where Job Satisfaction = X and Level of Productivity = Y. Research Question: "Does job satisfaction affect the level of productivity?"

If we look at the columns: 30 of the 60 workers who reported "low" job satisfaction rank low on productivity; 25 of the 61 who reported "moderate" job satisfaction rank moderate on productivity; and 27 of the 52 who reported "high" satisfaction rank high on productivity.

Looking across columns, we can see the effect of X on Y. The within-column frequencies are called "conditional distributions of Y" because they display the distribution of scores on the dependent variable for each score on the independent variable. So, from what we observe in the table, productivity and job satisfaction are associated: the distribution of scores on Y (productivity) changes across the various conditions of X (satisfaction).

Questions to Ask
To fully investigate the potential relationship between two variables, we need to ask:
1. Does association exist?
2. If an association exists, how strong is it?
3. If association does exist, what are the pattern and/or the direction of the association?

Does Association Exist?
Association can be detected with chi square: any chi square score above 0.00 represents some level of association, but not necessarily statistical significance. We can also observe the conditional distributions of Y. The best way to look at these distributions is through observed frequencies expressed as percentages, not raw scores. Specifically, we compute and compare column percentages, because this standardizes column totals on a scale of 100 and makes it easier to detect the conditional distributions of Y. As we see here: 50% of those who ranked low on satisfaction ranked low on productivity; 40.98% of those who ranked moderate on satisfaction ranked moderate on productivity; and 51.92% of those who ranked high on satisfaction ranked high on productivity.

Does Association Exist? Cont’d
If association did not exist, the conditional distributions of Y would not change across columns; the distribution of Y would be the same for each condition of X. Here, looking for association between age group and productivity, the conditional distributions of Y are the same: levels of productivity do not change at all as the age group varies. Perfect non-association is found.

How Strong is Association?
Once we have established that two variables are associated, we need to determine how strong the association is. Strength of association ranges between two extremes:
1. "Perfect non-association," where the conditional distributions of Y do not change at all (as seen on the previous slide).
2. "Perfect association," which occurs when each value of the dependent variable (Y) is associated with only one value of the independent variable (X): all cases in each column fall in a single cell, and there is no variation in Y for a given value of X.

When we find perfect association, it points towards a causal relationship – i.e., variation in one variable is responsible for variation in the other. In practice, we are unlikely to find either perfect non-association or perfect association; we are more likely to find something between these two extremes. This is why we use measures of association – they allow us to describe the strength of association between two variables with precision. Strength of association can be described as (1) strong, (2) moderate, or (3) weak.

What is the Pattern and/or Direction of Association?
The last thing we need to do when analyzing association is to determine which values or categories of one variable are associated with which values or categories of the other. We see a pattern here:
1. Low satisfaction is associated with low productivity.
2. Moderate satisfaction with moderate productivity.
3. High satisfaction with high productivity.
This shows us directional association!
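The column-percentage check used above to detect association is easy to script. Below is a minimal Python sketch; the transcript reports only the diagonal cells and column totals of the satisfaction/productivity table, so only those are computed, and the function name is mine:

```python
# Column percentages standardize each column of a bivariate table to a scale
# of 100, making the conditional distributions of Y directly comparable.

def column_percentage(cell_count, column_total):
    """Express a cell frequency as a percentage of its column total."""
    return 100 * cell_count / column_total

# Diagonal cells reported in the lecture:
# (matching-productivity count, column total) per satisfaction level
diagonal = {"low": (30, 60), "moderate": (25, 61), "high": (27, 52)}

for level, (count, total) in diagonal.items():
    print(f"{level}: {column_percentage(count, total):.2f}%")
# low: 50.00%, moderate: 40.98%, high: 51.92% -- the percentages cited above
```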
What is the Pattern and/or Direction of Association? Cont’d
Associations between variables can be positive or negative. A positive association occurs when high scores on one variable are associated with high scores on the other (or when low scores are associated with low scores). In other words, when two variables move in the same direction – i.e., they increase together or decrease together – it is a positive association.

Conversely, a negative association occurs when the variables vary in opposite directions – i.e., high scores on one variable are associated with low scores on the other (or vice-versa). If there is an increase in one variable and a decrease in the other, you have a negative relationship. As we see here, as education increases, television viewership decreases.

Measures of Association
Measures of association "characterize the strength (and for ordinal and interval-ratio variables, also the direction) of bivariate relationships in a single number" (Healey, Donoghue, and Prus 2023, 262).
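As a preview of the chi-square-based statistics introduced in the next section, here is a sketch of how such single-number summaries are computed. The function names are mine; the figures (χ² = 10.78 with n = 100, and χ² = 32.14 with n = 75 for a 3x3 table) come from the lecture's two examples:

```python
from math import sqrt

def phi(chi_square, n):
    """Phi for a 2x2 table: the square root of chi square over sample size."""
    return sqrt(chi_square / n)

def cramers_v(chi_square, n, rows, cols):
    """Cramer's V for tables larger than 2x2."""
    return sqrt(chi_square / (n * min(rows - 1, cols - 1)))

# Accreditation/employment example: chi square = 10.78, n = 100
print(round(phi(10.78, 100), 2))             # 0.33
# Club membership/achievement example: chi square = 32.14, n = 75, 3x3 table
print(round(cramers_v(32.14, 75, 3, 3), 2))  # 0.46
```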
Measures of Association Cont’d
Which measure of association to use depends on the variables' level of measurement. Today, we will look at:
- Phi (φ) and Cramer's V for nominal-level variables
- Gamma, Somers' d, Kendall's tau-b, and Kendall's tau-c for ordinal-level data
Note: if you have variables measured at different levels, convention states that you use the measure of association (MOA) for the variable measured at the lowest level.

Chi Square-Based Measures of Association
When working with variables at the nominal level, researchers tend to use measures of association based on the value of chi square. If chi square is anything but zero, there is some level of association between the variables (but this does not mean it is statistically significant). The value of chi square can be transformed into other statistics that measure the strength of association between variables.

Last week, we looked at the likelihood of students in accredited programs finding employment as social workers after graduation. We found a chi square value of 10.78 at an alpha of 0.05 (indicating statistical significance). But statistical significance tells us nothing about the strength of association.

Chi Square-Based Measures of Association Cont’d
We see here that the conditional distributions of Y (employment status) change as a function of X (accreditation status). To assess the strength of association, we can compute Phi (φ) or Cramer's V.

Phi (φ)
Phi is a chi square-based measure of association used for nominal variables. It is appropriate for 2x2 tables. The formula for Phi is:

φ = √(χ² / n)

In this equation, χ² = the chi square value and n = the sample size.

Phi Cont’d
Phi ranges from 0.00 (no association) to 1.00 (perfect association); the closer to 1.00, the stronger the relationship. With a chi square of 10.78 and a sample size of 100, our calculation is:

φ = √(10.78 / 100) ≈ 0.33

With this, you now know the strength of association between the variables. Paired with observation of the column percentages, you can discern the direction of the relationship as well.

Cramer's V
Cramer's V is used for tables larger than 2x2 (i.e., tables with more than two columns and/or two rows). If we use Phi with tables larger than 2x2, it can exceed 1.00, making interpretation difficult. The formula for Cramer's V is:

V = √(χ² / (n × min(r − 1, c − 1)))

In this equation, χ² = the chi square value, n = the sample size, and min(r − 1, c − 1) = the minimum value of r − 1 (number of rows minus 1) and c − 1 (number of columns minus 1).

Cramer's V Cont’d
In this example, we're looking at the relationship between membership in student clubs and academic achievement (n = 75). There is an obtained chi square of 32.14, which is significant at the 0.05 level:

V = √(32.14 / (75 × 2)) ≈ 0.46

We have a Cramer's V score of 0.46 – indicating the strength of association. Moreover, we can look at the column percentages to get a better idea of what's going on. We see that: members of sports clubs have moderate academic achievement; members of non-sports clubs have high academic achievement; and members of no clubs have low academic achievement.

Interpreting Strength
If Phi (φ) or Cramer's V ranges from 0.00 to 0.10, the association is weak. If it ranges from 0.11 to 0.30, the association is moderate. If it is greater than 0.30, the association is strong.

Measures of Association for Ordinal Level Variables
There are two types of ordinal variables:
1. Continuous ordinal variables: ordinal variables with many scores that resemble interval-ratio level variables (e.g., a feeling thermometer).
2. Collapsed ordinal variables: ordinal-level variables with just a few (no more than five or six) values or scores.
Today, we are interested in measures of association for collapsed ordinal variables. For collapsed ordinal-level variables, we use the following measures of association:
1. Gamma (G)
2. Somers' d (dyx)
3. Kendall's tau-b (τb)
4.
Kendall's tau-c (τc)

Measures of Association for Ordinal Level Variables Cont’d
Remember, we need to answer three questions when using measures of association:
1. Does association exist?
2. If an association does exist, how strong is it?
3. If association does exist, what are the pattern and/or the direction of the association?

The Logic of Pairs
Gamma, Somers' d, Kendall's tau-b, and Kendall's tau-c all measure the strength and direction of association. They do this by comparing each respondent to every other respondent. These comparisons are called "pairs" because they examine respondents' rankings on both X and Y. To find the total number of unique pairs, we use the following formula:

Total number of unique pairs = n(n − 1) / 2

The Logic of Pairs Cont’d
Pairs can be divided into five sub-groups:
1. Similar pair: a pair of respondents is similar if the respondent with the larger value on the independent variable also has the larger value on the dependent variable.
2. Dissimilar pair: a pair is dissimilar if the respondent with the larger value on the independent variable has the smaller value on the dependent variable.
3. Tied on the IV (X): a pair is tied on the independent variable if both respondents have the same score on the independent variable but not the dependent variable.
4. Tied on the DV (Y): a pair is tied on the dependent variable if both respondents have the same score on the dependent variable but not the independent variable.
5. Tied on both variables: a pair is tied on both variables if the respondents have the same scores on both the independent and dependent variables.

The Logic of Pairs Cont’d
Let's say we are trying to determine a cause of burnout among elementary school teachers. We hypothesize that years of service is a potential cause of burnout. Each variable has three response categories: 1.00 coded as low, 2.00 coded as moderate, and 3.00 coded as high. We therefore have two variables:
1. Years of Service (X)
2.
Level of Burnout (Y)

The Logic of Pairs Cont’d
Sample Size (n) = 5. To find the number of unique pairs, we use our formula:

Total number of unique pairs = n(n − 1) / 2 = 5(4) / 2 = 10

We have 10 unique pairs. If we examine the data more closely, we can see the types of pairs we have:
1. Steven is ranked above Camil on length of service, and he is also ranked above Camil on burnout. Therefore, Steven-Camil is a similar pair.
2. Joseph and Steven are ranked the same on the DV, but not the IV. Therefore, Joseph-Steven is a pair tied on the dependent variable.
3. Camil and Joseph have the same score on the IV, but not the DV. Therefore, Camil-Joseph is a pair tied on the independent variable.
4. Karina and Steven have the same score on each variable. Therefore, Karina-Steven is a pair tied on both variables.
With knowledge of these pairs, we can compute our measures of association.

Analyzing Measures of Association for Ordinal Level Variables
Gamma, Somers' d, tau-b, and tau-c all "measure the strength of association between variables by considering the number of similar versus dissimilar pairs" (Healey, Donoghue, and Prus 2023, 268).
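The pair logic can be made concrete in code. A sketch: `classify_pair` (a name of my own) sorts any two (X, Y) score pairs into the five sub-groups. The scores below are invented (including a fifth teacher, "Dana," whom the transcript does not name), chosen so the counts reproduce those reported in the lecture: 6 similar pairs, 0 dissimilar, 1 tied on the IV, 2 tied on the DV, and 1 tied on both.

```python
from itertools import combinations

def classify_pair(a, b):
    """Place a pair of (x, y) score tuples into one of the five sub-groups."""
    (ax, ay), (bx, by) = a, b
    if ax == bx and ay == by:
        return "tied on both"
    if ax == bx:
        return "tied on X"
    if ay == by:
        return "tied on Y"
    # Same ordering on X and Y -> similar; opposite ordering -> dissimilar
    return "similar" if (ax - bx) * (ay - by) > 0 else "dissimilar"

# Hypothetical scores (1 = low, 2 = moderate, 3 = high), not the slide's data
teachers = {"Steven": (3, 3), "Karina": (3, 3), "Joseph": (2, 3),
            "Camil": (2, 2), "Dana": (1, 1)}

counts = {}
for a, b in combinations(teachers.values(), 2):
    kind = classify_pair(a, b)
    counts[kind] = counts.get(kind, 0) + 1
print(counts)  # 6 similar, 1 tied on X, 2 tied on Y, 1 tied on both
```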
Where these measures differ is in how they treat tied pairs. When the number of similar and dissimilar pairs is equal, the value of these statistics will be 0.00 (no association between X and Y). However, as the number of similar pairs increases relative to dissimilar pairs (or vice-versa), the value of the statistic approaches 1.00; as it approaches 1.00, the strength of association increases. When every pair is similar, or every pair is dissimilar, the value of these statistics will be 1.00 (perfect association between X and Y).

Interpreting Strength for Ordinal Level Measures
Interpreting the strength of association for Gamma, Somers' d, tau-b, and tau-c is much like it was for Phi and Cramer's V. However…

Direction of Relationship
Measures of association for nominal-level variables (Phi or Cramer's V) only measure the strength of association; they tell us nothing about directionality. Measures of association for ordinal-level variables are more sophisticated and can also tell us about the direction of the relationship (i.e., positive or negative).

Direction of Relationship Cont’d
Gamma, Somers' d, tau-b, and tau-c have scores that range from -1.00 to +1.00. To determine directionality, look at the sign (+ or -). A plus (+) sign tells you it is a positive relationship (i.e., both variables increase or decrease in value together). A minus (-) sign tells you it is a negative relationship (i.e., as one variable increases, the other decreases, or vice-versa).

Positive Versus Negative Relationships Visualized
Above, one panel shows a positive relationship: as values on X increase, values on Y increase. It would still be positive if values on X decreased and values on Y also decreased. The other panel shows a negative relationship: as values on one variable increase, values on the other decrease (or vice-versa). A negative relationship occurs when the variables move in different directions as they vary; a positive relationship means the variables are moving in the
same direction as they vary.

Positive Versus Negative Relationships Visualized Cont’d
If tables are constructed with the column variable increasing from left to right and the row variable increasing from top to bottom (as is convention), you can see the direction of the relationship by looking at the table. A positive relationship will present itself with scores falling along a diagonal from the upper left to the lower right. A negative relationship will present itself with a diagonal moving from the upper right to the bottom left.

Gamma
Gamma compares the number of similar pairs to the number of dissimilar pairs, as a proportion of all pairs excluding ties. The formula for Gamma is:

G = (Ns − Nd) / (Ns + Nd)

In this equation, Ns = the number of pairs of respondents ranked the same way on both variables (i.e., similar pairs), and Nd = the number of pairs of respondents ranked in opposite ways on the two variables (i.e., dissimilar pairs). Gamma ranges from -1.00 to +1.00, where 0.00 represents no relationship.

Gamma Cont’d
Taking our previous example about years of service and burnout, we see there are 6 similar pairs and no dissimilar pairs:

G = (6 − 0) / (6 + 0) = +1.00

This tells us it is a perfect, positive relationship: as the number of years worked increases, the level of burnout also increases. But is it a perfect association? Remember, Gamma ignores tied pairs!
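The Gamma computation takes one line of Python; a sketch (the function name is mine):

```python
def gamma(ns, nd):
    """Gamma: (Ns - Nd) / (Ns + Nd); tied pairs are ignored entirely."""
    return (ns - nd) / (ns + nd)

# Burnout example: 6 similar pairs, 0 dissimilar pairs
print(gamma(6, 0))  # 1.0 -- "perfect" even though 4 of the 10 pairs are ties
```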
Limitations of Gamma
The problem with Gamma is that it ignores tied pairs. This means Gamma can exaggerate the strength of association between two ordinal variables. Somers' d, tau-b, and tau-c were developed to correct for this. These are extensions of Gamma, so the numerator remains the same (i.e., Ns − Nd); however, the denominators are more complex. Whereas tau-b includes in its calculation pairs tied on the IV and on the DV, Somers' d includes only pairs tied on the DV (tau-c, as we will see, instead adjusts for the size of the table).

Kendall's Tau-b
The formula for tau-b is:

τb = (Ns − Nd) / √((Ns + Nd + Tx)(Ns + Nd + Ty))

In this equation, Ns = the number of similar pairs, Nd = the number of dissimilar pairs, Tx = the number of pairs tied on the independent variable (X), and Ty = the number of pairs tied on the dependent variable (Y).

Kendall's Tau-b Cont’d
In our example, there are 6 similar pairs and 0 dissimilar pairs. There is 1 pair tied on the IV and 2 pairs tied on the DV. Therefore:

τb = (6 − 0) / √((6 + 0 + 1)(6 + 0 + 2)) = 6 / √56 = +.80

+.80 indicates a strong and positive relationship between years worked and burnout.

Limitations of Tau-b
Tau-b can only reach a value of ±1.00 when the IV and DV have the same number of categories. This means your table should have the same number of rows and columns. But what if we had a 2x3 table?
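Before turning to that question, the tau-b arithmetic can be checked with a short sketch (the function name is mine):

```python
from math import sqrt

def tau_b(ns, nd, tx, ty):
    """Kendall's tau-b: Gamma's numerator over a denominator that also
    counts pairs tied on X (tx) and pairs tied on Y (ty)."""
    return (ns - nd) / sqrt((ns + nd + tx) * (ns + nd + ty))

# Burnout example: 6 similar, 0 dissimilar, 1 tied on X, 2 tied on Y
print(round(tau_b(6, 0, 1, 2), 2))  # 0.8 -- below Gamma's 1.0, as expected
```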
Tau-c
Tau-c is designed so that it can reach a maximum of ±1.00 even if the number of categories on the variables is unequal. The formula for tau-c is:

τc = 2m(Ns − Nd) / (n²(m − 1))

In this equation, Ns = the number of similar pairs, Nd = the number of dissimilar pairs, n = the number of respondents, and m = the minimum of the number of categories on the independent variable and the number of categories on the dependent variable.

Tau-c Cont’d
Tau-c ranges from -1.00 to +1.00: 0.00 represents no relationship, -1.00 represents a perfect negative relationship, and +1.00 represents a perfect positive relationship. Although the table in our example (burnout by years worked) has an equal number of categories on each variable, we can nonetheless compute tau-c. To calculate this, we take our six similar pairs and a minimum value of m = 3 (the value is 3 because both variables have 3 categories; if the number of categories were lower on one variable, you would use that):

τc = 2(3)(6 − 0) / (5²(3 − 1)) = 36 / 50 = +.72

This indicates a strong and positive relationship between years worked and burnout. The most important thing to remember with tau-c is that you generally use it when the number of response categories on your variables is unequal (you can spot this easily by checking whether your table has an unequal number of rows and columns).

Somers' d
The formula for Somers' d is:

dyx = (Ns − Nd) / (Ns + Nd + Ty)

In this equation, Ns = the number of similar pairs, Nd = the number of dissimilar pairs, and Ty = the number of pairs tied on the dependent variable (Y).

Somers' d Cont’d
Somers' d ranges from -1.00 (perfect negative association) to +1.00 (perfect positive association). What makes Somers' d special is that it is an asymmetric measure. This means you can compute two different scores: dyx, where Y is treated as the dependent variable, and dxy, where X is treated as the dependent variable. Therefore, Somers' d will provide different predictive power between two variables depending on which variable you treat as X and which as Y.

Somers' d Cont’d
If we take our previous example, we see that there are 6 similar
pairs and 2 pairs tied on Y (i.e., level of burnout). Therefore:

dyx = (6 − 0) / (6 + 0 + 2) = +.75

With dyx = +.75, we see that there is a strong and positive relationship between our variables: as years worked increases, burnout also increases. Like tau-b and tau-c, the statistic is smaller than Gamma – this is because it accounts for tied pairs on the DV, whereas Gamma does not consider tied pairs at all.
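The tau-c and Somers' d computations can be verified with a final sketch (the function names are mine):

```python
def tau_c(ns, nd, n, m):
    """Kendall's tau-c: 2m(Ns - Nd) / (n^2 (m - 1)), where m is the smaller
    number of categories on the two variables."""
    return 2 * m * (ns - nd) / (n ** 2 * (m - 1))

def somers_dyx(ns, nd, ty):
    """Somers' dyx: like Gamma, but pairs tied on the DV join the denominator."""
    return (ns - nd) / (ns + nd + ty)

# Burnout example: 6 similar, 0 dissimilar, 2 tied on Y; n = 5, m = 3
print(tau_c(6, 0, 5, 3))    # 0.72
print(somers_dyx(6, 0, 2))  # 0.75
```

Both values fall below Gamma's +1.00 precisely because ties enter the calculation.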