Document Details

Uploaded by WellPositionedAcademicArt

Tags

statistical interaction, regression analysis, dummy variables, econometrics

Summary

This lecture covers the concept of statistical interaction in regression analysis, focusing on how the effect of one predictor variable on the dependent variable depends on the value of another predictor variable. The lecture explains how to include interaction terms in regression models and how to interpret them.

Full Transcript

What if you have a variable in your model in addition to your dummies?

How do we interpret the output? Still the same way, except that you can no longer interpret the constant/intercept as the average value of the reference category. It is now the average value of the reference category for age = 0.

In the model above, age has been added as a control variable. So, holding the effect of age constant, skateboarders score on average 4.656 higher on the pain threshold scale than the reference category "Non-athlete".

What is the average pain threshold of a 42-year-old skateboarder?

y = a + b_1 x_1 + b_2 x_2 + b_3 x_3 + b_4 x_4

Where: a = intercept, x_1 = skateboarder, x_2 = BMXer, x_3 = inliner, x_4 = age

y = 4.411 + (4.656 * 1) + (3.421 * 0) + (2.257 * 0) + (-0.042 * 42) = 7.303

Summary: Dummy Variables in Regression

Dummy variables are used for categorical/nominal independent variables.
Dummy variables are dichotomous.
You always include one dummy variable fewer in your model than you can create from the original categorical/nominal variable.
The dummy variable that you do not include in your model is the reference category.
All dummy variables included in the model are to be interpreted relative to the reference category.
In a model with only one dummified independent variable, the constant/intercept is the average value of the reference category.

Statistical interaction

The effect of an independent variable on the dependent variable may be influenced by another independent variable. For example: certain industries (e.g., tech, finance) offer higher returns on education than others (e.g., retail, transportation).
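The prediction for the 42-year-old skateboarder worked out earlier in this part can be reproduced with a few lines of code. This is only a minimal sketch: the coefficients are the ones quoted in the lecture, while the function name `predict_pain_threshold` and the group labels are assumptions made for illustration.

```python
# Coefficients as quoted in the lecture's pain-threshold model.
INTERCEPT = 4.411
COEFFS = {"skateboarder": 4.656, "bmxer": 3.421, "inliner": 2.257, "age": -0.042}


def predict_pain_threshold(group: str, age: float) -> float:
    """Predicted pain threshold; 'non_athlete' is the reference category."""
    y = INTERCEPT + COEFFS["age"] * age
    if group != "non_athlete":
        # Only the dummy for the person's own group equals 1; the others are 0.
        y += COEFFS[group]
    return y


skater_42 = predict_pain_threshold("skateboarder", 42)
print(round(skater_42, 3))  # 7.303
```

Calling the function with "non_athlete" and age 0 returns the intercept itself, which matches the reading of the constant as the reference category at age = 0.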
[Diagram: Industry of employment (X2), Education (X1), Income (Y)]

Interaction between:
X1: education (continuous variable, in years)
X2: industry (dummy variable: 1 if the respondent is employed in a knowledge-intensive industry, 0 if they are employed in a labor-intensive industry)

Example 1: Education and income (industry as a control variable)

[Diagram: Education (X1), Income (Y), Industry (X2)]

The effect of a one-step increase on the education scale is the same for all values of the variable 'industry'. That is, we get a set of parallel 'partial' regression lines for the two values of 'industry': the effect of education on income is the same for knowledge-intensive and labor-intensive industries.

Example 2: Education and income (industry as a moderator/interaction variable)

[Diagram: Industry (X2), Education (X1), Income (Y)]

The effect of a one-step increase on the education scale is not equal across the values of the variable 'industry'. That is, we have a set of non-parallel 'partial' regression lines for the two values of 'industry'. When the interaction is considered, the effect of education on income is NOT the same for knowledge-intensive and labor-intensive industries.

How do we display an interaction effect in the prediction equation?

To see whether the effect of education on income differs between knowledge-intensive and labor-intensive industries, we add a so-called interaction term of educational level (continuous variable) and industry type (dummy) to the regression equation.

y = a + b_1 x_1 + b_2 x_2 + b_3 x_1 x_2

An interaction term of educational level and industry is the product of the two variables. You create this term by multiplying the values of x_1 and x_2 for each case (in SPSS).

interaction term = x_1 * x_2

How do we interpret a model with an interaction term?

y = a + b_1 x_1 + b_2 x_2 + b_3 x_1 x_2

Main or simple effects (b_1, b_2): the effect of one variable at a specific value of the other. Education (109.339) is the effect of education when industry = 0; industry (2119.294) is the effect of industry when education = 0.
Interaction effect (b_3): the coefficient of the product term x_1 x_2.
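The lecture builds the product term in SPSS; the same step can be sketched in Python. The respondent values below are made up purely for illustration and do not come from the lecture's data.

```python
# Hypothetical respondents (invented values, for illustration only).
education = [10, 12, 16, 18]  # x_1: years of education
industry = [0, 1, 0, 1]       # x_2: 1 = knowledge-intensive, 0 = labor-intensive

# The interaction term is the case-by-case product x_1 * x_2.
interaction = [x1 * x2 for x1, x2 in zip(education, industry)]

print(interaction)  # [0, 12, 0, 18]
```

The product is 0 for every respondent in a labor-intensive industry, which is why the interaction coefficient only contributes to predictions for knowledge-intensive industries.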
How do we interpret a model with an interaction term?

When estimating a model with an interaction term, we should pay attention to two things:
1) The p-value for the interaction effect. If it is not significant, it would make sense to drop the interaction term from the model and return to a model without interaction.
2) The coefficients of the main effects have a special meaning. Think of dummy variables: just as the constant describes the reference category (all dummies = 0), each main effect describes the effect of one variable when the other variable = 0.

y = a + b_1 x_1 + b_2 x_2 + b_3 x_1 x_2

Interpretation: Effect of education

Remember the visualization challenge in multiple regression? We had to assign specific values to x_2 to visualize the regression line for x_1. Similarly, in an interaction model, we assign values to x_2 to understand the effect of x_1.

Key difference: In non-interaction models, assigning values to x_2 only affects the intercept (creating parallel lines). In interaction models, assigning values to x_2 changes both the intercept and the slope of the line, so the effect of x_1 depends on the value of x_2.

Let's start by assigning x_2 = 0 (industry = 0) and look at the model:

income = 888.1 + 109.3 * education + 2119.3 * 0 + 99.3 * education * 0
income = 888.1 + 109.3 * education

The value 109.3 represents the slope of the regression line for individuals employed in labor-intensive industries (i.e., when industry = 0). This can also be interpreted as the effect of education on income for individuals in labor-intensive industries. While this is often referred to as the 'main or simple effect' of education, these terms may not be entirely appropriate: in an interaction model, no effect is inherently 'main'. Instead, this value represents the effect of education when industry = 0, which is just one specific condition within the model.
Constant: filling in 0 for both variables gives income = 888.1 + 109.3 * 0 + 2119.3 * 0 + 99.3 * 0 * 0 = 888.1, the expected income when education = 0 and industry = 0.

Interpretation: Effect of education

Let's assign x_2 = 1 (industry = 1) and look at the model:

income = 888.1 + 109.3 * education + 2119.3 * 1 + 99.3 * education * 1
income = 888.1 + 2119.3 + (109.3 + 99.3) * education
income = 3007.4 + 208.6 * education

The value 208.6 represents the slope of the regression line for individuals employed in knowledge-intensive industries (i.e., when industry = 1).

More generally:

income = a + b_1 education + b_2 industry + b_3 education * industry
If industry = 0: income = a + b_1 education
If industry = 1: income = (a + b_2) + (b_1 + b_3) education

[Figure: Plotting the interaction]

Interpretation: Effect of industry

Let's assign x_1 = 0 (education = 0) and look at the model:

income = 888.1 + 109.3 * 0 + 2119.3 * industry + 99.3 * industry * 0
income = 888.1 + 2119.3 * industry

The value 2119.3 represents the effect of industry for individuals with no education (i.e., when education = 0). Their salary is expected to be 2119.3 euros higher in knowledge-intensive industries than in labor-intensive industries.

Let's assign x_1 = 1 (education = 1) and look at the model:

income = 888.1 + 109.3 * 1 + 2119.3 * industry + 99.3 * industry * 1
income = 997.4 + 2218.6 * industry

The value 2218.6 represents the effect of industry for individuals with 1 year of education (i.e., when education = 1). Among people with 1 year of education, salary is expected to be 2218.6 euros higher in knowledge-intensive industries than in labor-intensive industries.
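Both readings of the fitted model (the education line at a fixed industry value, and the industry effect at a fixed education value) follow from the same four coefficients. The sketch below simply recomputes the numbers quoted in the lecture; the function names are assumptions chosen for illustration.

```python
# Coefficients of the education-by-industry model quoted in the lecture.
A, B_EDU, B_IND, B_INT = 888.1, 109.3, 2119.3, 99.3


def education_line(industry: int) -> tuple[float, float]:
    """(intercept, slope) of income on education at a fixed industry value."""
    return A + B_IND * industry, B_EDU + B_INT * industry


def industry_effect(education: float) -> float:
    """Income gap, knowledge- minus labor-intensive, at a given education level."""
    return B_IND + B_INT * education


for ind in (0, 1):
    intercept, slope = education_line(ind)
    print(f"industry={ind}: income = {intercept:.1f} + {slope:.1f} * education")

for edu in (0, 1):
    print(f"education={edu}: industry effect = {industry_effect(edu):.1f}")
```

The two functions describe the same interaction from two sides: the slope of education grows by 99.3 when industry goes from 0 to 1, and the industry gap grows by 99.3 for each additional year of education.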
Interpretation: Effect of industry

More generally:

income = a + b_1 education + b_2 industry + b_3 education * industry
If education = 0: income = a + b_2 industry
If education = 1: income = (a + b_1) + (b_2 + b_3) industry
If education = 2: income = (a + 2 b_1) + (b_2 + 2 b_3) industry
If education = 3: income = (a + 3 b_1) + (b_2 + 3 b_3) industry

[Figure: Plotting the interaction]

[Figure: Plotting the means]

In short…

An interaction effect can be interpreted in the context of any of the main effects that comprise it. Either the effect of education on income differs between people who work in knowledge-intensive industries and those who work in labor-intensive industries, or the income difference between these two industries varies across levels of education.

The choice of the main effect should be guided by the research question formulated at the outset. In this example, we were interested in how the effect of education on income is influenced by industry. Therefore, we would ideally choose education as the main effect for interpreting the interaction effect.

Centering the explanatory variables

Main effects in an interaction model may not be meaningful, especially for continuous variables. They represent the effect of one predictor when the other predictor equals zero (e.g., 0 years of education). This can lead to misleading interpretations.

Solution: Center the variables. Centering means subtracting the mean of a variable from each of its values so that its mean becomes zero.

Benefits of centering:
Main effects reflect the effect of one predictor when the other predictor is at its mean value.
Easier and more meaningful interpretation of coefficients.

Next week: a deeper dive into centering in interaction models!
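A small sketch may help show what centering does before next week's deeper treatment. The education values below are hypothetical (the lecture does not report the sample mean); only the two coefficients are taken from the education-by-industry model above.

```python
from statistics import mean

# Hypothetical years of education (invented for illustration).
education = [8, 10, 12, 14, 16]

# Centering: subtract the mean so the centered variable has mean zero.
edu_mean = mean(education)
education_c = [x - edu_mean for x in education]

# Coefficients quoted in the lecture for the uncentered model.
B_IND, B_INT = 2119.3, 99.3

# After centering education, the industry coefficient equals the industry
# effect at mean education: b_2 + b_3 * mean(education).
industry_effect_at_mean = B_IND + B_INT * edu_mean

print(education_c)
print(round(industry_effect_at_mean, 1))
```

With centered education, a value of 0 means "average education" rather than "no education", so the industry coefficient describes a respondent who actually resembles the sample.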
Example: Immigrant status and income (industry as a moderator)

We expect that the effect of being an immigrant on income differs across knowledge-intensive and labor-intensive industries:

[Diagram: Industry (X2), Immigrant (X1), Income (Y)]

Interaction between:
X1: immigrant (dummy/binary/dichotomous variable: 1 = immigrant, 0 = non-immigrant)
X2: industry (dummy/binary/dichotomous variable: 1 = knowledge-intensive, 0 = labor-intensive, the reference category)

Is the interaction term significant? Yes: p = 0.043, which is smaller than 0.05 (the conventional alpha level). Note that if the null hypothesis were true, an interaction at least this large would still be expected in about 4% of samples.

income = 2150 + 376.7 * immigrant + 4152.4 * industry - 967.6 * immigrant * industry

Exam note: write the equation with the variable names.

Constant: 2150. Obtained by filling in the value 0 for all variables: industry = 0 and immigrant = 0. In other words, the intercept is the average income for non-immigrants in labor-intensive industries.

Main effect of immigrant (b coefficient): 376.7. In labor-intensive industries, immigrants have a salary 376.7 higher than non-immigrants.

income = 2150 + 376.7 * immigrant

Interaction effect: In knowledge-intensive industries, immigrants have a salary 590.9 lower than non-immigrants. The interaction coefficient (-967.6) is the change in the immigrant effect when moving from labor-intensive to knowledge-intensive industries: 376.7 - 967.6 = -590.9.

income = 2150 + 376.7 * immigrant + 4152.4 * 1 - 967.6 * immigrant * 1
income = 6302.4 + (-590.9) * immigrant

Summary of the two lines:
Main effect (industry = 0): income = 2150 + 376.7 * immigrant
Interaction effect (industry = 1): income = 6302.4 + (-590.9) * immigrant
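With two dummies, the model describes exactly four groups, so it can help to compute all four predicted incomes at once. The sketch below uses the coefficients quoted in the lecture; the function name `predicted_income` is an assumption made for illustration.

```python
def predicted_income(immigrant: int, industry: int) -> float:
    """Predicted income from the immigrant-by-industry model in the lecture."""
    return (2150 + 376.7 * immigrant + 4152.4 * industry
            - 967.6 * immigrant * industry)


# Predicted income for each of the four groups.
for industry in (0, 1):
    for immigrant in (0, 1):
        income = predicted_income(immigrant, industry)
        print(f"industry={industry}, immigrant={immigrant}: {income:.1f}")

# Immigrant gap within each industry type.
gap_labor = predicted_income(1, 0) - predicted_income(0, 0)
gap_knowledge = predicted_income(1, 1) - predicted_income(0, 1)
print(f"gap in labor-intensive: {gap_labor:.1f}")
print(f"gap in knowledge-intensive: {gap_knowledge:.1f}")
```

The difference between the two gaps equals the interaction coefficient, which is one way to read -967.6: it is how much the immigrant effect changes between the two industry types.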
