Lecture 2 Research Methods Applied Empirical Economics PDF
Document Details
Leiden University
2024
Egbert Jongen
Summary
This lecture introduces randomized controlled trials (RCTs) and regression analysis in applied economics, focusing on understanding selection bias and the use of control variables. The content also includes examples to demonstrate the application of these methodologies.
Full Transcript
Research Methods: Applied Empirical Economics
Lecture 2: Randomized Controlled Trials and Regression
Egbert Jongen | Leiden University
September 2024

Lecture goals
- Understand selection bias and how randomization eliminates selection bias
- Become familiar with notation and some helpful statistics
- Know the main elements of a regression equation and the meaning of OLS
- Know the formula for omitted variable bias

Outline of the lecture
- Randomized controlled trials
- Break
- Regression
- Questions or comments

Randomized controlled trials
- Randomized controlled trials (RCTs) are often considered the gold standard for measuring the causal effect of a ‘treatment’
- Randomized trials get rid of selection bias
- First we look at a video and then go over an important RCT from the book: the RAND experiment
- Introduce concepts and notation afterwards

Introduction to randomized trials

Another example: The RAND experiment

Health outcomes for insured and uninsured: no randomization
Groups compared: some health insurance vs. no health insurance
- Healthier with some insurance
- But apples and oranges

Health outcomes with randomization: demographics
Groups compared: limited insurance vs. generous insurance
- Apples and apples

Health outcomes with randomization: before treatment
Groups compared: limited insurance vs. generous insurance
- Apples and apples (well …)

Ancient wisdom (Heller, Catch-22)
Just because you’re not paranoid doesn’t mean they are not after you!

Health outcomes with randomization: use of health care
- +45%!

Health outcomes with randomization: health indicators
- ≈ 0!
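The contrast between the non-randomized comparison (apples and oranges) and the randomized one (apples and apples) can be illustrated with a small simulation. This is only a sketch on synthetic data, not the RAND data: the function name `simulate`, the parameter `effect`, and the rule that people with better baseline health are more likely to end up treated are all assumptions made for illustration. The true treatment effect is set to zero, so any difference in group means under self-selection is due to who ends up in which group.

```python
import random
from statistics import mean


def simulate(n=20000, effect=0.0, seed=0):
    """Return the difference in group means under self-selection and under
    randomization. Hypothetical setup: baseline health is standard normal
    and treatment shifts health by the constant `effect`."""
    rng = random.Random(seed)
    baseline = [rng.gauss(0.0, 1.0) for _ in range(n)]

    # Self-selection (assumed rule): better baseline health plus noise
    # makes treatment more likely.
    d_self = [1 if h + rng.gauss(0.0, 1.0) > 0 else 0 for h in baseline]
    # Randomization: a coin flip that is unrelated to baseline health.
    d_rand = [rng.randint(0, 1) for _ in range(n)]

    def diff_in_means(d):
        treated = [h + effect for h, di in zip(baseline, d) if di == 1]
        control = [h for h, di in zip(baseline, d) if di == 0]
        return mean(treated) - mean(control)

    return diff_in_means(d_self), diff_in_means(d_rand)


naive, randomized = simulate()
print(f"difference in means, self-selected groups: {naive:.2f}")
print(f"difference in means, randomized groups:    {randomized:.2f}")
```

Under these assumptions the self-selected comparison shows a clearly positive difference even though the treatment does nothing, while the randomized comparison is close to zero.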
Notation

Introducing some notation
Suppose there are two potential outcomes Y for individual i:
- $Y_{1i}$: health status with vaccination
- $Y_{0i}$: health status without vaccination
Causal effect of vaccination: $Y_{1i} - Y_{0i}$
Fundamental problem: only one of the two paths is observed

What do we measure?
Suppose that we subtract the average outcome of the non-vaccinated from the average outcome of the vaccinated. Do we get the causal effect of vaccination?
Difference in group means = $\mathrm{Avg}_{\text{vac}}[Y_{1i} \mid D_i = 1] - \mathrm{Avg}_{\text{non-vac}}[Y_{0i} \mid D_i = 0]$
- $\mathrm{Avg}_{\text{vac}}$: average over individuals that are vaccinated; $\mathrm{Avg}_{\text{non-vac}}$: average over individuals that are not vaccinated
- $Y_{1i}$: health outcome with vaccination; $Y_{0i}$: health outcome without vaccination
- $D_i = 1$: individual had vaccination; $D_i = 0$: individual did not have vaccination
Suppose that treatment has the same effect $\kappa$ for everybody: $Y_{1i} - Y_{0i} = \kappa$

Selection bias
We measure:
Difference in group means
$= \mathrm{Avg}_{\text{vac}}[Y_{1i} \mid D_i = 1] - \mathrm{Avg}_{\text{non-vac}}[Y_{0i} \mid D_i = 0]$
Use $Y_{1i} - Y_{0i} = \kappa$:
$= \mathrm{Avg}_{\text{vac}}[Y_{0i} + \kappa \mid D_i = 1] - \mathrm{Avg}_{\text{non-vac}}[Y_{0i} \mid D_i = 0]$
Use that $\kappa$ is the same for everybody:
$= \mathrm{Avg}_{\text{vac}}[Y_{0i} \mid D_i = 1] + \kappa - \mathrm{Avg}_{\text{non-vac}}[Y_{0i} \mid D_i = 0]$
Reordering:
$= \kappa + \left( \mathrm{Avg}_{\text{vac}}[Y_{0i} \mid D_i = 1] - \mathrm{Avg}_{\text{non-vac}}[Y_{0i} \mid D_i = 0] \right)$
The first term, $\kappa$, is the causal effect; the term in parentheses is the selection bias.

Randomization eliminates selection bias
The difference in group means captures the causal effect if:
$\mathrm{Avg}_{\text{vac}}[Y_{0i} \mid D_i = 1] = \mathrm{Avg}_{\text{non-vac}}[Y_{0i} \mid D_i = 0]$
This is exactly what randomization does! (and what the other methods try to mimic)

Some useful statistics
- Estimated treatment coefficient
- Estimated standard error
- t-value = estimated coefficient / estimated standard error = 285/72 ≈ 4
- t-value > 2: reject null hypothesis of no treatment effect
- 95% confidence interval: [coefficient - 2*SE, coefficient + 2*SE]

Break!

Regression
- RCT not always possible
- Ceteris probably not paribus for treatment and control group
- Regression allows us to control for observable differences
- Can be powerful if we can control for key differences
- Also ‘foundational’ for other methods

Basic regression equation
$Y_i = \alpha + \beta Q_i + \gamma A_i + \varepsilon_i$

Variables
- $Y_i$: dependent variable
- $Q_i$: treatment variable
- $A_i$: control variable(s)

Parameters
- $\alpha$: intercept
- $\beta$: treatment effect
- $\gamma$: effect of the control variable

Error term
- $\varepsilon_i$: error term

How to find the parameters? Illustration

How to find the parameters? OLS!
To find the parameters that give the best fit we perform OLS
OLS = ordinary least squares
- ‘Ordinary’ because we do not weight the observations
- ‘Least’ because we minimize
- ‘Squares’ because the prediction errors are squared
OLS finds the parameters that minimize the sum of the squared prediction errors

Why it is important to add controls
Figure: income Y (vertical axis) against school quality Q (horizontal axis)
- Regression line without controls: $\hat{Y}_i = \hat{\alpha} + \hat{\beta} Q_i$
- Regression line with controls: $\hat{Y}_i = \hat{\alpha} + \hat{\beta} Q_i + \hat{\gamma} A_i$, with one line for higher ability students ($A_i = 1$) and one for lower ability students ($A_i = 0$), a vertical distance $\hat{\gamma}$ apart
- With the control for ability, the slope on school quality is much smaller

An example: Private vs. public schools

Return of attending a private ‘school’ (university)
- Big effect with dissimilar groups
- Small effect with similar groups

Omitted variable bias
There will always be omitted variables
- Not a problem per se
- Residuals will be bigger (but still have expectation zero)
- Estimate of treatment effect less precise, but still unbiased
But there will be omitted variable bias when:
1. Omitted variable is correlated with treatment variable, and
2. Omitted variable has a direct effect on the dependent variable

OVB: OV needs to be correlated with treatment variable
Figure: income Y against school quality Q
- High ability students more likely to go to high quality school

OVB: OV needs to be correlated with dependent variable
Figure: income Y against school quality Q
- High ability students more likely to earn higher income for a given school quality

Omitted variables bias formula
“Long regression”: $Y_i = \alpha^l + \beta^l Q_i + \gamma A_i + \varepsilon^l_i$
“Short regression”: $Y_i = \alpha^s + \beta^s Q_i + \varepsilon^s_i$
Effect of $Q_i$ (quality of school) on $Y_i$ (income) in short regression
= effect of $Q_i$ on $Y_i$ in long regression (true effect)
+ (relationship between $A_i$ (ability) and $Q_i$) × (effect of $A_i$ in long regression) (omitted variable bias)

Suppose: $A_i = \pi_0 + \pi_1 Q_i + u_i$
where $\pi_1$ captures the relationship between ability $A_i$ and school quality $Q_i$ (for a binary $Q_i$: the difference in the share of high ability students between good and other universities)
Then:
$Y_i = \alpha^l + \beta^l Q_i + \gamma A_i + \varepsilon^l_i$
Fill in $A_i$ from above:
$Y_i = \alpha^l + \beta^l Q_i + \gamma(\pi_0 + \pi_1 Q_i + u_i) + \varepsilon^l_i$
Reordering:
$Y_i = (\alpha^l + \gamma \pi_0) + (\beta^l + \gamma \pi_1) Q_i + (\gamma u_i + \varepsilon^l_i)$
So:
- $\alpha^s = \alpha^l + \gamma \pi_0$
- $\beta^s = \beta^l + \gamma \pi_1$ (‘short regression’ has omitted variable bias $\gamma \pi_1$)
- $\varepsilon^s_i = \gamma u_i + \varepsilon^l_i$

Some final thoughts
Finally: use appropriate care when interpreting the results
- We can only reject hypotheses (Popper)
- Statistical significance ≠ economic significance
- What is the mechanism at work?

Questions or comments?

Next time: Instrumental variables!
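As a numerical check on the omitted variables bias formula from this lecture, the sketch below compares a short and a long regression on synthetic data. Everything about the data is an assumption made for illustration: ability A and school quality Q are both 0/1 dummies, high ability students attend a good school with probability 0.7 (others 0.3), and income is generated with a true school effect of 2 and an ability effect of 5. The helper `cov` and all variable names are hypothetical. The OLS coefficients are computed directly from sample variances and covariances, using only the Python standard library.

```python
import random
from statistics import mean


def cov(x, y):
    """Sample covariance (dividing by n); cov(x, x) is the variance."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


rng = random.Random(1)
n = 5000
# Assumed data-generating process (all numbers are illustrative).
A = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
Q = [1 if rng.random() < (0.7 if a else 0.3) else 0 for a in A]
Y = [10 + 2 * q + 5 * a + rng.gauss(0.0, 1.0) for q, a in zip(Q, A)]

# Short regression: Y on Q only.
beta_s = cov(Q, Y) / cov(Q, Q)

# Auxiliary regression: A on Q gives pi_1.
pi_1 = cov(Q, A) / cov(Q, Q)

# Long regression: Y on Q and A, solved from the two normal equations.
det = cov(Q, Q) * cov(A, A) - cov(Q, A) ** 2
beta_l = (cov(A, A) * cov(Q, Y) - cov(Q, A) * cov(A, Y)) / det
gamma = (cov(Q, Q) * cov(A, Y) - cov(Q, A) * cov(Q, Y)) / det

print(f"beta_s                 = {beta_s:.3f}")
print(f"beta_l                 = {beta_l:.3f}")
print(f"beta_l + gamma * pi_1  = {beta_l + gamma * pi_1:.3f}")
```

In this setup the short-regression coefficient equals the long-regression coefficient plus gamma times pi_1 up to floating-point error, and it overstates the effect of school quality because ability is positively related to both school quality and income.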