ECB3AMT Applied Microeconometric Methods PDF
Jacopo Mazza
Summary
This presentation covers applied microeconometric methods, focusing on regression and causality. It explores topics such as correlation versus causality, the Rubin causal model, endogeneity, policy evaluation, and the fundamental problem of causal inference.
Full Transcript
ECB3AMT Applied Microeconometric Methods
1. Regression
Dr. Jacopo Mazza

Content of this Chapter
1. Correlation versus causality
2. Why should we care about causality?
3. The Rubin causal model
4. Endogeneity in the regression model
5. Example

Correlation does not imply causality
[Figures: two examples of spurious correlations. Source: https://www.tylervigen.com/spurious-correlations]

Lack of correlation does not imply lack of a causal effect
- Since June 1, 2020, face masks have been mandatory in public transport in the Netherlands.
- [Figure: Covid-19 cases. Source: RIVM figures for Utrecht (region)]
- No apparent change in Covid-19 cases; there was even an increase in the autumn of 2020.
- Can we conclude that face masks had no effect? No! We do not know what would have happened had there been no rule to wear masks.
- In fact, recent research suggests large effects of face masks (IZA Discussion Paper No. 13319).

Lack of correlation does not imply lack of a causal effect
- No clear correlation between vaccination rates and infection numbers (partly positive, partly negative).
- Can we conclude that vaccinations had no effect? No! We do not know what would have happened had there been no vaccinations.
- Studies do show that vaccinations are effective. It's just that other things happen simultaneously.

Example 1: Reverse Causality
- Europeans in the Middle Ages believed lice to improve health. Why?
- They observed that sick people do not have lice, whereas healthy people do (believed direction: no lice → sick).
- However, causality is reversed: lice are sensitive to body temperature and leave feverish hosts (actual direction: sick → no lice).

Example 2: Selection and omitted variables
Health status of people who have (not) been in hospital in the past 12 months

Group              Group Size    Average Health Status
In hospital             7,774    3.21 (0.014)
Not in hospital        90,049    3.93 (0.003)
Difference                       0.72 (0.012)

National Health Interview Survey (NHIS), 2005. Health status from 1 ("bad") to 5 ("excellent"). Standard errors in parentheses.
- Do hospitals make people sick, e.g. due to germs?
- Alternative explanation: people who go to the hospital are different.
- Selection bias
- Omitted variable bias

Causal relationships
- X affects Y: X → Y
- Reverse causality, Y affects X: X ← Y
- Simultaneity, X affects Y and Y affects X: X ↔ Y
- Omitted variable, O affects both X and Y: X ← O → Y

Definitions
- Reverse causality: X and Y are associated, but not in the way you would expect. Instead of X causing a change in Y, it is the other way around: Y causes changes in X.
- Simultaneity: the explanatory variable is jointly determined with the dependent variable. In other words, X causes Y but Y also causes X.
- Omitted variable bias: occurs when a variable O affects both the treatment X and the outcome Y but is not (adequately) taken into account.
- Selection bias: occurs when the subjects who select or are selected into treatment differ from the subjects who are not.

Next Topic: 2. Why should we care about causality?

Many questions in economics and business are about causal relationships
- What is the effect of prices on sales?
- How does a marketing campaign affect sales?
- How do active labor market policies affect individuals' outcomes?
- How does trade with China affect the Dutch labor market?
- Do robots destroy jobs?
- What is the effect of the COVID-19 measures?

Causal Terminology

Correlation                       Causality
Relationship between X and Y      X causes Y
Association between X and Y       X affects Y, the effect of X on Y
Prediction of Y                   X raises/reduces Y
etc.                              etc.

- Make sure to use precise terminology in statistical analyses!
- If you use causal terminology in the Data Science Lab course or your Bachelor Thesis for analyses that are not causal, you make a mistake!

Policy Evaluation
What is policy evaluation?
- Systematic assessment of the change in an outcome variable that can be ascribed to a specific policy measure/intervention.
Why does it matter?
1. Actual effects of measures/interventions => evidence-based policy
2. Allocation of (financial) resources
3. Accountability of decision makers

Policy Evaluation
- Key element of any evaluation: construction of an adequate counterfactual situation. "What would have happened in the absence of the intervention?"
- The counterfactual situation is unobservable => the problem of evaluation.
- Key aim: isolating (identifying) the causal effect of an intervention.
- How? Comparing the actual situation to the counterfactual situation.

Steps
1. Defining the unit of observation
   - e.g. individuals (health status), regions (regional policy), firms (investment subsidies), households, schools, ...
2. Defining the outcome variable
   - Measure for assessing the success, e.g. employment status, wage, profits, sales, etc.
3. Selecting the evaluation parameter
   - Average Treatment Effect (ATE)
   - Average Treatment Effect on the Treated (ATT)
   - Local Average Treatment Effect (LATE)
   - Intention to Treat Effect (ITT)
   - ...
   - Depends on the research question and the unit of observation.
4. Selecting an evaluation strategy (identification strategy)
   - Constructing the counterfactual situation using a suitable evaluation method and credible identifying assumptions (= identification strategy).
   - Requires an identifying assumption (no identification without assumption!).
   - A parameter is identified if its estimator converges to the true population parameter as the sample size increases.
   - The identifying assumption cannot be tested.

Frequently used evaluation strategies
- Randomized controlled experiments
- Instrumental Variables (IV)
- Regression Discontinuity Design (RDD)
- Differences-in-Differences (DiD)
- [Panel-econometric methods with fixed effects (FE)]
- [Matching, Matching with Differences-in-Differences]
- [Synthetic Control Group Designs, Synthetic Differences-in-Differences]

Next Topic: 3. The Rubin causal model

Ceteris Paribus: "other things equal"
- To answer causal questions, we must compare individuals, regions, countries, etc. under equal conditions (ceteris paribus).
- One unit has been treated, the other not; all other things are equal.
- Only comparisons under ceteris-paribus assumptions have a causal interpretation.
- Often units of observation differ not only in the treatment status but in many other observable and unobservable characteristics.

The Rubin causal model
$D_i \in \{0, 1\}$: treatment variable ($D_i = 1$ if unit $i$ receives the treatment, $D_i = 0$ otherwise)
Examples
- Worker i participates in a further training measure
- Student i receives a study grant
- Company i receives innovation funding
- Region i receives EU investment funds

Potential and observable outcomes
Potential outcomes
- $Y_{0i}$: potential outcome without treatment.
- $Y_{1i}$: potential outcome with treatment.
- Both exist for both groups, independent of the treatment status.
Observed outcomes
- $Y_i = D_i Y_{1i} + (1 - D_i) Y_{0i}$
- We observe either $Y_{1i}$ or $Y_{0i}$! We do not observe the counterfactual outcome.

Individual and average causal effect
- Individual causal effect: the causal effect of the treatment for unit i is $Y_{1i} - Y_{0i}$.
- Average causal effect: the average of the individual causal effects across the n units is $\frac{1}{n} \sum_{i=1}^{n} (Y_{1i} - Y_{0i})$.

Individual and average causal effect
- Assume 7 firms consider adopting a new marketing campaign.
- Assume that we know both potential outcomes (profits).

Firm       $Y_{1i}$    $Y_{0i}$    $Y_{1i} - Y_{0i}$
1          3,500       3,400        100
2          2,750       2,800        -50
3          3,000       3,300       -300
4          2,500       2,000        500
5          1,500       1,250        250
6          1,200       1,300       -100
7          3,200       3,000        200
average    2,521       2,436         85

Individual and average causal effect
- Assume that firms only adopt the campaign if this raises their profits.

Firm       $Y_{1i}$    $Y_{0i}$    $Y_{1i} - Y_{0i}$    $D_i$
1          3,500       3,400        100                 1
2          2,750       2,800        -50                 0
3          3,000       3,300       -300                 0
4          2,500       2,000        500                 1
5          1,500       1,250        250                 1
6          1,200       1,300       -100                 0
7          3,200       3,000        200                 1
average    2,521       2,436         85

Individual and average causal effect
- In reality, we only see the realized values. Values in parentheses are counterfactual and unobserved; the averages use the observed values only.

Firm       $Y_{1i}$    $Y_{0i}$    $Y_{1i} - Y_{0i}$    $D_i$
1          3,500       (3,400)     ?                    1
2          (2,750)     2,800       ?                    0
3          (3,000)     3,300       ?                    0
4          2,500       (2,000)     ?                    1
5          1,500       (1,250)     ?                    1
6          (1,200)     1,300       ?                    0
7          3,200       (3,000)     ?                    1
average    2,675       2,467       208

Fundamental problem of causal inference
- We only observe an individual either treated or untreated, but never both.
- It is impossible to identify causal effects without further assumptions about potential outcomes.
- Fundamental Problem of Causal Inference (Holland, 1986): it is impossible to observe $Y_{1i}$ and $Y_{0i}$ for the same unit i at the same time. Therefore it is impossible to measure the individual causal effect of D on Y.
- We do not know the counterfactual situation, i.e. what would have happened had individual i not received the treatment.
- Evaluation is a problem of missing data.

Comparing (conditional) means
Observed conditional means
- Treated (new campaign): $E[Y_i \mid D_i = 1] = 2{,}675$
- Untreated (no new campaign): $E[Y_i \mid D_i = 0] = 2{,}467$
Comparing conditional means (same as previous example)
- $E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0] = 2{,}675 - 2{,}467 = 208$

Causal effect and selection bias
- Assume that the individual causal effect is identical for all individuals, i.e. $Y_{1i} - Y_{0i} = \kappa$ for all i.
- We can simplify: $Y_{1i} = Y_{0i} + \kappa$
- Using this in the observed outcome: $Y_i = Y_{0i} + \kappa D_i$
- Plug this into the comparison of conditional means: $E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0] = \kappa + \left( E[Y_{0i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 0] \right)$

Causal effect and selection bias
- The equation highlights the selection bias: difference in conditional means = average causal effect + selection bias.
- The problem of comparing conditional means is that the treatment status correlates with the individual treatment effect. Firms that are more likely to profit from the campaign are more likely to implement it, etc.
- In this course, we attempt to remove the correlation between the treatment status and the individual treatment effect. Only then can we estimate causal effects!

Important assumption: SUTVA
- Stable Unit Treatment Value Assumption (SUTVA)
- The effect of the treatment does not depend on who else gets the treatment (the effect must be stable), i.e. no aggregation effects (macro effects)!
- Example: effect of a marketing campaign on profits
- If one firm adopts the campaign, this results in (i) higher consumer demand and/or (ii) reallocation of consumer demand towards this one firm.
- What happens if all firms adopt the campaign? (i) Higher aggregate demand and potential feedback effects (growth); (ii) no more comparative advantage ("rat race").
- Causal effects often only have a "marginal" or "local" interpretation.

Next Topic: 4. Endogeneity in the regression model

Regression model
Population regression model: $Y_i = \alpha + \beta X_i + \varepsilon_i$
We need two assumptions for a causal interpretation of $\beta$:
1. $\mathrm{Cov}(X_i, \varepsilon_i) = 0$ (Exogeneity)
   - Correct specification of the regression model
   - No omitted variables
   - No dependency between observed and unobserved variables
   - The violation of this assumption is called endogeneity ($\mathrm{Cov}(X_i, \varepsilon_i) \neq 0$)
2. $(Y_i, X_i)$ are i.i.d.
These are strong assumptions!

Visualization of the assumptions
- We can use OLS to estimate a causal effect if X → Y ← ε, with no link between X and ε.
- OLS does not provide a causal effect if Y also affects X (X ↔ Y) and/or ε also affects X (ε → X).

Regression and potential outcomes
- Observed outcome: $Y_i = Y_{0i} + (Y_{1i} - Y_{0i}) D_i$
- In a regression model: $Y_i = \alpha + \beta D_i + \varepsilon_i$
- Treatment group ($D_i = 1$): $E[Y_i \mid D_i = 1] = \alpha + \beta + E[\varepsilon_i \mid D_i = 1]$
- Control group ($D_i = 0$): $E[Y_i \mid D_i = 0] = \alpha + E[\varepsilon_i \mid D_i = 0]$
- $\beta$ measures the causal effect only if the assumptions are met, but these assumptions are often violated.
- Confounder: an omitted (often unobserved) variable that correlates with both treatment status and the outcome variable (e.g. motivation).
- Omitted variable bias: the bias in the estimated treatment effect (i.e. the difference between the true and the estimated treatment effect) when you do not control for an important confounder.
- Reverse causality: $Y$ causes $D$.
- Simultaneity: $D$ and $Y$ affect each other mutually.

Foundations of OLS
- The aim of OLS is to find the best fit of the data to the model $Y_i = \alpha + \beta X_i + \varepsilon_i$.
- Choose $\hat{\alpha}$ and $\hat{\beta}$ to minimize the "unexplained" part.
- Minimize the sum of squared deviations (SQD): $\min_{\hat{\alpha}, \hat{\beta}} \sum_{i=1}^{n} (Y_i - \hat{\alpha} - \hat{\beta} X_i)^2$

Foundations of OLS
- By solving the minimization problem we get the estimator for $\beta$: $\hat{\beta} = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2}$
- Plugging in $Y_i = \alpha + \beta X_i + \varepsilon_i$ and $\bar{Y} = \alpha + \beta \bar{X} + \bar{\varepsilon}$: $\hat{\beta} = \beta + \frac{\sum_i (X_i - \bar{X})(\varepsilon_i - \bar{\varepsilon})}{\sum_i (X_i - \bar{X})^2}$

Consistency of OLS
- Expected value of $\hat{\beta}$, i.e. $E[\hat{\beta}] = \beta + \frac{\mathrm{Cov}(X_i, \varepsilon_i)}{\mathrm{Var}(X_i)}$.
- Under the assumption that $\mathrm{Cov}(X_i, \varepsilon_i) = 0$, OLS is consistent and provides the true population parameter as the sample size increases.

Asymptotic bias
- If the assumption is violated, OLS is inconsistent.
- We cannot test the assumption because we do not observe the error term $\varepsilon_i$ (only the residual $\hat{\varepsilon}_i$).
- Asymptotic bias: $\frac{\mathrm{Cov}(X_i, \varepsilon_i)}{\mathrm{Var}(X_i)}$
- Conclusion: if we omit an important variable from the model (e.g. motivation), we usually do not obtain consistent estimates.

Determining the direction of the bias
- Assume that the true population model is $Y_i = \alpha + \beta D_i + \gamma M_i + u_i$.
- $M_i$ is the management quality of firm $i$.
- $u_i$ fulfills all assumptions, i.e. $\mathrm{Cov}(D_i, u_i) = 0$ and $\mathrm{Cov}(M_i, u_i) = 0$.
- We are typically unable to measure $M_i$ (i.e. it is unobservable), thus it falls into the error term: $\varepsilon_i = \gamma M_i + u_i$.

Determining the direction of the bias
- The expected value of $\hat{\beta}$ then is $E[\hat{\beta}] = \beta + \gamma \frac{\mathrm{Cov}(D_i, M_i)}{\mathrm{Var}(D_i)}$.
- The bias depends on the covariance between the omitted variable and the treatment, $\mathrm{Cov}(D_i, M_i)$, and on the effect of the omitted variable on the outcome variable, $\gamma$.
- We can extend the intuition to a model with more than one explanatory variable.

Determining the direction of the bias
- Assume we estimate $Y_i = \alpha + \beta D_i + \varepsilon_i$. What would be the expected bias if we did not control for management quality?
- Example: $\mathrm{Cov}(D_i, M_i) > 0$, since higher management quality raises the probability to adopt the marketing campaign; $\gamma > 0$, since higher management quality leads to higher profits.
- We probably have an upward bias, i.e. we overestimate the effect of the marketing campaign on profits.

Endogeneity
- Endogeneity problem (evaluation problem): an endogeneity problem arises if there is systematic variation in the error term ($\varepsilon_i$).
- That is: there is something in the error term that matters for both the outcome ($Y_i$) and the treatment status ($D_i$).
- Endogeneity ($\mathrm{Cov}(D_i, \varepsilon_i) \neq 0$) is the opposite of exogeneity ($\mathrm{Cov}(D_i, \varepsilon_i) = 0$).
- E.g.: we mistakenly assign the higher profits solely to the marketing campaign and ignore the role of management quality, leading to a biased result.
- In the rest of this course, we focus on methods to cope with selection bias / endogeneity problems / evaluation problems.
- All the methods have in common that we try to construct a valid control group (or counterfactual situation) for the treated units, i.e. we try to estimate what would have happened in the absence of the treatment.
- We discuss the strengths and weaknesses of these methods.

Next Topic: 5. Example

University Choice and Earnings (this is from Chapter 2 of the book)
- Public vs. private universities: role for earnings?
- High tuition fees at private universities
- Problem: people do not randomly go to private (or public) universities. They differ in many characteristics that matter for both university choice and earnings.
- Idea: make people as comparable as possible using regression.
- Data: College & Beyond (C&B)

Regression
- Regress earnings on a "private school" dummy and controls.
- The controls include the group of students who applied to (and were admitted to) comparable universities. This is an attempt to establish ceteris paribus.

[Two regression tables; the only recoverable row in each is "Further controls: No, No, Yes, No, No, Yes".]

Omitted Variable Bias
- Formula from lecture: $E[\hat{\beta}] = \beta + \gamma \frac{\mathrm{Cov}(D_i, M_i)}{\mathrm{Var}(D_i)}$, with $D_i$ the private school dummy and $M_i$ the unobserved factors.
- Here:
  - $\beta$: effect of private school
  - $\gamma$: effect of unobserved factors on earnings (ability, motivation, ...)
  - $\mathrm{Cov}(D_i, M_i)$: role of unobserved factors for private school choice
- => Upward bias because $\gamma > 0$ and $\mathrm{Cov}(D_i, M_i) > 0$
- Do we now control for everything? Probably not ... Do sensitivity/robustness checks!
- Of course we never observe $M_i$.
- FYI: the book shows how the formula computes the correct bias for observed variables.

Next Topic: 2. Randomization
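
Worked sketch: the seven-firm example

The seven-firm example from the Rubin causal model part can be recomputed in a few lines of Python. This is only a sketch: the variable names (`Y1`, `Y0`, `D`, `Y_obs`) are labels chosen here, and profits are written as plain integers. Because the firms' effects are not identical, the sketch splits the naive difference into the average effect on the treated (ATT) plus selection bias in $Y_{0i}$, which is the general form of the constant-effect decomposition on the slides.

```python
from statistics import mean

# Potential profits of the seven firms from the slide table
# (Y1 = with the campaign, Y0 = without the campaign).
Y1 = [3500, 2750, 3000, 2500, 1500, 1200, 3200]
Y0 = [3400, 2800, 3300, 2000, 1250, 1300, 3000]

# Individual causal effects and the rule "adopt only if it raises profits".
effects = [y1 - y0 for y1, y0 in zip(Y1, Y0)]
D = [1 if e > 0 else 0 for e in effects]

# What we actually observe: Y1 for adopters, Y0 for everyone else.
Y_obs = [y1 if d == 1 else y0 for y1, y0, d in zip(Y1, Y0, D)]

# Average causal effect (needs both potential outcomes, so not feasible in practice).
ate = mean(effects)

# Naive comparison of observed conditional means.
mean_treated = mean(y for y, d in zip(Y_obs, D) if d == 1)
mean_untreated = mean(y for y, d in zip(Y_obs, D) if d == 0)
naive_diff = mean_treated - mean_untreated

# Decomposition of the naive difference: ATT + selection bias in Y0.
att = mean(e for e, d in zip(effects, D) if d == 1)
selection_bias = (
    mean(y0 for y0, d in zip(Y0, D) if d == 1)
    - mean(y0 for y0, d in zip(Y0, D) if d == 0)
)

print(f"average causal effect: {ate:.1f}")
print(f"naive difference:      {naive_diff:.1f}")
print(f"ATT + selection bias:  {att + selection_bias:.1f}")
```

The naive difference of about 208 matches the slide. The exact average causal effect is 600/7, roughly 85.7; the slide's 85 is the difference of the two rounded column averages.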
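
Worked sketch: omitted variable bias in a simulation

The omitted variable bias formula can be illustrated with simulated data. The sketch below is an assumption-laden toy example, not the lecture's data: the parameter values (`alpha`, `beta`, `gamma`), the sample size and the adoption rule are all hypothetical and chosen only so that management quality `M` raises both campaign adoption `D` and profits `Y`. It compares the regression without `M`, the value predicted by the formula, and a regression that controls for `M` (via the Frisch-Waugh partialling-out step).

```python
import random

random.seed(0)  # fixed seed so the run is reproducible
n = 20_000

# Hypothetical "true" parameters, chosen only for illustration.
alpha, beta, gamma = 1000.0, 100.0, 50.0


def cov(a, b):
    """Sample covariance of two equally long lists (divides by n)."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)


def slope(x, y):
    """OLS slope from a regression of y on x and a constant."""
    return cov(x, y) / cov(x, x)


# Management quality M is unobserved; better-managed firms adopt more often.
M = [random.gauss(0, 1) for _ in range(n)]
D = [1.0 if m + random.gauss(0, 1) > 0 else 0.0 for m in M]
u = [random.gauss(0, 10) for _ in range(n)]
Y = [alpha + beta * d + gamma * m + e for d, m, e in zip(D, M, u)]

# Short regression: Y on D only, so gamma * M ends up in the error term.
beta_short = slope(D, Y)

# What the omitted variable bias formula predicts for the short regression.
beta_predicted = beta + gamma * cov(D, M) / cov(D, D)

# Long regression: control for M. Frisch-Waugh: remove from D the part
# explained by M, then regress Y on what is left of D.
b_dm = slope(M, D)
D_resid = [d - b_dm * m for d, m in zip(D, M)]
beta_long = slope(D_resid, Y)

print(f"true beta:              {beta:.1f}")
print(f"short regression:       {beta_short:.1f}")
print(f"predicted by formula:   {beta_predicted:.1f}")
print(f"controlling for M:      {beta_long:.1f}")
```

With these assumed values both `gamma` and the covariance between `D` and `M` are positive, so the short regression overstates the effect, while controlling for `M` brings the estimate back close to the true `beta`. In real data `M` is not observed, which is why the sign of the bias has to be argued rather than computed.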