Summary

This document is a lecture presentation on correlation versus causation, focusing on econometrics and labor economics. It covers topics such as differences between correlation and causation, the evaluation problem, and selection bias. Examples of omitted variable bias and reverse causality are included.

Full Transcript

Correlation vs. Causation
Dr. Francesco Maria Esposito
University of Birmingham
LH Labour Economics

Topics covered today
1. Differences between correlation and causation
2. The evaluation problem
3. OLS estimator and selection bias

Learning objectives
By the end of this lecture, you should be able to:
- Understand the difference between correlation and causation
- Recognise the importance of causality
- Argue for or against “causation claims” you have heard in the real world
- Describe the difficulties in estimating causal effects and understand why OLS may not produce causal estimates
- Replicate mathematically the derivation of the selection bias
- Explain what selection bias is and when it arises

Material for this lecture
Basic material:
- Face-to-face slides/lectures (including topics that may arise from questions, discussions, or digressions)
Advanced/optional selected readings:
- Leamer E. (1983) “Let’s take the con out of econometrics”, The American Economic Review, Vol. 73, N. 1
- DiNardo J. and J.S. Pischke (1997) “The returns to computer use revisited: have pencils changed the wage structure too?”, The Quarterly Journal of Economics, Vol. 112, N. 1
- Angrist J. (1998) “Estimating the labor market impact of voluntary military service using social security data on military applicants”, Econometrica, Vol. 66, N. 2
- Angrist J. and J.S. Pischke (2010) “The credibility revolution in empirical economics: how better research design is taking the con out of econometrics”, Journal of Economic Perspectives, Vol. 24, N. 2

Why are we studying econometrics?
- As (applied) labour economists, we want to provide insight into the way the labour market works, often with a view to informing public policy
- But we can only do this if we can identify causal effects
  - Changing the covariate of interest would lead to (rather than simply be associated with) a change in the outcome (on average)
- Let’s now watch the two YouTube clips: Video-clip 1, Video-clip 2

Differences between correlation and causation
1. Under causation, a change in one variable is the reason for (rather than merely associated with) the change in the other
   - Correlation claim: When X goes up, Y also goes up
   - Causation claim: If X goes up, then Y goes up
2. Correlation does not have a direction, causation does
   - Saying a change in X causes a change in Y is very different from saying a change in Y causes a change in X

Other examples
1. One day I heard journalists on the radio saying that research had found that television viewing increases mortality
   - This is a clear causation claim. Do you think that TV (on average) may cause death? Why or why not? What effect did the researchers capture instead? (Omitted variable bias problem)
2. Students who received private tutoring performed worse than their peers
   - This is also a strong causation claim. Do you think that tutoring (on average) reduces students’ performance? Why or why not? What effect did the researchers capture instead? (Reverse causality problem)
3. Laid-off workers suffer from mental distress
   - Does losing a job cause mental distress? Or does suffering from mental distress raise the probability of being laid off? (Reverse causality problem)

Main concerns when estimating causal effects
1. Omitted variable bias
   - There is an unobserved third factor (or factors) that explains the observed relationship
   - In example 1 above, factors such as age and health status were missing. Older and sick people are more likely to die, and they are also the ones who watch more TV (on average)
2. Reverse causality
   - A (causal) relationship exists, but it runs in the opposite direction to the one that is inferred
   - In example 2 above, it is not that tutoring reduces student performance, but that weak students are more likely to have private tutoring
   - In example 3 above, both effects may exist; as labour economists we do not simply want to observe correlation but to estimate causation and its direction

The evaluation problem
- The challenge of estimating causal effects is often referred to as the evaluation problem
- Imagine individuals can be treated or not treated: $D \in \{0, 1\}$
  - E.g., people who watch TV and people who do not, students who have a private tutor and students who do not, laid-off workers and workers who are not laid off, etc.
- Ideally we would like to compare:
  - $y_{1i}$: potential outcome if individual $i$ is treated
  - $y_{0i}$: potential outcome if individual $i$ is not treated
- Unfortunately, we never observe both $y_{1i}$ and $y_{0i}$, i.e., we never observe the same person $i$ both receiving and not receiving the treatment. Either they do or they do not
- We have a missing data problem and hence we need to “fill in” with information from elsewhere

The evaluation problem (cont’d)
- For treated individuals:
  - We observe $y_{1i}$
  - We also want to know $y_{0i}$, i.e., what their outcome would have been had they not been treated
- For non-treated individuals:
  - We observe $y_{0i}$
  - We also want to know $y_{1i}$, i.e., what their outcome would have been had they been treated
- The challenge is to provide a good proxy for the unobserved outcome in each case, which we call the counterfactual

Ordinary least squares (OLS) estimator
- Remember that the treatment variable is a dummy, $D_i \in \{0, 1\}$
- What do we observe for a generic individual $i$?
  $$y_i = D_i y_{1i} + (1 - D_i) y_{0i}$$
  $$y_i = D_i y_{1i} + y_{0i} - D_i y_{0i}$$
  $$y_i = y_{0i} + D_i (y_{1i} - y_{0i})$$
- This is starting to look a lot like an OLS regression:
  $$y_i = \alpha + \delta D_i + \epsilon_i$$
  where
  - $\alpha$ is the expected outcome in the absence of treatment: $E[y_{0i}]$
  - $\delta$ is the difference in outcomes (if treated versus if not), i.e., our coefficient of interest: $(y_{1i} - y_{0i})$, treated here as the same for every individual
  - $\epsilon_i$ is the error: $y_{0i} - E[y_{0i}]$

Selection bias
- For treated individuals:
  $$E[y_i \mid D_i = 1] = \alpha + \delta + E[\epsilon_i \mid D_i = 1]$$
- For untreated individuals:
  $$E[y_i \mid D_i = 0] = \alpha + E[\epsilon_i \mid D_i = 0]$$
- Hence, the difference is:
  $$E[y_i \mid D_i = 1] - E[y_i \mid D_i = 0] = \delta + E[\epsilon_i \mid D_i = 1] - E[\epsilon_i \mid D_i = 0]$$
  where $\delta$ is the treatment effect and the remaining part is the selection bias

Selection bias (cont’d)
- For there to be no selection bias, we would need:
  $$E[\epsilon_i \mid D_i = 1] = E[\epsilon_i \mid D_i = 0]$$
- Or equivalently ($E[\epsilon_i]$ does not depend on $D_i$):
  $$E[D_i \epsilon_i] = 0 \quad \text{or} \quad \operatorname{cov}(D_i, \epsilon_i) = 0$$
- These two conditions coincide because the covariance formula is
  $$\operatorname{cov}(D_i, \epsilon_i) = E[(D_i - E[D_i])(\epsilon_i - E[\epsilon_i])] = E[D_i \epsilon_i] - E[D_i] E[\epsilon_i]$$
  and $E[\epsilon_i] = 0$
- Hence, no selection bias is the same as saying:
  $$E[y_{0i} \mid D_i = 0] = E[y_{0i} \mid D_i = 1]$$
  This follows directly from the equations on the previous slide, since $\epsilon_i = y_{0i} - E[y_{0i}]$
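
The decomposition above (naive difference in means = treatment effect + selection bias) can be checked on simulated data. The sketch below is only an illustration: the constant effect `DELTA`, the distribution of the untreated outcome, and the self-selection rule (people with a high untreated outcome are more likely to take the treatment) are all assumptions made for the example, not part of the lecture.

```python
import random
from statistics import fmean

random.seed(0)

N = 50_000
DELTA = 2.0  # assumed constant treatment effect, y1i - y0i


def simulate(self_selected):
    """Return (naive difference in means, selection bias) for one simulated sample."""
    y_treated, y_untreated = [], []    # observed outcomes by treatment status
    y0_treated, y0_untreated = [], []  # untreated potential outcomes (unobservable in real data)
    for _ in range(N):
        y0 = random.gauss(10.0, 2.0)   # potential outcome without treatment
        y1 = y0 + DELTA                # potential outcome with treatment
        if self_selected:
            # Hypothetical rule: a high y0 makes taking the treatment more likely
            d = 1 if y0 + random.gauss(0.0, 2.0) > 10.0 else 0
        else:
            d = 1 if random.random() < 0.5 else 0  # coin-flip assignment
        y = y0 + d * (y1 - y0)         # observed outcome
        if d == 1:
            y_treated.append(y)
            y0_treated.append(y0)
        else:
            y_untreated.append(y)
            y0_untreated.append(y0)
    naive = fmean(y_treated) - fmean(y_untreated)    # E[y|D=1] - E[y|D=0]
    bias = fmean(y0_treated) - fmean(y0_untreated)   # E[y0|D=1] - E[y0|D=0]
    return naive, bias


naive_sel, bias_sel = simulate(self_selected=True)
naive_rand, bias_rand = simulate(self_selected=False)

print(f"self-selection : naive = {naive_sel:.3f}, delta + bias = {DELTA + bias_sel:.3f}")
print(f"random assign. : naive = {naive_rand:.3f}, delta + bias = {DELTA + bias_rand:.3f}")
```

In the self-selected sample the naive comparison overstates the assumed effect by exactly the difference in untreated outcomes between the two groups; under coin-flip assignment that difference is close to zero. The simulation can compute the bias only because it generates both potential outcomes, which real data never provide.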

Selection bias (cont’d)
- Hence, for OLS to produce causal estimates (no selection bias) we must assume:
  - Either: $E[\epsilon_i \mid D_i = 1] = E[\epsilon_i \mid D_i = 0]$
  - Or equivalently: $E[y_{0i} \mid D_i = 0] = E[y_{0i} \mid D_i = 1]$
- This means that the outcomes of untreated individuals must be a good proxy (an appropriate counterfactual) for what would have happened to the treated individuals in the absence of treatment
- The problem arises because individuals differ and make decisions (endogenously) for reasons that we may not observe. If the treatment is endogenous, i.e., individuals choose whether to be treated or not, those who take the treatment and those who do not are not appropriate counterfactuals for one another

Selection bias (cont’d)
- The assumption may be more likely to hold if we control for observable characteristics of individuals, i.e.
  $$y_i = \alpha + \delta D_i + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_n x_{ni} + \epsilon_i$$
- This is known as the selection on observables or conditional independence assumption:
  $$E[y_{0i} \mid X, D_i = 0] = E[y_{0i} \mid X, D_i = 1]$$

An example: the returns to education
- Suppose the true model is (in the real world other factors matter too):
  $$y = \alpha + \beta S + \gamma A + \epsilon$$
  where $y$ is log wages, $S$ is years of schooling, and $A$ is ability
- If we could (perfectly) observe (and control for) $A$, then OLS would give us an unbiased estimate of $\beta$, the return to an extra year of schooling
- But what if we cannot (perfectly) observe (and control for) ability?
- If we were to estimate the model
  $$y = \alpha + \beta S + u$$
  then we know that $u = \gamma A + \epsilon$
- The OLS estimator will be biased by the omission of ability, since $u$ is correlated with $S$ whenever ability affects wages and is correlated with schooling
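
To see how the omission of ability distorts the schooling coefficient, the example can be simulated. This is a minimal sketch under stated assumptions: the parameter values `ALPHA`, `BETA`, `GAMMA`, the way schooling depends on ability, and the noise levels are all hypothetical. The "long" regression uses the standard closed-form OLS solution for two regressors, and the gap between the two slopes is compared with the term $\gamma \operatorname{cov}(A, S) / \operatorname{var}(S)$, which the next slide derives.

```python
import random
from statistics import fmean

random.seed(1)

N = 50_000
ALPHA, BETA, GAMMA = 1.0, 0.08, 0.10  # hypothetical "true" parameters


def cov(a, b):
    """Sample covariance of two equal-length lists (cov(a, a) is the variance)."""
    mean_a, mean_b = fmean(a), fmean(b)
    return fmean([(x - mean_a) * (z - mean_b) for x, z in zip(a, b)])


# Assumed data-generating process: more able people get more schooling
A = [random.gauss(0.0, 1.0) for _ in range(N)]
S = [12.0 + 1.5 * a + random.gauss(0.0, 2.0) for a in A]
y = [ALPHA + BETA * s + GAMMA * a + random.gauss(0.0, 0.3) for s, a in zip(S, A)]

# Short regression: y on S only (ability is left in the error term u)
beta_short = cov(y, S) / cov(S, S)

# Long regression: y on S and A, closed-form OLS slope on S with two regressors
denominator = cov(S, S) * cov(A, A) - cov(S, A) ** 2
beta_long = (cov(A, A) * cov(y, S) - cov(S, A) * cov(y, A)) / denominator

# Omitted-variable term: gamma * cov(A, S) / var(S)
bias_formula = GAMMA * cov(A, S) / cov(S, S)

print(f"true beta            : {BETA:.4f}")
print(f"short regression     : {beta_short:.4f}")
print(f"long regression      : {beta_long:.4f}")
print(f"beta + formula bias  : {BETA + bias_formula:.4f}")
```

With these assumed values ability raises both schooling and wages, so the short regression overstates the return to schooling, while controlling for ability brings the estimate back to roughly the assumed `BETA`.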

Selection bias in the example
- Assume ability is the only unobserved factor correlated with education, i.e., $\operatorname{cov}(S, \epsilon) = 0$
  $$\hat{\beta} = \frac{\operatorname{cov}(y, S)}{\operatorname{var}(S)} = \frac{\operatorname{cov}[\alpha + \beta S + \gamma A + \epsilon, S]}{\operatorname{var}(S)}$$
  $$\hat{\beta} = \frac{\operatorname{cov}(\alpha, S)}{\operatorname{var}(S)} + \beta \frac{\operatorname{cov}(S, S)}{\operatorname{var}(S)} + \gamma \frac{\operatorname{cov}(A, S)}{\operatorname{var}(S)} + \frac{\operatorname{cov}(\epsilon, S)}{\operatorname{var}(S)}$$
  $$\hat{\beta} = \beta + \gamma \frac{\operatorname{cov}(A, S)}{\operatorname{var}(S)}$$
- Or more generally, since $u = \gamma A + \epsilon$:
  $$\hat{\beta} = \beta + \frac{\operatorname{cov}(u, S)}{\operatorname{var}(S)}$$
  where the second term is exactly the selection bias

Estimating the causal effects
- OLS is more likely to produce causal estimates if we have access to rich data (i.e., when we have more $X$)
- But there may still be something we cannot control for (unobservables), so that treated individuals differ from untreated ones, i.e., there is no good counterfactual
  - Participants may select into the programme (or be selected in) on the basis of unobservable characteristics
  - Or they may select in (or be selected in) on the basis of their (unobservable) expected gains from treatment

Understanding the direction of the bias
1. Scenario 1: All employees in a firm are offered the chance to participate in a voluntary job training programme. We are interested in estimating the impact of the programme on wages
   - More motivated employees are likely to apply, which leads to an upward bias
2. Scenario 2: Students with very low grades are forced to attend extra classes. We are interested in estimating the impact of these classes on grades
   - Students with low grades are likely to have lower ability, which leads to a downward bias
- Usually, when estimates are biased downward, the results are still interesting because researchers can (at least) estimate a lower bound of the effects of the programme
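
The two scenarios can be mimicked with simulated data to see the sign of the bias. The sketch below is an illustration only: the effect sizes `WAGE_EFFECT` and `GRADE_EFFECT`, the participation rules, and every coefficient are hypothetical choices, and "motivation" and "ability" are treated as unobserved by the researcher.

```python
import random
from statistics import fmean

random.seed(2)

N = 50_000
WAGE_EFFECT = 1.0   # hypothetical true effect of training on wages
GRADE_EFFECT = 3.0  # hypothetical true effect of extra classes on grades


def naive_difference(outcomes, treated):
    """Difference in mean outcomes between treated and untreated individuals."""
    treated_mean = fmean([y for y, d in zip(outcomes, treated) if d == 1])
    untreated_mean = fmean([y for y, d in zip(outcomes, treated) if d == 0])
    return treated_mean - untreated_mean


# Scenario 1: voluntary training, more motivated employees are more likely to apply
motivation = [random.gauss(0.0, 1.0) for _ in range(N)]
trained = [1 if m + random.gauss(0.0, 1.0) > 0.5 else 0 for m in motivation]
wages = [20.0 + 2.0 * m + WAGE_EFFECT * d + random.gauss(0.0, 1.0)
         for m, d in zip(motivation, trained)]
naive_training = naive_difference(wages, trained)

# Scenario 2: extra classes are compulsory for students with very low prior grades
ability = [random.gauss(0.0, 1.0) for _ in range(N)]
prior_grade = [a + random.gauss(0.0, 0.5) for a in ability]
extra_classes = [1 if p < -1.0 else 0 for p in prior_grade]
grades = [60.0 + 8.0 * a + GRADE_EFFECT * d + random.gauss(0.0, 4.0)
          for a, d in zip(ability, extra_classes)]
naive_classes = naive_difference(grades, extra_classes)

print(f"training: true effect = {WAGE_EFFECT}, naive estimate = {naive_training:.2f}")
print(f"classes : true effect = {GRADE_EFFECT}, naive estimate = {naive_classes:.2f}")
```

Under these assumptions the naive comparison overstates the training effect, because participants would have earned more anyway, and understates the effect of the extra classes, because the students assigned to them would have scored lower anyway.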

To conclude
- OLS provides unbiased estimates if and only if there is only selection on observables (that we can control for)
- But what can we do to estimate causal effects if we are worried about selection on unobservables or unobserved heterogeneity?
