Causal Analysis Instrumental Variables Part 1 PDF

Document Details

Uploaded by AppreciatedUranium

University of Bern

2024

Michael Gerfin

Tags

instrumental variables causal analysis econometrics statistics

Summary

This document is a lecture or presentation on causal analysis using instrumental variables. It provides an overview of the topic and discusses applications and examples in the field of economics. The document is for an audience at a postgraduate level at the University of Bern, Switzerland during Spring 2024.

Full Transcript

Causal Analysis
Instrumental Variables: Part 1

Michael Gerfin
University of Bern
Spring 2024

Contents

1. Introduction
2. IV and 2SLS
3. Applications
4. IV Details

1. Introduction

The IV story is told in two iterations: first with constant effects, then in a framework with heterogeneous potential outcomes (causal effects are then heterogeneous as well).
The initial focus on constant effects serves to explain the mechanics of IV (how it works and when it fails).
But first: why do IV? The answer is that sometimes the regression we've got is not the regression we want, because of bias (selection, omitted variables, ...).

IV and agnostic regression

Coming back to the previous example, the causal link between college and wages is written as

$$Y_i = \alpha + \tau D_i + \eta_i$$

Imagine a vector of controls ("ability"), part of which is observed ($X_i$) and part of which is unobserved ($U_i$).
The regression we want, perhaps causal, can be written

$$Y_i = \alpha + \tau D_i + X_i'\gamma + U_i$$

If $D_i$ and $U_i$ are correlated, then controlling for $X_i$ is not sufficient for identifying $\tau$: the regression we get is not the regression we want.
The error term $U_i$ is the random part of potential outcomes left after controlling for $X_i$.

CIA does not identify ATE

Here, the causal effect is not identified by assuming the CIA (conditioning on $X$). We cannot close the back-door path $D \leftarrow U \rightarrow Y$ (sometimes called "selection on unobservables").
Remember the identification logic in a DAG: manipulate the treatment (switch the dummy on or off) and observe what happens to $Y$. The causal effect is identified if nothing else changes on a non-causal path that would confound the change in $Y$.

2. IV and 2SLS

IV and OVB

Let's ignore observable elements of ability for the moment.
Write the potential outcome as $Y_{0i} = \alpha + U_i$.
Assuming a constant effect $\tau$, we have $Y_{1i} - Y_{0i} = \tau$.
A variable $Z_i$ with the following properties allows us to identify $\tau$ even if $D$ is correlated with $U$:

1. It is correlated with the causal variable of interest ($D_i$).
2. It is uncorrelated with the omitted variables we need to control for (independence): $\mathrm{Cov}(U, Z) = 0$.
3. It is not part of the causal model of interest (exclusion restriction).

A variable with these properties is called an instrumental variable (IV).
In words, the exclusion restriction states that the only reason $Y$ is correlated with $Z$ is through $D$.

IV in a DAG

Assumption 1 is shown by the arrow from $Z$ to $D$.
Assumption 2 is shown by the absence of an arrow between $Z$ and $U$.
Assumption 3 is shown by the absence of an arrow between $Z$ and $Y$.

IV mechanics

Rewrite the independence assumption as follows, using $U = Y - \alpha - \tau D$ (the constant $\alpha$ drops out of the covariance):

$$\mathrm{Cov}(U, Z) = 0$$
$$\mathrm{Cov}(Y - \tau D, Z) = 0$$
$$\mathrm{Cov}(Y, Z) - \tau\,\mathrm{Cov}(D, Z) = 0$$
$$\tau = \frac{\mathrm{Cov}(Y, Z)}{\mathrm{Cov}(D, Z)} = \frac{\mathrm{Cov}(Y, Z)/\mathrm{Var}(Z)}{\mathrm{Cov}(D, Z)/\mathrm{Var}(Z)} = \frac{\text{"RF" (regression of } Y \text{ on } Z)}{\text{"1st" (regression of } D \text{ on } Z)}$$

The IV estimator is the sample analog of this expression.

IV in a DAG

There is a causal chain from $Z$ through $D$ to $Y$.
What's going on here? In the IV framework, it is $Z$, not $D$, that is manipulated. When we change $Z$, we record the changes in $D$ (first stage, $\pi_1$) and in $Y$ (reduced form, $\pi_1 \times \tau$). The ratio of these changes is the causal effect.
There is no open back-door path $Z \rightarrow D \leftarrow U \rightarrow Y$ because $D$ is a collider on this path and is not controlled for.

The Wald estimator

Without covariates, the causal constant-effects model is

$$Y_i = \alpha + \tau D_i + U_i$$

where $U_i$ and $D_i$ may be correlated.
When $Z_i$ is a dummy variable that equals 1 with probability $p$, we can show that

$$\mathrm{Cov}(Y_i, Z_i) = \{E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0]\}\,p(1-p)$$

with an analogous formula for $\mathrm{Cov}(D_i, Z_i)$.
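The covariance formula for a binary instrument can be verified in a few lines. The following is a sketch of the intermediate steps, which the slides do not spell out; it uses only $E[Z_i] = p$ and the law of iterated expectations.

```latex
\begin{align*}
\mathrm{Cov}(Y_i, Z_i)
  &= E[Y_i Z_i] - E[Y_i]\,E[Z_i] \\
  &= p\,E[Y_i \mid Z_i = 1]
     - p\,\bigl\{ p\,E[Y_i \mid Z_i = 1] + (1-p)\,E[Y_i \mid Z_i = 0] \bigr\} \\
  &= p(1-p)\,E[Y_i \mid Z_i = 1] - p(1-p)\,E[Y_i \mid Z_i = 0] \\
  &= \bigl\{ E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0] \bigr\}\, p(1-p)
\end{align*}
```

Replacing $Y_i$ by $D_i$ gives the analogous expression for $\mathrm{Cov}(D_i, Z_i)$, so the factor $p(1-p)$ cancels in the ratio of the two covariances.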
It follows that

$$\frac{\mathrm{Cov}(Y_i, Z_i)}{\mathrm{Cov}(D_i, Z_i)} = \frac{E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0]}{E[D_i \mid Z_i = 1] - E[D_i \mid Z_i = 0]}$$

Two-stage least squares (2SLS)

In practice, we do IV by doing 2SLS, which allows us to add exogenous covariates and to use multiple instruments.
In our example, a causal model without covariates is

$$Y_i = \alpha + \tau D_i + U_i$$

Where does two-stage least squares come from?
Write the first stage as the sum of fitted values plus first-stage residuals:

$$D_i = \hat\pi_0 + \hat\pi_1 Z_i + \hat\xi_i = \hat D_i + \hat\xi_i$$

Substitute $\hat D_i + \hat\xi_i$ for $D_i$ in the causal model:

$$Y_i = \alpha + \tau \hat D_i + [U_i + \tau \hat\xi_i]$$

and use OLS to estimate this "second stage".

Two-stage least squares (2SLS)

First stage:

$$D_i = \pi_0 + \pi_1 Z_i + \xi_i$$
$$\hat D_i = \hat\pi_0 + \hat\pi_1 Z_i$$
$$\hat\pi_1 = \mathrm{Cov}(D, Z)/\mathrm{Var}(Z)$$

Second stage:

$$Y_i = \alpha + \tau \hat D_i + [U_i + \tau \hat\xi_i] = \alpha + \tau \hat D_i + e_i = \alpha + \tau(\hat\pi_0 + \hat\pi_1 Z_i) + e_i = (\alpha + \tau\hat\pi_0) + \tau(\hat\pi_1 Z_i) + e_i$$

The OLS slope of this second-stage regression is

$$\tau = \frac{\mathrm{Cov}(Y, \hat\pi_1 Z)}{\mathrm{Var}(\hat\pi_1 Z)} = \frac{\hat\pi_1 \cdot \mathrm{Cov}(Y, Z)}{\hat\pi_1^2 \cdot \mathrm{Var}(Z)} = \frac{1}{\hat\pi_1}\,\frac{\mathrm{Cov}(Y, Z)}{\mathrm{Var}(Z)} = \frac{\mathrm{Var}(Z)}{\mathrm{Cov}(D, Z)}\,\frac{\mathrm{Cov}(Y, Z)}{\mathrm{Var}(Z)} = \frac{\mathrm{Cov}(Y, Z)}{\mathrm{Cov}(D, Z)}$$

Two-stage least squares (2SLS)

Intuitively, what does 2SLS do? It decomposes $D$ into two parts:

1. the part that is uncorrelated with $U$, namely $\hat D$, which is a function of $Z$ and $X$, both assumed to be uncorrelated with $U$
2. the part that is correlated with $U$, namely $\hat\xi_i$

For identification in the second stage, only the uncorrelated part $\hat D$ is used.
In other words, 2SLS removes the correlation between the potential outcomes and the treatment.
The precise connection between IV and potential outcomes will be discussed in the second part of the IV section.

Two-stage least squares (2SLS)

Where does a good instrument come from? Ideally, it is as good as randomly assigned:

Random assignment, e.g. lotteries, with partial compliance
Random events like date of birth, combined with legislation referring to age
Gender of children

If the instrument is non-random we need a further assumption: conditional on $X$, potential outcomes are independent of $Z$.

2SLS with random instrument and partial compliance

Despite random assignment, some agents choose not to get treated. This decision is a function of observables $X$ and unobservables, shown by the dashed arrow.
In this case we do not need to adjust for $X$ because the back-door path $D \leftarrow X \rightarrow Y$ is closed anyway ($D$ is a collider on the path $Z \rightarrow D \leftarrow X \rightarrow Y$). The same argument holds for $Z \rightarrow D \leftrightarrow Y$.
A reason to adjust for $X$ is to reduce residual variance in $Y$, making estimates more precise.

2SLS with non-random instrument

Here the instrument is a function of $X$. In this case we need to adjust for $X$ because the back-door path $Z \leftarrow X \rightarrow Y$ is open. Adjusting for $X$ allows us to identify the causal effect.

3. Applications

Overview of applications

Angrist (1990), "Lifetime Earnings and the Vietnam Era Draft Lottery: Evidence from Social Security Administrative Records", AER (in part 1)
Angrist and Krueger (1991), "Does Compulsory School Attendance Affect Schooling and Earnings?", QJE (in part 2)
Angrist and Evans (1998), "Children and Their Parents' Labor Supply: Evidence from Exogenous Variation in Family Size", AER (in part 2)
Abadie, Angrist, and Imbens (2002), "Instrumental Variables Estimates of the Effect of Subsidized Training on the Quantiles of Trainee Earnings", Econometrica (simplified example using the data in part 1 and 2)
Abadie (2003), "Semiparametric Instrumental Variable Estimation of Treatment Response Models", Journal of Econometrics (simplified example using the data in part 1)

Effect of military service on earnings (Angrist 1990)

How were men who served in the Vietnam era affected by their service?
Key variables:

$Z_i$ = randomly assigned draft eligibility in the 1970-72 draft lotteries
$D_i$ = a dummy indicating Vietnam-era veterans

The causal effect of Vietnam-era military service is the difference in average earnings by draft-eligibility status (the draft-eligibility reduced form) divided by the difference in the probability of service by draft-eligibility status (the draft-eligibility first stage).

Military service and earnings (Angrist 1990)

Table 4.1.3: IV Estimates of the Effects of Military Service on the Earnings of White Men Born in 1950

  Earnings   Earnings:   Earnings:            Veteran status:   Veteran status:      Wald estimate of
  year       mean        eligibility effect   mean              eligibility effect   veteran effect
             (1)         (2)                  (3)               (4)                  (5)
  1981       16,461      -435.8 (210.5)       .267              .159 (.040)          -2,741 (1,324)
  1971       3,338       -325.9 (46.6)                                               -2,050 (293)
  1969       2,299       -2.0 (34.5)

Note: Adapted from Table 5 in Angrist and Krueger (1999) and author tabulations. Standard errors are shown in parentheses. Earnings data are from Social Security administrative records.
Figures are in nominal dollars. Veteran status data are from the Survey of Program Participation. There are about 13,500 individuals in the sample. (MIT 14.387, Spring 2010)

Fertility and labor supply (Angrist & Evans 1998)

Effect of children on the labor supply of mothers.
The decision to have children is endogenous and probably correlated with preferences regarding work.
Angrist & Evans (1998) analyze the effect of having a third child on mothers' labor supply.
$D = 1$ if the mother has three children, $D = 0$ if she has two children.
As possible instruments $Z$ they use:

$Z = 1$ if the second birth was a twin birth
$Z = 1$ if the first two children have the same sex

In both cases A&E argue that the instruments are as good as randomly assigned.

Randomized trial with partial compliance

Analyze the effect of training provided under the Job Training Partnership Act (JTPA), a large publicly funded training program.
Individuals in the randomly assigned JTPA treatment group were offered training, while those in the control group were excluded for a period of 18 months.
Only 60 percent of the treatment group actually received training, but the randomized treatment assignment provides an instrument for treatment status.
We use the data analyzed by Abadie, Angrist and Imbens (2002).
y is 30-month earnings, d is participation in training, z is randomly assigned treatment eligibility.

Descriptive Statistics I (slide content not included in the transcript)

JTPA in a DAG (slide content not included in the transcript)

Descriptive Statistics II (slide content not included in the transcript)

Wald estimator

           mean of d   mean of y
  z = 0    .01         18'404
  z = 1    .62         19'520
  d = 0                17'485
  d = 1                21'455

(a) $\bar y|_{z=1} - \bar y|_{z=0} = 19'520 - 18'404 = 1'116$ $(= \pi_1 \times \tau)$
(b) $\bar d|_{z=1} - \bar d|_{z=0} = .62 - .01 = .61$ $(= \pi_1)$
$\tau = 1'116/.61 = 1'830$
(c) $\bar y|_{d=1} - \bar y|_{d=0} = 21'455 - 17'485 = 3'970$ (biased)

JTPA analysis: OLS

. reg y d, robust

  Linear regression
  Number of obs = 5,102   F(1, 5100) = 51.15   Prob > F = 0.0000   R-squared = 0.0100   Root MSE = 19444

  y       Coef.      Robust Std. Err.   t       P>|t|   [95% Conf. Interval]
  d       3970.212   555.1287           7.15    0.000   2881.921    5058.502
  _cons   17485.29   351.3664           49.76   0.000   16796.46    18174.12

. reg y d hsorged hispanic black married wkless13 age2225 age2629 age3035 age3644 age4554 ///
> class_tr ojt_jsa f2sms, robust

  Linear regression
  Number of obs = 5,102   F(14, 5087) = 38.35   Prob > F = 0.0000   R-squared = 0.0909   Root MSE = 18657

  y          Coef.       Robust Std. Err.   t        P>|t|   [95% Conf. Interval]
  d          3753.648    536.3313           7.00     0.000   2702.208    4805.089
  hsorged    4015.431    570.7427           7.04     0.000   2896.529    5134.332
  hispanic   250.9286    883.456            0.28     0.776   -1481.025   1982.883
  black      -2354.207   625.851            -3.76    0.000   -3581.144   -1127.269
  married    6545.679    628.5225           10.41    0.000   5313.505    7777.854
  wkless13   -6581.672   565.9342           -11.63   0.000   -7691.146   -5472.197
  age2225    5945.867    1419.189           4.19     0.000   3163.645    8728.088
  age2629    7205.01     1456.585           4.95     0.000   4349.477    10060.54
  age3035    5736.843    1432.993           4.00     0.000   2927.561    8546.126
  age3644    4603.918    1442.486           3.19     0.001   1776.025    7431.81
  age4554    2125.385    1544.613           1.38     0.169   -902.7205   5153.491
  class_tr   -1644.469   752.8068           -2.18    0.029   -3120.294   -168.6433
  ojt_jsa    454.4841    621.3878           0.73     0.465   -763.7035   1672.672
  f2sms      2101.602    550.6325           3.82     0.000   1022.125    3181.079
  _cons      9810.577    1541.213           6.37     0.000   6789.137    12832.02

JTPA analysis: 2SLS without covariates

. ivregress 2sls y (d=z), first robust

  First-stage regressions
  Number of obs = 5,102   F(1, 5100) = 4947.68   Prob > F = 0.0000   R-squared = 0.3418   Adj R-squared = 0.3417   Root MSE = 0.4003

  d       Coef.      Robust Std. Err.   t       P>|t|   [95% Conf. Interval]
  z       .6116735   .008696            70.34   0.000   .5946256    .6287213
  _cons   .0111568   .0025457           4.38    0.000   .0061661    .0161475

  Instrumental variables (2SLS) regression
  Number of obs = 5,102   Wald chi2(1) = 3.87   Prob > chi2 = 0.0492   R-squared = 0.0071   Root MSE = 19469

  y       Coef.      Robust Std. Err.   z       P>|z|   [95% Conf. Interval]
  d       1825.459   927.9389           1.97    0.049   6.732185    3644.186
  _cons   18383.21   462.9362           39.71   0.000   17475.87    19290.55

  Instrumented: d
  Instruments: z

JTPA analysis: 2SLS with covariates

. ivregress 2sls y (d=z) hsorged hispanic black married wkless13 age2225 age2629 age3035 age3644 age4554 ///
> class_tr ojt_jsa f2sms, robust

  Instrumental variables (2SLS) regression
  Number of obs = 5,102   Wald chi2(14) = 482.79   Prob > chi2 = 0.0000   R-squared = 0.0880   Root MSE = 18659

  y          Coef.       Robust Std. Err.   z        P>|z|   [95% Conf. Interval]
  d          1592.937    893.2956           1.78     0.075   -157.8901   3343.764
  hsorged    4075.108    572.4642           7.12     0.000   2953.099    5197.118
  hispanic   335.1946    886.2038           0.38     0.705   -1401.733   2072.122
  black      -2349.052   624.4687           -3.76    0.000   -3572.988   -1125.116
  married    6647.189    626.4701           10.61    0.000   5419.33     7875.048
  wkless13   -6574.585   566.407            -11.61   0.000   -7684.722   -5464.448
  age2225    5941.196    1417.888           4.19     0.000   3162.186    8720.206
  age2629    7113.773    1454.796           4.89     0.000   4262.425    9965.121
  age3035    5685.077    1431.701           3.97     0.000   2878.995    8491.159
  age3644    4510.579    1441.645           3.13     0.002   1685.007    7336.151
  age4554    2004.085    1543.366           1.30     0.194   -1020.858   5029.027
  class_tr   -1351.742   754.8366           -1.79    0.073   -2831.195   127.7104
  ojt_jsa    425.173     621.6454           0.68     0.494   -793.2296   1643.576
  f2sms      2101.86     551.0114           3.81     0.000   1021.897    3181.822
  _cons      10641.25    1567.037           6.79     0.000   7569.913    13712.58

  Instrumented: d
  Instruments: hsorged hispanic black married wkless13 age2225 age2629 age3035 age3644 age4554 class_tr ojt_jsa f2sms z

Effect of 401(k) plans on savings

401(k) plans are pension savings plans in which employers match the worker's contribution at a given percentage.
The data set used is called 401KSUBS.
. describe nettfa p401k e401k inc age agesq marr fsize

  variable name   storage type   display format   variable label
  nettfa          float          %9.0g            net total fin. assets, $1000
  p401k           byte           %9.0g            =1 if participate in 401(k)
  e401k           byte           %9.0g            =1 if eligible for 401(k)
  inc             float          %9.0g            annual income, $1000s
  age             byte           %9.0g            age^2
  agesq           int            %9.0g            age^2
  marr            byte           %9.0g            =1 if married
  fsize           byte           %9.0g            family size

Descriptive Statistics

. sum nettfa e401k inc age agesq marr fsize if p401k

  Variable   Obs    Mean       Std. Dev.   Min        Max
  nettfa     2562   38.47296   79.27108    -283.356   1536.798
  e401k      2562   1          0           1          1
  inc        2562   49.81514   26.81424    10.14      192.99
  age        2562   41.51327   9.651726    25         64
  agesq      2562   1816.471   838.3487    625        4096
  marr       2562   .6955504   .4602638    0          1
  fsize      2562   2.920375   1.468098    1          13

. sum nettfa e401k inc age agesq marr fsize if p401k==0

  Variable   Obs    Mean       Std. Dev.   Min        Max
  nettfa     6713   11.66722   55.28923    -502.302   1462.115
  e401k      6713   .160137    .3667604    0          1
  inc        6713   35.22425   21.64917    10.008     199.041
  age        6713   40.91494   10.53225    25         64
  agesq      6713   1784.944   916.4837    625        4096
  marr       6713   .6030091   .4893105    0          1
  fsize      6713   2.871592   1.547197    1          13

OLS

. reg nettfa p401k

  nettfa   Coef.      Std. Err.   t       P>|t|   [95% Conf. Interval]
  p401k    26.80574   1.459166    18.37   0.000   23.94546    29.66603
  _cons    11.66722   .7668973    15.21   0.000   10.16393    13.17051

. reg nettfa p401k inc age agesq marr fsize

  nettfa   Coef.       Std. Err.   t       P>|t|   [95% Conf. Interval]
  p401k    13.52705    1.394046    9.70    0.000   10.79441    16.25968
  inc      .9769312    .0282551    34.58   0.000   .921545     1.032317
  age      -2.311125   .5017084    -4.61   0.000   -3.294583   -1.327666
  agesq    .0386992    .005774     6.70    0.000   .0273809    .0500175
  marr     -8.369471   1.639216    -5.11   0.000   -11.58269   -5.156248
  fsize    -.7856499   .4970725    -1.58   0.114   -1.760021   .1887215
  _cons    10.04212    10.12265    0.99    0.321   -9.800509   29.88474

2SLS

. ivregress 2sls nettfa (p401k=e401k)

  Instrumental variables (2SLS) regression          Number of obs = 9275

  nettfa   Coef.      Std. Err.   z       P>|z|   [95% Conf. Interval]
  p401k    26.77116   1.896862    14.11   0.000   23.05338    30.48894
  _cons    11.67677   .8367317    13.96   0.000   10.03681    13.31674

  Instrumented: p401k
  Instruments: e401k

. ivregress 2sls nettfa (p401k=e401k) inc age agesq marr fsize

  Instrumental variables (2SLS) regression          Number of obs = 9275

  nettfa   Coef.       Std. Err.   z       P>|z|   [95% Conf. Interval]
  p401k    9.418828    1.85786     5.07    0.000   5.777489    13.06017
  inc      .99719      .0288992    34.51   0.000   .9405485    1.053831
  age      -2.238551   .5022227    -4.46   0.000   -3.222889   -1.254213
  agesq    .0378519    .0057801    6.55    0.000   .0265232    .0491806
  marr     -8.355871   1.639369    -5.10   0.000   -11.56898   -5.142766
  fsize    -.8189625   .4972173    -1.65   0.100   -1.793491   .1555656
  _cons    9.007582    10.12829    0.89    0.374   -10.84351   28.85867

  Instrumented: p401k
  Instruments: inc age agesq marr fsize e401k

First stage

. reg p401k e401k inc age agesq marr fsize

  Source     SS           df     MS
  Model      1105.7232    6      184.287199
  Residual   748.584728   9268   .080770903
  Total      1854.30792   9274   .19994694

  Number of obs = 9275   F(6, 9268) = 2281.60   Prob > F = 0.0000   R-squared = 0.5963   Adj R-squared = 0.5960   Root MSE = .2842

  p401k    Coef.       Std. Err.   t        P>|t|   [95% Conf. Interval]
  e401k    .6883064    .0062974    109.30   0.000   .6759621    .7006507
  inc      .001334     .0001389    9.60     0.000   .0010616    .0016063
  age      -.0048197   .0024765    -1.95    0.052   -.0096741   .0000348
  agesq    .0000532    .0000285    1.87     0.062   -2.68e-06   .0001091
  marr     -.0004663   .0080732    -0.06    0.954   -.0162916   .0153589
  fsize    .0000588    .0024486    0.02     0.981   -.004741    .0048586
  _cons    .0566798    .0499041    1.14     0.256   -.0411432   .1545027

Instrument

. reg e401k inc age agesq marr fsize

  Source     SS           df     MS
  Model      174.111763   5      34.8223526
  Residual   2036.71368   9269   .219733918
  Total      2210.82544   9274   .238389632

  Number of obs = 9275   F(5, 9269) = 158.48   Prob > F = 0.0000   R-squared = 0.0788   Adj R-squared = 0.0783   Root MSE = .46876

  e401k    Coef.       Std. Err.   t       P>|t|   [95% Conf. Interval]
  inc      .0052263    .0002226    23.48   0.000   .0047899    .0056627
  age      .0326673    .0040706    8.03    0.000   .0246881    .0406466
  agesq    -.0003769   .0000468    -8.05   0.000   -.0004687   -.0002851
  marr     .005487     .0133157    0.41    0.680   -.0206146   .0315886
  fsize    -.0118661   .0040368    -2.94   0.003   -.0197791   -.0039531
  _cons    -.4482027   .0821791    -5.45   0.000   -.6092918   -.2871137

Eligibility is clearly related to income, age, and family size, so the instrument is not as good as randomly assigned. We may argue that it is as good as randomly assigned conditional on X (a CIA assumption with respect to the instrument, not the treatment).

4. IV Details

2SLS Mistakes: Manual 2SLS

Manual 2SLS means:

estimate the first stage yourself and generate fitted values of the endogenous variable
plug the fitted values into the second-stage equation and run OLS

The OLS standard errors from the manual second stage will not be correct, because the fact that the fitted values are estimated is not taken into account.
There are other risks as well.

IV bias

As stated (but not proven), IV is biased.
Intuitively, the bias is a consequence of the fact that the first stage is estimated.
Just-identified 2SLS (e.g. the Wald estimator) is approximately unbiased.
The just-identified sampling distribution has no moments, but just-identified 2SLS is approximately centered where it should be unless the instruments are really weak.
The reduced form is unbiased: if you can't see the relationship you're after in the reduced form, it ain't there.
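The equivalences used throughout these slides (ratio of covariances = Wald estimator = manual 2SLS point estimate) and the warning about manual second-stage standard errors can be checked on simulated data. The slides themselves use Stata; what follows is only a Python sketch using the standard library. The data-generating process, the parameter values, and the helper names (`simulate`, `ols`, `cov`, `mean_where`) are illustrative assumptions and are not taken from the lecture.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    """Covariance dividing by n; the scaling cancels in all ratios below."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def ols(xs, ys):
    """Intercept and slope of a simple regression of ys on xs."""
    slope = cov(xs, ys) / cov(xs, xs)
    return mean(ys) - slope * mean(xs), slope


def mean_where(vals, z, flag):
    """Mean of vals among observations with z == flag."""
    return mean([v for v, zi in zip(vals, z) if zi == flag])


def simulate(n=20000, tau=2.0, seed=1):
    """Hypothetical constant-effects model Y = 1 + tau*D + U, with D correlated with U."""
    rng = random.Random(seed)
    z, d, y = [], [], []
    for _ in range(n):
        zi = 1 if rng.random() < 0.5 else 0   # randomly assigned binary instrument
        ui = rng.gauss(0, 1)                  # unobserved confounder
        di = 1 if 0.8 * zi + 0.7 * ui + rng.gauss(0, 1) > 0.4 else 0
        z.append(zi)
        d.append(di)
        y.append(1.0 + tau * di + ui)
    return z, d, y


z, d, y = simulate()

# OLS of Y on D: biased because D and U are correlated
_, tau_ols = ols(d, y)

# IV as the ratio of reduced form to first stage
tau_iv = cov(y, z) / cov(d, z)

# Wald estimator: ratio of differences in means by instrument status
wald = (mean_where(y, z, 1) - mean_where(y, z, 0)) / (
    mean_where(d, z, 1) - mean_where(d, z, 0)
)

# Manual 2SLS: regress D on Z, then Y on the fitted values
pi0_hat, pi1_hat = ols(z, d)
d_hat = [pi0_hat + pi1_hat * zi for zi in z]
alpha_hat, tau_2sls = ols(d_hat, y)

# Residual variance used by the manual second stage (built from D_hat)
res_manual = [yi - alpha_hat - tau_2sls * dh for yi, dh in zip(y, d_hat)]
# Residual variance a proper 2SLS routine uses (built from the actual D)
res_correct = [yi - alpha_hat - tau_2sls * di for yi, di in zip(y, d)]
var_manual = cov(res_manual, res_manual)
var_correct = cov(res_correct, res_correct)

print("OLS:", tau_ols)
print("IV (covariance ratio):", tau_iv)
print("Wald:", wald)
print("Manual 2SLS:", tau_2sls)
print("Residual variance, manual vs. correct:", var_manual, var_correct)
```

The three IV-type estimates coincide up to floating-point error, while OLS is pulled away from the true effect by the correlation between D and U. The manual second-stage residual is an estimate of $U_i + \tau\hat\xi_i$ rather than of $U_i$, so its variance differs from the one a 2SLS routine uses (in this simulated design it is larger). This is why the point estimate from a hand-run second stage is fine but its OLS standard errors should not be reported.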
