Causality and Experiments Lecture PDF

Summary

This document is a lecture on causality and experiments, focusing on methods for analyzing experimental data. It discusses internal and external validity and explores practical examples from economics. The lecture also covers concepts such as attrition and general equilibrium effects.

Full Transcript

Causality and Experiments 2
Magnus Carlsson

Analyzing data from experiments: the difference-in-means estimator

With randomization, we saw that

ATE = ATET = E[Y_{1i} | D_i = 1] - E[Y_{0i} | D_i = 0]

We can estimate this by simply comparing sample means:

\widehat{ATE} = \widehat{ATET} = \frac{\sum_{i=1}^{n} D_i Y_i}{\sum_{i=1}^{n} D_i} - \frac{\sum_{i=1}^{n} (1 - D_i) Y_i}{\sum_{i=1}^{n} (1 - D_i)}

where Y_i = (1 - D_i) Y_{0i} + D_i Y_{1i}, which is always observed.

Regression and randomization

Alternatively, one could specify the regression model:

Y_i = \beta_0 + \beta_1 D_i + U_i   (1)

Randomization ensures that D_i is distributed independently of the unobserved factors in U_i. The intuition should be clear: since it is random whether one gets treated, treatment status cannot be systematically related to any other unobserved factors. This also means that the zero conditional mean assumption (recall...?) is automatically fulfilled! More on this later!

Internal and external validity of experiments

When evaluating the findings from any experiment (and also from any non-experimental analysis), it is useful to make a distinction between internal and external validity.

Internal validity: does the experiment provide an estimate of the causal effect in the population of interest?

External validity: the extent to which the estimated causal effect can be generalized to other populations, economic settings, or related treatments.

Often, there exists a trade-off between internal and external validity.

Threats to internal validity: partial compliance with treatment

Sometimes in an experiment only a fraction of the individuals who are offered the treatment take it up. Moreover, some members of the comparison group may receive the treatment. The failure of subjects in the treatment and control groups to follow the randomized treatment protocol is called partial compliance. With partial compliance, the treatment group may no longer be a random sample from the larger treatment population; instead, the treatment group has an element of self-selection.
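The equivalence between the difference-in-means estimator and the OLS slope \beta_1 in regression (1) can be sketched with simulated data (all numbers below are hypothetical, chosen only for illustration):

```python
import numpy as np

# A minimal sketch: with a randomized binary treatment D, the
# difference-in-means estimator equals the OLS slope beta_1 from
# the regression Y = beta_0 + beta_1 * D + U.
rng = np.random.default_rng(0)

n = 10_000
d = rng.integers(0, 2, size=n)           # randomized treatment indicator D_i
y0 = rng.normal(0.0, 1.0, size=n)        # potential outcome without treatment
y1 = y0 + 2.0                            # potential outcome with treatment (true ATE = 2)
y = (1 - d) * y0 + d * y1                # observed outcome Y_i

# Difference-in-means estimator
ate_hat = y[d == 1].mean() - y[d == 0].mean()

# OLS slope from regressing Y on D (closed form: cov(D, Y) / var(D))
beta1_hat = np.cov(d, y)[0, 1] / np.var(d, ddof=1)

print(ate_hat, beta1_hat)  # the two estimates coincide, both near the true ATE of 2
```

With a binary regressor the two estimators are algebraically identical, which is why estimating (1) by OLS is a standard way to analyze experimental data.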
Threats to internal validity: partial compliance with treatment

Note, however, that some experiments do not intend to treat everybody in the treatment group. Such experiments are called encouragement designs. Here, the offer of treatment is randomized and people are free to seek treatment. Randomization then only affects the probability that the individual is exposed to treatment, rather than the treatment itself. In such designs, partial compliance with the treatment is fine, and the resulting estimates are called intention-to-treat (ITT) estimates (see Duflo et al., section 6.2).

To see how ITT estimates work, let Z denote the variable that is randomly assigned in the experiment, while D is the actual treatment. For instance, Z could be an offer to get nutrition supplements and D actually taking the supplements. By randomization, we know that

E[Y_{0i} | Z_i = 1] = E[Y_{0i} | Z_i = 0]

and that E[Y_i | Z_i = 1] - E[Y_i | Z_i = 0] equals the causal effect of Z. However, this is not the same as the causal effect of the treatment, D, since D ≠ Z.

Sometimes the ITT may be the parameter of interest, rather than the treatment effect itself! The reason is that some policies will never reach everybody, and in such cases policy-makers may be more interested in ITT estimates. If one is interested in the effect of the treatment itself, however, partial compliance is a problem. External validity may also be compromised. As we will see later, in such cases one can still use the randomization as an instrumental variable and recover something called a Local Average Treatment Effect (LATE).

Internal validity: attrition

Another threat to internal validity arises when individuals drop out of the experiment. This is not a problem if the drop-out is completely random (only statistical power will be reduced).
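The ITT logic described above can be sketched with simulated data (the take-up rates and effect size are hypothetical): the ITT compares outcomes by the randomized offer Z rather than by actual treatment D, and scaling it by the difference in take-up rates gives the instrumental-variable (Wald) estimate that anticipates the LATE discussion.

```python
import numpy as np

# A minimal sketch of an encouragement design: Z is the randomized offer,
# D is actual take-up (partial compliance), and the ITT compares mean
# outcomes by offer status Z, not by treatment status D.
rng = np.random.default_rng(1)

n = 100_000
z = rng.integers(0, 2, size=n)                  # randomized offer of treatment
# Partial compliance (hypothetical rates): 60% of those offered take up,
# 10% of those not offered obtain the treatment anyway.
d = np.where(z == 1,
             rng.random(n) < 0.6,
             rng.random(n) < 0.1).astype(int)

y0 = rng.normal(0.0, 1.0, size=n)
y1 = y0 + 2.0                                   # true effect of the treatment D is 2
y = (1 - d) * y0 + d * y1

# ITT: the causal effect of the offer Z; roughly 2 * (0.6 - 0.1) = 1 here
itt_hat = y[z == 1].mean() - y[z == 0].mean()

# Scaling the ITT by the take-up differential (the Wald/IV estimator)
# recovers the effect of the treatment itself, about 2
wald_hat = itt_hat / (d[z == 1].mean() - d[z == 0].mean())

print(itt_hat, wald_hat)
```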
If drop-out is non-random and related to treatment, however, we have a problem. Such drop-out can be expected to result from optimizing behavior, where those who gain little from treatment may be more likely to drop out.

Note that even if attrition rates are equal in the treatment and control groups, the attrition may be produced differently. Consider a medical experiment, where attrition due to death may obviously be reduced in the treatment group. At the same time, attrition may increase in the treatment group, since some subjects feel healthier and therefore decide to leave the experiment. In applied work one needs to carefully monitor and compare drop-outs in the treatment and control groups.

External validity of experiments: non-representative treatments and populations

All "treatments" are different, and there is inevitably something of a leap of faith in assuming that a particular treatment or policy will have the same impact in other places at other times. Individuals in experiments may not be a random sample of the population of interest. One example of a non-generalizable sample arises when the participants are volunteers, as in many medical studies. Even with randomization into control and treatment groups, the volunteers are likely to be more motivated than the overall population, and treatment may have a greater effect among them.

External validity: Hawthorne and John Henry effects

Such effects arise when being in an experiment in itself causes the treatment or comparison group to change their behavior. Subjects improve or modify the behavior being experimentally measured simply in response to the fact that they are being studied, not in response to any particular experimental manipulation. Like a placebo effect! Note, however, that internal validity is still fine, but that the causal effect now also includes the placebo effect.

External validity: Hawthorne effects
Changes in behavior among the treatment group are called Hawthorne effects. For instance, some subjects may get excited by the attention given, or be grateful, and perform better than usual. "Hawthorne" refers to a study in the 1920s, where researchers studied how manipulating the level of light at the Hawthorne factories affected productivity. They found that productivity increased no matter in which way the light was manipulated...

External validity: John Henry effects

Changes in behavior among the comparison group are called John Henry effects. For instance, the comparison group may feel offended at being a comparison group and react by also altering their behavior. The name refers to the behavior of a legendary American steel worker in the 1870s who, when he heard his output was being compared with that of a steam drill, worked so hard to outperform the machine that he died in the process... The concern in experiments is that Hawthorne and John Henry effects may not generalize to other settings.

External validity: general equilibrium effects

Because experiments are often limited in scope, the estimated effects do not reflect any general equilibrium effects that would occur if the experiment were scaled up. Turning an experiment into a large-scale program may change the economic environment in a way that the small-scale experiment did not. For instance, consider the small-scale experiment on school vouchers in Colombia, as described by Duflo et al. (p. 67). What would be the effect of implementing this voucher system in the entire schooling system of Colombia? One general equilibrium effect may be that the vouchers increase pressure on public schools, and that these schools improve their performance.
Another effect, however, is that the vouchers pull the most motivated children and their parents out of public schools, which may reduce the pressure on public schools to perform. Neither of these effects can be measured in a small-scale experiment, but both will have a crucial influence on the large-scale effect of the program.

Examples of experiments in economics

Experiments are increasingly common in economics. Three types:
- Lab experiments, e.g. trust games in the lab, where the researcher can randomly assign the characteristics of the game
- Field experiments, taking experiments to the field
- Incidental experiments, e.g. draft lotteries, random assignment of judges in court, etc.

Now, let us take a look at some examples.

Example of field experiments: discrimination

Bertrand and Mullainathan (2004). "Are Emily and Greg More Employable Than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination." American Economic Review, 94(4).

Question: how to measure discrimination in the labour market? Say we would like to estimate the "effect" of being black on labour market outcomes. The treatment effect is:

E[Y_{1i} - Y_{0i} | B_i = 1],   (2)

where B_i = 1 denotes being black and B_i = 0 being white.

Early empirical evidence came from running earnings regressions that include a dummy variable for race along with control variables. However, one suspects that there are variables not observed by the researcher that could explain the results. Another approach was audit studies, where actors of different races were sent to interviews and trained to behave similarly. But using actors means small numbers of observations, and we cannot be sure that they really acted the same.

Bertrand and Mullainathan avoided these problems by sending out a large number of "fake" job applications to real jobs.
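The contrast between the naive regression approach and a randomized design can be sketched with simulated data (all effect sizes and correlations below are hypothetical): an unobserved factor correlated with group status biases the naive comparison, while a randomly assigned treatment is independent of the unobservables by construction.

```python
import numpy as np

# A minimal sketch of omitted-variable bias versus randomization.
# Observational world: group status B correlates with an unobserved factor U
# that also affects the outcome, so the naive comparison mixes the true
# effect with the confounder. Experimental world: the treatment is
# randomized, hence independent of U, and the same comparison is unbiased.
rng = np.random.default_rng(2)

n = 200_000
true_effect = -0.2                                # hypothetical direct effect

# Observational world: B correlated with unobserved U
b = rng.integers(0, 2, size=n)
u = rng.normal(size=n) - 0.5 * b                  # confounder, lower when B = 1
y_obs = true_effect * b + u + rng.normal(scale=0.5, size=n)
naive = y_obs[b == 1].mean() - y_obs[b == 0].mean()    # ≈ -0.2 - 0.5 = -0.7, biased

# Experimental world: treatment randomized, independent of unobservables
t = rng.integers(0, 2, size=n)
u2 = rng.normal(size=n)                           # unrelated to t by design
y_exp = true_effect * t + u2 + rng.normal(scale=0.5, size=n)
experimental = y_exp[t == 1].mean() - y_exp[t == 0].mean()  # ≈ -0.2, unbiased

print(naive, experimental)
```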
In these applications, they manipulated perceptions of race by randomly assigning distinctively ethnic names to the applications: distinctively black- and white-sounding names were randomized to otherwise identical CVs. Discrimination was measured by comparing employers' callback rates for applications with black- and white-sounding names.

Thus, instead of manipulating race, they manipulated perceptions of race. With the treatment now being the "name" (N_i), the probability of a callback for resume i given a white-sounding name (Y_{0i}) or a black-sounding name (Y_{1i}) can be written as:

E[Y_{0i}] = \alpha_i   (3)

E[Y_{1i}] = \alpha_i + \gamma N_i,   (4)

where N_i = 1 means treated with a black-sounding name. The treatment effect is then:

E[Y_{1i} - Y_{0i} | N_i = 1] = \alpha_i + \gamma - \alpha_i = \gamma   (5)

To see how randomization helps, consider again the formula for selection bias on slide 24, adjusted for treatment with "names":

E[Y_i | N_i = 1] - E[Y_i | N_i = 0] = E[Y_{1i} - Y_{0i} | N_i = 1] + E[Y_{0i} | N_i = 1] - E[Y_{0i} | N_i = 0]   (6)

which can be written as:

E[Y_{1i} - Y_{0i} | N_i = 1] + a bias term.   (7)

In this case, the bias term is the difference in potential callbacks between resumes with black- and white-sounding names. But since we know that the resumes are identical except for the names, we know that E[Y_{0i} | N_i = 1] = E[Y_{0i} | N_i = 0], and the bias term therefore cancels out. In other words, had we "treated" a resume carrying a black-sounding name with a white-sounding name instead (the counterfactual), its outcome would have been the same as the outcomes for the resumes with white-sounding names. Hence, we know that any difference in the probability of callback between black-sounding and white-sounding resumes is due to the random name assignment and no other factor.

Example of field experiments: anonymity in giving

Soetevent, A. (2005).
"Anonymity in Giving in a Natural Context — A Field Experiment in 30 Churches." Journal of Public Economics, 89(11-12), 2301-2323.

Causal question: how does anonymity affect the giving of monetary contributions to good causes? Do social incentives play a role? The paper randomized the anonymity of giving to offerings (kollekt) in thirty Baptist churches in the Netherlands. In this particular environment, one expects strong social ties to exist between congregation members. To examine the role of anonymity, the collection bags were randomly replaced with open collection baskets.

Results: initially, contributions to the services' offerings increase by 10% when baskets are used. The positive effect of using baskets peters out over the experimental period. Additional data on the coins collected show that in both offerings, people switch to giving larger coins when baskets are used (also when the total amount was the same). The results support the hypothesis that social incentives, such as receiving approval from others, play an important role in giving and are triggered by the removal of anonymity.

Incidental experiments: women as policy makers

Chattopadhyay and Duflo (2004). "Women as Policy Makers: Evidence from a Randomized Policy Experiment in India." Econometrica, 72, 1409-1443.

Causal question: do female politicians implement different policies than male politicians? Two conditions are necessary in order to find such an effect: men and women must have different preferences, and the identity of the policymaker must affect outcomes.

Since the mid-1990s, one third of Village Council head positions in India have been randomly reserved for a woman: in these councils, only women could be elected to the position of head.
They conducted a detailed survey of all investments in local public goods in a sample of villages in two districts, Birbhum in West Bengal and Udaipur in Rajasthan, and compared investments made in reserved and unreserved villages. Since villages were randomly selected to be reserved for women, differences in investment decisions can be confidently attributed to the villages' reserved status. The results suggest that leaders invest more in infrastructure that is directly relevant to the needs of their own gender.

Summary of social experiments

An ideal randomized trial is usually considered the gold standard for establishing causality. We can use this gold standard to evaluate how closely non-experimental estimators come to mimicking a "true" experiment. Even when randomization is not possible, a good starting point for any research project is to think about what the perfect experiment to run would have been. One should then try to come as close as possible to that experiment using non-experimental data.