Lecture on Panel Data and Fixed Effects

Summary

This lecture, presented by Magnus Carlsson, explores panel data and fixed effects. It discusses the estimation of causal effects, methods for controlling omitted variables, and the application of models such as random effects and fixed effects approaches. It offers different estimation techniques and provides insights into interpreting the results.

Full Transcript

Panel data and fixed effects
Magnus Carlsson

Fixed effects, difference-in-differences, and panel data

To estimate causal effects, experiments, IV, or RD designs are the preferred methods. In many applications, however, experiments are impossible and there are simply no good instruments or discontinuities to exploit. An alternative is then to use methods that at least control for certain types of omitted variables. These methods allow us to control for unobserved factors that are fixed over time or space.

Fixed effects, difference-in-differences, and panel data: roadmap

1. Fixed effects and panel data
   1. Panel data: random vs fixed effects
   2. Fixed effects estimation with panel data
   3. Pitfalls
   4. Fixed effects estimation with other data structures
2. Difference-in-differences
   1. Estimation
   2. Pitfalls and sensitivity checks

Panel data and fixed effects

This lecture deals with panel data and fixed effects estimation. Panel data follow the outcomes and characteristics of individuals at multiple points in time. Typically, the number of individuals N is relatively large, while the number of time periods T over which these individuals are observed is short. Fixed effects are one way of analyzing panel data, but the fixed effects approach can be applied to other data structures as well. This includes family-level data, twin data, etc., without a time dimension.

The simplest case: running OLS on panel data

One can always analyze panel data by pooling observations over time and running OLS, treating all observations as independent:

$$Y_{it} = \alpha + X_{it}\beta + \varepsilon_{it}, \qquad i = 1, \dots, N, \quad t = 1, \dots, T,$$

for a sample of N individuals, each observed over T periods. The pooled model only provides consistent estimators of $\alpha$ and $\beta$ if the zero conditional mean assumption $E[\varepsilon_{it} \mid X_{i1}, \dots, X_{iT}] = 0$ is satisfied. Violation of this assumption causes the estimators to be biased and inconsistent.
The panel data model

Consider the basic linear panel data model, where we instead follow individuals over time:

$$Y_{it} = \alpha + X_{it}\beta + \upsilon_{it}, \qquad i = 1, \dots, N, \quad t = 1, \dots, T.$$

In panel data models, we assume that the error term $\upsilon_{it}$ can be divided into two parts that enter additively, so that:

$$\upsilon_{it} = \eta_i + \varepsilon_{it} \qquad (1)$$

Note that $\eta_i$ appears in this expression without a time subscript! In this specification, the error term is thus divided into one part that does not vary over time, $\eta_i$, and one part that does vary over time, $\varepsilon_{it}$.

What is then $\eta_i$? It reflects factors that are unobserved by the econometrician and that do not vary over time for the individual. This may include factors such as genes, early childhood environment, parental background, certain personality traits, "deep" preferences, etc. The assumptions we make about $\eta_i$ determine what type of panel data model to use.

The panel data model: random effects

Panel data models are analyzed as either random effects models or fixed effects models. In the random effects model, we assume that:

$$E[\eta_i \mid X_{i1}, \dots, X_{iT}] = 0 \qquad (2)$$

Here, we assume that the unobserved factors that are fixed over time are (mean) independent of the values of the X variables in all time periods. Is the random effects assumption realistic?

Note that the random effects assumption is similar to the zero conditional mean assumption. It says that unobserved, time-invariant factors such as ability, preferences, and parental background are independent of all included X variables. For instance, if Y denotes earnings and X schooling, it says that such unobserved factors are independent of the level of schooling. But this is exactly what we (often) do not believe and what we want to address!
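The consequence of a failing random effects assumption can be illustrated with a small simulation. The following is a minimal sketch on made-up data, not part of the lecture: the function names `simulate_panel` and `pooled_ols_slope`, the parameter `rho` (how strongly the regressor loads on $\eta_i$), and all numerical values are assumptions chosen for illustration. With `rho = 0` assumption (2) holds and pooled OLS recovers $\beta$; with `rho != 0` the pooled slope also picks up the influence of $\eta_i$.

```python
import numpy as np


def simulate_panel(n=2000, t=5, beta=1.0, rho=0.8, seed=0):
    """Simulate Y_it = 2 + beta * X_it + eta_i + eps_it (hypothetical design).

    rho controls how strongly X_it depends on the time-invariant eta_i;
    rho = 0 corresponds to the random effects assumption.
    """
    rng = np.random.default_rng(seed)
    eta = rng.normal(size=(n, 1))            # unobserved, fixed over time
    x = rho * eta + rng.normal(size=(n, t))  # regressor, possibly related to eta
    eps = rng.normal(size=(n, t))            # time-varying error
    y = 2.0 + beta * x + eta + eps
    return y, x


def pooled_ols_slope(y, x):
    """Pool all (i, t) observations and return the OLS slope on X."""
    x_flat, y_flat = x.ravel(), y.ravel()
    design = np.column_stack([np.ones_like(x_flat), x_flat])
    coef, *_ = np.linalg.lstsq(design, y_flat, rcond=None)
    return float(coef[1])


y_re, x_re = simulate_panel(rho=0.0)  # eta unrelated to X
y_fe, x_fe = simulate_panel(rho=0.8)  # eta correlated with X
print("pooled slope, eta unrelated to X: ", pooled_ols_slope(y_re, x_re))
print("pooled slope, eta correlated with X:", pooled_ols_slope(y_fe, x_fe))
```

In this design the true $\beta$ is 1, so the second slope being clearly above 1 is the omitted variables bias that the fixed effects approach is meant to remove.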
The panel data model: fixed effects

In the fixed effects model, we relax the assumption and allow for:

$$E[\eta_i \mid X_{i1}, \dots, X_{iT}] \neq 0 \qquad (3)$$

Here, we allow the unobserved factors that are fixed over time to be related to the values of the X variables in any time period. Without an experiment, this is the more realistic case. As we will see, even if we allow for this particular type of violation of the zero conditional mean assumption, the fixed effects model may still give us consistent estimates of the causal effect!

The fixed effects model

To be more specific, consider the fixed effects model:

$$Y_{it} = \alpha + X_{it}\beta + \eta_i + \varepsilon_{it},$$

where $X_{it}$ is a vector of exogenous regressors and $\varepsilon_{it}$ is independent over time and across individuals. Under the fixed effects assumption, we do not rule out correlation between $\eta_i$ and $X_{it}$. As an example, $\eta_i$ could represent unobserved ability (at least the part of it that does not vary over time).

Estimating the fixed effects model

In the specification above, $\eta_i$ is a constant (or "fixed") for each $i$ and therefore looks like the coefficient on a "dummy" variable for each $i$! Thus, by including a dummy variable for each $i$ in the regression, we could control for $\eta_i$. But note that there are as many $\eta_i$ parameters as individuals, which could mean thousands of $\eta_i$ to estimate! What if we could get rid of the $\eta_i$ instead?

Even if we cannot estimate all the $\eta_i$ parameters, we can get rid of them using within estimation.
We use a trick: it turns out that including a dummy variable for each $i$ is algebraically the same as estimating the model in deviations from individual means. To implement the trick, we first calculate the individual-specific averages over time, so that:

$$\bar{Y}_i = \alpha + \bar{X}_i\beta + \bar{\eta}_i + \bar{\varepsilon}_i,$$

where

$$\bar{Y}_i = \frac{1}{T}\sum_{t=1}^{T} Y_{it}, \qquad \bar{X}_i = \frac{1}{T}\sum_{t=1}^{T} X_{it}, \qquad \bar{\varepsilon}_i = \frac{1}{T}\sum_{t=1}^{T} \varepsilon_{it}, \qquad \bar{\eta}_i = \frac{1}{T}\sum_{t=1}^{T} \eta_i.$$

Estimating the fixed effects model: "the trick"

Next, subtract $\bar{Y}_i$ from $Y_{it}$:

$$Y_{it} - \bar{Y}_i = (\alpha + X_{it}\beta + \eta_i + \varepsilon_{it}) - (\alpha + \bar{X}_i\beta + \bar{\eta}_i + \bar{\varepsilon}_i) = (X_{it} - \bar{X}_i)\beta + (\varepsilon_{it} - \bar{\varepsilon}_i).$$

Note that $\bar{\eta}_i = \eta_i$, since $\eta_i$ is the same in every time period; the constant $\alpha$ cancels for the same reason. This implies that we get the specification

$$\tilde{Y}_{it} = \tilde{X}_{it}\beta + \tilde{\varepsilon}_{it}, \qquad i = 1, \dots, N, \quad t = 1, \dots, T,$$

with

$$\tilde{Y}_{it} = Y_{it} - \bar{Y}_i, \qquad \tilde{X}_{it} = X_{it} - \bar{X}_i, \qquad \tilde{\varepsilon}_{it} = \varepsilon_{it} - \bar{\varepsilon}_i.$$

The within estimator $\hat{\beta}_{within}$ is then obtained by applying OLS to this specification. In this specification, we got rid of $\eta_i$, i.e. the fixed effects!

Interpreting the fixed effects model

Removing $\eta_i$, i.e. the fixed effects, means that we implicitly control for all individual-specific factors, whether observable or unobservable, that are constant over time. Thus we have removed a potentially large source of omitted variables bias. We can do this even though we may never be able to observe or measure these time-constant individual-specific factors. We now interpret the estimated effect, $\beta$, as the effect of a within-unit change in treatment. For this reason, the FE estimator is also called the within estimator.

Some remarks on the fixed effects estimator

The parameters $\beta$ are identified from the (within) variation in $X_{it}$ over time. The estimators of $\eta_i$ and $\beta$ are both consistent if the asymptotics let $T$ become large.
If instead $T$ is fixed and $N$ goes to infinity, only $\hat{\beta}_{within}$ is consistent, while the estimator of $\eta_i$ is not (the so-called incidental parameters problem). If $N$ is not too large, one could simply include dummy variables for each individual and estimate the original model by OLS. This provides the within estimator and $\hat{\eta}_i$ in a single step.

An alternative way to "kill" the fixed effects: the first-differences estimator

Instead of the within estimation procedure, one could also take first differences over time:

$$Y_{it} - Y_{i,t-1} = (\alpha + X_{it}\beta + \eta_i + \varepsilon_{it}) - (\alpha + X_{i,t-1}\beta + \eta_i + \varepsilon_{i,t-1}) = (X_{it} - X_{i,t-1})\beta + (\varepsilon_{it} - \varepsilon_{i,t-1}), \qquad t = 2, \dots, T,$$

or

$$\Delta Y_{it} = \Delta X_{it}\beta + \Delta\varepsilon_{it},$$

where taking first differences eliminates $\eta_i$ from the model. If we apply OLS to the differenced equation, we obtain the first-difference estimator $\hat{\beta}_{fdiff}$.

What additional assumptions are needed for the FE or first-difference model?

So far we have only addressed the assumptions about $\eta_i$. But what about $\varepsilon_{it}$, i.e. the unobserved factors that are allowed to vary over time?
For both the first-differences and the within estimator to provide consistent estimates, we now need the regressors to be strictly exogenous:

$$E[\varepsilon_{it} \mid X_{i1}, \dots, X_{iT}, \eta_i] = 0, \qquad i = 1, \dots, N, \quad t = 1, \dots, T.$$

The strict exogeneity assumption

The strict exogeneity assumption is a version of the zero conditional mean assumption. It says that the part of the error term that is allowed to vary over time, $\varepsilon_{it}$, must be unrelated to the value of the treatment indicator or other control variables in any time period. It would typically fail if there is some time-specific unobserved shock that affects both the outcome and our X variable of interest. What looks like an effect of X on Y may then simply reflect the influence of this shock.

The fixed effects model: Angrist and Pischke

In the book (MHE), the authors use potential outcomes notation and the example of the effect of union membership on wages. Let $Y_{it}$ equal the (log) earnings of worker $i$ at time $t$ and let $D_{it}$ denote the worker's union status. The observed $Y_{it}$ is either $Y_{0it}$ or $Y_{1it}$, depending on union status. Suppose further that:

$$E(Y_{0it} \mid A_i, X_{it}, t, D_{it}) = E(Y_{0it} \mid A_i, X_{it}, t) \qquad (4)$$

The expression above says that the potential outcome as untreated is independent of actual treatment status, conditional on unobserved worker ability $A_i$, other observed covariates $X_{it}$, and time $t$. In other words, union status is as good as randomly assigned conditional on these factors. If this is true, we would be able to get a consistent estimate of the effect of $D_{it}$ if we could control for, or somehow account for, $A_i$. With a fixed effects model, we can accomplish this, as long as the unobserved factor is constant over time.
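The estimators discussed so far (within, dummy variables, and first differences) can be compared on simulated data. This is a minimal sketch under assumed values, not the lecture's own code: the data-generating process, the sample sizes, and the helper `slope` are hypothetical. Strict exogeneity holds by construction, while $\eta_i$ is deliberately correlated with $X_{it}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, t, beta = 500, 4, 1.0                   # assumed sizes and true effect

eta = rng.normal(size=(n, 1))              # fixed effect, correlated with x
x = 0.8 * eta + rng.normal(size=(n, t))
y = 2.0 + beta * x + eta + rng.normal(size=(n, t))


def slope(xv, yv):
    """OLS slope without intercept for already transformed data."""
    xv, yv = xv.ravel(), yv.ravel()
    return float(xv @ yv / (xv @ xv))


# Within estimator: deviations from individual means
x_w = x - x.mean(axis=1, keepdims=True)
y_w = y - y.mean(axis=1, keepdims=True)
beta_within = slope(x_w, y_w)

# Dummy-variable estimator: one dummy per individual, no separate intercept
dummies = np.kron(np.eye(n), np.ones((t, 1)))   # shape (n * t, n)
design = np.column_stack([x.ravel(), dummies])
coef, *_ = np.linalg.lstsq(design, y.ravel(), rcond=None)
beta_lsdv = float(coef[0])

# First-difference estimator: OLS on period-to-period changes
beta_fd = slope(np.diff(x, axis=1), np.diff(y, axis=1))

print(beta_within, beta_lsdv, beta_fd)
```

The within and dummy-variable slopes agree up to floating-point error, which is the algebraic equivalence behind "the trick"; the first-difference slope is a different estimate of the same $\beta$.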
Example: Freeman (1984), returns to union membership

Freeman (1984) estimates the effect of union membership on wages. Ideally, we would like to observe each individual's potential outcome with and without union membership. In general, could we get at the potential outcomes by just observing the wages of members and non-members? If not, maybe the unobserved differences between members and non-members are constant over time? "Whatever makes us special is timeless" (Angrist and Krueger 1998).

Why are Freeman's fixed effects estimates smaller than his cross-sectional estimates? Two explanations:

1. The fixed effects estimates are closer to the "true" causal effect of union membership. This would suggest that the effect was overestimated in the cross-sectional estimates.
2. There are measurement errors in the union status variable. Unfortunately, the role of measurement errors normally becomes exaggerated in the fixed effects model.

The measurement error problem brings us to the potential pitfalls of the fixed effects method.

Pitfalls of the fixed effects approach: the measurement error problem

The intuition for the measurement error problem is that fixed effects models restrict the variation in the Xs to variation within individuals. This also means that the fraction of the remaining variation that reflects measurement error may increase. It can be shown that the downward bias that results from "classical" measurement error is greater in fixed effects models than in OLS. The downward bias gets stronger, the stronger the correlation between the X variables in different periods.

Pitfalls of the fixed effects approach

It is impossible to estimate the effect of time-invariant regressors. Why?
Because the deviation from the individual-specific mean will always be zero for such a variable. We can therefore not estimate the effect of time-invariant factors such as gender, ethnicity, education (at least as an adult), etc. With a random effects specification we can estimate the effects of time-invariant factors, but the problem is that the underlying assumptions of the RE model are unrealistic in most applications.

Pitfalls of the fixed effects approach

The effect is only identified for those who actually change treatment status. Those who do not change treatment status do not contribute to the estimates, since we are relying on within variation. The problem is that the sample that actually changes status in the causal variable of interest may be selective. This may make it difficult to compare the OLS estimates with the FE estimates, since OLS exploits all observations.

Pitfalls of the fixed effects approach: violation of the strict exogeneity assumption

In many applications, the strict exogeneity assumption may be criticized. Selection into treatment may be based on unobserved factors that do vary over time, such as shocks, which would violate the strict exogeneity assumption. Thus, we have:

$$E[\varepsilon_{it} \mid X_{i1}, \dots, X_{iT}, \eta_i] \neq 0 \qquad (7)$$

Fixed effects without a time dimension: exploiting family data such as siblings and twins

The fixed effects approach does not require a time dimension! As long as important unobserved variables are shared by some group of individuals, they can be cancelled out using a fixed effects approach. Examples:

1. Twins: identical (monozygotic) twins share genetics and family background
2. Siblings: share some genetics and family background

Example 1 of fixed effects with twins: The effects of birth weight over the life cycle (Bharadwaj, Lundborg, and Rooth 2015)

What is the effect of early childhood health over the life cycle?
The role of birth weight over the life cycle is examined in order to answer questions about the persistence of health inequalities at birth. With detailed register data on income, welfare payments, etc., the extent to which lower birth weight children ever catch up with their heavier counterparts is examined.

Note that identifying the effect of birth weight is normally very difficult. If one compares the birth weight of babies born in different families, there is a concern that those with low birth weight differ in unobservable ways from those with high birth weight. For instance, those with low birth weight are more likely to be born into poorer or less healthy families. To the extent that such factors are unobserved, it is likely that the birth weight coefficient partly or fully picks up such unobserved background factors. We thus have an omitted variables problem (or "endogeneity" problem).

In order to deal with omitted variables, data on the birth weight of almost all twins born in Sweden between 1926 and 1958 are exploited. These data are linked to register data on incomes and education from 1968 onwards. Empirical specification:

$$Y_{ijt} = \beta_t BW_{ij} + \eta_j + \varepsilon_{ijt},$$

where $Y_{ijt}$ is log income of twin $i$ in twin pair $j$ at time $t$, $\beta_t$ is the coefficient on the birth weight variable $BW_{ij}$, $\eta_j$ captures twin-pair-specific unobserved factors, and $\varepsilon_{ijt}$ captures unobserved factors specific to twin $i$.

In this specification, it is (partly) possible to deal with the omitted variables problem. The reason is that the fixed effect $\eta_j$ will account for the influence of genetics, the background and behaviors of the parents, any environmental problems shared by the twins, etc.
To see this clearly, we can take the difference between twins within twin pairs (where the other twin is $i'$):

$$Y_{ijt} - Y_{i'jt} = \beta_t (BW_{ij} - BW_{i'j}) + (\eta_j - \eta_j) + (\varepsilon_{ijt} - \varepsilon_{i'jt}) = \beta_t (BW_{ij} - BW_{i'j}) + (\varepsilon_{ijt} - \varepsilon_{i'jt}).$$

The key identifying assumption needed in order to give $\beta_t$ a causal interpretation is that $BW_{ij} - BW_{i'j}$ is uncorrelated with $\varepsilon_{ijt} - \varepsilon_{i'jt}$. Is this a strong assumption? Maybe, for instance if we think that parental investments in their children are a function of birth weight, or if differences in birth weight are related to differences in cognition. If parents try to compensate for the lower birth weight of one twin, the estimated effect is the effect of birth weight that remains despite the parents' best attempts to compensate.

Example 1 of fixed effects with twins: results

Example 2: Critical periods in the development of cognitive skills and health (van den Berg, Lundborg 2014)

It is very difficult to study the causal effect of poor circumstances during childhood on later life outcomes. Periods during childhood in which poor conditions have particularly bad consequences for development are called "critical periods". Some famous evidence comes from studies of Romanian orphans, who were rescued from incredibly bad conditions by adoptive parents from Western countries. These results may reflect self-selection, however, if adoptive parents picked out the least "damaged" children from the orphanages.

Van den Berg et al.
(2014) exploit data on immigrant brothers who migrated to Sweden from different countries. The study exploits the fact that the brothers migrated from more or less poor conditions to a richer country. The brothers entered Sweden at the same point in calendar time but at different developmental stages (ages). By comparing the outcomes of brothers who entered at the same time (fixed effects) but at different stages, we may be able to identify "critical periods" in the development of height.

Critical periods in the development of height at age 18: results

Example 2: Critical periods in the development of height

By imposing brother fixed effects, it is possible to difference out everything unobserved at the family level. This includes factors such as reasons for migrating, parental background, and language, which may all be important omitted variables. Brother fixed effects also address the self-selection of the families who migrate, since the unobserved selection factors are at the family level! The results show larger effects for those migrating from poorer countries, as one would expect.

Summary: fixed effects

A fixed effects estimator allows us to control for certain types of omitted variables. The method allows us to control for unobserved factors that are fixed over time (time-invariant) or space. The method cannot account for the influence of unobserved factors that vary over time (or space). A useful application of fixed effects estimators is to sibling or twin data.

Appendix: Panel data and fixed effects

Comparing the first-differences and the within estimators

If $T = 2$, the within estimator and the first-difference estimator are the same. To see this, note that the within model computes individual means:

$$\bar{X}_i = \frac{1}{2}(X_{i1} + X_{i2}) \qquad (5)$$

And, in mean deviation form, we have:

$$\tilde{X}_{i2} = X_{i2} - \bar{X}_i = X_{i2} - \frac{1}{2}(X_{i1} + X_{i2}) = \frac{1}{2}(X_{i2} - X_{i1}), \qquad (6)$$

which will always be half the first difference. The same holds for the outcome variable, so the two approaches are effectively the same model.
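The $T = 2$ equivalence in (5) and (6) can be checked numerically. This is a minimal sketch on simulated data with made-up parameter values; it assumes that both regressions are run without an intercept, as in the equations of this lecture.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300                                    # assumed number of individuals
eta = rng.normal(size=(n, 1))
x = 0.5 * eta + rng.normal(size=(n, 2))    # two periods per individual
y = 1.0 + 1.5 * x + eta + rng.normal(size=(n, 2))

# Within transformation: deviations from the two-period individual mean
x_w = x - x.mean(axis=1, keepdims=True)
y_w = y - y.mean(axis=1, keepdims=True)
beta_within = float((x_w * y_w).sum() / (x_w ** 2).sum())

# First differences: one change per individual
dx = x[:, 1] - x[:, 0]
dy = y[:, 1] - y[:, 0]
beta_fd = float((dx * dy).sum() / (dx ** 2).sum())

# Equation (6): the period-2 deviation is half the first difference
half_difference_holds = bool(np.allclose(x_w[:, 1], dx / 2))

print(beta_within, beta_fd, half_difference_holds)
```

Because every demeaned observation is plus or minus half the first difference, the factor of one half cancels in the OLS ratio and the two slopes coincide.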
Comparing the first-differences and the within estimators

If the time-varying errors $U_{it}$ are uncorrelated over time, the within estimator is more efficient than the first-difference estimator. If the $U_{it}$ are strongly serially correlated (for example, if they follow a random walk), the first-difference estimator is more efficient. If strict exogeneity is violated, the first-difference estimator and the within estimator are both inconsistent and have different probability limits.
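The efficiency comparison can be illustrated with a small Monte Carlo experiment. This is a hedged sketch, not a result from the lecture: the function `mc_sd`, the sample sizes, and the two error processes (independent errors versus a random walk as one extreme form of serial correlation) are assumptions chosen for illustration.

```python
import numpy as np


def mc_sd(random_walk, reps=1000, n=50, t=10, beta=1.0, seed=3):
    """Return the Monte Carlo standard deviations of (within, first-difference).

    random_walk=False: time-varying errors are independent over time.
    random_walk=True:  errors are cumulative sums of independent shocks.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(reps, n, t))          # strictly exogenous regressor
    shocks = rng.normal(size=(reps, n, t))
    err = shocks.cumsum(axis=2) if random_walk else shocks
    eta = rng.normal(size=(reps, n, 1))        # fixed effects
    y = beta * x + eta + err

    # Within estimator for each replication
    x_w = x - x.mean(axis=2, keepdims=True)
    y_w = y - y.mean(axis=2, keepdims=True)
    b_within = (x_w * y_w).sum(axis=(1, 2)) / (x_w ** 2).sum(axis=(1, 2))

    # First-difference estimator for each replication
    dx, dy = np.diff(x, axis=2), np.diff(y, axis=2)
    b_fd = (dx * dy).sum(axis=(1, 2)) / (dx ** 2).sum(axis=(1, 2))

    return float(b_within.std()), float(b_fd.std())


sd_within_iid, sd_fd_iid = mc_sd(random_walk=False)
sd_within_rw, sd_fd_rw = mc_sd(random_walk=True)
print("independent errors:", sd_within_iid, sd_fd_iid)
print("random-walk errors:", sd_within_rw, sd_fd_rw)
```

A smaller standard deviation across replications means a more efficient estimator in that design. Both estimators remain centered on the true $\beta$ here because strict exogeneity holds in the simulation.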
