Econometrics - Final PDF
Summary
This document provides an overview of econometrics concepts, including adjustments for clustered standard errors, endogeneity, omitted variables, proxy variables, and functional form misspecification. It also covers simultaneity, measurement error, and instrumental variable estimation, including IV estimation of the multiple regression model, two-stage least squares (2SLS), and potential issues with those procedures, as well as panel data methods (pooled OLS, first differencing, fixed and random effects) and binary dependent variable models (LPM, logit, probit).
Econometrics - final

remember: Adjustments for Clustered Standard Errors (cluster(id)): The regression uses the cluster(id) option, which adjusts standard errors for within-student correlation. This accounts for potential dependence of observations within each cluster (i.e., the same student observed across two terms). Why? Because students are observed more than once (repeated measures), their observations might not be independent. Standard errors are corrected to account for this dependence, ensuring valid inference.

Endogeneity
x is endogenous if it is correlated with u.

Omitted Variables
If the omitted variable is correlated with at least one independent variable, this can bias all estimates. Suppose the true model is y = β0 + β1x1 + β2x2 + u, but we omit x2 and estimate y = β̃0 + β̃1x1. If x2 is correlated with x1, say x2 = δ0 + δ1x1 + v, then E(β̃1) = β1 + β2δ1, so Bias(β̃1) = E(β̃1) − β1 = β2δ1.

Proxy Variable for Unobserved Explanatory Variables
Let x3* denote the unobserved variable and x3 its substitute (the proxy). We posit the regression x3* = δ0 + δ3x3 + v3 and should obtain δ3 > 0. In addition to assuming that u is uncorrelated with x1, x2 and x3*, we need u to be uncorrelated with x3. The error v3 must be uncorrelated with x1, x2 and x3, so that E(x3*|x1, x2, x3) = E(x3*|x3) = δ0 + δ3x3 (conditional on the proxy, x3* is uncorrelated with the other regressors). By combining
y = β0 + β1x1 + β2x2 + β3x3* + u
x3* = δ0 + δ3x3 + v3
we obtain: y = (β0 + β3δ0) + β1x1 + β2x2 + β3δ3x3 + (u + β3v3).

Functional form misspecification: like the omission of a relevant variable such as x1². Solution: test for functional form (RESET). Estimate y = β0 + β1x1 + ... + βkxk + u and obtain the fitted values ŷ. Then estimate y = β0 + β1x1 + ... + βkxk + δ1ŷ² + δ2ŷ³ + error and perform an F-test for the joint significance of ŷ² and ŷ³: H0: δ1 = δ2 = 0.

Simultaneity
If an explanatory variable is determined simultaneously with the dependent variable, it is generally correlated with the error term -> OLS is biased and inconsistent.
y1 = α1y2 + β1z1 + u1
y2 = α2y1 + β2z2 + u2
We plug the first equation (for y1) into the second:
y2 = α2(α1y2 + β1z1 + u1) + β2z2 + u2
-> (1 − α1α2)y2 = α2β1z1 + β2z2 + α2u1 + u2
-> y2 = π21z1 + π22z2 + v2, where v2 = (α2u1 + u2)/(1 − α1α2)
-> y2 and u1 are correlated if v2 and u1 are correlated. Since v2 is linear in u1, it is generally correlated with u1. When is it not correlated? If α2 = 0 and u1, u2 are uncorrelated.

Measurement error in an explanatory variable
We consider the simple regression model y = β0 + β1x1* + u. We do not observe x1* but x1 (e.g. actual and reported income). The measurement error in the population is e1 = x1 − x1*. The model can be written as y = β0 + β1x1 + (u − β1e1). The classical errors-in-variables (CEV) assumption is that the measurement error is uncorrelated with the unobserved explanatory variable: Cov(x1*, e1) = 0. This assumption implies that x1 and e1 must be correlated: Cov(x1, e1) = σ²_e1 -> Cov(x1, u − β1e1) = −β1σ²_e1. The OLS estimator under classical errors-in-variables is therefore biased toward zero (attenuation bias): plim β̂1 = β1 · σ²_x1* / (σ²_x1* + σ²_e1).

Instrumental Variable estimation
A variable z is a candidate instrument for a variable x if it satisfies Cov(x, z) ≠ 0 (relevance) and Cov(u, z) = 0 (exogeneity). The IV estimator has an asymptotic normal distribution. IV/2SLS uses x̂, the predicted values of x from the first-stage regression, which depend on the instruments z. Since R²_x,z (the squared correlation between x and z) is less than 1, the denominator of the asymptotic variance is smaller than under OLS, leading to a larger asymptotic variance for the IV estimator (see the sketch below).
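A minimal Stata sketch of IV/2SLS estimation; the variables y, x, z and the exogenous controls w1, w2 are hypothetical:
. ivregress 2sls y w1 w2 (x = z), vce(robust)   // instrument the endogenous x with z
. estat firststage                              // first-stage statistics: check instrument relevance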
"Poor" instrumental variable (IV): asymptotic bias in the IV estimator arises when z is not perfectly exogenous or is only weakly correlated with x. When Corr(z, u) = 0, β1 = Cov(z, y)/Cov(z, x). In general,
plim β̂1,IV = β1 + [Corr(z, u)/Corr(z, x)] · (σu/σx)
-> large SE if x and z are only weakly correlated;
-> large asymptotic bias if z and u are correlated while z and x are only weakly correlated.
This implies that the bias can be large if the population correlation between z and x is small, even if the population correlation between z and u is small.

IV Estimation of the Multiple Regression Model - Two Stage Least Squares (2SLS)
The endogenous variable is replaced with its predicted values from the first-stage regression, which are derived using the instruments. The linear combination of z which is most highly correlated with x is the linear projection of x on z. The steps in matrix form:
1. First stage: X̂ = Z(Z'Z)⁻¹Z'X, where Z is the instrument matrix and X is the matrix of endogenous variables.
2. Second stage: regress y on X̂.
X̂ consists of the fitted values of the endogenous variables obtained from their regression on the instrumental variables (IVs) and any exogenous variables in the model.

2SLS standard errors are larger than for OLS:
- In 2SLS, the endogenous variable X is replaced by its predicted values X̂ from the first-stage regression -> X̂ has less variation than the original X, because it is a linear combination of the instruments, which cannot capture all the variability in X.
- Increased multicollinearity in the second stage: the predicted values X̂ are typically more correlated with the other regressors than the original X. Multicollinearity inflates the standard errors of the 2SLS estimator.
We need at least as many instruments (counting the exogenous regressors) as explanatory variables in the model (order condition).

STATA Example: check for multicollinearity
. reg educ meduc age town farm
. predict heduc
(option xb assumed; fitted values)
. reg heduc age town farm
In the first stage, the endogenous regressor (educ) is regressed on all exogenous variables. -> The coefficient on meduc in the reduced-form regression is significant, showing that meduc has a partial correlation with educ; the instrument meduc is relevant (it satisfies the instrument relevance condition). -> The fitted values from this first stage, heduc, represent the part of educ that is explained by the exogenous variables (meduc, age, etc.) and do not contain the endogenous component that might be correlated with the error term in the structural equation.
In the second stage, the fitted values of educ from the first stage are used in place of the actual educ in the structural equation. -> The last regression checks whether the fitted values heduc are highly correlated with the other exogenous regressors: we check the R².

Testing for Endogeneity
1. Regression-based test (Hausman)
Structural equation: y1 = β0 + β1y2 + β2z1 + β3z2 + u1, where y2 is the suspected endogenous variable.
Reduced-form equation for y2: y2 = π0 + π1z1 + π2z2 + π3z3 + π4z4 + v2.
If y2 is endogenous, v2 must be correlated with u1. The key assumption here is that the exogenous variables z1, z2, z3, z4 are uncorrelated with u1. If y2 is endogenous, part of the variation in y2 (specifically the reduced-form error) is correlated with u1: u1 = δ1v2 + e1, and we test H0: δ1 = 0.
Steps: estimate the reduced form for y2 (the endogenous variable) and obtain the residuals v̂2; then add v̂2 to the structural regression and estimate it by OLS. If the coefficient on v̂2 is statistically different from zero, we conclude that y2 is indeed endogenous (see the Stata sketch below).
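A minimal Stata sketch of this regression-based test, following the notation above (the residual name v2hat is hypothetical):
. reg y2 z1 z2 z3 z4       // reduced form for the suspected endogenous variable
. predict v2hat, resid     // residuals v̂2
. reg y1 y2 z1 z2 v2hat    // structural equation augmented with v̂2
. test v2hat               // H0: coefficient on v̂2 is zero (y2 exogenous)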
2. Durbin-Wu-Hausman (DWH) test
If under H0 all regressors are exogenous but some are endogenous under H1, we can base a test directly on the difference between the 2SLS and OLS estimators.

3. Testing Overidentifying Restrictions - Sargan Test
When you have more instruments than endogenous variables, the model is overidentified. This introduces an opportunity to test whether the instruments are valid. Set up two models:
- Model 1: a just-identified model (the number of instruments equals the number of endogenous variables).
- Model 2: an overidentified model (the number of instruments exceeds the number of endogenous variables).
-> In Model 2, the additional moment conditions (from the extra instruments) impose further restrictions on the estimation.
H0: the instruments in Model 2 are valid (i.e., uncorrelated with the error term).
Sargan test:
1. Run the 2SLS regression using all instruments (Model 2, overidentified).
2. Obtain the residuals from this regression.
3. Regress the residuals on all the instruments.
4. The test statistic is LM = N·R², asymptotically χ² with degrees of freedom equal to the number of overidentifying restrictions.

Panel Data
Panel data refers to a dataset that combines cross-sectional data (observations across different entities, such as individuals, firms, states, or countries) with time-series data.

Pooling Independent Cross Sections across Time
Pooled OLS assumes that all observations are independent, ignoring the fact that observations from the same entity (e.g., a specific person, firm, or state) are likely correlated over time.
log(wage) = β0 + δ0y85 + β1educ + δ1(y85·educ) + β2exper + β3exper² + β4union + β5female + δ5(y85·female) + u
The variable y85 is a dummy variable equal to one if the observation comes from 1985 and zero if it comes from 1978. For the econometric analysis of panel data we cannot assume that observations are independently distributed over time.
Static linear (in parameters) panel models: "static" means that the regressors contemporaneously determine the dependent variable. Panel or longitudinal data consists of a time series for each cross-sectional member i = 1, …, N in the data set.

Pooled OLS (POLS)
Apply OLS to y and X. POLS assumes that there are no unobserved factors specific to individual units (e.g., states, firms) or time periods that influence the dependent variable; in other words, it ignores fixed effects (e.g., state-specific characteristics) or random effects. We make two assumptions:
1. E(Xi'ui) = 0
2. E(Xi'Xi) = A is nonsingular
Why don't we make the homoskedasticity assumption? It is highly unlikely to hold in applications with panel data: repeated observations of the same cross-sectional unit i likely cause error terms to be correlated over time.

First Differencing (FD) Estimator
An approach to remove unobserved time-invariant individual effects a_i that might bias the results.
y_it = β0 + δ0·d2_t + β1·x_it + a_i + u_it, t = 1, 2
where d2_t is a dummy variable for the second period. To eliminate the time-invariant component a_i, we subtract the equation at t = 1 from the equation at t = 2:
y_i2 − y_i1 = δ0 + β1(x_i2 − x_i1) + (a_i − a_i) + (u_i2 − u_i1)
Δy_i = δ0 + β1Δx_i + Δu_i
-> now the variables are period-to-period changes. When we apply OLS to this equation, we call it the first-differenced estimator (see the sketch below).
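A minimal Stata sketch of the FD estimator; the panel identifiers id, year and the variables are hypothetical:
. xtset id year              // declare the panel structure
. reg d.y d.x, vce(robust)   // OLS on first differences = FD estimator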
The FD estimator is unbiased conditional on X and consistent under strict exogeneity of the regressors conditional on a_i: E(u_it | X_i, a_i) = 0. Differencing rules out time-constant regressors, since these are wiped out by the transformation.
If u_it is a random walk (u_it = u_i,t−1 + e_it), the original errors are highly serially correlated. This creates problems for pooled OLS, because it assumes uncorrelated errors across time. Differencing, however, transforms u_it into e_it, which is independent and identically distributed (i.i.d.). -> The FD estimator allows for correlation between the regressors and the time-constant part of the error, while POLS does not.

Fixed Effects (FE) Estimation
For each cross-sectional unit i, compute the time average of the model and subtract the averaged equation from the original equation. Suppose we have for each t: y_it = β1x_it + a_i + u_it. Averaging over t gives ȳ_i = β1x̄_i + a_i + ū_i, and subtracting yields the within (demeaned) equation
y_it − ȳ_i = β1(x_it − x̄_i) + (u_it − ū_i).
The time-invariant component a_i is removed because it does not vary over time. Since N time-invariant effects are removed, the degrees of freedom are adjusted: df = N(T − 1) − K.
In addition to the standard "no perfect multicollinearity" assumption, we need all regressors to be time-varying for the estimator to be well behaved. -> The regressors are allowed to be correlated with the time-constant part of the error (the fixed effect).
When is the FE estimator unbiased? Under strict exogeneity of all explanatory variables: the idiosyncratic error u_it is uncorrelated with each explanatory variable across all time periods.
Serial correlation: we can test for first-order serial correlation by regressing the demeaned residuals ü_it on ü_i,t−1 for t > 1 and testing H0: δ = −1/(T − 1), where δ is the coefficient on ü_i,t−1.

Dummy Variable (DV) Regression
An alternative approach to estimating fixed effects: instead of eliminating a_i via transformations (like demeaning or differencing in the typical FE model), we estimate it directly by creating N dummy variables (one for each individual i, capturing its unobserved fixed effect) and adding these dummy variables to the regression. The resulting parameters are estimated using pooled OLS. -> This makes a total of N + K parameters to estimate, where K is the number of independent variables.

FE versus Dummy Variable (see the sketch below):
- With N dummy variables (one for each unit), the DV model becomes high-dimensional.
- DV regression also estimates (but imprecisely) the coefficients on time-constant variables; the imprecision arises from the collinearity between the time-constant variables and the group dummies.
- DV regression has a large R².
- FD and FE give identical results with two periods (if the FE model includes a dummy for the second time period, which corresponds to the constant in the FD model).
- If there is positive serial correlation in u_it (e.g. a random walk), the differenced error Δu_it is serially uncorrelated -> use FD. If there is substantial negative serial correlation in Δu_it, use FE!
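A minimal Stata sketch of the FE/DV equivalence (panel identifiers and variables hypothetical): the slope estimate agrees across the two approaches, while the DV regression additionally reports the unit intercepts.
. xtset id year
. xtreg y x, fe      // within (FE) estimator
. reg y x i.id       // DV regression: same coefficient on x, plus N unit dummies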
Clustering Techniques
Instead of assigning a unique dummy variable to every i, individuals with similar fixed effects a_i can be clustered together. -> This reduces the number of unique parameters to estimate, simplifying the model and reducing dimensionality. When we "cluster" by a variable (e.g., district), we assume that:
- Observations within each cluster (district) may have correlated errors. Example: socioeconomic factors or local policies affecting crime rates might create dependencies in errors for observations from the same district.
- Observations across clusters (different districts) are assumed to have independent errors.
Clustering helps improve the reliability of statistical inference by addressing two issues:
a. Within-cluster error correlation: if the errors u_it are correlated within districts, standard OLS would underestimate standard errors, leading to overly optimistic (small) p-values. -> Clustering accounts for this by allowing residuals within each district to be correlated, resulting in adjusted standard errors that are typically larger and more realistic.
b. Heteroskedasticity across clusters: variability in errors might differ across districts (e.g., districts with different populations or policies may show different error variances). -> Clustering ensures standard errors are robust to such heteroskedasticity.

Random Effects (RE) Models
The RE model allows for time-constant explanatory variables. Its key assumption (and main disadvantage) is that the random effect a_i is uncorrelated with all explanatory variables.
Estimating: pooled OLS with time-period dummies is consistent, but it ignores a feature of the model. The RE estimator accounts for the serial correlation that a_i induces in the composite error term v_it = a_i + u_it, using Generalized Least Squares (GLS).
Advantage of the RE model: it allows for time-constant explanatory variables. The RE model only addresses the serial correlation, not possible endogeneity (correlation of regressors and fixed effects).
-> GLS transformation: in a random effects model, the composite error term v_it = a_i + u_it has serial correlation due to the individual-specific component a_i. To eliminate this correlation, the GLS transformation is applied: the term v_it − λv̄_i removes the serial correlation introduced by a_i, where λ = 1 − [σ²_u / (σ²_u + T·σ²_a)]^(1/2). Once the data are transformed, OLS is applied to the transformed (quasi-demeaned) equation, which now satisfies the assumptions of the classical linear regression model.

Choice of Model, Specification Testing
General thoughts:
- FE and DV allow for correlation between a_i and x_i.
- RE is applied if the main interest is in a time-constant variable.
- RE is generally more efficient than pooled OLS.
- If a_i is not correlated with all (or a set of important) x_i, use RE rather than FE.

Subset F-Test: evaluates whether individual-specific effects a_i vary across individuals.
Null hypothesis H0: a1 = a2 = … = aN, implying no individual-specific variation (pooled OLS is sufficient).
Alternative hypothesis H1: at least some a_i ≠ a_j, requiring individual-specific fixed effects (dummy variables for each unit).
Approach: compare the pooled OLS model (restricted) with the dummy variable (DV) estimator model (unrestricted).

Breusch-Pagan Lagrange Multiplier Test: Pooled OLS vs. FE/RE
Null hypothesis H0: σ²_a = 0 -> no individual-specific random effects. This means there is no unobserved individual heterogeneity, so the RE model simplifies to the pooled OLS model. Pooled OLS assumes all individual differences (across groups) are captured entirely by the included explanatory variables.
1. Run a pooled OLS model: estimate the pooled OLS regression and compute the residuals v̂_it. These residuals reflect the variation not explained by the pooled OLS model.
2. Formulate the LM test statistic: LM = [NT / (2(T − 1))] · [ Σi(Σt v̂_it)² / (ΣiΣt v̂²_it) − 1 ]², asymptotically χ²(1) under H0.
-> Rejecting H0 rejects pooled OLS but tells us nothing about whether FE or RE is preferred (see the sketch below).
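A minimal Stata sketch of the Breusch-Pagan LM test, together with the Hausman test discussed next (panel identifiers and variables are hypothetical):
. xtset id year
. quietly xtreg y x1 x2, fe
. estimates store fe
. xtreg y x1 x2, re
. estimates store re
. xttest0           // Breusch-Pagan LM: H0 sigma_a^2 = 0 (pooled OLS adequate)
. hausman fe re     // W statistic: H0 RE is consistent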
Hausman Test (see the sketch above): used to decide between a random effects (RE) and a fixed effects (FE) model. Estimate the regression model using both. The test evaluates whether the RE estimator is consistent by comparing the coefficient vectors from FE (b) and RE (B):
W = (b − B)'[Var(b) − Var(B)]⁻¹(b − B), asymptotically χ² under H0.
H0: a_i and x_itj are uncorrelated for all i, j -> the RE model is appropriate (the difference in coefficients is not systematic, i.e., RE is consistent).
H1: a_i and x_itj are correlated for some i, j -> if H0 is rejected, the FE model is appropriate (the difference in coefficients is systematic, i.e., RE is inconsistent because the random effects are correlated with the regressors).
Caveats:
1) If x_itj is not strictly exogenous, both estimators are inconsistent.
2) If the specification of the error structure in the RE model is incorrect, the resulting size of the test based on W can be greater than the nominal size.
3) The test assumes that all regressors are time-varying.

Sargan Test for Overidentifying Restrictions
POLS and RE models assume that the unobserved individual effects a_i are uncorrelated with the regressors X_i: E(a_i | X_i) = E(a_i) = 0. This assumption imposes additional moment restrictions compared to the fixed-effects (FE) model. The Sargan test evaluates the validity of these restrictions by introducing time-averaged regressors (like seasonbar) into the model and testing whether their coefficients are zero: H0: δ = 0, where δ represents the coefficients on the time averages of the regressors. If H0 is rejected, it suggests that the RE assumptions are violated because the unobserved effects a_i are not independent of the regressors.

Endogeneity -> using IV estimation: requires the availability of at least K valid instruments.
- P2SLS (Pooled Two-Stage Least Squares): a method that can replace POLS to account for endogeneity by using instruments in two stages.
- REIV (Random Effects IV): a method that extends the RE model by using a system 2SLS estimator to obtain the appropriate weighting factors. This method uses instruments with transformed data and residuals from the first stage of 2SLS.
- FEIV (Fixed Effects IV): an extension of the FE model that applies P2SLS to the demeaned (within) equation. This method uses time-varying instruments that are transformed (demeaned) to address the endogeneity issue.

Remark: Unbalanced Panel
In many data sets, information on some cross-sectional units (individuals, firms) is missing for some time periods. Systematic missingness (attrition): data are missing for reasons related to the cross-sectional unit's characteristics, e.g., unprofitable firms leaving the market. This introduces a sample selection problem, leading to biased estimates. The reason for missing data must not be correlated with the error term u_it.

Binary Dependent Variable Models
A limited dependent variable is a dependent variable whose range of values is substantially restricted. -> βj can no longer be interpreted as the change in y given a one-unit change in xj.
Linear Probability Model (LPM): the probability of "success" (y = 1) is a linear function of the parameters β.
Measuring goodness of fit in the LPM: percent correctly predicted: set ỹ = 1 if ŷ > 0.5 and compare ỹ to y. The percent correctly predicted for the entire sample is the total number of correct predictions (true positives and true negatives) divided by the total number of observations (see the sketch below).
Drawback: the LPM allows the fitted values P̂(y = 1 | x) to go below 0 or above 1, and partial effects can exceed 1. P(y = 1 | x) is supposed to represent a probability, and probabilities are restricted to the range [0, 1]; however, because the LPM is a linear regression, it does not impose this restriction.
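A minimal Stata sketch of the LPM and the percent-correctly-predicted measure (the variable names yhat, ytilde are hypothetical):
. reg y x1 x2, vce(robust)    // LPM; robust SEs, since the LPM error is heteroskedastic
. predict yhat                // fitted probabilities (can fall outside [0,1])
. gen ytilde = (yhat > 0.5) if !missing(yhat)
. tab ytilde y                // correct predictions are on the diagonal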
Logit and Probit Models
Our interest lies primarily in modelling the conditional response probability P(y = 1 | x). Logit and probit models can be derived from an underlying latent variable model:
y* = β0 + xβ + e, y = 1[y* > 0]
where y* is a latent (unobserved) variable that determines the binary outcome y. The likelihood function for the probit model is constructed from the joint probability of observing the data given the model parameters β. The model:
- ensures that predicted probabilities are always between 0 and 1, because it uses a CDF;
- allows marginal effects to vary with x, reflecting the realistic nature of probabilities that change non-linearly;
- accounts for the heteroskedasticity of the binary response, because estimation builds on the assumed distribution of the error term (standard normal for probit).
The relative effect of two variables xj and xh is the ratio of the two partial effects: βj/βh. Because g(·) depends on the specific value of x, marginal effects are not constant in nonlinear models.
Maximum Likelihood Estimation: MLE takes the distributional assumptions into account, so it automatically accounts for heteroskedasticity. The log-likelihood function of observation i is obtained by taking the log:
ℓi(β) = yi·log G(xiβ) + (1 − yi)·log[1 − G(xiβ)]
where G is the standard normal CDF (probit) or the logistic CDF (logit).

Interpretation of Logit and Probit Estimates
Pseudo R-squared: pseudo-R² = 1 − (L_UR/L_R) provides a measure of how well the explanatory variables in the model explain the variation in the dependent variable. It is calculated using the log-likelihood values from the restricted (intercept-only) model, L_R, and the unrestricted (full, with explanatory variables) model, L_UR.
Partial effects: for a continuous xj, ∂P(y = 1 | x)/∂xj = g(β0 + xβ)·βj, where g is the density corresponding to the CDF G.

Binary Dependent Variable Models - Advanced Topics
Multiple equations or decisions: the bivariate probit model, a statistical approach used to model joint binary outcomes that might be correlated. The decisions are correlated when the covariance of the errors is non-zero (ρ ≠ 0). Marginal effects: joint estimation.
Testing for zero correlation (ρ = 0), with a common set of regressors x: estimating two separate models imposes the restriction ρ = 0; if it is true, then the sum of the log-likelihoods of the two models (i.e. logL1 + logL2) is the log-likelihood of the bivariate model. Any increase in the likelihood therefore comes from ρ ≠ 0. Use an LR test with one exclusion restriction:
LR = 2[logL_biv − (logL1 + logL2)], asymptotically χ²(1).

Omitted variables / neglected heterogeneity
The distribution of the latent-model error term e could be misspecified, e.g. not normal in a probit model or due to the omission of variables. The structural partial effect is:

Endogeneity
Latent error not independent of the regressors. Methods to correct for endogenous explanatory variables:
- Simplest: use the LPM and estimate by 2SLS.
- Probit model with continuous endogenous regressors: conditional maximum likelihood (CMLE), sometimes called IV probit (as in STATA: ivprobit).
-> For a binary endogenous regressor:

Heteroskedasticity of the latent-model error term
An extension of the standard probit model in which the error variance is allowed to depend on a specified variable; it models the log of the error variance as a linear function of income: the standard deviation of e|x is exp(x1δ).
The Likelihood Ratio (LR) test is used in the heteroskedastic probit model to determine whether heteroskedasticity is present in the data. The test compares two models:
1. Restricted model (homoskedastic probit): assumes constant variance (ln σ² = 0, i.e. δ = 0), meaning the error variance does not depend on any covariates.
2. Unrestricted model (heteroskedastic probit): allows the error variance to depend on a covariate (in this case, income).
The LR test statistic is LR = 2(LL_unrestricted − LL_restricted), asymptotically χ² with degrees of freedom equal to the number of restrictions.
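A minimal Stata sketch of this LR test using hetprobit (variable names hypothetical; income drives the variance as in the example above; ll_r, ll_u are hypothetical scalar names):
. probit y x1 income                   // restricted: homoskedastic probit
. scalar ll_r = e(ll)
. hetprobit y x1 income, het(income)   // unrestricted: variance depends on income
. scalar ll_u = e(ll)
. display "LR = " 2*(ll_u - ll_r) ", p-value = " chi2tail(1, 2*(ll_u - ll_r))
Note that hetprobit also reports an LR test against the homoskedastic probit in its output.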