Unit 3: Multiple Regression Model
Martin / Zulehner
Summary
Lecture slides for an introductory econometrics course covering the multiple linear regression model: interpretation, the OLS estimator and its assumptions, omitted variable bias, the variance of the OLS estimator, goodness of fit, and related exercises.
Unit 3: Multiple regression model

Outline
1 Introduction and interpretation of MLR
2 OLS estimator: Assumptions, OLS Estimator, Partitioned Regression, Omitted Variable Bias, Unbiasedness
3 Variance of the OLS estimator: Assumption, Variance estimation, Properties
4 Goodness of Fit
5 Exercises
  Exercise 1: derive OLS estimator, mean & variance of OLS
  Exercise 2: omitted variable bias
  Exercise 3: best linear prediction
  Exercise 4: Frisch-Waugh (1933) Theorem
  Exercise 5: The CEF-Decomposition Property
  Exercise 6: What is the direction of the bias?
  Exercise 7: Examples from daily life

1. The Multiple Linear Regression (MLR) model

The major drawback of the SLR is that the key assumption SLR.4 is often unrealistic. The MLR explicitly allows us to control for many other factors, so it is more sensible to talk about ceteris paribus analysis. The model with K independent variables is given by

\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots + \beta_K x_K + u \tag{1} \]

The key assumption of the MLR is

\[ E(u \mid x_1, x_2, x_3, \dots, x_K) = 0 \tag{2} \]

Specification
- There exists a linear relationship (in the parameters!) between the K observable exogenous variables x1, x2, ..., xK (the regressors) and the observable endogenous variable y.
- Equation (1) is still fairly general, as x can include nonlinear functions of underlying variables (e.g. logarithms, squares, reciprocals, log-odds, and interactions).
- The explanatory variables xj (j = 1, ..., K) influence y and not the other way around.
- The correlation among the explanatory variables is not perfect.
- There exist further unobservable variables which influence y non-systematically; they are combined in u.

Specification - Random Sample
In what follows, we assume again that we can collect an i.i.d. random sample from the underlying population. For the randomly sampled observation units i = 1, ..., N, this means:

\[ y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \dots + \beta_K x_{iK} + u_i \tag{3} \]

Specification - System
We therefore get the following system of equations:

\[
\begin{aligned}
y_1 &= \beta_0 + \beta_1 x_{11} + \beta_2 x_{12} + \dots + \beta_K x_{1K} + u_1 \\
y_2 &= \beta_0 + \beta_1 x_{21} + \beta_2 x_{22} + \dots + \beta_K x_{2K} + u_2 \\
&\;\;\vdots \\
y_N &= \beta_0 + \beta_1 x_{N1} + \beta_2 x_{N2} + \dots + \beta_K x_{NK} + u_N
\end{aligned}
\]

Specification - Matrix form
We can express it in matrix notation:

\[ y = X\beta + u \tag{4} \]

- y: (N × 1) vector of the dependent variable
- u: (N × 1) vector of the error term
- β: ((K + 1) × 1) vector of the unknown parameters
- X: (N × (K + 1)) matrix of the regressors, where the first column consists only of ones (i.e. for notational convenience, we absorb the intercept into the matrix X)

\[
y = \begin{pmatrix} y_1 \\ \vdots \\ y_N \end{pmatrix}, \quad
u = \begin{pmatrix} u_1 \\ \vdots \\ u_N \end{pmatrix}, \quad
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_K \end{pmatrix}, \quad
X = \begin{pmatrix}
1 & x_{11} & x_{12} & \cdots & x_{1K} \\
1 & x_{21} & x_{22} & \cdots & x_{2K} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_{N1} & x_{N2} & \cdots & x_{NK}
\end{pmatrix}
\]
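To make the matrix notation concrete, here is a minimal R sketch that builds y = Xβ + u from simulated data; the sample size, coefficient values, and variable names are illustrative assumptions, not part of the lecture.

```r
# Minimal sketch of the matrix form y = X beta + u (simulated data)
set.seed(1)
N <- 100
x1 <- rnorm(N); x2 <- rnorm(N)
beta <- c(1, 0.5, -0.3)        # (beta_0, beta_1, beta_2)'
X <- cbind(1, x1, x2)          # first column of ones absorbs the intercept
y <- X %*% beta + rnorm(N)     # (N x 1) vector of the dependent variable
dim(X)                         # N x (K + 1), here 100 x 3
# model.matrix() builds the same regressor matrix from a formula
all.equal(X, model.matrix(~ x1 + x2), check.attributes = FALSE)
```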
The MLR - An example
We are interested in the effect of education (years of schooling) on hourly wage:

\[ wage = \beta_0 + \beta_1 educ + \beta_2 exper + u \tag{5} \]

- We control for years of labor market experience (exper).
- We are still primarily interested in the effect of education.
- The MLR takes exper out of the error term u. In the SLR we would have to assume that experience is uncorrelated with education (since experience would be part of the error term!).
- β1 measures the ceteris paribus effect of educ on wage; we hold exper constant.
- β2 measures the ceteris paribus effect of exper on wage; we hold educ constant.

Interpreting Multiple Regression
Take the fitted values for the MLR model (1):
  ŷ = β̂0 + β̂1x1 + β̂2x2 + ... + β̂K xK
Consider the changes in the variables (i.e. the ∆ operator). Thus:
  ∆ŷ = β̂1∆x1 + β̂2∆x2 + ... + β̂K ∆xK
Holding x2, ..., xK fixed (which means ∆x2 = 0, ..., ∆xK = 0) implies that ∆ŷ = β̂1∆x1; that is, each β̂j has a ceteris paribus interpretation.

Simple vs. Multiple Regression Estimate
Compare the simple regression ỹ = β̃0 + β̃1x1 with the multiple regression ŷ = β̂0 + β̂1x1 + β̂2x2. In general β̃1 ≠ β̂1 unless:
1 β̂2 = 0 (i.e. no partial effect of x2), or
2 x1 and x2 are uncorrelated in the sample
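A numerical comparison of the two estimates, sketched with simulated data (the data-generating process and all parameter values are assumptions chosen for illustration): because x2 has a partial effect and is correlated with x1, the simple and multiple regression estimates of β1 differ.

```r
# Sketch: simple vs. multiple regression with correlated regressors
set.seed(42)
N  <- 5000
x1 <- rnorm(N)
x2 <- 0.6 * x1 + rnorm(N)      # x1 and x2 are correlated
y  <- 1 + 2 * x1 - 1.5 * x2 + rnorm(N)
coef(lm(y ~ x1))["x1"]         # beta-tilde_1: picks up part of x2's effect
coef(lm(y ~ x1 + x2))["x1"]    # beta-hat_1: close to the true value 2
```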
2. The OLS estimator

Assumption MLR.1 - Linear in parameters
The model in the population can be written as

\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots + \beta_K x_K + u \tag{6} \]

where β0, β1, ..., βK are the unknown parameters (constants) of interest and u is an unobservable random error or disturbance term. This assumption describes the population relationship we hope to estimate. It explicitly sets out the βj, the ceteris paribus population effects of the xj on y, as the parameters of interest.

Assumption MLR.2 - Random sampling
We have a random sample (i.i.d.) of N observations, {(xi1, xi2, ..., xiK, yi) : i = 1, 2, ..., N}, following the population model in assumption MLR.1. This random sampling assumption means that we have data that can be used to estimate the βj, and that the data have been chosen to be representative of the population described in assumption MLR.1.

Assumption MLR.3 - No perfect collinearity
In the sample (and therefore in the population), none of the independent variables is constant, and there are no exact linear relationships among the independent variables. If we have sample variation in each independent variable and no exact linear relationships among them, we can compute β̂. In matrix notation this condition can be expressed as:
  rank(X) = rank(X′X) = K + 1
It means that the K + 1 columns of the matrix X are linearly independent (i.e. the K explanatory variables are linearly independent of each other and of the constant). This is very important because it allows us to define (X′X)⁻¹.

Assumption MLR.4 - Zero conditional mean
The error u has an expected value of zero and is uncorrelated with each explanatory variable:

\[ E(X'u) = E(u) = 0 \tag{7} \]

Assuming that unobservables are, on average, unrelated to the explanatory variables is key to deriving the unbiasedness of the OLS estimator. It fails in the case of (i) omitted variables, (ii) a misspecified functional form, (iii) measurement error, and (iv) simultaneity. Sufficient for (7) is the stronger zero conditional mean assumption

\[ E(u \mid X) = 0 \tag{8} \]

OLS Estimator
The objective of the Ordinary Least Squares (OLS) estimator is again to minimize the sum of squared residuals (∑ᵢ ûᵢ² → min!). The sum of squared residuals can be expressed in matrix notation:

\[ \sum_{i=1}^N \hat u_i^2 = \hat u'\hat u = (y - X\hat\beta)'(y - X\hat\beta) \tag{9} \]

The OLS estimator for the parameter vector β is a linear combination of X and y, resulting from the first-order condition for the minimum:

\[ \hat\beta = (X'X)^{-1} X'y \tag{10} \]

The system of normal equations (X′X)β̂ = X′y has a unique solution if the determinant of X′X is unequal to zero. Hence, β̂ can be defined if and only if det(X′X) ≠ 0.

Derivation
The objective function of our minimization problem is therefore:

\[
\hat u'\hat u = (y - X\hat\beta)'(y - X\hat\beta)
= y'y - \hat\beta'X'y - y'X\hat\beta + \hat\beta'X'X\hat\beta
= y'y - 2\hat\beta'X'y + \hat\beta'X'X\hat\beta
\]

since y′Xβ̂ = (y′Xβ̂)′ = β̂′X′y (a scalar). The first-order condition with respect to β̂ is:

\[ \frac{\partial(\hat u'\hat u)}{\partial \hat\beta} = -2X'y + 2(X'X)\hat\beta = 0 \]

→ β̂ = (X′X)⁻¹X′y if rank(X) = K + 1
→ û = y − Xβ̂ = y − X(X′X)⁻¹X′y = [I_N − P_X]y = M_X y
- properties of P_X: P_X X = X and P_X P_X = P_X
- properties of M_X: M_X X = 0, M_X′ = M_X, and M_X M_X = M_X

Geometric interpretation of the OLS estimator
OLS estimation can be viewed as a projection of y onto the linear space spanned by the regressors (here each of X1 and X2 refers to a column of the data matrix). [Figure: orthogonal projection of y onto the column space of X. Source: Wikipedia, https://shorturl.at/b2me8]
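The closed form (10) and the projection matrices can be checked numerically. The following R sketch (simulated data; names and values are illustrative assumptions) computes β̂ from the normal equations and verifies the properties of P_X and M_X:

```r
# Sketch: OLS via the normal equations, checked against lm()
set.seed(7)
N <- 200
x1 <- rnorm(N); x2 <- rnorm(N)
y  <- 1 + 2 * x1 - 1.5 * x2 + rnorm(N)
X  <- cbind(1, x1, x2)
fit <- lm(y ~ x1 + x2)
beta_hat <- solve(crossprod(X), crossprod(X, y))   # (X'X)^{-1} X'y
cbind(normal_eq = beta_hat, lm = coef(fit))        # identical up to rounding
P <- X %*% solve(crossprod(X)) %*% t(X)            # projection matrix P_X
M <- diag(N) - P                                   # annihilator M_X
all.equal(P %*% X, X, check.attributes = FALSE)    # P_X X = X
max(abs(M %*% X))                                  # M_X X = 0 (numerically)
all.equal(drop(M %*% y), unname(residuals(fit)))   # M_X y gives the residuals
```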
Partitioned Regression
Imagine that you can partition the set of regressors into two groups, e.g. X1 = (constant, time trend, seasonal effects) and X2 = (other variables):
  y = Xβ + u = X1β1 + X2β2 + u
By slightly re-arranging the OLS estimator we obtain (X′X)β̂ = X′y. If X = [X1, X2], then:

\[
X'X = \begin{pmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{pmatrix}
\quad \text{and} \quad
X'y = \begin{pmatrix} X_1'y \\ X_2'y \end{pmatrix}
\]

The Frisch-Waugh (1933) Theorem
The OLS estimator can then be written as

\[
(X'X)\hat\beta = \begin{pmatrix} X_1'X_1 & X_1'X_2 \\ X_2'X_1 & X_2'X_2 \end{pmatrix}
\begin{pmatrix} \hat\beta_1 \\ \hat\beta_2 \end{pmatrix}
= \begin{pmatrix} X_1'y \\ X_2'y \end{pmatrix}
\]

which can be re-written as:

\[
\begin{aligned}
X_1'X_1\hat\beta_1 + X_1'X_2\hat\beta_2 = X_1'y \;&\Rightarrow\; X_1'X_1\hat\beta_1 = X_1'(y - X_2\hat\beta_2) \\
X_2'X_1\hat\beta_1 + X_2'X_2\hat\beta_2 = X_2'y \;&\Rightarrow\; X_2'X_2\hat\beta_2 = X_2'(y - X_1\hat\beta_1)
\end{aligned}
\]

And therefore:

\[ \hat\beta_1 = (X_1'X_1)^{-1}X_1'(y - X_2\hat\beta_2) \tag{11} \]
\[ \hat\beta_2 = (X_2'X_2)^{-1}X_2'(y - X_1\hat\beta_1) \tag{12} \]

which are the regressions of (y − X2β̂2) on X1 and of (y − X1β̂1) on X2. Solving the system of equations (11) and (12), we get

\[ \hat\beta_1 = (X_1'M_2X_1)^{-1}(X_1'M_2y), \qquad \hat\beta_2 = (X_2'M_1X_2)^{-1}(X_2'M_1y) \]

where M1 = I_N − X1(X1′X1)⁻¹X1′ and M2 = I_N − X2(X2′X2)⁻¹X2′.

A "partialing out" interpretation
The previous considerations imply that regressing y on X1 and X2 gives the same effect for X2 as regressing y on the residuals from a regression of X2 on X1:
1. y = Xβ + u = X1β1 + X2β2 + u
2. regress X2 on X1 → residuals
3. regress y on the residuals
Hence, only the part of X2 that is uncorrelated with X1 is related to y, so we are estimating the effect of X2 on y after X1 has been "partialled out". If the variables in a MLR model are uncorrelated (i.e. the regressors are "orthogonal"), then the estimated coefficients are exactly the same as those from the respective simple linear regressions. The reason is that, if the regressors are uncorrelated, each of them contains no information about the others and therefore there is no effect to be "partialled out"!

A "partialing out" interpretation - simple case
Consider the case where K = 2, i.e. ŷ = β̂0 + β̂1x1 + β̂2x2. Then one can show (see the sketch below):

\[ \hat\beta_1 = \frac{\sum_{i=1}^N \hat r_{i1} y_i}{\sum_{i=1}^N \hat r_{i1}^2} \]

where r̂i1 are the OLS residuals from the estimated regression of x1 on x2:
  x̂1 = γ̂0 + γ̂2x2 → r̂1 = x1 − (γ̂0 + γ̂2x2)
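A short R sketch of the partialling-out result on simulated data (the data-generating process is an illustrative assumption): the coefficient on x1 from the full regression equals the coefficient from regressing y on the residuals of x1 on x2.

```r
# Sketch: Frisch-Waugh "partialling out", verified numerically
set.seed(3)
N  <- 1000
x2 <- rnorm(N)
x1 <- 0.5 * x2 + rnorm(N)             # x1 correlated with x2
y  <- 1 + 2 * x1 + 3 * x2 + rnorm(N)
b1_full <- coef(lm(y ~ x1 + x2))["x1"]
r1 <- residuals(lm(x1 ~ x2))          # part of x1 orthogonal to x2
b1_fw <- coef(lm(y ~ r1))["r1"]       # regress y on the residuals
c(full = unname(b1_full), frisch_waugh = unname(b1_fw))  # the two coincide
sum(r1 * y) / sum(r1^2)               # the summation formula gives the same value
```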
Example: wage regression
1. wage = β0 + β1 education + β2 experience + β3 female + u
There are two ways to obtain an estimate for β1:
1. regress wage on education, experience and female
2. regress education on experience and female, obtain the residuals (reducation), and regress wage on reducation:
  education = α0 + α1 experience + α2 female + v
  → reducation = education − (α̂0 + α̂1 experience + α̂2 female)
  wage = γ0 + γ1 reducation + w → γ̂1 = β̂1
and compare with the first regression.
2. log(wage) regression: lwage = β0 + β1 education + β2 experience + β3 female + u

Example: wage function - CPS 2015 (N = 7098)

Dependent variable:      (1) wage    (2) education  (3) wage    (4) lwage   (5) education  (6) lwage
education                9.846***                               0.462***
                         (0.262)                                (0.011)
experience               0.531***    0.001                      0.024***    0.001
                         (0.045)     (0.002)                    (0.002)     (0.002)
female                   -4.144***   0.151***                   -0.178***   0.151***
                         (0.266)     (0.012)                    (0.012)     (0.012)
reducation (residuals)                              9.846***                               0.462***
                                                    (0.267)                                (0.012)
Constant                 2.045       0.445***       21.237***   2.027***    0.445***       2.913***
                         (1.355)     (0.061)        (0.132)     (0.059)     (0.061)        (0.006)
Adjusted R-squared       0.189       0.022          0.161       0.208       0.022          0.180
# of observations        7098        7098           7098        7098        7098           7098

Too many or too few variables
What happens if we include variables in our specification that actually do not belong to it?
- There is no effect on our parameter estimates, and OLS remains unbiased. Why?
What, however, if we exclude a variable from our specification that indeed does belong to it?
- OLS will usually be biased. This is called omitted variable bias.

Omitted Variable Bias - I
Suppose the true model is given as:
  y = β0 + β1x1 + β2x2 + u
But we only estimate:
  ỹ = β̃0 + β̃1x1 + ũ
Then

\[ \tilde\beta_1 = \frac{\sum_{i=1}^N (x_{i1} - \bar x_1)\, y_i}{\sum_{i=1}^N (x_{i1} - \bar x_1)^2} \tag{13} \]

Omitted Variable Bias - II
Recall that the true model is yi = β0 + β1xi1 + β2xi2 + ui. Thus, the numerator becomes:

\[
\sum_{i=1}^N (x_{i1} - \bar x_1)(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + u_i)
= \beta_1 \sum_{i=1}^N (x_{i1} - \bar x_1)^2 + \beta_2 \sum_{i=1}^N (x_{i1} - \bar x_1)x_{i2} + \sum_{i=1}^N (x_{i1} - \bar x_1)u_i \tag{14}
\]

Omitted Variable Bias - III
Inserting equation (14) into (13):

\[
\tilde\beta_1 = \beta_1 + \beta_2 \frac{\sum_{i=1}^N (x_{i1} - \bar x_1)x_{i2}}{\sum_{i=1}^N (x_{i1} - \bar x_1)^2} + \frac{\sum_{i=1}^N (x_{i1} - \bar x_1)u_i}{\sum_{i=1}^N (x_{i1} - \bar x_1)^2}
\]

Since E(ui) = 0, taking the expectation yields:

\[
E(\tilde\beta_1) = \beta_1 + \beta_2 \frac{\sum_{i=1}^N (x_{i1} - \bar x_1)x_{i2}}{\sum_{i=1}^N (x_{i1} - \bar x_1)^2}
\]

Omitted Variable Bias - IV
Consider the regression of x2 on x1: x̃2 = δ̃0 + δ̃1x1. Then:

\[ \tilde\delta_1 = \frac{\sum_{i=1}^N (x_{i1} - \bar x_1)x_{i2}}{\sum_{i=1}^N (x_{i1} - \bar x_1)^2} \]

Thus:

\[ E(\tilde\beta_1) = \beta_1 + \beta_2 \tilde\delta_1 \]

Omitted Variable Bias Summary
Possible outcomes:

           Corr(x1, x2) > 0   Corr(x1, x2) < 0
  β2 > 0   positive bias      negative bias
  β2 < 0   negative bias      positive bias

Two cases where the bias is equal to zero:
- β2 = 0, that is, x2 does not really belong to the model
- x1 and x2 are uncorrelated in the sample
If the correlation between x2 and x1 and the correlation between x2 and y have the same direction, the bias will be positive. If they have opposite directions, the bias will be negative.

Example: wage function - CPS 2015 (N = 7098)
1 wage = β0 + β1 educ + u
2 wage = β0 + β1 educ + β2 age + v
3 age = δ0 + δ1 educ + w
4 wage = β0 + β1 educ + β2 female + v
5 female = δ0 + δ1 educ + w

Dependent variable:  (1) wage   (2) wage   (3) age    (4) wage   (5) female
constant             16.381     -0.029     29.634     17.823     0.340
                     (0.193)    (1.371)    (0.050)    (0.211)    (0.008)
educ                 9.234      9.238      -0.007     9.857      0.147
                     (0.267)    (0.264)    (0.068)    (0.265)    (0.012)
age                             0.554
                                (0.046)
female                                                -4.244
                                                      (0.268)
adjusted R²          0.145      0.162      -0.000     0.174      0.022

Exercise: code in R (a sketch follows below).
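As a sketch for the "code in R" exercise, with simulated data standing in for the CPS sample (all parameter values are illustrative assumptions), the in-sample identity β̃1 = β̂1 + β̂2·δ̃1 can be verified directly:

```r
# Sketch: omitted variable bias, beta-tilde_1 = beta-hat_1 + beta-hat_2 * delta-tilde_1
set.seed(123)
N  <- 10000
x1 <- rnorm(N)
x2 <- 0.8 * x1 + rnorm(N)                  # population delta_1 = 0.8
y  <- 1 + 2 * x1 + 1.5 * x2 + rnorm(N)     # true beta_1 = 2, beta_2 = 1.5
b_short <- coef(lm(y ~ x1))["x1"]          # beta-tilde_1 (x2 omitted), biased upward
fit_long <- lm(y ~ x1 + x2)
b1 <- coef(fit_long)["x1"]
b2 <- coef(fit_long)["x2"]
d1 <- coef(lm(x2 ~ x1))["x1"]              # delta-tilde_1
c(short = unname(b_short), decomposition = unname(b1 + b2 * d1))  # exactly equal
```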
The More General Case
Technically, one can only sign the bias in the more general case if all of the included x's are uncorrelated. Typically, then, we work through the bias assuming the x's are uncorrelated, as a useful guide even if this assumption is not strictly true.

Unbiasedness
Theorem 1 - Unbiasedness of OLS
Under assumptions MLR.1 through MLR.4, E(β̂) = β, or E(β̂j) = βj for j = 0, 1, ..., K, for any values of the population parameters βj. In other words, the OLS estimators are unbiased estimators of the population parameters.

Unbiasedness - Derivation
We showed that:
  β̂ = (X′X)⁻¹X′y = (X′X)⁻¹X′(Xβ + u) = β + (X′X)⁻¹X′u
Therefore, taking expectations:
  E(β̂) = E(β + (X′X)⁻¹X′u) = β + (X′X)⁻¹X′E(u) = β, by MLR.4

3. Variance of the OLS estimator

Assumption MLR.5 - Homoskedasticity
The error u has the same variance independently of the values taken by (x1, ..., xK). In other words:

\[ Var(u_i \mid x_{i1}, \dots, x_{iK}) = \sigma^2 \tag{15} \]

Compared to MLR.4, this assumption is of secondary importance: MLR.5 has no implications for the unbiasedness of β̂. However, it helps to derive formulas for the sampling variance.

Variance-Covariance matrix
The i.i.d. assumption (MLR.2) together with the zero conditional mean assumption (MLR.4) and the homoskedasticity assumption (MLR.5) imply that the variance-covariance matrix of the error term u is:

\[
Var(u \mid X) = E(uu') = \sigma^2 I_N =
\begin{pmatrix}
\sigma^2 & 0 & \cdots & 0 \\
0 & \sigma^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma^2
\end{pmatrix}
\]

Theorem 2 - Variance of OLS estimator
Under assumptions MLR.1 through MLR.5, conditional on the sample values of the independent variables X, the variance-covariance matrix of the least squares estimator β̂ is given by:

\[ Var(\hat\beta) = \sigma^2 (X'X)^{-1} \tag{16} \]

or alternatively

\[ Var(\hat\beta_j) = \frac{\sigma^2}{SST_j (1 - R_j^2)} \quad \text{for } j = 1, 2, \dots, K, \tag{17} \]

where SST_j = ∑ᵢ(x_{ij} − x̄j)² is the total sample variation in xj, and Rj² is the R-squared from regressing xj on all other independent variables (including an intercept).

Derivation of the Variance
We showed that β̂ = β + (X′X)⁻¹X′u, therefore β̂ − β = (X′X)⁻¹X′u. Note that ((X′X)⁻¹X′u)′ = u′X(X′X)⁻¹, as ((X′X)⁻¹)′ = (X′X)⁻¹. Using the Var operator:

\[
\begin{aligned}
Var(\hat\beta) &= E[(\hat\beta - \beta)(\hat\beta - \beta)'] \\
&= E[((X'X)^{-1}X'u)((X'X)^{-1}X'u)'] \\
&= E[(X'X)^{-1}X'uu'X(X'X)^{-1}] \\
&= (X'X)^{-1}X'E(uu')X(X'X)^{-1} \\
&= \sigma^2 (X'X)^{-1}X'X(X'X)^{-1} = \sigma^2 (X'X)^{-1}
\end{aligned}
\]

For the derivation using summation notation instead of matrix notation, see Wooldridge p. 115. A Monte Carlo check of this formula is sketched below.
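A Monte Carlo sketch of Theorem 2 (fixed design, known σ; the replication count and all values are illustrative assumptions): the sampling variance of β̂ across replications approaches σ²(X′X)⁻¹.

```r
# Sketch: Monte Carlo check of Var(beta-hat) = sigma^2 (X'X)^{-1}
set.seed(99)
N <- 50; sigma <- 2
X <- cbind(1, rnorm(N), rnorm(N))     # regressor matrix held fixed across draws
beta <- c(1, 0.5, -0.3)
R <- 5000
draws <- replicate(R, {
  y <- X %*% beta + rnorm(N, sd = sigma)
  drop(solve(crossprod(X), crossprod(X, y)))   # OLS estimate for this draw
})
V_theory <- sigma^2 * solve(crossprod(X))
V_mc <- var(t(draws))                 # sampling variance across replications
round(V_theory, 3)
round(V_mc, 3)                        # close to V_theory for large R
```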
Since we do not observe σ², we have to estimate it:

\[ \hat\sigma^2 = \frac{\sum_{i=1}^N \hat u_i^2}{N - K - 1} = \frac{\hat u'\hat u}{N - K - 1} = SSR/df \tag{18} \]

where df = N − (K + 1) are the degrees of freedom, i.e. the number of observations minus the number of estimated parameters. Hence the estimated variance of the OLS estimator is:

\[ \widehat{Var}(\hat\beta) = \hat\sigma^2 (X'X)^{-1} \tag{19} \]

or

\[ \widehat{Var}(\hat\beta_j) = \frac{\hat\sigma^2}{SST_j(1 - R_j^2)} \quad \text{for } j = 1, \dots, K \tag{20} \]

The variance of the OLS estimator equals the variance of the error term weighted by the variation in the regressors.

Components of OLS Variances
- The error variance: a larger σ² implies a larger variance for the OLS estimators.
- The total sample variation: a larger SST_j implies a smaller variance for the estimators.
- Linear relationships among the independent variables: a larger Rj² implies a larger variance for the estimators.

Unbiasedness of the OLS estimator of the variance
Theorem 3 - Unbiased Estimation of σ²
Under the Gauss-Markov assumptions MLR.1 through MLR.5,

\[ E(\hat\sigma^2) = \sigma^2 \tag{21} \]

The estimator σ̂² of σ² is unbiased because it is normalized by the number of degrees of freedom, i.e. the number of observations minus the number of estimated parameters.

Efficiency of the OLS estimator
Theorem 4 - Gauss-Markov Theorem
Under assumptions MLR.1-MLR.5, β̂0, β̂1, ..., β̂K are the best (i.e. smallest variance) linear unbiased estimators (BLUEs) of β0, β1, ..., βK, respectively. That means the OLS estimator has the smallest variance among all linear unbiased estimators. In other words, OLS is BLUE, i.e. the Best Linear Unbiased Estimator.

...More formally
Theorem 4 - Gauss-Markov Theorem
Under assumptions MLR.1-MLR.5, β̂ is the best linear unbiased estimator (BLUE), i.e.
  Var(β̃) − Var(β̂) = σ²DD′ is positive semidefinite,
where β̃ is an alternative linear unbiased estimator of β and D is defined by β̃ − β̂ = Dy.

Proof of the Gauss-Markov Theorem
Let us start by showing under which conditions (any) alternative estimator β̃ is unbiased. Define β̃ = ((X′X)⁻¹X′ + D)y. Taking the expectation:

\[
\begin{aligned}
E(\tilde\beta) &= E[((X'X)^{-1}X' + D)(X\beta + u)] \\
&= (X'X)^{-1}X'X\beta + DX\beta + \underbrace{(X'X)^{-1}X'E(u)}_{=0} + \underbrace{D\,E(u)}_{=0} \\
&= (I + DX)\beta
\end{aligned}
\]

Hence β̃ is unbiased if and only if DX = 0.
Definition: a matrix M is said to be positive semidefinite if z′Mz ≥ 0 for all z ∈ ℝⁿ \ {0}; here M = DD′.

Proof of the Gauss-Markov Theorem (continued)
Now denote C = (X′X)⁻¹X′ + D and use the variance operator:

\[
\begin{aligned}
Var(\tilde\beta) &= Var(Cy) = C\,Var(y)\,C' = \sigma^2 CC' \\
&= \sigma^2 [(X'X)^{-1}X' + D][X(X'X)^{-1} + D'] \\
&= \sigma^2 [(X'X)^{-1}X'X(X'X)^{-1} + (X'X)^{-1}X'D' + DX(X'X)^{-1} + DD'] \\
&= \sigma^2 (X'X)^{-1} + \sigma^2 (X'X)^{-1}\underbrace{(DX)'}_{=0} + \sigma^2 \underbrace{(DX)}_{=0}(X'X)^{-1} + \sigma^2 DD' \\
&= \sigma^2 (X'X)^{-1} + \sigma^2 DD' = Var(\hat\beta) + \sigma^2 DD'
\end{aligned}
\]

Since DD′ is a positive semidefinite matrix, the variance of β̃ is larger than the variance of β̂.

The coefficient of determination R²
The coefficient of determination is:

\[ R^2 = \frac{SSE}{SST} = \frac{\widehat{Var}(\hat y)}{\widehat{Var}(y)} \]

Since y = ŷ + û = Xβ̂ + û, then

\[ y'y = (\hat y + \hat u)'(\hat y + \hat u) = \hat y'\hat y + \underbrace{\hat y'\hat u}_{=0} + \underbrace{\hat u'\hat y}_{=0} + \hat u'\hat u = \hat y'\hat y + \hat u'\hat u = \hat\beta'X'X\hat\beta + \hat u'\hat u \]

since X′û = 0.

The total sum of squares is measured by the sum of squared deviations from the sample mean (∑ᵢ(yi − ȳ)² = ∑ᵢyi² − Nȳ²). Subtracting Nȳ² from each side of the previous decomposition:

\[ \underbrace{y'y - N\bar y^2}_{SST} = \underbrace{\hat\beta'X'X\hat\beta - N\bar y^2}_{SSE} + \underbrace{\hat u'\hat u}_{SSR} \]

Hence we have:

\[ R^2 = 1 - \frac{SSR}{SST} = 1 - \frac{\widehat{Var}(\hat u)}{\widehat{Var}(y)} \]
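A small R sketch (simulated data; values are illustrative assumptions) computing the R² by hand and checking that it matches the lm() output and the squared correlation between y and ŷ:

```r
# Sketch: R^2 = 1 - SSR/SST, and R^2 = cor(y, y-hat)^2
set.seed(11)
N  <- 300
x1 <- rnorm(N); x2 <- rnorm(N)
y  <- 1 + 2 * x1 - x2 + rnorm(N)
fit <- lm(y ~ x1 + x2)
SST <- sum((y - mean(y))^2)
SSR <- sum(residuals(fit)^2)
c(by_hand   = 1 - SSR / SST,
  from_lm   = summary(fit)$r.squared,
  cor_based = cor(y, fitted(fit))^2)   # all three coincide
```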
Further properties of the R²:
- R² = ρ²_{y,ŷ}, where ρ_{y,ŷ} denotes the empirical correlation between y and the fitted values ŷ obtained from the regressors x1, ..., xK.
- If the model has an intercept, 0 ≤ R² ≤ 1.
- The R² follows an unknown distribution.
- Adding further variables leads to an increase in the R².
- Linear transformations of the regression model do not change the value of the R² coefficient.

Adjusted R² - I
Recall that the R² will always increase as more variables are added to the model. The adjusted R² takes into account the number of variables in a model, and may decrease if we increase K:

\[ \bar R^2 \equiv 1 - \frac{SSR/(N - K - 1)}{SST/(N - 1)} = 1 - \frac{\hat\sigma^2}{SST/(N - 1)} \]

Adjusted R² - II
It is easy to see that the adjusted R² is just 1 − (1 − R²)(N − 1)/(N − K − 1). Most statistical software packages will give you both the R² and the adjusted R². You can compare the fit of two models (with the same y) by comparing the adjusted R², but you cannot use the adjusted R² to compare models with different y's (e.g. y vs. ln(y)).

Goodness of Fit
The goodness of fit is an important principle for understanding the quality of your model. However, it is important not to care too much about the adjusted R² and lose sight of theory and common sense. If economic theory clearly predicts that a variable belongs to the model, you should generally leave it in.

Exercise 1: Mean & Variance of OLS of the MLR model
Let's assume y = Xβ + u, where y is the vector of the dependent variable, u the vector of the error term, β the vector of the unknown parameters, and X the matrix of the regressors, where the first column only consists of ones.
a Derive the OLS estimator. Answer: see the OLS Estimator and Derivation slides above.
b Show that the OLS estimator is unbiased. Which assumptions are required for the unbiasedness of OLS to hold? Answer: see Theorem 1 and the unbiasedness derivation above.
c Derive the variance-covariance matrix of OLS under the assumption of homoskedasticity. Answer: see Theorem 2 and the derivation of the variance above.

Exercise 2: omitted variable bias
Let's assume the true model is y = β0 + β1x1 + β2x2 + u. We estimate the model y = γ0 + γ1x1 + v.
a Derive the omitted variable bias. Answer: see the omitted variable bias derivation above.
b Discuss the direction of the bias. Answer: see the omitted variable bias summary above.

Exercise 3: best linear prediction
Let's assume the true model is y = β0 + β1x1 + u.
a Write down the MSE (see unit 1).
b Show that the OLS estimator for β0 and β1 in the simple linear regression model minimizes the MSE if the true model is linear.

Exercise 3: best linear prediction - Solution
If the true model is linear, the mean squared error is MSE = E(y − β0 − β1x)². In order to minimize the MSE with respect to the choice of β0 and β1, we compute the first-order conditions:

\[ \frac{\partial MSE}{\partial \beta_0} = -2E(y - \beta_0 - \beta_1 x) \stackrel{!}{=} 0 \;\Leftrightarrow\; \beta_0 = E(y) - \beta_1 E(x) \]

\[ \frac{\partial MSE}{\partial \beta_1} = -2E[x \cdot (y - \beta_0 - \beta_1 x)] \stackrel{!}{=} 0 \]

The Hessian

\[ H = \begin{pmatrix} 2 & 2E(x) \\ 2E(x) & 2E(x^2) \end{pmatrix} \]

is positive definite, so we have a minimum. Substituting β0 into the second condition:

\[
\begin{aligned}
E[xy - xE(y) + \beta_1 xE(x) - \beta_1 x^2] &= 0 \\
E(xy) - E(x)E(y) + \beta_1 [E(x)]^2 - \beta_1 E(x^2) &= 0 \\
\beta_1 &= \frac{E(xy) - E(x)E(y)}{E(x^2) - [E(x)]^2} = \frac{Cov(x, y)}{Var(x)}
\end{aligned}
\]

Hence, we have obtained the OLS estimates of β0 and β1 as the solution to our MSE minimization problem.
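The population formulas β1 = Cov(x, y)/Var(x) and β0 = E(y) − β1E(x) have direct sample analogues, checked here in a short R sketch (simulated data; values are illustrative assumptions):

```r
# Sketch: best linear prediction formulas vs. lm()
set.seed(5)
N <- 1000
x <- rnorm(N, mean = 2)
y <- 1 + 0.7 * x + rnorm(N)
b1 <- cov(x, y) / var(x)             # sample analogue of Cov(x,y)/Var(x)
b0 <- mean(y) - b1 * mean(x)         # sample analogue of E(y) - beta_1 E(x)
rbind(by_hand = c(b0, b1), lm = coef(lm(y ~ x)))   # identical
```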
Exercise 4: Frisch-Waugh (1933) Theorem
Assume a model of the structure y = X1β1 + X2β2 + u, where β̂ = (β̂1′, β̂2′)′ = (X′X)⁻¹X′y.
a Show that β̂1 = (X1′M2X1)⁻¹(X1′M2y), where M2 = I_N − X2(X2′X2)⁻¹X2′.
b Analogously, show that β̂2 = (X2′M1X2)⁻¹(X2′M1y).

Exercise 4: Frisch-Waugh (1933) Theorem - Solution
We can specify y as a function of the estimates such that y = X1β̂1 + X2β̂2 + û. Now, let us multiply both sides by M2, so that

\[ M_2 y = M_2 X_1 \hat\beta_1 + M_2 X_2 \hat\beta_2 + M_2 \hat u \tag{22} \]

We can show that

\[ M_2 X_2 = (I_N - X_2(X_2'X_2)^{-1}X_2')X_2 = X_2 - X_2(X_2'X_2)^{-1}X_2'X_2 = X_2 - X_2 = 0 \]

Hence we can re-write equation (22) as

\[ M_2 y = M_2 X_1 \hat\beta_1 + M_2 \hat u \tag{23} \]

with the corresponding OLS estimator β̂1 = (X1′M2X1)⁻¹(X1′M2y). Analogously, you can show β̂2 = (X2′M1X2)⁻¹(X2′M1y) by multiplying y = X1β̂1 + X2β̂2 + û by M1 on both sides and using M1X1 = 0.

Exercise 5: The CEF-Decomposition Property
Assume that Yi and Xi are random variables. The conditional expectation function (CEF) decomposition property states that Yi = E[Yi|Xi] + ui.
a Show that such a representation of Yi always exists.
b Show that ui is mean-independent of Xi, i.e. E[ui|Xi] = 0.
c Use the result from (b) and the law of iterated expectations to show that ui is uncorrelated with any function of Xi, i.e. show that for an arbitrary function h we have E[h(Xi) · ui] = 0.

Exercise 5: The CEF-Decomposition Property - Solution
a We can simply write Yi = Yi + E[Yi|Xi] − E[Yi|Xi] and, by defining ui = Yi − E[Yi|Xi], we get Yi = E[Yi|Xi] + ui.
b We have

\[ E[u_i \mid X_i] = E\big[\, Y_i - E[Y_i \mid X_i] \;\big|\; X_i \big] = E[Y_i \mid X_i] - E[Y_i \mid X_i] = 0 \]

c With the law of iterated expectations, we get

\[ E[h(X_i) \cdot u_i] = E\big[\, E[h(X_i) \cdot u_i \mid X_i] \big] = E\big[\, h(X_i) \cdot \underbrace{E[u_i \mid X_i]}_{=0} \big] = 0 \]
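A numerical R sketch of the CEF-decomposition property with a discrete Xi (the data-generating process is an illustrative assumption): using the sample conditional mean in place of E[Yi|Xi], the residuals average to zero within each value of Xi and are uncorrelated with an arbitrary function of Xi.

```r
# Sketch: CEF decomposition Y = E[Y|X] + u with a discrete X
set.seed(8)
N <- 100000
X <- sample(1:5, N, replace = TRUE)
Y <- X^2 + rnorm(N)              # E[Y|X] = X^2
cef <- ave(Y, X)                 # sample conditional mean of Y given X
u <- Y - cef                     # CEF residual
tapply(u, X, mean)               # zero within each value of X
h <- function(x) exp(x) - 3 * x  # an arbitrary function of X
mean(h(X) * u)                   # zero: u is uncorrelated with h(X)
```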
Exercise 6: What is the direction of the bias?
Question: we estimate
  pi = α0 + α1Ni + ui
where pi are prices of a product (e.g. cars) in countries i = 1, ..., n and Ni is the number of firms offering the product. The true model is:
  pi = β0 + β1Ni + β2zi + vi
What is the direction of the bias if (i) zi are cost factors or (ii) zi are demand factors?

Solution: if we do not account for cost components, we underestimate the effect of the number of firms on prices, i.e. we find a significant relation between market structure and prices although it is driven by unobserved cost. If we do not account for demand components, we overestimate the effect of the number of firms on prices, i.e. we may find no significant relation between market structure and prices, although it is driven by unobserved demand.

Exercise 6: Direction of the bias - cost factors
Let's assume the true relationship looks like:
  pi = β0 + β1 number of firmsi + β2 costi + ui  (note: β1 < 0 and β2 > 0)
with i = 1, ..., n, where pi is the price for product i. Let's assume we estimate:
  pi = α0 + α1 number of firmsi + vi
With the auxiliary regression costi = δ0 + δ1 number of firmsi + wi (note: δ1 < 0), the bias is
  α1 − β1 = δ1 · β2 < 0,
so α1 < β1 < 0. We underestimate the effect of competition on prices. We may find a significant relation between prices and the number of firms. (A simulated illustration of this case is sketched below.)

Exercise 6: Direction of the bias - demand factors
Let's assume the true relationship looks like:
  pi = β0 + β1 number of firmsi + β2 demandi + ui  (note: β1 < 0 and β2 > 0)
with i = 1, ..., n, where pi is the price for product i. Let's assume we estimate:
  pi = α0 + α1 number of firmsi + vi
With the auxiliary regression demandi = δ0 + δ1 number of firmsi + wi (note: δ1 > 0), the bias is
  α1 − β1 = δ1 · β2 > 0,
so α1 > β1 and the estimate is pushed toward zero. We may overestimate the effect of competition on prices. We may find no significant relation between prices and the number of firms.

Exercise 6: Solutions
- add (all necessary) variables that account for cost and demand conditions
- panel data: fixed effects (→ units 13 + 14)
- instrumental variable estimation (→ units 15 + 16); for prices and market structure it is not that easy to find appropriate instruments
- structural approach (→ Quantitative Methods in Industrial Organization and Competition Policy)
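A simulated R sketch of the cost-factor case (all parameter values are illustrative assumptions); it reproduces the in-sample identity α1 − β1 = δ̂1·β̂2, with the ordering α1 < β1 < 0 discussed above.

```r
# Sketch: direction of the omitted variable bias in the price equation
set.seed(21)
n <- 2000
nfirms <- rpois(n, lambda = 5) + 1
cost <- 10 - 0.8 * nfirms + rnorm(n)             # delta_1 < 0
p <- 50 - 1 * nfirms + 2 * cost + rnorm(n)       # beta_1 = -1 < 0, beta_2 = 2 > 0
a1 <- coef(lm(p ~ nfirms))["nfirms"]             # short regression: alpha_1
fit <- lm(p ~ nfirms + cost)
b1 <- coef(fit)["nfirms"]
b2 <- coef(fit)["cost"]
d1 <- coef(lm(cost ~ nfirms))["nfirms"]          # auxiliary regression: delta_1
c(alpha1 = unname(a1), beta1 = unname(b1),
  bias = unname(a1 - b1), delta1_x_beta2 = unname(d1 * b2))  # bias = delta_1 * beta_2
```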
Exercise 7: Examples from daily life: correlation vs. causality

Correlation vs. causality
[Slide shows a statement to analyze; from the questions below, it is of the form "everyone who confuses correlation and causation ends up dying".]
1 What are the "observations" in this statement? Persons i.
2 What is the dependent variable y in this statement? Whether person i ends up dying, so yi ∈ {0, 1}.
3 What is the independent variable x in this statement? Whether person i confuses correlation and causation, so also xi ∈ {0, 1}.
4 Does x cause y? Alternatively: is y = 1 more likely for x = 0 than for x = 1? No ⇒ no causality here ⇒ it's about correlation.

Ronaldo effect
Did Ronaldo cause Portugal's better performance? Portugal's results (rank) at the European Championship by year:

  year  rank
  1960  16
  1964  16
  1968  16
  1972  16
  1976  16
  1980  16
  1984  4
  1988  16
  1992  16
  1996  8
  2000  4
  2004  2
  2008  8
  2012  2
  2016  1
  2020  16
  2024  8

There are several issues here; we will get to know them when we talk about time series. The key problem in short: Portugal was on an increasing trend even before Ronaldo joined!

News

Dismantling Google?
1 What is the empirical content of this statement?
2 How could we embed this in a model?
3 Which data would be required to verify or falsify it?
https://www.economist.com/leaders/2024/10/03/dismantling-google-is-a-terrible-idea

US Jobs
"Today, we received good news for American workers and families with more than 250,000 new jobs in September and unemployment back down at 4.1%," President Biden said. "With today's report, we've created 16 million jobs, unemployment remains low, and wages are growing faster than prices."
1 What is the empirical content of Mr Biden's statement concerning "job creation"?
2 How could we embed this in a model?
3 Which data would be required to verify or falsify it?
4 Suppose we have cross-sectional data for September 2024 for all US states, with the following two variables: amount of government spending, and unemployment. Which of the Gauss-Markov assumptions is likely violated?
https://www.bbc.com/news/articles/c3e903nx51qo

Remote work
"When we look back over the last five years, we continue to believe that the advantages of being together in the office are significant," [Amazon CEO Andy] Jassy wrote.
1 What is the empirical content of this statement?
2 How could we embed this in a model?
https://techcrunch.com/2024/09/21/amazon-says-no-to-remote-work/

Remote work
Suppose you obtained a sample of employees' fraction of remote work and a productivity measure (see plot). Which of the Gauss-Markov assumptions is likely violated?

Hikes for health
Suppose you obtain a sample of UK individuals, with (i) the number of hours spent in nature per week and (ii) a health measure. Can you run a linear regression? Which of the Gauss-Markov assumptions is likely violated?
https://www.theguardian.com/lifeandstyle/2024/oct/06/best-walk-hike-for-mental-health-uk