STA 773 Advanced Econometric Methods, Lecture 2: K-Variable Linear Equation (PDF)

Document Details


University of Ibadan

Olusanya E. Olubusoye and Ephraim Ogbonna

Tags

econometrics, k-variable linear equation, regression analysis, advanced econometrics

Summary

This document is a lecture on advanced econometric methods focusing on the k-variable linear equation. It provides a detailed explanation of model specification, covering topics such as regressors, the regressand, and the linearity of the model, and includes the course outline, references, formulas, and worked derivations.

Full Transcript


# STA 773: Advanced Econometric Methods

Olusanya E. Olubusoye¹ and Ephraim Ogbonna²

¹,²Department of Statistics, Faculty of Science, University of Ibadan, Nigeria
¹,²Centre for Econometric and Applied Research (CEAR), Ibadan, Nigeria

October 2023

## Course Outline

1. K-Variable Linear Equation (Johnston and DiNardo 2007)
2. Maximum Likelihood and Instrumental Variables
3. Univariate Time Series Modelling
4. Multiple Equation Models
5. Generalized Method of Moments
6. Panel Data
7. Discrete and Limited Dependent Variable Models
8. Bayesian Regression (Normal Linear Regression Models) (Koop 2003)

## References

Johnston, J. and J.E. DiNardo (2007). *Econometric Methods*. McGraw-Hill Economics Series. McGraw-Hill. ISBN: 9780071259644. URL: https://books.google.com.ng/books?id=GB0InwEACAAJ.

Koop, G. (2003). *Bayesian Econometrics*. J. Wiley, Chichester. ISBN: 9780470845677, 0470845678.

# 1 K-Variable Linear Equation

## 1.1 Introduction

Econometric analysis revolves around the structural analysis of plausible relationships between or among economic variables, for the purpose of forecasting and policy evaluation with respect to the economic phenomenon being modelled. An econometric model may be specified as a single equation or as multiple equations; the latter is more realistic because it takes cognisance of certain statistical features (for example, simultaneity) that are ignored under the single-equation framework. In this lecture, however, the focus is on a single-equation model that comprises multiple regressors: the k-variable linear equation.

The k-variable linear equation, in a single-equation framework, specifies a regressand (dependent, response, predictand/predicted, or endogenous variable) as a linear combination of two or more regressors (independent, explanatory, predictor, or exogenous variables). The term "k-variable" derives from the fact that k − 1 regressors (X<sub>2t</sub>, X<sub>3t</sub>, ..., X<sub>kt</sub>) are linearly combined to explain the observed pattern in the regressand of interest. The regressors may be various transformations of other variables, but the linearity of the model is defined by linearity in the parameters.

## 1.2 Model Specification and Description

The k-variable model is specified as in equation (1):

$Y_t = β_1 + β_2X_{2t} + β_3X_{3t} + \dots + β_kX_{kt} + u_t, \quad t = 1, \dots, n$ (1)

where β<sub>1</sub> represents the intercept; β<sub>2</sub>, β<sub>3</sub>, ..., β<sub>k</sub> are the coefficients associated with the corresponding regressors (X<sub>2t</sub>, X<sub>3t</sub>, ..., X<sub>kt</sub>); and u<sub>t</sub> represents the white-noise disturbance term. There are therefore a total of k + 1 parameters to be estimated: the β's and the variance of the disturbance (σ²). For simplicity and clarity, the equation is re-written in matrix notation, as in equation (2):

$y = Xβ + u$ (2)

where

$y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}_{n \times 1}, \quad X = \begin{bmatrix} 1 & X_{21} & X_{31} & \dots & X_{k1} \\ 1 & X_{22} & X_{32} & \dots & X_{k2} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & X_{2n} & X_{3n} & \dots & X_{kn} \end{bmatrix}_{n \times k}, \quad β = \begin{bmatrix} β_1 \\ β_2 \\ \vdots \\ β_k \end{bmatrix}_{k \times 1}, \quad u = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix}_{n \times 1}$
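The matrix set-up above can be made concrete with a short simulation. The sketch below is illustrative rather than part of the lecture: the sample size `n`, the number of columns `k`, the coefficient vector `beta_true`, and the disturbance scale `sigma` are all hypothetical choices used only to build y = Xβ + u with an intercept column of ones.

```python
import numpy as np

rng = np.random.default_rng(0)

n, k = 100, 4                                  # n observations; k columns of X (intercept + k-1 regressors)
beta_true = np.array([1.0, 0.5, -2.0, 0.8])    # hypothetical beta_1, ..., beta_k
sigma = 1.5                                    # standard deviation of the disturbance term

# Design matrix: a column of ones (intercept) followed by k-1 regressors
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
u = rng.normal(scale=sigma, size=n)            # white-noise disturbances with var(u) = sigma^2 I
y = X @ beta_true + u                          # the k-variable linear equation y = X beta + u

print(X.shape, y.shape)                        # (100, 4) (100,)
```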
Given that the model residual is defined by $e = y - X\hat{β}$, where $\hat{β}$ denotes the least squares estimator of β and e is an estimate of u, the model parameters are estimated by minimizing the residual sum of squares in Equation (3):

$RSS(β) = (y - Xβ)'(y - Xβ) = y'y - y'Xβ - β'X'y + β'X'Xβ = y'y - 2β'X'y + β'X'Xβ$ (3)

Taking the partial derivative of the residual sum of squares in Equation (3) with respect to β and equating it to zero gives the normal equations in Equation (4):

$\frac{∂RSS}{∂β} = -2X'y + 2X'X\hat{β} = 0 \quad \Rightarrow \quad (X'X)\hat{β} = X'y$ (4)

Replacing y in Equation (4) with $X\hat{β} + e$ yields

$(X'X)\hat{β} = X'(X\hat{β} + e) = (X'X)\hat{β} + X'e \quad \Rightarrow \quad X'e = 0$ (5)

Equation (5) indicates zero covariances (no correlation) between the regressors and the residuals.

## 1.3 Decomposition of the Sum of Squares

The total variation in Y comprises an explained part (the joint contribution of the regressors in the regression equation) and an unexplained part (the residual). Mathematically,

$y = \hat{y} + e = X\hat{β} + e$

$\Rightarrow y'y = (X\hat{β} + e)'(X\hat{β} + e) = \hat{β}'X'X\hat{β} + 2\hat{β}'X'e + e'e$

$y'y = \hat{β}'X'X\hat{β} + e'e$ (6)

since X'e = 0 (zero covariance between the regressors and the residuals). The variation in Y is measured by the sum of squared deviations from the mean:

$\sum_{t=1}^n (Y_t - \bar{Y})^2 = \sum_{t=1}^n (Y_t^2 - 2\bar{Y}Y_t + \bar{Y}^2) = \sum_{t=1}^n Y_t^2 - 2\bar{Y}\sum_{t=1}^n Y_t + n\bar{Y}^2 = \sum_{t=1}^n Y_t^2 - n\bar{Y}^2$ (7)

since $\sum_{t=1}^n Y_t = n\bar{Y}$. Therefore, subtracting $n\bar{Y}^2$ from both sides of Equation (6) results in Equation (8):

$\sum_{t=1}^n Y_t^2 - n\bar{Y}^2 = (\hat{β}'X'X\hat{β} - n\bar{Y}^2) + e'e$

$TSS = ESS + RSS$ (8)

where TSS is the Total Sum of Squares, ESS the Explained (Regression) Sum of Squares, and RSS the Residual (Unexplained) Sum of Squares.
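As a numerical check of Equations (4), (5) and (8), the following sketch solves the normal equations for β̂ and verifies that X'e = 0 and that TSS = ESS + RSS. It assumes simulated data of the same form as the earlier sketch; all names and values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 100, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 0.5, -2.0, 0.8]) + rng.normal(size=n)

# Least squares estimator from the normal equations (X'X) beta_hat = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat                          # residual vector

print(np.allclose(X.T @ e, 0.0))              # Equation (5): X'e = 0

# Decomposition of the sum of squares about the mean, Equation (8)
TSS = np.sum((y - y.mean()) ** 2)             # total sum of squares
RSS = e @ e                                   # residual sum of squares
ESS = beta_hat @ X.T @ X @ beta_hat - n * y.mean() ** 2   # explained sum of squares
print(np.allclose(TSS, ESS + RSS))            # TSS = ESS + RSS
```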
## 1.4 Equation in Deviation Form

Alternatively, the decomposition of the sum of squares can be derived by expressing all the observations as deviations from their sample means. Given the least squares equation in Equation (9),

$Y_t = \hat{β}_1 + \hat{β}_2X_{2t} + \hat{β}_3X_{3t} + \dots + \hat{β}_kX_{kt} + e_t, \quad t = 1, \dots, n$ (9)

averaging over the observations gives

$\bar{Y} = \hat{β}_1 + \hat{β}_2\bar{X}_2 + \hat{β}_3\bar{X}_3 + \dots + \hat{β}_k\bar{X}_k + \bar{e}$ (10)

The deviation form in Equation (11) is obtained by subtracting Equation (10) from Equation (9):

$Y_t - \bar{Y} = \hat{β}_2(X_{2t} - \bar{X}_2) + \hat{β}_3(X_{3t} - \bar{X}_3) + \dots + \hat{β}_k(X_{kt} - \bar{X}_k) + (e_t - \bar{e})$ (11)

Note that $\hat{β}_1$ is omitted from Equation (11); however, it can be recovered algebraically from Equation (10) as $\hat{β}_1 = \bar{Y} - \hat{β}_2\bar{X}_2 - \hat{β}_3\bar{X}_3 - \dots - \hat{β}_k\bar{X}_k$. (Since an intercept is included, i'e = 0 and hence $\bar{e} = 0$.)

The deviation form in Equation (11) can be written more compactly using the symmetric and idempotent transformation matrix defined in Equation (12),

$A = I_n - \frac{1}{n}ii'$ (12)

where i is an n-column vector of ones. Equation (12) can be written out as Equation (13):

$A = \begin{bmatrix} 1-\frac{1}{n} & -\frac{1}{n} & -\frac{1}{n} & \dots & -\frac{1}{n} \\ -\frac{1}{n} & 1-\frac{1}{n} & -\frac{1}{n} & \dots & -\frac{1}{n} \\ -\frac{1}{n} & -\frac{1}{n} & 1-\frac{1}{n} & \dots & -\frac{1}{n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ -\frac{1}{n} & -\frac{1}{n} & -\frac{1}{n} & \dots & 1-\frac{1}{n} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & \dots & 0 \\ 0 & 1 & 0 & \dots & 0 \\ 0 & 0 & 1 & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & 1 \end{bmatrix} - \frac{1}{n}\begin{bmatrix} 1 & 1 & 1 & \dots & 1 \\ 1 & 1 & 1 & \dots & 1 \\ 1 & 1 & 1 & \dots & 1 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & 1 & \dots & 1 \end{bmatrix}$ (13)

Let the least squares equation be re-written in partitioned form as

$y = X\hat{β} + e = \begin{bmatrix} i & X_2 \end{bmatrix}\begin{bmatrix} \hat{β}_1 \\ \hat{β}_2 \end{bmatrix} + e$ (14)

where X<sub>2</sub> and $\hat{β}_2$ in Equation (14) are respectively the n × (k−1) matrix of observations on the regressors and the (k−1)-vector of coefficients ($\hat{β}_2, \hat{β}_3, \dots, \hat{β}_k$). Premultiplying Equation (14) by A, bearing in mind that Ai = 0 and Ae = e, gives

$Ay = \begin{bmatrix} Ai & AX_2 \end{bmatrix}\begin{bmatrix} \hat{β}_1 \\ \hat{β}_2 \end{bmatrix} + Ae = (AX_2)\hat{β}_2 + e$

$y^* = X_2^*\hat{β}_2 + e$ (15)

where $y^* = Ay$ and $X_2^* = AX_2$ denote the data in deviation (mean-centred) form. Premultiplying Equation (15) by $X_2^{*\prime}$ gives Equation (16):

$X_2^{*\prime}y^* = X_2^{*\prime}X_2^*\hat{β}_2 + X_2^{*\prime}e = X_2^{*\prime}X_2^*\hat{β}_2$ (16)

since $X_2^{*\prime}e = X_2'Ae = X_2'e = 0$. Equation (16) resembles Equation (4), except that it is expressed in deviation form and $\hat{β}_2$ is a (k−1)-vector that excludes the intercept term. Hence, in a similar manner, the decomposed sum of squares can be expressed as

$y^{*\prime}y^* = \hat{β}_2'X_2^{*\prime}X_2^*\hat{β}_2 + e'e$

$TSS = ESS + RSS$ (17)

Some important statistics derived from the decomposed sum of squares are presented in Table 1; a computational sketch follows the table.

| SN | Statistic | Formula | Remarks |
| - | - | - | ----- |
| 1 | Coefficient of multiple determination (R²) | $R^2 = ESS/TSS$ | Measures the proportion of the total variation in Y explained by the regressors jointly. |
| 2 | Adjusted R² | $\bar{R}^2 = 1 - \frac{RSS/(n-k)}{TSS/(n-1)}$ | Takes cognisance of the number of regressors used in the equation; n − k and n − 1 are respectively the residual and total degrees of freedom. |
| 3 | Akaike Information Criterion | $AIC = \ln(RSS/n) + 2k/n$ | Used for model selection (model fit comparison). The imposed penalty per parameter is fixed and relatively low for larger samples. |
| 4 | Schwarz Criterion | $SC = \ln(RSS/n) + k\ln(n)/n$ | Also called the Bayesian Information Criterion; used for model selection (comparing model fit). The imposed penalty per parameter increases with the sample size. |
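The goodness-of-fit and model-selection statistics in Table 1 can be computed directly from the fitted residuals. The sketch below is a minimal illustration on simulated data; the variable names (`r2`, `adj_r2`, `aic`, `sc`) follow the formulas in Table 1 rather than any particular software package's definitions.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 120, 5
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ rng.normal(size=k) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat

RSS = e @ e
TSS = np.sum((y - y.mean()) ** 2)

r2 = 1.0 - RSS / TSS                              # coefficient of multiple determination
adj_r2 = 1.0 - (RSS / (n - k)) / (TSS / (n - 1))  # adjusted R^2
aic = np.log(RSS / n) + 2 * k / n                 # Akaike Information Criterion
sc = np.log(RSS / n) + k * np.log(n) / n          # Schwarz (Bayesian) Information Criterion

print(f"R2 = {r2:.3f}, adj. R2 = {adj_r2:.3f}, AIC = {aic:.3f}, SC = {sc:.3f}")
```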
## 1.5 Inference in the k-Variable Equation

The statistical properties of the least squares estimator, and the appropriateness of inference based on it, depend on the following assumptions.

1. X is nonstochastic and has full column rank k. Inference is conditional on the sample values of the X variables, so the elements of the X matrix are treated as fixed in repeated sampling. Linear independence of the columns of X is required for unique determination of the β vector.
2. The disturbances have two important properties:
   * Zero expected value:

     $E(u) = E\begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$ (18)

   * Constant error variance (homoscedasticity) and zero covariances (no auto- or serial correlation):

     $var(u) = E(uu') = \begin{bmatrix} E(u_1^2) & E(u_1u_2) & \dots & E(u_1u_n) \\ E(u_2u_1) & E(u_2^2) & \dots & E(u_2u_n) \\ \vdots & \vdots & \ddots & \vdots \\ E(u_nu_1) & E(u_nu_2) & \dots & E(u_n^2) \end{bmatrix} = \begin{bmatrix} var(u_1) & cov(u_1,u_2) & \dots & cov(u_1,u_n) \\ cov(u_2,u_1) & var(u_2) & \dots & cov(u_2,u_n) \\ \vdots & \vdots & \ddots & \vdots \\ cov(u_n,u_1) & cov(u_n,u_2) & \dots & var(u_n) \end{bmatrix} = \begin{bmatrix} σ^2 & 0 & \dots & 0 \\ 0 & σ^2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & σ^2 \end{bmatrix} = σ^2I$ (19)

The variances of the disturbances occupy the main diagonal and the covariances the off-diagonal positions. Violation of homoscedasticity is termed heteroscedasticity, and violation of the no auto- or serial-correlation assumption is termed auto- or serial correlation.

## 1.6 Mean and Variance of $\hat{β}$

For the theoretical derivations, $\hat{β}$ in Equation (4) is made the subject of the formula, as in Equation (20):

$\hat{β} = (X'X)^{-1}X'y$ (20)

Substituting $y = Xβ + u$ from Equation (2) into Equation (20) gives

$\hat{β} = (X'X)^{-1}X'(Xβ + u) = β + (X'X)^{-1}X'u$

$\hat{β} - β = (X'X)^{-1}X'u$ (21)

Taking the expectation of Equation (21) gives

$E(\hat{β} - β) = (X'X)^{-1}X'E(u) = 0 \quad \Rightarrow \quad E(\hat{β}) = β$ since E(u) = 0 (22)

Equation (22) shows that $\hat{β}$ is an unbiased estimator of β; its mean is β. The variance of $\hat{β}$ is defined in Equation (23):

$var(\hat{β}) = E[(\hat{β} - β)(\hat{β} - β)'] = E[(X'X)^{-1}X'uu'X(X'X)^{-1}] = (X'X)^{-1}X'E(uu')X(X'X)^{-1} = σ^2(X'X)^{-1}$ (23)

Equation (23) is a matrix whose main-diagonal elements are the variances of the coefficient estimates and whose off-diagonal elements are their covariances.

## 1.7 Estimation of σ²

The variance matrix in Equation (23) involves the unknown disturbance variance σ². An estimate of this variance can be obtained from the residual sum of squares of the fitted regression. Given that $\hat{β} = (X'X)^{-1}X'y$ in Equation (20), the least squares residual can be expressed as

$e = y - X\hat{β} = y - X(X'X)^{-1}X'y = My = M(Xβ + u) = Mu$ (24)

where $M = I - X(X'X)^{-1}X'$ is a symmetric (M' = M) and idempotent (MM = M) matrix such that MX = 0 and Me = e. From Equation (24), the expectation of the residual sum of squares is

$E(e'e) = E(u'M'Mu) = E(u'Mu)$ (25)

Since a scalar equals its trace,

$E(e'e) = E[tr(u'Mu)] = E[tr(Muu')] = tr[M\,E(uu')] = σ^2tr(M)$

$= σ^2tr[I_n - X(X'X)^{-1}X'] = σ^2tr(I_n) - σ^2tr[X(X'X)^{-1}X'] = σ^2tr(I_n) - σ^2tr[(X'X)^{-1}X'X] = σ^2(n - k)$

so that

$\hat{σ}^2 = \frac{e'e}{n-k}$ (26)

Equation (26) defines an unbiased estimator of σ², and the square root of this estimate is referred to as the standard error of the regression (SER).
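A short sketch of Equations (23) and (26) on simulated data: σ̂² is computed from the residuals and used to form the estimated variance matrix σ̂²(X'X)⁻¹, whose diagonal gives the squared standard errors of the coefficients. All names and values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 200, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([2.0, 1.0, -0.5, 0.3]) + rng.normal(scale=2.0, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat

sigma2_hat = (e @ e) / (n - k)                      # unbiased estimate of sigma^2, Equation (26)
var_beta_hat = sigma2_hat * np.linalg.inv(X.T @ X)  # estimated var(beta_hat), Equation (23)
se = np.sqrt(np.diag(var_beta_hat))                 # standard errors of the coefficient estimates

print("SER (sigma_hat):", np.sqrt(sigma2_hat))
print("coefficients:   ", beta_hat)
print("standard errors:", se)
```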
## 1.8 Gauss-Markov Theorem

The Gauss-Markov theorem states that, under the assumptions stated above, the least squares estimator of the β coefficients is the Best Linear Unbiased Estimator (BLUE): the resulting parameter estimates are unbiased, have minimum variance among linear unbiased estimators, and are linear in the observed values of y. No alternative linear unbiased estimator of the β coefficients can have a smaller sampling variance than the least squares estimator in Equation (20). A more comprehensive result, covering any linear combination of the β coefficients, is demonstrated here.

Consider an arbitrary k-element vector of known constants, c, and define the scalar

$μ = c'β$

If c' = [0 1 0 ... 0] is chosen, then μ = β<sub>2</sub>; that is, a single element of β can be picked out. As a more elaborate example, if

$c' = [1 \;\; X_{2,n+1} \;\; X_{3,n+1} \;\; \dots \;\; X_{k,n+1}]$

then

$μ = E(Y_{n+1})$

which is the expected value of the response variable Y in period n + 1, given the values of the predictor variables X in that period. In general, μ represents any linear combination of the elements of β.

To explore the class of linear unbiased estimators of μ, introduce a scalar m that serves as a linear estimator of μ:

$m = a'y = a'Xβ + a'u$

where a is an n-element vector; the definition guarantees linearity. For unbiasedness,

$E(m) = a'Xβ + a'E(u) = a'Xβ$, which equals $c'β$ only if $X'a = c$ (27)

The problem is then to find the n-vector a that minimizes the variance of m subject to the constraint in Equation (27). The variance of m is

$var(m) = E[(m - E(m))^2] = E(a'uu'a) = σ^2a'a$

since a'u is a scalar, so its square can be written as the product of the scalar and its transpose. We therefore have to find a to minimize a'a subject to X'a = c. Using the Lagrangian

$φ = a'a - 2λ'(X'a - c)$

the first-order conditions are

$\frac{∂φ}{∂a} = 2a - 2Xλ = 0 \quad \Rightarrow \quad a = Xλ$

$\frac{∂φ}{∂λ} = -2(X'a - c) = 0 \quad \Rightarrow \quad X'a = c$

Premultiplying a = Xλ by X' gives

$X'a = X'Xλ = c \quad \Rightarrow \quad λ = (X'X)^{-1}c$ (28)

$\Rightarrow \quad a = Xλ = X(X'X)^{-1}c$ (29)

The resulting estimator is

$m = a'y = c'(X'X)^{-1}X'y = c'\hat{β}$ (30)

which is the best linear unbiased estimator of c'β. This result specifically implies:

1. Each $\hat{β}_i$ is the best linear unbiased estimator (BLUE) of the corresponding population coefficient β<sub>i</sub>.
2. The BLUE of any linear combination of the β's is that same linear combination of the $\hat{β}$'s.
3. The BLUE of E(Y<sub>s</sub>) is

   $\hat{Y}_s = \hat{β}_1 + \hat{β}_2X_{2s} + \hat{β}_3X_{3s} + \dots + \hat{β}_kX_{ks}$

   which is the value obtained when the relevant vector of X values is inserted into the estimated regression equation; a numerical sketch follows below.
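Point 3 above amounts to evaluating c'β̂ at a chosen row of X values. The sketch below illustrates this for a hypothetical period-(n+1) regressor vector `c`, together with the sampling variance of the prediction implied by var(β̂) = σ²(X'X)⁻¹; the data and numbers are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 150, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 0.4, -1.2, 0.7]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat

# c' = [1, X_{2,n+1}, ..., X_{k,n+1}]: hypothetical regressor values for period n+1
c = np.array([1.0, 0.2, -0.3, 1.5])
mu_hat = c @ beta_hat                               # BLUE of E(Y_{n+1}) = c'beta

sigma2_hat = (e @ e) / (n - k)
var_mu_hat = sigma2_hat * c @ np.linalg.inv(X.T @ X) @ c   # var(c'beta_hat) = sigma^2 c'(X'X)^{-1} c

print(f"prediction = {mu_hat:.3f}, std. error = {np.sqrt(var_mu_hat):.3f}")
```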
## 1.9 Testing Linear Hypotheses about β

Having established the properties of the least squares estimator of β, the next step is to explain how this estimator can be used to test hypotheses about β. A summary of common hypotheses is presented in Table 2.

| SN | Hypothesis | Implication |
| - | - | ----- |
| 1 | H<sub>0</sub>: β<sub>i</sub> = 0 | The corresponding regressor X<sub>i</sub> has no predictive potential for Y. This is often referred to as a significance test. |
| 2 | H<sub>0</sub>: β<sub>i</sub> = β<sub>i0</sub> | Here β<sub>i0</sub> is some specified value. For example, if β<sub>i</sub> denotes a price elasticity, it may be pertinent to test that the elasticity is -1. |
| 3 | H<sub>0</sub>: β<sub>2</sub> + β<sub>3</sub> = 1 | If β<sub>2</sub> and β<sub>3</sub> are respectively labour and capital elasticities in a production function, this hypothesizes constant returns to scale. |
| 4 | H<sub>0</sub>: β<sub>3</sub> = β<sub>4</sub>, or β<sub>3</sub> - β<sub>4</sub> = 0 | This hypothesizes that the coefficients of X<sub>3</sub> and X<sub>4</sub> are not statistically different. |
| 5 | H<sub>0</sub>: β<sub>2</sub> = β<sub>3</sub> = ... = β<sub>k</sub> = 0 | This posits that the entire set of regressors has no effect on the response variable Y. It does not include the intercept term, since the focus is on the variation of Y around its mean and the absolute level of the series is typically not of interest. |
| 6 | H<sub>0</sub>: β<sub>2</sub> = 0 | Suppose β is partitioned into two subvectors, β<sub>1</sub> and β<sub>2</sub>, where β<sub>1</sub> contains k<sub>1</sub> elements and β<sub>2</sub> the remaining k<sub>2</sub> = k - k<sub>1</sub> elements. This hypothesizes that a specified subset of the regressors plays no part in the determination of the response variable Y. |

A more generic specification covering all six examples is

$Rβ = r$ (31)

where R is a q × k matrix of known constants, with q < k, and r is a q-vector of known constants. Each null hypothesis determines the relevant elements of R and r. For the six examples above:

1. R = [0 ... 0 1 0 ... 0], r = 0 and q = 1, with the 1 in the ith position.
2. R = [0 ... 0 1 0 ... 0], r = β<sub>i0</sub> and q = 1, with the 1 in the ith position.
3. R = [0 1 1 0 ... 0], r = 1 and q = 1.
4. R = [0 0 1 -1 0 ... 0], r = 0 and q = 1.
5. R = [0 I<sub>k-1</sub>], r = 0 and q = k - 1, where 0 is a column vector of k - 1 zeros.
6. R = [0<sub>k2 × k1</sub> I<sub>k2</sub>], r = 0 and q = k<sub>2</sub>.

The most effective approach is to develop a testing procedure for the general linear hypothesis

$H_0: Rβ - r = 0$

which may then be specialized to deal with any particular application. Given the least squares estimator, the next step is to compute the vector $R\hat{β} - r$. This measures the discrepancy between what the null hypothesis asserts and what is observed. If this vector is, in some sense, "large", it casts doubt on the null hypothesis; conversely, if it is "small", it does not. In classical testing procedures the demarcation between large and small is determined from the sampling distribution of the relevant statistic under the null hypothesis, in this instance the distribution of $R\hat{β}$ when Rβ = r.

From Equation (22) it follows directly that

$E(R\hat{β}) = Rβ$ (32)

and from Equation (23),

$var(R\hat{β}) = E[R(\hat{β} - β)(\hat{β} - β)'R'] = R[var(\hat{β})]R' = σ^2R(X'X)^{-1}R'$ (33)

Having obtained the mean (Equation (32)) and variance (Equation (33)) of the $R\hat{β}$ vector, one more assumption is required to determine its sampling distribution. Since $\hat{β}$ is a function of the u vector, the sampling distribution of $R\hat{β}$ is determined by the distribution of u. Assuming, in addition to the assumptions about u in Equations (18) and (19), that the u<sub>t</sub> are normally distributed, we have

$u \sim N(0, σ^2I)$ (34)

Since linear combinations of normal variables are also normally distributed, it follows directly that

$\hat{β} \sim N(β, σ^2(X'X)^{-1})$ (35)

$R\hat{β} \sim N(Rβ, σ^2R(X'X)^{-1}R')$ (36)

$R\hat{β} - Rβ \sim N(0, σ^2R(X'X)^{-1}R')$ (37)

If the null hypothesis Rβ = r is true, then

$R\hat{β} - r \sim N(0, σ^2R(X'X)^{-1}R')$ (38)

This gives the sampling distribution of $R\hat{β} - r$, from which a $χ^2_q$ variable is derived as

$(R\hat{β} - r)'[σ^2R(X'X)^{-1}R']^{-1}(R\hat{β} - r) \sim χ^2_q$ (39)

A major obstacle to the practical application of Equation (39) is the presence of the unknown σ²; however, it can be replaced by the unbiased estimate

$\hat{σ}^2 = \frac{e'e}{n-k}$ (40)

so that, after dividing the quadratic form by its degrees of freedom q, Equation (39) becomes the test statistic

$F = \frac{(R\hat{β} - r)'[R(X'X)^{-1}R']^{-1}(R\hat{β} - r)/q}{e'e/(n-k)} \sim F(q, n-k)$ (41)
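A minimal sketch of the F-statistic in Equation (41), applied to example 5 above (joint significance of all the slope coefficients, R = [0 I<sub>k-1</sub>], r = 0). The data are simulated and the p-value is taken from SciPy's F distribution; everything here is an illustration under those assumptions, not the lecture's own code.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, k = 100, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 0.5, 0.0, 0.8]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat
s2 = (e @ e) / (n - k)                       # e'e / (n - k), Equation (40)

# Example 5: R = [0  I_{k-1}], r = 0, q = k - 1
q = k - 1
R = np.hstack([np.zeros((q, 1)), np.eye(q)])
r = np.zeros(q)

d = R @ beta_hat - r                         # discrepancy vector R beta_hat - r
middle = np.linalg.inv(R @ np.linalg.inv(X.T @ X) @ R.T)
F = (d @ middle @ d) / q / s2                # Equation (41)
p_value = stats.f.sf(F, q, n - k)            # upper-tail probability of F(q, n - k)

print(f"F = {F:.3f}, p-value = {p_value:.4f}")
```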
