Advanced Econometrics (ECO-609) Handouts PDF

Document Details

Uploaded by CongenialForethought

Virtual University of Pakistan

Tags

econometrics matrix algebra time series analysis econometric modeling

Summary

These handouts cover Advanced Econometrics (ECO-609) focusing on matrix algebra for OLS estimations, time series econometrics, panel data analysis, and practical examples using Eviews. They also include topics like volatility modeling and spatial econometrics for Pakistan's context. Notably, the material emphasizes both theoretical and practical applications, potentially beneficial for students of economics.

Full Transcript

Advanced Econometrics (ECO-609) VU

Table of Contents

Lesson 01: Essentials of Matrix Algebra in OLS Estimations by Using Matrix Approach
Lesson 02: Types of Matrices
Lesson 03: Matrix Operations
Lesson 04: Matrix Determinants
Lesson 05: Matrix Approach to Linear Regression Model
Lesson 06: OLS Estimation
Lesson 07: Coefficient of Determination R2 in Matrix Notation
Lesson 08: Prediction Using Multiple Regression: Matrix Formulation
Lesson 09: Time Series Econometrics
Lesson 10: Some Basic Concepts of Time Series
Lesson 11: Unit Root Stochastic Process
Lesson 12: Integrated Stochastic Processes
Lesson 13: The Unit Root Test
Lesson 14: Transforming Nonstationary Time Series
Lesson 15: Empirical Analysis of Time Series Data
Lesson 16: Use of Eviews for Data Analysis
Lesson 17: OLS Estimations in Eviews
Lesson 18: Time Series Estimations in Eviews
Lesson 19: Economic Analysis for Pakistan: Use of Eviews
Lesson 20: Case Study of Pakistan: Use of Eviews
Lesson 21: Panel Data Analysis
Lesson 22: Case Study of Pakistan: Use of Eviews for Panel Data
Lesson 23: Case Study of Pakistan: Use of Eviews for Panel Data Handling
Lesson 24: Functional Forms of Regression Models
Lesson 25: Functional Forms of Regression Models
Lesson 26: Functional Forms of Regression Models
Lesson 27: Model Specification and Diagnostic Testing
Lesson 28: Tests of Specification Errors
Lesson 29: Practical Examples Using Eviews
Lesson 30: Intrinsically Linear and Intrinsically Nonlinear Regression Models
Lesson 31: Spatial Econometrics
Lesson 32: Spatial Econometrics
Lesson 33: An Applied Illustration
Lesson 34: Practical Example of Spatial Analysis in Social Sciences
Lesson 35: The Seemingly Unrelated Regressions (SUR) Model
Lesson 36: Seemingly Unrelated Regression Analysis of Bank Lending
Lesson 37: Empirical Results and Interpretation
Lesson 38: Empirical Results and Interpretation
Lesson 39: The Use of Seemingly Unrelated Regression (SUR) Carcass Composition
Lesson 40: Modelling Volatility
Lesson 41: Volatility Measurement
Lesson 42: Implied Volatility Models
Lesson 43: ARCH Implications
Lesson 44: Generalized ARCH (GARCH) Models
Lesson 45: ARCH/GARCH Diagnostics

©COPYRIGHT VIRTUAL UNIVERSITY OF PAKISTAN

Lesson 1
Essentials of Matrix Algebra in OLS Estimations by Using Matrix Approach

Topic 001: Importance of This Course

Econometrics deals with the measurement of economic relationships. It integrates economics, mathematics and statistics with the objective of providing numerical values for the parameters of economic relationships. The relationships of economic theory are usually expressed in mathematical form and combined with empirical economics. Econometric methods are used to obtain the values of the parameters, which are essentially the coefficients of the mathematical form of the economic relationships. Statistical methods that help in explaining economic phenomena are adapted as econometric methods.

This course also focuses on the practical aspects of econometrics: handling data and carrying out empirical investigation. Students of economics are already well equipped with the theoretical aspects of economic theories; in this course we investigate those theories empirically.

Topic 002: The Design of the Course

In this course, our focus will be on handling time series, panel, and cross-sectional data. The course will also give students hands-on practical work using the EVIEWS software. Some data, statistical tools and research papers will be shared with students. After this course, students should be able to conduct research on any socio-economic issue independently.

Topic 003: What Is a Matrix?

A matrix is a rectangular array of numbers or elements arranged in rows and columns.
More precisely, a matrix of order, or dimension, M by N (written as M × N) is a set of M × N elements arranged in M rows and N columns. Thus, letting boldface letters denote matrices, an (M × N) matrix A may be expressed as

$$\mathbf{A} = [a_{ij}] = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1N} \\ a_{21} & a_{22} & \cdots & a_{2N} \\ \vdots & \vdots & & \vdots \\ a_{M1} & a_{M2} & \cdots & a_{MN} \end{bmatrix}$$

where $a_{ij}$ is the element appearing in the ith row and the jth column of A and where $[a_{ij}]$ is a shorthand expression for the matrix A whose typical element is $a_{ij}$. The order, or dimension, of a matrix (that is, the number of rows and columns) is often written underneath the matrix for easy reference.

Scalar. A scalar is a single (real) number. Alternatively, a scalar is a 1 × 1 matrix.

Topic 004: Column Vector

A matrix consisting of M rows and only one column is called a column vector. Letting boldface lowercase letters denote vectors, a column vector has the form

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_M \end{bmatrix}$$

Row Vector. A matrix consisting of only one row and N columns is called a row vector.

Topic 005: Matrix Transpose

The transpose of an M × N matrix A, denoted by $\mathbf{A}'$ (read as A prime or A transpose), is an N × M matrix obtained by interchanging the rows and columns of A; that is, the ith row of A becomes the ith column of $\mathbf{A}'$. For example, for a 3 × 2 matrix,

$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix}, \qquad \mathbf{A}' = \begin{bmatrix} a_{11} & a_{21} & a_{31} \\ a_{12} & a_{22} & a_{32} \end{bmatrix}$$

Since a vector is a special type of matrix, the transpose of a row vector is a column vector and the transpose of a column vector is a row vector. Thus, for the column vector $\mathbf{x}$ with elements $x_1, x_2, \ldots, x_M$,

$$\mathbf{x}' = \begin{bmatrix} x_1 & x_2 & \cdots & x_M \end{bmatrix}$$

We shall follow the convention of indicating row vectors by primes.

Topic 006: Submatrix

Given any M × N matrix A, if all but r rows and s columns of A are deleted, the resulting matrix of order r × s is called a submatrix of A. Thus, if

$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}$$

and we delete the third row and the third column of A, we obtain

$$\mathbf{B} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$$

which is a submatrix of A whose order is 2 × 2.
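The definitions of transpose, column and row vectors, and submatrix can be tried out numerically. The following is a minimal sketch in Python with NumPy (a tool chosen here for illustration; the course itself uses EViews). The matrix and vector entries are hypothetical values, not taken from the handout.

```python
import numpy as np

# Hypothetical 3 x 3 matrix, chosen only for illustration.
A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

# Transpose: the i-th row of A becomes the i-th column of A'.
A_t = A.T

# A column vector (3 x 1) and its transpose, a row vector (1 x 3).
x = np.array([[3], [4], [5]])
x_t = x.T

# Submatrix: delete the third row and the third column of A.
sub = np.delete(np.delete(A, 2, axis=0), 2, axis=1)

print(A_t)
print(x.shape, x_t.shape)
print(sub)
```

Deleting one row and one column of a 3 × 3 matrix leaves a 2 × 2 submatrix, as stated in Topic 006.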
Lesson 2
Types of Matrices

Topic 007: Types of Matrix: Square Matrix

A matrix that has the same number of rows as columns is called a square matrix.

Topic 008: Diagonal Matrix

A square matrix with at least one nonzero element on the main diagonal (running from the upper-left-hand corner to the lower-right-hand corner) and zeros elsewhere is called a diagonal matrix.

Topic 009: Scalar Matrix

A diagonal matrix whose diagonal elements are all equal is called a scalar matrix.

Topic 10: Identity, or Unit Matrix

A diagonal matrix whose diagonal elements are all 1 is called an identity, or unit, matrix and is denoted by I. It is a special kind of scalar matrix.

Topic 11: Symmetric Matrix

A square matrix whose elements above the main diagonal are mirror images of the elements below the main diagonal is called a symmetric matrix. Alternatively, a symmetric matrix is such that its transpose is equal to itself; that is, $\mathbf{A} = \mathbf{A}'$. That is, the element $a_{ij}$ of A is equal to the element $a_{ji}$ of $\mathbf{A}'$.

Topic 12: Null Matrix

A matrix whose elements are all zero is called a null matrix and is denoted by 0.

Topic 13: Null Vector

A row or column vector whose elements are all zero is called a null vector and is also denoted by 0.

Topic 14: Equal Matrices

Two matrices A and B are said to be equal if they are of the same order and their corresponding elements are equal; that is, $a_{ij} = b_{ij}$ for all i and j. For example, the matrices [matrices omitted] are equal; that is, A = B.

Lesson 3
MATRIX OPERATIONS

Topic 15: Matrix Operation: Matrix Addition

Let $\mathbf{A} = [a_{ij}]$ and $\mathbf{B} = [b_{ij}]$.
If A and B are of the same order, we define matrix addition as

$$\mathbf{A} + \mathbf{B} = \mathbf{C}$$

where C is of the same order as A and B and is obtained as $c_{ij} = a_{ij} + b_{ij}$ for all i and j; that is, C is obtained by adding the corresponding elements of A and B. If such addition can be effected, A and B are said to be conformable for addition. For example, if [matrices omitted] and C = A + B, then [matrix omitted].

Topic 16: Matrix Subtraction

Matrix subtraction follows the same principle as matrix addition except that C = A − B; that is, we subtract the elements of B from the corresponding elements of A to obtain C, provided A and B are of the same order.

Topic 17: Scalar Multiplication

To multiply a matrix A by a scalar λ (a real number), we multiply each element of the matrix by λ:

$$\lambda \mathbf{A} = [\lambda a_{ij}]$$

For example, if λ = 2 and [matrix omitted], then [matrix omitted].

Topic 18: Matrix Multiplication

Let A be M × N and B be N × P. Then the product AB (in that order) is defined to be a new matrix C of order M × P such that

$$c_{ij} = \sum_{k=1}^{N} a_{ik} b_{kj}, \qquad i = 1, 2, \ldots, M; \quad j = 1, 2, \ldots, P$$

That is, the element in the ith row and the jth column of C is obtained by multiplying the elements of the ith row of A by the corresponding elements of the jth column of B and summing over all terms; this is known as the row by column rule of multiplication. Thus, to obtain $c_{11}$, the element in the first row and the first column of C, we multiply the elements in the first row of A by the corresponding elements in the first column of B and sum over all terms. Similarly, to obtain $c_{12}$, we multiply the elements in the first row of A by the corresponding elements in the second column of B and sum over all terms, and so on.

Note that for multiplication to exist, matrices A and B must be conformable with respect to multiplication; that is, the number of columns in A must be equal to the number of rows in B.
If, for example, [matrices omitted]. But if [matrices omitted], the product AB is not defined since A and B are not conformable with respect to multiplication.

Topic 19: Properties of Matrix Multiplication

1. Matrix multiplication is not necessarily commutative; that is, in general, AB ≠ BA. Therefore, the order in which the matrices are multiplied is very important. AB means that A is postmultiplied by B or B is premultiplied by A.

2. Even if AB and BA exist, the resulting matrices may not be of the same order. Thus, if A is M × N and B is N × M, AB is M × M whereas BA is N × N, hence of different order.

3. Even if A and B are both square matrices, so that AB and BA are both defined, the resulting matrices will not necessarily be equal. For example, if [matrices omitted], then [products omitted] and AB ≠ BA. An example of AB = BA is when both A and B are identity matrices.

4. A row vector postmultiplied by a column vector is a scalar. Thus, consider the ordinary least-squares residuals $\hat{u}_1, \hat{u}_2, \ldots, \hat{u}_n$. Letting $\hat{\mathbf{u}}$ be a column vector and $\hat{\mathbf{u}}'$ be a row vector, we have

$$\hat{\mathbf{u}}'\hat{\mathbf{u}} = \begin{bmatrix} \hat{u}_1 & \hat{u}_2 & \cdots & \hat{u}_n \end{bmatrix} \begin{bmatrix} \hat{u}_1 \\ \hat{u}_2 \\ \vdots \\ \hat{u}_n \end{bmatrix} = \hat{u}_1^2 + \hat{u}_2^2 + \cdots + \hat{u}_n^2 = \sum \hat{u}_i^2$$

5. A column vector postmultiplied by a row vector is a matrix. As an example, consider the population disturbances of the classical linear regression model, namely, $u_1, u_2, \ldots, u_n$. Letting $\mathbf{u}$ be a column vector and $\mathbf{u}'$ a row vector, we obtain

$$\mathbf{u}\mathbf{u}' = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} \begin{bmatrix} u_1 & u_2 & \cdots & u_n \end{bmatrix} = \begin{bmatrix} u_1^2 & u_1 u_2 & \cdots & u_1 u_n \\ u_2 u_1 & u_2^2 & \cdots & u_2 u_n \\ \vdots & \vdots & & \vdots \\ u_n u_1 & u_n u_2 & \cdots & u_n^2 \end{bmatrix}$$

which is a matrix of order n × n. Note that the preceding matrix is symmetrical.

6. A matrix postmultiplied by a column vector is a column vector.

7. A row vector postmultiplied by a matrix is a row vector.

8. Matrix multiplication is associative; that is, (AB)C = A(BC), where A is M × N, B is N × P, and C is P × K.

9. Matrix multiplication is distributive with respect to addition; that is, A(B + C) = AB + AC and (B + C)A = BA + CA.
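Several of the properties listed above can be checked numerically. The sketch below uses Python with NumPy and hypothetical matrices chosen for illustration (they are not the handout's examples): it shows a case where AB ≠ BA, that a row vector times a column vector is a scalar (stored as a 1 × 1 array), that a column vector times a row vector is a symmetric matrix, and that multiplication is associative.

```python
import numpy as np

# Hypothetical square matrices, chosen only for illustration.
A = np.array([[1, 2], [3, 4]])
B = np.array([[0, 1], [1, 0]])
C = np.array([[2, 0], [1, 1]])

AB = A @ B
BA = B @ A
commute = np.array_equal(AB, BA)      # property 1: generally False

# Property 4: row vector times column vector is a scalar (sum of squares).
u = np.array([[1.0], [-2.0], [0.5]])  # hypothetical "residual" vector
inner = u.T @ u                       # 1 x 1 array holding 1 + 4 + 0.25

# Property 5: column vector times row vector is an n x n symmetric matrix.
outer = u @ u.T

# Property 8: associativity.
assoc = np.array_equal((A @ B) @ C, A @ (B @ C))

print(commute, inner.shape, outer.shape, assoc)
```

The `@` operator applies the row by column rule of Topic 18; NumPy raises an error when the two operands are not conformable for multiplication.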
Topic 20: Matrix Transposition

We have already defined the process of matrix transposition as interchanging the rows and the columns of a matrix (or a vector). We now state some of the properties of transposition.

1. The transpose of a transposed matrix is the original matrix itself. Thus, $(\mathbf{A}')' = \mathbf{A}$.

2. If A and B are conformable for addition, then C = A + B and $\mathbf{C}' = (\mathbf{A} + \mathbf{B})' = \mathbf{A}' + \mathbf{B}'$. That is, the transpose of the sum of two matrices is the sum of their transposes.

3. If AB is defined, then $(\mathbf{AB})' = \mathbf{B}'\mathbf{A}'$. That is, the transpose of the product of two matrices is the product of their transposes in the reverse order. This can be generalized: $(\mathbf{ABCD})' = \mathbf{D}'\mathbf{C}'\mathbf{B}'\mathbf{A}'$.

4. The transpose of an identity matrix I is the identity matrix itself; that is, $\mathbf{I}' = \mathbf{I}$.

5. The transpose of a scalar is the scalar itself. Thus, if λ is a scalar, $\lambda' = \lambda$.

6. The transpose of $\lambda\mathbf{A}$ is $\lambda\mathbf{A}'$, where λ is a scalar. [Note: $(\lambda\mathbf{A})' = \mathbf{A}'\lambda' = \mathbf{A}'\lambda = \lambda\mathbf{A}'$.]

7. If A is a square matrix such that $\mathbf{A} = \mathbf{A}'$, then A is a symmetric matrix.

Topic 21: Matrix Inversion

An inverse of a square matrix A, denoted by $\mathbf{A}^{-1}$ (read A inverse), if it exists, is a unique square matrix such that

$$\mathbf{A}\mathbf{A}^{-1} = \mathbf{A}^{-1}\mathbf{A} = \mathbf{I}$$

where I is an identity matrix whose order is the same as that of A. For example, [matrices omitted].

We shall see how $\mathbf{A}^{-1}$ is computed after we study the topic of determinants. In the meantime, note these properties of the inverse.

1. $(\mathbf{AB})^{-1} = \mathbf{B}^{-1}\mathbf{A}^{-1}$; that is, the inverse of the product of two matrices is the product of their inverses in the reverse order.

2. $(\mathbf{A}^{-1})' = (\mathbf{A}')^{-1}$; that is, the transpose of A inverse is the inverse of A transpose.
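The transposition and inversion properties above can be verified on a small example. This is a minimal sketch in Python with NumPy; the two matrices are hypothetical and were picked only because both are nonsingular. The inverse is obtained with the library routine `np.linalg.inv` rather than by hand.

```python
import numpy as np

# Hypothetical nonsingular 2 x 2 matrices (determinants 1 and -2).
A = np.array([[2.0, 1.0], [5.0, 3.0]])
B = np.array([[1.0, 2.0], [3.0, 4.0]])

A_inv = np.linalg.inv(A)
B_inv = np.linalg.inv(B)
I = np.eye(2)

checks = {
    "A A^-1 = I": np.allclose(A @ A_inv, I),
    "A^-1 A = I": np.allclose(A_inv @ A, I),
    "(A')' = A": np.array_equal(A.T.T, A),
    "(A + B)' = A' + B'": np.allclose((A + B).T, A.T + B.T),
    "(AB)' = B'A'": np.allclose((A @ B).T, B.T @ A.T),
    "(AB)^-1 = B^-1 A^-1": np.allclose(np.linalg.inv(A @ B), B_inv @ A_inv),
    "(A^-1)' = (A')^-1": np.allclose(A_inv.T, np.linalg.inv(A.T)),
}

for name, ok in checks.items():
    print(f"{name}: {ok}")
```

Comparisons use `np.allclose` because inverses are computed in floating point and may differ from the exact result by rounding error.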
Lesson 4
Matrix Determinants

Topic 22: Matrix Determinants

To every square matrix A there corresponds a number (scalar) known as the determinant of the matrix, which is denoted by det A or by the symbol |A|, where | | means "the determinant of." Note that a matrix per se has no numerical value, but the determinant of a matrix is a number. For example, for a 3 × 3 matrix A,

$$|\mathbf{A}| = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}$$

The |A| in this example is called a determinant of order 3 because it is associated with a matrix of order 3 × 3.

Topic 23: Evaluation of a Determinant

The process of finding the value of a determinant is known as the evaluation, expansion, or reduction of the determinant. This is done by manipulating the entries of the matrix in a well-defined manner.

Evaluation of a 2 × 2 Determinant. If

$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$$

its determinant is evaluated as follows:

$$|\mathbf{A}| = a_{11}a_{22} - a_{12}a_{21}$$

which is obtained by cross-multiplying the elements on the main diagonal and subtracting from it the cross-multiplication of the elements on the other diagonal of matrix A.

Evaluation of a 3 × 3 Determinant. If

$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}$$

then

$$|\mathbf{A}| = a_{11}a_{22}a_{33} - a_{11}a_{23}a_{32} + a_{12}a_{23}a_{31} - a_{12}a_{21}a_{33} + a_{13}a_{21}a_{32} - a_{13}a_{22}a_{31}$$

A careful examination of the evaluation of a 3 × 3 determinant shows:

1. Each term in the expansion of the determinant contains one and only one element from each row and each column.

2. The number of elements in each term is the same as the number of rows (or columns) in the matrix. Thus, a 2 × 2 determinant has two elements in each term of its expansion, a 3 × 3 determinant has three elements in each term of its expansion, and so on.

3. The terms in the expansion alternate in sign from + to −.

4. A 2 × 2 determinant has two terms in its expansion, and a 3 × 3 determinant has six terms in its expansion. The general rule is: the determinant of order N × N has N! = N(N − 1)(N − 2) ··· 3 · 2 · 1 terms in its expansion, where N! is read "N factorial." Following this rule, a determinant of order 5 × 5 will have 5 · 4 · 3 · 2 · 1 = 120 terms in its expansion.

Topic 24 and 25: Properties of Determinants and Examples

1. A matrix whose determinantal value is zero is called a singular matrix, whereas a matrix with a nonzero determinant is called a nonsingular matrix. The inverse of a matrix as defined before does not exist for a singular matrix.

2. If all the elements of any row of A are zero, its determinant is zero. Thus, [example omitted].

3. $|\mathbf{A}'| = |\mathbf{A}|$; that is, the determinants of A and A transpose are the same.

4. Interchanging any two rows or any two columns of a matrix A changes the sign of |A|. For example, if B is obtained by interchanging the rows of A, then |B| = −|A|.

5. If every element of a row or a column of A is multiplied by a scalar λ, then |A| is multiplied by λ. If [matrix omitted] and we multiply the first row of A by 5 to obtain [matrix omitted], it can be seen that |A| = 36 and |B| = 180, which is 5|A|.

6. If two rows or columns of a matrix are identical, its determinant is zero.

7. If one row or a column of a matrix is a multiple of another row or column of that matrix, its determinant is zero. Thus, if [matrix omitted], where the first row of A is twice its second row, |A| = 0. More generally, if any row (column) of a matrix is a linear combination of other rows (columns), its determinant is zero.

8. |AB| = |A||B|; that is, the determinant of the product of two matrices is the product of their (individual) determinants.

Topic 26: Rank of a Matrix

The rank of a matrix is the order of the largest square submatrix whose determinant is not zero. Consider the 3 × 3 example [matrix omitted]. It can be seen that |A| = 0. In other words, A is a singular matrix. Hence although its order is 3 × 3, its rank is less than 3. Actually, it is 2, because we can find a 2 × 2 submatrix whose determinant is not zero.
For example, if we delete the first row and the first column of A, we obtain [matrix omitted], whose determinant is −6, which is nonzero. Hence the rank of A is 2. As noted previously, the inverse of a singular matrix does not exist. Therefore, for an N × N matrix A, its rank must be N for its inverse to exist; if it is less than N, A is singular.

Topic 27: Minor

If the ith row and jth column of an N × N matrix A are deleted, the determinant of the resulting submatrix is called the minor of the element $a_{ij}$ (the element at the intersection of the ith row and the jth column) and is denoted by $|\mathbf{M}_{ij}|$. If

$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}$$

the minor of $a_{11}$ is

$$|\mathbf{M}_{11}| = \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} = a_{22}a_{33} - a_{23}a_{32}$$

Similarly, the minor of $a_{21}$ is

$$|\mathbf{M}_{21}| = \begin{vmatrix} a_{12} & a_{13} \\ a_{32} & a_{33} \end{vmatrix} = a_{12}a_{33} - a_{13}a_{32}$$

The minors of other elements of A can be found similarly.

Topic 28: Cofactor

The cofactor of the element $a_{ij}$ of an N × N matrix A, denoted by $c_{ij}$, is defined as

$$c_{ij} = (-1)^{i+j}\,|\mathbf{M}_{ij}|$$

In other words, a cofactor is a signed minor, the sign being positive if i + j is even and negative if i + j is odd. Thus, the cofactor of the element $a_{11}$ of the 3 × 3 matrix A given previously is $a_{22}a_{33} - a_{23}a_{32}$, whereas the cofactor of the element $a_{21}$ is $-(a_{12}a_{33} - a_{13}a_{32})$ since the sum of the subscripts 2 and 1 is 3, which is an odd number.

Cofactor Matrix. Replacing the elements $a_{ij}$ of a matrix A by their cofactors, we obtain a matrix known as the cofactor matrix of A, denoted by (cof A).

Adjoint Matrix. The adjoint matrix, written as (adj A), is the transpose of the cofactor matrix; that is, (adj A) = (cof A)'.

Lesson 5
THE MATRIX APPROACH TO LINEAR REGRESSION MODEL

Topic 29: Finding the Inverse of a Square Matrix

If A is square and nonsingular (that is, |A| ≠ 0), its inverse $\mathbf{A}^{-1}$ can be found as follows:

$$\mathbf{A}^{-1} = \frac{1}{|\mathbf{A}|}\,(\text{adj}\ \mathbf{A})$$

The steps involved in the computation are as follows:

1. Find the determinant of A. If it is nonzero, proceed to step 2.

2. Replace each element $a_{ij}$ of A by its cofactor to obtain the cofactor matrix.

3. Transpose the cofactor matrix to obtain the adjoint matrix.

4. Divide each element of the adjoint matrix by |A|.

Topic 30: Example of Finding the Inverse of a Matrix

Suppose we have a 3 × 3 matrix [matrix omitted].

Step 1. We first find the determinant of the matrix. Applying the rules of expanding a 3 × 3 determinant given previously, we obtain |A| = −24.

Step 2. We now obtain the cofactor matrix, say, C: [matrix omitted].

Step 3. Transposing the preceding cofactor matrix, we obtain the following adjoint matrix: [matrix omitted].

Step 4. We now divide the elements of (adj A) by the determinantal value of −24 to obtain $\mathbf{A}^{-1}$: [matrix omitted].

It can be readily verified that

$$\mathbf{A}\mathbf{A}^{-1} = \mathbf{A}^{-1}\mathbf{A} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

which is an identity matrix.

Topic 31: Matrix Differentiation

There are two rules regarding matrix differentiation.

Rule 1: If $\mathbf{a}' = [a_1 \ a_2 \ \cdots \ a_n]$ is a row vector of numbers and $\mathbf{x}$ is a column vector of the variables $x_1, x_2, \ldots, x_n$, then

$$\frac{\partial(\mathbf{a}'\mathbf{x})}{\partial \mathbf{x}} = \mathbf{a}$$

Rule 2: If A is a symmetric n × n matrix of numbers, then for the quadratic form $\mathbf{x}'\mathbf{A}\mathbf{x}$,

$$\frac{\partial(\mathbf{x}'\mathbf{A}\mathbf{x})}{\partial \mathbf{x}} = 2\mathbf{A}\mathbf{x}$$

which is a column vector of n elements.

Topic 32: The Matrix Approach to Linear Regression Model

This lesson presents the classical linear regression model involving k variables (Y and $X_2, X_3, \ldots, X_k$) in matrix algebra notation. Conceptually, the k-variable model is a logical extension of the two- and three-variable models.

A great advantage of matrix algebra over scalar algebra (elementary algebra dealing with scalars or real numbers) is that it provides a compact method of handling regression models involving any number of variables; once the k-variable model is formulated and solved in matrix notation, the solution applies to one, two, three, or any number of variables.

Topic 33: The K-Variable Linear Regression Model

The k-variable population regression function (PRF) involving the dependent variable Y and k − 1 explanatory variables $X_2, X_3, \ldots, X_k$ may be written as

$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} + u_i, \qquad i = 1, 2, \ldots, n$$

where $\beta_1$ = the intercept, $\beta_2$ to $\beta_k$ = partial slope coefficients, u = stochastic disturbance term, and i = ith observation, n being the size of the population.
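The PRF is to be interpreted in the usual manner: it gives the mean or expected value of Y conditional upon the fixed (in repeated sampling) values of $X_2, X_3, \ldots, X_k$, that is, $E(Y \mid X_{2i}, X_{3i}, \ldots, X_{ki})$.

The PRF equation is a shorthand expression for the following set of n simultaneous equations:

$$\begin{aligned} Y_1 &= \beta_1 + \beta_2 X_{21} + \beta_3 X_{31} + \cdots + \beta_k X_{k1} + u_1 \\ Y_2 &= \beta_1 + \beta_2 X_{22} + \beta_3 X_{32} + \cdots + \beta_k X_{k2} + u_2 \\ &\ \ \vdots \\ Y_n &= \beta_1 + \beta_2 X_{2n} + \beta_3 X_{3n} + \cdots + \beta_k X_{kn} + u_n \end{aligned}$$

Let us write the system of equations in an alternative but more illuminating way as follows:

$$\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} = \begin{bmatrix} 1 & X_{21} & X_{31} & \cdots & X_{k1} \\ 1 & X_{22} & X_{32} & \cdots & X_{k2} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & X_{2n} & X_{3n} & \cdots & X_{kn} \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{bmatrix} + \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix}$$

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{u}$$

where

y = n × 1 column vector of observations on the dependent variable Y
X = n × k matrix giving n observations on k − 1 variables $X_2$ to $X_k$, the first column of 1's representing the intercept term (this matrix is also known as the data matrix)
β = k × 1 column vector of the unknown parameters $\beta_1, \beta_2, \ldots, \beta_k$
u = n × 1 column vector of n disturbances $u_i$

Using the rules of matrix multiplication and addition, the reader should verify that the above two systems of equations are equivalent. The last equation is known as the matrix representation of the general (k-variable) linear regression model.

Topic 34: Example: The K-Variable Linear Regression Model

As an illustration of the matrix representation, consider the two-variable consumption–income model

$$Y_i = \beta_1 + \beta_2 X_i + u_i$$

where Y is consumption expenditure and X is income. For this model the matrix formulation is

$$\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} = \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix} + \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix}$$

with the example data supplying the numerical entries of y and X.

As in the two- and three-variable cases, our objective is to estimate the parameters of the multiple regression and to draw inferences about them from the data at hand. In matrix notation this amounts to estimating β and drawing inferences about this β.

Topic 35: Assumptions of the Classical Linear Regression Model in Matrix Notation

The assumptions underlying the classical linear regression model are as follows:

1. $E(\mathbf{u}) = \mathbf{0}$, where u and 0 are n × 1 column vectors, 0 being a null vector.
2. $E(\mathbf{u}\mathbf{u}') = \sigma^2\mathbf{I}$, where I is an n × n identity matrix.
3. The n × k matrix X is non-stochastic.
4. The rank of X is k, the number of columns in X, with k less than the number of observations n.

Assumption 1 means that the expected value of the disturbance vector u, that is, of each of its elements, is zero.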
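More explicitly, E(u) = 0 means

$$E\begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} = \begin{bmatrix} E(u_1) \\ E(u_2) \\ \vdots \\ E(u_n) \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$

Assumption 2 is a compact way of expressing the two assumptions of homoscedasticity and no serial correlation, which in scalar notation are

$$E(u_i u_j) = \begin{cases} \sigma^2 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}$$

and in matrix form it can be written as

$$E(\mathbf{u}\mathbf{u}') = \sigma^2\mathbf{I}$$

To see this, we can write

$$E(\mathbf{u}\mathbf{u}') = E\left( \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} \begin{bmatrix} u_1 & u_2 & \cdots & u_n \end{bmatrix} \right)$$

where $\mathbf{u}'$ is the transpose of the column vector u, that is, a row vector. Performing the multiplication, we obtain

$$E(\mathbf{u}\mathbf{u}') = E\begin{bmatrix} u_1^2 & u_1 u_2 & \cdots & u_1 u_n \\ u_2 u_1 & u_2^2 & \cdots & u_2 u_n \\ \vdots & \vdots & & \vdots \\ u_n u_1 & u_n u_2 & \cdots & u_n^2 \end{bmatrix}$$

Applying the expectations operator E to each element of the preceding matrix, we obtain

$$E(\mathbf{u}\mathbf{u}') = \begin{bmatrix} E(u_1^2) & E(u_1 u_2) & \cdots & E(u_1 u_n) \\ E(u_2 u_1) & E(u_2^2) & \cdots & E(u_2 u_n) \\ \vdots & \vdots & & \vdots \\ E(u_n u_1) & E(u_n u_2) & \cdots & E(u_n^2) \end{bmatrix}$$

Because of the assumptions of homoscedasticity and no serial correlation, this matrix reduces to

$$E(\mathbf{u}\mathbf{u}') = \begin{bmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \sigma^2 \end{bmatrix} = \sigma^2\mathbf{I}$$

where I is an n × n identity matrix.

Assumption 3 states that the n × k matrix X is non-stochastic; that is, it consists of fixed numbers. As noted previously, our regression analysis is conditional regression analysis, conditional upon the fixed values of the X variables.

Assumption 4 states that the X matrix has full column rank equal to k, the number of columns in the matrix. This means that the columns of the X matrix are linearly independent; that is, there is no exact linear relationship among the X variables. In other words, there is no multicollinearity. In scalar notation this is equivalent to saying that there exists no set of numbers $\lambda_1, \lambda_2, \ldots, \lambda_k$, not all zero, such that

$$\lambda_1 X_{1i} + \lambda_2 X_{2i} + \cdots + \lambda_k X_{ki} = 0$$

where $X_{1i} = 1$ for all i (to allow for the column of 1's in the X matrix). In matrix notation, this can be represented as

$$\boldsymbol{\lambda}'\mathbf{x} = 0$$

where $\boldsymbol{\lambda}'$ is a 1 × k row vector and $\mathbf{x}$ is a k × 1 column vector.

Lesson 6
OLS ESTIMATION

Topic 36: OLS Estimation

To obtain the OLS estimate of β, let us first write the k-variable sample regression function (SRF):

$$Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_{2i} + \hat{\beta}_3 X_{3i} + \cdots + \hat{\beta}_k X_{ki} + \hat{u}_i$$

which can be written more compactly in matrix notation as

$$\mathbf{y} = \mathbf{X}\hat{\boldsymbol{\beta}} + \hat{\mathbf{u}}$$

and in matrix form as

$$\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} = \begin{bmatrix} 1 & X_{21} & X_{31} & \cdots & X_{k1} \\ 1 & X_{22} & X_{32} & \cdots & X_{k2} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & X_{2n} & X_{3n} & \cdots & X_{kn} \end{bmatrix} \begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \\ \vdots \\ \hat{\beta}_k \end{bmatrix} + \begin{bmatrix} \hat{u}_1 \\ \hat{u}_2 \\ \vdots \\ \hat{u}_n \end{bmatrix}$$

where $\hat{\boldsymbol{\beta}}$ is a k-element column vector of the OLS estimators of the regression coefficients and where $\hat{\mathbf{u}}$ is an n × 1 column vector of n residuals.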
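As in the two- and three-variable models, in the k-variable case the OLS estimators are obtained by minimizing

$$\sum \hat{u}_i^2 = \sum \left( Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_{2i} - \cdots - \hat{\beta}_k X_{ki} \right)^2$$

where $\sum \hat{u}_i^2$ is the residual sum of squares (RSS). In matrix notation, this amounts to minimizing $\hat{\mathbf{u}}'\hat{\mathbf{u}}$ since

$$\hat{\mathbf{u}}'\hat{\mathbf{u}} = \begin{bmatrix} \hat{u}_1 & \hat{u}_2 & \cdots & \hat{u}_n \end{bmatrix} \begin{bmatrix} \hat{u}_1 \\ \hat{u}_2 \\ \vdots \\ \hat{u}_n \end{bmatrix} = \hat{u}_1^2 + \hat{u}_2^2 + \cdots + \hat{u}_n^2 = \sum \hat{u}_i^2$$

From the SRF in matrix notation we obtain

$$\hat{\mathbf{u}} = \mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}$$

Therefore

$$\hat{\mathbf{u}}'\hat{\mathbf{u}} = (\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})'(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}) = \mathbf{y}'\mathbf{y} - 2\hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} + \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}}$$

where use is made of the transposition property $(\mathbf{X}\hat{\boldsymbol{\beta}})' = \hat{\boldsymbol{\beta}}'\mathbf{X}'$ and of the fact that $\hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y}$ is a scalar and therefore equal to its transpose $\mathbf{y}'\mathbf{X}\hat{\boldsymbol{\beta}}$.

Topic 37: OLS Estimation-1

In scalar notation, the method of OLS consists of so estimating $\beta_1, \beta_2, \ldots, \beta_k$ that $\sum \hat{u}_i^2$ is as small as possible. This is done by differentiating $\sum \hat{u}_i^2$ partially with respect to $\hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_k$ and setting the resulting expressions to zero. This process yields k simultaneous equations in k unknowns, the normal equations of least-squares theory. These equations are as follows:

$$\begin{aligned} n\hat{\beta}_1 + \hat{\beta}_2 \sum X_{2i} + \hat{\beta}_3 \sum X_{3i} + \cdots + \hat{\beta}_k \sum X_{ki} &= \sum Y_i \\ \hat{\beta}_1 \sum X_{2i} + \hat{\beta}_2 \sum X_{2i}^2 + \hat{\beta}_3 \sum X_{2i}X_{3i} + \cdots + \hat{\beta}_k \sum X_{2i}X_{ki} &= \sum X_{2i}Y_i \\ &\ \ \vdots \\ \hat{\beta}_1 \sum X_{ki} + \hat{\beta}_2 \sum X_{ki}X_{2i} + \hat{\beta}_3 \sum X_{ki}X_{3i} + \cdots + \hat{\beta}_k \sum X_{ki}^2 &= \sum X_{ki}Y_i \end{aligned}$$

In matrix form, this can be represented as

$$\begin{bmatrix} n & \sum X_{2i} & \cdots & \sum X_{ki} \\ \sum X_{2i} & \sum X_{2i}^2 & \cdots & \sum X_{2i}X_{ki} \\ \vdots & \vdots & & \vdots \\ \sum X_{ki} & \sum X_{ki}X_{2i} & \cdots & \sum X_{ki}^2 \end{bmatrix} \begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \\ \vdots \\ \hat{\beta}_k \end{bmatrix} = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ X_{21} & X_{22} & \cdots & X_{2n} \\ \vdots & \vdots & & \vdots \\ X_{k1} & X_{k2} & \cdots & X_{kn} \end{bmatrix} \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}$$

or, more compactly, as

$$(\mathbf{X}'\mathbf{X})\hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{y}$$

Note these features of the $(\mathbf{X}'\mathbf{X})$ matrix:

(1) It gives the raw sums of squares and cross products of the X variables, one of which is the intercept term taking the value of 1 for each observation. The elements on the main diagonal give the raw sums of squares, and those off the main diagonal give the raw sums of cross products (by raw we mean in original units of measurement).

(2) It is symmetrical since the cross product between $X_{2i}$ and $X_{3i}$ is the same as that between $X_{3i}$ and $X_{2i}$.

(3) It is of order (k × k), that is, k rows and k columns.

The known quantities are $(\mathbf{X}'\mathbf{X})$ and $(\mathbf{X}'\mathbf{y})$ (the cross product between the X variables and y) and the unknown is $\hat{\boldsymbol{\beta}}$. Now using matrix algebra, if the inverse of $(\mathbf{X}'\mathbf{X})$ exists, say, $(\mathbf{X}'\mathbf{X})^{-1}$, then premultiplying both sides of the normal equations by this inverse, we obtain

$$(\mathbf{X}'\mathbf{X})^{-1}(\mathbf{X}'\mathbf{X})\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$

or

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$

The final equation is a fundamental result of OLS theory in matrix notation.

Topic 38: Example: OLS Estimation

As an illustration of the matrix methods developed so far, let us rework the consumption–income model discussed earlier.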
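$$Y_i = \beta_1 + \beta_2 X_i + u_i$$

where Y is consumption expenditure and X is income. For the two-variable case we have

$$\hat{\boldsymbol{\beta}} = \begin{bmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \end{bmatrix}, \qquad \mathbf{X}'\mathbf{X} = \begin{bmatrix} n & \sum X_i \\ \sum X_i & \sum X_i^2 \end{bmatrix}$$

and

$$\mathbf{X}'\mathbf{y} = \begin{bmatrix} \sum Y_i \\ \sum X_i Y_i \end{bmatrix}$$

Estimation with the data gives the numerical values of these two matrices [matrices omitted]. Using the rules of matrix inversion given in previous modules, we can see that the inverse of the preceding $(\mathbf{X}'\mathbf{X})$ matrix is [matrix omitted]. Therefore, $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$ is [result omitted].

Previously we obtained $\hat{\beta}_1$ = 24.4545 and $\hat{\beta}_2$ = 0.5091 using the computer. The difference between the two estimates is due to rounding errors. In passing, note that in working on a desk calculator it is essential to obtain results to several significant digits to minimize the rounding errors.

Topic 39: Variance–Covariance Matrix of β

Matrix methods enable us to develop formulas not only for the variance of $\hat{\beta}_i$, any given element of $\hat{\boldsymbol{\beta}}$, but also for the covariance between any two elements of $\hat{\boldsymbol{\beta}}$, say, $\hat{\beta}_i$ and $\hat{\beta}_j$. We need these variances and covariances for the purpose of statistical inference. By definition, the variance–covariance matrix of $\hat{\boldsymbol{\beta}}$ is

$$\text{var-cov}(\hat{\boldsymbol{\beta}}) = E\left\{ [\hat{\boldsymbol{\beta}} - E(\hat{\boldsymbol{\beta}})][\hat{\boldsymbol{\beta}} - E(\hat{\boldsymbol{\beta}})]' \right\}$$

which can be written explicitly as

$$\text{var-cov}(\hat{\boldsymbol{\beta}}) = \begin{bmatrix} \text{var}(\hat{\beta}_1) & \text{cov}(\hat{\beta}_1, \hat{\beta}_2) & \cdots & \text{cov}(\hat{\beta}_1, \hat{\beta}_k) \\ \text{cov}(\hat{\beta}_2, \hat{\beta}_1) & \text{var}(\hat{\beta}_2) & \cdots & \text{cov}(\hat{\beta}_2, \hat{\beta}_k) \\ \vdots & \vdots & & \vdots \\ \text{cov}(\hat{\beta}_k, \hat{\beta}_1) & \text{cov}(\hat{\beta}_k, \hat{\beta}_2) & \cdots & \text{var}(\hat{\beta}_k) \end{bmatrix}$$

The preceding variance–covariance matrix can be obtained from the following formula:

$$\text{var-cov}(\hat{\boldsymbol{\beta}}) = \sigma^2 (\mathbf{X}'\mathbf{X})^{-1}$$

where $\sigma^2$ is the homoscedastic variance of $u_i$ and $(\mathbf{X}'\mathbf{X})^{-1}$ is the inverse matrix appearing in the equation that gives the OLS estimator $\hat{\boldsymbol{\beta}}$. Since $\sigma^2$ is unknown, it is replaced by its unbiased estimator

$$\hat{\sigma}^2 = \frac{\sum \hat{u}_i^2}{n - k} = \frac{\hat{\mathbf{u}}'\hat{\mathbf{u}}}{n - k}$$

where there are now n − k df. Although in principle $\hat{\mathbf{u}}'\hat{\mathbf{u}}$ can be computed from the estimated residuals, in practice it can be obtained directly as follows. Recalling that $\sum \hat{u}_i^2$ (= RSS) = TSS − ESS, and letting lowercase letters denote deviations from sample means, in the two-variable case we may write

$$\sum \hat{u}_i^2 = \sum y_i^2 - \hat{\beta}_2^2 \sum x_i^2$$

and in the three-variable case

$$\sum \hat{u}_i^2 = \sum y_i^2 - \hat{\beta}_2 \sum y_i x_{2i} - \hat{\beta}_3 \sum y_i x_{3i}$$

By extending this principle, it can be seen that for the k-variable model

$$\sum \hat{u}_i^2 = \sum y_i^2 - \hat{\beta}_2 \sum y_i x_{2i} - \cdots - \hat{\beta}_k \sum y_i x_{ki}$$

In matrix notation,

$$\text{TSS:} \quad \sum y_i^2 = \mathbf{y}'\mathbf{y} - n\bar{Y}^2$$

$$\text{ESS:} \quad \hat{\beta}_2 \sum y_i x_{2i} + \cdots + \hat{\beta}_k \sum y_i x_{ki} = \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} - n\bar{Y}^2$$

where the term $n\bar{Y}^2$ is known as the correction for mean. Therefore,

$$\hat{\mathbf{u}}'\hat{\mathbf{u}} = \mathbf{y}'\mathbf{y} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y}$$

Once $\hat{\mathbf{u}}'\hat{\mathbf{u}}$ is obtained, $\hat{\sigma}^2$ can be easily computed, which, in turn, will enable us to estimate the variance–covariance matrix.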
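The estimation steps of Topics 36 to 39 can be sketched in Python with NumPy. Two things here are assumptions rather than part of the handout. First, the ten observations below are assumed values: the data table is not reproduced in the text, and these numbers were chosen because they reproduce the computer estimates quoted above (24.4545 and 0.5091). Second, the helper `inverse_via_adjoint` is a hypothetical name for an illustrative implementation of the determinant, cofactor, adjoint procedure from Lesson 5; in practice a library routine such as `np.linalg.inv` or `np.linalg.solve` would normally be used instead.

```python
import numpy as np


def inverse_via_adjoint(a):
    """Invert a square matrix with the determinant/cofactor/adjoint steps."""
    a = np.asarray(a, dtype=float)
    n = a.shape[0]
    det = np.linalg.det(a)                       # step 1: determinant
    if np.isclose(det, 0.0):
        raise ValueError("singular matrix: the inverse does not exist")
    cof = np.empty((n, n))
    for i in range(n):                           # step 2: cofactor matrix
        for j in range(n):
            minor = np.delete(np.delete(a, i, axis=0), j, axis=1)
            cof[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return cof.T / det                           # steps 3-4: adjoint / |A|


# ASSUMED consumption (Y) and income (X) data; not reproduced in the text.
Y = np.array([70, 65, 90, 95, 110, 115, 120, 140, 155, 150], dtype=float)
X2 = np.arange(80, 261, 20, dtype=float)         # 80, 100, ..., 260

n = Y.size
X = np.column_stack([np.ones(n), X2])            # data matrix with intercept
k = X.shape[1]

XtX = X.T @ X                                    # raw sums of squares/products
Xty = X.T @ Y
XtX_inv = inverse_via_adjoint(XtX)

beta_hat = XtX_inv @ Xty                         # (X'X)^-1 X'y
rss = Y @ Y - beta_hat @ Xty                     # u'u = y'y - beta'X'y
sigma2_hat = rss / (n - k)                       # unbiased estimator of sigma^2
var_cov = sigma2_hat * XtX_inv                   # var-cov matrix of beta_hat
std_err = np.sqrt(np.diag(var_cov))

print("beta_hat:", beta_hat)
print("RSS:", rss, "sigma2_hat:", sigma2_hat)
print("standard errors:", std_err)
```

The residual sum of squares is computed with the shortcut $\mathbf{y}'\mathbf{y} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y}$ rather than from the individual residuals; the two routes agree up to rounding.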
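Topic 40: Example: Variance–Covariance Matrix of β

Using the data from the previous OLS example, $\hat{\mathbf{u}}'\hat{\mathbf{u}} = \mathbf{y}'\mathbf{y} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y}$ is computed [computation omitted]. Hence, $\hat{\sigma}^2$ = (337.273/8) = 42.1591.

Topic 41: Properties of OLS Vector β

In the two- and three-variable cases we know that the OLS estimators are linear and unbiased, and in the class of all linear unbiased estimators they have minimum variance (the Gauss–Markov property). In short, the OLS estimators are best linear unbiased estimators (BLUE). This property extends to the entire $\hat{\boldsymbol{\beta}}$ vector; that is, $\hat{\boldsymbol{\beta}}$ is linear (each of its elements is a linear function of Y, the dependent variable). $E(\hat{\boldsymbol{\beta}}) = \boldsymbol{\beta}$, that is, the expected value of each element of $\hat{\boldsymbol{\beta}}$ is equal to the corresponding element of the true β, and in the class of all linear unbiased estimators of β, the OLS estimator $\hat{\boldsymbol{\beta}}$ has minimum variance.

Lesson 7
THE COEFFICIENT OF DETERMINATION R2 IN MATRIX NOTATION

Topic 42: The Coefficient of Determination R2 in Matrix Notation

The coefficient of determination $R^2$ has been defined as

$$R^2 = \frac{\text{ESS}}{\text{TSS}}$$

In the two-variable case,

$$R^2 = \frac{\hat{\beta}_2^2 \sum x_i^2}{\sum y_i^2}$$

and in the three-variable case

$$R^2 = \frac{\hat{\beta}_2 \sum y_i x_{2i} + \hat{\beta}_3 \sum y_i x_{3i}}{\sum y_i^2}$$

Generalizing, we obtain for the k-variable case

$$R^2 = \frac{\hat{\beta}_2 \sum y_i x_{2i} + \hat{\beta}_3 \sum y_i x_{3i} + \cdots + \hat{\beta}_k \sum y_i x_{ki}}{\sum y_i^2}$$

Rewriting this with the matrix expressions for ESS and TSS explained in the previous topic, we obtain

$$R^2 = \frac{\hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} - n\bar{Y}^2}{\mathbf{y}'\mathbf{y} - n\bar{Y}^2}$$

which gives the matrix representation of $R^2$.

Topic 43: The Correlation Matrix

In the k-variable case, we shall have in all k(k − 1)/2 zero-order correlation coefficients. These k(k − 1)/2 correlations can be put into a matrix, called the correlation matrix R, as follows:

$$\mathbf{R} = \begin{bmatrix} r_{11} & r_{12} & r_{13} & \cdots & r_{1k} \\ r_{21} & r_{22} & r_{23} & \cdots & r_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ r_{k1} & r_{k2} & r_{k3} & \cdots & r_{kk} \end{bmatrix} = \begin{bmatrix} 1 & r_{12} & r_{13} & \cdots & r_{1k} \\ r_{21} & 1 & r_{23} & \cdots & r_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ r_{k1} & r_{k2} & r_{k3} & \cdots & 1 \end{bmatrix}$$

where the subscript 1, as before, denotes the dependent variable Y ($r_{12}$ means the correlation coefficient between Y and $X_2$, and so on) and where use is made of the fact that the coefficient of correlation of a variable with respect to itself is always 1 ($r_{11} = r_{22} = \cdots = r_{kk} = 1$). From the correlation matrix R one can obtain correlation coefficients of first order and of higher order such as $r_{12.34 \ldots k}$.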
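The matrix formula for $R^2$ and the correlation matrix can be illustrated with a short Python/NumPy sketch. The data below are simulated, not taken from the handout: the sample size, coefficients, and random seed are arbitrary assumptions made only so that the example runs. The sketch computes $R^2$ from $(\hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{y} - n\bar{Y}^2)/(\mathbf{y}'\mathbf{y} - n\bar{Y}^2)$, checks it against 1 − RSS/TSS, and builds the zero-order correlation matrix with `np.corrcoef`.

```python
import numpy as np

# Simulated (hypothetical) data for a three-variable model: Y, X2, X3.
rng = np.random.default_rng(0)
n = 50
X2 = rng.normal(size=n)
X3 = rng.normal(size=n)
y = 1.0 + 2.0 * X2 - 0.5 * X3 + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), X2, X3])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # solves (X'X) beta = X'y

# R^2 in matrix notation, using the correction for mean n * Ybar^2.
correction = n * y.mean() ** 2
ess = beta_hat @ X.T @ y - correction
tss = y @ y - correction
r2_matrix = ess / tss

# Cross-check: R^2 = 1 - RSS/TSS.
resid = y - X @ beta_hat
r2_check = 1.0 - (resid @ resid) / tss

# Correlation matrix R of (Y, X2, X3); rows of the stacked array are variables.
R = np.corrcoef(np.vstack([y, X2, X3]))
k = R.shape[0]
n_pairs = k * (k - 1) // 2                     # distinct zero-order correlations

print("R^2:", r2_matrix, r2_check)
print(R)
print("number of zero-order correlations:", n_pairs)
```

With three variables the correlation matrix is 3 × 3, has ones on the main diagonal, is symmetric, and contains 3(3 − 1)/2 = 3 distinct zero-order correlations.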
Topic 44: Hypothesis Testing About Individual Regression Coefficients in Matrix Notation

If our objective is inference as well as estimation, we shall have to assume that the disturbances $u_i$ follow some probability distribution. For reasons given previously, in regression analysis we usually assume that each $u_i$ follows the normal distribution with zero mean and constant variance $\sigma^2$. In matrix notation, we have

$$\mathbf{u} \sim N(\mathbf{0}, \sigma^2\mathbf{I})$$

where u and 0 are n × 1 column vectors and I is an n × n identity matrix, 0 being the null vector.

Given the normality assumption, we know that in two- and three-variable linear regression models (1) the OLS estimators $\hat{\beta}_i$ and the ML estimators $\tilde{\beta}_i$ are identical, but the ML estimator $\tilde{\sigma}^2$ is biased, although this bias can be removed by using the unbiased OLS estimator $\hat{\sigma}^2$; and (2) the OLS estimators $\hat{\beta}_i$ are also normally distributed. Generalizing, in the k-variable case we can show that

$$\hat{\boldsymbol{\beta}} \sim N\left[\boldsymbol{\beta}, \sigma^2(\mathbf{X}'\mathbf{X})^{-1}\right]$$

that is, each element of $\hat{\boldsymbol{\beta}}$ is normally distributed with mean equal to the corresponding element of the true β and the variance given by $\sigma^2$ times the appropriate diagonal element of the inverse matrix $(\mathbf{X}'\mathbf{X})^{-1}$.

Since in practice $\sigma^2$ is unknown, it is estimated by $\hat{\sigma}^2$. Then by the usual shift to the t distribution, it follows that each element of $\hat{\boldsymbol{\beta}}$ follows the t distribution with n − k df. Symbolically,

$$t = \frac{\hat{\beta}_i - \beta_i}{\text{se}(\hat{\beta}_i)}$$

with n − k df, where $\hat{\beta}_i$ is any element of $\hat{\boldsymbol{\beta}}$.

The t distribution can therefore be used to test hypotheses about the true $\beta_i$ as well as to establish confidence intervals about it.

Topic 45: Testing the Overall Significance of Regression: Analysis of Variance in Matrix Notation

The ANOVA technique can be easily extended to the k-variable case. Recall that the ANOVA technique consists of decomposing the TSS into two components: the ESS and the RSS.
The degrees of freedom associated with these sums of squares are n − 1, k − 1, and n − k, respectively. Assuming that the disturbances $u_i$ are normally distributed and that the null hypothesis is $\beta_2 = \beta_3 = \cdots = \beta_k = 0$, it can be shown that
$$F = \frac{(\hat{\beta}'X'y - n\bar{Y}^2)/(k-1)}{(y'y - \hat{\beta}'X'y)/(n-k)}$$
follows the F distribution with k − 1 and n − k df. There is a close relationship between F and $R^2$, namely,
$$F = \frac{R^2/(k-1)}{(1-R^2)/(n-k)}$$

Topic 46: Matrix Formulation of ANOVA Table for K-Variable Linear Regression Model

Source of variation | SS | df | MSS
Due to regression ($X_2, X_3, \ldots, X_k$) | $\hat{\beta}'X'y - n\bar{Y}^2$ | $k-1$ | $(\hat{\beta}'X'y - n\bar{Y}^2)/(k-1)$
Due to residuals | $y'y - \hat{\beta}'X'y$ | $n-k$ | $(y'y - \hat{\beta}'X'y)/(n-k)$
Total | $y'y - n\bar{Y}^2$ | $n-1$ |

Topic 47: Testing Linear Restrictions: General F Testing Using Matrix Notation

Earlier we introduced the general F test to test the validity of linear restrictions imposed on one or more parameters of the k-variable linear regression model. Let
$\hat{u}_R$ = the residual vector from the restricted least-squares regression,
$\hat{u}_{UR}$ = the residual vector from the unrestricted least-squares regression,
$\hat{u}_R'\hat{u}_R$ = RSS from the restricted regression,
$\hat{u}_{UR}'\hat{u}_{UR}$ = RSS from the unrestricted regression,
m = number of linear restrictions,
k = number of parameters (including the intercept) in the unrestricted regression,
n = number of observations.

The matrix counterpart of the scalar F statistic is then
$$F = \frac{(\hat{u}_R'\hat{u}_R - \hat{u}_{UR}'\hat{u}_{UR})/m}{(\hat{u}_{UR}'\hat{u}_{UR})/(n-k)}$$
which follows the F distribution with (m, n − k) df. As usual, if the computed F value exceeds the critical F value, we can reject the restricted regression; otherwise, we do not reject it.

Lesson 8 PREDICTION USING MULTIPLE REGRESSION: MATRIX FORMULATION

Topic 48: Prediction Using Multiple Regression: Matrix Formulation

Earlier we discussed, using scalar notation, how the estimated multiple regression can be used for predicting (1) the mean and (2) individual values of Y, given the values of the X regressors. In this lesson, we show how to express these predictions in matrix form. We also present the formulas to estimate the variances and standard errors of the predicted values. These formulas are better handled in matrix notation, for their scalar or algebraic expressions become rather unwieldy.

Topic 49: Mean Prediction

Let
$$x_0 = [1, X_{02}, X_{03}, \ldots, X_{0k}]'$$
be the vector of values of the X variables for which we wish to predict $\hat{Y}_0$, the mean prediction of Y.
Now the estimated multiple regression, in scalar form, is
$$\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 X_{2i} + \hat{\beta}_3 X_{3i} + \cdots + \hat{\beta}_k X_{ki}$$
which in matrix notation can be written compactly as
$$\hat{Y}_i = x_i'\hat{\beta}$$
where $x_i' = [1, X_{2i}, X_{3i}, \ldots, X_{ki}]$. These two equations give the mean prediction of $Y_i$ corresponding to a given $x_i'$.

Topic 50: Variance of Mean Prediction

The variance of the mean prediction is
$$\text{var}(\hat{Y}_0 \mid x_0') = \sigma^2 x_0'(X'X)^{-1}x_0$$
where $\sigma^2$ is the variance of $u_i$ and $x_0' = [1, X_{02}, \ldots, X_{0k}]$ are the given values of the X variables for which we wish to predict. In practice, we replace $\sigma^2$ by its unbiased estimator $\hat{\sigma}^2$.

Topic 51: Individual Prediction

In matrix notation, the individual prediction for a standard regression model can be written compactly as $\hat{Y}_i = x_i'\hat{\beta}$. If $x_i'$ is as given, it becomes
$$(Y_0 \mid x_0') = x_0'\hat{\beta}$$
where, of course, the values of $x_0$ are specified. It gives an unbiased prediction.

Variance of Individual Prediction

The formula for the variance of an individual prediction is as follows:
$$\text{var}(Y_0 \mid x_0') = \sigma^2[1 + x_0'(X'X)^{-1}x_0]$$

Topic 52, 53 and 54: Summary of the Matrix Approach: An Illustrative Example

Consider the data given in the table below. These data pertain to per capita personal consumption expenditure (PPCE), per capita personal disposable income (PPDI), and time, or the trend variable. By including the trend variable in the model, we are trying to find out the relationship of PPCE to PPDI net of the trend variable (which may represent a host of other factors, such as technology, change in tastes, etc.). For empirical purposes, therefore, the regression model is
$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$$
where Y = per capita consumption expenditure, $X_2$ = per capita disposable income, and $X_3$ = time.
In matrix notation, our problem may be shown as $y = X\hat{\beta} + \hat{u}$, where y is the column vector of observations on PPCE, X is the matrix whose columns are a column of ones, PPDI, and time, and $\hat{u}$ is the residual vector.

From the preceding data we obtain $X'X$ and $X'y$. Using the rules of matrix inversion we obtain $(X'X)^{-1}$, and hence $\hat{\beta} = (X'X)^{-1}X'y$.

The residual sum of squares can now be computed as $\hat{u}'\hat{u} = y'y - \hat{\beta}'X'y$, whence we obtain $\hat{\sigma}^2 = \hat{u}'\hat{u}/(n-k)$. The variance–covariance matrix for $\hat{\beta}$ can therefore be shown as $\hat{\sigma}^2 (X'X)^{-1}$. The diagonal elements of this matrix give the variances of $\hat{\beta}_1$, $\hat{\beta}_2$, and $\hat{\beta}_3$, respectively, and their positive square roots give the corresponding standard errors.

For estimating $R^2$ from the previous data, we use $R^2 = (\hat{\beta}'X'y - n\bar{Y}^2)/(y'y - n\bar{Y}^2)$. The adjusted coefficient of determination then follows as $\bar{R}^2 = 1 - (1 - R^2)(n-1)/(n-k)$.

Collecting our results thus far gives the estimated regression together with its standard errors, $R^2$, and $\bar{R}^2$. Recall that a null hypothesis like $\beta_2 = \beta_3 = 0$, simultaneously, can be tested by the analysis of variance technique and the attendant F test,
$$F = \frac{(\hat{\beta}'X'y - n\bar{Y}^2)/(k-1)}{(y'y - \hat{\beta}'X'y)/(n-k)}$$
which here is distributed as the F distribution with 2 and 12 df. The computed F value is obviously highly significant; we can reject the null hypothesis that $\beta_2 = \beta_3 = 0$, that is, that per capita personal consumption expenditure is not linearly related to per capita disposable income and trend.

The correlation matrix R. In presenting the correlation matrix for our data, we have bordered it by the variables of the model so that we can readily identify which variables are involved in the computation of each correlation coefficient.

Topic 55: Generalized Least Squares (GLS)

On several occasions we have mentioned that OLS is a special case of GLS. To see this, and to take into account heteroscedastic variances and autocorrelations in the error terms, assume that
$$E(uu') = \sigma^2 V$$
where V is a known n × n matrix. Therefore, suppose our model is $y = X\beta + u$, where $E(u) = 0$ and var-cov$(u) = \sigma^2 V$.
In case $\sigma^2$ is unknown, which is typically the case, V then represents the assumed structure of variances and covariances among the random errors $u_t$. Under the stated condition on the variance–covariance of the error terms, it can be shown that
$$\hat{\beta}^{gls} = (X'V^{-1}X)^{-1}X'V^{-1}y$$
$\hat{\beta}^{gls}$ is known as the generalized least-squares (GLS) estimator of $\beta$. It can also be shown that
$$\text{var-cov}(\hat{\beta}^{gls}) = \sigma^2 (X'V^{-1}X)^{-1}$$
It can be proved that $\hat{\beta}^{gls}$ is the best linear unbiased estimator of $\beta$. When $V = I$, the GLS estimator reduces to the OLS estimator $(X'X)^{-1}X'y$, which is the sense in which OLS is a special case of GLS.

Topic 56: K-Variable Regression Model in Original Units and in the Deviation Form

Lesson 9 TIME SERIES ECONOMETRICS

Topic 57: Types of Data Used in Empirical Analysis; Time Series

Time series data are widely used in empirical work, but they raise several issues. First, empirical work based on time series data assumes that the underlying time series is stationary. Second, autocorrelation sometimes results because the underlying time series is nonstationary. Third, in regressing a time series variable on another time series variable(s), one often obtains a very high $R^2$ (in excess of 0.9) even though there is no meaningful relationship between the two variables. Sometimes we expect no relationship between two variables, yet a regression of one on the other often shows a significant relationship. This situation exemplifies the problem of spurious, or nonsense, regression, whose nature will be explored shortly. It is therefore very important to find out whether the relationship between economic variables is spurious or nonsensical. Fourth, some financial time series, such as stock prices, exhibit what is known as the random walk phenomenon. This means the best prediction of the price of a stock, say IBM, tomorrow is equal to its price today plus a purely random shock (or error term). If this were in fact the case, forecasting asset prices would be a futile exercise. Fifth, regression models involving time series data are often used for forecasting.
In view of the preceding discussion, we would like to know if such forecasting is valid when the underlying time series are not stationary. Finally, the causality tests of Granger and Sims assume that the time series involved in the analysis are stationary. Therefore, tests of stationarity should precede tests of causality.

Topic 58: A Look at Selected Economic Time Series

The World Bank's World Development Indicators (WDI) data set makes many economic time series available for empirical analysis; you can choose variables according to the objectives of your research.

Topic 59: A Look at Selected Economic Time Series of Pakistan

Year | GDP growth | GDP per capita | FDI | GFCF | Inflation | GDP (Billion)
1972 | 0.813406 | 153.384 | 0.180563 | 12.46136 | 5.183238 | 9.415016
1973 | 7.064264 | 101.1647 | -0.06266 | 11.33023 | 23.07008 | 6.383429
1974 | 3.540192 | 137.1089 | 0.044948 | 12.0474 | 26.66303 | 8.899192
1975 | 4.211416 | 168.0804 | 0.222606 | 14.58676 | 20.90451 | 11.23061
1976 | 5.15619 | 191.3011 | 0.062428 | 17.46648 | 7.158324 | 13.16808
1977 | 3.947698 | 213.1687 | 0.100642 | 17.64364 | 10.13297 | 15.12606
1978 | 8.048534 | 243.3358 | 0.181193 | 16.42338 | 6.138693 | 17.81152
1979 | 3.758436 | 260.5623 | 0.295881 | 16.21476 | 8.267047 | 19.68838
1980 | 10.2157 | 303.051 | 0.269011 | 16.81449 | 11.93823 | 23.65444
1981 | 7.920764 | 348.2951 | 0.384635 | 17.1487 | 11.87991 | 28.10061
1982 | 6.537487 | 368.2774 | 0.20775 | 16.84019 | 5.903529 | 30.72597
1983 | 6.778378 | 332.5211 | 0.102667 | 16.94929 | 6.362033 | 28.69189
1984 | 5.065206 | 349.1821 | 0.178192 | 16.48682 | 6.087167 | 31.15183
1985 | 7.592115 | 337.8285 | 0.421864 | 16.50404 | 5.614839 | 31.14492
1986 | 5.501654 | 335.0201 | 0.331453 | 17.01449 | 3.506414 | 31.89907
1987 | 6.452343 | 339.3323 | 0.387921 | 17.47488 | 4.681219 | 33.35153
1988 | 7.625279 | 379.4545 | 0.484737 | 16.47436 | 8.837937 | 38.47274
1989 | 4.959769 | 384.3643 | 0.524258 | 17.30053 | 7.844265 | 40.17102
1990 | 4.458587 | 371.6786 | 0.612998 | 17.29975 | 9.052132 | 40.01042
1991 | 5.061568 | 411.8595 | 0.566385 | 17.40604 | 11.79127 | 45.62523
1992 | 7.705898 | 429.1469 | 0.688315 | 18.6035 | 9.509041 | 48.88461
1993 | 1.757748 | 442.4922 | 0.672761 | 19.12928 | 9.973665 | 51.80995
1994 | 3.737416 | 434.4654 | 0.805119 | 17.85503 | 12.36819 | 52.29346
1995 | 4.962609 | 489.8818 | 1.191753 | 17.03421 | 12.34358 | 60.63602
1996 | 4.846581 | 497.2161 | 1.456056 | 17.37707 | 10.37381 | 63.32012
1997 | 1.014396 | 476.3812 | 1.147229 | 16.343 | 11.37549 | 62.4333
1998 | 2.550234 | 461.2167 | 0.81361 | 15.04469 | 6.228004 | 62.19196
1999 | 3.660133 | 454.2761 | 0.844795 | 13.93139 | 4.142637 | 62.97386
2000 | 4.260088 | 576.1956 | 0.375528 | 15.97884 | 4.366665 | 82.01774
2001 | 3.554418 | 544.4943 | 0.475565 | 15.80007 | 3.148261 | 79.4844
2002 | 2.508338 | 534.3039 | 1.033728 | 14.51533 | 3.290345 | 79.90499
2003 | 5.777034 | 599.3763 | 0.581949 | 15.07059 | 2.914135 | 91.76054
2004 | 7.54686 | 687.8364 | 1.037494 | 14.83436 | 7.444625 | 107.7597
2005 | 6.518778 | 748.9226 | 1.833322 | 16.12224 | 9.063327 | 120.0553
2006 | 5.898984 | 836.8605 | 3.112978 | 17.73199 | 7.921084 | 137.2641
2007 | 4.832817 | 908.0951 | 3.668323 | 17.18706 | 7.598684 | 152.3857
2008 | 1.701405 | 990.8466 | 3.19736 | 17.60585 | 20.28612 | 170.0778
2009 | 2.831659 | 957.9957 | 1.390402 | 15.94948 | 13.64777 | 168.1528
2010 | 1.606689 | 987.4097 | 1.141305 | 14.20456 | 12.93887 | 177.1656
2011 | 2.748406 | 1164.976 | 0.620823 | 12.52063 | 11.91609 | 213.5874
2012 | 3.507033 | 1198.109 | 0.382827 | 13.47596 | 9.682352 | 224.3836
2013 | 4.396457 | 1208.904 | 0.576511 | 13.35733 | 7.692156 | 231.2186
2014 | 4.674708 | 1251.164 | 0.772219 | 13.03527 | 7.189384 | 244.3609
2015 | 4.731147 | 1356.668 | 0.618356 | 14.10703 | 2.529328 | 270.5561
2016 | 5.526736 | 1368.454 | 0.924442 | 14.08612 | 3.765119 | 278.6546
2017 | 5.554277 | 1464.993 | 0.819523 | 14.55054 | 4.085374 | 304.5673
2018 | 5.836417 | 1482.306 | 0.552187 | 15.74249 | 5.078057 | 314.5675
2019 | 0.988829 | 1284.702 | 0.797205 | 14.01077 | 10.57836 | 278.2219

Topic 60: Data Generating Process (DGP) on Pakistan Economy

GDP in Pakistan is calculated following the System of National Accounts proposed by the United Nations (UN).
The System of National Accounts 2008 (2008 SNA) is the latest version of the international statistical standard for the national accounts, adopted by the United Nations Statistical Commission (UNSC). The 2008 SNA is an update of the System of National Accounts, 1993 (1993 SNA). In 2003 the update was entrusted to the Intersecretariat Working Group on National Accounts (ISWGNA) to address issues brought about by changes in the economic environment, advances in methodological research, and the needs of users. The first seventeen chapters of the 2008 SNA, comprising the accounting rules, the accounts and tables, and their integration, were adopted by the UNSC in 2008; chapters 18 to 29, comprising the interpretations and extensions of the accounts and tables of the System, were adopted by the UNSC in 2009. The 2008 SNA is the result of a process that was notable for its transparency and the wide involvement of the international statistical community, both of which were made possible by the innovative use of the project's website, Towards 2008 SNA, as a communication tool. In its adoption of the 2008 SNA the UNSC encouraged Member States and regional and sub-regional organizations to implement its recommendations and use it for the national and international reporting of national accounts statistics.

Topic 61: Important Steps to Consider in Time Series Data Handling

Handling time series data involves concepts such as these:
1. Stochastic processes
2. Stationary processes
3. Purely random processes
4. Nonstationary processes
5. Integrated variables
6. Random walk models
7. Cointegration
8. Deterministic and stochastic trends
9. Unit root tests

Topic 62: Time Series Data Management

The Excel sheet provides an example of time series data. Another Excel sheet shows data on economic indicators for Pakistan.
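A year-indexed series of the kind kept in such sheets can also be handled directly in code. The following is a minimal sketch, assuming the series is stored as a mapping from year to value; the helper names `lag` and `first_difference` are hypothetical, and the five GDP (Billion) values for 2015 to 2019 are copied from the Pakistan table in Topic 59.

```python
# GDP (Billion) for 2015-2019, copied from the Pakistan table in Topic 59.
gdp = {
    2015: 270.5561,
    2016: 278.6546,
    2017: 304.5673,
    2018: 314.5675,
    2019: 278.2219,
}

def lag(series, periods=1):
    """Map each year t to the value observed `periods` years earlier.

    Years without an earlier observation are dropped.
    """
    return {t: series[t - periods] for t in sorted(series) if (t - periods) in series}

def first_difference(series):
    """Map each year t to Y_t - Y_{t-1}, for years that have a predecessor."""
    lagged = lag(series, 1)
    return {t: series[t] - lagged[t] for t in lagged}

d_gdp = first_difference(gdp)
for year in sorted(d_gdp):
    print(year, round(d_gdp[year], 4))
```

Keeping the year as the key makes it explicit that the first observation has no lagged value, so one observation is lost each time the series is lagged or differenced.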
Topic 63: The Role of “Time,” or “Lag,” in Economics

In economics the dependence of a variable Y (the dependent variable) on another variable(s) X (the explanatory variable) is rarely instantaneous. Very often, Y responds to X with a lapse of time. Such a lapse of time is called a lag.

Topic 64: The Reasons for Lags

There are three main reasons:

1. Psychological reasons. As a result of the force of habit (inertia), people do not change their consumption habits immediately following a price decrease or an income increase, perhaps because the process of change may involve some immediate disutility. Thus, those who become instant millionaires by winning lotteries may not change the lifestyles to which they were accustomed for a long time because they may not know how to react to such a windfall gain immediately. Of course, given reasonable time, they may learn to live with their newly acquired fortune. Also, people may not know whether a change is “permanent” or “transitory.” Thus, my reaction to an increase in my income will depend on whether or not the increase is permanent. If it is only a nonrecurring increase and in succeeding periods my income returns to its previous level, I may save the entire increase, whereas someone else in my position might decide to “live it up.”

2. Technological reasons. Suppose the price of capital relative to labor declines, making substitution of capital for labor economically feasible. Of course, addition of capital takes time (the gestation period). Moreover, if the drop in price is expected to be temporary, firms may not rush to substitute capital for labor, especially if they expect that after the temporary drop the price of capital may increase beyond its previous level. Sometimes, imperfect knowledge also accounts for lags.
At present the market for personal computers is glutted with all kinds of computers with varying features and prices. Moreover, since their introduction in the late 1970s, the prices of most personal computers have dropped dramatically. As a result, prospective consumers of the personal computer may hesitate to buy until they have had time to look into the features and prices of all the competing brands. Moreover, they may hesitate to buy in the expectation of a further decline in price or of innovations.

3. Institutional reasons. These reasons also contribute to lags. For example, contractual obligations may prevent firms from switching from one source of labor or raw material to another. As another example, those who have placed funds in long-term savings accounts for fixed durations such as 1 year, 3 years, or 7 years are essentially “locked in” even though money market conditions may be such that higher yields are available elsewhere. Similarly, employers often give their employees a choice among several health insurance plans, but once a choice is made, an employee may not switch to another plan for at least 1 year. Although this may be done for administrative convenience, the employee is locked in for 1 year.

Lesson 10 SOME BASIC CONCEPTS OF TIME SERIES

Topic 65: Stochastic Processes

A random or stochastic process is a collection of random variables ordered in time. If we let Y denote a random variable, and if it is continuous, we denote it as $Y(t)$, but if it is discrete, we denote it as $Y_t$. An example of the former is an electrocardiogram, and an example of the latter is GDP, PDI, etc. Since most economic data are collected at discrete points in time, for our purpose we will use the notation $Y_t$ rather than $Y(t)$. If we let Y represent GDP, for our data we have $Y_1, Y_2, Y_3, \ldots, Y_n$ with $n = 88$, where the subscript 1 denotes the first observation (i.e., GDP for the first quarter of 1970) and the subscript 88 denotes the last observation. Keep in mind that each of these Y's is a random variable.

Topic 66: Stationary Stochastic Processes

A type of stochastic process that has received a great deal of attention and scrutiny by time series analysts is the so-called stationary stochastic process. Broadly speaking, a stochastic process is said to be stationary if its mean and variance are constant over time and the value of the covariance between two time periods depends only on the distance or gap or lag between the two time periods and not on the actual time at which the covariance is computed. In the time series literature, such a stochastic process is known as a weakly stationary, or covariance stationary, or second-order stationary, or wide-sense stationary, stochastic process. For the purpose of this chapter, and in most practical situations, this type of stationarity often suffices.

To explain weak stationarity, let $Y_t$ be a stochastic time series with these properties:
Mean: $E(Y_t) = \mu$
Variance: $\text{var}(Y_t) = E(Y_t - \mu)^2 = \sigma^2$
Covariance: $\gamma_k = E[(Y_t - \mu)(Y_{t+k} - \mu)]$
where $\gamma_k$, the covariance (or autocovariance) at lag k, is the covariance between the values of $Y_t$ and $Y_{t+k}$, that is, between two Y values k periods apart. If k = 0, we obtain $\gamma_0$, which is simply the variance of Y ($= \sigma^2$); if k = 1, $\gamma_1$ is the covariance between two adjacent values of Y, the type of covariance we encountered earlier (recall the Markov first-order autoregressive scheme).

Topic 67: Example of Stationary Stochastic Processes

Suppose we shift the origin of Y from $Y_t$ to $Y_{t+m}$. Now if $Y_t$ is to be stationary, the mean, variance, and autocovariances of $Y_{t+m}$ must be the same as those of $Y_t$. In short, if a time series is stationary, its mean, variance, and autocovariance (at various lags) remain the same no matter at what point we measure them; that is, they are time invariant.
Such a time series will tend to return to its mean (called mean reversion) and fluctuations around this mean (measured by its variance) will have a broadly constant amplitude.

Topic 68: Nonstationary Time Series

If a time series is not stationary in the sense just defined, it is called a nonstationary time series (keep in mind we are talking only about weak stationarity). In other words, a nonstationary time series will have a time-varying mean or a time-varying variance or both. Why are stationary time series so important? Because if a time series is nonstationary, we can study its behavior only for the time period under consideration. Each set of time series data will therefore be for a particular episode. As a consequence, it is not possible to generalize it to other time periods. Therefore, for the purpose of forecasting, such (nonstationary) time series may be of little practical value.

Topic 69: Random Walk Model (RWM)

It is often said that asset prices, such as stock prices or exchange rates, follow a random walk; that is, they are nonstationary. We distinguish two types of random walks: (1) random walk without drift (i.e., no constant or intercept term) and (2) random walk with drift (i.e., a constant term is present).

Random Walk without Drift

Suppose $u_t$ is a white noise error term with mean 0 and variance $\sigma^2$. Then the series $Y_t$ is said to be a random walk if
$$Y_t = Y_{t-1} + u_t$$
In the random walk model, as this equation shows, the value of Y at time t is equal to its value at time (t − 1) plus a random shock. We can think of it as a regression of Y at time t on its value lagged one period. Believers in the efficient capital market hypothesis argue that stock prices are essentially random and therefore there is no scope for profitable speculation in the stock market: if one could predict tomorrow's price on the basis of today's price, we would all be millionaires.
From the random walk equation $Y_t = Y_{t-1} + u_t$ we can write
$$Y_1 = Y_0 + u_1$$
$$Y_2 = Y_1 + u_2 = Y_0 + u_1 + u_2$$
$$Y_3 = Y_2 + u_3 = Y_0 + u_1 + u_2 + u_3$$
In general, if the process started at some time 0 with a value of $Y_0$, we have
$$Y_t = Y_0 + \sum_{s=1}^{t} u_s$$
Therefore,
$$E(Y_t) = E\left(Y_0 + \sum_{s=1}^{t} u_s\right) = Y_0$$
In like fashion, it can be shown that
$$\text{var}(Y_t) = t\sigma^2$$
As the preceding expressions show, the mean of Y is equal to its initial, or starting, value, which is constant, but as t increases, its variance increases indefinitely, thus violating a condition of stationarity. In short, the RWM without drift is a nonstationary stochastic process. In practice $Y_0$ is often set at zero, in which case $E(Y_t) = 0$.

$Y_t$ is the sum of the initial $Y_0$ plus the sum of random shocks. As a result, the impact of a particular shock does not die away. For example, if $u_2 = 2$ rather than $u_2 = 0$, then all $Y_t$'s from $Y_2$ onward will be 2 units higher and the effect of this shock never dies out. That is why a random walk is said to have an infinite memory.

If we write the random walk as
$$(Y_t - Y_{t-1}) = \Delta Y_t = u_t$$
where $\Delta$ is the first-difference operator, it is easy to show that, while $Y_t$ is nonstationary, its first difference is stationary. In other words, the first differences of a random walk time series are stationary. But we will have more to say about this later.

Topic 70: Random Walk with Drift

Let us modify the random walk as follows:
$$Y_t = \delta + Y_{t-1} + u_t$$
where $\delta$ is known as the drift parameter. The name drift comes from the fact that if we write the preceding equation as
$$Y_t - Y_{t-1} = \Delta Y_t = \delta + u_t$$
it shows that $Y_t$ drifts upward or downward, depending on $\delta$ being positive or negative. Following the procedure discussed for the random walk without drift, it can be shown that for the random walk with drift model
$$E(Y_t) = Y_0 + t\delta$$
$$\text{var}(Y_t) = t\sigma^2$$
As you can see, for the RWM with drift the mean as well as the variance increases over time, again violating the conditions of (weak) stationarity. In short, the RWM, with or without drift, is a nonstationary stochastic process.
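A short simulation can make these two properties concrete: a shock to a random walk never dies out, and the variance of $Y_t$ grows with t. The sketch below is illustrative only. It assumes the model $Y_t = \delta + Y_{t-1} + u_t$ with normally distributed white noise shocks; the function names `random_walk` and `variance_at`, the number of simulated paths, and the seed are hypothetical choices, not part of the handout.

```python
import random

def random_walk(shocks, y0=0.0, drift=0.0):
    """Build Y_t = drift + Y_{t-1} + u_t for t = 1..T and return [Y_1, ..., Y_T]."""
    path = []
    y = y0
    for u in shocks:
        y = drift + y + u
        path.append(y)
    return path

def variance_at(t, n_paths=5000, sigma=1.0, seed=0):
    """Monte Carlo estimate of var(Y_t) across many independent driftless walks."""
    rng = random.Random(seed)
    finals = []
    for _ in range(n_paths):
        shocks = [rng.gauss(0.0, sigma) for _ in range(t)]
        finals.append(random_walk(shocks)[-1])
    mean = sum(finals) / n_paths
    return sum((v - mean) ** 2 for v in finals) / n_paths

# Persistence of a shock: changing u_2 from 0 to 2 lifts Y_2, Y_3, ... by 2.
base = random_walk([1.0, 0.0, -1.0, 0.5])
bumped = random_walk([1.0, 2.0, -1.0, 0.5])
shift = [b - a for a, b in zip(base, bumped)]
print(shift)

# The simulated variance should grow roughly like t * sigma^2.
for t in (10, 40):
    print(t, round(variance_at(t), 2))
```

The first printed list shows that every value from $Y_2$ onward is shifted by the size of the shock, which is the infinite memory property described above. The variance estimates are simulation results, so they will only be approximately equal to $t\sigma^2$.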
Lesson 11 UNIT ROOT STOCHASTIC PROCESS

Topic 71: Unit Root Stochastic Process

Let us write the RWM as
$$Y_t = \rho Y_{t-1} + u_t, \quad -1 \le \rho \le 1$$
This model resembles the Markov first-order autoregressive model. If $\rho = 1$, it becomes a RWM (without drift). If $\rho$ is in fact 1, we face what is known as the unit root problem, that is, a situation of nonstationarity; we already know that in this case the variance of $Y_t$ is not stationary. The name unit root is due to the fact that $\rho = 1$. Thus the terms nonstationarity, random walk, and unit root can be treated as synonymous. If, however, $|\rho| < 1$, that is, if the absolute value of $\rho$ is less than one, then it can be shown that the time series $Y_t$ is stationary in the sense we have defined it.

Topic 72: Trend Stationary (TS) and Difference Stationary (DS) Stochastic Processes

If the trend in a time series is completely predictable and not variable, we call it a deterministic trend, whereas if it is not predictable, we call it a stochastic trend. To make the definition more formal, consider the following model of the time series $Y_t$:
$$Y_t = \beta_1 + \beta_2 t + \beta_3 Y_{t-1} + u_t$$
where $u_t$ is a white noise error term and where t is time measured chronologically.

Topic 73: Graphical Analysis: Trend Stationary (TS) and Difference Stationary (DS) Stochastic Processes

Topic 74: Pure Random Walk

If in the model $Y_t = \beta_1 + \beta_2 t + \beta_3 Y_{t-1} + u_t$ we set $\beta_1 = 0$, $\beta_2 = 0$, $\beta_3 = 1$, we get
$$Y_t = Y_{t-1} + u_t$$
which is nothing but a RWM without drift and is therefore nonstationary. But note that, if we write it as
$$\Delta Y_t = (Y_t - Y_{t-1}) = u_t$$
it becomes stationary, as noted before. Hence, a RWM without drift is a difference stationary process (DSP).

Topic 75: Random Walk with Drift

If in the same model $\beta_1 \neq 0$, $\beta_2 = 0$, $\beta_3 = 1$, we get
$$Y_t = \beta_1 + Y_{t-1} + u_t$$
which is a random walk with drift and is therefore nonstationary.
If we write the random walk with drift as
$$(Y_t - Y_{t-1}) = \Delta Y_t = \beta_1 + u_t$$
this means $Y_t$ will exhibit a positive ($\beta_1 > 0$) or negative ($\beta_1 < 0$) trend. Such a trend is called a stochastic trend.

Topic 76: Deterministic Trend

If in the general model $Y_t = \beta_1 + \beta_2 t + \beta_3 Y_{t-1} + u_t$ we set $\beta_1 \neq 0$, $\beta_2 \neq 0$, $\beta_3 = 0$, we get
$$Y_t = \beta_1 + \beta_2 t + u_t$$
which is called a trend stationary process (TSP). Although the mean of $Y_t$ is $\beta_1 + \beta_2 t$, which is not constant, its variance ($= \sigma^2$) is. Once the values of $\beta_1$ and $\beta_2$ are known, the mean can be forecast perfectly. Therefore, if we subtract the mean of $Y_t$ from $Y_t$, the resulting series will be stationary, hence the name trend stationary. This procedure of removing the (deterministic) trend is called detrending.

Topic 77: Random Walk with Drift and Deterministic Trend

If in the general model $\beta_1 \neq 0$, $\beta_2 \neq 0$, $\beta_3 = 1$, we obtain
$$Y_t = \beta_1 + \beta_2 t + Y_{t-1} + u_t$$
We then have a random walk with drift and a deterministic trend, which can be seen if we write this equation as
$$\Delta Y_t = \beta_1 + \beta_2 t + u_t$$
which means that $Y_t$ is nonstationary.

Topic 78: Deterministic Trend with Stationary AR(1) Component

If in the general model $\beta_1 \neq 0$, $\beta_2 \neq 0$, $\beta_3 < 1$, we get
$$Y_t = \beta_1 + \beta_2 t + \beta_3 Y_{t-1} + u_t$$
which is stationary around the deterministic trend.

Lesson 12 INTEGRATED STOCHASTIC PROCESSES

Topic 79: Integrated Stochastic Processes

The random walk model is but a specific case of a more general class of stochastic processes known as integrated processes. Recall that the RWM without drift is nonstationary, but its first difference is stationary. Therefore, we call the RWM without drift integrated of order 1, denoted as I(1). Similarly, if a time series has to be differenced twice (i.e., take the first difference of the first differences) to make it stationary, we call such a time series integrated of order 2. In general, if a (nonstationary) time series has to be differenced d times to make it stationary, that time series is said to be integrated of order d. A time series $Y_t$ integrated of order d is denoted as $Y_t \sim I(d)$.
If a time series $Y_t$ is stationary to begin with (i.e., it does not require any differencing), it is said to be integrated of order zero, denoted by $Y_t \sim I(0)$. Thus, we will use the terms “stationary time series” and “time series integrated of order zero” to mean the same thing.

Topic 80: Properties of Integrated Series

The following properties of integrated time series may be noted. Let $X_t$, $Y_t$, and $Z_t$ be three time series.
1. If $X_t \sim I(0)$ and $Y_t \sim I(1)$, then $Z_t = (X_t + Y_t) \sim I(1)$; that is, a linear combination or sum of stationary and nonstationary time series is nonstationary.
2. If $X_t \sim I(d)$, then $Z_t = (a + bX_t) \sim I(d)$, where a and b are constants. That is, a linear combination of an I(d) series is also I(d). Thus, if $X_t \sim I(0)$, then $Z_t = (a + bX_t) \sim I(0)$.
3. If $X_t \sim I(d_1)$ and $Y_t \sim I(d_2)$, then $Z_t = (aX_t + bY_t) \sim I(d_2)$, where $d_1 < d_2$.
4. If $X_t \sim I(d)$ and $Y_t \sim I(d)$, then $Z_t = (aX_t + bY_t) \sim I(d^*)$; $d^*$ is generally equal to d, but in some cases $d^* < d$.

Topic 81: The Phenomenon of Spurious Regression

To see why stationary time series are so important, consider the following two random walk models:
$$Y_t = Y_{t-1} + u_t$$
$$X_t = X_{t-1} + v_t$$
where we generated 500 observations of $u_t$ from $u_t \sim N(0, 1)$ and 500 observations of $v_t$ from $v_t \sim N(0, 1)$ and assumed that the initial values of both Y and X were zero. We also assumed that $u_t$ and $v_t$ are serially uncorrelated as well as mutually uncorrelated. As you know by now, both these time series are nonstationary; that is, they are I(1) or exhibit stochastic trends.

Suppose we regress $Y_t$ on $X_t$. Since $Y_t$ and $X_t$ are uncorrelated I(1) processes, the $R^2$ from the regression of Y on X should tend to zero; that is, there should not be any relationship between the two variables. But wait till you see the regression results: the coefficient of X is highly statistically significant, and, although the $R^2$ value is low, it is statistically significantly different from zero.
From these results, you may be tempted to conclude that there is a significant statistical relationship between Y and X, whereas a priori there should be none. This is in a nutshell the phenomenon of spurious or nonsense regression, first discovered by Yule. Yule showed that (spurious) correlation could persist in nonstationary time series even if the sample is very large. That there is something wrong in the preceding regression is suggested by the extremely low Durbin–Watson d value, which suggests very strong first-order autocorrelation. According to Granger and Newbold, $R^2 > d$ is a good rule of thumb to suspect that the estimated regression is spurious, as in the example above.

Topic 82: Tests of Stationarity

By now the reader probably has a good idea about the nature of stationary stochastic processes and their importance. In practice we face two important questions: (1) How do we find out if a given time series is stationary? (2) If we find that a given time series is not stationary, is there a way that it can be made stationary? Although there are several tests of stationarity, we discuss only those that are prominently discussed in the literature. In this section we discuss two tests: (1) graphical analysis and (2) the correlogram test.

Topic 83: Test of Stationarity: Graphical Analysis

A plot of the series gives a first clue about its likely nature. Consider the GDP time series shown in the figure. You will see that over the period of study GDP has been increasing, that is, showing an upward trend, suggesting perhaps that the mean of GDP has been changing. This perhaps suggests that the GDP series is not stationary.

Topic 84: Autocorrelation Function (ACF) and Correlogram

One simple test of stationarity is based on the so-called autocorrelation function (ACF).
The ACF at lag k, denoted by $\rho_k$, is defined as
$$\rho_k = \frac{\gamma_k}{\gamma_0} = \frac{\text{covariance at lag } k}{\text{variance}}$$
where covariance at lag k and variance are as defined before. Note that if k = 0, $\rho_0 = 1$. Since both covariance and variance are measured in the same units of measurement, $\rho_k$ is a unitless, or pure, number. It lies between −1 and +1, as any correlation coefficient does. If we plot $\rho_k$ against k, the graph we obtain is known as the population correlogram.

Since in practice we only have a realization (i.e., sample) of a stochastic process, we can only compute the sample autocorrelation function (SACF), $\hat{\rho}_k$. To compute this, we must first compute the sample covariance at lag k, $\hat{\gamma}_k$, and the sample variance, $\hat{\gamma}_0$, which are defined as
$$\hat{\gamma}_k = \frac{\sum (Y_t - \bar{Y})(Y_{t+k} - \bar{Y})}{n}$$
$$\hat{\gamma}_0 = \frac{\sum (Y_t - \bar{Y})^2}{n}$$
where n is the sample size and $\bar{Y}$ is the sample mean. Therefore, the sample autocorrelation function at lag k is
$$\hat{\rho}_k = \frac{\hat{\gamma}_k}{\hat{\gamma}_0}$$
which is simply the ratio of sample covariance (at lag k) to sample variance. A plot of $\hat{\rho}_k$ against k is known as the sample correlogram.

Topic 85: Example: Autocorrelation Function (ACF) and Correlogram

Let us first present the sample correlograms of a purely white noise random process and of a random walk process. We generated a sample of 500 error terms, the u's, from the standard normal distribution. The correlogram of these 500 purely random error terms is as shown in the figure. We have shown this correlogram up to 30 lags. We will comment shortly on how one chooses the lag length. For the time being, just look at the column labeled AC, which is the sample autocorrelation function, and the first diagram on the left, labeled autocorrelation. The solid vertical line in this diagram represents the zero axis; observations above the line are positive values and those below the line are negative values.
As is very clear from the white noise correlogram, for a purely white noise process the autocorrelations at various lags hover around zero. This is the picture of a correlogram of a stationary time series. Thus, if the correlogram of an actual (economic) time series resembles the correlogram of a white noise time series, we can say that the time series is probably stationary.

Figure: Correlogram of white noise error term u. AC = autocorrelation, PAC = partial autocorrelation, Q-Stat = Q statistic, Prob = probability.

Now look at the correlogram of a random walk series, shown in the following Figure. The most striking feature of this correlogram is that the autocorrelation coefficients at various lags are very high even up to a lag of 33 quarters. As a matter of fact, if we consider lags of up to 60 quarters, the autocorrelation coefficients are still quite high; the coefficient is about 0.7 at lag 60. This is the typical correlogram of a nonstationary time series: the autocorrelation coefficient starts at a very high value and declines very slowly toward zero as the lag lengthens.

The Choice of Lag Length: This is basically an empirical question. A rule of thumb is to compute the ACF up to one-quarter to one-third of the length of the time series. Since for our economic data we have 88 quarterly observations, by this rule lags of 22 to 29 quarters will do. The best practical advice is to start with sufficiently large lags and then reduce them by some statistical criterion, such as the Akaike or Schwarz information criterion.

Statistical Significance of Autocorrelation Coefficients: The statistical significance of any ρ̂k can be judged by its standard error.
Bartlett has shown that if a time series is purely random, that is, it is white noise (see the Figure), the sample autocorrelation coefficients ρ̂k are approximately

$$\hat{\rho}_k \sim N(0, 1/n)$$

that is, in large samples the sample autocorrelation coefficients are normally distributed with zero mean and variance equal to one over the sample size. The standard error of ρ̂k is therefore 1/√n, and an estimated ρ̂k lying outside ±1.96/√n is statistically different from zero at the 5 percent level.

Lesson 13 THE UNIT ROOT TEST

Topic 86: The Unit Root Test

A test of stationarity (or nonstationarity) that has become widely popular over the past several years is the unit root test. We will first explain it, then illustrate it, and then consider some limitations of this test. The starting point is the unit root (stochastic) process that we discussed in the previous section. We start with

$$Y_t = \rho Y_{t-1} + u_t, \qquad -1 \le \rho \le 1$$

where ut is a white noise error term. We know that if ρ = 1, that is, in the case of the unit root, this equation becomes a random walk model without drift, which we know is a nonstationary stochastic process. Therefore, why not simply regress Yt on its (one-period) lagged value Yt−1 and find out if the estimated ρ is statistically equal to 1? If it is, then Yt is nonstationary. This is the general idea behind the unit root test of stationarity.

For theoretical reasons, we manipulate the equation as follows. Subtract Yt−1 from both sides to obtain

$$Y_t - Y_{t-1} = \rho Y_{t-1} - Y_{t-1} + u_t = (\rho - 1) Y_{t-1} + u_t$$

which can be alternatively written as

$$\Delta Y_t = \delta Y_{t-1} + u_t$$

where δ = (ρ − 1) and Δ, as usual, is the first-difference operator. In practice, therefore, we estimate this last equation and test the (null) hypothesis that δ = 0. If δ = 0, then ρ = 1, that is, we have a unit root, meaning the time series under consideration is nonstationary. It may be noted that if δ = 0, the equation becomes

$$\Delta Y_t = (Y_t - Y_{t-1}) = u_t$$

Since ut is a white noise error term, it is stationary, which means that the first differences of a random walk time series are stationary, a point we have already made before.
This is simple enough; all we have to do is to take the first differences of Yt and regress them on Yt−1 and see if the estimated slope coefficient in this regression (= δ̂) is zero or not. If it is zero, we conclude that Yt is nonstationary. But if it is negative, we conclude that Yt is stationary.

Topic 87: Dickey–Fuller (DF) Test

When carrying out the unit root test, you might be tempted to ask: why not use the usual t test? Unfortunately, under the null hypothesis that δ = 0 (i.e., ρ = 1), the t value of the estimated coefficient of Yt−1 does not follow the t distribution even in large samples; that is, it does not have an asymptotic normal distribution. Dickey and Fuller have shown that under the null hypothesis that δ = 0, the estimated t value of the coefficient of Yt−1 follows the τ (tau) statistic. These authors have computed the critical values of the tau statistic on the basis of Monte Carlo simulations. In the literature the tau statistic or test is known as the Dickey–Fuller (DF) test, in honor of its discoverers. Interestingly, if the hypothesis that δ = 0 is rejected (i.e., the time series is stationary), we can use the usual (Student's) t test.

Topic 88, 89 and 90: Dickey–Fuller (DF) test (Yt is Random walk)

The actual procedure of implementing the DF test involves several decisions. To allow for the various possibilities, the DF test is estimated in three different forms, that is, under three different null hypotheses:

Yt is a random walk:
$$\Delta Y_t = \delta Y_{t-1} + u_t$$

Yt is a random walk with drift:
$$\Delta Y_t = \beta_1 + \delta Y_{t-1} + u_t$$

Yt is a random walk with drift and trend:
$$\Delta Y_t = \beta_1 + \beta_2 t + \delta Y_{t-1} + u_t$$

where t is the time or trend variable. In each case, the null hypothesis is that δ = 0; that is, there is a unit root and the time series is nonstationary. The alternative hypothesis is that δ is less than zero; that is, the time series is stationary.
If the null hypothesis is rejected, it means that Yt is a stationary time series with zero mean in the first form of the test, that Yt is stationary with a nonzero mean [= β1/(1 − ρ)] in the second form, and that Yt is stationary around a deterministic trend in the third form. It is extremely important to note that the critical values of the tau test of the hypothesis that δ = 0 are different for each of the preceding three specifications of the DF test.

Topic 91: The Augmented Dickey–Fuller (ADF) Test

In conducting the DF test, we assumed that the error term ut was uncorrelated. For the case in which the ut are correlated, Dickey and Fuller have developed a test known as the augmented Dickey–Fuller (ADF) test. This test is conducted by "augmenting" the preceding three equations by adding the lagged values of the dependent variable ΔYt. The ADF test here consists of estimating the following regression:

$$\Delta Y_t = \beta_1 + \beta_2 t + \delta Y_{t-1} + \sum_{i=1}^{m} \alpha_i \Delta Y_{t-i} + \varepsilon_t$$

where εt is a pure white noise error term, m is the number of lagged difference terms, and ΔYt−1 = (Yt−1 − Yt−2), ΔYt−2 = (Yt−2 − Yt−3), etc. The number of lagged difference terms to include is often determined empirically, the idea being to include enough terms so that the error term is serially uncorrelated. In the ADF test we still test whether δ = 0, and the ADF test statistic follows the same asymptotic distribution as the DF statistic, so the same critical values can be used.

Topic 92: Testing the Significance of More Than One Coefficient: The F Test

Suppose we estimate

$$\Delta Y_t = \beta_1 + \beta_2 t + \delta Y_{t-1} + u_t$$

and test the hypothesis that β1 = β2 = 0, that is, that the model is a random walk model (RWM) without drift and trend. To test this joint hypothesis, we can use the restricted F test. That is, we estimate the equation above (the unrestricted regression) and then estimate it again, dropping the intercept and trend (the restricted regression). Then we use the restricted F test, except that we cannot use the conventional F table to get the critical F values. As they did with the τ statistic, Dickey and Fuller have developed critical F values for this situation.
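The DF and ADF regressions described above can be sketched as follows. This is only an illustration under stated assumptions: it uses the intercept-only form with one lagged difference, simulated series instead of real data, and helper names (`solve`, `ols`, `adf_tau`) invented here. It computes the τ statistic but not its critical values, which must still be taken from Dickey–Fuller tables or from an econometrics package.

```python
import math
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(X, y):
    """Return (coefficients, standard errors) for the regression y = X b + e."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    b = solve(xtx, xty)
    resid = [yi - sum(bi * xi for bi, xi in zip(b, row)) for row, yi in zip(X, y)]
    s2 = sum(e * e for e in resid) / (len(y) - k)
    se = []
    for i in range(k):
        unit = [1.0 if j == i else 0.0 for j in range(k)]
        se.append(math.sqrt(s2 * solve(xtx, unit)[i]))   # i-th diagonal of (X'X)^-1
    return b, se

def adf_tau(y, lags=0):
    """tau on Y_{t-1} in: dY_t = b1 + delta*Y_{t-1} + sum_i a_i*dY_{t-i} + e_t."""
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]     # dy[j] = Y_{j+1} - Y_j
    X, target = [], []
    for j in range(lags, len(dy)):
        # regressors for dy[j]: intercept, lagged level y[j], lagged differences
        X.append([1.0, y[j]] + [dy[j - i] for i in range(1, lags + 1)])
        target.append(dy[j])
    b, se = ols(X, target)
    return b[1] / se[1]

rng = random.Random(1)                                   # arbitrary seed (assumption)
walk, ar1 = [0.0], [0.0]
for _ in range(500):
    e = rng.gauss(0.0, 1.0)
    walk.append(walk[-1] + e)                            # rho = 1: unit root
    ar1.append(0.5 * ar1[-1] + e)                        # rho = 0.5: stationary

tau_walk = adf_tau(walk, lags=1)
tau_ar1 = adf_tau(ar1, lags=1)
print("tau, random walk:", round(tau_walk, 2))
print("tau, stationary AR(1):", round(tau_ar1, 2))
```

The τ for the stationary series should be far more negative than the τ for the random walk. Whether either value rejects the unit root null depends on the tabulated critical value for the chosen specification and sample size.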
Topic 93: The Phillips–Perron (PP) Unit Root Tests

An important assumption of the DF test is that the error terms ut are independently and identically distributed. The ADF test adjusts the DF test to take care of possible serial correlation in the error terms by adding the lagged difference terms of the regressand. Phillips and Perron use nonparametric statistical methods to take care of the serial correlation in the error terms without adding lagged difference terms. The asymptotic distribution of the PP test statistic is the same as that of the ADF test statistic; we will pursue this topic in later modules.

Topic 94: A Critique of the Unit Root Tests

We have discussed several unit root tests, and there are several more. The question is: why are there so many unit root tests? The answer lies in the size and power of these tests. By size of a test we mean the level of significance (i.e., the probability of committing a Type I error), and by power of a test we mean the probability of rejecting the null hypothesis when it is false. The power of a test is calculated by subtracting the probability of a Type II error from 1, where a Type II error is the acceptance of a false null hypothesis. The maximum power is 1. Most unit root tests are based on the null hypothesis that the time series under consideration has a unit root; that is, it is nonstationary. The alternative hypothesis is that the time series is stationary.

Size of Test: Recall the distinction we made between the nominal and the true levels of significance. The DF test is sensitive to the way it is conducted. Remember that we discussed three varieties of the DF test: (1) a pure random walk, (2) a random walk with drift, and (3) a random walk with drift and trend.
If, for example, the true model is (1) but we estimate (2) and conclude, say, at the 5 percent level that the time series is stationary, this conclusion may be wrong because the true level of significance in this case is much larger than 5 percent.

Power of Test: Most tests of the DF type have low power; that is, they tend to accept the null of a unit root more frequently than is warranted. In other words, these tests may find a unit root even when none exists. There are several reasons for this.

First, the power depends on the (time) span of the data more than on the mere size of the sample. For a given sample size n, the power is greater when the span is large. Thus, a unit root test based on 30 observations over a span of 30 years may have more power than one based on, say, 100 observations over a span of 100 days.

Second, if ρ ≈ 1 but not exactly 1, the unit root test may declare such a time series nonstationary.

Third, these types of tests assume a single unit root; that is, they assume that the given time series is I(1). But if a time series is integrated of order higher than 1, say, I(2), there will be more than one unit root. In the latter case one may use the Dickey–Pantula test.

Fourth, if there are structural breaks in a time series (see the chapter on dummy variables) due to, say, the OPEC oil embargoes, the unit root tests may not catch them.

In applying the unit root tests one should therefore keep in mind the limitations of the tests. Of course, there have been modifications of these tests by Perron and Ng; Elliott, Rothenberg, and Stock; Fuller; and Leybourne. Because of this, Maddala and Kim advocate that the traditional DF, ADF, and PP tests should be discarded. As econometric software packages incorporate the new tests, that may very well happen. But it should be added that as yet there is no uniformly powerful test of the unit root hypothesis.
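The low power of DF-type tests when ρ is close to but not equal to 1 can be illustrated with a small Monte Carlo experiment. The sketch below rests on several assumptions made here for illustration: the DF regression with intercept, 100 observations per series, 300 replications, and a 5 percent critical value of about −2.89, which is the approximate tabulated value for that specification and sample size and should be checked against a DF table before serious use. The function names are hypothetical.

```python
import math
import random

def df_tau(y):
    """tau for delta in dY_t = b1 + delta*Y_{t-1} + u_t (simple-regression formulas)."""
    x = y[:-1]                                        # lagged levels Y_{t-1}
    dy = [y[t + 1] - y[t] for t in range(len(y) - 1)] # first differences
    n = len(dy)
    xbar, dbar = sum(x) / n, sum(dy) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (di - dbar) for xi, di in zip(x, dy))
    delta = sxy / sxx
    b1 = dbar - delta * xbar
    ssr = sum((di - b1 - delta * xi) ** 2 for xi, di in zip(x, dy))
    return delta / math.sqrt(ssr / (n - 2) / sxx)

def rejection_rate(rho, n_obs, reps, crit, rng):
    """Share of simulated AR(1) series with tau < crit (unit root null rejected)."""
    rejections = 0
    for _ in range(reps):
        y = [0.0]
        for _step in range(n_obs):
            y.append(rho * y[-1] + rng.gauss(0.0, 1.0))
        if df_tau(y) < crit:
            rejections += 1
    return rejections / reps

CRIT_5PCT = -2.89          # assumed approximate 5% DF critical value (intercept, n near 100)
rng = random.Random(42)    # arbitrary seed (assumption)

power_far = rejection_rate(0.50, 100, 300, CRIT_5PCT, rng)   # clearly stationary
power_near = rejection_rate(0.95, 100, 300, CRIT_5PCT, rng)  # stationary, but rho near 1
size_null = rejection_rate(1.00, 100, 300, CRIT_5PCT, rng)   # true unit root

print("rejection rate, rho = 0.50:", power_far)
print("rejection rate, rho = 0.95:", power_near)
print("rejection rate, rho = 1.00:", size_null)
```

The rejection rate for ρ = 0.95 should come out well below that for ρ = 0.50, even though both series are stationary. That gap is the low-power problem in miniature. The rate for ρ = 1 estimates the size of the test.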
Lesson 14 TRANSFORMING NONSTATIONARY TIME SERIES

Topic 95: Transforming Nonstationary Time Series

Now that we know the problems associated with nonstationary time series, the practical question is what to do. To avoid the spurious regression problem that may arise from regressing a nonstationary time series on one or more nonstationary time series, we have to transform nonstationary time series to make them stationary. The transformation method depends on whether the time series are difference stationary (DSP) or trend stationary (TSP). We consider each of these methods in turn.

Topic 96: Difference-Stationary Processes

If a time series has a unit root, the first differences of such a time series are stationary. Therefore, the solution here is to take the first differences of the time series. Returning to our GDP time series, we have already seen that it has a unit root. Let us now see what happens if we take the first differences of the GDP series. Let ΔGDPt = (GDPt − GDPt−1). For convenience, let Dt = ΔGDPt. Now consider the DF regression for Dt, that is, the regression of ΔDt on Dt−1. The 1 percent critical DF τ value is −3.5073. Since the computed τ (= t) is more negative than the critical value, we conclude that the first-differenced GDP is stationary; that is, it is I(0).

Topic 97: Trend-Stationary Process

Run the following regression:

$$Y_t = \beta_1 + \beta_2 t + u_t$$

where Yt is the time series under study and t is the trend variable measured chronologically. Now

$$\hat{u}_t = (Y_t - \hat{\beta}_1 - \hat{\beta}_2 t)$$

will be stationary. ût is known as a (linearly) detrended time series. It is important to note that the trend may be nonlinear. For example, it could be

$$Y_t = \beta_1 + \beta_2 t + \beta_3 t^2 + u_t$$

which is a quadratic trend series. If that is the case, the residuals will now be a (quadratically) detrended time series. It should be pointed out that if a time series is DSP but we treat it as TSP, this is called underdifferencing.
On the other hand, if a time series is TSP but we treat it as DSP, this is called overdifferencing. The consequences of these types of specification errors can be serious, depending on how one handles the serial correlation properties of the resulting error terms. In passing it may be noted that most macroeconomic time series are DSP rather than TSP.

Topic 98: Cointegration: Regression of A Unit Root Time Series on Another Unit Root Time Series

We have warned that the regression of a nonstationary time series on another nonstationary time series may produce a spurious regression. Let us suppose that we consider the PCE and PDI time series. Subjecting these time series individually to unit root analysis, you will find that they both are I(1); that is, they contain a unit root. Suppose, then, that we regress PCE on PDI as follows:

$$PCE_t = \beta_1 + \beta_2 PDI_t + u_t$$

Let us write this as:

$$u_t = PCE_t - \beta_1 - \beta_2 PDI_t$$

Suppose we now subject ut to unit root analysis and find that it is stationary; that is, it is I(0). This is an interesting situation, for although PCEt and PDIt are individually I(1), that is, they have stochastic trends, their linear combination is I(0). So to speak, the linear combination cancels out the stochastic trends in the two series. If you take consumption and income as two I(1) variables, savings defined as (income − consumption) could be I(0). As a result, a regression of consumption on income would be meaningful (i.e., not spurious). In this case we say that the two variables are cointegrated. Economically speaking, two variables will be cointegrated if they have a long-term, or equilibrium, relationship between them. Economic theory is often expressed in equilibrium terms, such as Fisher's quantity theory of money or the theory of purchasing power parity (PPP), just to name a few.
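The way a linear combination can cancel a shared stochastic trend can be illustrated with simulated data. The following is a minimal sketch, not the PCE–PDI data: the series names `income`, `consumption`, and `other`, the coefficients 2.0 and 0.8, and the seed are all assumptions made here for illustration. It regresses one I(1) series on another and computes the DF τ statistic on the residuals, once for a cointegrated pair and once for two unrelated random walks.

```python
import math
import random

def simple_ols(x, y):
    """Regress y on x with an intercept; return (intercept, slope, residuals)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    slope = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [b - intercept - slope * a for a, b in zip(x, y)]
    return intercept, slope, resid

def df_tau_no_const(u):
    """tau for delta in du_t = delta*u_{t-1} + e_t (no intercept: residuals have mean zero)."""
    lag = u[:-1]
    du = [u[t + 1] - u[t] for t in range(len(u) - 1)]
    sxx = sum(v * v for v in lag)
    delta = sum(v * d for v, d in zip(lag, du)) / sxx
    ssr = sum((d - delta * v) ** 2 for v, d in zip(lag, du))
    return delta / math.sqrt(ssr / (len(du) - 1) / sxx)

rng = random.Random(7)                       # arbitrary seed (assumption)
n = 300
income, other = [0.0], [0.0]
for _ in range(n - 1):
    income.append(income[-1] + rng.gauss(0.0, 1.0))   # I(1) series
    other.append(other[-1] + rng.gauss(0.0, 1.0))     # independent I(1) series

# consumption shares income's stochastic trend, plus stationary noise
consumption = [2.0 + 0.8 * x + rng.gauss(0.0, 1.0) for x in income]

_, slope_coint, resid_coint = simple_ols(income, consumption)
_, _, resid_spur = simple_ols(income, other)

tau_coint = df_tau_no_const(resid_coint)
tau_spur = df_tau_no_const(resid_spur)
print("slope, cointegrated pair:", round(slope_coint, 3))
print("residual tau, cointegrated pair:", round(tau_coint, 2))
print("residual tau, unrelated random walks:", round(tau_spur, 2))
```

The residual τ for the cointegrated pair should be strongly negative, while the τ for the unrelated walks should be much closer to zero. Because the residuals come from an estimated regression, the appropriate critical values are the Engle–Granger ones rather than the ordinary DF values; they are not computed in this sketch.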
In short, provided we check that the residuals are I(0), or stationary, the traditional regression methodology (including the t and F tests) that we have considered extensively is applicable to data involving (nonstationary) time series. The valuable contribution of the concepts of unit root, cointegration, etc. is to force us to find out if the regression residuals are stationary. As Granger notes, "A test for cointegration can be thought of as a pre-test to avoid 'spurious regression' situations."

Topic 99: Testing for Cointegration

A number of methods for testing cointegration have been proposed in the literature. We consider here two comparatively simple methods: (1) the DF or ADF unit root test on the residuals estimated from the cointegrating regression and (2) the cointegrating regression Durbin–Watson (CRDW) test.

Topic 100: Engle–Granger (EG) or Augmented Engle–Granger (AEG) Test

We already know how to apply the DF or ADF unit root tests. All we have to do is estimate a cointegrating regression such as the regression of PCE on PDI, obtain the residuals, and use the DF or ADF tests. There is one precaution to exercise, however. Since the estimated ut are based on the estimated cointegrating parameter β2, the DF and ADF critical significance values are not quite appropriate. Engle and Granger have calculated these values, which can be found in the references. Therefore, the DF and ADF tests in the present context are known as the Engle–Granger (EG) and augmented Engle–Granger (AEG) tests. Several software packages now present these critical values along with other outputs.

Let us illustrate these tests. We first regressed PCE on PDI. Since PCE and PDI are individually nonstationary, there is the possibility that this regression is spurious. We therefore performed a unit root test on the residuals obtained from this regression. The Engle–Granger 1 percent critical τ value is −2.5899.
Since the computed τ (= t) value is much more negative than this, our conclusion is that the residuals from the regression of PCE on PDI are I(0); that is, they are stationary. Thus, the two series used in the regression are cointegrated.

Topic 101: Cointegration and Error Correction Mechanism (ECM)

We just showed that PCE and PDI are cointegrated; that is, there is a long-term, or equilibrium, relationship between the two. Of course, in the short run there may be disequilibrium. Therefore, one can treat the error term as the "equilibrium error," and we can use this error term to tie the short-run behavior of PCE to its long-run value. The error correction mechanism (ECM), first used by Sargan and later popularized by Engle and Granger, corrects for disequilibrium. An important theorem, known as the Granger representation theorem, states that if two variables Y and X are cointegrated, then the relationship between the two can be expressed as an ECM. To see what this means, let us revert to our PCE–PDI example. Now consider the following model:

$$\Delta PCE_t = \alpha_0 + \alpha_1 \Delta PDI_t + \alpha_2 u_{t-1} + \varepsilon_t$$

where Δ as usual denotes the first difference operator, εt is a random error term, and ut−1 = (PCEt−1 − β1 − β2PDIt−1), that is, the one-period lagged value of the error from the cointegrating regression.

The ECM equation states that ΔPCE depends on ΔPDI and also on the equilibrium error term. If the latter is nonzero, then the model is out of equilibrium. Suppose ΔPDI is zero and ut−1 is positive. This means PCEt−1 is too high to be in equilibrium, that is, PCEt−1 is above its equilibrium value of (β1 + β2PDIt−1). Since α2 is expected to be negative, the term α2ut−1 is negative and, therefore, ΔPCEt will be negative to restore the equilibrium. That is, if PCE is above its equilibrium value, it will start falling in the next period to correct the equilibrium error; hence the name ECM.
By the same token, if ut−1 is negative (i.e., PCE is below its equilibrium value), α2ut−1 will be positive, which will cause ΔPCEt to be positive, leading PCEt to rise in period t. Thus, the absolute value of α2 decides how quickly the equilibrium is restored. In practice, we estimate ut−1 by ût−1 = (PCEt−1 − β̂1 − β̂2PDIt−1).

Topic 102: Some Economic Implications

Time series econometrics uses economic theory, mathematics, and statistical inference to quantify economic phenomena. In other words, it turns theoretical economic models into useful tools for economic policymaking. The objective of econometrics is to convert qualitative statements (such as "the relationship between two or more variables is positive") into quantitative statements (such as "consumption expenditure increases by 95 cents for every one dollar increase in disposable income"). Econometricians, that is, practitioners of econometrics, transform models developed by economic theorists into versions that can be estimated. As Stock and Watson (2007) put it, "econometric methods are used in many branches of economics, including finance, labor economics, macroeconomics, microeconomics, and economic policy." Economic policy decisions are rarely made without econometric analysis to assess their impact.
