Autocorrelation in Time Series Data

Summary

This document provides a lecture-style introduction to autocorrelation in time series data. It discusses the problems caused by autocorrelation, the first-order autoregressive error model, the Durbin-Watson test, and remedial measures, illustrated with relevant examples and tables.

Full Transcript


Autocorrelation in Time Series Data
• Problems of Autocorrelation
• First-Order Autoregressive Error Model
• Durbin-Watson Test
• Remedial Measures

1 Introduction
• The basic regression models considered so far have assumed that the random error terms u_i are either uncorrelated random variables or independent normal random variables.
• In business and economics, many regression applications involve time series data. For such data, the assumption of uncorrelated or independent error terms is often not appropriate; rather, the error terms are frequently positively correlated over time. Error terms correlated over time are said to be autocorrelated or serially correlated.
• A major cause of positively autocorrelated error terms in business and economic regression applications involving time series data is the omission of one or several key variables from the model.

2 Problems of Autocorrelation
• When the error terms in the regression model are positively autocorrelated, the use of ordinary least squares procedures has a number of important consequences:
(1) The estimated regression coefficients are still unbiased, but they no longer have the minimum variance property and may be quite inefficient.
(2) MSE may seriously underestimate the variance of the error terms.
(3) s(b_k) calculated according to ordinary least squares procedures may seriously underestimate the true standard deviation of the estimated regression coefficient.
(4) Confidence intervals and tests using the t and F distributions, discussed earlier, are no longer strictly applicable.

3 Example
• Consider the simple linear regression model with time series data
Y_t = β0 + β1 X_t + u_t, where u_t = u_{t-1} + v_t.
The v_t, called disturbances, are independent N(0, 1). The error term u_t is the sum of the previous error u_{t-1} and a new random term v_t.
Table 1: Example of Positively Autocorrelated Error Terms (u_0 = 3)

4 Systematic Pattern in These Error Terms

5 True Regression Line & Observations When u_0 = 3: E(Y) = 2 + 0.5X

6 Fitted Regression Line & Observations When u_0 = 3

7 Example (cont.)
• Had the initial value u_0 been small, say u_0 = -0.2, and the disturbances different, a sharply different fitted regression line might have been obtained because of the persistency pattern, as shown in Figure 12.2(c) on page 483 of the textbook.
• From Figure 12.2 we can clearly see that MSE may seriously underestimate the variance of the u_t. This is one of the factors leading to an indication of greater precision of the regression coefficients than is actually the case when OLS methods are used in the presence of positively autocorrelated errors.
• A plot of residuals against time is an effective method to detect the presence of autocorrelated errors. Formal statistical tests have also been developed. We now discuss a widely used test, the Durbin-Watson test, which is based on the first-order autoregressive error model.

8 First-Order Autoregressive Error Model
• The generalized multiple regression model in which the random error terms follow a first-order autoregressive, or AR(1), process is
Y_t = β0 + β1 X_{t,1} + β2 X_{t,2} + ... + β_{p-1} X_{t,p-1} + u_t, where u_t = ρ u_{t-1} + v_t.
ρ is a parameter, called the autocorrelation parameter, with |ρ| < 1. The v_t, called disturbances, are independent N(0, σ²).
• This generalized multiple regression model is identical to the earlier multiple regression model except for the structure of the error terms.
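To make the situation above concrete, the following is a minimal Python sketch (not part of the original slides). It simulates AR(1) errors with an assumed ρ = 0.9 and the starting value u_0 = 3 from Table 1, generates observations around the true line E(Y) = 2 + 0.5X, and fits the line by OLS. The sample size, the X values, the seed, and ρ itself are all assumptions for illustration only.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, beta0, beta1 = 30, 2.0, 0.5            # true line E(Y) = 2 + 0.5X, as in the example
rho, u0 = 0.9, 3.0                        # assumed rho; u_0 = 3 as in Table 1

x = np.arange(1, n + 1, dtype=float)      # assumed evenly spaced predictor values
u = np.empty(n)
prev = u0
for t in range(n):
    u[t] = rho * prev + rng.normal()      # AR(1) errors: u_t = rho*u_{t-1} + v_t, v_t ~ N(0, 1)
    prev = u[t]

y = beta0 + beta1 * x + u
fit = sm.OLS(y, sm.add_constant(x)).fit()

print(fit.params)                          # b0, b1: unbiased, but can be far from (2, 0.5) in any one sample
print("MSE:", fit.mse_resid)               # often well below Var(u_t) = 1/(1 - rho^2) ≈ 5.26
```

Re-running the sketch with different seeds gives very different fitted lines, which is the persistency effect the slides describe.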
9 Properties of Error Terms
• u_t = ρu_{t-1} + v_t = ρ(ρu_{t-2} + v_{t-1}) + v_t = ρ²u_{t-2} + ρv_{t-1} + v_t = v_t + ρv_{t-1} + ρ²(v_{t-2} + ρu_{t-3}) = ... = v_t + ρv_{t-1} + ρ²v_{t-2} + ρ³v_{t-3} + ... = Σ_{i=0}^∞ ρ^i v_{t-i}
• E(u_t) = E(Σ ρ^i v_{t-i}) = Σ ρ^i E(v_{t-i}) = 0
• Var(u_t) = Var(Σ ρ^i v_{t-i}) = Var(v_t + ρv_{t-1} + ρ²v_{t-2} + ρ³v_{t-3} + ...) = Var(v_t) + ρ²Var(v_{t-1}) + (ρ²)²Var(v_{t-2}) + (ρ³)²Var(v_{t-3}) + ... = σ²(1 + ρ² + (ρ²)² + (ρ²)³ + ... + (ρ²)^n + ...) = σ²/(1 - ρ²)
• Cov(u_t, u_{t-1}) = E(u_t u_{t-1}) = E[(ρu_{t-1} + v_t)u_{t-1}] = ρVar(u_{t-1}) = ρ[σ²/(1 - ρ²)]
• ρ(u_t, u_{t-1}) = Cov(u_t, u_{t-1}) / [Var(u_t) Var(u_{t-1})]^{1/2} = ρ
• ρ(u_t, u_{t-s}) = Cov(u_t, u_{t-s}) / [Var(u_t) Var(u_{t-s})]^{1/2} = ρ^s
• The autocorrelation parameter ρ is the coefficient of correlation between adjacent error terms. The coefficient of correlation between u_t and u_{t-s} is ρ^s.

10 Properties of Error Terms (cont.)
• When ρ is positive, all error terms are correlated, but the further apart they are, the less is the correlation between them. The only time the error terms for the autoregressive error model are uncorrelated is when ρ = 0.
• Based on the results for the variances and covariances of the error terms, the variance-covariance matrix of the error terms for the first-order autoregressive generalized regression model can be stated as follows:
Var(u) = k ×
[ 1         ρ         ρ²        ρ³       ...  ρ^{n-1} ]
[ ρ         1         ρ         ρ²       ...  ρ^{n-2} ]
[ ...                                                  ]
[ ρ^{n-1}   ρ^{n-2}   ρ^{n-3}   ρ^{n-4}  ...  1       ]
where k = σ²/(1 - ρ²).

11 Durbin-Watson Test
• The Durbin-Watson test is for testing whether or not the autocorrelation parameter ρ is zero in the first-order autoregressive error model.
• Note that if ρ = 0, then u_t = v_t. Hence, the error terms u_t are independent N(0, σ²).
• Since correlated error terms in business and economic applications tend to show positive serial correlation, the usual test alternatives considered are
H0: ρ = 0 vs Ha: ρ > 0
The test statistic is
D = Σ_{t=2}^n (e_t - e_{t-1})² / Σ_{t=1}^n e_t²

12 Durbin-Watson Test (cont.)
• Exact critical values are difficult to obtain, but Durbin and Watson have obtained lower and upper bounds d_L and d_U such that a value of D outside these bounds leads to a definite decision. The decision rule for testing between the alternatives is:
If D > d_{U,α}, conclude H0: ρ = 0
If D < d_{L,α}, conclude Ha: ρ > 0
If d_{L,α} ≤ D ≤ d_{U,α}, the test is inconclusive
• Small values of D lead to the conclusion that ρ > 0 because the adjacent error terms u_t and u_{t-1} tend to be of the same magnitude when they are positively autocorrelated. Hence, the differences in the residuals, e_t - e_{t-1}, would tend to be small when ρ > 0, leading to a small numerator in D and hence to a small test statistic D.

13 Durbin-Watson Test (cont.)
(1) If the research hypothesis is Ha: ρ < 0, the test statistic to be used is (4 - D), where D is defined as above. The decision rule is as follows:
If (4 - D) > d_{U,α}, conclude H0: ρ = 0
If (4 - D) < d_{L,α}, conclude Ha: ρ < 0
If d_{L,α} ≤ (4 - D) ≤ d_{U,α}, the test is inconclusive
(2) For a two-sided test H0: ρ = 0 vs Ha: ρ ≠ 0, the decision rule is as follows:
If (4 - D) > d_{U,α/2} and D > d_{U,α/2}, conclude H0: ρ = 0
If (4 - D) < d_{L,α/2} or D < d_{L,α/2}, conclude Ha: ρ ≠ 0
If any other result, the test is inconclusive

14 Interpretation of Durbin-Watson D
• D = Σ_{t=2}^n (e_t - e_{t-1})² / Σ_{t=1}^n e_t²
  = [Σ_{t=2}^n e_t² / Σ_{t=1}^n e_t²] + [Σ_{t=2}^n e_{t-1}² / Σ_{t=1}^n e_t²] - 2 Σ_{t=2}^n e_t e_{t-1} / Σ_{t=1}^n e_t²
  ≈ 2 - 2 Σ_{t=2}^n e_t e_{t-1} / Σ_{t=1}^n e_t² ≈ 2(1 - r)
(1) If the residuals are uncorrelated, Σ_{t=2}^n e_t e_{t-1} ≈ 0, indicating no relationship between e_t and e_{t-1}, and the value of D will be close to 2.
(2) If the residuals are highly positively correlated, Σ_{t=2}^n e_t e_{t-1} ≈ Σ_{t=2}^n e_t² (since e_t ≈ e_{t-1}), and the value of D will be near 0. That is, D ≈ 2 - 2 Σ_{t=2}^n e_t e_{t-1} / Σ_{t=1}^n e_t² ≈ 2 - 2 Σ_{t=2}^n e_t² / Σ_{t=1}^n e_t² ≈ 0.
(3) If the residuals are highly negatively correlated, then e_t ≈ -e_{t-1}, so that Σ_{t=2}^n e_t e_{t-1} ≈ -Σ_{t=2}^n e_t² and D will be approximately equal to 4.

15 Summary of Interpretation of D
• D = Σ_{t=2}^n (e_t - e_{t-1})² / Σ_{t=1}^n e_t², with range 0 ≤ D ≤ 4.
(1) If residuals are uncorrelated, D ≈ 2.
(2) If residuals are positively correlated, D < 2, and if the correlation is very strong, D ≈ 0.
(3) If residuals are negatively correlated, D > 2, and if the correlation is very strong, D ≈ 4.

16 Example
• Consider the time series data in the following table, which gives sales data for the 35-year history of a company.

17 Example (cont.)

18 Example (cont.)

19 Example (cont.)
• In the example, p - 1 = 1 and n = 35. Using α = 0.05 for the one-tailed test for positive residual correlation, the table values (Table B.7 on page 1330) are d_L = 1.40 and d_U = 1.52. The computed value is D = 0.821 < d_L = 1.40. Thus we conclude that the residuals of the straight-line model for sales are positively correlated.
• Once strong evidence of residual correlation has been established, as in the above example, doubt is cast on the least squares results and any inferences drawn from them. Two principal remedial measures are to add more predictor variable(s) or to use transformed variables.

20 Example 1
• (On pages 488~489) The Blaisdell Company wished to predict its sales by using industry sales as a predictor variable. (Accurate predictions of industry sales are available from the industry's trade association.) A portion of the seasonally adjusted quarterly data on company sales and industry sales for the period 1998–2002 is shown in Table 12.2, columns 1 and 2. A scatter plot (not shown) suggested that a linear regression model is appropriate. The market research analyst was, however, concerned whether or not the error terms are positively autocorrelated.

21 Example 1 (cont.)

22 The Breusch-Godfrey Test
• It is a more general test, for rth-order autocorrelation:
u_t = ρ1 u_{t-1} + ρ2 u_{t-2} + ... + ρr u_{t-r} + v_t, with v_t ~ N(0, σ_v²)
• The null and alternative hypotheses are:
H0: ρ1 = 0 and ρ2 = 0 and ... and ρr = 0
Ha: ρ1 ≠ 0 or ρ2 ≠ 0 or ... or ρr ≠ 0
• The test is carried out as follows:
1. Estimate the linear regression using OLS and obtain the residuals, û_t.
2. Regress û_t on all of the regressors from stage 1 (the x's) plus the lagged residuals û_{t-1}, ..., û_{t-r}. Obtain R² from this regression.
3. It can be shown that (T - r)R² ~ χ²(r).
• If the test statistic exceeds the critical value from the statistical tables, reject the null hypothesis of no autocorrelation. (Refer to Box 5.4 on page 199, "Introductory Econometrics for Finance".) A computational sketch of the Durbin-Watson and Breusch-Godfrey tests follows below.
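The sketch below (not from the slides) shows how the two tests above could be computed in Python with the statsmodels library on simulated data with an assumed ρ = 0.7 and assumed true coefficients. Note that statsmodels' Breusch-Godfrey LM statistic may differ slightly in its finite-sample scaling from the (T - r)R² form quoted on the slide.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

rng = np.random.default_rng(1)
n, rho = 100, 0.7                           # assumed sample size and autocorrelation parameter
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):                        # AR(1) errors: u_t = rho*u_{t-1} + v_t
    u[t] = rho * u[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + u                        # assumed true coefficients

res = sm.OLS(y, sm.add_constant(x)).fit()

# Durbin-Watson: D near 2 means no autocorrelation, near 0 strong positive, near 4 strong negative.
print("D =", durbin_watson(res.resid))

# Breusch-Godfrey LM test for up to 4th-order autocorrelation (chi-square distributed under H0).
lm_stat, lm_pvalue, f_stat, f_pvalue = acorr_breusch_godfrey(res, nlags=4)
print("BG statistic =", lm_stat, "p-value =", lm_pvalue)
```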
23 Addition of Predictor Variables
• One major cause of autocorrelated error terms is the omission from the model of one or more key predictor variables that have time-order effects on the response variable. When autocorrelated error terms are found to be present, the first remedial action should always be to search for missing key predictor variables.
• Use of indicator variables for seasonal effects can be helpful in eliminating or reducing autocorrelation in the error terms when the response variable is subject to seasonal effects (e.g. quarterly sales data).

24 Use of Transformed Variables
• Only when the use of additional predictor variables is not helpful in eliminating the problem of autocorrelated errors should remedial action based on transformed variables be employed. Three methods will be discussed.
• The three methods are each based on the following interesting property of the first-order autoregressive error regression model:
Y'_t = Y_t - ρY_{t-1} = (β0 + β1 X_t + u_t) - ρ(β0 + β1 X_{t-1} + u_{t-1}) = β0(1 - ρ) + β1(X_t - ρX_{t-1}) + (u_t - ρu_{t-1})
Since u_t - ρu_{t-1} = v_t ~ N(0, σ²), we have Y'_t = β'0 + β'1 X'_t + v_t.
• Hence, by use of the transformed variables X'_t and Y'_t, we obtain a standard simple linear regression model with independent error terms. This means that ordinary least squares methods have their usual optimum properties with this model.
• The extension to multiple regression is direct.

25 Use of Transformed Variables (cont.)
• In order to be able to use the transformed model, one generally needs to estimate the autocorrelation parameter ρ, since its value is usually unknown. The three methods to be described differ in how this is done. The results obtained with the three methods are often quite similar.
• Once an estimate of ρ, denoted by r, has been obtained, the transformed variables are
Y'_t = Y_t - rY_{t-1} and X'_t = X_t - rX_{t-1},
and the fitted transformed model is Ŷ' = b'0 + b'1 X', where
b'0 = b0(1 - r), b'1 = b1, s(b'0) = (1 - r)s(b0), s(b'1) = s(b1).

26 Method One: Cochrane-Orcutt Procedure
(1) View the error process u_t = ρu_{t-1} + v_t as a regression through the origin. Use the OLS residuals e_t and e_{t-1} in place of the response and predictor variables (u_t and u_{t-1}). The estimate of the slope ρ, denoted by r, is
r = Σ_{t=2}^n e_t e_{t-1} / Σ_{t=2}^n e_{t-1}²
[refer to b = ΣX_iY_i / ΣX_i² on page 162]
(2) Using the estimate r, we next obtain the transformed variables Y'_t and X'_t and use OLS with these transformed variables to yield the fitted regression function Ŷ' = b'0 + b'1 X'.
(3) The Durbin-Watson test is then employed to test whether the error terms for the transformed model are uncorrelated. If the test indicates that they are uncorrelated, the procedure terminates. Otherwise, the procedure is repeated. (A sketch of one pass of this procedure is given after the First Differences slide below.)

27 Method Two: Hildreth-Lu Procedure
• The Hildreth-Lu procedure for estimating the autocorrelation parameter ρ for use in the transformations Y'_t = Y_t - ρY_{t-1} and X'_t = X_t - ρX_{t-1} is analogous to the Box-Cox procedure for estimating the parameter λ in the power transformation of Y to improve the appropriateness of the standard regression model.
• The value of ρ chosen with this procedure is the one that minimizes the error sum of squares for the transformed regression model: SSE = Σ(Y'_t - b'0 - b'1 X'_t)².
• Once the value of ρ that minimizes SSE is found, the fitted regression function corresponding to that value of ρ is examined to see if the transformation has successfully eliminated the autocorrelation. If so, the fitted regression function in the original variables can then be obtained.

28 Method Three: First Differences Procedure
• This procedure is based on the assumption that ρ = 1. If ρ = 1, then β'0 = β0(1 - ρ) = 0, and the transformed model becomes
Y'_t = β'1 X'_t + v_t, where Y'_t = Y_t - Y_{t-1} and X'_t = X_t - X_{t-1}.
• Thus, the regression coefficient β'1 = β1 can be estimated directly by OLS methods, this time based on regression through the origin.
• The fitted regression function in the transformed variables, Ŷ' = b'1 X', can be transformed back to the original variables as
Ŷ = b0 + b1 X, where b0 = Ȳ - b'1 X̄ and b1 = b'1.
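The following is a minimal sketch of one pass of the Cochrane-Orcutt procedure described on slide 26. It is not from the slides: the data are simulated with an assumed ρ = 0.6 and assumed true coefficients, and in practice one would re-check the Durbin-Watson statistic on the transformed model and iterate if necessary.

```python
import numpy as np
import statsmodels.api as sm

# Assumed example data: y and x are time-ordered arrays of equal length.
rng = np.random.default_rng(2)
x = np.linspace(1, 20, 40)
u = np.zeros(40)
for t in range(1, 40):
    u[t] = 0.6 * u[t - 1] + rng.normal()       # assumed rho = 0.6 for the simulated errors
y = 2.0 + 0.5 * x + u

# Step 1: OLS on the original variables; the residuals e_t play the role of u_t.
ols = sm.OLS(y, sm.add_constant(x)).fit()
e = ols.resid
r = np.sum(e[1:] * e[:-1]) / np.sum(e[:-1] ** 2)   # r = sum(e_t * e_{t-1}) / sum(e_{t-1}^2)

# Step 2: transform Y'_t = Y_t - r*Y_{t-1}, X'_t = X_t - r*X_{t-1} and refit by OLS.
y_p = y[1:] - r * y[:-1]
x_p = x[1:] - r * x[:-1]
co = sm.OLS(y_p, sm.add_constant(x_p)).fit()
b0_p, b1_p = co.params

# Step 3: transform back to the original scale: b0 = b0'/(1 - r), b1 = b1'.
print("r =", round(r, 3), "b0 =", round(b0_p / (1 - r), 3), "b1 =", round(b1_p, 3))
```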
29 Comparison of the Three Methods
(1) All of the estimates of β1 are quite close to each other.
(2) The estimated standard deviations of b1 based on the Hildreth-Lu and first differences transformation methods are quite close to each other; that from the Cochrane-Orcutt procedure is somewhat smaller.
(3) All three transformation methods provide essentially the same estimate of σ², the variance of the disturbance term v_t.
(4) The three transformation methods do not always work equally well.

30 Example 1 (cont.)
• Method 1: on pages 492~494, Table 12.3 and Table 12.4
• Method 2: on pages 495~496, Table 12.5
• Method 3: on pages 497~498, Table 12.6 and Table 12.7

31 Forecasting With Autocorrelated Error Terms
• One important use of autoregressive error regression models is to make forecasts. With these models, information about the error term in the most recent period t can be incorporated into the forecast for period t+1. This provides a more accurate forecast because, when autoregressive error regression models are appropriate, the error terms in successive periods are correlated. Thus, if sales in period t are above their expected value and successive error terms are positively correlated, it follows that sales in period t+1 will likely be above their expected value also.

32 The Model
• Y_t = β0 + β1 X_t + u_t, where u_t = ρu_{t-1} + v_t
• Y_{t+1} = β0 + β1 X_{t+1} + ρu_t + v_{t+1} = (1) + (2) + (3), where
(1) = expected value of Y_{t+1} = β0 + β1 X_{t+1}
(2) = ρu_t
(3) = v_{t+1} = an independent, random disturbance with mean 0.
• (1) can be estimated by b0 + b1 X_{t+1}; (2) can be estimated by re_t, where e_t = Y_t - (b0 + b1 X_t) and r = Σ_{t=2}^T e_t e_{t-1} / Σ_{t=2}^T e_{t-1}².
• Thus, the forecast for period t+1 is F_{t+1} = Ŷ_{t+1} + re_t.
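As a rough illustration of the forecasting formula above, the short Python sketch below computes F_{t+1} = Ŷ_{t+1} + re_t. The sales figures and the next-period value of X are hypothetical values chosen for illustration, not the textbook data.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical quarterly sales figures (assumed values, not the textbook example).
x = np.array([100.0, 102.0, 105.0, 104.0, 108.0, 111.0, 114.0, 113.0, 117.0, 120.0])
y = np.array([20.1, 20.6, 21.3, 21.0, 21.9, 22.5, 23.2, 23.0, 23.8, 24.5])

res = sm.OLS(y, sm.add_constant(x)).fit()
e = res.resid
r = np.sum(e[1:] * e[:-1]) / np.sum(e[:-1] ** 2)   # r = sum(e_t*e_{t-1}) / sum(e_{t-1}^2)

x_next = 123.0                                      # assumed value of X for the next period
b0, b1 = res.params
forecast = (b0 + b1 * x_next) + r * e[-1]           # F_{t+1} = (b0 + b1*X_{t+1}) + r*e_t
print("r =", round(r, 3), "forecast for t+1:", round(forecast, 3))
```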
33 Lecture Notes (1) Introduction

1 Econometrics for Finance
• Definition of financial econometrics: the application of statistical techniques to problems in finance.
• Financial econometrics is useful for
– Testing theories in finance
– Determining asset prices or returns
– Testing hypotheses concerning the relationships between variables
– Examining the effect on financial markets of changes in economic conditions
– Forecasting future values of financial variables and for financial decision-making

What are the Special Characteristics of Financial Data?
• Frequency & quantity of data: stock market prices are measured every time there is a trade or somebody posts a new quote, so there is a large amount of data.
• Quality: recorded asset prices are usually those at which the transaction took place, so there is little possibility of measurement error.
• Very "noisy": financial data are often very noisy, which means that it is more difficult to separate underlying trends or patterns from random and uninteresting features.
• Non-normality: financial data are also almost always not normally distributed, in spite of the fact that most techniques assume that they are.

Types of Data
• There are broadly 3 types of data that can be employed in quantitative analysis of financial problems: (a) time series data; (b) cross-sectional data; (c) panel data (a combination of (a) & (b)).
• Time series data, as the name suggests, are data that have been collected over a period of time on one or more variables. Time series data have associated with them a particular frequency of observation or collection of data points. It is also generally a requirement that all data used in a model be of the same frequency of observation.
• The data may be quantitative (e.g. exchange rates, stock prices, number of shares outstanding) or qualitative (e.g. day of the week).

Time series data
• Examples of time series data and their frequencies:
– GNP or unemployment: monthly or quarterly
– government budget deficit: annually
– money supply: weekly
– value of a stock market index: as transactions occur
• Problems that could be tackled using time series data:
- How the value of a country's stock index has varied with that country's macroeconomic fundamentals.
- How the value of a company's stock price has varied when it announced the value of its dividend payment.
- The effect on a country's currency of an increase in its interest rate.
• In all of the above cases, it is clearly the time dimension which is the most important, and the analysis will be conducted using the values of the variables over time.

Cross-sectional Data
• Cross-sectional data are data on one or more variables collected at a single point in time, e.g.
- A poll of usage of internet stock broking services
- A cross-section of stock returns on the New York Stock Exchange
- A sample of bond credit ratings for UK banks
• Problems that could be tackled using cross-sectional data:
- The relationship between company size and the return to investing in its shares.
- The relationship between a country's GDP level and the probability that the government will default on its sovereign debt.

Panel Data
• Panel data have the dimensions of both time series and cross-sections, e.g. the daily prices of a number of blue chip stocks over two years. The estimation of panel regressions is an interesting and developing area, but will not be discussed in this course.
• Notation: it is common to denote each observation by the letter t and the total number of observations by T for time series data, and to denote each observation by the letter i and the total number of observations by N for cross-sectional data.
• Quality of data: researchers should always keep in mind that the results of research are only as good as the quality of the data.

Returns in Financial Modelling
• It is preferable not to work directly with asset prices, so we usually convert the raw prices into a series of returns. There are two ways to do this: simple returns, R_t = (p_t - p_{t-1})/p_{t-1}, or log returns, r_t = ln(p_t/p_{t-1}), where R_t denotes the simple return at time t, r_t denotes the continuously compounded return at time t, p_t denotes the asset price at time t, and ln denotes the natural logarithm.
• We also ignore any dividend payments, or alternatively assume that the price series have already been adjusted to account for them.

Log Returns
• The log returns are also known as log price relatives, and will be used throughout this course. There are a number of reasons for this:
1. They have the nice property that they can be interpreted as continuously compounded returns.
2. We can add them up, e.g. if we want a weekly return and we have calculated daily log returns:
Monday return: r1 = ln(p1/p0) = ln p1 - ln p0
Tuesday return: r2 = ln(p2/p1) = ln p2 - ln p1
Wednesday return: r3 = ln(p3/p2) = ln p3 - ln p2
Thursday return: r4 = ln(p4/p3) = ln p4 - ln p3
Friday return: r5 = ln(p5/p4) = ln p5 - ln p4
Return over the week: r = ln(p5/p0) = ln p5 - ln p0 = r1 + ... + r5 = Σ r_i

A Disadvantage of using Log Returns
• There is a disadvantage to using log returns. The simple return on a portfolio of assets is a weighted average of the simple returns on the individual assets:
R_pt = Σ_{i=1}^N w_i R_it
• But this does not work for the continuously compounded returns. The fundamental reason why this is the case is that the log of a sum is not the same as the sum of a log.
• In the limit, as the frequency of the sampling of the data is increased so that returns are measured over a smaller and smaller time interval, the simple and continuously compounded returns will be identical.
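The short NumPy sketch below (not from the notes; the price series is hypothetical) illustrates the two points above: daily log returns add up exactly to the weekly log return, while aggregation across assets works for simple returns rather than log returns.

```python
import numpy as np

# Hypothetical daily closing prices for one asset (p0 through p5).
p = np.array([100.0, 101.5, 99.8, 100.9, 102.3, 103.0])

simple = p[1:] / p[:-1] - 1          # simple returns R_t = (p_t - p_{t-1}) / p_{t-1}
log_r = np.diff(np.log(p))           # log returns r_t = ln(p_t / p_{t-1})

# Log returns add up over time: the weekly return equals the sum of the daily log returns.
print(np.isclose(log_r.sum(), np.log(p[-1] / p[0])))   # True

# Across assets it is the simple returns that aggregate: with weights w, R_p = sum(w_i * R_i) holds
# exactly, whereas the weighted average of log returns only approximates the portfolio log return.
```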
Steps involved in formulating an econometric model
• Step 1a and 1b: general statement of the problem
• Step 2: collection of data relevant to the model
• Step 3: choice of estimation method relevant to the model proposed in step 1
• Step 4: statistical evaluation of the model
• Step 5: evaluation of the model from a theoretical perspective
• Step 6: use of model
• It is important to note that the process of building a robust empirical model is an iterative one, and it is certainly not an exact science. Often, the final preferred model could be very different from the one originally proposed, and need not be unique in the sense that another researcher with the same data and the same initial theory could arrive at a different final specification. (Textbook pages 11~12)

Points to consider when reading articles in empirical finance
• Does the paper involve the development of a theoretical model, or is it merely a technique looking for an application, so that the motivation for the whole exercise is poor?
• Are the data of "good quality"? Are they from a reliable source? Is the size of the sample sufficiently large for the model estimation task at hand?
• Have the techniques been validly applied? Have tests been conducted for possible violations of any assumptions made in the estimation of the model?
• Have the results been interpreted sensibly? Is the strength of the results exaggerated? Do the results actually obtained relate to the questions posed by the author(s)? Can the results be replicated by other researchers?
• Are the conclusions drawn appropriate given the results, or has the importance of the results of the paper been overstated? (Textbook page 13)

Statistical Package - EViews
• EViews is an excellent interactive program, which provides an excellent tool for time series data analysis. It is simple to use, menu-driven, and will be sufficient to estimate most of the models required for this course.
• One of the most important features of EViews that makes it useful for model-building is the wealth of diagnostic (misspecification) tests that are automatically computed, making it possible to test whether the model is econometrically/statistically valid or not.
• Textbook pages 14~23
• Data – UK Average House Price (Ukhp13.xls)

Violation of the Assumptions of the CLRM
• Recall that we assumed of the CLRM disturbance/error terms:
1. E(u_t) = 0
2. Var(u_t) = σ² < ∞
3. Cov(u_i, u_j) = 0 for i ≠ j
4. The X matrix is non-stochastic or fixed in repeated samples
5. u_t ~ N(0, σ²)

Investigating Violations of the Assumptions of the CLRM
• We will now study these assumptions further, and in particular look at:
- How we test for violations
- Causes
- Consequences: in general we could encounter any combination of 3 problems: the coefficient estimates are wrong; the associated standard errors are wrong; the distribution that we assumed for the test statistics will be inappropriate
- Solutions: either the assumptions are no longer violated, or we work around the problem so that we use alternative techniques which are still valid

Assumption 1: E(u_t) = 0
• Assumption that the mean of the disturbances is zero.
• For all diagnostic tests, we cannot observe the disturbances, and so we perform the tests on the residuals.
• The mean of the residuals will always be zero provided that there is a constant term in the regression.

Assumption 2: Var(u_t) = σ² < ∞
• We have so far assumed that the variance of the errors is constant, σ² - this is known as homoscedasticity. If the errors do not have a constant variance, we say that they are heteroscedastic, e.g. say we estimate a regression and calculate the residuals, û_t.
• Var(u_t) = E(u_t - E(u_t))² = E(u_t²)

Detection of Heteroscedasticity: The GQ Test
• Graphical methods
• Formal tests: there are many of them; we will discuss the Goldfeld-Quandt test and White's test.
The Goldfeld-Quandt (GQ) test is carried out as follows.
1. Split the total sample of length T into two sub-samples of length T1 and T2. The regression model is estimated on each sub-sample and the two residual variances are calculated.
2. The null hypothesis is that the variances of the disturbances are equal, H0: σ1² = σ2².

The GQ Test (Cont'd)
3. The test statistic, denoted GQ, is simply the ratio of the two residual variances, where the larger of the two variances must be placed in the numerator.
4. The test statistic is distributed as an F(T1 - k, T2 - k) under the null of homoscedasticity.
A problem with the test is that the choice of where to split the sample is usually arbitrary and may crucially affect the outcome of the test.

Detection of Heteroscedasticity using White's Test
• White's general test for heteroscedasticity is one of the best approaches because it makes few assumptions about the form of the heteroscedasticity.
• White's test is carried out as follows:
1. Assume that the regression we carried out is
y_t = β1 + β2 x_{2t} + β3 x_{3t} + u_t
and we want to test Var(u_t) = σ². We estimate the model, obtaining the residuals û_t.
2. Then run the auxiliary regression of the squared residuals û_t² on a constant, the original regressors, their squares and their cross-product (x_{2t}, x_{3t}, x_{2t}², x_{3t}², x_{2t}x_{3t}).

Performing White's Test for Heteroscedasticity
3. Obtain R² from the auxiliary regression and multiply it by the number of observations, T. It can be shown that T·R² ~ χ²(m), where m is the number of regressors in the auxiliary regression excluding the constant term.
4. If the χ² test statistic from step 3 is greater than the corresponding value from the statistical table, then reject the null hypothesis that the disturbances are homoscedastic. (Refer to Box 5.1 and Example 5.1 on pages 183~184. Testing for heteroscedasticity using EViews – Bloodpre.xls. A computational sketch of this test appears below, after the GLS discussion.)

Consequences of Using OLS in the Presence of Heteroscedasticity
• OLS estimation still gives unbiased coefficient estimates, but they are no longer BLUE.
• This implies that if we still use OLS in the presence of heteroscedasticity, our standard errors could be inappropriate and hence any inferences we make could be misleading.
• Whether the standard errors calculated using the usual formulae are too big or too small will depend upon the form of the heteroscedasticity.

How Do we Deal with Heteroscedasticity?
• If the form (i.e. the cause) of the heteroscedasticity is known, then we can use an estimation method which takes this into account (called generalised least squares, GLS). GLS is also known as weighted least squares (WLS).
• A simple illustration of GLS is as follows. Suppose that the error variance is related to another variable z_t by Var(u_t) = σ²z_t².
• To remove the heteroscedasticity, divide the regression equation through by z_t, so that the error term becomes v_t = u_t/z_t, for known z_t.
• Now Var(v_t) = Var(u_t)/z_t² = σ², so the disturbances from the new regression equation will be homoscedastic.
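The following is a minimal sketch of White's test using statsmodels (not from the notes). The data are simulated so that the error variance grows with one of the regressors; the sample size, coefficients, and variance pattern are all assumptions for illustration only.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

# Hypothetical data: the error variance grows with x2, so the errors are heteroscedastic.
rng = np.random.default_rng(3)
n = 200
x2 = rng.uniform(1, 10, n)
x3 = rng.normal(size=n)
u = rng.normal(scale=0.5 * x2)            # Var(u_t) increases with x2
y = 1.0 + 0.8 * x2 - 0.3 * x3 + u

X = sm.add_constant(np.column_stack([x2, x3]))
res = sm.OLS(y, X).fit()

# White's test: auxiliary regression of squared residuals on the regressors, their squares and
# cross-products; the LM statistic T*R^2 is chi-square distributed under homoscedasticity.
lm_stat, lm_pvalue, f_stat, f_pvalue = het_white(res.resid, X)
print("T*R^2 =", lm_stat, "p-value =", lm_pvalue)   # a small p-value indicates heteroscedasticity
```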
Other Approaches to Dealing with Heteroscedasticity
• Other solutions include:
1. Transforming the variables into logs or reducing by some other measure of "size".
2. Using White's heteroscedasticity-consistent standard error estimates. The effect of using White's correction is that, in general, the standard errors for the slope coefficients are increased relative to the usual OLS standard errors. This makes us more "conservative" in hypothesis testing, so that we would need more evidence against the null hypothesis before we would reject it.

Assumption 3: Cov(u_i, u_j) = 0 for i ≠ j
• Assumption 3 of the CLRM is that the disturbance terms are uncorrelated with one another.
• If the errors are not uncorrelated with one another, it would be stated that they are "autocorrelated" or that they are "serially correlated". A test of this assumption is therefore required.
• Before we proceed to see how formal tests for autocorrelation are formulated, the concept of the lagged value of a variable needs to be defined.
• The lagged value of a variable (which may be y_t, x_t or u_t) is simply the value that the variable took during a previous period, e.g. the value of y_t lagged one period, written y_{t-1}, can be constructed by shifting all of the observations forward one period in a spreadsheet, as illustrated in the table below.

The Concept of a Lagged Value
t         y_t     y_{t-1}   Δy_t
1989M09   0.8     -         -
1989M10   1.3     0.8       1.3 - 0.8 = 0.5
1989M11   -0.9    1.3       -0.9 - 1.3 = -2.2
1989M12   0.2     -0.9      0.2 - (-0.9) = 1.1
1990M01   -1.7    0.2       -1.7 - 0.2 = -1.9
1990M02   2.3     -1.7      2.3 - (-1.7) = 4.0
1990M03   0.1     2.3       0.1 - 2.3 = -2.2
1990M04   0.0     0.1       0.0 - 0.1 = -0.1
...       ...     ...       ...

Autocorrelation
• We assumed of the CLRM's errors that Cov(u_i, u_j) = 0 for i ≠ j. This is essentially the same as saying there is no pattern in the errors.
• Obviously we never have the actual u's, so we use their sample counterpart, the residuals û_t.
• If there are patterns in the residuals from a model, we say that they are autocorrelated.
• Some stereotypical patterns we may find in the residuals are given on the next 3 slides.

Positive Autocorrelation
Positive autocorrelation is indicated by a cyclical residual plot over time.

Negative Autocorrelation
Negative autocorrelation is indicated by an alternating pattern where the residuals cross the time axis more frequently than if they were distributed randomly.

No pattern in residuals – No autocorrelation
No pattern in the residuals at all: this is what we would like to see.

Detecting Autocorrelation: The Durbin-Watson Test
• The Durbin-Watson (DW) test is a test for first-order autocorrelation - i.e. it assumes that the relationship is between an error and the previous one:
u_t = ρu_{t-1} + v_t     (1)
where v_t ~ N(0, σ_v²).
• The DW test statistic actually tests H0: ρ = 0 and H1: ρ ≠ 0.
• The test statistic is calculated by
DW = Σ_{t=2}^T (û_t - û_{t-1})² / Σ_{t=1}^T û_t²

The Durbin-Watson Test: Critical Values
• We can also write DW ≈ 2(1 - ρ̂)     (2)
where ρ̂ is the estimated correlation coefficient. Since ρ̂ is a correlation, it follows that -1 ≤ ρ̂ ≤ 1 (refer to pages 194~196).
• Rearranging for DW from (2) gives 0 ≤ DW ≤ 4.
• If ρ̂ = 0, DW = 2. So roughly speaking, do not reject the null hypothesis if DW is near 2, i.e. there is little evidence of autocorrelation.
• Unfortunately, DW has 2 critical values, an upper critical value (d_U) and a lower critical value (d_L), and there is also an intermediate region where we can neither reject nor not reject H0.

The Durbin-Watson Test: Interpreting the Results
• Discuss Example 5.2 on page 197.

Another Test for Autocorrelation: The Breusch-Godfrey Test
• It is a more general test, for rth-order autocorrelation:
u_t = ρ1 u_{t-1} + ρ2 u_{t-2} + ... + ρr u_{t-r} + v_t, with v_t ~ N(0, σ_v²)
• The null and alternative hypotheses are:
H0: ρ1 = 0 and ρ2 = 0 and ... and ρr = 0
H1: ρ1 ≠ 0 or ρ2 ≠ 0 or ... or ρr ≠ 0
or r  0 • The test is carried out as follows: 1. Estimate the linear regression using OLS and obtain the residuals, 2. Regress on all of the regressors from stage 1 (the x’s) plus Obtain R2 from this regression. 3. It can be shown that (T-r)R2  2(r) • If the test statistic exceeds the critical value from the statistical tables, reject the null hypothesis of no autocorrelation. (Refer to Box 5.4 on page 198) Consequences of Ignoring Autocorrelation if it is Present • The coefficient estimates derived using OLS are still unbiased, but they are inefficient, i.e. they are not BLUE, even in large sample sizes. • MSE may seriously underestimate the variance of the error terms. • Confidence intervals and tests using the t and F distributions, discussed earlier, are no longer strictly applicable. • Thus, if the standard error estimates are inappropriate, there exists the possibility that we could make the wrong inferences. • R2 is likely to be inflated relative to its “correct” value for positively correlated residuals. Dealing with Autocorrelation • If the form of the autocorrelation is known, we could use a GLS procedure – i.e. an approach that allows for autocorrelated residuals e.g., Cochrane-Orcutt approach ( Refer to Box 5.5 on page 201). • Discuss equations (5.18 ~ 5.23) on pages 199~200. • Autocorrelation in EViews –Capm.xls and Sale.xls • Example – CAPM – – – – Y = excess return on shares in Company A (percentage) X1 = excess return on a stock index (percentage) X2 = the sales of Company A (thousands of dollars) X3 = the debt of Company A (thousands of dollars) Example - CAPM Dependent Variable: Y Method: Least Squares Date: 15/08/10 Time: 18:03 Sample: 1 120 Included observations: 120 Coefficient Std. Error t-Statistic Prob. C X1 X2 X3 2.529897 1.747189 -0.000303 -0.022015 1.335296 0.202429 0.000939 0.011795 1.894633 8.631119 -0.322231 -1.866527 0.0606 0.0000 0.7479 0.0645 R-squared Adjusted R-squared S.E. of regression Sum squared resid Log likelihood F-statistic Prob(F-statistic) 0.410147 0.394892 1.008033 117.8712 -169.1987 26.88639 0.000000 Mean dependent var S.D. dependent var Akaike info criterion Schwarz criterion Hannan-Quinn criter. Durbin-Watson stat 2.143089 1.295861 2.886644 2.979561 2.924378 2.197432 Example – Sale • Consider the time series data in the following table which gives sales data for the 35 year history of a company. Example – Sale (cont.) Dependent Variable: SALE Method: Least Squares Date: 15/08/10 Time: 17:52 Sample: 1 35 Included observations: 35 C YEAR R-squared Adjusted R-squared S.E. of regression Sum squared resid Log likelihood F-statistic Prob(F-statistic) Coefficient Std. Error t-Statistic Prob. 0.397143 4.295714 2.205591 0.106861 0.180062 40.19900 0.8582 0.0000 0.979987 0.979381 6.384903 1345.310 -113.5209 1615.959 0.000000 Mean dependent var S.D. dependent var Akaike info criterion Schwarz criterion Hannan-Quinn criter. Durbin-Watson stat 77.72000 44.46515 6.601195 6.690072 6.631875 0.821097 Models in First Difference Form • Another way to sometimes deal with the problem of autocorrelation is to switch to a model in first differences. • Denote the first difference of yt, i.e. yt - yt-1 as yt; similarly for the xvariables, x2t = x2t - x2t-1 etc. • The model would now be yt = 1 + 2 x2t + ... + kxkt + ut Testing the Normality Assumption • Testing for Departures from Normality- Bera Jarque normality test • Skewness and kurtosis are the (standardised) third and fourth moments of a distribution. 
Testing the Normality Assumption
• Testing for departures from normality: the Bera-Jarque normality test.
• Skewness and kurtosis are the (standardised) third and fourth moments of a distribution. The skewness and kurtosis can be expressed respectively as:
S = E[u³] / (σ²)^{3/2} and K = E[u⁴] / (σ²)²
• The skewness of the normal distribution is 0. The kurtosis of the normal distribution is 3, so its excess kurtosis (K - 3) is zero.
• We estimate S and K using the residuals from the OLS regression, û. For large sample sizes, b1 (the estimator of S) and (b2 - 3) (the estimator of K - 3) are distributed approximately normally, N(0, 6/T) and N(0, 24/T) respectively.

Normal versus Skewed Distributions
(Figures: a normal distribution and a skewed distribution.)

Leptokurtic versus Normal Distribution
(Figure.)

Testing for Normality
• Bera and Jarque formalise this by testing the residuals for normality, i.e. by testing whether the coefficient of skewness and the coefficient of excess kurtosis are jointly zero.
• The Bera-Jarque test statistic is given by
W = T[b1²/6 + (b2 - 3)²/24], which is distributed as χ²(2) under the null of normality.
• Testing for non-normality using EViews – Capm.xls and Sales.xls
• Testing for Normality – Capm.xls
• Testing for Normality – Sales.xls
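As a rough illustration of the Bera-Jarque test, the following Python sketch (not from the notes) applies statsmodels' jarque_bera function to OLS residuals from a regression with deliberately skewed, hypothetical errors.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import jarque_bera

# Hypothetical regression with skewed (non-normal) errors to illustrate the test.
rng = np.random.default_rng(5)
x = rng.normal(size=500)
u = rng.exponential(scale=1.0, size=500) - 1.0      # skewed, mean-zero errors
y = 0.5 + 1.2 * x + u

res = sm.OLS(y, sm.add_constant(x)).fit()

# Bera-Jarque statistic W = T*(S^2/6 + (K-3)^2/24), chi-square(2) under normality.
jb_stat, jb_pvalue, skew, kurtosis = jarque_bera(res.resid)
print("W =", jb_stat, "p-value =", jb_pvalue, "skew =", skew, "kurtosis =", kurtosis)
```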
