Unit 10+11: Introduction to Time Series Analysis
Martin / Zulehner: Introductory Econometrics
Outline
1 Overview
2 OLS with time series data
  - Autocorrelation
  - Trends and seasonality
  - Example: NY fish market
  - Example: Expectations augmented Phillips curve
3 Analysis of univariate time series data
  - Additional notation and definitions
  - Stationarity
  - MA(1) process and AR(1) process
  - Efficient Markets Hypothesis (EMH)

Time series data
- One or more variables are repeatedly observed over time.
- The order of the observations is relevant.
- Central property: the time interval between successive observations, i.e. the data frequency.
- Data frequency: how often observations are recorded within a given period of time.

Time series data - Example
Table: German GDP and national income (in billion Euro)
Year    GDP        National income
1997    1,871.60   1,404.63
1998    1,929.40   1,442.17
1999    1,978.60   1,468.22
2000    2,030.00   1,509.45
2001    2,074.00   1,538.35
2002    2,107.30   1,551.88
2003    2,128.20   1,569.26
Source: Sachverständigenrat (2004)

Special problems with time series data
- Lack of mutual independence, e.g. between GDP for the years 1999 and 2000.
- Time trends: particularly cumbersome when two or more variables trend together.
- A long-term trend is often confounded by short-term periodic variation, e.g. seasonal effects.
- The definition of variables can change over time, caused for example by law reforms or historic events (e.g. German reunification).

Seasonal effect for the unemployment rate
[Figure: Monthly unemployment rate in western Germany, Jan. 1980 - Dec. 2005]

Analysis of time series data
Time series data are data collected on the same observational unit at multiple time periods:
- aggregate consumption and GDP for a country (for example, 20 years of quarterly observations = 80 observations)
- Yen/$, pound/$ and Euro/$ exchange rates (daily data for 1 year = 365 observations)
- cigarette consumption per capita in California, by year (annual data)
A definition of time series econometrics: in time series analysis, rather than basing our predictions of future movements in a variable on relating it to a set of other variables in a causal framework, we base our prediction solely on the past behaviour of the variable itself.

Analysis of time series data - Example
- Take for example the production of a commodity (e.g. coffee).
- Production may have changed in response to changes in prices, disposable income and interest rates (variables we can measure).
- It may also have changed due to changes in weather, tastes or seasonal cycles that are less easy to measure.
- As such it may be difficult to model the movement of the series through a structural model.
- Even if we could construct such a model, it may be of little use for forecasting (e.g. if we have to forecast the explanatory variables before being able to forecast the dependent variable).

Analysis of time series data - Example continued
- In such cases an alternative approach to modelling a series and obtaining forecasts is needed.
- Time series models describe a series' past behaviour, which allows us to forecast its future behaviour.
- We do this by observing the past behaviour of the time series and searching for patterns, e.g. does the series exhibit an upward trend? Does the series exhibit cyclical or seasonal behaviour?
- Time series models do not offer a structural explanation for the behaviour of the series, but look to replicate past behaviour.
- BUT: we will also have a look at regression models with time series data.

[Figure: Dow Jones Industrial]
[Figure: Dow Jones daily returns]
[Figure: Dow Jones monthly returns]
[Figure: US monthly natural gas consumption]
[Figure: US monthly total vehicle sales]
[Figure: Real GDP]
[Figure: Civilian unemployment rate]
[Figure: Consumer Price Index]
[Figure: 3-month Treasury bill]

Analysis of time series data
- OLS with time series data: causal dependence between variables
  - How does yt depend on xt?
  - How does yt depend on xt, xt−1, ...?
  - How does xt depend on yt−1?
- Description of time series properties
- Forecasting
  - forecast yt on the basis of yt−1, yt−2, ...
  - forecast yt on the basis of xt−1, xt−2, ...
- Examples
  - single time series: AR(I)MA processes, unit roots
  - multiple time series models: VAR, co-integration, VEC
  - static models, dynamic causal effects, dynamic panel models
  - autoregressive distributed lag (ADL) models

OLS with time series data
Time series vs. cross-sectional data:
- Time series data has a temporal ordering, unlike cross-section data.
- We will need to alter some of our assumptions to take into account that we no longer have a random sample of individuals.
- Instead, we have one realization of a stochastic (i.e. random) process.
Examples of some time series models:
- A static model relates contemporaneous variables: yt = β0 + β1 zt + ut
- A finite distributed lag (FDL) model allows one or more variables to affect y with a lag: yt = α0 + δ0 zt + δ1 zt−1 + δ2 zt−2 + ut
- More generally, a finite distributed lag model of order q will include q lags of z.

OLS with time series data
- We assume a model that is linear in parameters: yt = β0 + β1 xt1 + ... + βk xtk + ut.
- We make a zero conditional mean assumption: E(ut | X) = 0, t = 1, 2, ..., T.
  - Note that this implies the error term in any given period is uncorrelated with the explanatory variables in all time periods.
  - This zero conditional mean assumption implies the x's are strictly exogenous.
- An alternative assumption, more parallel to the cross-sectional case: E(ut | xt) = 0.
  - This assumption would imply the x's are contemporaneously exogenous.
  - Contemporaneous exogeneity will only be sufficient in large samples.

OLS with time series data
- We also need to assume that no x is constant, and that there is no perfect collinearity.
- Note that we have skipped the assumption of a random sample.
  - The key impact of the random sample assumption is that each ui is independent.
  - Our strict exogeneity assumption takes care of it in this case.
- Based on these 3 assumptions, when using time series data, the OLS estimators are unbiased.
  - Thus, just as was the case with cross-section data, under the appropriate conditions OLS is unbiased.
  - Omitted variable bias can be analyzed in the same manner as in the cross-section case.
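A minimal sketch of the two specifications above (static and FDL), assuming numpy and statsmodels are available; the data are simulated and all parameter values are illustrative:

```python
# Sketch: static and finite distributed lag (FDL) models on simulated data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
T = 200
z = rng.normal(size=T)
u = rng.normal(size=T)
# True FDL process: y_t = 1 + 0.5 z_t + 0.3 z_{t-1} + 0.1 z_{t-2} + u_t
y = 1 + 0.5 * z + 0.3 * np.roll(z, 1) + 0.1 * np.roll(z, 2) + u
y, z = y[2:], z[2:]        # drop the first two obs contaminated by np.roll wrap-around

# Static model: y_t on z_t only
static = sm.OLS(y, sm.add_constant(z)).fit()

# FDL model of order 2: y_t on z_t, z_{t-1}, z_{t-2}
Z = np.column_stack([z[2:], z[1:-1], z[:-2]])
fdl = sm.OLS(y[2:], sm.add_constant(Z)).fit()
print(static.params)
print(fdl.params)          # should be near (1, 0.5, 0.3, 0.1)
```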
Variance of the OLS estimator
- Just as in the cross-section case, we need to add the assumption of homoskedasticity in order to be able to derive variances. Now we assume Var(ut | X) = Var(ut) = σ². Thus, the error variance is independent of all the x's, and it is constant over time.
- We also need the assumption of no serial correlation: Corr(ut, us | X) = 0 for t ≠ s.
- Under these 5 assumptions, the OLS variances in the time series case are the same as in the cross-section case:
  - the estimator of σ² is the same
  - OLS remains BLUE
  - with the additional assumption of normal errors, inference is the same.

OLS with time series data
Strict exogeneity, homoskedasticity, and no serial correlation are very demanding requirements, especially in the time series context:
- strict exogeneity rules out feedback from the dependent variable on future values of the explanatory variables (which is very common in economic contexts)
- strict exogeneity also precludes the use of lagged dependent variables as regressors
→ alternative assumption: E(ut | xt) = 0 and xkt, yt are stationary and weakly dependent
- this assumption would imply the x's are contemporaneously exogenous
- contemporaneous exogeneity will only be sufficient in large samples (→ OLS consistent)
If there is heteroskedasticity or autocorrelation:
- we may obtain an estimate for first-order autocorrelation using the Cochrane-Orcutt, Prais-Winsten or Hildreth-Lu procedures
- standard errors should be corrected using HAC estimators.

Why do lagged dependent variables violate strict exogeneity?
- Let's assume: yt = β0 + β1 yt−1 + ut.
- Contemporaneous exogeneity: E(ut | yt−1) = 0.
- Strict exogeneity: E(ut | y0, y1, ..., yn−1) = 0 ← strict exogeneity would imply that the error term is uncorrelated with all yt, t = 1, ..., n − 1.
- This leads to a contradiction because: Cov(yt, ut) = β1 Cov(yt−1, ut) + Var(ut) > 0.
OLS estimation in the presence of lagged dependent variables: under contemporaneous exogeneity, OLS is consistent but biased (a small simulation illustrating this is sketched below).
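A minimal simulation sketch of this "consistent but biased" result, assuming an AR(1) with |β1| < 1 and iid errors; all names and values are illustrative. The mean OLS estimate of β1 is below the truth in small samples and approaches it as T grows:

```python
# Sketch: small-sample bias vs. consistency of OLS with a lagged dependent variable.
import numpy as np

def ar1_ols(T, beta1=0.8, reps=2000, seed=0):
    """Average OLS estimate of beta1 in y_t = beta0 + beta1 y_{t-1} + u_t."""
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(reps):
        u = rng.normal(size=T)
        y = np.zeros(T)
        for t in range(1, T):
            y[t] = 0.5 + beta1 * y[t - 1] + u[t]
        X = np.column_stack([np.ones(T - 1), y[:-1]])
        b = np.linalg.lstsq(X, y[1:], rcond=None)[0]
        estimates.append(b[1])
    return np.mean(estimates)

for T in (25, 100, 800):
    print(T, ar1_ols(T))   # mean estimate creeps toward 0.8 as T grows
```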
Example 1: Static model
- Consider a static model with two explanatory variables: yt = β0 + β1 zt1 + β2 zt2 + ut.
- Under weak dependence, the condition sufficient for consistency of OLS is E(ut | zt1, zt2) = 0.
- This rules out omitted variables that are in ut and are correlated with either zt1 or zt2.
- Weak dependence does not rule out correlation between ut−1 and zt1:
  - this type of correlation could arise if zt1 is related to past yt−1, such as zt1 = δ0 + δ1 yt−1 + vt
  - for example, zt1 might be a policy variable, such as the monthly percentage change in the money supply, and this change might depend on last month's rate of inflation (yt−1)
  - such a mechanism generally causes zt1 and ut−1 to be correlated (as can be seen by plugging in for yt−1)
  - this kind of feedback is allowed under the assumption of weak dependence.

Example 2: Finite distributed lag model
- In the finite distributed lag model yt = α0 + δ0 zt + δ1 zt−1 + δ2 zt−2 + ut, a very natural assumption is that the expected value of ut, given current and all past values of z, is zero: E(ut | zt, zt−1, zt−2, zt−3, ...) = 0.
- This means that, once zt, zt−1, and zt−2 are included, no further lags of z affect E(yt | zt, zt−1, zt−2, zt−3, ...); if this were not true, we would put further lags into the equation.
- For example, yt could be the annual percentage change in investment and zt a measure of interest rates during year t.
- When we set xt = (zt, zt−1, zt−2), the assumption of weak dependence is then satisfied: OLS will be consistent.
- As in the previous example, weak dependence does not rule out feedback from y to future values of z.

Autocorrelation - what if Corr(ut, us | X) ≠ 0?
- The OLS estimator is inefficient, but unbiased and consistent:
  - positive autocorrelation: estimated standard deviations are too small, therefore the t-values are too large
  - negative autocorrelation: estimated standard deviations are too large, therefore the t-values are too small
  - if there are additionally lagged dependent variables on the right-hand side, the OLS estimates are biased and inconsistent.
- Autocorrelation denotes the correlation between a time series yt and its own lagged values yt−s with s = −∞, ..., ∞: yt = ρ yt−1 + ut, or consider the error term in a linear model: yt = Xt β + εt with εt = ρ εt−1 + ut.

How to test for first-order autocorrelation
Calculate the Durbin-Watson statistic to test for first-order serial correlation, H0: ρ = 0 against H1: ρ ≠ 0. The test statistic is
DW = Σ_{t=2}^T (ût − ût−1)² / Σ_{t=1}^T ût².
For T → ∞ the test statistic DW → 2 − 2ρ:
- no serial correlation: the DW statistic will be around 2
- positive serial correlation: the DW statistic will fall below 2
- negative serial correlation: the DW statistic will lie somewhere between 2 and 4.
Critical values are tabulated: for each DW test statistic there are two critical values, dUPPER and dLOWER, depending on the number of parameters and the number of observations. To find a decision:
- do not reject the null hypothesis if DW > dUPPER
- reject the null hypothesis if DW < dLOWER
- no decision if dLOWER < DW < dUPPER.

Cochrane-Orcutt procedure
An iterative procedure to correct for autocorrelation using OLS estimation in yt = Xt β + εt with εt = ρ εt−1 + ut.
Quasi-differencing transformation:
- yt = Xt β + εt
- yt−1 = Xt−1 β + εt−1   | · ρ
- → yt − ρ yt−1 = (Xt − ρ Xt−1) β + εt − ρ εt−1 = (Xt − ρ Xt−1) β + ut with ut ∼ N(0, σ²) → OLS is applicable.
The following steps are necessary:
- estimate the original model yt = Xt β + ε0,t
- generate ε̂0,t = yt − ŷt and estimate ε̂0,t = ρ0 ε̂0,t−1 to obtain ρ̂0, then estimate (yt − ρ̂0 yt−1) = (Xt − ρ̂0 Xt−1) β + ε1,t
- generate new residuals
- repeat until |ρ̂n − ρ̂n−1| < δ, δ > 0.
Note: this procedure does not necessarily find the global minimum; it is possible to end up with a local one (see the sketch below).
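A minimal numpy sketch of the Durbin-Watson statistic and the Cochrane-Orcutt iteration, assuming X already contains a constant column; convergence handling is simplified and all names are illustrative:

```python
# Sketch: Durbin-Watson statistic and Cochrane-Orcutt iteration with numpy.
import numpy as np

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

def durbin_watson(resid):
    # DW = sum_{t=2}^T (e_t - e_{t-1})^2 / sum_t e_t^2; values near 2 mean no AR(1)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

def cochrane_orcutt(X, y, tol=1e-6, max_iter=100):
    beta, rho = ols(X, y), 0.0
    for _ in range(max_iter):
        e = y - X @ beta                                   # residuals of original model
        rho_new = (e[1:] @ e[:-1]) / (e[:-1] @ e[:-1])     # regress e_t on e_{t-1}
        y_star = y[1:] - rho_new * y[:-1]                  # quasi-difference the data
        X_star = X[1:] - rho_new * X[:-1]
        beta = ols(X_star, y_star)
        if abs(rho_new - rho) < tol:
            break
        rho = rho_new
    return beta, rho

# Demo: regression with AR(1) errors (rho = 0.6)
rng = np.random.default_rng(2)
T = 300
x = rng.normal(size=T)
eps = np.zeros(T)
for t in range(1, T):
    eps[t] = 0.6 * eps[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + eps
X = np.column_stack([np.ones(T), x])
print(durbin_watson(y - X @ ols(X, y)))   # well below 2 -> positive AR(1)
print(cochrane_orcutt(X, y))              # beta near (1, 2), rho near 0.6
```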
Hildreth-Lu procedure
A set of grid values for ρ is specified, for example ρ0 = 0, 0.1, ..., 1.0. The following steps are then necessary:
- estimate yt = Xt β + εt with εt = ρ0 εt−1 + ut for each ρ0 = 0, 0.1, ..., 1.0
- the estimation with the lowest sum of squared residuals (SSR) gives ρ̂0
- find a new set of grid values in the neighborhood of ρ̂0 to get ρ̂1
- generate new estimations
- repeat until |ρ̂n − ρ̂n−1| < δ, δ > 0.
Note: it is also possible to find a local, and not the global, minimum.

HAC covariances (Newey-West)
- The White covariance matrix described earlier assumes that the residuals of the estimated equation are serially uncorrelated.
- Newey and West (1987b) have proposed a more general covariance matrix estimator that is consistent in the presence of both heteroskedasticity and autocorrelation of unknown form.
- The Newey-West estimator is given by
Σ̂NW = [T / (T − (K + 1))] (X′X)⁻¹ Ω̂ (X′X)⁻¹,
where
Ω̂ = [T / (T − (K + 1))] [ Σ_{t=1}^T ε̂t² xt xt′ + Σ_{ν=1}^q (1 − ν/(1 + q)) Σ_{t=ν+1}^T (xt ε̂t ε̂t−ν xt−ν′ + xt−ν ε̂t−ν ε̂t xt′) ],
and q, the truncation lag, is a parameter representing the number of autocorrelations used in evaluating the dynamics of the OLS residuals ε̂.
- Note: using the White heteroskedasticity-consistent or the Newey-West HAC-consistent covariance estimates does not change the point estimates of the parameters, only the estimated standard errors.

Trending time series
Economic time series often have a trend:
- just because 2 series are trending together, we cannot assume that the relation is causal
- often, both will be trending because of other unobserved factors
- even if those factors are unobserved, we can control for them by directly controlling for the trend.
Choices to model the trend:
- linear trend: yt = α0 + α1 t + ut, t = 1, 2, ...
- exponential trend: log(yt) = α0 + α1 t + ut, t = 1, 2, ...
- quadratic trend: yt = α0 + α1 t + α2 t² + ut, t = 1, 2, ...

Detrending
Adding a linear trend term to a regression is the same thing as using "detrended" series in a regression:
- detrending a series involves regressing each variable in the model on t
- take the residuals from the detrended series
- basically, the trend has been partialled out.
An advantage of actually detrending the data (vs. adding a trend) involves the calculation of goodness of fit:
- time series regressions tend to have very high R², as the trend is well explained
- the R² from a regression on detrended data better reflects how well the xt's explain yt.

Seasonality
- Often time series data exhibit some periodicity, referred to as seasonality. Example: quarterly data on retail sales will tend to jump up in the 4th quarter.
- Seasonality can be dealt with by adding a set of seasonal dummies.
- As with trends, the series can be seasonally adjusted before running the regression.
A sketch combining a trend, seasonal dummies and Newey-West standard errors follows.
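A minimal sketch, assuming simulated quarterly data and statsmodels; the truncation lag q = 4 and all parameter values are illustrative:

```python
# Sketch: trend + seasonal dummies, with Newey-West HAC standard errors.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
T = 120                                   # 30 years of quarterly data
t = np.arange(T)
quarter = t % 4                           # 0..3; quarter 0 ("Q1") is the base
season = np.eye(4)[quarter][:, 1:]        # dummies for the other three quarters
u = np.zeros(T)
for i in range(1, T):                     # AR(1) errors
    u[i] = 0.5 * u[i - 1] + rng.normal()
y = 0.02 * t + 1.5 * (quarter == 3) + u   # linear trend plus a Q4 jump

X = sm.add_constant(np.column_stack([t, season]))
ols_fit = sm.OLS(y, X).fit()                                         # usual OLS SEs
hac_fit = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 4})  # Newey-West, q = 4
print(ols_fit.bse)   # too small under positive autocorrelation
print(hac_fit.bse)   # HAC-corrected SEs; point estimates are unchanged
```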
Example: NY fish market
C9 The file FISH.RAW contains 97 daily price and quantity observations on fish prices at the Fulton Fish Market in New York City. Use the variable log(avgprc) as the dependent variable.
i. Regress log(avgprc) on four daily dummy variables, with Friday as the base. Include a linear time trend. Is there evidence that price varies systematically within a week?
ii. Now, add the variables wave2 and wave3, which are measures of wave heights over the past several days. Are these variables individually significant? Describe a mechanism by which stormy seas would increase the price of fish.
iii. What happened to the time trend when wave2 and wave3 were added to the regression? What must be going on?
iv. Explain why all explanatory variables in the regression are safely assumed to be strictly exogenous.
v. Test the errors for AR(1) serial correlation.
vi. Obtain the Newey-West standard errors using six lags. What happens to the t statistics on wave2 and wave3? Did you expect a bigger or smaller change compared with the usual OLS t statistics?

Example: NY fish market - price of fish
Dependent variable: average price in logs in columns (1)-(3), (5) and (6); residuals from (3) in column (4).

                              (1)        (2)        (3)        (4)        (5)        (6)
Monday                     -0.0101    -0.0182    -0.0121                0.0473    -0.0121
                           (0.129)    (0.114)    (0.114)               (0.089)    (0.096)
Tuesday                    -0.0088    -0.0085    -0.0090               -0.0338    -0.0090
                           (0.127)    (0.112)    (0.112)               (0.087)    (0.107)
Wednesday                   0.0376     0.0500     0.0505                0.0447     0.0505
                           (0.126)    (0.112)    (0.112)               (0.086)    (0.088)
Thursday                    0.0906     0.1225     0.1242                0.1120     0.1242
                           (0.126)    (0.111)    (0.111)               (0.086)    (0.058)**
Time trend                 -0.0040    -0.0012
                           (0.001)**  (0.001)
Wave height 2 days before              0.0909     0.0945                0.0770     0.0945
                                      (0.022)**  (0.021)**             (0.017)**  (0.020)**
Wave height 3 days before              0.0474     0.0526                0.0765     0.0526
                                      (0.021)**  (0.020)**             (0.016)**  (0.018)**
Lagged residuals                                             0.6155     0.6414
                                                            (0.081)**  (0.084)**
Constant                   -0.0730    -0.9203    -1.0228     0.0082    -1.0470    -1.0228
                           (0.115)    (0.190)**  (0.144)**  (0.027)    (0.113)**  (0.179)**
Number of observations          97         97         97         96         96         97
Adjusted R-squared           0.035      0.255      0.258      0.374      0.561
Standard errors in parentheses.

Example: NY fish market - tests
Various tests based on the estimations in column (3):
F-test: test wave2 wave3
  (1) wave2 = 0
  (2) wave3 = 0
  F(2, 90) = 19.10, Prob > F = 0.0000
F-test: test mon tues wed thurs
  (1) mon = 0
  (2) tues = 0
  (3) wed = 0
  (4) thurs = 0
  F(4, 90) = 0.53, Prob > F = 0.7134
Durbin-Watson test:
  Durbin-Watson d-statistic(7, 97) = .7509271 → ρ̂ = .62453645
A sketch of the residual-based AR(1) test follows.
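A minimal sketch of the AR(1) serial-correlation test behind column (4): regress the OLS residuals on their first lag and inspect the slope and its t statistic. The demo series is simulated as a stand-in for the fish-market residuals (names illustrative):

```python
# Sketch: residual-based test for AR(1) serial correlation.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

def ar1_test(resid):
    """Regress e_t on e_{t-1}; a significant slope indicates AR(1) correlation."""
    res = sm.OLS(resid[1:], sm.add_constant(resid[:-1])).fit()
    return res.params[1], res.tvalues[1], durbin_watson(resid)

# Demo on simulated AR(1) "residuals" with rho = 0.6, close to the table:
rng = np.random.default_rng(9)
e = np.zeros(97)
for t in range(1, 97):
    e[t] = 0.6 * e[t - 1] + rng.normal()
rho_hat, t_stat, dw = ar1_test(e)
print(rho_hat, t_stat, dw)   # rho_hat near 0.6, DW well below 2
```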
Expectations augmented Phillips curve
A linear version of the expectations augmented Phillips curve can be written as
inft − inf^e_t = β1 (unemt − µ0) + et,
where µ0 is the natural rate of unemployment and inf^e_t is the expected rate of inflation formed in year t − 1.
- This model assumes that the natural rate is constant, something that macroeconomists question.
- The difference between actual unemployment and the natural rate is called cyclical unemployment, while the difference between actual and expected inflation is called unanticipated inflation.
- The error term, et, is called a supply shock by macroeconomists.
- If there is a tradeoff between unanticipated inflation and cyclical unemployment, then β1 < 0.

Expectations augmented Phillips curve
To complete this model, we need to make an assumption about inflationary expectations:
- under adaptive expectations, the expected value of current inflation depends on recently observed inflation
- a particularly simple formulation is that expected inflation this year is last year's inflation: inf^e_t = inft−1.
Under this assumption, we can write inft − inft−1 = β0 + β1 unemt + et, or
Δinft = β0 + β1 unemt + et,
where Δinft = inft − inft−1 and β0 = −β1 µ0.
- β0 is expected to be positive, since β1 < 0 and µ0 > 0.
- Therefore, under adaptive expectations, the expectations augmented Phillips curve relates the change in inflation to the level of unemployment and a supply shock, et.
- If et is uncorrelated with unemt, as is typically assumed, then we can consistently estimate β0 and β1 by OLS.
- We do not have to assume that, say, future unemployment rates are unaffected by the current supply shock.

Expectations augmented Phillips curve
Using the data through 1996 in PHILLIPS.RAW we estimate
Δinf̂t = 3.03 − .543 unemt
        (1.38)  (.230)
n = 48, R² = .108, R̄² = .088.
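A minimal sketch of this estimation. The real exercise uses PHILLIPS.RAW; a small synthetic stand-in keeps the sketch runnable, and all numbers and names are illustrative, not the textbook data:

```python
# Sketch: expectations augmented Phillips curve, Delta(inf_t) on unem_t.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 48
unem = 5 + rng.normal(size=n)
inf = np.cumsum(3.0 - 0.5 * unem + rng.normal(size=n))  # so d(inf) = 3 - .5*unem + e
phillips = pd.DataFrame({"inf": inf, "unem": unem})

d_inf = phillips["inf"].diff().dropna()        # inf_t - inf_{t-1}
unem_t = phillips["unem"].iloc[1:]             # aligned with d_inf
fit = sm.OLS(d_inf.to_numpy(), sm.add_constant(unem_t.to_numpy())).fit()
b0, b1 = fit.params
print(fit.params)                              # expect beta1 < 0, beta0 > 0
print("implied natural rate:", b0 / -b1)       # mu0_hat = beta0_hat / (-beta1_hat)
```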
Analysis of univariate time series data
- Forecasting and estimation of causal effects are quite different objectives.
- For estimation of causal effects, we were very concerned about omitted variable bias, control variables, and exogeneity.
- For forecasting:
  - omitted variable bias isn't a problem!
  - we won't worry about interpreting coefficients in forecasting models - no need to estimate causal effects if all you want to do is forecast
  - what is important, instead, is that the model provides an out-of-sample prediction (forecast) that is as accurate as possible.
→ Inference about properties of yt. But: just a single realization for every t is observed.

Analysis of univariate time series data
Concept of stationarity:
- for inference it is necessary that the random variables yt have common (or similar) characteristics, e.g. consider yt for t = 1, ..., T as realizations of the same distribution
- for the in-sample model to be useful out-of-sample, the out-of-sample period (near future) must be like the in-sample data (the historical data).
Concept of ergodicity:
- it is desirable to use the observations over time to determine properties of every yt
- if the dependence between the yt's is not too strong, every single yt contains sufficient information beyond the other elements
- we want to make sure that the average behavior observed in our sample is representative of the average behavior in the population, supporting our ability to make meaningful predictions based on limited data.

Time series: additional notation and definitions
- A time series (TS) is a chronologically ordered sequence (set) of observations yt, t ∈ T, of a random variable: {yt}t∈T.
  - observation at time t: yt
  - T a discrete set ⇒ {yt}t∈T is a TS in discrete time (data per hour, day, week, month, quarter, year, etc.)
  - special case: T finite, equidistant points in time (T = {1, ..., T} → {yt}t∈T = {y1, ..., yT}).
- Properties:
  - each observation yt of the time series is seen as a realization of a random variable Yt
  - the observed time series {yt} is a realization of the family of random variables {Yt}
  - Note: we only distinguish between random variables Yt and their realizations yt whenever necessary. In all other cases, we stick to the notation yt.

Time series: additional notation and definitions
A complete probabilistic time series model for the sequence of random variables {Y1, Y2, ...} would specify all joint distributions of the random vectors (Y1, ..., Yt)′, t = 1, 2, ..., i.e.,
P[Y1 ≤ y1, ..., Yt ≤ yt], −∞ < y1, ..., yt < ∞, t = 1, 2, ...
- This typically requires far too many parameters to be estimated.
- Rather, specify only the first- and second-order moments, i.e., E[Yt] and E[Yt+h Yt], t = 1, 2, ..., h = 0, 1, 2, ...
- In case of multivariate normality, second-order properties completely determine the joint distribution.

Time series: additional notation and definitions
Typical characteristics of time series:
- yt is typically not independent of yt−1
- the "strength" of dependence between yt and yt−1 is an essential characteristic of time series.
Examples:
- independence between the yt for all t = 1, ..., T
- AR(1) process: yt = φ yt−1 + εt with |φ| < 1
- random walk: yt = yt−1 + εt
- deterministic trend: yt = mt + εt

Time series: additional notation and definitions
Benchmark: IID noise.
- For iid noise random variables Y1, Y2, ...:
P[Y1 ≤ y1, ..., Yn ≤ yn] = P[Y1 ≤ y1] ··· P[Yn ≤ yn] = F(y1) ··· F(yn),
where F(·) is the cdf of each of Y1, Y2, ...
- For all h ≥ 1 and all y, y1, ..., yn:
P[Yn+h ≤ y | Y1 = y1, ..., Yn = yn] = P[Yn+h ≤ y],
i.e., y1, ..., yn has no value for predicting yn+h.
- IID processes cannot be used for forecasting but are important as building blocks for more complicated time series models.

Time series: additional notation and definitions
Benchmark: random walk.
- A time series {yt, t = 0, 1, 2, ...} starting at zero is called a random walk if it is obtained by cumulatively summing iid random variables εt.
- A random walk with zero mean is obtained by defining y0 = 0 and yt = ε1 + ε2 + ··· + εt for t = 1, 2, ..., where {εt} ∼ IID(0, σ²) → yt = yt−1 + εt.
[Figure: simulated random walk]

[Figure: Bitcoin]
[Figure: Amazon]

Time series: additional notation and definitions
Let {yt} be a time series with E[yt²] < ∞.
- The mean function of {yt} is µy(t) = E[yt].
- The covariance function of {yt} is Cov[yt, ys] := γ(t, s) := E[(yt − µy(t))(ys − µy(s))]. The covariance function, or autocovariance function, describes the linear dependence of observations.
- The autocorrelation function is
Corr[yt, ys] := ρ(t, s) := Cov[yt, ys] / (Var[yt]^(1/2) Var[ys]^(1/2)) = γ(t, s) / (γ(t, t)^(1/2) γ(s, s)^(1/2)),
with γ(t, t) := Var[yt] := E[(yt − E[yt])²].
- Properties:
  - symmetry: γ(t, s) = γ(s, t)
  - γ(t, s) = 0 ⇒ ys and yt are uncorrelated, but they can still be dependent on each other.

Testing for the absence of autocorrelation
If yt is stationary with yt = µ + Σ_{i=0}^q ψi εt−i, where ψ0 = 1, q is a non-negative integer, and {εj} is Gaussian white noise, then
ρ̂j ∼a N(0, (1 + 2 Σ_{i=1}^q ρi²)/T)
for j > q, with T denoting the sample size (Bartlett's formula). If yt is i.i.d. with E[yt²] < ∞, then ρ̂j ∼a N(0, T⁻¹).
Testing for the absence of autocorrelation:
H0: ρ1 = ρ2 = ... = ρk = 0
H1: ρj ≠ 0 for some j ∈ {1, ..., k}

Testing for the absence of autocorrelation
- Box and Pierce (1970) propose the Portmanteau statistic QBP(k) = T Σ_{j=1}^k ρ̂j². If {yt} is i.i.d., we have QBP(k) ∼a χ²(k−p), where p is the number of parameters to be estimated.
- Ljung and Box (1978):
QLB(k) = T(T + 2) Σ_{j=1}^k ρ̂j²/(T − j) ∼a χ²(k−p).
- The Ljung-Box statistic has higher power in small samples; however, asymptotically, QBP(k) and QLB(k) are equivalent.
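A minimal sketch of these portmanteau tests, assuming statsmodels' acorr_ljungbox (which can also report the Box-Pierce variant) applied to simulated white noise:

```python
# Sketch: Ljung-Box and Box-Pierce tests for the absence of autocorrelation.
import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(7)
y = rng.normal(size=500)                 # white noise: H0 should not be rejected

res = acorr_ljungbox(y, lags=[5, 10, 20], boxpierce=True)
print(res)   # columns: lb_stat, lb_pvalue, bp_stat, bp_pvalue
# Large p-values at all lags are consistent with no serial correlation.
```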
How to test for higher-order autocorrelation
Autocorrelation coefficient:
rl = (1/(û′û)) Σ_{t=l+1}^T ût ût−l, l = 1, ..., L, with L the highest lag.
The correlogram is the series of the autocorrelation coefficients:
- the 95% confidence interval shows the significance of each autocorrelation coefficient; if one of the coefficients is significant, then the residuals are serially correlated.
Ljung-Box Q-statistics:
Ql = T(T + 2) Σ_{j=1}^l rj²/(T − j), l = 1, ..., L,
with rj = Corr(yt, yt−j) and L the highest lag; the test statistic Ql is χ²_l distributed under the null hypothesis of no autocorrelation.
If there is no serial correlation in the residuals, the autocorrelations and partial autocorrelations at all lags should be nearly zero, and all Q-statistics should be insignificant with large p-values.

Real GDP
The first four autocorrelations are ρ̂1 = 0.33, ρ̂2 = 0.26, ρ̂3 = 0.10, and ρ̂4 = 0.11.

Stationarity
Definition: the time series {yt}t∈Z is called weakly (covariance) stationary if:
- E[yt²] < ∞ ∀t ∈ Z
- E[yt] = µ ∀t ∈ Z
- γ(s, t) = γ(s + r, t + r) ∀s, t, r ∈ Z.
Other definition: the time series {yt}t∈Z is called weakly stationary if:
- µy(t) is independent of t
- γ(t + h, t) is independent of t for each h.
Under stationarity: γ(s, t) = γ(t − s, 0) = γ(h, 0) = γ(−h, 0) ∀s, t.

Stationarity
Let {yt} be a weakly stationary time series.
- The autocovariance function (ACVF) of {yt} at lag h is defined by γh := γ(h, 0) = Cov[yt, yt−h] ∀t, h ∈ Z.
- The autocorrelation function (ACF) of {yt} at lag h is ρh := γh/γ0.
- The autocorrelation function provides a partial description of the stochastic process for modelling purposes.
- The autocorrelation function tells us how much correlation (i.e. interdependency) there is between neighbouring data points in the series y.

Stationarity
Let y1, ..., yT be observations of a time series.
- The sample mean of y1, ..., yT is ȳ = (1/T) Σ_{t=1}^T yt.
- The sample autocovariance function is γ̂h := (1/T) Σ_{t=1}^{T−|h|} (yt+|h| − ȳ)(yt − ȳ), −T < h < T.
- The sample autocorrelation function is ρ̂h = γ̂h/γ̂0, −T < h < T.

[Figure: ACF of Dow Jones index and returns]
[Figure: Other examples]

Stationarity
Definition: a stochastic process {yt}t∈Z is called strictly stationary if for all n ∈ N and for all t1, ..., tn, h ∈ Z the (finite-dimensional) distributions of (yt1, ..., ytn)′ and (yt1+h, ..., ytn+h)′ are identical, meaning
(yt1, ..., ytn)′ =d (yt1+h, ..., ytn+h)′.
- Thus, strict stationarity implies that the yt's are identically distributed and that the nature of any correlation between adjacent terms is the same across all periods.
- Weak (covariance) stationarity requires only that the mean and variance are constant across time, and that the covariance just depends on the distance across time.
- Weakly dependent time series: a stationary time series is weakly dependent if yt and yt+h are "almost independent" as h increases.
- If for a covariance stationary process Corr(yt, yt+h) → 0 as h → ∞, we will say this covariance stationary process is weakly dependent.
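A minimal sketch implementing the sample mean, autocovariance and autocorrelation functions exactly as defined above (numpy only; the values should match statsmodels' acf with default settings):

```python
# Sketch: sample autocovariance and autocorrelation functions from the definitions.
import numpy as np

def sample_acovf(y, h):
    """gamma_hat_h = (1/T) * sum_{t=1}^{T-|h|} (y_{t+|h|} - ybar)(y_t - ybar)."""
    y = np.asarray(y, dtype=float)
    T, h = len(y), abs(h)
    ybar = y.mean()
    return np.sum((y[h:] - ybar) * (y[: T - h] - ybar)) / T

def sample_acf(y, max_lag):
    gamma0 = sample_acovf(y, 0)
    return np.array([sample_acovf(y, h) / gamma0 for h in range(max_lag + 1)])

rng = np.random.default_rng(3)
x = rng.normal(size=300)
print(sample_acf(x, 5))          # 1 at lag 0, near 0 elsewhere for white noise
print(1.96 / np.sqrt(len(x)))    # rough 95% band for a white-noise correlogram
```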
Stationarity
Example: white noise. Let {yt} be a sequence of uncorrelated random variables, each with zero mean and variance σ². Then {yt} is stationary with
γh = σ² if h = 0, and γh = 0 if h ≠ 0.
The sequence is referred to as white noise: yt ∼ WN(0, σ²).
IID noise:
- If {εt} is iid noise with zero mean, i.e., εt ∼ i.i.d.(0, σ²), it is strictly stationary.
- For iid noise we assume each sample has the same probability distribution, while white noise samples could follow different probability distributions.
- If E[εt²] = σ² < ∞, it is also weakly stationary.
- Every IID(0, σ²) noise is WN(0, σ²), but not every white noise is iid noise.

[Figure: Simulated white noise]

Stationarity
Example: random walk. If {yt} is a random walk, yt = yt−1 + εt with εt ∼ IID(0, σ²), then E[yt] = 0 and E[yt²] = tσ² < ∞ for all t:
y1 = y0 + ε1
y2 = y1 + ε2 = (y0 + ε1) + ε2
...
yt = y0 + ε1 + ε2 + ... + εt
E(yt) = E(y0 + ε1 + ε2 + ... + εt) = E(y0) = 0
Var(yt) = Var(y0) + Var(ε1) + ... + Var(εt) = 0 + σ² + ... + σ² = tσ²
and, for h ≥ 0,
γy(t + h, t) = Cov[yt+h, yt] = Cov[yt + εt+1 + ··· + εt+h, yt] = Cov[yt, yt] = tσ².
⇒ Since γy(t + h, t) depends on t, the series {yt} is not stationary.

[Figure: ACF of a random walk]

Stationarity and the autocorrelation function
- To build a time series model for a homogeneous non-stationary process we can difference the series until it is stationary and then construct a model on the differenced series. We can then make a forecast using the model and integrate (i.e. undifference) the model to arrive back at a forecast for the original series.
- But how do we decide whether a series is non-stationary and how many times we need to difference a series?
- Plot the series: this may show signs of non-stationarity, such as a pronounced trend, or the series may wander without a constant long-run mean or variance.
- Plot the autocorrelation function (ACF) (i.e. a correlogram): for stationary series the ACF drops off as k, the number of lags, increases. This is not usually the case for non-stationary processes (i.e. the ACF doesn't tend to zero as k becomes large).

Ergodicity
Definition: a weakly stationary time series {yt} is called ergodic for the expectation µ = E[yt] if
ȳ = (1/T) Σ_{t=1}^T yt →p µ.
Sufficient condition: Σ_{h=0}^∞ |γh| < ∞.
Definition: a weakly stationary process {yt} is called ergodic for the second moment if
(1/(T − h)) Σ_{t=h+1}^T (yt − µ)(yt−h − µ) →p γ(h) ∀h.

MA(1) process and AR(1) process
- A moving average process of order one [MA(1)] can be characterized as one where xt = et + α1 et−1, t = 1, 2, ..., with et being an iid sequence with mean 0 and variance σe².
  - This is a stationary, weakly dependent sequence: variables 1 period apart are correlated, but 2 periods apart they are not.
  - The ACF describes MA processes.
- An autoregressive process of order one [AR(1)] can be characterized as one where yt = ρ yt−1 + et, t = 1, 2, ..., with et being an iid sequence with mean 0 and variance σe².
  - For this process to be weakly dependent, it must be the case that |ρ| < 1.
  - Corr(yt, yt+h) = Cov(yt, yt+h)/(σy σy) = ρ^h, which becomes small as h increases.
  - The PACF describes AR processes.
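A minimal simulation sketch contrasting the two ACF shapes (statsmodels' acf assumed; parameter values illustrative): the MA(1) ACF cuts off after lag 1, while the AR(1) ACF decays geometrically as ρ^h.

```python
# Sketch: ACFs of simulated MA(1) and AR(1) processes.
import numpy as np
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(11)
T = 5000
e = rng.normal(size=T)

ma1 = e[1:] + 0.6 * e[:-1]            # x_t = e_t + 0.6 e_{t-1}

ar1 = np.zeros(T)
for t in range(1, T):
    ar1[t] = 0.7 * ar1[t - 1] + e[t]  # y_t = 0.7 y_{t-1} + e_t

print(acf(ma1, nlags=5))   # spike at lag 1 (about 0.6/(1+0.36) ~ 0.44), near 0 beyond
print(acf(ar1, nlags=5))   # geometric decay, roughly 0.7^h
```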
Example 3: AR(1) model
Consider the AR(1) model
yt = β0 + β1 yt−1 + ut,   (1)
where the error ut has a zero expected value, given all past values of y:
E(ut | yt−1, yt−2, ...) = 0.   (2)
Combined, these two equations imply that
E(yt | yt−1, yt−2, ...) = E(yt | yt−1) = β0 + β1 yt−1.   (3)
- This result means that, once y lagged one period has been controlled for, no further lags of y affect the expected value of yt. (This is where the name "first order" originates.)
- Second, the relationship is assumed to be linear.

Example 3: AR(1) model
- Because xt contains only yt−1, equation (2) implies that contemporaneous exogeneity holds. The strict exogeneity assumption needed for unbiasedness, however, does not hold:
  - since the set of explanatory variables for all time periods includes all of the values on y except the last, (y0, y1, ..., yn−1), strict exogeneity requires that, for all t, ut is uncorrelated with each of y0, y1, ..., yn−1. This cannot be true.
  - in fact, because ut is uncorrelated with yt−1 under (2), ut and yt must be correlated; it is easily seen that Cov(yt, ut) = Var(ut) > 0.
  - therefore, a model with a lagged dependent variable cannot satisfy the strict exogeneity assumption.
- For the weak dependence condition to hold, we must assume that |β1| < 1, as we will discuss in unit 20.
  - If this condition holds, then we can show that the OLS estimator from the regression of yt on yt−1 produces consistent estimators of β0 and β1.
  - β̂1 is biased, and this bias can be large if the sample size is small or if β1 is near 1 (for β1 near 1, β̂1 can have a severe downward bias). In moderate to large samples, β̂1 should be a good estimator of β1.
- Extension: AR(p) model with yt = β0 + β1 yt−1 + ... + βp yt−p + ut.

Trends and seasonality revisited
Properties of time series that trend:
- a trending series cannot be stationary, since the mean is changing over time
- a trending series can be weakly dependent
- if a series is weakly dependent and is stationary about its trend, we will call it a trend-stationary process.
Detrending:
- trend smoothing with an MA filter
- exponential smoothing
- polynomial trend fitting
- trend elimination by differencing.
Additionally we may want to eliminate seasonality.

[Figure: Unemployment rate in the US]

Seasonality and the autocorrelation function
- Seasonality is cyclical behaviour that occurs on a regular calendar basis.
- Sometimes this is easy to spot by plotting the series, but not always (particularly if the series varies a lot).
- Recognising seasonality can be made easier by considering the ACF. If a monthly time series exhibits annual seasonality, for example, the data points in the series should show some degree of correlation with the corresponding data points that lead or lag by 12 months (i.e. we expect a correlation between yt and yt+12, as well as between yt+12 and yt+24, etc.).
- These correlations show up in the ACF, which will show peaks at k = 12, 24, 36, ...
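A minimal sketch, assuming a simulated monthly series with an annual cycle, showing the seasonal ACF peaks at lags 12, 24 and 36 described above:

```python
# Sketch: seasonal peaks in the ACF of a monthly series.
import numpy as np
from statsmodels.tsa.stattools import acf

rng = np.random.default_rng(5)
T = 600                                 # 50 years of monthly data
t = np.arange(T)
y = 2.0 * np.sin(2 * np.pi * t / 12) + rng.normal(size=T)   # annual cycle + noise

rho = acf(y, nlags=36)
print(rho[[12, 24, 36]])   # pronounced positive peaks at the seasonal lags
```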
[Figure: Correlogram of quarterly unemployment rate data]

Efficient Markets Hypothesis (EMH)
- The EMH in a strict form states that information observable to the market prior to week t should not help to predict the return during week t.
- A simplification assumes in addition that only past returns are considered as relevant information to predict the return in week t. This implies that
E(returnt | returnt−1, returnt−2, ...) = E(returnt).
- A simple way to test the EMH is to specify an AR(1) model. Under the EMH assumption, weak dependence holds, so that an OLS regression can be used to test whether this week's returns depend on last week's:
return̂t = .180 + .059 returnt−1
         (.081)  (.038)
n = 689, R² = .0035, R̄² = .0020.

Efficient Markets Hypothesis (EMH)
- In the previous example, using an AR(1) model to test the EMH might not detect correlation between weekly returns that are more than one week apart.
- It is easy to estimate models with more than one lag. For example, an autoregressive model of order two, or AR(2) model, is
yt = β0 + β1 yt−1 + β2 yt−2 + ut, with E(ut | yt−1, yt−2, ...) = 0.
- There are stability conditions on β1 and β2 that are needed to ensure that the AR(2) process is weakly dependent, but this is not an issue here because the null hypothesis states that the EMH holds: H0: β1 = β2 = 0.
- If we add the homoskedasticity assumption Var(ut | yt−1, yt−2) = σ², we can use a standard F statistic to test this null hypothesis.
- If we estimate an AR(2) model for return, we obtain
return̂t = .186 + .060 returnt−1 − .038 returnt−2
         (.081)  (.038)          (.038)
n = 688, R² = .0048, R̄² = .0019.
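A minimal sketch of this AR(2) test. Weekly returns are simulated under the null (iid) as a runnable stand-in for the stock-return data used on the slide; names and values are illustrative:

```python
# Sketch: testing the EMH with an AR(2) model and an F test.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
ret = rng.normal(loc=0.2, scale=2.0, size=690)   # iid "weekly returns": EMH holds

y = ret[2:]
X = sm.add_constant(np.column_stack([ret[1:-1], ret[:-2]]))        # lags 1 and 2
fit = sm.OLS(y, X).fit()
ftest = fit.f_test(np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]))   # H0: beta1 = beta2 = 0
print(fit.params)
print(ftest)    # a large p-value is consistent with the EMH
```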