Applied Econometrics: ARIMA Models
Lecture handout 5, Autumn 2023
D.Sc. (Econ.) Elias Oikarinen, Professor (Associate) of Economics, University of Oulu, Oulu Business School (p. 1)

This Handout (p. 2)
• Univariate time series modelling:
• Autoregressive processes: AR models
• Moving average processes: MA models
• Autoregressive moving average processes: ARMA models
• The Box-Jenkins estimation approach (last part of the handout): read independently before going through the empirical exercises!
• Additional readings marked with *
• Brooks, Chapter 6, sections 6.3-6.8
• Enders, Chapter 2, sections 2-8
• EViews UG II, Chapter 22 (POLL)

Memory Types in Time Series Processes (p. 3)
• Long term (AR): the effect of a shock disappears gradually over time.
• Short term (MA): the market immediately corrects itself after a shock.
• Permanent (non-stationary): the effect of shocks does not disappear (at least not totally).

1: AR(p) Model (p. 4)

AR(p) Model: Basic Idea (p. 5)
• The autoregressive model (Yule, 1927) is the most traditional and a very common time series model.
• The AR(p) model is based on the assumption that previous and current values of a time series contain predictive power w.r.t. future values.
• For instance: this quarter's housing price movements have predictive power w.r.t. next quarter's housing price growth.
• If the time series observations are independent of each other (i.e. there is no autocorrelation; the series is white noise), previous observations y_{t-1}, y_{t-2}, ..., y_{t-i} do not contain any information regarding y_t, so an AR(p) model does not describe the DGP.

AR(p) Model: Basic Idea (p. 6)
• What if there is a link between y_{t-1} and y_t? Easy to see from a graph (S&P 500 price index, a non-stationary series).
[Scatter plot of y_t (y-axis) against y_{t-1} (x-axis)]

AR(p) Model: Basic Idea (p. 7)
• Autocorrelated (of order 1) or not? (Helsinki quarterly housing price change)
[Scatter plot of y_t against y_{t-1}, both axes from -0.15 to 0.15]

AR(p) Model: Basic Idea (p. 8)
• Autocorrelated (of order 1) or not? (Neste share daily return) (POLL)
[Scatter plot of the return against its first lag]

AR(p) Model: Basic Idea (p. 9)
• In an AR model, y_t is predicted/explained by its own previous values.
• The error terms ε_t are assumed to be white noise, ε_t ~ NID(0, σ²) (normally and independently distributed).
• A more flexible assumption: ε_t ~ IID(0, σ²) (identically and independently distributed).
• Basic form: y_t = c + φ_1 y_{t-1} + φ_2 y_{t-2} + ... + φ_p y_{t-p} + ε_t
• c = drift / constant
• φ_i = parameter to be estimated
• y_t is a stationary process
• p is the model lag length
• In capital market data, p is typically 1 or 2, at most 3 (it also depends on data frequency).

Seasonal Variation (p. 10)
• Brooks, p. 204, Ch 10.2-10.3; Gujarati 3.6-3.7; Enders 2.11
• Possible seasonal variation in y_t can be captured by seasonal dummies:
y_t = c + φ_1 y_{t-1} + φ_2 y_{t-2} + ... + φ_p y_{t-p} + δ'S + ε_t
where S is a vector of seasonal dummies and δ is a vector of parameters on the seasonal dummies.
• For instance:
• Monthly inflation rate: 12 - 1 = 11 dummies (assuming that a constant is present)
• Quarterly GDP growth: 4 - 1 = 3 dummies (assuming that a constant is present)
• In Brooks, p. 493, it is assumed that the model does not include a constant; hence there would be 4 seasonal dummies in quarterly data.
• In some cases, observed autocorrelation in y_t can be solely due to seasonal variation rather than an AR(p) process: y_t = c + δ'S + ε_t

A Simple Example (p. 11)
• Let's assume that an AR(1) model describes the data well (reflects the DGP well): y_t = c + φ_1 y_{t-1} + ε_t
• Parameter estimates: c = 0.02, φ_1 = 0.5
• Standard error of the residual = 0.015
• When y_{t-1} = 0.08, the predicted y_t is E(y_t) = 0.02 + 0.5 × 0.08 = 0.06
• 6% is the point estimate (of the predicted value) of y_t
• The 95% confidence interval for the predicted value is [0.03, 0.09], i.e. 0.06 ± 2 × 0.015 (more precisely 0.06 ± 1.96 × 0.015)
• The mean of the series (μ) can be computed from the model coefficients:
E(y_t) = μ = c / (1 - φ_1 - φ_2 - ... - φ_p)
Here: μ = c / (1 - φ_1) = 0.02 / 0.5 = 0.04 (≠ c!)
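A minimal Python sketch of the arithmetic on this slide (the numbers are from the slide; the code itself is illustrative and not part of the handout):

```python
# AR(1) point forecast, its 95% confidence interval, and the
# unconditional mean, using the slide-11 estimates.
c, phi1 = 0.02, 0.5        # estimated drift and AR(1) coefficient
sigma = 0.015              # standard error of the residual
y_prev = 0.08              # y_{t-1}

point = c + phi1 * y_prev                           # E(y_t) = 0.06
ci = (point - 1.96 * sigma, point + 1.96 * sigma)   # approx. [0.03, 0.09]
mu = c / (1 - phi1)                                 # unconditional mean = 0.04, not c

print(f"point forecast: {point:.3f}")
print(f"95% CI: [{ci[0]:.3f}, {ci[1]:.3f}]")
print(f"unconditional mean: {mu:.3f}")
```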
Reaction to a Shock (p. 12)
• Here φ_1 > 0, so a value of y that is greater than the mean will often be followed by another value that is greater than the mean.
• Similarly, a value of y that is smaller than the mean will be followed by another value that is smaller than the mean.
• Negative and positive shocks (ε_t) "throw" the series from one side of the mean to the other (see the next slide for an example).
• The result is a wandering series that moves around its mean.
• The reaction to a shock of 0.1 in period t:
[Line chart of the response over 20 time periods; mean = 4%, coefficient = 0.5, 0.1 shock at time 1]

Example of AR(p) Process (p. 13)
• Helsinki Metro Area real housing price change

Reaction to a Shock (p. 14)
• The greater φ_1 is, the longer-lasting the effect of a shock.
[Line chart of responses to a 0.1 shock for coefficients 0.1, 0.5 and 0.9 over 20 time periods]

Reaction to a Shock, φ_1 < 0 (p. 15)
• If φ_1 < 0, a value of y that is greater than the mean will be followed by a value that is below the mean.
• A value of y that is smaller than the mean will often be followed by a value that is greater than the mean.
• The series thus bounces from one side of its mean to the other.
• This is also reflected in the autocorrelations.
[Line chart of responses to a 0.1 shock for coefficients -0.1, -0.5 and -0.9 over 20 time periods]

Stationarity of AR Model (p. 16)
• Enders, pp. 57-60
• The AR(p) model can also be presented as: y_t - φ_1 y_{t-1} - ... - φ_p y_{t-p} = c + ε_t
• For a stationary series E(y_t) = E(y_{t-1}) = μ < ∞, so μ - φ_1 μ - ... - φ_p μ = c + E(ε_t)
• Since E(ε_t) = 0: if c ≠ 0, the AR model has a solution μ = c / (1 - Σφ_i) if 1 - Σφ_i ≠ 0, i.e. the AR model is stationary if Σφ_i ≠ 1.
• If Σφ_i = 1, μ is not defined; e.g. the AR(1) case φ_1 = 1, a random walk (simulated on the next slide).

Non-Stationary AR Process (p. 17)
• Simulated process: y_t = 0.1 + y_{t-1} + ε_t, σ = 0.5
• Observed series on the left, autocorrelation function on the right.

Explosive AR Process (p. 18)
• Simulated process: y_t = 0 + 1.05 y_{t-1} + ε_t, σ = 0.5
• Observed series on the left, autocorrelation function on the right.
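A short simulation sketch of the three AR(1) regimes just shown (the coefficients 0.5, 1.0 and 1.05 follow the slides; the function simulate_ar1 is my own illustration, not the handout's code):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_ar1(c, phi, sigma, n=200):
    """Simulate y_t = c + phi*y_{t-1} + eps_t with eps_t ~ N(0, sigma^2)."""
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = c + phi * y[t - 1] + sigma * rng.standard_normal()
    return y

stationary = simulate_ar1(c=0.02, phi=0.5, sigma=0.5)   # mean-reverting
unit_root = simulate_ar1(c=0.1, phi=1.0, sigma=0.5)     # slide-17 process
explosive = simulate_ar1(c=0.0, phi=1.05, sigma=0.5)    # slide-18 process
print(stationary[-1], unit_root[-1], explosive[-1])
```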
*Yule-Walker Equations (p. 19)
• Brooks, p. 262; Enders, pp. 62-64
• A group of equations, called the Yule-Walker equations, can be used to solve for the autocorrelations / AR coefficients.
• Autocorrelations are expressed as functions of the autoregressive coefficients.

Partial Autocorrelation Function (p. 20)
• The partial autocorrelation function, or PACF, measures the correlation between an observation k periods ago and the current observation, after controlling for observations at intermediate lags (i.e. all lags < k).
• That is, the correlation between y_t and y_{t-k}, after removing the effects of y_{t-k+1}, y_{t-k+2}, ..., y_{t-1}.
• For example, the PACF for lag 3 would measure the correlation between y_t and y_{t-3} after controlling for the effects of y_{t-1} and y_{t-2}.
• For a more formal explanation, see Enders, pp. 64-67.

Partial Autocorrelation Function of AR(p) Process (p. 21)
• Partial autocorrelations (PAC) help to assess how many lags an AR model should include:
1. The PAC at lag 1 is φ_1 from the model y_t = c + φ_1 y_{t-1} + ε_t, i.e. from the AR(1) model.
2. The PAC at lag 2 is φ_2 from the model y_t = c + φ_1 y_{t-1} + φ_2 y_{t-2} + ε_t, i.e. from the AR(2) model.
• Generally: the PAC at lag i is φ_i from the AR(i) equation, so the PACF is computed by estimating the models AR(1), AR(2), ... and always taking the coefficient on the longest lag.
• PACs that are (statistically significantly) different from zero indicate the lags to be included in the AR model.

Theoretical ACFs and PACFs for AR Processes: Examples (p. 22)
• Note: the ACF and PACF computed from a real-life AR(p) process never reflect the theoretical ones exactly; with actual data it is not this easy to detect the form of the process.

Empirical Example: Housing Price Change in Helsinki Metro Area (p. 23)
• First difference of the log real housing price index, 1995Q1-2020Q2.
• Estimated with maximum likelihood (to be described later on).
• Clearly a stationary series. (SIGMASQ = residual variance)

Empirical Example: Inflation Rate (differenced CPI, seasonally unadjusted) (p. 24)
• Looks stationary. Note the notable positive PAC at the 4th lag: a sign of seasonal variation.

Empirical Example: Inflation Rate (differenced CPI) (p. 25)
• Note that the AR(4) coefficient differs somewhat from the corresponding PAC (0.499). The EViews correlogram provides an approximation; running the AR models gives the accurate ones, see EViews UG I, pp. 422-423.
• SIGMASQ is the estimate of the residual variance; we do not focus on this information.
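A sketch of how the ACF/PACF inspection can be reproduced outside EViews, here in Python with statsmodels (the simulated AR(2) series is a hypothetical stand-in for e.g. the housing price change data):

```python
import numpy as np
from statsmodels.tsa.stattools import acf, pacf

rng = np.random.default_rng(1)
# Simulated AR(2) data as a stand-in for a real series.
y = np.zeros(400)
for t in range(2, 400):
    y[t] = 0.02 + 0.5 * y[t - 1] + 0.2 * y[t - 2] + 0.1 * rng.standard_normal()

print("ACF :", np.round(acf(y, nlags=6), 3))    # decays slowly
print("PACF:", np.round(pacf(y, nlags=6), 3))   # spikes at lags 1-2 only
```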
2: MA(q) Model (p. 26)

MA(q) Model: Basic Idea (p. 27)
• The moving average model (Slutzky, 1927) is also a common time series model.
• In the MA(q) model, the right-hand-side variables are lagged error terms (i.e. prediction errors), where q is the lag length:
y_t = μ + θ_1 ε_{t-1} + θ_2 ε_{t-2} + ... + θ_q ε_{t-q} + ε_t
where μ = drift = mean of the time series, θ_i = parameter to be estimated from observed data, y_t is a stationary process, and ε_t ~ NID(0, σ²).
• Grounded on the idea that the process reacts to earlier prediction errors.

MA(q) Model: Simple Example (p. 28)
• Consider the MA(1) model: y_t = μ + θ_1 ε_{t-1} + ε_t
• If θ_1 > 0:
• ε_{t-1} = 0: no prediction error (in the previous period), so E(y_t) = μ
• ε_{t-1} > 0: the prediction for y_{t-1} was too low (i.e. y_{t-1} was greater than expected), so E(y_t) > μ
• ε_{t-1} < 0: the prediction for y_{t-1} was too high (i.e. y_{t-1} was smaller than expected), so E(y_t) < μ
• A too-high prediction is corrected downwards, and a too-low prediction is corrected upwards.
• The shocks are not permanent, so the process is always stationary.
• If θ_1 < 0:
• ε_{t-1} > 0 implies E(y_t) < μ
• ε_{t-1} < 0 implies E(y_t) > μ

Simulated Example, MA(1) Process (p. 29)
• μ determines the value around which the time series varies over time.
• σ² determines how large the swings around μ are.
• θ_1 determines the autocorrelation in the series.
• Below: μ = 2, θ_1 = 0.9, σ = 0.5 (1st-order autocorrelation approx. 0.5)

Simulated Example, MA(1) Process (p. 30)
• μ = 2, θ_1 = -0.9, σ = 0.5 (1st-order autocorrelation approx. -0.5)

Simulated Example, MA(1) Process (p. 31)
• μ = 2, θ_1 = 0.1, σ = 0.5 (1st-order autocorrelation approx. 0.1)

Example from True Data: MA(2) (p. 32)
• Change in the U.S. default risk premium, monthly frequency, sample period 2000M1 to 2010M11.
• μ = 0, θ_1 = 0.5, θ_2 = 0.1, σ = 0.2 (1st-order autocorrelation approx. 0.3)
[Scatter plot of the series against its first lag, axes from -1.5 to 2.0]

*Mean and Variance of MA Process (p. 33)
• MA(1):
E(y_t) = E(μ + θ_1 ε_{t-1} + ε_t) = μ + θ_1 E(ε_{t-1}) + E(ε_t) = μ
Var(y_t) = E(y_t - μ)² = E(ε_t² + θ_1² ε_{t-1}²) = (1 + θ_1²) σ²
• MA(q):
E(y_t) = E(μ + θ_1 ε_{t-1} + ... + θ_q ε_{t-q} + ε_t) = μ + θ_1 E(ε_{t-1}) + ... + θ_q E(ε_{t-q}) + E(ε_t) = μ
Var(y_t) = E(y_t - μ)² = E(ε_t² + θ_1² ε_{t-1}² + ... + θ_q² ε_{t-q}²) = (1 + θ_1² + ... + θ_q²) σ²
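The theoretical 1st-order autocorrelation of an MA(1) process is θ_1 / (1 + θ_1²); the following simulation (my own illustration, using the slide-29 parameter values) verifies this numerically:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, theta, sigma, n = 2.0, 0.9, 0.5, 100_000

eps = sigma * rng.standard_normal(n + 1)
y = mu + theta * eps[:-1] + eps[1:]      # y_t = mu + theta*eps_{t-1} + eps_t

rho1 = np.corrcoef(y[:-1], y[1:])[0, 1]
print(f"sample rho(1) = {rho1:.3f}, theory = {theta / (1 + theta**2):.3f}")  # ~0.497
```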
Autocorrelation of MA(q) Process (p. 34)
• In an MA(1) process, both y_t and y_{t-1} depend on ε_{t-1}:
y_{t-1} = μ + θ_1 ε_{t-2} + ε_{t-1}
y_t = μ + θ_1 ε_{t-1} + ε_t
• Even if the error terms are IID, the time series y_t is autocorrelated.
• y_t is affected by ε_t, ε_{t-1}, ..., ε_{t-q}; y_{t-j} by ε_{t-j}, ..., ε_{t-j-q}.
• Since the error terms are IID (i.e. independent, not autocorrelated), y_t and y_{t-j} are not correlated if j > q.
• The autocorrelation of an MA(q) process thus drops to zero after q lags, vs. the ACF of an AR(p) process, which decays slowly.

Theoretical ACF and PACF for MA(1) Process (p. 35)
• Note: the ACF and PACF computed from a real-life MA(q) process never reflect the theoretical ones exactly; with actual data it is not this easy to detect the form of the process.

Example from True Data, Cont'd from p. 32 (p. 36)
• Change in the U.S. default risk premium, monthly frequency, sample period 2000M1 to 2010M11.
• Could well be MA(1), or AR(1) / AR(2) as well.

*Connection between AR and MA Models (p. 37)
• The AR(1) model has an MA(∞) presentation:
y_t = c + φ_1 y_{t-1} + ε_t
y_{t-1} = c + φ_1 y_{t-2} + ε_{t-1}
• Plug in y_{t-1}: y_t = c + φ_1(c + φ_1 y_{t-2} + ε_{t-1}) + ε_t = c + φ_1 c + φ_1² y_{t-2} + φ_1 ε_{t-1} + ε_t
• Plug in y_{t-2}, y_{t-3}, etc.:
y_t = c + ε_t + φ_1(c + ε_{t-1}) + φ_1²(c + ε_{t-2}) + ...
= c + cφ_1 + cφ_1² + ... + ε_t + φ_1 ε_{t-1} + φ_1² ε_{t-2} + ...
= c / (1 - φ_1) + ε_t + φ_1 ε_{t-1} + φ_1² ε_{t-2} + ...
• Here ε_t + φ_1 ε_{t-1} + φ_1² ε_{t-2} + ... is the MA(∞) presentation, where θ_j = φ_1^j.
• Due to this link, a high-order MA model can often be replaced with a lower-order AR model, and typically more parsimonious models (i.e. models with a smaller number of parameters to be estimated) are preferable.

3: ARMA(p,q) Model (p. 38)

ARMA(p,q): Combination of AR(p) and MA(q) (p. 39)
• Combining the AR(p) and MA(q) models gives the ARMA(p,q) model:
y_t = c + φ_1 y_{t-1} + ... + φ_p y_{t-p} + θ_1 ε_{t-1} + ... + θ_q ε_{t-q} + ε_t
• In practice, it is often difficult to distinguish between two (or more) almost similar models, e.g. between ARMA(1,2) and ARMA(2,1).
• The lag length often depends on the data frequency; p and q tend to be 0, 1 or 2 for financial time series.
• A long lag length may be due to seasonal variation (e.g. lag length 4 with quarterly data); by controlling for seasonal variation the actual lag length may be much shorter (recall the inflation rate example, pp. 24-25).
• Structural changes in the data / DGP may also yield long lag lengths.

ARIMA(p,d,q) (p. 40)
• More generally, an ARMA(p,q) model is an ARIMA(p,d,q) process / model: Autoregressive Integrated Moving Average.
• p = number of autoregressive parameters
• q = number of moving average parameters
• d = degree of differencing (to get the stationary process that is modelled)
• For instance, Helsinki housing prices (p. 23):
• Price index: ARIMA(1,1,0)
• Price change: ARIMA(1,0,0) = ARMA(1,0) = AR(1)

Theoretical ACF and PACF for ARMA Process (p. 41)
• Both the ACF and PACF decay slowly: it is harder to spot the correct p and q.
• If it seems that a pure AR or MA model is not sufficient, estimation can begin with an ARMA(1,1) model.

Additional Issues (p. 42)
• Other explanatory variables can be included in an ARIMA model: ARIMAX. This may improve prediction accuracy.
• More discussion on seasonal variation in ARMA models: Enders pp. 96-102.

4: Box-Jenkins Procedure (p. 43)

Box-Jenkins Procedure to Estimate ARMA Models (p. 44)
• Enders, pp. 76-79; Brooks, pp. 273-276; EViews UG II, pp. 106-138.
• This handout: brief explanations of the procedure.
• The Box-Jenkins procedure is commonly used in the estimation of ARMA models:
1) Identification of the model: inspect the graph and ACF/PACF values to get insight on the correct p and q.
2) Estimation of the model: normally with maximum likelihood.
3) Diagnostic checks.
• The goal is to find a model that:
• Is parsimonious: as few parameters / explanatory variables as possible
• Reflects the DGP as well as possible
• Produces white noise error terms
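As a sketch of steps 1)-2) in Python (statsmodels' ARIMA in place of the handout's EViews estimation; the simulated "price index" series is hypothetical), setting d = 1 estimates an AR(1) on the first difference, i.e. an ARIMA(1,1,0) on the level:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(3)
# Non-stationary level series whose first (log) difference is AR(1).
dy = np.zeros(300)
for t in range(1, 300):
    dy[t] = 0.01 + 0.4 * dy[t - 1] + 0.02 * rng.standard_normal()
level = 100 * np.exp(np.cumsum(dy))          # log-level integrated once

res = ARIMA(np.log(level), order=(1, 1, 0)).fit()   # d=1 handles differencing
print(res.summary())
```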
Identification (p. 45)
• Usually a log-transformed series is applied.
• Graphical inspection of the time series:
• Missing observations, outliers
• Structural breaks
• Stationarity: differencing or removing a deterministic trend if necessary
• Seasonal variation
• ACF, PACF
• Formal tests:
• Ljung-Box test: autocorrelation
• Unit root tests: stationarity
• Tests for structural change (e.g. Chow test)
• Tests for seasonal variation

Example: Finnish GDP Growth (p. 46)
• Seasonally adjusted; the data are included in the Moodle week III exercise folder.
• Clear autocorrelation. AR(3)? Or maybe ARMA(1,1)?

Estimation (p. 47)
• The identification stage often suggests several alternative models.
• In the estimation stage, each alternative model is estimated, inspected, and compared to the other options.
• Maximum likelihood estimation is typically applied.
• "Overdifferencing" is not recommended (valuable information is lost): if the series is stationary, do not take the difference.
• If seasonal variation is present, either:
• Include a SAR(p) / SMA(q) term,
• Add S - 1 seasonal dummy variables when the model includes a constant term (where S is the number of periods per year, e.g. 4 - 1 = 3 for quarterly data); with EViews, add "@expand(@quarter, @droplast)" in the estimation box, or
• Model the seasonally adjusted series (not recommended if the aim is to estimate a model for forecasting purposes).

Selecting the Best Model (p. 48)
• Parsimony of the model (small number of parameters)
• Statistical significance of parameters
• Model fit: adjusted R² can be compared between models (that have the same dependent variable!)
• Information criteria
• Out-of-sample forecast accuracy
• Diagnostic checks

Parsimony (p. 49)
• The key idea of the procedure.
• The aim is to reflect the DGP, not to explain the observed data precisely.
• Do not maximize R²: it automatically increases with more variables in the model (while the degrees of freedom get smaller).
• Parsimonious models typically produce more accurate (out-of-sample) forecasts.
• Parameters should generally be statistically significant (sometimes insignificant ones need to be included to get rid of remaining autocorrelation in the residuals).
• Single large autocorrelations at long lags are often just coincidence (due to e.g. a large shock, such as the GFC, during the sample period; e.g. the GDP growth PAC at lag 12, p. 46).

Information Criteria (p. 50)
• Following the parsimony principle, information criteria are the preferable model selection method.
• The Akaike and Schwarz information criteria are commonly used.
• Akaike Information Criterion (AIC): AIC = T ln(ssr) + 2·reg
where T = number of observations, ssr = sum of squared residuals, reg = number of estimated parameters (incl. constant).
• Schwarz Bayesian Information Criterion (SBC, SC, SIC): SBC = T ln(ssr) + ln(T)·reg
• Smaller information criterion values are better.
• SBC punishes additional parameters more heavily than AIC ("punishment": ln(T) vs. 2), so the number of parameters selected by SBC ≤ that selected by AIC.
• If the degrees of freedom are relatively small, SBC is preferred in case SBC and AIC would select different models (as they often do!).

Information Criteria (p. 51)
• IMPORTANT: when comparing models, one should always use one and the same sample period and a fixed T in the estimations.
• EViews does that automatically if the explanatory variables are written as AR(p) / MA(q) terms.
• Instead, if you do not use the AR / MA notation, this is more complicated: in the slide's EViews example, the AIC / SBC cannot be compared between the two models.

GDP Growth Example Cont'd (p. 52)
• AIC prefers AR(3), while SBC selects ARMA(1,1): the AR(3) and ARMA(1,1) processes are pretty similar.
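A sketch of the information-criteria comparison in Python (the candidate orders and simulated stand-in series are my own choices; statsmodels reports likelihood-based AIC/BIC rather than the T·ln(ssr) variants above, but the model rankings are generally the same):

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(4)
y = np.zeros(250)                       # stand-in for the GDP growth series
for t in range(3, 250):
    y[t] = (0.005 + 0.3 * y[t-1] + 0.2 * y[t-2] + 0.15 * y[t-3]
            + 0.01 * rng.standard_normal())

for order in [(1, 0, 0), (2, 0, 0), (3, 0, 0), (1, 0, 1)]:
    res = ARIMA(y, order=order).fit()
    # Same series and sample for every fit, so AIC/BIC are comparable.
    print(order, f"AIC={res.aic:.1f}", f"BIC={res.bic:.1f}")
```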
Diagnostic Checks (p. 53)
• If the model describes the DGP well (is "well specified"), the error term is white noise, IID.
• Diagnostic checks aim to detect whether the IID assumption holds and the model reflects the DGP well.
• Both visual inspection of graphs and formal tests.

Visual Inspection (p. 54)
• Outlier observations / residuals?
• Does the model work better in some sub-samples and worse in some others?
• Do the residuals indicate structural change?
• Do the residuals seem autocorrelated?
• Seasonal variation in residuals?
• Does the variance of the residuals vary notably over time (i.e. are there signs of volatility clustering, heteroscedasticity)?

Autocorrelation (p. 55)
• Testing for autocorrelation is particularly important: evidence of residual autocorrelation implies that the model does not cater for some systematic process in the time series.
• The model should have non-autocorrelated (i.e. independent) error terms.
• An autocorrelated series is not IID, and the parameter standard errors are not reliable.
• ACF & Ljung-Box Q-test.
• The Durbin-Watson test is not suitable for models that include own lags of the dependent variable (i.e. for ARIMA models).
• If not a "pure" ARMA model, i.e. if the model includes some other stochastic variables as well, also e.g. the LM test (Lagrange Multiplier test, also called the Breusch-Godfrey test).

GDP Growth Example Cont'd (p. 56)
• Residuals: ARMA(1,1) on the left, ARMA(3,0) on the right.
• No significant residual autocorrelation observed.

Residual Normality (p. 57)
• Often an NID error term is assumed: the normal distribution assumption.
• In many cases, especially with high-frequency financial time series, the error term has fat tails (i.e. excess kurtosis, the 4th moment).
• Visual inspection, comparing with the theoretical normal distribution: histogram, Q-Q plot.
• Formal testing: the Jarque-Bera test is (most) often applied. It tests whether both skewness and excess kurtosis are zero. Many other formal tests exist.

GDP Growth Example Cont'd (p. 58)
• ARMA(1,1) on the left, ARMA(3,0) on the right.
• The normality assumption is rejected: an outlier due to the GFC [both in AR(3) and ARMA(1,1)].

Residual Normality (p. 59)
• If the error term is not normally distributed:
• Confidence intervals of parameter estimates, and therefore the p-values, are not reliable.
• There is no single unique way to "act".
• Non-normality does not generally cause notable complications if the number of observations is large.
• If the non-normality is due to e.g. one or two outliers, point dummy variables can be included in the model to control for the corresponding observations. One should be careful not to remove important information from the data, though. The influence of one-time unique shocks, such as 11 Sep 2001, can be removed / controlled.
• Student's t-distribution can be used instead of the normal distribution to compute the confidence intervals: in the t-distribution, the probability of extreme observations is greater (i.e. it captures the possible fat tails, i.e. excess kurtosis, better).
• Non-normality may imply an ARCH effect (heteroscedasticity) in the time series.

GDP Growth Example Cont'd (p. 60)
• Outlier due to the GFC [both in AR(3) and ARMA(1,1)].

GDP Growth Example Cont'd (p. 61)
• Include a point dummy variable taking the value zero except for being 1 in 2009Q1, to capture the outlier due to the GFC.
• Both AIC and SBC now prefer AR(3) (and including the dummy).
• The normality assumption is now accepted.

Residual Homoscedasticity vs. Heteroscedasticity (p. 62)
• The basic assumption is a homoscedastic residual series, i.e. Var(ε_t) = σ² = constant over time.
• In practice, many time series (especially financial time series) are often heteroscedastic, in which case ε_t from the ARMA(p,q) model is heteroscedastic.
• Clustered / autocorrelated volatility: a large error term (in absolute value) is followed by a large error term (in absolute value).
• The time series has a time-dependent conditional variance.
• The long-term (unconditional) variance can nevertheless be constant, i.e. a stationary time series can be heteroscedastic.
• To be discussed more in the GARCH section of the course!

Residual Homoscedasticity vs. Heteroscedasticity (p. 63)
• Due to heteroscedasticity, the estimated parameters' standard errors are not reliable, so the t- and p-values are not reliable.
• Visual inspection: do the residuals seem heteroscedastic?
• Formal test of whether the squared error term, ε_t², is autocorrelated: ARCH test, Q-test.
• Null hypothesis: no autocorrelation in ε_t²; if rejected, evidence of heteroscedasticity.
• If the error term is heteroscedastic, a GARCH model can be estimated to capture it.

GDP Growth Example Cont'd (p. 64)
• The hypothesis of no autocorrelation in the squared residuals (i.e. of homoscedastic residuals) is clearly accepted.
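A sketch of the formal residual checks in Python (statsmodels analogues of the EViews tests named above; the fitted AR(1) on simulated data is only a placeholder for a real model):

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.stats.stattools import jarque_bera

rng = np.random.default_rng(5)
y = np.zeros(250)
for t in range(1, 250):
    y[t] = 0.005 + 0.4 * y[t - 1] + 0.01 * rng.standard_normal()

res = ARIMA(y, order=(1, 0, 0)).fit()
resid = res.resid

print(acorr_ljungbox(resid, lags=[8]))        # H0: no residual autocorrelation
print(acorr_ljungbox(resid**2, lags=[8]))     # H0: no ARCH (homoscedastic)
jb, jb_p, skew, kurt = jarque_bera(resid)
print(f"Jarque-Bera p-value: {jb_p:.3f}")     # H0: normal residuals
```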
Residual Homoscedasticity vs. Heteroscedasticity (p. 65)
• The Newey-West HAC estimator (heteroscedasticity and autocorrelation consistent estimator):
• If autocorrelation and heteroscedasticity cannot be removed from the residuals, the HAC estimator should be used to compute the parameter standard errors reliably ("Newey-West standard errors").
• In pure ARMA models, autocorrelation in ε_t should not be present (as otherwise there is predictability in the modelled series that is not captured by the model), but there are other kinds of time series regressions where ε_t may be autocorrelated.
• White (1980) covariance estimates, too, provide heteroscedasticity-consistent standard errors.

Structural Change (p. 66)
• If the number of observations is large enough, the time series can be divided into two (or more) subsamples, and a similar ARMA model can be estimated separately for each subsample.
• The aim is then to investigate whether the model parameters differ between the subsamples; an F-test can be applied:
RSS = residual sum of squares for the model based on the whole sample period
RSS_1 = RSS for the 1st subsample model
RSS_2 = RSS for the 2nd subsample model
RSS - RSS_1 - RSS_2 is small if there is no structural change in the parameters.
• This is the Chow Breakpoint test in EViews: View - Stability Diagnostics - Chow Breakpoint Test.

GDP Growth Example Cont'd (p. 67)
• The hypothesis of no structural break for the AR(3) model over 1990-2020 is accepted (assumed break approximately in the middle of the sample period, i.e. 2005Q1).
• However, there has clearly been a structural change during the whole sample period of 1960-2020 (so it does not make sense to estimate an ARMA model for this whole sample period).
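A rough Python sketch of the slide-66 F-test (the split point, lag order and parameter count k are illustrative assumptions, and ML-estimated ARMA residuals make this only approximate; EViews' Chow Breakpoint test is the reference implementation):

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

def rss(series, order=(1, 0, 0)):
    """Sum of squared residuals of an ARMA fit."""
    return float(np.sum(ARIMA(series, order=order).fit().resid ** 2))

rng = np.random.default_rng(6)
y = np.zeros(200)
for t in range(1, 200):
    y[t] = 0.01 + 0.4 * y[t - 1] + 0.02 * rng.standard_normal()

k = 2                                    # parameters per model: c and phi_1
T = len(y)
split = T // 2                           # assumed break in mid-sample
rss_full, rss_1, rss_2 = rss(y), rss(y[:split]), rss(y[split:])

# Chow F-statistic: large values indicate a parameter break at the split.
F = ((rss_full - rss_1 - rss_2) / k) / ((rss_1 + rss_2) / (T - 2 * k))
print(f"Chow F-statistic: {F:.2f}")
```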
