Time Series Regression
This document provides an introduction to time series regression. It covers trend modeling using polynomial functions and discusses several common trend forms (no trend, linear trend, and quadratic trend), seasonal modeling with dummy variables and trigonometric terms, growth curve models, and residual diagnostics including the Durbin-Watson test. Worked examples using cod catch, calculator sales, loan request, hotel occupancy, and steakhouse data are included.
TIME SERIES REGRESSION

INTRODUCTION

Time series regression models relate the dependent variable, $y_t$, to functions of time. They are used when the parameters describing the time series to be forecast remain constant over time. For example:

i) If a time series exhibits a linear trend, the slope of the trend line remains constant over time.
ii) If a time series can be described using monthly seasonal parameters, the seasonal parameter for each month remains the same from one year to the next.

MODELING TREND USING A POLYNOMIAL FUNCTION

A time series $y_t$ can be described by a trend model, written as

$$y_t = TR_t + \varepsilon_t$$

where

- $y_t$ = the value of the time series in period $t$
- $TR_t$ = the trend in time period $t$
- $\varepsilon_t$ = the error term in time period $t$

This model says that the time series $y_t$ can be represented by an average level, $\mu_t = TR_t$, plus the error term $\varepsilon_t$. The error term is a random fluctuation describing the deviation of the $y_t$ values from the average level $\mu_t$.

SOME COMMON TRENDS

- No trend: implies there is no long-run growth or decline in the time series over time, $TR_t = \beta_0$.
- Linear trend: implies straight-line long-run growth or decline, $TR_t = \beta_0 + \beta_1 t$.
- Quadratic trend: implies quadratic long-run change (growth or decline at an increasing or decreasing rate), $TR_t = \beta_0 + \beta_1 t + \beta_2 t^2$.

[Figure: example trend curves such as $y_t = 1$, $y_t = 1 + t$, $y_t = 1 - t$, $y_t = 1 + t + t^2$, $y_t = 100t - t^2$, $y_t = 1 + t - t^2$, and $y_t = -100t - t^2$.]

- p-th order polynomial trend: indicates one or more reversals in curvature, $TR_t = \beta_0 + \beta_1 t + \beta_2 t^2 + \cdots + \beta_p t^p$.

The parameters can be estimated using regression techniques (i.e., the least squares method). The error term $\varepsilon_t$ is assumed to satisfy the constant variance, independence, and normality assumptions.

No Trend

EXAMPLE 6.1

The Bay City Seafood Company owns a fleet of fishing trawlers and operates a fish processing plant. In order to forecast its minimum and maximum possible revenues from cod sales and to plan the operations of its fish processing plant, the company wishes to make both point forecasts and prediction interval forecasts of its monthly cod catch (measured in tons). The company has recorded the monthly cod catch for the previous two years (years 1 and 2); the cod catch history is given in Table 6.1. When these data are plotted, they appear to fluctuate randomly around a constant average level. Since the company believes this data pattern will continue in the future, it is reasonable to use the no-trend regression model

$$y_t = TR_t + \varepsilon_t = \beta_0 + \varepsilon_t$$

to forecast the cod catch in future months.
The least squares point estimate of $\beta_0$ is

$$\hat{\beta}_0 = \bar{y} = \frac{y_1 + y_2 + \cdots + y_{24}}{24} = \frac{326 + 381 + \cdots + 365}{24} = 351.29$$

Thus, the point prediction of the cod catch ($y_t$) in any future month is

$$\hat{y}_t = \bar{y} = 351.29$$

and the 100(1-α)% prediction interval for $y_t$ is

$$\bar{y} \pm t_{[\alpha/2]}^{(T-1)} \, s \sqrt{1 + \frac{1}{T}}$$

For a 95% prediction interval,

$$t_{[\alpha/2]}^{(T-1)} = t_{[0.05/2]}^{(24-1)} = t_{[0.025]}^{(23)} = 2.069$$

$$s = \sqrt{\frac{\sum_{t=1}^{T}(y_t - \bar{y})^2}{T-1}} = \sqrt{\frac{(326 - 351.29)^2 + (381 - 351.29)^2 + \cdots + (365 - 351.29)^2}{24-1}} = 33.82$$

$$\bar{y} \pm t_{[\alpha/2]}^{(T-1)} \, s \sqrt{1 + \frac{1}{T}} = 351.29 \pm 2.069\,(33.82)\sqrt{1 + \frac{1}{24}} = [279.88,\ 422.71]$$

The analysis can be reproduced with the fpp3 package. Several lines of the original code were lost in extraction; the reconstruction below marks those spots with comments.

```r
install.packages("remotes")
remotes::install_github("robjhyndman/fpp3-package")
library(fpp3)

# Monthly cod catch (tons) for years 1 and 2; full data in Table 6.1
y <- tsibble(t = 1:24, cod_catch = c(326, 381, ..., 365), index = t)

fit_y <- y %>% model(TSLM(cod_catch ~ 1))   # intercept-only (no-trend) model
report(fit_y)
```

```
Coefficients:
            Estimate Std. Error t value
(Intercept)  351.292      6.904   50.88
```

```r
augment(fit_y) %>%
  ggplot(aes(x = t)) +
  geom_line(aes(y = cod_catch, colour = "Observation")) +
  geom_line(aes(y = .fitted, colour = "Fitted")) +
  xlab("Time (Month)") + ylab(NULL) +
  ggtitle("Cod Catch (in Tons)") +
  guides(colour = guide_legend(title = NULL))

fc_y <- fit_y %>% forecast(h = 4)   # forecast horizon not shown in the transcript
fc_y %>% autoplot(y) +
  ggtitle("Forecast of Cod Catch") +
  xlab("Time (Month)") + ylab(NULL)
```

Linear Trend

EXAMPLE 6.2

For the past two years, Smith's Department Stores, Inc., has carried a new type of electronic calculator called the Bismark X-12. Sales of this calculator have generally been increasing over these two years. Smith's uses an inventory policy that attempts to ensure that its stores will have enough Bismark X-12 calculators to meet practically all demand, while ensuring that the company does not needlessly tie up its money by ordering many more calculators than it can reasonably expect to sell. To implement this inventory policy in future months, Smith's requires both point predictions and prediction intervals for total monthly Bismark X-12 demand. The monthly calculator demand data for the past two years are:

| Month | 2012 | 2013 |
|-------|------|------|
| Jan   | 197  | 296  |
| Feb   | 211  | 276  |
| Mar   | 203  | 305  |
| Apr   | 247  | 308  |
| May   | 239  | 356  |
| Jun   | 269  | 393  |
| Jul   | 308  | 363  |
| Aug   | 262  | 386  |
| Sep   | 258  | 443  |
| Oct   | 256  | 308  |
| Nov   | 261  | 358  |
| Dec   | 288  | 384  |

The time series plot of the data fluctuates randomly around an average level that increases over time in a linear fashion. Smith's believes this trend will continue for at least the next two years, so it is reasonable to use the regression model

$$y_t = TR_t + \varepsilon_t = \beta_0 + \beta_1 t + \varepsilon_t$$

to forecast calculator sales in future months. The least squares point estimates of $\beta_1$ and $\beta_0$ are

$$b_1 = \frac{SS_{ty}}{SS_{tt}} = \frac{\sum_{t=1}^{24} t\,y_t - \dfrac{\left(\sum_{t=1}^{24} t\right)\left(\sum_{t=1}^{24} y_t\right)}{24}}{\sum_{t=1}^{24} t^2 - \dfrac{\left(\sum_{t=1}^{24} t\right)^2}{24}} = 8.07435$$

and

$$b_0 = \bar{y} - b_1 \bar{t} = \frac{\sum_{t=1}^{24} y_t}{24} - b_1 \frac{\sum_{t=1}^{24} t}{24} = 198.02899$$

Thus, point forecasts for future months can be obtained from

$$\hat{y}_t = b_0 + b_1 t = 198.02899 + 8.07435\,t$$

and the prediction interval is

$$\hat{y}_{T+h} \pm t_{[\alpha/2]}^{(T-2)} \, s \sqrt{1 + \frac{1}{T} + \frac{(T + h - \bar{t}\,)^2}{\sum_{t=1}^{T}(t - \bar{t}\,)^2}}$$

For example, the point forecast of Bismark X-12 demand in January of year 3 is

$$\hat{y}_{25} = 198.02899 + 8.07435\,(25) = 399.9$$

and the 95% prediction interval for $y_{25}$ is

$$\hat{y}_{25} \pm t_{[0.05/2]}^{(24-2)} \, s \sqrt{1 + \frac{1}{24} + \frac{(25 - \bar{t}\,)^2}{\sum_{t=1}^{24}(t - \bar{t}\,)^2}} = [328.6,\ 471.2]$$

```r
### Example 2 ###
calc <- tsibble(t = 1:24,
                sales = c(197, 211, 203, 247, 239, 269, 308, 262, 258, 256, 261, 288,
                          296, 276, 305, 308, 356, 393, 363, 386, 443, 308, 358, 384),
                index = t)

autoplot(calc, sales) +
  xlab("Time (Monthly)") + ylab("Sales") +
  autolayer(calc, mean(sales), color = "red")

fit_calc <- calc %>% model(TSLM(sales ~ trend()))
report(fit_calc)
```

```
Series: sales
Model: TSLM

Residuals:
    Min      1Q  Median      3Q     Max
-67.665 -19.227  -6.958  17.682  75.410

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 198.0290    13.3444  14.840 6.10e-13 ***
trend()       8.0743     0.9339   8.646 1.59e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 31.67 on 22 degrees of freedom
Multiple R-squared: 0.7726, Adjusted R-squared: 0.7623
F-statistic: 74.75 on 1 and 22 DF, p-value: 1.5893e-08
```

```r
augment(fit_calc) %>%
  ggplot(aes(x = t)) +
  geom_line(aes(y = sales, colour = "Observation")) +
  geom_line(aes(y = .fitted, colour = "Fitted")) +
  xlab("Time (Monthly)") + ylab(NULL) +
  ggtitle("Calculator Sales") +
  guides(colour = guide_legend(title = NULL))

fc_calc <- fit_calc %>% forecast(h = 4)   # forecast horizon not shown in the transcript
fc_calc %>% autoplot(calc) +
  ggtitle("Forecast of Calculator Sales") +
  xlab("Time (Month)") + ylab(NULL)
```
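As a sanity check on the prediction interval formula above, here is a minimal base-R sketch (not part of the original notes; the notes use fpp3) that reproduces the Example 6.2 point forecast and 95% prediction interval at t = 25, both via `predict()` and directly from the formula:

```r
# Base-R check of the Example 6.2 forecast and prediction interval (a sketch,
# not part of the original notes).
sales <- c(197, 211, 203, 247, 239, 269, 308, 262, 258, 256, 261, 288,
           296, 276, 305, 308, 356, 393, 363, 386, 443, 308, 358, 384)
t <- 1:24
fit <- lm(sales ~ t)                       # linear trend, k = 1

# Built-in prediction interval: fit ~ 399.9, lwr ~ 328.6, upr ~ 471.2
predict(fit, newdata = data.frame(t = 25), interval = "prediction", level = 0.95)

# The same interval computed from the formula in the notes
s      <- summary(fit)$sigma               # residual standard error, ~ 31.67
tbar   <- mean(t)
yhat25 <- sum(coef(fit) * c(1, 25))        # b0 + b1 * 25
half   <- qt(0.975, df = 24 - 2) * s *
  sqrt(1 + 1/24 + (25 - tbar)^2 / sum((t - tbar)^2))
c(yhat25 - half, yhat25 + half)            # ~ [328.6, 471.2]
```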
Quadratic Trend

EXAMPLE

The State University Credit Union wants to predict monthly loan requests in future months. The loan request data for the past two years are:

| Month | Year 1 | Year 2 |
|-------|--------|--------|
| Jan   | 297    | 808    |
| Feb   | 249    | 809    |
| Mar   | 340    | 867    |
| Apr   | 406    | 855    |
| May   | 464    | 965    |
| Jun   | 481    | 921    |
| Jul   | 549    | 956    |
| Aug   | 553    | 990    |
| Sep   | 556    | 1019   |
| Oct   | 642    | 1021   |
| Nov   | 670    | 1033   |
| Dec   | 712    | 1127   |

The plot suggests that loan requests for the past two years tend to fluctuate around an average level that increases at a decreasing rate over time. Therefore, the regression model

$$y_t = TR_t + \varepsilon_t = \beta_0 + \beta_1 t + \beta_2 t^2 + \varepsilon_t$$

is reasonable. The code that creates the data and fits the model was garbled in the transcript; a plausible reconstruction is:

```r
loan <- tsibble(t = 1:24, t2 = (1:24)^2,
                req = c(297, 249, 340, 406, 464, 481, 549, 553, 556, 642, 670, 712,
                        808, 809, 867, 855, 965, 921, 956, 990, 1019, 1021, 1033, 1127),
                index = t)
fit.loan <- loan %>% model(TSLM(req ~ t + t2))
report(fit.loan)
```

```
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 199.6196    20.8480    9.575 4.12e-09 ***
t            50.9366     3.8424   13.256 1.14e-11 ***
t2           -0.5677     0.1492   -3.805  0.00104 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 31.25 on 21 degrees of freedom
Multiple R-squared: 0.9871, Adjusted R-squared: 0.9859
F-statistic: 802.3 on 2 and 21 DF, p-value: < 2.22e-16
```

The point forecast of future loan requests $y_t$ is

$$\hat{y}_t = b_0 + b_1 t + b_2 t^2 = 199.6196 + 50.9366\,t - 0.5677\,t^2$$

It can be shown that the 95% prediction intervals for $y_{25}$ and $y_{26}$ are [1040.09, 1196.32] and [1057.70, 1222.68].

```r
augment(fit.loan) %>%
  ggplot(aes(x = t)) +
  geom_line(aes(y = req, colour = "Data")) +
  geom_line(aes(y = .fitted, colour = "Fitted")) +
  xlab("Time (Monthly)") + ylab(NULL) +
  ggtitle("Monthly Loan Request") +
  guides(colour = guide_legend(title = NULL))

future_loan <- new_data(loan, 6) %>% mutate(t = 25:30, t2 = (25:30)^2)
fc.loan <- fit.loan %>% forecast(new_data = future_loan)
loan %>% autoplot(req) + autolayer(fc.loan) +
  ggtitle("Forecast of Loan Request") +
  xlab("Time (Month)") + ylab(NULL)
```

Model Evaluations

The validity of the regression model requires the independence assumption to be satisfied. For time series data, this assumption is frequently violated: it is quite common for the time-ordered error terms to be autocorrelated.

- Positive autocorrelation: a positive/negative error term in time period t tends to be followed by another positive/negative error term in time period t+k.
- Negative autocorrelation: a positive/negative error term in time period t tends to be followed by a negative/positive error term in time period t+k.

[Figure: residual patterns illustrating positive and negative autocorrelation.]

The independence assumption says that the time-ordered error terms display no positive or negative autocorrelation, i.e., that the error terms occur in a random pattern over time.

Example: the residual plot of the loan request example.

```r
fit.loan %>% gg_tsresiduals()
```

The plot does not exhibit a well-defined cyclical pattern or a well-defined alternating pattern. Therefore we conclude that little or no autocorrelation exists and that the independence assumption holds.

The existence of positive or negative autocorrelation violates the independence assumption. A plot of the residuals against time can be used to detect such violations; another way is to look at the signs of the time-ordered residuals. However, the best approach is a formal statistical test: the Durbin-Watson test.
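To make the residual check concrete, here is a minimal base-R sketch (not part of the original notes) that refits the quadratic loan model, plots the time-ordered residuals, inspects their signs, and verifies the quoted prediction intervals:

```r
# Base-R sketch (not from the original notes): quadratic loan model diagnostics.
req <- c(297, 249, 340, 406, 464, 481, 549, 553, 556, 642, 670, 712,
         808, 809, 867, 855, 965, 921, 956, 990, 1019, 1021, 1033, 1127)
t <- 1:24
fit <- lm(req ~ t + I(t^2))            # quadratic trend, k = 2

e <- resid(fit)
plot(t, e, type = "b", xlab = "Time (Month)", ylab = "Residual")
abline(h = 0, lty = 2)

# Sign pattern of the time-ordered residuals: long runs of the same sign
# suggest positive autocorrelation; rapid alternation suggests negative.
sign(e)

# 95% prediction intervals for months 25 and 26; compare the quoted
# [1040.09, 1196.32] and [1057.70, 1222.68].
predict(fit, newdata = data.frame(t = 25:26), interval = "prediction")
```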
FIRST-ORDER AUTOCORRELATION

One type of positive or negative autocorrelation is called first-order autocorrelation: the error term in time period t is related to the error term in time period t-1, which can be written as

$$\varepsilon_t = \varphi\, \varepsilon_{t-1} + a_t$$

where $\varphi$ is the correlation coefficient between error terms separated by one time period, and $a_1, a_2, \ldots, a_n$ are values randomly and independently selected from a normal distribution with mean 0 and a variance independent of time. For this type of autocorrelation, the Durbin-Watson test can be used to test for positive or negative first-order autocorrelation.

DURBIN-WATSON TEST

Case 1: Positive Autocorrelation (first order)

Hypotheses:
- H0: the error terms are not autocorrelated
- H1: the error terms are positively autocorrelated

The Durbin-Watson statistic is

$$d = \frac{\sum_{t=2}^{T}(e_t - e_{t-1})^2}{\sum_{t=1}^{T} e_t^2}$$

where $e_1, e_2, \ldots, e_T$ are the time-ordered residuals. For a type I error rate α, Durbin and Watson determined points $d_{L,\alpha}$ and $d_{U,\alpha}$ such that:

i) If $d < d_{L,\alpha}$, reject H0.
ii) If $d > d_{U,\alpha}$, do not reject H0.
iii) If $d_{L,\alpha} \le d \le d_{U,\alpha}$, the test is inconclusive.

(Refer to Tables A5 and A6 for values of $d_{L,\alpha}$ and $d_{U,\alpha}$.)

Case 2: Negative Autocorrelation (first order)

Hypotheses:
- H0: the error terms are not autocorrelated
- H1: the error terms are negatively autocorrelated

For a type I error rate α:

i) If $(4 - d) < d_{L,\alpha}$, reject H0.
ii) If $(4 - d) > d_{U,\alpha}$, do not reject H0.
iii) If $d_{L,\alpha} \le (4 - d) \le d_{U,\alpha}$, the test is inconclusive.

Case 3: Positive or Negative Autocorrelation (first order)

Hypotheses:
- H0: the error terms are not autocorrelated
- H1: the error terms are positively or negatively autocorrelated

For a type I error rate α:

i) If $d < d_{L,\alpha}$ or $(4 - d) < d_{L,\alpha}$, reject H0.
ii) If $d > d_{U,\alpha}$ and $(4 - d) > d_{U,\alpha}$, do not reject H0.
iii) If $d_{L,\alpha} \le d \le d_{U,\alpha}$ or $d_{L,\alpha} \le (4 - d) \le d_{U,\alpha}$, the test is inconclusive.

EXAMPLE 6.5

Consider the calculator sales analysis of Example 6.2. The linear trend model of this example yields the prediction equation

$$\hat{y}_t = 198.02899 + 8.07435\,t$$

The residuals obtained using this model are

$$e_t = y_t - \hat{y}_t = y_t - (198.02899 + 8.07435\,t)$$

Thus, we have

$$e_1 = y_1 - \hat{y}_1 = 197 - (198.02899 + 8.07435(1)) = -9.1033$$
$$e_2 = y_2 - \hat{y}_2 = 211 - (198.02899 + 8.07435(2)) = -3.1777$$
$$\vdots$$
$$e_{23} = y_{23} - \hat{y}_{23} = 358 - (198.02899 + 8.07435(23)) = -25.7390$$
$$e_{24} = y_{24} - \hat{y}_{24} = 384 - (198.02899 + 8.07435(24)) = -7.8133$$

We test

- H0: the error terms are not autocorrelated, against
- H1: the error terms are positively autocorrelated.

The Durbin-Watson statistic is

$$d = \frac{\sum_{t=2}^{24}(e_t - e_{t-1})^2}{\sum_{t=1}^{24} e_t^2} = \frac{(-3.1777 - (-9.1033))^2 + (-19.2520 - (-3.1777))^2 + \cdots + (-7.8133 - (-25.7390))^2}{(-9.1033)^2 + (-3.1777)^2 + \cdots + (-7.8133)^2} = 1.682$$

To test for positive autocorrelation at α = 0.05, Table A5 gives $d_{L,0.05} = 1.27$ and $d_{U,0.05} = 1.45$. Since $d = 1.682 > d_{U,0.05} = 1.45$, we do not reject H0: there is no evidence of positive autocorrelation.
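The Durbin-Watson statistic is straightforward to compute directly. Here is a minimal base-R sketch (not part of the original notes) that reproduces the d value of Example 6.5:

```r
# Base-R sketch (not from the original notes): Durbin-Watson statistic for the
# Example 6.2 linear trend model.
sales <- c(197, 211, 203, 247, 239, 269, 308, 262, 258, 256, 261, 288,
           296, 276, 305, 308, 356, 393, 363, 386, 443, 308, 358, 384)
t <- 1:24
e <- resid(lm(sales ~ t))          # time-ordered residuals

d <- sum(diff(e)^2) / sum(e^2)     # Durbin-Watson statistic
d                                  # ~ 1.68, matching the worked example

# The lmtest package also provides dwtest() for a formal test with a p-value.
```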
Critical Values for the Durbin-Watson Statistic (d), Level of Significance α = .05

| n  | k=1 dL | k=1 dU | k=2 dL | k=2 dU | k=3 dL | k=3 dU | k=4 dL | k=4 dU | k=5 dL | k=5 dU |
|----|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|
| 6  | 0.61 | 1.40 |      |      |      |      |      |      |      |      |
| 7  | 0.70 | 1.36 | 0.47 | 1.90 |      |      |      |      |      |      |
| 8  | 0.76 | 1.33 | 0.56 | 1.78 | 0.37 | 2.29 |      |      |      |      |
| 9  | 0.82 | 1.32 | 0.63 | 1.70 | 0.46 | 2.13 | 0.30 | 2.59 |      |      |
| 10 | 0.88 | 1.32 | 0.70 | 1.64 | 0.53 | 2.02 | 0.38 | 2.41 | 0.24 | 2.82 |
| 11 | 0.93 | 1.32 | 0.76 | 1.60 | 0.60 | 1.93 | 0.44 | 2.28 | 0.32 | 2.65 |
| 12 | 0.97 | 1.33 | 0.81 | 1.58 | 0.66 | 1.86 | 0.51 | 2.18 | 0.38 | 2.51 |
| 13 | 1.01 | 1.34 | 0.86 | 1.56 | 0.72 | 1.82 | 0.57 | 2.09 | 0.45 | 2.39 |
| 14 | 1.05 | 1.35 | 0.91 | 1.55 | 0.77 | 1.78 | 0.63 | 2.03 | 0.51 | 2.30 |
| 15 | 1.08 | 1.36 | 0.95 | 1.54 | 0.82 | 1.75 | 0.69 | 1.97 | 0.56 | 2.21 |
| 16 | 1.10 | 1.37 | 0.98 | 1.54 | 0.86 | 1.73 | 0.74 | 1.93 | 0.62 | 2.15 |
| 17 | 1.13 | 1.38 | 1.02 | 1.54 | 0.90 | 1.71 | 0.78 | 1.90 | 0.67 | 2.10 |
| 18 | 1.16 | 1.39 | 1.05 | 1.53 | 0.93 | 1.69 | 0.82 | 1.87 | 0.71 | 2.06 |
| 19 | 1.18 | 1.40 | 1.08 | 1.53 | 0.97 | 1.68 | 0.86 | 1.85 | 0.75 | 2.02 |
| 20 | 1.20 | 1.41 | 1.10 | 1.54 | 1.00 | 1.68 | 0.90 | 1.83 | 0.79 | 1.99 |
| 21 | 1.22 | 1.42 | 1.13 | 1.54 | 1.03 | 1.67 | 0.93 | 1.81 | 0.83 | 1.96 |
| 22 | 1.24 | 1.43 | 1.15 | 1.54 | 1.05 | 1.66 | 0.96 | 1.80 | 0.86 | 1.94 |
| 23 | 1.26 | 1.44 | 1.17 | 1.54 | 1.08 | 1.66 | 0.99 | 1.79 | 0.90 | 1.92 |
| 24 | 1.27 | 1.45 | 1.19 | 1.55 | 1.10 | 1.66 | 1.01 | 1.78 | 0.93 | 1.90 |
| 25 | 1.29 | 1.45 | 1.21 | 1.55 | 1.12 | 1.66 | 1.04 | 1.77 | 0.95 | 1.89 |
| 26 | 1.30 | 1.46 | 1.22 | 1.55 | 1.14 | 1.65 | 1.06 | 1.76 | 0.98 | 1.88 |
| 27 | 1.32 | 1.47 | 1.24 | 1.56 | 1.16 | 1.65 | 1.08 | 1.76 | 1.01 | 1.86 |
| 28 | 1.33 | 1.48 | 1.26 | 1.56 | 1.18 | 1.65 | 1.10 | 1.75 | 1.03 | 1.85 |
| 29 | 1.34 | 1.48 | 1.27 | 1.56 | 1.20 | 1.65 | 1.12 | 1.74 | 1.05 | 1.84 |
| 30 | 1.35 | 1.49 | 1.28 | 1.57 | 1.21 | 1.65 | 1.14 | 1.74 | 1.07 | 1.83 |
| 31 | 1.36 | 1.50 | 1.30 | 1.57 | 1.23 | 1.65 | 1.16 | 1.74 | 1.09 | 1.83 |
| 32 | 1.37 | 1.50 | 1.31 | 1.57 | 1.24 | 1.65 | 1.18 | 1.73 | 1.11 | 1.82 |
| 33 | 1.38 | 1.51 | 1.32 | 1.58 | 1.26 | 1.65 | 1.19 | 1.73 | 1.13 | 1.81 |

Residual vs Fitted Plots

The plot of residuals against fitted values should also show no pattern; otherwise there may be heteroscedasticity in the errors, meaning the variance of the residuals is not constant. If this occurs, a transformation of the forecast variable (e.g., log or square root) may be required.

```r
augment(fit.loan) %>%
  ggplot(aes(x = .fitted, y = .resid)) +
  geom_point() +
  labs(x = "Fitted", y = "Residuals")
```

TYPES OF SEASONAL VARIATION

Some time series exhibit seasonal variation. There are two types:

i) Constant seasonal variation: the magnitude of the seasonal swing does not depend on the level of the time series.
ii) Increasing seasonal variation: the magnitude of the seasonal swing depends on the level of the time series.

[Figure: example series illustrating constant and increasing seasonal variation.]

TRANSFORMATION OF THE DATA

When a time series displays increasing seasonal variation, it is common practice to apply a transformation of the form $y_t^* = y_t^{\lambda}$, with $0 < \lambda < 1$, to produce a time series with constant seasonal variation. For example:

i) Square root transformation: $y_t^* = y_t^{1/2}$
ii) Quartic root transformation: $y_t^* = y_t^{0.25}$
iii) Natural logarithm transformation: $y_t^* = \ln y_t$

Example: Traveler's Rest, Inc. wants to obtain short-term forecasts of the number of occupied rooms in its hotels. The data are the monthly averages of occupied rooms for the previous 15 years.

[Figure: the original series Y and its square root (Y_SQRT), quartic root (Y_QUARTIC), and log (Y_LOG) transformations.]

The plots suggest that the square root transformation is not strong enough to equalize the seasonal variation, while the quartic root and the log transformations each produce a transformed series with approximately constant seasonal variation.

Caution: the log transformation may be over-transforming the data (note the slight funneling appearance in the plot after about t = 120).
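The effect of these power transformations is easy to see on simulated data. Here is an illustrative base-R sketch (not part of the original notes; the series is synthetic, not the Traveler's Rest data) that generates a series with increasing seasonal variation and compares the candidate transformations side by side:

```r
# Illustrative sketch (synthetic data, not the hotel data from the notes):
# a series whose seasonal swing grows with its level, and its transforms.
set.seed(1)
t     <- 1:180                                  # 15 years of monthly data
level <- 400 + 4 * t                            # rising average level
y     <- level * (1 + 0.15 * sin(2 * pi * t / 12)) + rnorm(180, sd = 10)

old <- par(mfrow = c(2, 2), mar = c(4, 4, 2, 1))
plot(t, y,       type = "l", main = "Y")        # swing grows with the level
plot(t, sqrt(y), type = "l", main = "Y_SQRT")   # lambda = 1/2
plot(t, y^0.25,  type = "l", main = "Y_QUARTIC")# lambda = 1/4
plot(t, log(y),  type = "l", main = "Y_LOG")    # log transformation
par(old)
```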
MODELING SEASONAL VARIATION BY USING DUMMY VARIABLES

To analyze a time series with constant seasonal variation, a common model is

$$y_t = TR_t + SN_t + \varepsilon_t$$

where

- $y_t$ = observed time series value at time period t
- $TR_t$ = trend at time period t
- $SN_t$ = seasonal factor at time period t
- $\varepsilon_t$ = error term at time period t (assumed to satisfy the constant variance, independence, and normality assumptions)

To model seasonal patterns, we use dummy variables. Assuming there are L seasons, the seasonal factor can be expressed using dummy variables as

$$SN_t = \beta_{s1} x_{s1,t} + \beta_{s2} x_{s2,t} + \cdots + \beta_{s(L-1)} x_{s(L-1),t}$$

where $x_{s1,t}, x_{s2,t}, \ldots, x_{s(L-1),t}$ are dummy variables:

$$x_{s1,t} = \begin{cases} 1 & \text{if time period } t \text{ is season } 1 \\ 0 & \text{otherwise} \end{cases} \qquad
x_{s2,t} = \begin{cases} 1 & \text{if time period } t \text{ is season } 2 \\ 0 & \text{otherwise} \end{cases} \qquad \cdots \qquad
x_{s(L-1),t} = \begin{cases} 1 & \text{if time period } t \text{ is season } L-1 \\ 0 & \text{otherwise} \end{cases}$$

EXAMPLE 6.6

Data for the Traveler's Rest, Inc. hotel in a midwestern city. The plot of the data shows that the amount of seasonal variation increases with the level of the time series, so the log transformation has been applied to obtain a transformed series that displays constant seasonal variation ($y_t^* = \ln y_t$).

Using the log transformation, we consider the model

$$y_t^* = TR_t + SN_t + \varepsilon_t = \beta_0 + \beta_1 t + \beta_2 M_1 + \cdots + \beta_{12} M_{11} + \varepsilon_t$$

where $y_t^* = \ln y_t$ and $M_1$ through $M_{11}$ are dummy variables, for example

$$M_1 = \begin{cases} 1 & \text{if period } t \text{ is January} \\ 0 & \text{otherwise} \end{cases}$$

The code that builds the hotel tsibble and fits this model was lost in extraction; the commented lines below are a plausible reconstruction.

```r
library(fpp3)
# hotel <- tsibble(t = 1:168, y = ..., index = t) %>%   # 14 years of monthly data
#   mutate(ln_y = log(y))
# fit_lnhotel <- hotel %>% model(TSLM(ln_y ~ trend() + season(period = 12)))

augment(fit_lnhotel) %>%
  ggplot(aes(x = t)) +
  geom_line(aes(y = ln_y, colour = "Observation")) +
  geom_line(aes(y = .fitted, colour = "Fitted")) +
  xlab("Time (Month)") + ylab("ln(Room Averages)") +
  ggtitle("Hotel Room Averages") +
  guides(colour = guide_legend(title = NULL))
```

The fitted equation is

$$\hat{y}_t^* = b_0 + b_1 t + b_2 M_1 + b_3 M_2 + \cdots + b_{12} M_{11} = 6.288 + 0.002725\,t - 0.04161\,M_1 - 0.1121\,M_2 + \cdots - 0.1122\,M_{11}$$

The output shows that the model is significant (the p-value of the F-test is 0.0000) and that the linear trend and each of the seasonal dummy variables are significant.

For example, a point forecast of $y_{169}^*$ (January of the next year) is

$$\hat{y}_{169}^* = b_0 + b_1(169) + b_2(1) = 6.7065$$

Therefore, a point prediction of $y_{169}$ is

$$\hat{y}_{169} = e^{6.7065} = 817.70$$

The confidence interval at time t is

$$\hat{y}_t \pm t_{[\alpha/2]}^{(T-(k+1))}\, s_e \sqrt{x_t^{\mathsf{T}} (X^{\mathsf{T}} X)^{-1} x_t}$$

where k is the number of independent variables and the standard error is

$$s_e = \sqrt{\frac{\sum_{t=1}^{T}(y_t - \hat{y}_t)^2}{T - (k+1)}}$$

The prediction interval at time T+h is

$$\hat{y}_{T+h} \pm t_{[\alpha/2]}^{(T-(k+1))}\, s_e \sqrt{1 + x_{T+h}^{\mathsf{T}} (X^{\mathsf{T}} X)^{-1} x_{T+h}}$$

where h is the time step and T is the length of the time series. For example, for a time series regression with:

- a linear trend, k = 1;
- a quadratic trend, k = 2;
- a linear trend and a seasonal component with seasonal period L, k = 1 + (L - 1).

For $\hat{y}_{169}^* = 6.7065$ (with T = 168, k = 12, and $s_e = 0.02119$), the interval is

$$6.7065 \pm t_{[0.025]}^{(168-(12+1))}\,(0.02119)\sqrt{1 + x_{T+h}^{\mathsf{T}}(X^{\mathsf{T}} X)^{-1} x_{T+h}}, \qquad t_{[0.025]}^{(155)} = 1.975387$$

where $x_{T+h}$ is the regressor vector for period 169 (January, so $M_1 = 1$),

$$x_{T+h}^{\mathsf{T}} = \begin{bmatrix} 1 & 169 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

and X is the 168 × 13 design matrix whose t-th row is $[\,1 \;\; t \;\; M_1 \;\; \cdots \;\; M_{11}\,]$ evaluated at period t:

$$X = \begin{bmatrix} 1 & 1 & 1 & 0 & \cdots & 0 \\ 1 & 2 & 0 & 1 & \cdots & 0 \\ 1 & 3 & 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots & & \vdots \\ 1 & 168 & 0 & 0 & \cdots & 0 \end{bmatrix}$$

The 95% prediction interval for $y_{169}^*$ is [6.6628, 6.7503]. It follows that the 95% prediction interval for $y_{169}$ is $[e^{6.6628}, e^{6.7503}] = [782.74, 854.32]$.

```r
fc_lnhotel <- fit_lnhotel %>% forecast(h = 12)   # horizon not shown in the transcript
fc_lnhotel %>% autoplot(ln_hotel) +
  ggtitle("Forecast of Hotel Room Averages") +
  xlab("Time (Month)") + ylab(NULL)
```

However, it can be shown that the Durbin-Watson test indicates an autocorrelation problem for this model. The residual diagnostics can be inspected with:

```r
fit_lnhotel %>% gg_tsresiduals()
```
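The matrix expressions above can be evaluated numerically. Since the hotel data are not listed in the notes, here is a minimal base-R sketch (not part of the original notes) that applies the same $x_{T+h}^{\mathsf{T}}(X^{\mathsf{T}}X)^{-1}x_{T+h}$ recipe to the Example 6.2 linear trend model, where k = 1; the seasonal dummy case only widens the design matrix:

```r
# Sketch (not from the original notes): matrix form of the prediction interval,
# demonstrated on the Example 6.2 data (k = 1).
sales <- c(197, 211, 203, 247, 239, 269, 308, 262, 258, 256, 261, 288,
           296, 276, 305, 308, 356, 393, 363, 386, 443, 308, 358, 384)
n <- 24   # "T" in the notes
k <- 1

X  <- cbind(1, 1:n)                        # design matrix: intercept and t
b  <- solve(t(X) %*% X, t(X) %*% sales)    # least squares estimates
e  <- sales - X %*% b
se <- sqrt(sum(e^2) / (n - (k + 1)))       # standard error of the regression

x_new <- c(1, 25)                          # x_{T+h} for h = 1
half  <- qt(0.975, n - (k + 1)) * se *
  sqrt(1 + t(x_new) %*% solve(t(X) %*% X) %*% x_new)
c(x_new %*% b - half, x_new %*% b + half)  # ~ [328.6, 471.2], as before
```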
Several transformed models can be fitted and compared at once:

```r
fit_trends <- hotel %>%
  model(
    linear      = TSLM(y ~ trend() + season(period = 12)),
    exponential = TSLM(log(y) ~ trend() + season(period = 12)),
    squareroot  = TSLM(sqrt(y) ~ trend() + season(period = 12)),
    quarticroot = TSLM(y^(1/4) ~ trend() + season(period = 12))
  )
fc_trends <- fit_trends %>% forecast(h = 10)

hotel %>%
  autoplot(y) +
  geom_line(aes(y = .fitted, colour = .model), data = fitted(fit_trends)) +
  autolayer(fc_trends, alpha = 0.5, level = 95) +
  xlab("Time (Monthly)") + ylab("Room Averages") +
  ggtitle("Monthly Hotel Room Averages") +
  guides(colour = guide_legend(title = "Model"))

report(select(fit_trends, linear))
```

MODELING THE SEASONAL VARIATION BY USING A TRIGONOMETRIC FUNCTION

Regression models that involve trigonometric terms can be used to model time series exhibiting constant or increasing seasonal variation. Two common models are

$$y_t = \beta_0 + \beta_1 t + \beta_2 \sin\frac{2\pi t}{L} + \beta_3 \cos\frac{2\pi t}{L} + \varepsilon_t \qquad (1)$$

$$y_t = \beta_0 + \beta_1 t + \beta_2 \sin\frac{2\pi t}{L} + \beta_3 \cos\frac{2\pi t}{L} + \beta_4 \sin\frac{4\pi t}{L} + \beta_5 \cos\frac{4\pi t}{L} + \varepsilon_t \qquad (2)$$

where L is the number of seasons in a year. These models assume a linear trend, but they can be altered to handle other trends. Model (1) is useful for modeling a very regular seasonal time series that exhibits constant seasonal variation. Model (2) is useful for modeling a time series that exhibits constant seasonal variation but a more complicated seasonal pattern.

Example: for the hotel example, we consider the trigonometric regression model

$$y_t^* = \beta_0 + \beta_1 t + \beta_2 \sin\frac{2\pi t}{12} + \beta_3 \cos\frac{2\pi t}{12} + \beta_4 \sin\frac{4\pi t}{12} + \beta_5 \cos\frac{4\pi t}{12} + \varepsilon_t$$

where $y_t^* = \ln y_t$. In the fourier() helper, the maximum allowable K is period/2.

```r
fit_fourierhotel <- hotel %>%
  model(TSLM(log(y) ~ trend() + fourier(period = 12, K = 2)))
report(fit_fourierhotel)

augment(fit_fourierhotel) %>%
  ggplot(aes(x = t)) +
  geom_line(aes(y = y, colour = "Observation")) +
  geom_line(aes(y = .fitted, colour = "Fitted")) +
  xlab("Time (Month)") + ylab("Room Averages") +
  ggtitle("Hotel Room Averages") +
  guides(colour = guide_legend(title = NULL))
```

Trigonometric time series models sometimes give useful predictions. However, the authors generally consider dummy variable regression superior to trigonometric models for modeling seasonal variation, because dummy variable models use a different parameter to model the effect of each season of the year.

For this model, the value of the Durbin-Watson statistic is d = 2.62. At the .05 level, clearly $d > d_{U,.05}$ (since $1.80 < d_{U,.05} < 1.82$). Therefore, we cannot reject H0 (the error terms are not autocorrelated) and conclude that there is no evidence of positive or negative (first-order) autocorrelation among the errors.
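For intuition, `fourier(period = 12, K = 2)` is equivalent to adding the four sine/cosine columns of model (2) by hand. Here is an illustrative base-R sketch (not part of the original notes; the series is synthetic, chosen only to resemble a hotel-like pattern):

```r
# Illustrative sketch (synthetic data, not from the notes): trigonometric
# seasonal terms built by hand and fitted with lm().
set.seed(2)
t <- 1:168
y <- exp(6 + 0.0027 * t +
         0.10 * sin(2 * pi * t / 12) - 0.20 * cos(2 * pi * t / 12) +
         rnorm(168, sd = 0.03))            # exponential trend + monthly cycle

fit_trig <- lm(log(y) ~ t +
                 sin(2 * pi * t / 12) + cos(2 * pi * t / 12) +   # K = 1 pair
                 sin(4 * pi * t / 12) + cos(4 * pi * t / 12))    # K = 2 pair
summary(fit_trig)
```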
GROWTH CURVE MODEL

All of the models presented so far describe trend and seasonal effects using deterministic functions of time that are linear in the parameters $\beta_0, \beta_1, \ldots, \beta_{12}$, e.g.

$$y_t = TR_t + SN_t + \varepsilon_t = \beta_0 + \beta_1 t + \beta_2 M_1 + \beta_3 M_2 + \cdots + \beta_{12} M_{11} + \varepsilon_t$$

This means that each term in the model is the product of a parameter and a numeric value determined by the observed data. However, some models are not linear in the parameters. The growth curve model,

$$y_t = \beta_0 (\beta_1^t)\, \varepsilon_t$$

is one common model that is not linear in the parameters, since the independent variable t enters as an exponent and $\beta_0$ is multiplied by $\beta_1^t$ (see Figure 6.21). To apply the techniques of estimation and prediction, the model must be transformed to a linear model. This can be done by taking logarithms on both sides:

$$\log(y_t) = \log(\beta_0) + \log(\beta_1)\, t + \log(\varepsilon_t)$$

Letting $\alpha_0 = \log(\beta_0)$, $\alpha_1 = \log(\beta_1)$, and $u_t = \log(\varepsilon_t)$, the transformed model becomes

$$\log(y_t) = \alpha_0 + \alpha_1 t + u_t$$

which is linear in the parameters $\alpha_0$ and $\alpha_1$.

EXAMPLE 6.9

Western Steakhouse, a fast-food chain, opened 15 years ago. In each year since, the number of steakhouses in operation, $y_t$, has been recorded. An analyst for the firm wishes to use these data to predict the number of steakhouses that will be in operation next year. The steakhouse data ($y_t$), plotted against time (t) in Figure 6.22, show an exponential increase (a growth model with $\beta_1 > 1$). This suggests the model

$$y_t = \beta_0 (\beta_1^t)\, \varepsilon_t$$

Transforming this model using natural logarithms, we have

$$\ln(y_t) = \alpha_0 + \alpha_1 t + u_t$$

| Year (t) | $y_t$ | $\ln y_t$ |
|----------|-------|-----------|
| 1  | 11  | 2.398 |
| 2  | 14  | 2.639 |
| 3  | 16  | 2.773 |
| 4  | 22  | 3.091 |
| 5  | 28  | 3.332 |
| 6  | 36  | 3.584 |
| 7  | 46  | 3.829 |
| 8  | 67  | 4.205 |
| 9  | 82  | 4.407 |
| 10 | 99  | 4.595 |
| 11 | 119 | 4.779 |
| 12 | 156 | 5.050 |
| 13 | 257 | 5.549 |
| 14 | 284 | 5.649 |
| 15 | 403 | 5.999 |

The least squares point estimates of $\alpha_0$ and $\alpha_1$ are

$$\hat{\alpha}_0 = 2.07012 \qquad \text{and} \qquad \hat{\alpha}_1 = 0.25688$$

Since $\alpha_0 = \log(\beta_0)$ and $\alpha_1 = \log(\beta_1)$, we have

$$\hat{\beta}_0 = e^{\hat{\alpha}_0} = e^{2.07012} = 7.92577 \qquad \hat{\beta}_1 = e^{\hat{\alpha}_1} = e^{0.25688} = 1.293$$

Thus, our estimated growth model is

$$\hat{y}_t = 7.92577\,(1.293^t)$$

The model can be rewritten to expose the growth rate:

$$y_t = \beta_0 (\beta_1^t)\,\varepsilon_t = \left[\beta_0 (\beta_1^{t-1})\right] \beta_1 \varepsilon_t \approx y_{t-1}\, \beta_1\, \varepsilon_t$$

so we estimate $y_t$ to be approximately $\hat{\beta}_1 = 1.293$ times $y_{t-1}$. Thus, $100(\hat{\beta}_1 - 1)\% = 100(1.293 - 1)\% = 29.3\%$ is the point estimate of the growth rate.

The point prediction of $y_{16}$ is obtained from

$$\ln(\hat{y}_{16}) = \hat{\alpha}_0 + \hat{\alpha}_1(16) = 2.07012 + 0.25688(16) = 6.180$$

thus

$$\hat{y}_{16} = e^{6.180} = 483.09$$

Using the same formula as in linear time series regression, the prediction interval for $\ln(y_{16})$ is [5.9945, 6.3659]. Thus, the prediction interval for $y_{16}$ is

$$[e^{5.9945}, e^{6.3659}] = [401.22, 581.67]$$

As a final step, we determine whether the errors are autocorrelated using the Durbin-Watson test. The Durbin-Watson statistic is $d = 1.87 > d_{U,.05}$. Thus, we do not reject the null hypothesis H0 (the error terms are not autocorrelated) and conclude that there is no evidence that the error terms display first-order autocorrelation.

Selecting Predictors

- Adjusted R²: choose the model with the higher adjusted R² value. (Note, however, that $\bar{R}^2$ is less suitable for forecasting.)
- MSE: select the model with the lower MSE. (It is recommended to use cross-validation, where the MSE that matters is the one calculated on the testing data.)
- Akaike's Information Criterion (AIC): choose the model with the minimum AIC value.
- AICc: AIC corrected for small sample sizes,
  $$AICc = AIC + \frac{2k^2 + 2k}{T - k - 1}$$
- BIC: as with AIC, the best model is the one with the lowest BIC value.

Many statisticians like to use the BIC because it has the feature that, if there is a true underlying model, the BIC will select that model given enough data. However, in reality there is rarely, if ever, a true underlying model. It is therefore recommended to use AICc, AIC, or MSE (with cross-validation), since their objective is forecasting accuracy.
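To tie the last two sections together, here is a minimal base-R sketch (not part of the original notes) that fits the Example 6.9 growth model by regressing ln(y) on t, reproduces the quoted estimates, and illustrates the AICc formula; note that conventions for counting k vary, and information criteria are only comparable across models fitted to the same (here, log-transformed) response:

```r
# Sketch (not from the original notes): growth curve fit and AICc comparison.
y <- c(11, 14, 16, 22, 28, 36, 46, 67, 82, 99, 119, 156, 257, 284, 403)
t <- 1:15

fit_growth <- lm(log(y) ~ t)
coef(fit_growth)                               # ~ (2.070, 0.257) = (alpha0, alpha1)
exp(coef(fit_growth))                          # ~ (7.93, 1.293) = (beta0, beta1)

# Point prediction for year 16, back-transformed from the log scale
exp(predict(fit_growth, data.frame(t = 16)))   # ~ 483

# Small-sample corrected AIC, following the notes' formula; k is taken here as
# the number of estimated regression coefficients (one common convention).
aicc <- function(fit) {
  k <- length(coef(fit))
  AIC(fit) + (2 * k^2 + 2 * k) / (nobs(fit) - k - 1)
}

# Compare with a quadratic trend on the same log response
fit_quad <- lm(log(y) ~ t + I(t^2))
c(growth = aicc(fit_growth), quadratic = aicc(fit_quad))
```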