DAT320: Forecasting - Course Overview PDF
Document Details
Uploaded by WittyAloe
Norwegian University of Life Sciences
2024
Kristian Hovde Liland
Tags
Summary
This document provides an overview of the course DAT320: Sequential and Time Series Data Analysis. It covers basic concepts, forecasting methods, and outlier detection techniques, suitable for an undergraduate-level course.
Full Transcript
Course overview Part 1: Basics Part 2: Forecasting I recap on statistics & I exponential smoothing introduction to R I ARIMA models and variants I exploratory data analysis I Granger causality I preprocessing temporal...
Course overview Part 1: Basics Part 2: Forecasting I recap on statistics & I exponential smoothing introduction to R I ARIMA models and variants I exploratory data analysis I Granger causality I preprocessing temporal data Part 4: Outlier detection Part 3: Segmentation I types of outliers I stochastic processes I outlier detection in general I Hidden Markov Models I outlier detection in time I machine learning approaches series data 8 Norwegian University of Life Sciences Exam Date: 13.12.2024, 09:00-12:30 Facts I 3.5 hours I counts 60% to the final grade I written, closed-book exam in WiseFlow I paper & pencil and programming exercises I permitted aids: calculator I contents from lecture and compulsory assignments 17 Norwegian University of Life Sciences DAT320: Sequential and Time Series Data Analysis Recap on statistical fundamentals Kristian Hovde Liland [email protected] Autumn 2024 Norwegian University of Life Sciences Discrete random variables I discrete random variable X Discrete probability distributions I defined on a discrete space D, e.g., D = Z I described by a probability mass function (pmf) fX : D æ [0, 1], fX (x) = P (X = x) = px q I it holds that px = 1 xœD Figure 1: pmf of a binomial I examples: binomial or distribution (p = 0.4, n = 10) Poisson distribution 2 Norwegian University of Life Sciences Continuous random variables I continuous random variable X I defined on a continuous space D, e.g., D = R I described by a probability density function (pdf) fX : D æ RØ0 I it holds that s I fX (x)dx = 1 D sb I for an interval [a, b] µ D, P (a Æ X < b) = fX (x)dx a I examples: normal or lognormal distribution 3 Norwegian University of Life Sciences Relations between random variables I expected value & variance describe characteristics of one random variable I independence, covariances & correlations are relations between two (or more) random variables 11 Norwegian University of Life Sciences Linear regression Multiple linear regression model (1) (k) yi = —0 + —1 xi + · · · + —k xi + Ái , i = 1,... , n y = X + " (vector notation) 1 2 I X = 1n x(1)... x(k)... design matrix I... regression coefficients 19 Norwegian University of Life Sciences Likelihood I under the following assumptions: I fX1 |◊ = · · · = fXn |◊ =: fX|◊ (identical distribution) and I X1 ,... , Xn are independent I it holds that n Ÿ L(◊; x1 ,... , xn ) = fX|◊ (xi ) i=1 Example: linear regression error terms Ái in linear regression models are... I i.i.d. (independent and identically distributed) I fÁi | is a Gaussian distribution with mean 0 and variance ‡ 2 31 Norwegian University of Life Sciences Purpose of likelihoods Parameter estimation / model training I Maximum likelihood estimation: ◊ˆML = arg max L(◊; x1 ,... , xn ) ◊œ Model selection / hyperparameter tuning I Information criteria based on likelihood values are used to compare models of the same type I most prominent examples: I Akaike’s Information criterion (AIC) I Bayesian Information criterion (BIC) 32 Norwegian University of Life Sciences Maximum-likelihood estimation I Which ◊ œ is the "best" one? I maximize the likelihood value for a given set of data I alternative to OLS estimation procedure I Maximum-likelihood (ML) estimate: ◊ˆML = arg max L(◊; x1 ,... , xn ) ◊œ I determine the expression for the likelihood from the model assumptions (e.g., Gaussian pdf) I set the first derivative of the log-likelihood to 0: ˆ ¸(◊; x1 ,... , xn ) = 0 ˆ◊ I solve the equation for ◊ 35 Norwegian University of Life Sciences DAT320: Forecasting Basic concepts Kristian Hovde Liland [email protected] Autumn 2024 Norwegian University of Life Sciences Forecasting I prediction model I target variable: future values xt+1 , xt+2 ,... I predictors: present and past values (history) xt , xt≠1 ,... 2 Norwegian University of Life Sciences Forecasting I prediction horizon = maximum number of time steps h to predict ahead I in general: the more steps ahead, the more uncertain Single-step ahead Multi-step ahead I target variable xt+1 I target variable xt+h , h > 1 3 Norwegian University of Life Sciences Uni- and multivariate time series I which information should be used for predictors? I history / present state of (xt )tœT I history / present state of other variables (exogeneous variables, covariates) (zt )tœT Univariate forecasting Univariate forecasting with covariates Multivariate forecasting I predictors xs , s Æ t I predictor xs and zs , I predictors xs , s Æ t I target variable sÆt I target variable xt+1 ,... , xt+h I target variable xt+1 ,... , xt+h xt+1 ,... , xt+h 4 Norwegian University of Life Sciences Baseline models I 4 baseline models should be evaluated as minimum benchmarks for any more complex forecasting models I average method I drift method I naïve method I seasonal naïve method 5 Norwegian University of Life Sciences Average and drift method (a) Average method (b) Drift method Figure 1: Baseline forecasts – passenger data 8 Norwegian University of Life Sciences Naïve and seasonal naïve method (a) Naïve method (b) Seasonal naïve method Figure 2: Baseline forecasts 11 Norwegian University of Life Sciences Train-test splits I no iid data ∆ no "random" splitting possible I instead, use cut-off point ttrain œ T I training set {x1 ,... , xttrain } (usually approx. 80%) I test set {xttrain +1 ,... , xtmax } (usually approx. 20%) training test time ttrain 13 Norwegian University of Life Sciences Cross-validation fold 1 training test fold 2 training test fold 3 training test fold 4 training test t1 t2 t3 t4 time 15 Norwegian University of Life Sciences Model evaluation: training data I fitted values = predictions on training data x̂t I residuals xt ≠ x̂t I requirements (similar to linear regression) I uncorrelated I mean 0 I constant variance I normally distributed I see [Hyndman and Athanasopoulos, 2021, ch. 5.4] 16 Norwegian University of Life Sciences Model evaluation: training data Figure 4: Fitted values versus residuals. 18 Norwegian University of Life Sciences DAT320: Forecasting Exponential smoothing Kristian Hovde Liland [email protected] Autumn 2024 Norwegian University of Life Sciences Exponential Smoothing Exponential smoothing methods I Simple exponential smoothing I Exponential smoothing with trend (Holt’s method) I Exponential smoothing with seasonality (Holt-Winters’ method) 4 Norwegian University of Life Sciences Performance evaluation Figure 13: Prediction intervals of exponential smoothing methods 27 Norwegian University of Life Sciences Simple exponential smoothing (SES) I Exponential smoothing as a recursion x̂t+1 = –xt + –(1 ≠ –)xt≠1 + –(1 ≠ –)2 xt≠2 +... = –xt + (1 ≠ –) (–xt≠1 + –(1 ≠ –)xt≠2 +... ) = –xt + (1 ≠ –)x̂t I Hence, x̂t+1 is a weighted average between xt and x̂t (with weights – and (1 ≠ –)) I Initial condition x̂1 = –x0 + (1 ≠ –)x̂0 I x̂0 = ¸0 negligible, since –(1 ≠ –)t ≠æ 0 tæŒ 7 Norwegian University of Life Sciences Simple exponential smoothing (SES) (a) – = 0.05 (b) – = 0.25 (c) – = 0.8 Figure 2: Simple exponential smoothing weights for distinct parameters –. 6 Norwegian University of Life Sciences Simple exponential smoothing (SES) I General component form x̂t+h = ¸t forecast ¸t = –xt + (1 ≠ –)¸t≠1 smoothing I x̂t+1 = x̂t+2 = · · · = x̂t+h I Model parameters I –: smoothing parameter I ¸0 : initialization 8 Norwegian University of Life Sciences Exponential smoothing with trend I Holt’s method I Introduces new parameters — œ [0, 1] and b0 I Extension of SES by adding a trend component: x̂t+h = ¸t + hbt forecast ¸t = –xt + (1 ≠ –)(¸t≠1 + bt≠1 ) smoothing bt = —(¸t ≠ ¸t≠1 ) + (1 ≠ —)bt≠1 trend I x̂t+h = x̂t+h≠1 + bt = · · · = x̂t+1 + (h ≠ 1)bt 12 Norwegian University of Life Sciences Exponential smoothing with damped trend I Damped trend = reduction in trend for multi-step ahead predictions I Introduces new parameter „ œ (0, 1): x̂t+h = ¸t + („ + „2 + · · · + „h )bt forecast ¸t = –xt + (1 ≠ –)(¸t≠1 + „bt≠1 ) smoothing bt = —(¸t ≠ ¸t≠1 ) + (1 ≠ —)„bt≠1 trend I x̂t+h = x̂t+h≠1 + „h bt = · · · = x̂t+1 + („2 + · · · + „h )bt 15 Norwegian University of Life Sciences Exponential smoothing with seasonality I Holt-Winters’ additive method for seasonal period p I Introduces new parameters “ œ [0, 1] and s≠(p≠1) ,... , s0 x̂t+h = ¸t + hbt + st+h≠p(k+1) forecast ¸t = –(xt + st≠p ) + (1 ≠ –)(¸t≠1 + bt≠1 ) smoothing bt = —(¸t ≠ ¸t≠1 ) + (1 ≠ —)bt≠1 trend st = “(xt ≠ ¸t≠1 ≠ bt≠1 ) + (1 ≠ “)st≠p seasonality I k =  h≠1 p Ê 20 Norwegian University of Life Sciences Exponential smoothing with seasonality I Holt-Winters’ multiplicative method x̂t+h = ¸t + hbt + st+h≠p(k+1) forecast A B xt ¸t = – + (1 ≠ –)(¸t≠1 + bt≠1 ) smoothing st≠p bt = —(¸t ≠ ¸t≠1 ) + (1 ≠ —)bt≠1 trend 3 4 xt st = “ + (1 ≠ “)st≠p seasonality ¸t≠1 + bt≠1 23 Norwegian University of Life Sciences DAT320: Forecasting ARIMA Kristian Hovde Liland [email protected] Autumn 2024 Norwegian University of Life Sciences Stationarity Stationarity A time series is stationarity, if it fulfills 3 conditions: 1. constant mean value µ over time 2. constant variance ‡ over time 3. constant autocorrelation across all parts of the time series I see [Hyndman and Athanasopoulos, 2021, ch. 9] 2 Norwegian University of Life Sciences Stationarity I How to obtain stationarity? I Differencing D(xt ) = xt ≠ B(xt ) = (1 ≠ B)(xt ) ˙ B k is the lag operator (also denoted L in some literature), e.g., B 2 (xt ) = xt≠2 I Seasonal differencing Ds (xt ) = xt ≠ B p (xt ) = (1 ≠ B p )(xt ) (p... seasonal period) 4 Norwegian University of Life Sciences Autoregressive model (AR) I Regression model with I Current time point as target I Previous time points as predictors I AR(k) model: xt = Ï0 + Ï1 B(xt ) + Ï2 B 2 (xt ) + · · · + Ïk B k (xt ) + Át = Ï0 + Ï1 xt≠1 + Ï2 xt≠2 + · · · + Ïk xt≠k + Át xt≠4 xt≠3 xt≠2 xt≠1 xt AR(1) t 6 Norwegian University of Life Sciences Moving-Average model (MA) I Regression model with I Current time point as target I Previous errors as predictors I M A(q) model: xt = ◊0 + ◊1 Át≠1 + ◊2 Át≠2 + · · · + ◊q Át≠q + Át Át≠4 Át≠3 Át≠2 Át≠1 Át xt≠4 xt≠3 xt≠2 xt≠1 xt M A(2) t 9 Norwegian University of Life Sciences Autoregressive Moving-Average model I Combination of AR(k) and M A(q), given as ARM A(k, q) xt = Ï0 +Ï1 xt≠1 + · · · + Ïk xt≠k + ◊1 Át≠1 + · · · + ◊q Át≠q + Át I Combines properties of AR and M A I assumption: stationarity Át≠4 Át≠3 Át≠2 Át≠1 Át xt≠4 xt≠3 xt≠2 xt≠1 xt ARM A(2, 1) t 13 Norwegian University of Life Sciences Non-stationarity Non-stationary time series I How to handle non-stationarity in ARMA models? I trend: differencing D(.), i.e., discrete derivative, e.g. first order xt ≠ xt≠1. I seasonality: seasonal differencing Ds (.) 15 Norwegian University of Life Sciences ARMA model with integration: ARIMA I Accounts for d-fold differencing in ARM A(k, q), given as ARIM A(k, d, q) Dd (xt ) = Ï0 +Ï1 Dd (xt≠1 ) + · · · + Ïk Dd (xt≠k )+ ◊1 Át≠1 + · · · + ◊q Át≠q + Át I hyperparameters: I k order of autoregressive model (AR(k)) I d degree of differencing (Dd ) I q order of moving-average model (M A(q)) Prediction with differences 16 Norwegian University of Life Sciences Backshift notation of ARIMA I Rephrase model in Backshift operator notation for simplified forecasting Dd (xt ) = Ï0 +Ï1 Dd (xt≠1 ) + · · · + Ïk Dd (xt≠k )+ ◊1 Át≠1 + · · · + ◊q Át≠q + Át (1 ≠ B)d (xt ) = Ï0 +Ï1 (1 ≠ B)d (B(xt )) + · · · + Ïk (1 ≠ B)d (B k (xt ))+ ◊1 B(Át ) + · · · + ◊q B q (Át ) + Át 18 Norwegian University of Life Sciences ARIMA model I KPSS test I p-value > 0.05 ∆ no differencing (d = 0) I ACF & PACF: I sinusoidal ACF ∆ try AR(k) I largest significant lag in PACF is 2 ∆ try AR(2) (a) ACF (b) PACF 30 Norwegian University of Life Sciences DAT320: Forecasting Seasonal ARIMA Kristian Hovde Liland [email protected] Autumn 2024 Norwegian University of Life Sciences Seasonal ARIMA model (SARIMA) SARIM A(0, 0, 0)(1, 0, 1)p Át≠2p... Át≠p... Át≠3 Át≠2 Át≠1 Át xt≠2p... xt≠p... xt≠3 xt≠2 xt≠1 xt t 5 Norwegian University of Life Sciences Seasonal ARIMA model (SARIMA) SARIM A(1, 1, 1)(1, 1, 1)p Multiplicative combination of terms: (1 ≠ Ï1 B)(1 ≠ 1B p )(1 ≠ B)(1 ≠ B p )xt = (1 + ◊1 B)(1 + 1B p )Át I (1 ≠ Ï1 B ≠ · · · ≠ Ïk B k )... AR(k) terms I (1 ≠ 1 B p ≠ · · · ≠ K B kp )... SAR(K) terms I (1 ≠ B)d... differencing terms I (1 ≠ B p )D... seasonal differencing terms I (1 + ◊1 B + · · · + ◊q B q )... M A(q) terms I (1 + 1B p + ··· + QB Qp )... SM A(Q) terms 4 Norwegian University of Life Sciences DAT320: Forecasting Multivariate extensions of ARIMA Kristian Hovde Liland [email protected] Autumn 2024 Norwegian University of Life Sciences Models with lagged predictors (overview) model name predictors xt ≥... xt≠1 ,... zt zt≠1 ,... linear regression 7 X 7 distributed lag model (DLM) 7 X X ARIMA X 7 7 dynamic regression X X 7 dynamic regression with lagged predictors X X X 18 Norwegian University of Life Sciences ARIMAX model I ARIM A(k, d, q) (1 ≠ Ï1 B ≠ · · · ≠ Ïk B k )(1 ≠ B)d xt = (1 + ◊1 B + · · · + ◊q B q )Át I ARIM AX(k, d, q) (with single exogeneous variable zt ) (1 ≠ Ï1 B ≠ · · · ≠ Ïk B k )(1 ≠ B)d xt = —zt + (1 + ◊1 B + · · · + ◊q B q )Át I ARIM AX(k, d, q) (with multiple exogeneous variables zt ) (1 ≠ Ï1 B ≠ · · · ≠ Ïk B k )(1 ≠ B)d xt = T zt + (1 + ◊1 B + · · · + ◊q B q )Át , (1) (r) where Tz t = —1 zt + · · · + —r zt 5 Norwegian University of Life Sciences Dynamic regression with ARIMA errors I Alternative: combine "classical" regression setup with ARIMA error model I xt = —0 + —1 zt(1) + · · · + —r zt(r) + ÷t I ÷t ≥ ARIM A(k, d, q): (1 ≠ Ï1 B ≠ · · · ≠ Ïk B k )(1 ≠ B)d ÷t = (1 + ◊1 B + · · · + ◊q B q )Át I Át : i.i.d. model error I Exchanges i.i.d. error assumption of regression model by ARIMA (allowing autocorrelation in error terms ÷t ) I Note that differencing should be applied to make all input variables stationary (use maxd across all inputs) I See [Hyndman and Athanasopoulos, 2021, ch. 10] 7 Norwegian University of Life Sciences Distributed lag model (DLM) xt = ‹0 + —0 zt + —1 zt≠1 + · · · + —u zt≠u + Át Át ≥ N (0, ‡ 2 ) i.i.d I In contrast to the models discussed previously, the DLM model assumes that xt is exclusively described by the history of zt I This leads to a simplification compared to the dynamic regression model: standard linear regression setup 14 Norwegian University of Life Sciences VARIMA model I "Vector ARIMA": multivariate forecasting using ARIMA I Multivariate target variable xt I See [Hyndman and Athanasopoulos, 2021, ch. 12.3] and [Mills, 2019, ch. 13] xt≠1 xt≠1 zt (1) (2) xt≠1 xt≠1 xt xt (1) (2) xt xt (a) ARIMA (b) dynamic regression (c) VARIMA 19 Norwegian University of Life Sciences Granger causality I Why is this concept useful? Granger causality test I H0 : xT is not informative for predicting yT (lags of xT should not be used when predicting yt based on yt≠1 ,... ) yt = Ï0 + Ï1 B(yt ) + · · · + Ïu B u (yt ) + Át I HA : lags l1 ,... , l2 of xT are informative for predicting yT yt =Ï0 + Ï1 B(yt ) + · · · + Ïu B u (yt )+ —l1 B l1 (xt ) + · · · + —l2 B l2 (xt ) + Át 26 Norwegian University of Life Sciences DAT320: Outlier detection Introduction to outlier detection Kristian Hovde Liland [email protected] Autumn 2024 Norwegian University of Life Sciences Point outliers Figure 1: Point outliers 4 Norwegian University of Life Sciences Collective outliers Figure 2: Collective outliers 5 Norwegian University of Life Sciences Contextual outliers Figure 3: Contextual outliers 6 Norwegian University of Life Sciences Outlier detection (in multivariate data) Outlier detection approaches I Statistical / distribution-based (e.g., z-scores) I Density-based (e.g., Local Outlier Factor) I Model-based (e.g., one-class SVM) 8 Norwegian University of Life Sciences Mahalanobis distance I How about multivariate outliers? I With z-scores, |z| can be interpreted as a distance from the distribution mean µ I Related concepts are available for multivariate data I Idea: quantify "distance from distribution mean" µ I Assumption: X ≥ N (µ, ⌃) I Mahalanobis distance I dM (x, y) = (x ≠ y)T ⌃≠1 (x ≠ y) I "Weighted" Euclidean distance by inverse covariances I Outlier detection: dM (x, µ) > t 12 Norwegian University of Life Sciences k-Nearest neighbor y (2) y (3) x y (1) y (4) Figure 7: k-nearest neighbors 17 Norwegian University of Life Sciences DBSCAN Á Á Á (a) core point (b) border point (c) noise point Figure 8: Categories of points in DBSCAN (t = 5). 19 Norwegian University of Life Sciences Local outlier factor I Compare "local density" to neighbors I LOF(x) < 1: higher density than neighbors I LOF(x) > 1: lower density than neighbors I concept: I Use kNN to determine k-nearest neighbors, Nk (x) I Determine "local density" for a data point (local reachability density, LRD) I Compare LRD of a sample to LRD of its k-nearest neighbours Q q R≠1 dk (x, y) c yœNk (x) d LRDk (x) = a b |Nk (x)| q LRDk (y) LRDk (x) yœNk (x) LOF = |Nk (x)| 21 Norwegian University of Life Sciences Isolation forest S0 S1 S2 S1 S6 S4 S3 S4 S7 x S5 S6 S7 S8 h(x) = 3 (b) Dataset (a) Isolation tree 26 Norwegian University of Life Sciences One-class SVM I Apply non-linear (spherical) kernel I Train an SVM (centered around origin) to contain all training data points I Outliers: outside boundaries Figure 15: One-class SVM with spherical kernel 30 Norwegian University of Life Sciences DAT320: Outlier detection Point outliers in time series data Kristian Hovde Liland [email protected] Autumn 2024 Norwegian University of Life Sciences Outliers in time series (uni-/multivariate) Approaches for point outliers in time series I Ignore temporal component I Temporal windowing I Customized outlier detectors I Univariate vs multivariate time series I Global vs local (contextual) outlier I Based on [Blázquez-García et al., 2021] 2 Norwegian University of Life Sciences Temporal windowing value ” time Figure 3: Temporal windowing 7 Norwegian University of Life Sciences Example Temporal window (a) z-scores (b) Time series 9 Figure 5: Temporal window with z-scoresNorwegian (windowUniversity length 5). of Life Sciences Density-based approaches value ±Á time Figure 6: Temporal windowing with density-based approach 11 Norwegian University of Life Sciences Model-based approaches I Concept: time series models are able to describe outliers well xt is outlier … |xt ≠ x̂t | > Á I Estimation-based I Prediction-based I Model is trained based on all values I Model is trained only on history xT {x1 ,... , xt≠1 } I Outliers produce large residuals I Outliers produce inaccurate xt ≠ x̂t predictions x̂t 13 Norwegian University of Life Sciences Model-based approaches value training fitted time Figure 8: Estimation-based outlier detection 14 Norwegian University of Life Sciences Model-based approaches value training forecast time Figure 11: Prediction-based outlier detection 17 Norwegian University of Life Sciences Distribution-based approaches I Concept: histograms become more accurate when removing outliers I Histograms along the time axis: I Binning of the time axis ”1 = 1 < ”2 < · · · < ”K = tmax I Histogram value hi , i = 1,... , K ≠ 1 are given as: 1 ÿ hi = xt ”i+1 ≠ ”i tœ[”i ,”i+1 ) 19 Norwegian University of Life Sciences Distribution-based approaches I xt is an outlier, if EX (HX ú ) > EX\{xt } (HX\{x ú t} ) (a) Full time-histogram (b) Time-histogram with removed xt 21 Norwegian University of Life Sciences DAT320: Outlier detection Collective outliers in time series data Kristian Hovde Liland [email protected] Autumn 2024 Norwegian University of Life Sciences Collective outliers in time series Key properties of collective outliers (subsequences) I Length of the subsequence I Fixed-length versus variable-length I Representation of the subsequence I Models, transformations I Periodicity of subsequence outliers I Non-periodic sequences I Periodic sequences 2 Norwegian University of Life Sciences Discord detection I Concept: determine "most unusual subsequence" (discord) Ë 2 I Define time series subsequences by sliding window Tk = tk ≠ 2” , tk + 2” for some tk and fixed ” I Discord subset Tkı (the subset which is most different to its most similar subset) A B Tkı = arg max min d(xTi , xTj ) Ti Tj :Tj flTi =ÿ I min d(xTi , xTj ) acts as a 1-nearest neighbor distance (single linkage between Tj :Tj flTi =ÿ Ti and all possible non-overlapping subsets Tj ) 5 Norwegian University of Life Sciences Discord detection value d(.,.) time Figure 2: Discord detection 6 Norwegian University of Life Sciences Discord detection I How to find discord subsets? I Option 1: brute-force search (compare all pairs of subsets to each other) I Option 2: HOT-SAX algorithm (heuristics / pruning) I Limitation of discord detection: I Predefined window width ” I Delivers exactly one "most unusual sequence" 7 Norwegian University of Life Sciences Distance-based approaches I Discord detection: pairwise comparison between subsequences I Concept of distance-based outlier detection: comparison of subsequence to reference ("normal" subsequence x̂) d(xTi , x̂) > · I How to obtain the reference x̂? I Reference from same time series æ clustering I Reference from external time series æ feature-based classification/clustering, or dictionary of patterns 9 Norwegian University of Life Sciences Distance-based approaches (a) reference value time (b) time series Figure 4: Distance-based approaches 10 Norwegian University of Life Sciences Model-based approaches value training forecast time Figure 7: Model-based approaches 14 Norwegian University of Life Sciences