DAT320: Outlier Detection in Time Series Data
Norwegian University of Life Sciences
2024
Kristian Hovde Liland
Summary
This document discusses outlier detection methods in time series data, focusing on discord detection, distance-based, and model-based approaches. The document includes examples and figures to illustrate the concepts. The target audience appears to be undergraduate students in a data analysis course.
DAT320: Outlier detection
Collective outliers in time series data
Kristian Hovde Liland
Autumn 2024
Norwegian University of Life Sciences

- Collective outliers in time series
- Discord detection
- Distance-based approaches
- Model-based approaches

Collective outliers in time series
Key properties of collective outliers (subsequences):
- Length of the subsequence
  - Fixed-length versus variable-length
- Representation of the subsequence
  - Models, transformations
- Periodicity of subsequence outliers
  - Non-periodic sequences
  - Periodic sequences
- Based on [Blázquez-García et al., 2021] and [Gupta et al., 2014]

Approaches for collective outliers
Approaches to detecting collective outliers (subsequences):
- Discord detection
- Distance-based approaches
- Model-based approaches

Example
Figure 1: Example dataset of ECG signals.
Discord detection
- Concept: determine the "most unusual subsequence" (the discord)
- Define time series subsequences by a sliding window $T_k = [t_k - \delta/2,\ t_k + \delta/2]$ for some $t_k$ and fixed window width $\delta$
- Discord subset $T_{k^*}$ (the subset that is most different from its most similar subset):
  $T_{k^*} = \arg\max_{T_i} \left( \min_{T_j :\, T_j \cap T_i = \emptyset} d(x_{T_i}, x_{T_j}) \right)$
- $\min_{T_j :\, T_j \cap T_i = \emptyset} d(x_{T_i}, x_{T_j})$ acts as a 1-nearest-neighbor distance (single linkage between $T_i$ and all possible non-overlapping subsets $T_j$)

Figure 2: Discord detection (value against time, annotated with the distance d(.,.))
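The discord definition above can be illustrated with a brute-force search in base R. This is a minimal sketch, not the jmotif implementation. It makes three assumptions: d(.,.) is the Euclidean distance, windows are indexed by their start position rather than their centre $t_k$, and the window length `w` plays the role of $\delta$. The function name `brute_force_discord` and the synthetic signal are hypothetical.

```r
# Brute-force discord search (sketch).
# Assumptions: d(.,.) is the Euclidean distance, windows are indexed by their
# start position, and the window length w plays the role of delta.
brute_force_discord <- function(x, w) {
  n_win <- length(x) - w + 1
  # Every window needs at least one non-overlapping partner
  stopifnot(w >= 1, n_win >= 2 * w)
  # Row k of 'subs' holds the subsequence x[k], ..., x[k + w - 1]
  idx <- outer(seq_len(n_win), 0:(w - 1), "+")
  subs <- matrix(x[c(idx)], nrow = n_win)
  nn_dist <- numeric(n_win)
  for (i in seq_len(n_win)) {
    # Windows starting at i and j are disjoint iff |i - j| >= w
    j <- which(abs(seq_len(n_win) - i) >= w)
    diffs <- sweep(subs[j, , drop = FALSE], 2, subs[i, ])
    # 1-nearest-neighbour distance to any non-overlapping window
    nn_dist[i] <- min(sqrt(rowSums(diffs^2)))
  }
  # The discord is the window whose nearest neighbour is farthest away
  best <- which.max(nn_dist)
  list(start = best, distance = nn_dist[best], nn_dist = nn_dist)
}

# Usage on a synthetic periodic signal with an injected bump at 201:220
x <- sin(2 * pi * (1:400) / 50)
x[201:220] <- x[201:220] + 3
res <- brute_force_discord(x, w = 50)
res$start  # a window start that overlaps the bump (between 152 and 220)
```

The loop compares every window with every other window. As a result, the cost grows roughly with the square of the number of windows, multiplied by `w`. Pruning heuristics such as HOT-SAX aim to reduce this cost.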
Discord detection
- How to find discord subsets?
  - Option 1: brute-force search (compare all pairs of subsets to each other)
  - Option 2: HOT-SAX algorithm (heuristics / pruning)
- Limitations of discord detection:
  - Predefined window width $\delta$
  - Delivers exactly one "most unusual subsequence"

Figure 3: Brute-force discord, window = 100, #discords = 1

Distance-based approaches
- Discord detection: pairwise comparison between subsequences
- Concept of distance-based outlier detection: compare a subsequence to a reference ("normal" subsequence $\hat{x}$) and flag it as an outlier when the distance exceeds a threshold $\tau$:
  $d(x_{T_i}, \hat{x}) > \tau$
- How to obtain the reference $\hat{x}$?
  - Reference from the same time series → clustering
  - Reference from external time series → feature-based classification/clustering, or a dictionary of patterns

Figure 4: Distance-based approaches ((a) reference, (b) time series; value against time)
Figure 5: Reference: seasonal average of a heartbeat (the median could have been used instead)
Figure 6: Distance-based collective outlier detection (mean absolute deviation from the reference)

Model-based approaches
- Related to the distance-based approach with a reference taken from the same time series (its history)
- Concept: train a predictive model and measure the distance between the observed values and the model's predictions (which serve as the reference) over an interval; flag an outlier when
  $\sum_{i=t+1}^{t+\delta} |x_i - \hat{x}_i| > \tau$
- Train the model on $\{1, \dots, t\}$
- Predict $\delta$ steps into the future: $\hat{x}_{t+1}, \dots, \hat{x}_{t+\delta}$

Figure 7: Model-based approaches (training period followed by a forecast; value against time)
Figure 8: Model-based collective outlier detection (seasonal naive model, window = 148)

R code
library(jmotif)
ecg
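The distance-based approach from Figures 5 and 6 can be sketched in base R. This sketch assumes that the heartbeat period `p` is known and constant, and that the series can be cut into consecutive, non-overlapping segments of length `p`. Real ECG beats vary in length, so this is a simplification. The function names `segment_matrix`, `seasonal_reference` and `segment_scores`, the synthetic signal and the threshold `tau` are all hypothetical, not taken from the lecture's code.

```r
# Distance-based collective outlier detection (sketch).
# Assumptions: the period p (one "heartbeat") is known and constant, and the
# series is split into consecutive, non-overlapping segments of length p.
segment_matrix <- function(x, p) {
  n_seg <- length(x) %/% p                   # complete segments only
  matrix(x[seq_len(n_seg * p)], nrow = p)    # column s holds segment s
}

# Reference x-hat: element-wise average over all segments (mean or median)
seasonal_reference <- function(x, p, fun = mean) {
  apply(segment_matrix(x, p), 1, fun)
}

# Score of each segment: mean absolute deviation from the reference
segment_scores <- function(x, p, reference) {
  segs <- segment_matrix(x, p)
  colMeans(abs(segs - reference))            # reference is recycled down each column
}

# Usage: 8 identical synthetic "beats", with beat 6 distorted
p <- 50
x <- rep(sin(2 * pi * (1:p) / p), 8)
x[(5 * p + 1):(6 * p)] <- x[(5 * p + 1):(6 * p)] + 1
ref <- seasonal_reference(x, p, fun = median)
scores <- segment_scores(x, p, ref)
tau <- 0.5                                   # hypothetical threshold
which(scores > tau)                          # 6
```

With `fun = median`, the reference stays equal to a normal beat even when one beat is distorted. With `fun = mean`, the distorted beat leaks into the reference and raises every score slightly.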
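The model-based approach can be sketched in the same way with a seasonal naive model, where each forecast repeats the value observed one season earlier, as in Figure 8. This illustration rests on two assumptions. First, the season length `p` is known. Second, blocks of `delta` points are scored back to back rather than for every possible `t`. The names `seasonal_naive_forecast`, `model_based_scores`, `tau` and the synthetic data are hypothetical; this is not the code used for the figure.

```r
# Model-based collective outlier detection with a seasonal naive model (sketch).
# Assumption: the season length p is known; each forecast repeats the value
# observed one season earlier (the last observed season is repeated if h > p).
seasonal_naive_forecast <- function(train, p, h) {
  n <- length(train)
  stopifnot(n >= p)
  steps <- seq_len(h)
  train[n + steps - p * ceiling(steps / p)]
}

# Train on x[1:t], forecast delta steps, and score the block t+1, ..., t+delta by
# the sum of absolute forecast errors. Blocks are taken back to back (t = p,
# p + delta, ...), a simplification of evaluating every t.
model_based_scores <- function(x, p, delta) {
  stopifnot(length(x) - delta >= p)
  starts <- seq(from = p, to = length(x) - delta, by = delta)
  score <- vapply(starts, function(t) {
    fc <- seasonal_naive_forecast(x[1:t], p, delta)
    sum(abs(x[(t + 1):(t + delta)] - fc))
  }, numeric(1))
  data.frame(t = starts, score = score)
}

# Usage: 8 identical synthetic seasons, with a bump inside season 6 (261:280)
p <- 50
x <- rep(sin(2 * pi * (1:p) / p), 8)
x[261:280] <- x[261:280] + 2
res <- model_based_scores(x, p, delta = p)
tau <- 10                                    # hypothetical threshold
res$t[res$score > tau]                       # 250 300
```

The bump is flagged twice. It is flagged once when it occurs (block after t = 250) and again one season later (block after t = 300), because the seasonal naive model copies the anomalous season into the next forecast.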