Questions and Answers
What is the main method used in brute-force search for discord detection?
Which algorithm is mentioned as a method that utilizes heuristics in discord detection?
What is a limitation of the discord detection process?
How is the reference subsequence $\bar{x}$ obtained in distance-based outlier detection?
In the context of distance-based outlier detection, what condition indicates an outlier?
What defines a time series subsequence in the context of discord detection?
In discord detection, what does the discord subset $T_k$ represent?
What is the purpose of finding the discord subset in a time series?
Which mathematical operation is used to optimize the selection of the discord subset $T_k$?
How is the sliding window defined for extracting subsequences in discord detection?
What is one key property of collective outliers in time series data?
Which of these represents different approaches to detecting collective outliers?
What aspect of subsequence outliers relates to their repetition patterns?
Which option correctly differentiates the types of subsequences in collective outlier detection?
What is discord detection primarily used for?
What do model-based approaches involve in the context of collective outliers?
In collective outlier detection, what does the representation of subsequences pertain to?
Which of the following properties is not typically associated with collective outliers?
What method is utilized when referencing the same time series for clustering?
Which approach focuses on utilizing external time series?
What does a reference value relate to in distance-based approaches?
In the context of distance-based approaches, what could be used instead of seasonal average?
What does collective outlier detection typically rely on?
Which approach is NOT mentioned in relation to time series?
What is a significant advantage of referencing data patterns externally?
Which of the following is NOT a distance-based approach?
What does the expression $\arg\max \min d(x_{T_i}, x_{T_j})$ represent in the context of discord detection?
In discord detection, what role does the expression $d(x_{T_i}, x_{T_j})$ serve?
What condition must the subsets $T_i$ and $T_j$ satisfy in the given expression?
What does the term 'discord detection' refer to in this context?
What implication does the term 'single linkage' have in the context of distance measurement?
How can one effectively find discord subsets based on the information provided?
Which of the following best describes the method associated with discord detection?
Which statement is NOT true regarding the non-overlapping subsets in discord detection?
What is a critical step in the model-based approach for predictive modeling?
In the context of model-based approaches, what does the expression $|x_i - \hat{x}_i| > \tau$ represent?
What does the term 'δ steps into the future' refer to in model-based forecasting?
How does the model-based approach calculate the prediction error?
What type of model is indicated by 'seasonal naive model' in the context of model-based approaches?
What is the primary focus of the model-based approach in predictive analysis?
What is likely indicated by the reference to 'window = 148' in the figure mentioned?
Which statement about the model-based approach is correct?
Study Notes
Norwegian University of Life Sciences
- Norwegian University of Life Sciences (NMBU) is a public university.
- It is located in Ås, Norway.
DAT320: Outlier detection
- Focuses on detecting collective outliers in time series data.
- Course code: DAT320
- Instructor: Kristian Hovde Liland
- Autumn 2024
Collective outliers in time series
- Key properties of collective outliers (subsequences):
  - Length of the subsequence: fixed-length versus variable-length
  - Representation of the subsequence: models, transformations
  - Periodicity of subsequence outliers: non-periodic versus periodic sequences
- Based on [Blázquez-García et al., 2021] and [Gupta et al., 2014]
Approaches for detecting collective outliers
- Discord detection
- Distance-based approaches
- Model-based approaches
Discord detection
- Concept: determine "most unusual subsequence" (discord).
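- Define time series subsequences by a sliding window $T_k = [t_k - \delta/2, t_k + \delta/2)$ for some $t_k$ and fixed $\delta$.
- Discord subset $T_k$: the subset that is most different from its most similar non-overlapping subset, $T_k = \arg\max_{T_i} \min_{T_j : T_i \cap T_j = \emptyset} d(x_{T_i}, x_{T_j})$.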
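The brute-force search for a discord can be sketched in a few lines of plain Python (standard library only). This is a minimal illustration, not the course's implementation: the function names, the Euclidean distance, and the toy series are assumptions chosen for the example.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length subsequences."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

def find_discord(series, width):
    """Return (start, score) of the discord found by brute-force search.

    For every window of length `width`, compute the distance to its nearest
    non-overlapping window. The discord is the window whose nearest-neighbour
    distance is largest (ties keep the earliest window).
    """
    starts = range(len(series) - width + 1)
    best_start, best_score = None, -1.0
    for i in starts:
        nearest = math.inf
        for j in starts:
            if abs(i - j) >= width:  # the two windows must not overlap
                dist = euclidean(series[i:i + width], series[j:j + width])
                nearest = min(nearest, dist)
        if nearest != math.inf and nearest > best_score:
            best_start, best_score = i, nearest
    return best_start, best_score

# Toy series (assumed data): a repeating pattern with one distorted cycle.
series = [0, 1, 0, -1] * 3 + [0, 5, 0, -1] + [0, 1, 0, -1] * 3
discord_start, discord_score = find_discord(series, width=4)
```

The double loop compares every pair of windows, so the cost grows quadratically with the length of the series; that cost is the usual motivation for heuristic search orders.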
Distance-based approaches
- Discord detection: pairwise comparison between subsequences.
- Concept of distance-based outlier detection: compare each subsequence to a reference ("normal") subsequence $\bar{x}$.
- $x_{T_i}$ is an outlier if $d(x_{T_i}, \bar{x}) > \tau$.
- How to obtain the reference $\bar{x}$? Reference from the same time series → clustering
- Reference from external time series → feature-based classification/clustering, or dictionary of patterns
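As a rough illustration of comparing subsequences against a reference, the sketch below uses the element-wise average of all non-overlapping segments as the "normal" subsequence. This is a simplifying assumption standing in for the clustering step named above; the function names, the Euclidean distance, and the threshold value are likewise hypothetical.

```python
import math

def split_into_segments(series, width):
    """Cut the series into consecutive non-overlapping segments of length `width`."""
    return [series[i:i + width] for i in range(0, len(series) - width + 1, width)]

def reference_outliers(series, width, tau):
    """Flag segments whose Euclidean distance to the average segment exceeds tau."""
    segments = split_into_segments(series, width)
    # Assumed reference: position-wise mean over all segments (not a clustering result).
    reference = [sum(column) / len(segments) for column in zip(*segments)]
    flagged = []
    for k, segment in enumerate(segments):
        dist = math.sqrt(sum((s - r) ** 2 for s, r in zip(segment, reference)))
        if dist > tau:
            flagged.append(k)
    return reference, flagged

# Toy series (assumed data): seven cycles of length 4, the fourth one distorted.
series = [0, 1, 0, -1] * 3 + [0, 5, 0, -1] + [0, 1, 0, -1] * 3
reference, flagged = reference_outliers(series, width=4, tau=2.0)
```

Because the distorted segment also contributes to the average, the reference is slightly pulled towards it; a cluster centre or a median would be less affected.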
Model-based approaches
- Related to distance-based approach with reference from same time series (history).
- Concept: train predictive model and measure distance between prediction and reference in interval.
- Train model on time points $\{1, \dots, t\}$.
- Predict $\delta$ steps into the future: $\hat{x}_{t+1}, \dots, \hat{x}_{t+\delta}$.
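A minimal sketch of this train, predict, and compare loop is shown below in plain Python. It assumes a seasonal naive model (the forecast repeats the last observed season) and a Euclidean distance between forecast and observation; the function names, the period, and the threshold are illustrative assumptions.

```python
import math

def seasonal_naive_forecast(history, period, steps):
    """Forecast `steps` values by repeating the last observed season."""
    last_season = history[-period:]
    return [last_season[k % period] for k in range(steps)]

def interval_is_outlier(series, t, period, steps, tau):
    """Fit on series[:t], forecast the next `steps` values, compare with the observations."""
    forecast = seasonal_naive_forecast(series[:t], period, steps)
    observed = series[t:t + steps]
    dist = math.sqrt(sum((o - f) ** 2 for o, f in zip(observed, forecast)))
    return dist > tau, dist

# Toy series (assumed data): the cycle starting at index 12 is distorted.
series = [0, 1, 0, -1] * 3 + [0, 5, 0, -1] + [0, 1, 0, -1] * 3
normal_flag, normal_dist = interval_is_outlier(series, t=4, period=4, steps=4, tau=2.0)
odd_flag, odd_dist = interval_is_outlier(series, t=12, period=4, steps=4, tau=2.0)
```

Any forecasting model could replace the seasonal naive one; only the comparison between the predicted and the observed interval matters for the outlier decision.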
DAT320: Point outliers
- Approaches for point outliers in time series
- Temporal windowing
- Model-based approaches
- Distribution-based approaches
- Multivariate time series
- Outliers in time series (uni/multivariate)
- Ignore temporal component
- Temporal windowing
- Customized outlier detectors
- Univariate vs multivariate time series
- Global vs local (contextual) outlier
- Based on [Blázquez-García et al., 2021]
Time series as random samples
- Baseline concept: ignore temporal dependencies.
- Works well for global outliers.
- Easy to implement.
- Is not capable of detecting contextual outliers.
- May be distorted by trend & seasonalities.
- Same methodology as for random samples (uni-/multivariate data)
Temporal windowing
- Concept: divide the time axis into windows $[t - \delta/2, t + \delta/2]$ with $\delta > 0$.
- Apply a non-temporal method to each window.
- Works well for seasonal patterns.
- Same methodology as for random samples (uni-/multivariate data).
- Hard to determine the window size δ
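Temporal windowing can be sketched as follows in plain Python. The non-temporal method applied inside each window is assumed here to be a median/MAD rule; that choice, the function name, and the threshold of 3 are illustrative and not taken from the course material.

```python
import statistics

def windowed_outliers(series, delta, threshold=3.0):
    """Flag x_t when it deviates from its window median by more than threshold * MAD."""
    half = delta // 2
    flagged = []
    for t, x_t in enumerate(series):
        # Centred window, truncated at the borders of the series.
        window = series[max(0, t - half):t + half + 1]
        med = statistics.median(window)
        mad = statistics.median([abs(x - med) for x in window])
        # A zero MAD (flat window) gives no usable scale, so nothing is flagged.
        if mad > 0 and abs(x_t - med) > threshold * mad:
            flagged.append(t)
    return flagged

# Toy series (assumed data): a linear trend 1..20 with one contextual outlier.
series = list(range(1, 21))
series[10] = 3  # globally unremarkable, locally far off the trend
flagged = windowed_outliers(series, delta=6)
```

The value 3 lies well inside the global range of the series, so a method that ignores time would not flag it; inside its local window it stands out.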
Density-based approaches
- Concept: outliers have only few neighbors.
- $x_t$ is an outlier $\iff |\{x \in \{x_{t-\delta/2}, \dots, x_{t+\delta/2}\} : d(x, x_t) \le \varepsilon\}| \le \tau$.
- Applied in temporal window
- KNN distance
- DBSCAN
- LOF
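The neighbour-count rule can be illustrated with a short plain-Python sketch. It assumes a centred window, the absolute difference as distance, and that a point counts itself as a neighbour; the function name and parameter values are hypothetical.

```python
def density_outliers(series, delta, eps, tau):
    """Flag x_t when at most `tau` window values lie within `eps` of it."""
    half = delta // 2
    flagged = []
    for t, x_t in enumerate(series):
        window = series[max(0, t - half):t + half + 1]
        # The point itself is part of the window, so the count is at least 1.
        neighbours = sum(1 for x in window if abs(x - x_t) <= eps)
        if neighbours <= tau:
            flagged.append(t)
    return flagged

# Toy series (assumed data) with one isolated value at index 5.
series = [10, 11, 10, 12, 11, 30, 10, 11, 12, 10, 11]
flagged = density_outliers(series, delta=6, eps=2.5, tau=1)
```

With `tau=1` a point is flagged only when it has no neighbour besides itself inside its window.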
Model-based approaches
- Concept: time series models are not able to describe outliers well.
- $x_t$ is an outlier $\iff |x_t - \hat{x}_t| > \tau$.
- Estimation-based
  - Model is trained on all values
  - Outliers produce large residuals
- Prediction-based
  - Model is trained only on the history $\{x_1, \dots, x_{t-1}\}$
  - Outliers produce inaccurate predictions $\hat{x}_t$
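The prediction-based variant can be sketched in plain Python as below. The forecasting model is assumed to be the mean of the last `k` observations, a deliberately simple stand-in for a fitted time series model; the function name and the values of `k` and `tau` are illustrative.

```python
def prediction_outliers(series, k, tau):
    """Flag x_t when it differs from a forecast built only from earlier values by more than tau."""
    flagged = []
    for t in range(k, len(series)):
        # Assumed model: forecast x_t as the mean of the k preceding values.
        x_hat = sum(series[t - k:t]) / k
        if abs(series[t] - x_hat) > tau:
            flagged.append(t)
    return flagged

# Toy series (assumed data) with one spike at index 5.
series = [10, 11, 10, 12, 11, 30, 10, 11, 12, 10, 11]
flagged = prediction_outliers(series, k=3, tau=10)
```

Note that the spike also enters the history of the next `k` forecasts and inflates them, so a threshold that is too tight would flag the normal values right after it as well.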
Distance-based approaches
- Based on distance from a reference model
- Reference from the original time series → apply clustering
- Reference from an external source, such as another dataset → feature-based classification/clustering, or dictionary of patterns
R code
- Provides code snippets for outlier detection using various methods.
- Includes libraries for time series analysis.
Literature
- Includes references to research papers and books that discuss time series analysis, outlier detection and other relevant topics.
R code (Forecasting)
- Provides R code examples for forecasting.
- Uses various functions & packages such as library(forecast), library(datasets), library(imputeTS), etc.
R code
- Includes various forecasting models: ARIMA, ARIMAX, dynamic regression, and others
Forecasting using ARIMA
- How to handle non-stationarity to predict future values of target variables:
ARIMA & dynamic regression model
- ARIMAX is an ARIMA with exogenous inputs
- Variable of interest: $x_t$
- Additional variable with influence on $x_t$ (covariate, exogenous input): $z_t$
Dynamic regression with ARIMA errors
- Alternative: Combine "classical" regression setup with ARIMA error model
- Replaces the i.i.d. error assumption of classical regression with an ARIMA model for the errors
Distributed lag model (DLM)
- Distributed lag model assumes that $x_t$ is described exclusively by the history of $z_t$
- Leads to simplification compared to dynamic regression or to the standard linear regression setup
VARIMA model
- Multivariate forecasting using ARIMA
Causality versus correlation
- Correlation indicates a relationship but not necessarily causation.
Granger causality
- Concept
- Usefulness
R code (Granger Causality Testing)
- Provides example code for Granger causality testing
Further reading
- Provides links to various resources for further learning.
Literature (including references)
- Lists research papers and books on time series analysis and forecasting.
DAT320: Segmentation | Classification | Clustering: Distance-based Methods
- Classification & clustering problems in time series data
- Comparing time series segments to each other (clustering)
- Modeling a (time-independent) target variable from a time series segment (classification)
- Time series segmentation & change-point detection
- Approaches
- Distance-based time series clustering and classification
- Feature-based
- Model-based
Distance measures
- How to quantify "distance" or "similarity"?
- Recap: Distance metrics.
- Minkowski distance
- Euclidean distance
- Manhattan distance
- Maximum distance
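The Minkowski family covers the other three distances in this list as special cases: $p = 1$ gives the Manhattan distance, $p = 2$ the Euclidean distance, and $p \to \infty$ the maximum distance. A minimal plain-Python sketch (the function name is an assumption):

```python
import math

def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length sequences."""
    if p == math.inf:
        # Limit case: maximum (Chebyshev) distance.
        return max(abs(u - v) for u, v in zip(a, b))
    return sum(abs(u - v) ** p for u, v in zip(a, b)) ** (1 / p)

a, b = [0, 0], [3, 4]
manhattan = minkowski(a, b, 1)
euclid = minkowski(a, b, 2)
maximum = minkowski(a, b, math.inf)
```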
Dynamic Time Warping (DTW)
- DTW computes an optimal alignment (mapping) between two time series that preserves the temporal order of their points.
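The classic dynamic-programming formulation of DTW can be sketched as follows in plain Python. The sketch assumes the absolute difference as local cost and no warping-window constraint; the function name is illustrative.

```python
import math

def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences (unconstrained)."""
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of the first i points of a with the first j points of b.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # repeat b[j-1]
                                     cost[i][j - 1],      # repeat a[i-1]
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]

pattern = [0, 1, 2, 1, 0]
delayed = [0, 0, 1, 2, 1, 0]  # same shape, starts one step later
same_shape = dtw_distance(pattern, delayed)
offset = dtw_distance([0, 0], [1, 1])
```

The delayed copy has distance 0 because DTW may match one point to several points of the other series, which a point-by-point Euclidean comparison cannot do (the two series even differ in length).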
Correlation-based measures
- Autocorrelation-based measure: Quantify similarity using autocorrelation function (ACF).
- Properties of ACF-based distance metrics.
- Not suitable for evaluating single patterns e.g., peaks or direction of trend.
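One way to build an ACF-based measure is to compare the sample autocorrelation vectors of two series. The plain-Python sketch below assumes the Euclidean distance between the first `max_lag` autocorrelations and non-constant input series; the function names are hypothetical.

```python
import math

def acf(series, max_lag):
    """Sample autocorrelation for lags 1..max_lag (series must not be constant)."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    return [
        sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)) / var
        for lag in range(1, max_lag + 1)
    ]

def acf_distance(a, b, max_lag):
    """Euclidean distance between the autocorrelation vectors of two series."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(acf(a, max_lag), acf(b, max_lag))))

wave = [0, 1, 0, -1] * 5
scaled = [10 + 3 * x for x in wave]  # same dynamics, different level and scale
zigzag = [1, -1] * 10                # different dynamics
d_same = acf_distance(wave, scaled, max_lag=4)
d_diff = acf_distance(wave, zigzag, max_lag=4)
```

Level and scale cancel out in the autocorrelation, so `wave` and `scaled` are at distance zero up to rounding; the same property means the measure cannot see an individual peak or the direction of a trend.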
Model-based measures
- Mainly based on ARIMA models
- Training an ARIMA model with optimized parameters (AIC)
- Determine distance between model parameters
- Variants of ARIMA models
- Maharaj distance
- Properties of model-based measures
- Robust to scaling or shifts in the time axis
- Cannot evaluate single patterns (e.g., peaks, direction of trend)
- Additional requirements
Clustering of time series
- Distance measures are key to clustering algorithms.
- Recap: Clustering algorithms.
- Hierarchical clustering
- Centroid-based clustering (k-means, k-medoids)
- Density-based clustering (DBSCAN, OPTICS)
- Dendrogram
- Cluster evaluation
R code (Clustering)
- Provides example code for hierarchical clustering using R packages.
Further Reading
- Provides links to resources for further reading.
Literature (Citations)
- Includes appropriate citations for the relevant research papers and books.
DAT320: Basics: Preprocessing: Missing Values, Imputation and Interpolation
- Missing values in time series
- Global & local replacement
- Interpolation
- Different methods of handling missing values in time series
- Imputation and interpolation methods
- Related packages for R
Upsampling and Downsampling
- Upsampling (increasing resolution)
- Downsampling (decreasing resolution)
- Interpolation
Handling missing values
- Option 1: Remove missing values.
- Option 2: Replace missing values
- Replace by a fixed value, global mean/median, nearest neighbours (carry-forward or carry-back), rolling mean/median
- Linear interpolation
- Spline interpolation
Global & local replacement
- Description of global and local missing value replacement methods
Local missing value replacement: carry-forward
- Last observation carried forward
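Last observation carried forward can be sketched in plain Python as below, assuming missing values are encoded as `None` (an assumption of this example; the function name is illustrative).

```python
def locf(series):
    """Replace each missing value (None) with the most recent observed value."""
    filled, last = [], None
    for x in series:
        if x is not None:
            last = x
        filled.append(last)  # stays None while nothing has been observed yet
    return filled

filled = locf([None, 1, None, None, 4, None])
```

A leading gap cannot be filled this way because there is no earlier observation; carrying the next observation backward is the usual complement.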
Local missing value replacement: rolling average
- Different types of rolling average methods
- Linearly weighted rolling average
- Exponentially weighted rolling average
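The three rolling-average variants differ only in how strongly they weight neighbours that are further away from the gap. The plain-Python sketch below assumes weights of 1 (simple), 1/(d+1) (linear), and 1/2^d (exponential) for a neighbour at distance d, with missing values encoded as `None`; the weighting schemes and names are assumptions for illustration and may differ from what a specific R package uses.

```python
def rolling_impute(series, k, weighting="simple"):
    """Fill each None with a weighted average of observed values at most k steps away."""
    def weight(distance):
        if weighting == "linear":
            return 1 / (distance + 1)   # 1/2, 1/3, 1/4, ...
        if weighting == "exponential":
            return 1 / 2 ** distance    # 1/2, 1/4, 1/8, ...
        return 1.0                      # simple: all neighbours count equally

    filled = list(series)
    for t, x in enumerate(series):
        if x is None:
            num = den = 0.0
            for d in range(1, k + 1):
                for idx in (t - d, t + d):
                    if 0 <= idx < len(series) and series[idx] is not None:
                        num += weight(d) * series[idx]
                        den += weight(d)
            if den > 0:  # leave the gap if no neighbour is observed
                filled[t] = num / den
    return filled

gappy = [1, 2, None, 6, 20]
simple = rolling_impute(gappy, k=2)[2]
linear = rolling_impute(gappy, k=2, weighting="linear")[2]
exponential = rolling_impute(gappy, k=2, weighting="exponential")[2]
```

In this example the distant value 20 pulls the simple average up the most and the exponentially weighted average the least.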
Interpolation
- Linear interpolation method.
- Spline interpolation method
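Linear interpolation fills a gap along the straight line between the nearest observed values on either side. A minimal plain-Python sketch, assuming equally spaced time points and `None` for missing values (the function name is illustrative):

```python
def linear_interpolate(series):
    """Fill interior gaps (None) linearly between the surrounding observed values."""
    filled = list(series)
    known = [i for i, x in enumerate(series) if x is not None]
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            fraction = (i - left) / (right - left)
            filled[i] = series[left] + fraction * (series[right] - series[left])
    return filled  # gaps before the first or after the last observation stay None

filled = linear_interpolate([1, None, None, 4, None, 6])
```

Spline interpolation follows the same idea but fits a smooth piecewise polynomial through the observed points instead of straight segments.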
Further reading
- Provides links to websites and resources for further learning.
DAT320: Basics: Exploratory Analysis of Time Series Data
- Covers various aspects of exploratory time series analysis
- Explores different representations of time series data
- Explores time axis and dimensionality
- Explores Data
Time series data
- Main characteristic: Temporal order
Trend & seasonality, Data representation
- Exploring characteristic features of time series data such as trend, seasonality and representation
Additional references and practical codes
- Additional resources for further research
- Practical code examples in different languages
Other topics in the course materials
- Other topics or aspects of the course material that were not mentioned above
Description
This quiz explores various methods and algorithms used in discord detection within time series data. It covers concepts like brute-force search, distance-based outlier detection, and the characteristics of subsequence and collective outliers. Test your knowledge on the limitations and key properties involved in detecting anomalies in time series data.