Discord Detection in Time Series Analysis

Questions and Answers

What is the main method used in brute-force search for discord detection?

  • Using heuristics for optimization
  • Comparing all pairs of subsets to each other (correct)
  • Employing machine learning techniques
  • Analyzing time series trends
Which algorithm is mentioned as a method that utilizes heuristics in discord detection?

  • Random Forest
  • HOT-SAX algorithm (correct)
  • Support Vector Machines
  • K-means clustering
What is a limitation of the discord detection process?

  • Always requires clustering
  • Utilizes a predefined window width δ (correct)
  • Requires a large dataset
  • Delivers multiple unusual sequences
How is the reference subsequence $\bar{x}$ obtained in distance-based outlier detection?

  • Using a cluster from the same time series

In the context of distance-based outlier detection, what condition indicates an outlier?

  • When $d(x_{T_i}, \bar{x}) > \tau$

What defines a time series subsequence in the context of discord detection?

  • A sliding window defined by specific time points and a fixed interval

In discord detection, what does the discord subset $T_k$ represent?

  • The subset which is most different to its most similar subset

What is the purpose of finding the discord subset in a time series?

  • To determine patterns that are unrelated to the majority of the data

Which mathematical operation is used to optimize the selection of the discord subset $T_k$?

  • Minimum distance between dissimilar subsequences

How is the sliding window defined for extracting subsequences in discord detection?

  • $T_k = [t_k - \delta/2, t_k + \delta/2)$ for some $t_k$ and fixed $\delta$

What is one key property of collective outliers in time series data?

  • Length of the subsequence

Which of these represents different approaches to detecting collective outliers?

  • Distance-based approaches

What aspect of subsequence outliers relates to their repetition patterns?

  • Periodicity of subsequence outliers

Which option correctly differentiates the types of subsequences in collective outlier detection?

  • Fixed-length versus variable-length

What is discord detection primarily used for?

  • Detecting anomalies in subsequences

What do model-based approaches involve in the context of collective outliers?

  • Creating models for anomaly detection

In collective outlier detection, what does the representation of subsequences pertain to?

  • The models and transformations applied

Which of the following properties is not typically associated with collective outliers?

  • Temperature regulations

What method is utilized when referencing the same time series for clustering?

  • Clustering

Which approach focuses on utilizing external time series?

  • Feature-based classification

What does a reference value relate to in distance-based approaches?

  • A benchmark for comparison

In the context of distance-based approaches, what could be used instead of seasonal average?

  • Median

What does collective outlier detection typically rely on?

  • Mean absolute deviation from a reference

Which approach is NOT mentioned in relation to time series?

  • Seasonal decomposition

What is a significant advantage of referencing data patterns externally?

  • Enhanced clustering capabilities

Which of the following is NOT a distance-based approach?

  • Feature-based classification

What does the expression $\arg\max_{T_i} \min_{T_j} d(x_{T_i}, x_{T_j})$ represent in the context of discord detection?

  • The maximum value of the minimum distance between non-overlapping subsets

In discord detection, what role does the expression $d(x_{T_i}, x_{T_j})$ serve?

  • It acts as a 1-nearest neighbor distance.

What condition must the subsets $T_i$ and $T_j$ satisfy in the given expression?

  • They must be non-overlapping.

What does the term 'discord detection' refer to in this context?

  • Finding anomalies within non-overlapping data subsets.

What implication does the term 'single linkage' have in the context of distance measurement?

  • It reflects the shortest distance between two clusters.

How can one effectively find discord subsets based on the information provided?

  • By identifying minimum distances in non-overlapping subsets.

Which of the following best describes the method associated with discord detection?

  • It compares non-overlapping subsets to find anomalies.

Which statement is NOT true regarding the non-overlapping subsets in discord detection?

  • They can have identical elements.

What is a critical step in the model-based approach for predictive modeling?

  • Train the model on historical data.

In the context of model-based approaches, what does the expression $|x_i - \hat{x}_i| > \tau$ represent?

  • The distance between the actual and predicted values exceeding a threshold.

What does the term 'δ steps into the future' refer to in model-based forecasting?

  • The number of future points predicted based on the model.

How does the model-based approach calculate the prediction error?

  • By comparing predicted values to a reference in a specified interval.

What type of model is indicated by 'seasonal naive model' in the context of model-based approaches?

  • A model that predicts based on past seasonal trends.

What is the primary focus of the model-based approach in predictive analysis?

  • Utilizing previous data effectively to predict future outcomes.

What is likely indicated by the reference to 'window = 148' in the figure mentioned?

  • The length of the historical data used for analysis.

Which statement about the model-based approach is correct?

  • It requires training on historical data to predict future values.

    Study Notes

    Norwegian University of Life Sciences

    • Norwegian University of Life Sciences (NMBU) is a public university.
    • It is located in Norway.

    DAT320: Outlier detection

    • Focuses on detecting collective outliers in time series data.
    • Course code: DAT320
    • Instructor: Kristian Hovde Liland
    • Autumn 2024

    Collective outliers in time series

    • Key properties of collective outliers (subsequences):
      • Length of the subsequence
      • Fixed-length versus variable-length
      • Representation of the subsequence
      • Models, transformations
      • Periodicity of subsequence outliers
      • Non-periodic sequences
      • Periodic sequences
    • Based on [Blázquez-García et al., 2021] and [Gupta et al., 2014]

    Approaches for detecting collective outliers

    • Discord detection
    • Distance-based approaches
    • Model-based approaches

    Discord detection

    • Concept: determine "most unusual subsequence" (discord).
    • Define time series subsequences by a sliding window $T_k = [t_k - \delta/2, t_k + \delta/2)$ for some $t_k$ and fixed $\delta$.
    • Discord subset $T_k$: the subset which is most different to its most similar subset.
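The brute-force search described in the quiz (compare all pairs of subsequences) can be sketched as follows. This is a minimal illustration, not the course's implementation: the function names, the Euclidean distance, and the toy series are assumptions, and windows are indexed by their start position rather than their centre.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

def find_discord(series, width):
    """Return (start, nn_distance) of the window whose nearest
    non-overlapping neighbour is farthest away (arg max of the min distance)."""
    starts = range(len(series) - width + 1)
    best_start, best_dist = None, -1.0
    for i in starts:
        nn = math.inf
        for j in starts:
            if abs(i - j) >= width:  # only compare non-overlapping windows
                nn = min(nn, euclidean(series[i:i + width], series[j:j + width]))
        if nn != math.inf and nn > best_dist:
            best_start, best_dist = i, nn
    return best_start, best_dist

# Toy series (hypothetical): a regular 0/1 pattern with one spike at index 7.
series = [0, 1, 0, 1, 0, 1, 0, 5, 0, 1, 0, 1, 0, 1]
start, dist = find_discord(series, width=3)
print(start, dist)  # the returned window covers the spike
```

The double loop makes the cost quadratic in the number of windows, which is why heuristic methods such as HOT-SAX are used in practice.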

    Distance-based approaches

    • Discord detection: pairwise comparison between subsequences.
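    • Concept of distance-based outlier detection: comparison of each subsequence to a reference ("normal" subsequence $\bar{x}$)
    • $x_{T_i}$ is an outlier if $d(x_{T_i}, \bar{x}) > \tau$
    • How to obtain the reference $\bar{x}$? Reference from the same time series → clustering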
    • Reference from external time series → feature-based classification/clustering, or dictionary of patterns
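A small sketch of the threshold rule $d(x_{T_i}, \bar{x}) > \tau$. The notes obtain the reference by clustering; here, as a simpler stand-in and purely as an assumption, the reference is the element-wise median over seasons (the quiz mentions the median as an alternative to the seasonal average). Function and variable names are hypothetical.

```python
import math
from statistics import median

def seasonal_outliers(series, period, tau):
    """Split the series into consecutive seasons, build a reference season
    from element-wise medians, and flag seasons farther than tau from it."""
    seasons = [series[i:i + period]
               for i in range(0, len(series) - period + 1, period)]
    reference = [median(s[k] for s in seasons) for k in range(period)]
    flagged = []
    for idx, season in enumerate(seasons):
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(season, reference)))
        if dist > tau:
            flagged.append(idx)
    return reference, flagged

# Toy data (hypothetical): four seasons of length 3, the third is disturbed.
series = [1, 2, 3, 1, 2, 3, 1, 9, 3, 1, 2, 3]
reference, flagged = seasonal_outliers(series, period=3, tau=2.0)
print(reference, flagged)
```

The median is used instead of the mean so that the disturbed season does not pull the reference toward itself.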

    Model-based approaches

    • Related to distance-based approach with reference from same time series (history).
    • Concept: train predictive model and measure distance between prediction and reference in interval.
    • Train model on $\{x_1, \dots, x_t\}$.
    • Predict $\delta$ steps into the future: $\hat{x}_{t+1}, \dots, \hat{x}_{t+\delta}$.
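The train, predict, and compare steps can be illustrated with the seasonal naive model mentioned in the quiz. This is a sketch under assumptions: the Euclidean distance, the threshold value, and all names are illustrative choices, not taken from the course code.

```python
import math

def seasonal_naive_forecast(history, period, horizon):
    """Forecast by repeating the last observed season."""
    last_season = history[-period:]
    return [last_season[k % period] for k in range(horizon)]

def is_collective_outlier(history, observed, period, tau):
    """Compare the observed interval with the forecast for that interval."""
    forecast = seasonal_naive_forecast(history, period, len(observed))
    dist = math.sqrt(sum((x - f) ** 2 for x, f in zip(observed, forecast)))
    return dist > tau, dist

history = [1, 2, 3, 1, 2, 3]          # "training" data up to time t
normal = is_collective_outlier(history, [1, 2, 3], period=3, tau=3.0)
unusual = is_collective_outlier(history, [1, 6, 7], period=3, tau=3.0)
print(normal, unusual)
```

Any forecasting model could replace the seasonal naive one; only the forecast function would change.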

    DAT320: Point outliers

    • Approaches for point outliers in time series
      • Temporal windowing
      • Model-based approaches
      • Distribution-based approaches
      • Multivariate time series
    • Outliers in time series (uni/multivariate)
      • Ignore temporal component
      • Temporal windowing
      • Customized outlier detectors
      • Univariate vs multivariate time series
      • Global vs local (contextual) outlier
      • Based on [Blázquez-García et al., 2021]

    Time series as random samples

    • Baseline concept: ignore temporal dependencies.
    • Works well for global outliers.
    • Easy to implement.
    • Is not capable of detecting contextual outliers.
    • May be distorted by trend & seasonalities.
    • Same methodology as for random samples (uni-/multivariate data)

    Temporal windowing

    • Concept: divide the time axis into windows $[t - \delta/2, t + \delta/2]$ with $\delta > 0$.
    • Apply a non-temporal method to each window.
    • Works well for seasonal patterns.
    • Same methodology as for random samples (uni-/multivariate data).
    • Hard to determine the window size δ
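A minimal sketch of temporal windowing, assuming a simple non-temporal rule inside each window: a point is flagged when it deviates from the window median by more than a fixed threshold. The rule, the names, and the toy data are assumptions chosen for illustration.

```python
from statistics import median

def windowed_outliers(series, half_width, tau):
    """Flag x_t if it deviates from the median of its centred window
    [t - half_width, t + half_width] by more than tau."""
    flagged = []
    for t, x_t in enumerate(series):
        lo = max(0, t - half_width)
        hi = min(len(series), t + half_width + 1)
        if abs(x_t - median(series[lo:hi])) > tau:
            flagged.append(t)
    return flagged

# Toy series (hypothetical): linear trend 1..20 with one contextual outlier.
series = list(range(1, 21))
series[4] = 15  # within the global range, but unusual for its neighbourhood
flagged = windowed_outliers(series, half_width=2, tau=3)
print(flagged)
```

The value 15 is unremarkable globally, so a method that ignores the temporal component would miss it; the local window exposes it.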

    Density-based approaches

    • Concept: outliers have only few neighbors.
      • $x_t$ is an outlier $\iff |\{x \in \{x_{t-\delta/2}, \dots, x_{t+\delta/2}\} : d(x, x_t) \le \varepsilon\}| \le \tau$.
    • Applied in temporal window
      • KNN distance
      • DBSCAN
      • LOF
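The neighbour-count rule can be sketched directly. This is an illustration only: it uses the absolute difference as distance for a univariate series and excludes $x_t$ itself from the count, both of which are assumptions.

```python
def density_outliers(series, half_width, eps, tau):
    """Flag x_t if at most tau other points in its temporal window
    lie within distance eps of it."""
    flagged = []
    for t, x_t in enumerate(series):
        lo = max(0, t - half_width)
        hi = min(len(series), t + half_width + 1)
        neighbours = sum(
            1 for s in range(lo, hi)
            if s != t and abs(series[s] - x_t) <= eps
        )
        if neighbours <= tau:
            flagged.append(t)
    return flagged

series = [1.0, 1.1, 0.9, 1.0, 5.0, 1.1, 1.0, 0.9, 1.0]  # hypothetical data
flagged = density_outliers(series, half_width=3, eps=0.5, tau=1)
print(flagged)
```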

    Model-based approaches

    • Concept: time series models are not able to describe outliers well.
      • $x_t$ is an outlier $\iff |x_t - \hat{x}_t| > \tau$.
    • Estimation-based
      • Model is trained based on all values
      • Outliers produce large residuals
    • Prediction-based
      • Model is trained only on the history $\{x_1, \dots, x_{t-1}\}$
      • Outliers produce inaccurate predictions $\hat{x}_t$
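A sketch of the prediction-based variant. The predictor is a deliberately simple assumption (the mean of the last k observations) standing in for a real time series model such as ARIMA; the names and threshold are hypothetical.

```python
def prediction_outliers(series, k, tau):
    """Predict x_t as the mean of the previous k values and flag x_t
    when the absolute prediction error exceeds tau."""
    flagged = []
    for t in range(k, len(series)):
        x_hat = sum(series[t - k:t]) / k   # uses only the history before t
        if abs(series[t] - x_hat) > tau:
            flagged.append(t)
    return flagged

series = [10, 10, 10, 10, 20, 10, 10, 10, 10]  # hypothetical data
flagged = prediction_outliers(series, k=3, tau=5)
print(flagged)
```

Note that the outlier enters the history of the following predictions, so the errors right after it are inflated; with a smaller threshold those points would be flagged as well.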

    Distance-based approaches

    • Based on distance from a reference model
    • Reference from the original time series → apply clustering
    • Reference from an external source (e.g. another dataset) → feature-based classification or clustering, or a dictionary of patterns

    R code

    • Provides code snippets for outlier detection using various methods.
    • Includes libraries for time series analysis.

    Literature

    • Includes references to research papers and books that discuss time series analysis, outlier detection and other relevant topics.

    R code (Forecasting)

    • Provides R code examples for forecasting.
    • Uses various functions & packages such as library(forecast), library(datasets), library(imputeTS), etc.

    R code

    • Includes various forecasting models: ARIMA, ARIMAX, dynamic regression, and others

    Forecasting using ARIMA

    • Covers how to handle non-stationarity when predicting future values of target variables.

    ARIMA & dynamic regression model

    • ARIMAX is an ARIMA with exogenous inputs
    • Variable of interest: $x_t$
      • Additional variable with influence on $x_t$ (covariate, exogenous input): $z_t$

    Dynamic regression with ARIMA errors

    • Alternative: Combine "classical" regression setup with ARIMA error model
    • Replaces the i.i.d. model error assumption of classical regression with an ARIMA error model

    Distributed lag model (DLM)

    • Distributed lag model assumes that $x_t$ is exclusively described by the history of $z_t$
    • Leads to simplification compared to dynamic regression or to the standard linear regression setup

    VARIMA model

    • Multivariate forecasting using ARIMA

    Causality versus correlation

    • Correlation indicates a relationship but not necessarily causation.

    Granger causality

    • Concept
    • Usefulness

    R code (Granger Causality Testing)

    • Provides example code for Granger causality testing

    Further reading

    • Provides links to various resources for further learning.

    Literature (including references)

    • Lists research papers and books on time series analysis and forecasting.

    DAT320: Segmentation | Classification | Clustering: Distance-based Methods

    • Classification & clustering problems in time series data
      • Comparing time series segments to each other (clustering)
      • Modeling a (time-independent) target variable from a time series segment (classification)
    • Time series segmentation & change-point detection
    • Approaches
      • Distance-based time series clustering and classification
      • Feature-based
      • Model-based

    Distance measures

    • How to quantify "distance" or "similarity"?
    • Recap: Distance metrics.
    • Minkowski distance
    • Euclidean distance
    • Manhattan distance
    • Maximum distance
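The listed metrics are all members of the Minkowski family $d_p(a, b) = \left(\sum_i |a_i - b_i|^p\right)^{1/p}$, with Manhattan ($p = 1$), Euclidean ($p = 2$) and maximum distance ($p \to \infty$) as special cases. A short sketch, with a hypothetical function name:

```python
import math

def minkowski(a, b, p):
    """Minkowski distance of order p between equal-length sequences;
    p = math.inf gives the maximum (Chebyshev) distance."""
    diffs = [abs(u - v) for u, v in zip(a, b)]
    if p == math.inf:
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1 / p)

a, b = [0, 0], [3, 4]
manhattan = minkowski(a, b, 1)
euclid = minkowski(a, b, 2)
maximum = minkowski(a, b, math.inf)
print(manhattan, euclid, maximum)
```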

    Dynamic Time Warping (DTW)

    • DTW calculates mapping from one time series to another time series, while also accounting for the temporal order.
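A minimal sketch of the classic dynamic-programming formulation of DTW, assuming the absolute difference as local cost and no warping-window constraint. The function name is hypothetical.

```python
import math

def dtw_distance(a, b):
    """Cost of the cheapest order-preserving alignment between a and b."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # extend the best of: step in a, step in b, or step in both
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]

a = [0, 1, 2, 1, 0]
b = [0, 0, 1, 2, 1, 0]   # same shape, stretched at the start
same_shape = dtw_distance(a, b)
different = dtw_distance([0, 0], [1, 1])
print(same_shape, different)
```

Unlike the Minkowski distances, DTW accepts series of different lengths, because one point may be matched to several points of the other series.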

    Correlation-based measures

    • Autocorrelation-based measure: Quantify similarity using autocorrelation function (ACF).
    • Properties of ACF-based distance metrics.
    • Not suitable for evaluating single patterns e.g., peaks or direction of trend.
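One way to build an autocorrelation-based measure is to compare the first few sample ACF values of two series. The sketch below assumes the Euclidean distance between ACF vectors and a hypothetical choice of maximum lag.

```python
import math

def acf(series, max_lag):
    """Sample autocorrelation for lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    c0 = sum((x - mean) ** 2 for x in series)
    return [
        sum((series[t] - mean) * (series[t + k] - mean) for t in range(n - k)) / c0
        for k in range(1, max_lag + 1)
    ]

def acf_distance(a, b, max_lag):
    """Euclidean distance between the ACF vectors of two series."""
    return math.sqrt(sum((u - v) ** 2
                         for u, v in zip(acf(a, max_lag), acf(b, max_lag))))

a = [1, 3, 2, 5, 4, 6, 5, 8]
scaled = [2 * x + 5 for x in a]            # shifted and rescaled copy of a
alternating = [1, -1, 1, -1, 1, -1, 1, -1]
d_scaled = acf_distance(a, scaled, max_lag=3)
d_alt = acf_distance(a, alternating, max_lag=3)
print(d_scaled, d_alt)
```

The shifted and rescaled copy has (up to rounding) distance zero, which illustrates why such measures capture dependence structure but cannot distinguish level, amplitude, or individual peaks.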

    Model-based measures

    • Mainly based on ARIMA models
    • Training an ARIMA model with optimized parameters (AIC)
    • Determine distance between model parameters
    • Variants of ARIMA models
    • Maharaj distance
    • Properties of model-based measures
    • Robust to scaling or shifts in the time axis
    • Cannot evaluate single patterns (e.g., peaks, direction of trend)
    • Additional requirements

    Clustering of time series

    • Distance measures are key to clustering algorithms.
    • Recap: Clustering algorithms.
      • Hierarchical clustering
      • Centroid-based clustering (k-means, k-medoids)
      • Density-based clustering (DBSCAN, OPTICS)
    • Dendrogram
    • Cluster evaluation

    R code (Clustering)

    • Provides example code for hierarchical clustering using R packages.

    Further Reading

    • Provides links to resources for further reading.

    Literature (Citations)

    • Includes appropriate citations for the relevant research papers and books.

    DAT320: Basics: Preprocessing: Missing Values, Imputation and Interpolation

    • Missing values in time series
      • Global & local replacement
      • Interpolation
    • Different methods of handling missing values in time series
    • Imputation and interpolation methods
    • Related packages for R

    Upsampling and Downsampling

    • Upsampling (increasing resolution)
    • Downsampling (decreasing resolution)
    • Interpolation

    Handling missing values

    • Option 1: Remove missing values.
    • Option 2: Replace missing values
      • Replace by a fixed value, global mean/median, nearest neighbours (carry-forward or carry-back), rolling mean/median
      • Linear interpolation
      • Spline interpolation
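Two of the replacement options, carry-forward and linear interpolation, sketched in plain Python. Missing values are represented by `None`, which is an assumption of this example; the function names are hypothetical.

```python
def locf(series):
    """Last observation carried forward; leading gaps stay None."""
    out, last = [], None
    for x in series:
        if x is not None:
            last = x
        out.append(last)
    return out

def linear_interpolate(series):
    """Fill gaps between two known values on a straight line;
    gaps at the start or end are left as None."""
    out = list(series)
    known = [i for i, x in enumerate(series) if x is not None]
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            w = (i - left) / (right - left)
            out[i] = series[left] + w * (series[right] - series[left])
    return out

series = [2.0, None, 4.0, None, 8.0, None]
carried = locf(series)
interpolated = linear_interpolate(series)
print(carried, interpolated)
```

Carry-forward can fill a trailing gap because it only needs the past, whereas interpolation needs a known value on both sides.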

    Global & local replacement

    • Description of global and local missing value replacement methods

    Local missing value replacement: Carry

    • Last observation carried forward

    Local missing value replacement: Rolling average

    • Different types of rolling average methods
    • Linearly weighted rolling average
    • Exponentially weighted rolling average
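A sketch of replacement by an exponentially weighted average. It assumes a one-sided variant that only looks at past observations; implementations in R packages may use a two-sided window instead. The smoothing factor `alpha` and the function name are hypothetical.

```python
def ewma_fill(series, alpha):
    """Replace None by the exponentially weighted average of the
    observed values seen so far (recent values weigh more)."""
    out, level = [], None
    for x in series:
        if x is None:
            out.append(level)          # stays None if nothing was observed yet
        else:
            level = x if level is None else alpha * x + (1 - alpha) * level
            out.append(x)
    return out

filled = ewma_fill([2.0, 4.0, None, 8.0], alpha=0.5)
print(filled)
```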

    Interpolation

    • Linear interpolation method.
    • Spline interpolation method

    Further reading

    • Provides links to websites and resources for further learning.

    DAT320: Basics: Exploratory Analysis of Time Series Data

    • Covers various aspects of exploratory time series analysis
    • Explores different representations of time series data
    • Explores time axis and dimensionality
    • Explores Data

    Time series data

    • Main characteristic: Temporal order

    Trend & seasonality, Data representation

    • Exploring characteristic features of time series data such as trend, seasonality and representation

    Additional references and practical codes

    • Additional resources for further research
    • Practical code examples in different languages

    Other topics in the course materials

    • Other topics or aspects of the course material that were not mentioned above

    Description

    This quiz explores various methods and algorithms used in discord detection within time series data. It covers concepts like brute-force search, distance-based outlier detection, and the characteristics of subsequence and collective outliers. Test your knowledge on the limitations and key properties involved in detecting anomalies in time series data.
