DAT320: Handling Missing Values

Questions and Answers

What does MCAR stand for in the context of missing values?

  • Missing Can Avoid Randomness
  • Missing Conditioned on Randomness
  • Missing Completely at Random (correct)
  • Missing Consistently at Random

Which type of missing value means the probability of missingness relates to the process studied?

  • MNAR (correct)
  • MCAR
  • NA
  • MAR

What is the primary function of the R package imputeTS?

  • To handle missing values (correct)
  • To perform statistical tests
  • To visualize time series data
  • To analyze data trends

    Which method is used to replace missing values during upsampling?

    Interpolation

    In downsampling, when is interpolation necessary?

    When a non-integer factor is applied

    What does MAR indicate in the context of missing values?

    Missing at Random

    Which of the following best describes a scenario of MCAR?

    Data missing purely by chance

    What is the purpose of using interpolation in time series data?

    To estimate values for missing data points

    What does the notation $w_i$ represent in the context of linearly weighted moving average calculations?

    The weight assigned to time series values at different lags

    Which formula correctly defines the weighted average based on the information provided?

    $\mu_w(x) = w^T x$

    In the context of rolling statistics for missing value replacement, what value does ℓ represent?

    The range within which to calculate the average

    What is the purpose of using exponentially weighted moving averages?

    To reduce the impact of older observations on the average

    Which of the following correctly describes the weights for $i > ℓ$ in the exponentially weighted moving average?

    $w_i = C \times (1 - \beta)^{i - ℓ - 1}$

    What happens if all observations in the current window are NA?

    The window size is automatically increased until there are two non-NA values.

    What is the default weighting method used for imputation?

    Exponential weighting

    What is the purpose of the maxgap parameter in na_* functions?

    To limit the maximum size of gaps allowed for imputation.

    Which formula correctly interpolates a missing value $x_t$ given its preceding and following non-missing values $x_{s_1}$ and $x_{s_2}$?

    $x_t = \frac{t - s_1}{s_2 - s_1} x_{s_2} + \frac{s_2 - t}{s_2 - s_1} x_{s_1}$

    Which of the following statements regarding interpolation is true?

    Missing values are filled using values from adjacent non-missing observations.

    What is a common issue when performing downsampling with a non-integer factor?

    Creation of missing values

    Which of the following is NOT a method for replacing missing values?

    Randomly deleting data points

    What does the Last Observation Carried Forward (LOCF) method involve?

    Replacing missing data with the prior non-missing observation

    What is a major drawback of using global mean or median for missing value replacement?

    It does not account for variations within time series

    What type of interpolation involves using non-missing neighboring values?

    Linear interpolation

    Which of the following replacement methods considers the time sequence of data?

    Spline interpolation

    What is a potential source of bias when using default values for missing data replacement?

    It may be too high or low compared to actual values

    When would you most likely utilize carry-backward interpolation?

    To fill gaps in historical data

    In time series analysis, why is simply removing missing values usually not an option?

    It disrupts the regular time axis

    Which of the following best describes local missing value replacement techniques?

    They rely on historical data rather than overall data trends

    What does the parameter ℓ represent in the context of rolling average imputation?

    The length of the averaging window

    Which method of imputation can give more weight to recent observations?

    Exponential weighted average

    In the context of the provided content, which statement about α is true?

    α affects the weight assigned in the exponential weighted average

    What does the function na_ma in the package imputeTS primarily focus on?

    Performing rolling average imputation

    What is the significance of the parameter k in the na_ma function?

    It defines the range of rolling average calculations considered

    Which of these visual representations primarily shows imputed values over original data?

    Density plot with separate lines for original and imputed values

    In a plain rolling average with ℓ = 2, how many data points are averaged for each observation?

    Two observations, one on each side

    What kind of data alteration does missing value replacement via rolling averages perform?

    It estimates and replaces missing values

    What impact does using a larger ℓ in rolling averages have on the resulting values?

    It decreases the impact of fluctuations

    Which type of average is less sensitive to outliers in a dataset?

    Linear weighted average

    What is the primary advantage of spline interpolation over linear interpolation?

    It can use higher degree polynomials for more accuracy.

    What degree of polynomial is typically used in spline interpolation?

    Cubic

    Which of the following is NOT a requirement for spline interpolation?

    At least three data points

    In spline interpolation, what must be true about the polynomials at the knots?

    They must meet specific derivative conditions.

    What is one of the main limitations of using high-degree polynomial interpolation?

    It can lead to overfitting.

    What is the purpose of fitting polynomial parameters in spline interpolation?

    To ensure the polynomial accurately intersects the data points.

    How does spline interpolation handle regions of missing data?

    It utilizes multiple piecewise cubic polynomials.

    What does the term ‘knots’ refer to in spline interpolation?

    The points where the polynomial changes form.

    Which of the following approaches is suitable when connecting more than two points in spline interpolation?

    Utilize spline functions with cubic polynomials for segments.

    How does the derivative condition at the knots affect the spline function?

    It guarantees that the splines smoothly connect each segment.

    Study Notes

    Introduction to DAT320

    • Course: DAT320: Basics
    • Topic: Preprocessing: Missing Values, Imputation, and Interpolation
    • Institution: Norwegian University of Life Sciences
    • Semester: Autumn 2024
    • Instructor: Hans Ekkehard Plesser

    Missing Values in Time Series

    • Missing values in time series data require careful handling.
    • Methods include global and local replacement, and interpolation.

    Different Ways Missing Values Occur

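    • Missing Completely At Random (MCAR): The probability that a value is missing is unrelated to any data, observed or unobserved; values are missing purely by chance.
    • Missing At Random (MAR): The probability that a value is missing depends on other observed data, but not on the missing value itself.
    • Missing Not At Random (MNAR): The probability that a value is missing depends on the unobserved value itself, that is, on the process being studied.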
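
The three mechanisms can be made concrete with a toy simulation. The sketch below is a minimal illustration in Python (the course itself works with R), and everything in it is an assumption made for the example: the function name `simulate_missingness`, the `hours` covariate, and all probabilities are hypothetical and not taken from the course material.

```python
import random


def simulate_missingness(temps, hours, mechanism, seed=0):
    """Return a copy of temps in which some entries are replaced by None.

    mechanism selects a toy rule (all probabilities are made up):
      "MCAR": every value has the same chance of being missing
      "MAR":  the chance depends on an observed covariate (hour of day)
      "MNAR": the chance depends on the value that goes missing
    """
    if mechanism not in ("MCAR", "MAR", "MNAR"):
        raise ValueError(f"unknown mechanism: {mechanism}")
    rng = random.Random(seed)
    result = []
    for temp, hour in zip(temps, hours):
        if mechanism == "MCAR":
            p_missing = 0.2
        elif mechanism == "MAR":
            p_missing = 0.5 if hour < 6 else 0.05   # night readings fail more
        else:  # MNAR
            p_missing = 0.6 if temp < 0 else 0.05   # sensor fails when cold
        result.append(None if rng.random() < p_missing else temp)
    return result
```

Under the MAR rule the gaps can be explained by the recorded hour alone, while under the MNAR rule the explanation lies in the temperatures that were never recorded, which is what makes MNAR the hardest case to correct for.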

    Handling Missing Values

    • Removing Missing Values: Not generally useful for time series, because it disrupts the regular time axis.
    • Replacing Missing Values: Substituting a fixed default value, the global mean or median, or a local estimate such as a carried observation or a rolling mean, median, or weighted mean.
    • Interpolation: Using known neighboring values to estimate missing values, either linearly or with splines.

    Handling Missing Values in Time Series

    • Consecutive Missing Values (sub-periods of missing values)
    • Single Missing Values
    • Using packages in R like imputeTS to handle missing values in time series

    Upsampling and Downsampling

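    • Upsampling (increasing resolution): The new time points have no observations, so the resulting missing values must be filled by interpolation.
    • Downsampling (decreasing resolution): Requires interpolation only when the downsampling factor is not an integer.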
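
To see why upsampling produces missing values, the following minimal Python sketch inserts empty slots between observations. The function names `upsample` and `downsample` are hypothetical, missing values are represented by `None`, and only integer factors are handled.

```python
def upsample(values, factor):
    """Increase resolution by an integer factor.

    Between two consecutive observations, factor - 1 new time points
    appear; they have no data yet, so they are marked as None.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    result = []
    for i, value in enumerate(values):
        result.append(value)
        if i < len(values) - 1:
            result.extend([None] * (factor - 1))
    return result


def downsample(values, factor):
    """Decrease resolution by keeping every factor-th observation."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    return values[::factor]
```

For example, `upsample([1, 2, 3], 2)` gives `[1, None, 2, None, 3]`, and the `None` entries are the values that interpolation then has to fill. With an integer factor, `downsample` only selects existing observations, so no interpolation is involved.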

    Global Missing Value Replacement

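    • Default value (often 0 or 1): May be too high or too low compared with the actual values.
    • Global mean/median: Computed from the observed data, but does not account for variation within the time series.
    • Random values: May not be suitable, because they ignore trends and seasonality.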
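
Global replacement can be sketched in a few lines of Python. The function name `fill_global` and its parameters are hypothetical; missing values are represented by `None`.

```python
from statistics import mean, median


def fill_global(values, method="mean", default=0.0):
    """Replace every None with one global value.

    method: "mean" or "median" of the observed values, or "default"
    to use the fixed value passed as default.
    """
    observed = [v for v in values if v is not None]
    if method == "default":
        fill = default
    elif method == "mean":
        fill = mean(observed)
    elif method == "median":
        fill = median(observed)
    else:
        raise ValueError(f"unknown method: {method}")
    return [fill if v is None else v for v in values]
```

Every gap receives the same number regardless of where it sits in the series, which is exactly why this approach ignores local variation, trends, and seasonality.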

    Local Missing Value Replacement - Carry

    • Last Observation Carried Forward (LOCF): Replaces missing values with the last observed value.
    • Next Observation Carried Backward (NOCB): Replaces missing values with the next observed value.
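
Both carry methods fit in a short Python sketch. The function names `locf` and `nocb` are chosen for this example, and missing values are represented by `None`.

```python
def locf(values):
    """Last Observation Carried Forward.

    Leading gaps stay None because there is nothing to carry yet.
    """
    result = []
    last_seen = None
    for value in values:
        if value is not None:
            last_seen = value
        result.append(last_seen)
    return result


def nocb(values):
    """Next Observation Carried Backward: LOCF on the reversed series."""
    return locf(values[::-1])[::-1]
```

For the series `[None, 1, None, None, 4, None]`, `locf` returns `[None, 1, 1, 1, 4, 4]` and `nocb` returns `[1, 1, 4, 4, 4, None]`. Each method leaves one end of the series unfilled, so the two are sometimes applied one after the other.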

    Local Missing Value Replacement - Rolling Average

    • Rolling average (moving average): Replaces a missing value with the mean of its local neighbors within a window.
    • Variants use the rolling median or a weighted average instead of the plain mean.
    • Linearly and exponentially weighted moving averages reduce the influence of observations that are farther away from the missing value.
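
A rolling average imputation with the three weighting schemes might look like the Python sketch below. It is not the imputeTS implementation: the function names are hypothetical, the window is assumed to reach `k` observations to each side of the gap, and the weights `1/(d+1)` (linear) and `1/2**d` (exponential) for a neighbor at distance `d` are assumptions of this sketch that may differ from the constants used in the course. The window grows when it holds fewer than two observed values, as described for the case of an all-NA window.

```python
def _weight(distance, weighting):
    """Weight of a neighbor that is `distance` steps from the gap."""
    if weighting == "simple":
        return 1.0
    if weighting == "linear":
        return 1.0 / (distance + 1)
    if weighting == "exponential":
        return 1.0 / 2 ** distance
    raise ValueError(f"unknown weighting: {weighting}")


def impute_moving_average(values, k=2, weighting="exponential"):
    """Fill each None with a weighted mean of observed neighbors."""
    n = len(values)
    result = list(values)
    for t, value in enumerate(values):
        if value is not None:
            continue
        width = k
        while True:
            neighbors = [
                (abs(s - t), values[s])
                for s in range(max(0, t - width), min(n, t + width + 1))
                if values[s] is not None
            ]
            # Stop once two observed values are found or the window
            # already covers the whole series.
            if len(neighbors) >= 2 or width >= n:
                break
            width += 1
        if neighbors:
            total = sum(_weight(d, weighting) for d, _ in neighbors)
            result[t] = sum(_weight(d, weighting) * x for d, x in neighbors) / total
    return result
```

Only originally observed values enter each average, so an imputed value never feeds into the estimate for a neighboring gap.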

    Interpolation

    • Linear Interpolation: A missing value $x_t$ is placed on the straight line between the previous non-missing value $x_{s_1}$ and the next non-missing value $x_{s_2}$: $x_t = \frac{t - s_1}{s_2 - s_1} x_{s_2} + \frac{s_2 - t}{s_2 - s_1} x_{s_1}$.
    • Issues with linear interpolation: Straight lines are not always sufficient to capture the behavior of the time series.
    • Spline Interpolation: An alternative that connects the known values with piecewise polynomial functions instead of straight lines.
    • Choosing an interpolation method: Linear interpolation is simple, while spline interpolation gives a smooth curve and can use higher-degree polynomials.
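
Linear interpolation weights the two neighboring observations by how close the missing time point is to each of them. The Python sketch below applies that rule to every gap enclosed by two observed values; the function name `interpolate_linear` is hypothetical, time points are assumed to be the list indices, and missing values are represented by `None`.

```python
def interpolate_linear(values):
    """Fill gaps enclosed by two observed values by linear interpolation.

    For a gap at index t between observed indices s1 < t < s2:
        x_t = (t - s1) / (s2 - s1) * x_s2 + (s2 - t) / (s2 - s1) * x_s1
    Leading and trailing gaps have only one neighbor and stay None.
    """
    result = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for s1, s2 in zip(known, known[1:]):
        for t in range(s1 + 1, s2):
            result[t] = ((t - s1) * values[s2] + (s2 - t) * values[s1]) / (s2 - s1)
    return result
```

For example, `interpolate_linear([1.0, None, None, 4.0])` returns `[1.0, 2.0, 3.0, 4.0]`.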

    Spline Interpolation

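    • Spline interpolation fits piecewise polynomials, typically cubic, between the known data points.
    • The points where one polynomial segment meets the next are called knots.
    • At the knots, the segments must satisfy derivative conditions so that they connect smoothly.
    • The polynomial parameters are fitted so that the curve passes through the data points.
    • Polynomial degree matters: high-degree polynomial interpolation can lead to overfitting.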
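
As a concrete instance, the Python sketch below builds a natural cubic spline, one common variant in which the second derivative is set to zero at both end knots. Whether the course uses this boundary condition is an assumption, and the function names are hypothetical. The observed time points serve as knots, and missing values are represented by `None`.

```python
def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs, ys).

    On segment j the spline is
        S_j(x) = ys[j] + b[j]*dx + c[j]*dx**2 + d[j]*dx**3,  dx = x - xs[j],
    with matching first and second derivatives at the inner knots.
    """
    n = len(xs) - 1
    if n < 1:
        raise ValueError("need at least two knots")
    h = [xs[i + 1] - xs[i] for i in range(n)]
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3 * (ys[i + 1] - ys[i]) / h[i] - 3 * (ys[i] - ys[i - 1]) / h[i - 1]
    # Forward sweep of the tridiagonal system for the c coefficients.
    diag, mu, z = [1.0] * (n + 1), [0.0] * (n + 1), [0.0] * (n + 1)
    for i in range(1, n):
        diag[i] = 2 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / diag[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / diag[i]
    # Back substitution; c[n] = 0 and c[0] = 0 are the natural end conditions.
    b, c, d = [0.0] * n, [0.0] * (n + 1), [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2 * c[j]) / 3
        d[j] = (c[j + 1] - c[j]) / (3 * h[j])

    def evaluate(x):
        j = n - 1
        for i in range(n):
            if x <= xs[i + 1]:
                j = i
                break
        dx = x - xs[j]
        return ys[j] + b[j] * dx + c[j] * dx ** 2 + d[j] * dx ** 3

    return evaluate


def interpolate_spline(values):
    """Fill gaps between the first and last observed value with the spline."""
    known = [i for i, v in enumerate(values) if v is not None]
    spline = natural_cubic_spline(known, [values[i] for i in known])
    result = list(values)
    for t in range(known[0], known[-1] + 1):
        if result[t] is None:
            result[t] = spline(t)
    return result
```

With only two knots the spline reduces to a straight line, so the result matches linear interpolation. With more knots the curve bends smoothly through the data instead of changing direction abruptly at each observation.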

    Software details

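    • R package imputeTS: Handles missing values in time series data.
    • na_ma: Function in imputeTS for rolling average imputation; the parameter k sets the range of the averaging window, and exponential weighting is the default.
    • maxgap: Parameter of the na_* functions that limits the size of the gaps that are imputed, which prevents imputation of long gaps.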
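
The effect of a gap limit can be illustrated without R. The Python sketch below is not the imputeTS code; the function names `gap_runs` and `apply_maxgap` are hypothetical. It assumes that runs of consecutive missing values longer than `maxgap` are left missing, while shorter runs keep their imputed values.

```python
def gap_runs(values):
    """Return (start, length) for each run of consecutive None entries."""
    runs = []
    start = None
    for i, value in enumerate(values):
        if value is None and start is None:
            start = i
        elif value is not None and start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:
        runs.append((start, len(values) - start))
    return runs


def apply_maxgap(original, imputed, maxgap):
    """Undo the imputation for every gap that is longer than maxgap."""
    result = list(imputed)
    for start, length in gap_runs(original):
        if length > maxgap:
            result[start:start + length] = [None] * length
    return result
```

The helper `gap_runs` also separates single missing values (length 1) from sub-periods of consecutive missing values, which is a useful first check before choosing an imputation method.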

    Quiz Team

    Description

    This quiz explores methods for preprocessing missing values in time series data, focusing on imputation and interpolation techniques. Understand the different scenarios in which missing values occur and how to effectively handle them in data analysis. Perfect for students of DAT320 at the Norwegian University of Life Sciences.
