Questions and Answers
What does MCAR stand for in the context of missing values?
Which type of missing value means the probability of missingness relates to the process studied?
What is the primary function of the R package imputeTS?
Which method is used to replace missing values during upsampling?
In downsampling, when is interpolation necessary?
What does MAR indicate in the context of missing values?
Which of the following best describes a scenario of MCAR?
What is the purpose of using interpolation in time series data?
What does the notation $w_i$ represent in the context of linearly weighted moving average calculations?
Which formula correctly defines the weighted average based on the information provided?
In the context of rolling statistics for missing value replacement, what value does ℓ represent?
What is the purpose of using exponentially weighted moving averages?
Which of the following correctly describes the weights for $i > ℓ$ in the exponentially weighted moving average?
What happens if all observations in the current window are NA?
What is the default weighting method used for imputation?
What is the purpose of the maxgap parameter in na_* functions?
Which formula correctly interpolates a missing value $x_t$ given its preceding and following non-missing values?
Which of the following statements regarding interpolation is true?
What is a common issue when performing downsampling with a non-integer factor?
Which of the following is NOT a method for replacing missing values?
What does the Last Observation Carried Forward (LOCF) method involve?
What is a major drawback of using global mean or median for missing value replacement?
What type of interpolation involves using non-missing neighboring values?
Which of the following replacement methods considers the time sequence of data?
What is a potential source of bias when using default values for missing data replacement?
When would you most likely utilize carry-backward interpolation?
In time series analysis, why is simply removing missing values usually not an option?
Which of the following best describes local missing value replacement techniques?
What does the parameter ℓ represent in the context of rolling average imputation?
Which method of imputation can give more weight to recent observations?
In the context of the provided content, which statement about α is true?
What does the function na_ma in the package imputeTS primarily focus on?
What is the significance of the parameter k in the na_ma function?
Which of these visual representations primarily shows imputed values over original data?
In a plain rolling average with ℓ = 2, how many data points are averaged for each observation?
What kind of data alteration does missing value replacement via rolling averages perform?
What impact does using a larger ℓ in rolling averages have on the resulting values?
Which type of average is less sensitive to outliers in a dataset?
What is the primary advantage of spline interpolation over linear interpolation?
What degree of polynomial is typically used in spline interpolation?
Which of the following is NOT a requirement for spline interpolation?
In spline interpolation, what must be true about the polynomials at the knots?
What is one of the main limitations of using high-degree polynomial interpolation?
What is the purpose of fitting polynomial parameters in spline interpolation?
How does spline interpolation handle regions of missing data?
What does the term ‘knots’ refer to in spline interpolation?
Which of the following approaches is suitable when connecting more than two points in spline interpolation?
How does the derivative condition at the knots affect the spline function?
Study Notes
Introduction to DAT320
- Course: DAT320: Basics
- Topic: Preprocessing: Missing Values, Imputation, and Interpolation
- Institution: Norwegian University of Life Sciences
- Semester: Autumn 2024
- Instructor: Hans Ekkehard Plesser
Missing Values in Time Series
- Missing values in time series data require careful handling.
- Methods include global and local replacement, and interpolation.
Different Ways Missing Values Occur
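- Missing Completely At Random (MCAR): The probability of missingness is unrelated to the data, whether observed or unobserved.
- Missing At Random (MAR): The probability of missingness depends on other observed data, but not on the missing value itself.
- Missing Not At Random (MNAR): The probability of missingness depends on the missing value itself, that is, on the process studied.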
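The three mechanisms can be illustrated with a small simulation. This is only a sketch: the function name `simulate_missingness`, the temperature and humidity variables, and all probabilities and thresholds are hypothetical choices made for illustration and are not taken from the course material.

```python
import random

def simulate_missingness(temps, humidity, mechanism, rng):
    """Return a copy of temps in which some entries are replaced by None.

    All probabilities and thresholds below are made-up illustration values.
    """
    out = []
    for temp, hum in zip(temps, humidity):
        if mechanism == "MCAR":
            p = 0.2                        # same probability for every observation
        elif mechanism == "MAR":
            p = 0.6 if hum > 80 else 0.05  # depends on another, observed variable
        elif mechanism == "MNAR":
            p = 0.6 if temp < 0 else 0.05  # depends on the value that goes missing
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        out.append(None if rng.random() < p else temp)
    return out

rng = random.Random(42)  # fixed seed so that repeated runs give the same result
temps = [-5.0, 3.0, 10.0, -1.0, 7.0]
humidity = [90.0, 50.0, 85.0, 40.0, 60.0]
with_gaps = simulate_missingness(temps, humidity, "MNAR", rng)
```

Under MNAR, cold readings are lost more often than warm ones, so statistics computed from the remaining values would be biased towards higher temperatures.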
Handling Missing Values
- Removing Missing Values: Not generally useful for time series.
- Replacing Missing Values: Substituting a fixed value, a global mean or median, a rolling mean, median, or weighted mean, or an interpolated value (linear or spline).
- Interpolation: Using known neighboring values to estimate missing values.
Handling Missing Values in Time Series
- Consecutive Missing Values (sub-periods of missing values)
- Single Missing Values
- Using packages in R like imputeTS to handle missing values in time series
Upsampling and Downsampling
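- Upsampling (increasing the time resolution): The new time points have no observations, so their values must be interpolated.
- Downsampling (decreasing the time resolution): Requires interpolation if a non-integer factor is used, because the new time points then fall between existing observations.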
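A minimal Python sketch of both operations for integer factors, assuming the series is a plain list with `None` marking missing values. The helper names `upsample` and `downsample` are hypothetical.

```python
def upsample(values, factor):
    """Insert factor - 1 empty slots (None) between neighboring observations."""
    out = []
    for i, v in enumerate(values):
        out.append(v)
        if i < len(values) - 1:
            out.extend([None] * (factor - 1))
    return out

def downsample(values, factor):
    """Keep every factor-th observation (integer factor only)."""
    return values[::factor]

hourly = [1.0, 3.0, 5.0]
half_hourly = upsample(hourly, 2)     # new slots are None until interpolated
every_other = downsample(hourly, 2)   # existing points are reused as they are
```

With an integer factor, downsampling only selects existing observations. With a non-integer factor the new time points would lie between observations, which is why interpolation becomes necessary.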
Global Missing Value Replacement
- Default value (often 0 or 1)
- Global mean or median computed from the data
- Using random data might not be suitable as it doesn't consider trends or seasonality.
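Global replacement can be sketched in a few lines of Python. The function name `fill_global` is hypothetical, and `None` is assumed to mark missing values.

```python
from statistics import mean, median

def fill_global(values, how="mean"):
    """Replace every missing value (None) with one global statistic."""
    observed = [v for v in values if v is not None]
    fill = mean(observed) if how == "mean" else median(observed)
    return [fill if v is None else v for v in values]

series = [1.0, None, 3.0, 100.0]
by_mean = fill_global(series, "mean")      # the outlier 100.0 pulls the mean up
by_median = fill_global(series, "median")  # the median is less affected by it
```

Every gap receives the same value regardless of where it sits in the series, which is why global replacement ignores trend and seasonality.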
Local Missing Value Replacement - Carry
- Last Observation Carried Forward (LOCF): Replaces missing values with the last observed value.
- Next Observation Carried Backward (NOCB): Replaces missing values with the next observed value.
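Both carry methods can be sketched as follows, assuming `None` marks a missing value. The function names `locf` and `nocb` are chosen for illustration and are not the imputeTS interface.

```python
def locf(values):
    """Last Observation Carried Forward: fill each gap with the previous value."""
    out, last = [], None
    for v in values:
        if v is not None:
            last = v
        out.append(last)
    return out

def nocb(values):
    """Next Observation Carried Backward: LOCF applied to the reversed series."""
    return locf(values[::-1])[::-1]

series = [None, 1, None, None, 4, None]
forward = locf(series)    # the leading gap stays None: nothing to carry forward
backward = nocb(series)   # the trailing gap stays None: nothing to carry backward
```

The example shows why the two methods complement each other: LOCF cannot fill gaps at the start of a series, and NOCB cannot fill gaps at the end.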
Local Missing Value Replacement - Rolling Average
- Rolling average (moving average) replacement: Replaces each missing value with the mean of its local neighbors.
- Alternatives include the rolling median and weighted averages.
- Weighted variants include the linearly and the exponentially weighted moving average.
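The rolling-average idea can be sketched in Python as below. Several details are assumptions made for illustration: the window covers `k` neighbors on each side of the missing value, the linear weights are 1/(d + 1) and the exponential weights are (1/2)^d for a neighbor at distance d, and a value is left missing if its window contains no observations. The course's exact definitions of ℓ, $w_i$ and α may differ, and the function names are hypothetical.

```python
def weight(distance, weighting):
    """Weight of a neighbor at the given distance (assumed weighting schemes)."""
    if weighting == "simple":
        return 1.0                     # plain mean: all neighbors count equally
    if weighting == "linear":
        return 1.0 / (distance + 1)    # 1/2, 1/3, 1/4, ...
    if weighting == "exponential":
        return 0.5 ** distance         # 1/2, 1/4, 1/8, ...
    raise ValueError(f"unknown weighting: {weighting}")

def impute_moving_average(values, k=2, weighting="simple"):
    """Replace each None with a weighted mean of up to k neighbors per side."""
    out = list(values)
    n = len(values)
    for t, v in enumerate(values):
        if v is not None:
            continue
        num = den = 0.0
        for d in range(1, k + 1):
            for j in (t - d, t + d):
                # Only original observations are used, never imputed values.
                if 0 <= j < n and values[j] is not None:
                    w = weight(d, weighting)
                    num += w * values[j]
                    den += w
        if den > 0:
            out[t] = num / den
    return out

series = [1.0, 2.0, None, 4.0, 7.0]
plain = impute_moving_average(series, k=2, weighting="simple")
expo = impute_moving_average(series, k=2, weighting="exponential")
```

For this series the plain mean of the four neighbors is 3.5, while the exponentially weighted version lies closer to the mean of the two nearest neighbors (3.0), because distant observations count less.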
Interpolation
- Linear Interpolation: A missing value $x_t$ is placed on the straight line between the previous non-missing value $x_a$ and the next non-missing value $x_b$ (with $a < t < b$): $x_t = x_a + \frac{t - a}{b - a}(x_b - x_a)$.
- Issues with linear interpolation: Straight lines are not always sufficient to capture the behavior of the time series.
- Spline Interpolation: An alternative that uses piecewise polynomial functions instead of straight line segments.
- Choosing an interpolation method: Considerations when selecting linear or spline methods for interpolation.
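Linear interpolation over equally spaced time steps can be sketched as follows. The function name `interpolate_linear` is hypothetical, `None` marks missing values, and list indices are assumed to serve as time points.

```python
def interpolate_linear(values):
    """Fill each run of None between two observations along a straight line."""
    out = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    # Walk over pairs of neighboring observed indices (a, b) with a < b.
    for a, b in zip(known, known[1:]):
        for t in range(a + 1, b):
            frac = (t - a) / (b - a)   # relative position of t between a and b
            out[t] = values[a] + frac * (values[b] - values[a])
    return out

series = [None, 2.0, None, 6.0, None]
filled = interpolate_linear(series)
```

Gaps at the very start or end of the series remain missing in this sketch, because they lack a non-missing neighbor on one side.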
Spline Interpolation
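- Spline interpolation fits a separate polynomial to each interval between neighboring known points (the knots).
- The polynomial parameters are fitted so that each piece passes through its knots.
- Low polynomial degrees are preferred, typically cubic, because high-degree polynomials tend to oscillate.
- At the knots, neighboring polynomials must agree in value and in their derivatives, which makes the curve smooth.
- Missing values are predicted by evaluating the fitted polynomial piece at the missing time points.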
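As a worked formulation, the conditions for a cubic spline can be written out explicitly. The notation below is an assumption chosen for illustration (knots $t_0 < t_1 < \dots < t_n$ with observed values $x_0, \dots, x_n$) and may differ from the notation used in the course.

```latex
% One cubic piece per interval [t_i, t_{i+1}], i = 0, ..., n-1:
S_i(t) = a_i + b_i (t - t_i) + c_i (t - t_i)^2 + d_i (t - t_i)^3

% Each piece passes through the knots at both ends of its interval:
S_i(t_i) = x_i, \qquad S_i(t_{i+1}) = x_{i+1}, \qquad i = 0, \dots, n-1

% First and second derivatives agree at the interior knots:
S_i'(t_{i+1}) = S_{i+1}'(t_{i+1}), \qquad
S_i''(t_{i+1}) = S_{i+1}''(t_{i+1}), \qquad i = 0, \dots, n-2

% Two boundary conditions close the system, e.g. the "natural" spline:
S_0''(t_0) = 0, \qquad S_{n-1}''(t_n) = 0
```

The $n$ pieces have $4n$ unknown coefficients. The interpolation conditions supply $2n$ equations and the derivative conditions $2(n-1)$, so two boundary conditions are needed to determine the spline uniquely.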
Software details
- Package (R package imputeTS): for missing value handling in time series data.
- na_ma function and parameter k in the R package imputeTS for rolling average imputation.
- maxgap parameter for preventing imputation of long gaps.
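The effect of a gap limit such as maxgap can be sketched in Python, here combined with carry-forward replacement. This is not the imputeTS implementation. The function names `gap_runs` and `locf_with_maxgap` are hypothetical, and the rule assumed here is that runs of consecutive missing values longer than `maxgap` are left untouched.

```python
def gap_runs(values):
    """Yield (start, end) index pairs for runs of None; end is exclusive."""
    start = None
    for i, v in enumerate(values):
        if v is None and start is None:
            start = i
        elif v is not None and start is not None:
            yield start, i
            start = None
    if start is not None:
        yield start, len(values)

def locf_with_maxgap(values, maxgap):
    """Carry the last observation forward, but only into gaps of length <= maxgap."""
    out = list(values)
    for start, end in gap_runs(values):
        if end - start > maxgap or start == 0:
            continue  # gap too long, or no earlier observation to carry forward
        for t in range(start, end):
            out[t] = values[start - 1]
    return out

series = [1, None, 2, None, None, None, 5]
short_only = locf_with_maxgap(series, maxgap=2)  # the run of three stays missing
```

Leaving long gaps unfilled avoids inventing long stretches of data from a single neighboring observation.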
Description
This quiz explores methods for preprocessing missing values in time series data, focusing on imputation and interpolation techniques. Understand the different scenarios in which missing values occur and how to effectively handle them in data analysis. Perfect for students of DAT320 at the Norwegian University of Life Sciences.