# Summary

This document provides an introduction to ARIMA models for time series analysis. It explains moving averages, stationarity, autocorrelation, and how to determine ARIMA model parameters, and it includes a flowchart of the model-selection process.

# Full Transcript

# Understanding ARIMA Models

## Moving Average

A moving average is a statistical calculation used in data analysis and time series forecasting to smooth out fluctuations or noise in a dataset. The technique calculates the average of the data points within a specific window or period and moves that window through the dataset, generating a series of averages. The resulting moving average line is often used to identify trends and underlying patterns in the data. Moving averages are used in various fields for different purposes, and they offer several important advantages:

- **Smoothing Data:** One of the primary purposes of moving averages is to smooth out short-term fluctuations or noise in data. This makes it easier to identify long-term trends and patterns.
- **Trend Identification:** Moving averages are widely used in technical analysis in finance to identify trends in stock prices or other financial indicators. For example, a rising moving average may indicate an uptrend, while a declining moving average may suggest a downtrend.
- **Forecasting:** Moving averages can be used to make short-term forecasts by extrapolating trends from historical data. For example, a moving average of past sales data might be used to predict future sales.
- **Filtering Signals:** Moving averages can filter out signals or events that are considered noise in a dataset, which can help in decision-making processes.
- **Smoothing Seasonal Data:** In time series analysis, seasonal variations can be problematic. Moving averages can help smooth these variations, making the data easier to analyze and model.

## Flowchart of Determining ARIMA Model Parameters

```mermaid
graph LR
A[Time Series] --> B{Stationary?}
B -->|YES| E["ACF & PACF"]
B -->|NO| C["Power Transformation / Differencing"]
C --> D{Stationary?}
D -->|YES| E
D -->|NO| C
E --> F["Coefficient Estimation φ(B), θ(B)"]
F --> G{Diagnostic Check?}
G -->|YES| H[ARIMA Model]
G -->|NO| F
```

## Stationarity

A stationary series is a flat-looking series: one whose statistical properties do not depend on the time at which the series is observed. Here are some key statistical properties of stationary data:

- **Constant Mean:** In stationary data, the mean (average) of the data remains constant over time. This means that the data does not exhibit any long-term trend or systematic changes in its central tendency.
- **Constant Variance:** Stationary data also has a constant variance, meaning that the spread or dispersion of data points remains stable over time. There are no significant fluctuations in variability.
- **Constant Autocovariance or Autocorrelation:** Autocovariance and autocorrelation are measures of the linear relationship between a data point and its lagged (past) values. In stationary data, these measures depend only on the lag and remain relatively constant over time, with no significant trends or patterns.
- **No Seasonal Patterns:** Stationary data does not exhibit seasonality, which refers to recurring patterns or cycles at fixed time intervals (e.g., daily, weekly, or yearly). Seasonal patterns would imply that the data has a repeating, predictable structure.
- **White Noise:** In some cases, stationary data can be considered white noise: a random signal with constant mean and variance and uncorrelated values at different time points. White noise is often used as a baseline for comparison in time series analysis.
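The sketch below illustrates the two ideas above on a synthetic series: smoothing with a rolling (moving) average, and checking stationarity with the Augmented Dickey-Fuller test, a common test that the text above does not name but is widely used for this purpose. It assumes `numpy`, `pandas`, and `statsmodels` are installed; the series and the window size are illustrative choices.

```python
# A minimal sketch, assuming numpy, pandas, and statsmodels are available.
# The series is synthetic (trend + noise), chosen only for illustration.
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(42)

# Synthetic daily series: a linear trend plus noise, so it is non-stationary.
dates = pd.date_range("2023-01-01", periods=200, freq="D")
series = pd.Series(np.linspace(0, 10, 200) + rng.normal(0, 1, 200), index=dates)

# Moving average: the mean over a sliding 7-observation window.
smoothed = series.rolling(window=7).mean()

# Augmented Dickey-Fuller test: the null hypothesis is non-stationarity,
# so a small p-value suggests the series is stationary.
adf_stat, p_value = adfuller(series)[:2]
print(f"ADF statistic: {adf_stat:.3f}, p-value: {p_value:.3f}")
```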
To achieve stationarity in data, various transformations and differencing techniques can be applied, such as taking first differences, logarithmic transformations, or seasonal differencing. These techniques aim to remove trends and seasonality from the data, making it stationary and suitable for further analysis using statistical models.

## Transforming a Series to Stationarity

If the data are not stationary, they have to be transformed into a stationary series. The two common methods are:

- **Transformation:** Using a log or square root to stabilize a non-constant variance.
- **Differencing:** Subtracting the previous value from the current one. Differencing can be done in different orders: first-order differencing removes a linear trend, second-order differencing removes a quadratic trend, and so on.

## Differencing

Differencing subtracts the previous value from the current one: the first difference is $\Delta X_t = X_t - X_{t-1}$.

## Autocorrelation

The autocorrelation function, often represented as ACF, is a crucial component when working with ARIMA models. It measures the correlation between a time series and its own lagged values. Analysts record time-series data by measuring a characteristic at evenly spaced intervals, such as daily, monthly, or yearly. The number of intervals between two observations is the lag. For example, the lag between the current and previous observation is one. If you go back one more interval, the lag is two, and so on. The ACF plot helps identify the order of the Moving Average (MA) component of the ARIMA model, while the PACF plot (described below) helps identify the order of the AutoRegressive (AR) component. A significant correlation at a particular lag indicates that the time series depends on its past values up to that lag order. An autocorrelation plot is designed to show whether the elements of a time series are positively correlated, negatively correlated, or independent of (random with respect to) each other.

## Randomness/White Noise/Independence

For random data, autocorrelations should be near zero for all lags. Analysts also refer to this condition as white noise. Non-random data have at least one significant lag. When the data are not random, that is a good indication that you need a time series analysis, or need to incorporate lags into a regression analysis, to model the data appropriately.

## Partial Autocorrelation Function

Often represented as PACF, it displays only the correlation between two observations that the shorter lags between those observations do not explain. For example, the partial autocorrelation for lag 3 is only the correlation that lags 1 and 2 do not explain. In other words, the partial correlation for each lag is the unique correlation between those two observations after partialling out the intervening correlations.

## AR Model: Auto-Regressive Model

Often you can forecast a series based solely on its past values, called lags. The model assumes that past values carry information about future values, so the past can be used to project the future.

- The model can help forecast future stock prices using an analysis of past price data.

$X_t = C + \phi_1 X_{t-1} + e_t$

where,

- $X_{t-1}$ = the value of X in the previous year/month/week. If "t" is the current period, then "t-1" is the previous one.
- $\phi_1$ = the coefficient multiplying $X_{t-1}$. For a stationary process, the value of $\phi_1$ always lies between -1 and 1.
- $e_t$ = the error term: the difference between the observed value and the fitted value at period t ($e_t = X_t - \hat{X}_t$).
- p = the order of the model. Thus, AR(1) is a first-order autoregressive model, and the second- and third-order models would be AR(2) and AR(3), respectively.
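As a quick illustration of the AR(1) equation above, the sketch below simulates a series with known $C$ and $\phi_1$ and then recovers them by fitting a first-order autoregressive model. It assumes `numpy` and `statsmodels`; the parameter values are arbitrary.

```python
# A minimal sketch of fitting an AR(1) model, assuming numpy and statsmodels.
# The data are simulated so the true parameters (C = 2.0, phi_1 = 0.6) are known.
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

rng = np.random.default_rng(0)

# Simulate X_t = C + phi_1 * X_{t-1} + e_t.
C, phi1 = 2.0, 0.6
x = np.zeros(500)
for t in range(1, 500):
    x[t] = C + phi1 * x[t - 1] + rng.normal(0, 1)

# Fit a first-order autoregressive model, AR(1).
result = AutoReg(x, lags=1).fit()
print(result.params)  # estimates of [C, phi_1], close to [2.0, 0.6]
```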
## MA Model: Moving Average Model

A widely used tool in capital markets, the moving average involves calculating a series of means from consecutive periods of numbers or values. It can be continuously updated as new data become available, making it useful for tracking trends, although it is a lagging indicator because it reflects past data points.

Formula:

$\text{Moving Average} = \frac{C_1 + C_2 + \cdots + C_N}{N}$

where,

- $C_1, C_2, \ldots, C_N$ stand for the closing numbers, prices, or balances.
- $N$ is the number of periods over which the average is calculated.

Note that the MA component of an ARIMA model is related but distinct: rather than averaging past values directly, it models the series in terms of past forecast errors (see the Moving Average (MA) component below).

## ARIMA: Autoregressive Integrated Moving Average

Since ARIMA is a combination of the two models discussed above, the predictors in this model include the lagged values of $Y_t$ and the lagged errors. We call this an ARIMA(p, d, q) model, where

- p = order of the autoregressive part;
- d = degree of first differencing involved;
- q = order of the moving average part.

## Three Components of ARIMA

1. Auto Regressive (AR)
2. Integrated (I)
3. Moving Average (MA)

### Auto Regressive (AR)

Built on top of the autocorrelation concept: the dependent variable depends on the past values of itself.

### Integrated (I)

Attempts to convert the non-stationary nature of the time-series data into stationary, typically through differencing.

### Moving Average (MA)

Attempts to reduce the noise in the time series by expressing past observations in terms of the residual errors $\varepsilon$.

## Time Series Plot of the Observed Series

- Look for possible trend, seasonality, outliers, and constant or non-constant variance.

## ACF and PACF

- The autocorrelation (ACF) plot shows the correlation of the series with itself at different time lags.
- The partial autocorrelation function (PACF) plot shows the amount of autocorrelation at lag k that is not explained by lower-order autocorrelations.
- An AR(1) model has a single spike in the PACF and a decaying pattern in the ACF. An AR(2) model has two spikes in the PACF and a sinusoidal ACF that converges to 0. MA models have theoretical ACFs with non-zero values at the MA terms in the model and zero values elsewhere; their PACF tapers to zero in some fashion.
- If the ACF and PACF do not tail off, but instead have values that stay close to 1 over many lags, the series is non-stationary and differencing is needed. Try a first difference and then look at the ACF and PACF of the differenced data.
- If the series autocorrelations are non-significant, then the series is random (white noise: the ordering does not matter, and the data are independent and identically distributed). You are done at that point.
- If first differences were necessary and all the differenced autocorrelations are non-significant, then the original series is called a random walk and you are done (see the sketch below for an end-to-end example).
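Putting the workflow together, the sketch below follows the flowchart from the start of this document: difference a simulated random walk, inspect the ACF and PACF of the differenced series, fit an ARIMA model, and run a diagnostic check. It assumes `numpy`, `pandas`, and `statsmodels`; the series and the order (1, 1, 1) are illustrative choices, and the Ljung-Box test is one common diagnostic brought in here as an assumption, not named in the text above.

```python
# A minimal end-to-end sketch, assuming numpy, pandas, and statsmodels.
# The series is a simulated random walk; the order (1, 1, 1) is illustrative.
import numpy as np
import pandas as pd
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(1)

# A random walk is non-stationary, but its first difference is white noise.
y = pd.Series(np.cumsum(rng.normal(0, 1, 300)))

# Step 1: difference once to remove the trend (this is the d = 1 in ARIMA).
dy = y.diff().dropna()

# Step 2: inspect the ACF and PACF of the differenced series to choose p and q.
plot_acf(dy, lags=20)
plot_pacf(dy, lags=20)

# Step 3: estimate an ARIMA(p, d, q) model on the original series.
model = ARIMA(y, order=(1, 1, 1)).fit()
print(model.summary())

# Step 4: diagnostic check. Residuals should look like white noise;
# large Ljung-Box p-values indicate no significant residual autocorrelation.
print(acorr_ljungbox(model.resid, lags=[10], return_df=True))
```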
