Introduction to Time Series Analysis
This document provides an introduction to time series analysis. It covers various types of data and specific examples of financial time series. It also explores the concepts of stochastic processes and their applications.
1. Types of data
   a. Cross-Section Data
      i. A set of random observations on one or more variables collected for many units (individuals, firms, countries, and so on) at a single point in time. We use the notation Yi to indicate an observation on variable Y for individual i. In cross-sectional data, we collect observations at one point in time for two or more units.
   b. Time Series Data
      i. A set of random observations on one specific variable collected at constant time intervals or time frequencies, such as daily, weekly, monthly, quarterly, or annually. We use the notation Yt to indicate an observation on variable Y at time t. In time series data, we collect data over time for one cross-sectional unit.
   c. Panel Data
      i. Data sets of random observations with both a time series and a cross-sectional dimension. For instance, research involving portfolio choice might use data on the return earned by many companies' shares for many months. We use the notation Yit to indicate an observation on variable Y for unit i at time t.
2. Financial time series
   a. Financial Time Series Data
      i. A sequence of random observations or measurements of a single financial variable collected at regular intervals over time, such as daily, weekly, monthly, quarterly, or annually. This data allows for the analysis of trends, patterns, and behaviors within the financial domain.
   b. Financial Variables
      i. These are specific phenomena of interest in the financial market, including stock prices, interest rates, exchange rates, and other economic indicators. Observations of financial variables are recorded at particular time points, providing the basis for time series analysis.
   c. Financial Data
      i. Comprising individual random measurement units, financial data is directly related to the financial variable of interest. This data reflects specific values of the variable at distinct time intervals, enabling quantitative analysis of financial trends and forecasting.
3. Example of financial time series data
   a. [Chart omitted from the transcript.]
4. Classification with respect to time units:
   a. Time series in discrete time
   b. Time series in continuous time
5. Classification with respect to data (values):
   a. Discrete time series
   b. Continuous time series
6. The same classifications apply later to stochastic processes and models:
   a. Continuous time and discrete time
   b. Time is continuous, but data are usually reported at discrete points in time. Thus, sampling a continuous time series leads to a discrete time series.
7. Typical observed time series
   a. Interest rates
   b. Property prices
   c. Exchange rates/returns
   d. Stock prices/returns
   e. Index values/returns
   f. Commodity prices/returns
8. Stochastic processes
   a. Stochastic processes, also known as random processes, are mathematical models that describe the evolution of a system, or a collection of random variables indexed by time or space, under uncertainty. In these models, the future state of the system is probabilistic and depends on the current state, reflecting the inherent randomness over time.
   b. The study of stochastic processes involves examining how these random variables evolve, typically characterized by trajectories and probability density functions. These processes can be classified based on time (discrete or continuous) and state space (discrete or continuous). Examples include the following (a simulation sketch follows this list):
   c. Brownian Motion: a continuous-time process often used to model random movements in finance and physics.
   d. Poisson Processes: used to model events occurring at random points in time, commonly in queueing theory.
   e. Markov Chains: a type of discrete-time stochastic process where the future state depends only on the current state (the memoryless property).
   f. Stochastic processes can be either stationary (with statistical properties constant over time) or non-stationary. They are classified as discrete-time if the time variable takes positive integer values, and continuous-time if the time variable takes positive real values.
   g. In time series analysis, a time series can be considered as a single realization or sample from a stochastic process, which represents the underlying "population." In this sense, stochastic processes provide the broader theoretical framework from which time series are derived.
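To make the three example processes concrete, here is a minimal simulation sketch in Python, assuming only NumPy. The step size, the Poisson rate, and the transition matrix are illustrative choices, not values from the notes.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
n = 1000          # number of time steps on the interval [0, 1]
dt = 1.0 / n      # step size (illustrative discretization)

# Brownian motion: cumulative sum of independent N(0, dt) increments.
brownian = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=n))

# Poisson process: cumulative count of events arriving at rate lam per unit time.
lam = 5.0         # illustrative rate
poisson_path = np.cumsum(rng.poisson(lam * dt, size=n))

# Two-state Markov chain: the next state depends only on the current state.
P = np.array([[0.9, 0.1],   # transition probabilities out of state 0
              [0.2, 0.8]])  # transition probabilities out of state 1
states = np.empty(n, dtype=int)
states[0] = 0
for t in range(1, n):
    states[t] = rng.choice(2, p=P[states[t - 1]])

print(brownian[-1], poisson_path[-1], states[-10:])
```

Each simulated path is one realization of the underlying process, which connects directly to point 8.g: an observed time series is a single sample drawn from the process that generates it.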
9. Random variable
   a. In time series analysis, a random process can produce a time series where each point in time corresponds to a random variable with a specific probability distribution. The outcome of each variable is determined by chance, making it possible for the time series to display various patterns, trends, or randomness over time.
   b. Random variable
      i. A random variable is a variable that can assume a range of values, both positive and negative, with each value occurring according to a specific probability. Random variables are fundamental in probability theory and are used to quantify uncertain outcomes. For example, the result of rolling a die or the return on a stock over a day can be considered random variables.
   c. A function
      i. A function is a mathematical rule that pairs each input with exactly one output. It defines a relationship between two sets of values, typically called the domain (inputs) and the range (outputs). For each element in the domain, the function assigns precisely one corresponding element in the range, ensuring that no input maps to multiple outputs.
      ii. For example, if the total interest I depends on the time t money is invested, we can express this relationship by saying that "I is a function of t."
      iii. In simpler terms, a function ensures that every x value (input) has one, and only one, y value (output) associated with it.
10. Time series
    a. A time series is a sequence of observations recorded at successive points in time, specifically indexed by time, which differentiates it from general stochastic processes. Common examples include daily stock prices, monthly sales data, or annual GDP figures. Time series are typically measured at equally spaced intervals, and the data often exhibit patterns such as trends, seasonality, or cycles.
11. Key Aspects of Time Series Analysis
    a. Objective: time series analysis aims to identify historical patterns and relationships in data to make informed forecasts about future values. This is essential in fields like economics, finance, and meteorology.
    b. Relation to stochastic processes: every time series is a form of stochastic process, but not every stochastic process is a time series. Stochastic processes are broader, involving various types of indexed random phenomena, while time series are specifically indexed by time and are typically finite and observed.
    c. Characteristics:
       i. The observations are ordered and often exhibit dependency, meaning the sequence matters. Changing the order can alter the meaning of the data due to dependencies between observations.
       ii. Time series data often exhibit dependency: the data are not necessarily independent and identically distributed, unlike in standard linear regression. (A small numerical illustration follows this list.)
       iii. These unique ordering and dependency characteristics distinguish time series analysis from other statistical methods, as they require models that explicitly account for the dependence between observations.
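The dependency point in 11.c can be shown numerically. The sketch below, assuming NumPy, compares the lag-1 sample correlation of an i.i.d. series with that of a dependent series; the coefficient 0.8 is an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n = 500

# An i.i.d. series: the ordering carries no information.
noise = rng.normal(size=n)

# A dependent series: each value leans on the previous one (AR(1)-style).
dependent = np.empty(n)
dependent[0] = 0.0
for t in range(1, n):
    dependent[t] = 0.8 * dependent[t - 1] + rng.normal()

def lag1_corr(y):
    """Sample correlation between y_t and y_{t-1}."""
    return np.corrcoef(y[1:], y[:-1])[0, 1]

print(lag1_corr(noise))      # close to 0: no dependence
print(lag1_corr(dependent))  # close to 0.8: the sequence matters
```

Shuffling the dependent series would destroy this correlation, which is exactly why the order of observations matters in time series but not in a standard cross-sectional regression.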
12. Purpose and application of Time Series Analysis
    a. Time series analysis originated as a tool primarily aimed at forecasting future values based on historical data.
    b. Its broader applications include:
       i. Understanding the Underlying Dynamics: by analyzing past data, time series analysis seeks to uncover the forces and structure that drive the observed values. This can help in identifying trends, cycles, seasonality, and other patterns that reveal insights about the process generating the data.
       ii. Modeling and Forecasting: once patterns are identified, a suitable model can be fit to the data. This model serves as the foundation for forecasting future values, allowing for more informed decision-making. Forecasting is one of the most crucial applications of time series analysis, particularly in fields like finance, economics, and the environmental sciences.
       iii. Control and Monitoring: beyond forecasting, time series models can be applied in monitoring and controlling systems. Techniques such as feedback and feedforward control help in maintaining desired system behaviors or adjusting inputs based on observed outcomes. This is especially relevant in engineering and operational contexts.
13. Time series analysis: the time series plot
    a. A time series plot is a graphical representation that displays data points in chronological order, illustrating how a random variable evolves over time. In these plots, time is represented on the x-axis, while the observed values are shown on the y-axis.
    b. Time series plots serve as crucial tools for identifying and describing fundamental theoretical features and patterns within a time series. These patterns can be categorized into four main components:
    c. 1. Trend: the long-term movement in the data, indicating an upward or downward direction.
    d. 2. Seasonality: regular fluctuations that occur at specific intervals, often related to seasons or cycles.
    e. 3. Long-Run Cycle: extended oscillations that occur over longer periods, often influenced by economic or other macro-level factors.
    f. 4. Irregular Changes: unpredictable variations caused by unexpected events or anomalies.
    g. The presence of these components influences the fundamental statistical properties of the time series, including:
    h. Constant or Non-Constant Mean: whether the average value of the series changes over time.
    i. Constant or Non-Constant Variance: whether the variability in the series remains stable or fluctuates.
    j. Autoregression Patterns: relationships between current and past values, which are critical for modeling and forecasting.
    k. Understanding these features helps determine whether a time series is stationary (having constant statistical properties over time) or non-stationary (exhibiting changes in these properties). Analyzing stationarity is essential for applying appropriate statistical methods, as many time series analysis techniques assume stationarity. (A plotting sketch follows below.)
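The sketch below builds and plots a series containing a trend, a seasonal pattern, and an irregular component, so the features described above are visible in one picture. It assumes NumPy and Matplotlib; all component sizes (slope 0.5, a 12-period sine, noise with standard deviation 3) are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=7)
t = np.arange(120)                            # e.g. 120 monthly observations

trend = 0.5 * t                               # long-term upward movement
seasonal = 10 * np.sin(2 * np.pi * t / 12)    # fixed 12-period pattern
irregular = rng.normal(0, 3, size=t.size)     # unpredictable variation

y = trend + seasonal + irregular

plt.plot(t, y)                                # time on the x-axis, values on the y-axis
plt.xlabel("Time (months)")
plt.ylabel("Observed value")
plt.title("Simulated series: trend + seasonality + irregular component")
plt.show()
```

Because the trend shifts the mean upward over time, this simulated series is non-stationary, which previews the discussion of stationarity below.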
14. Time series components
    a. Trend
       i. The trend changes the mean of the series. A trend exists when there is a long-term increase or decrease in the data. It does not have to be linear. Sometimes we refer to a trend as "changing direction" when it goes from an increasing trend to a decreasing trend.
    b. Seasonal
       i. A seasonal pattern occurs when a time series is affected by seasonal factors such as the time of the year or the day of the week. Seasonality is always of a fixed and known period.
    c. Cyclic fluctuations
       i. Cyclic fluctuations in time series data are rises and falls that do not occur at a fixed frequency. These variations are often driven by economic conditions and are typically associated with the business cycle, usually lasting a minimum of two years.
       ii. It is important to distinguish between cyclic behavior and seasonal behavior. Cyclic fluctuations are irregular and not tied to a specific time frame, while seasonal patterns occur at consistent intervals, often linked to calendar events. Generally, cycles have a longer average duration than seasonal patterns, and their magnitudes tend to be more variable.
    d. Irregular component
       i. The irregular component represents random, unstructured, and unpredictable variations in a time series that cannot be attributed to other components, such as trend, seasonal, or cyclical patterns. It embodies the "noise-like" fluctuations inherent to the data.
       ii. It is important to distinguish the irregular component from residuals. While both capture random variations, residuals are the differences between observed values and those predicted by a model. In contrast, the irregular component is an intrinsic aspect of the time series itself.
       iii. Identifiable causes can often be linked to irregular values and outliers. Examples include natural disasters (such as earthquakes and floods), social disruptions (such as strikes), and errors in data processing.
       iv. The irregular component is typically assumed to follow a white noise process, characterized by a mean of zero, constant variance, and no autocorrelation, meaning past values do not inform future predictions. In practice, however, it does not always conform to this ideal.
15. Stationary time series
    a. A stationary time series has statistical properties that do not depend on the time at which the series is observed. Stationary series have the following features:
       i. Constant mean
       ii. Constant variance
       iii. No seasonality
    b. A common assumption in many time series analysis techniques is that the data are weakly stationary.
    c. Stationarity:
       i. Stationarity refers to statistical properties of stochastic processes or time series that do not depend on the time at which the series is observed.
       ii. Stationarity conditions: constant mean, constant variance, constant autocovariance over time, and no periodic fluctuations (seasonality). Every white noise series is a stationary time series, but not every stationary time series is white noise.
    d. Strict (or strong) stationarity:
       i. A time series is strictly stationary if all moments of the overall distribution of the series remain unchanged over time.
    e. Weak (or second-order) stationarity:
       i. A time series is weakly stationary if its mean, variance, and autocovariance are constant over time. It does not require the overall distribution of the time series to remain unchanged over time.
    f. First-order stationarity:
       i. A time series is first-order stationary if its mean remains unchanged over time.
16. Detecting Stationarity: statistical and non-statistical tests
    a. Non-statistical tests:
       i. Visual inspection of the time series plot: reveals mean, variance, and autocovariance features.
       ii. Global vs. local analysis check: reveals mean and variance features.
       iii. Correlogram (ACF): reveals autocorrelation features.
       iv. For stationary data the ACF drops to zero relatively fast; lag 1 is often large and positive.
       v. For non-stationary data the ACF decreases slowly. (A correlogram sketch follows below.)
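The ACF behavior in points iv and v can be checked visually. The sketch below, assuming NumPy, Matplotlib, and statsmodels, draws correlograms for a stationary AR(1) series and a random walk; the coefficient 0.5 and the number of lags are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

rng = np.random.default_rng(seed=3)
n = 300

# Stationary AR(1) series (|phi| < 1): its ACF should drop to zero quickly.
stationary = np.empty(n)
stationary[0] = 0.0
for t in range(1, n):
    stationary[t] = 0.5 * stationary[t - 1] + rng.normal()

# Random walk (unit root): its ACF should decrease only slowly.
random_walk = np.cumsum(rng.normal(size=n))

fig, axes = plt.subplots(2, 1, figsize=(8, 6))
plot_acf(stationary, ax=axes[0], lags=40, title="ACF: stationary AR(1)")
plot_acf(random_walk, ax=axes[1], lags=40, title="ACF: random walk")
plt.tight_layout()
plt.show()
```

The top panel decays quickly toward zero while the bottom panel stays high across many lags, which is the visual signature of non-stationarity described above.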
    b. Statistical tests: the unit root tests
       i. A unit root (also called a unit root process) is a property of some stochastic processes (such as random walks) that implies non-stationarity.
       ii. The name comes from the mathematics behind the process. At a basic level, a process can be written as a series of monomials (expressions with a single term). Each monomial corresponds to a root. If one of these roots is equal to 1, it is a unit root.
       iii. Unit root processes are sometimes confused with trend-stationary processes; while they share some properties, they differ in important respects. It is possible for a time series to be non-stationary, have no unit root, and be trend-stationary.
       iv. A unit root is a stochastic trend in a time series, referred to as a random walk with drift. If a time series has a unit root, it shows a systematic pattern that is nevertheless unpredictable.
    c. Statistical test (the unit root test): the presence of a unit root can be tested using a unit root test.
       i. There are many such tests, as none of them stands out as having the most power. Two common choices are the Dickey-Fuller test and the Augmented Dickey-Fuller test.
       ii. Dickey-Fuller test (DF)
          1. Assumes that the time series is a linear AR(1) process; it is applicable only to AR(1).
          2. H0: β = 1 (the series has a unit root and is not stationary). Ha: β < 1 (the series is stationary).
          3. If the test statistic is greater than the DF critical value, we do not reject H0; that is, the time series is not stationary.
          4. Serial correlation can be an issue, in which case the Augmented Dickey-Fuller (ADF) test can be used.
       iii. Augmented Dickey-Fuller test (ADF)
          1. Assumes that the time series is a linear AR process of order higher than 1.
          2. H0: β = 1 (unit root, not stationary). Ha: β < 1 (stationary).
          3. If the test statistic is greater than the ADF critical value, we do not reject H0; that is, the time series is not stationary.
          4. The ADF handles bigger, more complex models. It has the downside of a fairly high Type I error rate (rejecting H0 when it is true).
17. Differencing the data
    a. Differencing is one of the most common and useful techniques to stabilize the mean of a time series by removing changes in the level of the series, thereby eliminating (or reducing) trend and seasonality.
    b. Differencing creates a new time series from the first difference between the current observation and the previous observation of the original series (the raw data). That is:
    c. Given Y_t = φY_{t-1} + ε_t, where φ = 1, we take the first difference and create the new series d_t = Y_t − Y_{t-1}.
    d. Then d_t = ε_t, where ε_t ~ IID(0, σ²). That is, E(d_t) = 0 and Var(d_t) = σ².
    e. The new time series is stationary, with constant mean, constant variance, and no seasonal component.
    f. The differenced data will contain one less point than the original data. Although you can difference the data more than once, one difference is usually sufficient. (See the ADF and differencing sketch below.)
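The sketch below runs the ADF test on a simulated random walk and on its first difference, assuming NumPy and statsmodels; the sample size and seed are arbitrary.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(seed=11)
random_walk = np.cumsum(rng.normal(size=500))   # phi = 1: a unit root process

# ADF test: H0 = the series has a unit root (is not stationary).
stat, pvalue, *_ = adfuller(random_walk)
print(f"level series:       stat={stat:.2f}, p-value={pvalue:.3f}")   # large p: do not reject H0

# First difference d_t = Y_t - Y_{t-1}; note one observation is lost.
d = np.diff(random_walk)
stat, pvalue, *_ = adfuller(d)
print(f"differenced series: stat={stat:.2f}, p-value={pvalue:.3f}")   # small p: reject H0
```

The level series fails the test while the differenced series passes, matching the theory in item 17: differencing a unit root process leaves only the white noise increments.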
18. Non-stationary time series
    a. A non-stationary time series has statistical properties that depend on the time at which the series is observed. Its main features can include:
       i. Trend
       ii. Seasonality
       iii. Long-run cycle
       iv. Irregular changes
       v. Random changes
    b. Time series are often non-stationary, that is, they have means, variances, and covariances that change over time.
    c. Non-stationary behaviors can be trends, seasons, cycles, random walks, or combinations of these.
    d. Non-stationary data, as a rule, are unpredictable and cannot be modeled or forecast directly.
    e. Results obtained using non-stationary time series may be spurious, in that they may indicate a relationship between two variables where none exists.
    f. To obtain consistent, reliable results, non-stationary data need to be transformed into stationary data.
19. Random walk (non-stationary time series data)
    a. Each observation in a random walk is highly dependent on the previous point, plus a random error term defined as white noise: Y_t = Y_{t-1} + ε_t.
    b. The direction of a random walk is random, leading to a path that can fluctuate indefinitely without a predictable trend, seasonality, or mean reversion.
    c. The variance of a random walk evolves over time and goes to infinity as time goes to infinity; therefore, a random walk cannot be predicted.
    d. Daily stock prices are a common example of a random walk.
    e. The random walk hypothesis is a theory that stock market prices follow a random walk and cannot be predicted.
    f. The Efficient Market Hypothesis suggests that stock prices follow a random walk because they incorporate all available information, making it impossible to predict future prices based on past movements.
    g. Due to their properties, random walks are commonly used to model unpredictable paths in several fields of study, such as finance and physics.
    h. Cumulative movement effect: each point is the sum of all previous changes. Therefore, the series can drift away from the starting point over time, with no tendency to return to it.
    i. Non-stationarity: random walks do not have a constant mean or variance over time. All random walks are non-stationary processes, but not all non-stationary processes are random walks.
    j. Equal likelihood of up or down movement: in the simplest form of a random walk, the probability of moving up or down by a certain amount is equal. The size of the change at each step is independent and identically distributed (i.i.d.).
20. Time series fundamentals
    a. Residuals
       i. The error term is also called the disturbance term or the residual. It is a proxy for all the omitted or neglected variables that may affect Y but are not (or cannot be) included in the regression model.
       ii. The residual is the error in the prediction: the difference between the actual value of Y and the predicted value of Y.
       iii. A positive residual tells us that the actual value is located above the regression line; a negative residual tells us that the actual value is located below the regression line.
    b. Ordinary least squares (OLS)
       i. OLS is a type of linear least squares method for choosing the unknown parameters in a linear regression model so that the estimated regression line is as close as possible to the observed data, where closeness is measured by the sum of the squared mistakes made in predicting Y given X.
    c. The model: Y_i = α + β_1 X_{1i} + u_i, where:
       i. Y_i = the dependent variable
       ii. The subscript i denotes the ith observation, which means we have cross-sectional data. In the case of time series, the subscript t denotes the tth observation.
       iii. α = the intercept
       iv. β_n = the slope coefficient on variable n
       v. X_n = the nth independent variable
       vi. u_i = the error term in the regression
       (A short OLS sketch follows below.)
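Here is a minimal OLS sketch, assuming NumPy; the "true" intercept 2.0 and slope 0.7 are made-up values used only to show that the estimator recovers them, and the residual sign check mirrors point 20.a.iii.

```python
import numpy as np

rng = np.random.default_rng(seed=5)
n = 100
x = rng.normal(size=n)
u = rng.normal(0, 0.5, size=n)
y = 2.0 + 0.7 * x + u            # illustrative "true" alpha = 2.0, beta = 0.7

# OLS: pick alpha and beta to minimize the sum of squared residuals.
X = np.column_stack([np.ones(n), x])        # design matrix with an intercept column
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
alpha_hat, beta_hat = coef

residuals = y - X @ coef                    # actual Y minus predicted Y
print(alpha_hat, beta_hat)                  # estimates close to 2.0 and 0.7
print(residuals[:5])                        # positive: point above the fitted line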
21. White noise
    a. White noise time series are stationary time series, but not all stationary time series are white noise.
    b. White noise is a type of time series with constant zero mean, constant variance, and zero correlation between lags.
    c. In time series analysis, a Gaussian white noise is a sequence of independent, identically distributed (IID) normal random variables with zero mean and constant variance σ²:
    d. ε_1, ..., ε_N ~ IID N(0, σ²)
    e. By definition, white noise is not predictable; therefore, white noise is not useful for prediction.
    f. A well-fitted model has residuals that are white noise; that is, no more information can be extracted from the data.
    g. When a time series has a non-constant mean, non-constant variance, or autocorrelation between observations, it is a clear indicator that additional information can still be extracted from the time series using ARIMA, GARCH, or other types of models.
22. Detecting white noise
    a. Non-statistical tests:
       i. Visual inspection of the time series plot: reveals mean, variance, and autocorrelation features.
       ii. Global vs. local analysis check: reveals mean and variance features.
       iii. Correlogram (ACF): reveals autocorrelation features.
    b. Statistical tests:
       i. Portmanteau tests are tests that check whether the whole set of autocorrelations of the residual time series is statistically different from zero.
       ii. Two common choices are the Box-Pierce test, Q = n Σ r_k², and the Ljung-Box test, Q* = n(n + 2) Σ r_k²/(n − k), where the sums run over lags k = 1, ..., h, r_k is the autocorrelation coefficient at lag k of the residual (possibly differenced) series, h is the number of lags being considered, and n is the number of observations.
       iii. H0: the autocorrelations up to lag h are all equal to 0. Ha: at least one autocorrelation up to lag h differs from 0.
       iv. If the residuals are random, the test statistic is approximately chi-squared distributed with h − m degrees of freedom, where m is the number of parameters in the model.
23. Autoregressive Processes
    a. In a multiple regression model, we forecast the variable of interest using a linear combination of explanatory variables, also called predictors. In an autoregression model, we forecast the variable of interest using a linear combination of past values of the variable. The term autoregression indicates that it is a regression of the variable against itself.
    b. A p-order autoregressive process, denoted AR(p), can be written as
    c. Y_t = α + φ_1 Y_{t-1} + φ_2 Y_{t-2} + ... + φ_p Y_{t-p} + ε_t
    d. The value of Y at time t is a linear function of Y at earlier times, plus a fixed constant and a random error term.
    e. In line with the ordinary linear regression model, it is assumed that the error term ε_t is white noise: the error terms are independently distributed based on a normal distribution with zero mean and a constant variance σ². Additionally, the error terms are assumed to be independent of the Y values.
    f. A first-order autoregressive process, denoted AR(1), takes the form
    g. Y_t = α + φ_1 Y_{t-1} + ε_t
    h. The AR(1) process is stationary when |φ_1| < 1. If |φ_1| = 1, we have a random walk.
24. Forecasting Accuracy: Time Series Forecast Error
    a. [Formulas omitted from the transcript.]
25. Moving Average Process
    a. While the autoregressive process uses past values of the forecast variable in a regression, a moving average process uses past forecast errors in a regression-like model.
    b. A q-order moving average process, denoted MA(q), takes the following form:
    c. Y_t = μ + θ_1 ε_{t-1} + θ_2 ε_{t-2} + ... + θ_q ε_{t-q} + ε_t
    d. It is assumed that the error terms are independently distributed based on a normal distribution with zero mean and a constant variance σ². The value of Y at time t is a linear function of past errors.
    e. Keep in mind that the values of ε_t are not observed, so it is not really a regression in the usual sense.
    f. A first-order moving average process, denoted MA(1), takes the form Y_t = μ + θ_1 ε_{t-1} + ε_t.
    g. An autoregressive moving average (ARMA) process consists of both autoregressive and moving average terms. An ARMA(p, q) process can be expressed as
    h. Y_t = φ_0 + φ_1 Y_{t-1} + ... + φ_p Y_{t-p} + ε_t + θ_1 ε_{t-1} + ... + θ_q ε_{t-q}
    i. An ARIMA(p, d, q) process adds differencing to an ARMA process.
    j. If a time series must be differenced d times before it becomes stationary, it is said to be integrated of order d, denoted I(d). (A model-fitting sketch follows below.)
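To tie items 21-25 together, the sketch below simulates an AR(1) series, fits it as an ARIMA(1, 0, 0) model, and applies the Ljung-Box test to the residuals. It assumes NumPy and statsmodels; the coefficient 0.6 and the lag choice of 10 are illustrative.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(seed=9)
n = 500

# Simulate an AR(1) series with phi = 0.6 (stationary, since |phi| < 1).
y = np.empty(n)
y[0] = 0.0
for t in range(1, n):
    y[t] = 0.6 * y[t - 1] + rng.normal()

# ARIMA(1, 0, 0) is an AR(1) model with no differencing (d = 0).
result = ARIMA(y, order=(1, 0, 0)).fit()
print(result.params)                 # estimated AR coefficient near 0.6

# Ljung-Box test on the residuals: H0 = autocorrelations up to the lag are zero.
print(acorr_ljungbox(result.resid, lags=[10]))   # large p-value: residuals look like white noise
```

A large Ljung-Box p-value means the residuals are consistent with white noise, which is the "well-fitted model" criterion of item 21.f: no further information is left to extract.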