Machine Learning in Finance: Linear Models (PDF)
Document Details
University of St. Gallen
Despoina Makariou
Summary
This document provides a lecture on machine learning in finance, specifically focusing on linear models. It covers simple and multiple linear regression, along with their applications in finance. The document also includes questions to encourage active learning.
Full Transcript
Machine Learning in Finance: Linear Models
Prof. Dr. Despoina Makariou, Institute of Insurance Economics, University of St. Gallen

Today's learning outcomes
- Simple Linear Regression
- Multiple Linear Regression

Quiz
Question: What is regression analysis about?

Regression analysis
Regression analysis is the study of dependence. It is a method to quantify the relationship between a variable of interest and explanatory variables. We want to model a real data set using as simple a model as possible, while not losing explanatory power on any complex phenomenon observed in the data.

Quiz
Question: Can you think of any examples of regression applicability?

Regression analysis applicability
Regression analysis has proved very useful in many fields and has a very wide applicability:
- Finance: practitioners may want to determine the systematic risk of a particular stock, referred to as its beta. A stock's beta is a measure of the volatility of the stock compared to a benchmark such as the S&P 500 index.
- Economics: economists analyse how an economic indicator, such as GDP, is related to other macroeconomic variables such as industrial production, a housing index, or the population of a city.
- Actuarial science: actuaries investigate how the number and amount of an individual's car insurance claims are related to characteristics of the car and the policyholder, so that the premium can be set accordingly for different people.

Regression analysis includes linear regression/modelling, which forms the basis of this class.

Simple Linear Regression
A simple linear model is a special case of a linear model in which there is only one explanatory/predictor variable. The data come in pairs (x, Y), where x represents the covariate and Y the response variable. We assume Y is a continuous random variable and that the covariate x is fixed and known. We use n to denote the sample size, so the data consist of n pairs (x_1, y_1), ..., (x_n, y_n). We assume a linear model for a generic pair (x, Y) and for our data (x_i, y_i):

    Y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \quad i = 1, \ldots, n    (1)

where the errors ε_i have mean 0 and variance σ² and are independent of each other. The term ε_i represents measurement/individual error, or unexplained variation.

Simple Linear Regression (continued)
Note that E(Y) = β0 + β1 x and hence β1 = dE(Y)/dx. Therefore, the interpretation of β1 is the average change in the response Y for a unit change in x. The assumptions in model (1) imply:
1) The error can be positive or negative, since ε_i has mean 0.
2) Group mean: for any data with covariate x, the mean response is E(Y) = β0 + β1 x.
3) Group variance: var(Y) = σ², independent of the value of x.
4) The error is assumed to be additive on the group mean.

Least squares estimator
It is sometimes called the Ordinary Least Squares (OLS) estimator. We want to estimate β0 and β1 from the data, and also to estimate σ², the residual variance. To derive the least squares estimator from the data, we minimise the sum of squares of the residuals:

    g(\beta_0, \beta_1) := \sum_{i=1}^{n} \epsilon_i^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2    (2)

Quiz
Question: How do we minimise the sum of squares of the residuals?
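Before the closed-form answer on the next slides, a minimal numerical sketch (not from the lecture) can build intuition: it simulates a data set, evaluates the objective g(β0, β1) from equation (2), and minimises it with a generic optimiser. The simulated data, the use of scipy.optimize.minimize, and all variable names are illustrative assumptions.

```python
# Minimal sketch: simulate data and minimise the least-squares objective g numerically.
# The data, the optimiser, and all names are illustrative assumptions, not from the lecture.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)                               # covariate, treated as fixed
y = 1.5 + 2.0 * x + rng.normal(scale=0.5, size=n)    # response with additive noise

def g(beta):
    """Sum of squared residuals g(beta0, beta1) from equation (2)."""
    beta0, beta1 = beta
    return np.sum((y - beta0 - beta1 * x) ** 2)

result = minimize(g, x0=np.zeros(2))                 # generic numerical minimiser
print("numerical minimiser of g:", result.x)         # close to the OLS solution
```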
Least squares estimator
To find β̂0 and β̂1 that minimise the function g, we differentiate with respect to β0 and β1 and set the derivatives to zero:

    0 = \frac{\partial g}{\partial \beta_0} = -2 \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)    (3)

    0 = \frac{\partial g}{\partial \beta_1} = -2 \sum_{i=1}^{n} x_i (y_i - \beta_0 - \beta_1 x_i)    (4)

Least squares estimator (continued)
Solving, we get

    \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{S_{xy}}{S_{xx}}    (5)

where Sxx is the sum of the squared differences between each input x and the mean of x, and Sxy is the sum of the products of the difference between x and its mean and the difference between y and its mean. Then

    \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}    (6)

To estimate σ², note that E(g(β0, β1)) = nσ², hence we can estimate it by

    \hat{\sigma}^2 = n^{-1} \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2    (7)

It can be proven that σ̂² is biased, i.e. E(σ̂²) ≠ σ².

Quiz
Question: How do we interpret the coefficients β̂0 and β̂1?

Interpretation of coefficients
The intercept β̂0 is the mean value of the dependent variable when the independent variable takes the value 0. Its estimate is of no interest for evaluating whether there is a linear relationship between the two variables. It is, however, of interest if you want to know what the mean value of the output could be when the input equals zero.
The slope β̂1 corresponds to the expected variation of the output when the input varies by one unit. It tells us the sign of the relationship between the input and the output and the speed at which the output evolves as a function of the input. The larger the slope in absolute value, the larger the expected variation of the output for each unit change of the input. Note, however, that a large value does not necessarily mean that the relationship is statistically significant.

Least squares estimator (continued)
We have not assumed that the error ε is normally distributed. Assuming so allows us to make inference on the parameters, such as testing their significance. We can then also construct confidence intervals and prediction intervals for new predictions. In this class, we use linear regression solely for predictive purposes. We now look at how a linear model explains the variation in the data.

Decomposition of the total variation of the data
The spread/deviation of each data point can be measured by (yi − ȳ)², since without knowledge of the xi, the best estimate at any x is ȳ. We square the deviations so that we can add them up without them cancelling one another out. Hence the total deviation, or Total Sum of Squares, for the data is

    \text{Total SS} = \sum_{i=1}^{n} (y_i - \bar{y})^2    (8)

Decomposition of the total variation of the data (continued)
With knowledge of the xi we estimate yi by ŷi = β̂0 + β̂1 xi. If this regression line is useful in predicting the yi, then intuitively the deviation yi − ŷi should be small, while ŷi − ȳ should be close to the original deviation. To determine how useful the regression line is, we decompose

    y_i - \bar{y} = (\hat{y}_i - \bar{y}) + (y_i - \hat{y}_i)

that is, total deviation = explained deviation + unexplained deviation. A good regression line should have a "small" unexplained deviation for each data point. Squaring both sides and summing over all data points (the cross term vanishes for the least squares fit), we have

    \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2

Decomposition of the total variation of the data (continued)
In words, we decompose the total variation as

    \text{Total SS} = \text{SS(reg)} + \text{RSS}    (9)

where SS(reg) is the sum of squares due to regression and RSS is the residual sum of squares. This decomposition leads us to define the coefficient of determination, or "R-squared", by

    R^2 = \frac{\text{SS(reg)}}{\text{Total SS}} = 1 - \frac{\text{RSS}}{\text{Total SS}}    (10)

By the above decomposition, 0 ≤ R² ≤ 1. If the regression is a very good one, so that RSS is small, then SS(reg) is close to the Total SS and R² ≈ 1. On the other hand, a bad regression gives a large RSS, so that R² ≈ 0.
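The formulas above can be checked with a minimal sketch (not from the lecture): it computes the closed-form estimates of equations (5)-(7) and the R² decomposition of equations (8)-(10) on the same kind of simulated data as in the previous sketch. All data and names are illustrative assumptions.

```python
# Minimal sketch: closed-form OLS estimates and the R-squared decomposition.
import numpy as np

# Same illustrative simulated data as in the previous sketch.
rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)
y = 1.5 + 2.0 * x + rng.normal(scale=0.5, size=n)

x_bar, y_bar = x.mean(), y.mean()
Sxx = np.sum((x - x_bar) ** 2)                      # sum of squared deviations of x
Sxy = np.sum((x - x_bar) * (y - y_bar))             # sum of cross-deviations of x and y

beta1_hat = Sxy / Sxx                               # slope, equation (5)
beta0_hat = y_bar - beta1_hat * x_bar               # intercept, equation (6)

y_hat = beta0_hat + beta1_hat * x                   # fitted values
sigma2_hat = np.mean((y - y_hat) ** 2)              # residual variance, equation (7)

total_ss = np.sum((y - y_bar) ** 2)                 # Total SS, equation (8)
ss_reg = np.sum((y_hat - y_bar) ** 2)               # SS due to regression
rss = np.sum((y - y_hat) ** 2)                      # residual SS
r_squared = ss_reg / total_ss                       # equation (10); equals 1 - rss/total_ss

print(beta0_hat, beta1_hat, sigma2_hat, r_squared)
```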
Multiple linear regression
We now move from simple linear regression to multiple linear regression. While simple linear regression has only one covariate, multiple linear regression has two or more covariates. Suppose we have n observations, and the i-th data point is (x_{i1}, ..., x_{ik}, y_i). The model is then

    y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + \epsilon_i, \quad i = 1, \ldots, n    (11)

We want to minimise

    SS(\beta_0, \beta_1, \ldots, \beta_k) = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_k x_{ik})^2    (12)

It becomes very cumbersome to write out the model and carry out the analysis coordinate by coordinate. Hence, you will often see this model written in matrix form, as in the sketch below.
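A minimal sketch (not from the lecture) of the matrix-form fit: it stacks the covariates into a design matrix with a column of ones for the intercept and minimises the objective in equation (12) with a least-squares solver. The simulated data, the choice of np.linalg.lstsq, and all names are illustrative assumptions.

```python
# Minimal sketch: multiple linear regression of equation (11), fitted in matrix form.
import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 3
X = rng.normal(size=(n, k))                               # covariates x_i1, ..., x_ik
beta_true = np.array([1.0, -2.0, 0.5])                    # arbitrary illustrative coefficients
y = 0.3 + X @ beta_true + rng.normal(scale=0.4, size=n)   # response with additive noise

X_design = np.column_stack([np.ones(n), X])               # prepend a column of ones for beta0
beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)   # minimises SS in equation (12)
print("estimated (beta0, beta1, ..., betak):", beta_hat)
```

Using a least-squares solver rather than explicitly inverting the matrix of normal equations is the standard, numerically more stable choice.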