Machine Learning in Finance
University of St. Gallen
Despoina Makariou
Summary
These notes cover machine learning concepts and their applications in finance. The document provides course information, including details about the exam, and features a brief introduction to machine learning, financial applications, and big data concepts.
Machine Learning in Finance
Course information & basics of machine learning, with examples in financial applications
Prof. Dr. Despoina Makariou, Institute of Insurance Economics, University of St. Gallen

Main parts of the course

Learning tools
Theory presented in slides. Application of the theory in R. Our discussions.

Examination (1/2)
Decentral: written examination (70 percent, 90 mins.). Date of the written examination: 25.10.2024 (our last class). Predominantly multiple-choice based. Mock questions will be discussed on 18.10.2024 via Zoom.

Examination (2/2)
Decentral: group examination paper with presentation (30 percent; all group members receive the same grade). Written at home, in groups. You have to define a predictive-modelling research question of a financial nature. You will find a financial dataset of your choice, perform certain machine learning tasks (they will be specified), and write a report. Your code should also be submitted. More details will be announced soon. Your assignment will be presented in groups in class.

Course literature
For the exam, reading the lecture slides (and participation) is sufficient! Indicative additional literature for the curious reader includes:
1. James, G., Witten, D., Hastie, T. and Tibshirani, R., 2013. An Introduction to Statistical Learning (Vol. 112, p. 18). New York: Springer.
2. Ni, Dong, X., Zheng, J. (2021). An Introduction to Machine Learning in Quantitative Finance. World Scientific.
3. Abedin, M.Z., Hassan, M.K., Hajek, P., Uddin, M.M. (Eds.). (2021). The Essentials of Machine Learning in Finance and Accounting (1st ed.). Routledge.
https://doi.org/10.4324/9781003037903

Our communication

Today's agenda
Machine learning essentials and applications in finance. Linear models. Resampling approaches. Introduction to R and RStudio programming. Implementation of the taught concepts in R.

ML basics and applications in finance - learning outcomes
What is machine learning? Supervised learning vs unsupervised learning. Regression vs classification. Examples of financial applications. How to measure the performance of a machine learning algorithm? The bias-variance trade-off.

Quiz
Question: What is big data from a statistical perspective?

Big data in statistics
What is big data from a statistical perspective? n: number of observations; p: number of features (dimensionality). Big n: large sample size. Big p: large dimensionality.

Quiz
Question: What is needed from a method to address large sample size or high dimensionality?

Big data in statistics
Big n: a large sample size requires machine learning techniques that process data efficiently (e.g. low-complexity algorithms or distributed computing, i.e. making multiple computers work together to solve a common problem). Big p: high-dimensional statistics requires the assumption of some special structure in the data and innovation in statistical procedures.

Quiz
Question: What is machine learning?

What is machine learning?
Machine learning is a subfield of artificial intelligence. It is a process of extracting patterns and features from data and making useful predictions, with an emphasis on prediction accuracy. Data are assumed to be generated from a random process, hence any conclusion drawn is probabilistic.

Quiz
Question: How can we make better predictions using machine learning? Answer: a.
Larger amounts of data, or better-quality data, lead to more accurate predictions. b. Alternatively, we can use better learning algorithms.

How can machine learning be used in finance?
Here we consider finance in its broader sense: the financial sector is the section of the economy made up of firms and institutions that provide financial services, and it comprises a broad range of industries including banks, development banks, investment companies, insurance companies, real estate firms, etc. Some examples of the use of machine learning in finance include:
Fraud detection in loan applications.
Facilitation of banks' underwriting process: by training algorithms on large amounts of customer data, a system can make quick underwriting and credit-scoring decisions.
Stock price prediction based on quarterly revenues, business news, etc.
Real estate price prediction based on economic and demographic indicators, etc.

What is statistical learning?
Y: dependent variable, response variable, output. X = (X1, X2, ..., Xp): regressors, covariates, features, independent variables, inputs. Suppose that the output variable Y depends on the input variable X, and that we can model the relationship in the following manner:
Y = f(X) + ϵ   (1)
where f is an unknown function and ϵ captures measurement error (randomness) with mean zero. We call statistical learning the task of estimating f from given data, which we call training data and which here take the form (X1, Y1), (X2, Y2), ..., (Xn, Yn).

Quiz
Question: Why estimate f?

Prediction
Why estimate f? For inference or prediction (here we care about prediction). If, in practice, the output Y is not easily obtained but the input X is, then it is desirable to be able to predict what the output Y will be for a given value of X. Such a prediction has the form Ŷ = f̂(X), where f̂ is an estimate of the unknown function f.
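A toy illustration of estimating f from training data: a Python sketch in which the true f(x) = 2 + 3x, the noise level, and the sample size are all invented for the example, and f̂ is obtained by simple least squares.

```python
import random

# Simulate training data from Y = f(X) + eps with an assumed f(x) = 2 + 3x.
random.seed(0)
n = 200
X = [random.uniform(0, 10) for _ in range(n)]
Y = [2 + 3 * x + random.gauss(0, 1) for x in X]

def fit_ols(xs, ys):
    """Estimate f by simple least squares: f_hat(x) = b0 + b1 * x."""
    m = len(xs)
    x_bar = sum(xs) / m
    y_bar = sum(ys) / m
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    return b0, b1

b0, b1 = fit_ols(X, Y)
f_hat = lambda x: b0 + b1 * x
print(round(b0, 2), round(b1, 2))   # estimates close to the true 2 and 3
print(round(f_hat(5.0), 2))         # prediction Y_hat at x = 5
```

With more data or less noise the estimates b0 and b1 concentrate more tightly around the true values, which is exactly point (a) above.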
Reducible and irreducible error
For a fixed value of X, our overall error can be decomposed into two parts: the irreducible error (even if we knew f, we would still make errors in prediction) and the reducible error. We will focus on minimizing the reducible error, since the irreducible error is beyond our control once we have chosen our predictors.

Quiz
Question: What categories of learning do you know?

Supervised learning
1. Regression: Y is continuous/numerical.
2. Classification: Y is categorical.
We will deal with both problems. a. Some methods work well on both types, e.g. neural networks. b. Other methods work best on regression tasks, e.g. linear regression, or on classification tasks, e.g. linear discriminant analysis.
Supervised learning is all about how to estimate f. There are two reasons for estimating f: a. prediction (predictive accuracy); b. inference (estimation accuracy + uncertainty). Supervised learning for prediction is the focus of our course.

Unsupervised learning
No outcome variable, just a set of covariates measured on a set of samples (here we have only X). The goal is to extract useful summarising features from the data. Example of a method, clustering: find groups of samples that behave similarly. Example of an application, market segmentation: we try to divide potential customers into groups based on their characteristics. With these methods it is more difficult to know how well you are doing.

Clustering

Quiz
Question: How do we measure the success of a machine learning method?

Loss
A loss function quantifies how far your predicted value is from the true value. The loss function directly encodes our knowledge about a given problem and the goal we set for the learning machine. The definition of the loss function thus depends on whether we are in a classification or regression setting.
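Returning to the clustering and market-segmentation example above: a minimal k-means sketch in Python. The two customer features (age, income), the simulated data, and the choice of two segments are all assumptions made for illustration.

```python
import random

# Two well-separated, simulated customer groups: (age, income) pairs.
random.seed(1)
customers = ([(random.gauss(30, 3), random.gauss(40, 5)) for _ in range(50)] +
             [(random.gauss(60, 3), random.gauss(90, 5)) for _ in range(50)])

def kmeans2(points, iters=20):
    # Deterministic start for this sketch: one point from each end of the list.
    centers = [points[0], points[-1]]
    for _ in range(iters):
        # Assignment step: attach every point to its nearest center.
        clusters = [[], []]
        for p in points:
            d = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centers]
            clusters[d.index(min(d))].append(p)
        # Update step: move each center to the mean of its cluster.
        centers = [(sum(p[0] for p in cl) / len(cl),
                    sum(p[1] for p in cl) / len(cl)) for cl in clusters]
    return centers, clusters

centers, clusters = kmeans2(customers)
print([len(cl) for cl in clusters])   # sizes of the two recovered segments
```

Note the point made above: because there is no outcome variable Y, there is no loss against a "true" label here, which is why judging the quality of such a segmentation is harder than in supervised learning.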
In all cases, a loss function must obey a few basic rules: i) it should compare a true label with a predicted label and output a non-negative real number; ii) it must output zero when the two labels are equal.

Loss
For regression, we care about the squared differences between the actual and the predicted values. For classification, the typical loss function is the 0-1 loss, which merely assigns a cost of 1 to misclassifications and 0 to correct predictions. Given that loss function, the risk of a classifier corresponds to its probability of misclassifying a random pattern.

Risk
The risk of a learning machine or model is the amount of prediction error that can be expected when using that model. The main goal in supervised learning is to minimize this risk in order to obtain the most accurate model.

Loss and risk
We use a loss function L to quantify prediction accuracy. We consider mainly two types of loss functions:
1. ℓ2 loss function for regression: L(Y, f(X)) = (Y − f(X))²   (2)
2. 0-1 loss function for classification: L(Y, f(X)) = 1(Y ≠ f(X))   (3)
For a given loss function L, the risk of a learning function f is defined as the expected loss
R(f) = E_{X,Y}[L(Y, f(X))].   (4)
We aim to find the f that minimizes R(f) pointwise (at each point of a given set).

Risk
The risk is not a quantity that can be computed exactly, since it measures the quality of predictions of unknown phenomena, and in practice it must be estimated. The core idea is that we cannot know exactly how well an algorithm will work in practice (the true "risk") because we do not know the true distribution of the data that the algorithm will work on; instead, we can measure its performance on a known set of training data (the "empirical" risk).

Measuring the quality of fit to data
Suppose we observe i.i.d. training data (Xi, Yi), i = 1, ..., n. The empirical risk of any function f is
Rn(f) = (1/n) ∑_{i=1}^{n} L(Yi, f(Xi)).   (5)
We can find the empirical risk minimiser f̂, which minimises Rn(f).
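The loss functions and the empirical risk of equations (2), (3) and (5) can be written out directly. In this Python sketch the toy labels and predictions are invented for illustration.

```python
def l2_loss(y, fx):
    # Squared-error loss for regression, eq. (2).
    return (y - fx) ** 2

def zero_one_loss(y, fx):
    # 0-1 loss for classification, eq. (3).
    return 1 if y != fx else 0

def empirical_risk(loss, ys, preds):
    # Eq. (5): the average loss over the observed sample.
    return sum(loss(y, p) for y, p in zip(ys, preds)) / len(ys)

# Regression example: true values vs (arbitrary) predictions.
reg_risk = empirical_risk(l2_loss, [1.0, 2.0, 3.0], [1.5, 2.0, 2.0])
print(reg_risk)   # (0.25 + 0 + 1) / 3

# Classification example: labels vs (arbitrary) predicted labels.
clf_risk = empirical_risk(zero_one_loss, ["up", "down", "up"], ["up", "up", "up"])
print(clf_risk)   # 1 misclassification out of 3
```

Both risks obey the rules above: they are non-negative, and they are zero exactly when every prediction matches its label.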
The training error of the empirical risk minimiser is Rn(f̂).
Regression: the mean squared error (MSE) is given by
MSE = (1/n) ∑_{i=1}^{n} (Yi − f̂(Xi))².   (6)
Classification: the misclassification error rate (MER) is given by
MER = (1/n) ∑_{i=1}^{n} 1(Yi ≠ f̂(Xi)).   (7)
Our method is designed to make the MSE or the MER small on the training data we are looking at.

Quiz
Question: Is training error all we care about?

Test errors
What we really care about is how well the method works on new data (X̃i, Ỹi), i ∈ {1, ..., m}. We call this new data test data. We aim to choose the method that gives the lowest test MSE or test MER for regression and classification problems, respectively.
test MSE = (1/m) ∑_{i=1}^{m} (Ỹi − f̂(X̃i))².   (8)
Importantly, f̂ is independent of the test data, so the test MSE is a more accurate approximation of the true risk of f̂.

Flexibility vs interpretability
For more complex or flexible models, we need to be especially aware of the distinction between training and test errors. Which one is more flexible?
1. Parametric model vs non-parametric model.
2. Linear model vs non-linear model.
3. Linear model with 10 features vs linear model with 100 features.

Training vs testing errors
In general, the more flexible a method is, the lower its training error rate will be, i.e. it will "fit" or explain the training data very well. However, the test error rate may in fact be higher for a more flexible method than for a simple approach like linear regression.

Bias-variance tradeoff
Bias is the error introduced by the simplifying assumptions made by our method to make the target function easier to approximate. Variance is the amount by which the estimate of the target function would change given different training data.

Bias-variance tradeoff
Suppose the true model is Y = f(X) + ϵ, where f(x) = E(Y | X = x). Suppose we fit a model f̂ based on training data, and let (x0, Y0) be a test observation from the population.
Then the expected test MSE at x0 is
E(Y0 − f̂(x0))² = var(f̂(x0)) + {Bias(f̂(x0))}² + var(ϵ),
where Bias(f̂(x0)) = E{f̂(x0)} − f(x0). As the flexibility of f̂ increases, its variance increases and its bias decreases. This is the bias-variance tradeoff. To minimize the expected test MSE at x0, we need to select a statistical learning method that simultaneously achieves relatively low variance and low bias.

To remember
We must keep this picture in mind when choosing a learning method. A more flexible/complicated method is not always better!
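The decomposition above can be checked by simulation. This Python sketch is an illustration, not course material: it uses k-nearest-neighbour regression as the fitting method (smaller k means higher flexibility), and the true f(x) = sin(x), the noise level, and the test point x0 are all assumptions. For each k, it refits f̂ on many fresh training sets and estimates var(f̂(x0)) and {Bias(f̂(x0))}².

```python
import math
import random

random.seed(3)
f = math.sin          # assumed true regression function
SIGMA = 0.3           # sd of the irreducible noise eps
x0 = 5.0              # the test point at which we decompose the error

def knn_prediction_at_x0(k, n=100):
    # Draw a fresh training set and return the k-NN estimate f_hat(x0):
    # the average y over the k training points whose x is closest to x0.
    xs = [random.uniform(0, 10) for _ in range(n)]
    ys = [f(x) + random.gauss(0, SIGMA) for x in xs]
    nearest = sorted(range(n), key=lambda i: abs(xs[i] - x0))[:k]
    return sum(ys[i] for i in nearest) / k

results = {}
for k in (1, 10, 60):                 # small k = flexible, large k = rigid
    preds = [knn_prediction_at_x0(k) for _ in range(300)]
    mean_pred = sum(preds) / len(preds)
    variance = sum((p - mean_pred) ** 2 for p in preds) / len(preds)
    bias_sq = (mean_pred - f(x0)) ** 2
    results[k] = (variance, bias_sq)
    print(k, round(variance, 4), round(bias_sq, 4))
```

The flexible fit (k = 1) shows high variance and negligible bias; the rigid fit (k = 60) averages over a wide window and shows low variance but large squared bias, matching the tradeoff stated above.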