Neural Networks for Machine Learning in Finance PDF

Document Details


Uploaded by RecommendedSunflower

University of St. Gallen

Despoina Makariou

Tags

neural networks, machine learning, finance, artificial intelligence

Summary

This document provides a detailed overview of machine learning in finance, focusing on neural networks. It covers the main types of neural networks, the fundamental concepts of feedforward neural networks, and the architecture and training of such networks.

Full Transcript


Machine learning in Finance: Neural Networks
Prof. Dr. Despoina Makariou, Institute of Insurance Economics, University of St. Gallen

Learning outcomes

Intro to neural networks. Main neural network types. Basic concepts in feedforward neural networks.

Introduction to Neural Networks

Neural networks (NNs) are computer systems that are often said to operate in a similar fashion to biological neural networks, such as the human brain. According to the universal approximation theorem, proved by Cybenko in 1989, an NN with one hidden layer, often called a shallow NN, and a nonlinear activation function can approximate any function with arbitrarily small error, i.e. it can in principle learn anything.

However, in such cases the hidden layer might need to be very large, and thus it will not be useful for learning a model that generalizes well to new data. Instead, one prefers to construct multilayer deep NNs in which each layer computes certain patterns that are then combined by the next layer to form more complex patterns, which lead to more accurate predictions.

Neurons and layers

To understand the term "deep" in deep learning, we first need to understand how ANNs are structured. All the circles here are called neurons or nodes. Each 'column' of neurons (drawn in the same color) is organized into what we call a layer. Layers between the input layer (cyan) and the output layer (yellow) are called hidden layers.

If there is more than one hidden layer, the ANN is called a deep ANN. The layer structure used in these examples is called a dense layer, also known as a fully connected layer. As the name suggests, it fully connects each input to each output.

Commonly used classes of Neural Networks

There are many classes of NNs, and many have sub-classes. The three most commonly used classes of NNs are feedforward neural networks, convolutional neural networks, and recurrent neural networks.

Feedforward Neural Networks

They are ANNs in which the connections between the units do not form a cycle; in feedforward NNs, information always moves forward. The most basic feedforward NN, consisting of a single layer of inputs connected directly to one output neuron, is called a perceptron (a minimal code sketch follows at the end of this overview). A perceptron takes several binary inputs, for example $x_1, \ldots, x_5$, and produces a single binary output.

All the inputs $x_1, \ldots, x_5$ are multiplied by their weights $w_1, \ldots, w_5$. The neuron's output, 0 or 1, is determined by whether the weighted sum $\sum_{j=1}^{5} w_j x_j$ is less than or greater than some threshold value.

Here we will use multilayer perceptrons, which are feedforward NNs with three or more layers, including the input layer, the hidden layer(s) and the output layer.

Feedforward Neural Networks - example

Each input to this ANN has three dimensions, e.g. age, height and weight. There are two neurons in the output layer, which means that there are two possible outcomes, e.g. overweight or underweight. The number of neurons in the hidden layer is chosen arbitrarily.

Convolutional Neural Networks

They are variants of multilayer perceptrons, designed to take in an input image, assign importance (weights) to various objects in the image, and be able to differentiate one object from another.

Recurrent Neural Networks

They are types of ANNs which use sequential data or time series data. They are commonly used for ordinal or temporal problems, such as language translation, natural language processing (NLP), speech recognition, and image captioning. They are incorporated into popular applications such as Siri, voice search, and Google Translate. Like feedforward and convolutional neural networks (CNNs), recurrent neural networks utilize training data to learn.
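Before narrowing the focus to feedforward networks, here is the minimal perceptron sketch promised above. The weights and the threshold are made-up illustrative values, not from the slides:

```python
import numpy as np

def perceptron(x, w, threshold):
    """Binary perceptron: output 1 if the weighted sum exceeds the threshold, else 0."""
    weighted_sum = np.dot(w, x)  # sum_j w_j * x_j
    return 1 if weighted_sum > threshold else 0

# Hypothetical example with five binary inputs, as on the slide.
x = np.array([1, 0, 1, 1, 0])
w = np.array([0.4, -0.2, 0.7, 0.1, 0.5])  # made-up weights
print(perceptron(x, w, threshold=1.0))    # -> 1, since 0.4 + 0.7 + 0.1 = 1.2 > 1.0
```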
Focus

The focus today lies on feedforward neural networks. We will discuss some basic concepts.

Feedforward neural networks - basic concepts

Input layer: No computation is done here; information is passed on to the next layer.

Hidden layers: These are the layers between the input and the output layers. They consist of a number of neurons or nodes: processing units that compute a linear combination of the inputs from the previous layer and pass this sum to an activation function, which performs some type of transformation on the given sum.

Output layer: Finally, an activation function that maps to the desired output format (e.g. softmax for classification, exponential for regression) is used.

Connections and weights: The network consists of connections, each connection transferring the output of a neuron i to the input of a neuron j.

Activation function: The activation function maps the node from the current layer to the node in the next layer. Nonlinear activation functions allow NNs to compute nontrivial problems using only a small number of nodes. Typical activation functions:

a. Linear activation function: $f(z) = z$.
b. Nonlinear activation functions: those for which $f(x + y) \neq f(x) + f(y)$ for some $x, y$, e.g. $f(z) = z^2$, $\log(z)$, $e^z$, ...

Sigmoid function: $\sigma(z) = \frac{e^z}{1 + e^z} \in (0, 1)$. The sigmoid function is widely used for binary classification. However, the maximum possible value of its derivative is 0.25, and hence it is not advisable to use it as an activation function across several layers, as one is then likely to encounter the problem of vanishing gradients: the gradient converges to 0. For this reason, the sigmoid is usually used as an activation function only for the output layer.

Hyperbolic tangent activation function: $\tanh(z) = \frac{e^z - e^{-z}}{e^z + e^{-z}} \in (-1, 1)$. It is very similar to the sigmoid activation function and can be used with hidden layers.

Each connection between two neurons has an associated weight $w_{i,j}$, which represents the strength of the connection between the two neurons. Starting from the input layer, input is passed to the next neuron via a connection, and the input is multiplied by the weight.

Next, for each node in the second layer, a weighted sum is computed over the incoming connections. This sum is then passed to an activation function $f$, which performs some type of transformation on the given sum, e.g. for the first neuron in the hidden layer,

$h_1 = f(x_1 w'_{11} + x_2 w'_{21})$   (1)

Then we repeat the whole process from the hidden layer to the output layer. In general, if there are multiple hidden layers, we repeat the above process until we reach the output layer. In this example,

$y_1 = f(h_1 w^h_{11} + h_2 w^h_{21} + h_3 w^h_{31})$   (2)
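A minimal sketch of this forward pass in Python, using the sigmoid as the activation function $f$. The weight matrices are made-up illustrative values, not from the slides:

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Made-up weights for a 2-input, 3-hidden-neuron, 1-output network.
W_in = np.array([[0.1, -0.3, 0.5],    # w'_{1k}: input x1 -> hidden neurons
                 [0.7,  0.2, -0.4]])  # w'_{2k}: input x2 -> hidden neurons
W_h = np.array([0.6, -0.1, 0.3])      # w^h_{k1}: hidden neurons -> output y1

x = np.array([1.0, 0.5])  # example input

h = sigmoid(x @ W_in)   # equation (1) for each hidden neuron: h_k = f(x1 w'_{1k} + x2 w'_{2k})
y1 = sigmoid(h @ W_h)   # equation (2): y1 = f(h1 w^h_{11} + h2 w^h_{21} + h3 w^h_{31})
print(h, y1)
```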
Feedforward neural networks - basic concepts

Learning rule: A rule, or an algorithm, which modifies the parameters of the neural network so that a given input to the network produces a favored output. This learning process typically amounts to modifying the weights.

The different learning rules in NNs are:
Hebbian learning rule – identifies how to modify the weights of the nodes of a network.
Perceptron learning rule – the network starts its learning by assigning a random value to each weight.
Delta learning rule – the modification of the synaptic weight of a node is equal to the product of the error and the input.
Correlation learning rule – the correlation rule is a supervised learning rule.
Outstar learning rule – used when the nodes or neurons in a network are assumed to be arranged in a layer.

Neural network architecture: This involves the choice of the depth of the network, the number of hidden neurons in each hidden layer, as well as the activation function(s).

Hyperparameters: Apart from the weights, you also have the so-called hyperparameters, whose values are used to control the learning process. For a simple feedforward neural network, the hyperparameters are:
- the number of neurons,
- the number of layers,
- the learning rate,
- the regularization penalty, which prevents overfitting,
- momentum, which is added if the NN approaches a shallow local minimum so that a global minimum can still be found,
- the number of epochs, i.e. the number of iterations, which should be increased until the validation accuracy starts decreasing even while the training accuracy is still increasing (overfitting),
- the batch size, which acts like a for loop in that it defines the number of samples to work through before updating the internal model parameters.

Feedforward neural networks - training algorithm

We train a model by minimizing a loss function, which is a way to quantify the difference between the actual values from the data and the predicted values from the model. For example, in a linear regression model, given the pairs of data $(x_i, y_i)$, $i = 1, \ldots, n$, we want to minimize

$\min_{\beta_0, \beta_1} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$   (3)

where $y_i$ is the actual value and $\beta_0 + \beta_1 x_i$ is the predicted value.

In the previous example, the loss function was the mean squared error (MSE)

$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$   (4)

where $y_i$ is the actual value from the data and $\hat{y}_i$ is the predicted value from a model. The scaling factor $\frac{1}{n}$ can be omitted in practice since it does not affect the result.

There are other types of loss functions; we provide some examples.

Mean absolute error:

$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$   (5)

Mean absolute percentage error:

$\text{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|$   (6)

Mean squared logarithmic error:

$\text{MSLE} = \frac{1}{n} \sum_{i=1}^{n} \left( \log(y_i + 1) - \log(\hat{y}_i + 1) \right)^2$   (7)

We start the training process by setting up arbitrary weights $w^\ell_{ij}$ between every pair of neurons across all layers $\ell$; we then specify activation functions $f$, choose a loss function $L$, and define the minimization problem

$\min_{w^\ell_{ij}} L(y, \hat{y})$   (8)

Unlike computing the minimum, or maximum, of a basic function, e.g. $(x-1)^2$, there is usually no way to minimize $L$ in one step. Thus, we need algorithms to train NNs. One of the most commonly used algorithms is stochastic gradient descent (SGD).
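A quick sketch of these loss functions in Python with NumPy, vectorized over arrays of actual values y and predictions y_hat (the variable names and sample values are our own):

```python
import numpy as np

def mse(y, y_hat):
    """Mean squared error, equation (4)."""
    return np.mean((y - y_hat) ** 2)

def mae(y, y_hat):
    """Mean absolute error, equation (5)."""
    return np.mean(np.abs(y - y_hat))

def mape(y, y_hat):
    """Mean absolute percentage error, equation (6); assumes no y_i is zero."""
    return np.mean(np.abs((y - y_hat) / y))

def msle(y, y_hat):
    """Mean squared logarithmic error, equation (7); assumes y, y_hat > -1."""
    return np.mean((np.log(y + 1) - np.log(y_hat + 1)) ** 2)

y = np.array([3.0, 5.0, 2.5])
y_hat = np.array([2.5, 5.0, 3.0])
print(mse(y, y_hat), mae(y, y_hat), mape(y, y_hat), msle(y, y_hat))
```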
Feedforward neural networks - training algorithm

During each iteration (epoch) $r = 1, 2, \ldots$, the loss function $L(r)$ as well as the gradients $d^\ell_{ij}(r)$ based on the current weights $w^\ell_{ij}(r)$ are calculated. Then all the weights are updated based on the gradients:

$w^\ell_{ij}(r + 1) = w^\ell_{ij}(r) - \alpha \, d^\ell_{ij}(r), \quad \alpha \in [0.0001, 0.01]$

This process is repeated until some stopping criterion is reached; for example, specify a maximum number of iterations $r_m$ and stop the process when $r > r_m$. The whole procedure is usually called training. During each iteration, we say that the NN is learning by updating the weights $w^\ell_{ij}$.

The hyperparameter $\alpha$ is called the learning rate. We have to test and tune it for each model before we know exactly where to set it; a typical range of values is $[0.0001, 0.01]$. A larger value of $\alpha$ can lead to overshooting, i.e. the update steps $\alpha d^\ell_{ij}$ are too large and may step past the minimum. By contrast, a smaller value of $\alpha$ requires more iterations before the loss function is minimized, i.e. it increases the computational time. Choosing between a higher and a lower learning rate therefore involves a trade-off, illustrated in the sketch below.
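As a concrete sketch of this update rule, the following Python code runs gradient descent on the linear regression loss of equation (3). The data are made up, and for simplicity we use full-batch gradients rather than the stochastic mini-batches of SGD:

```python
import numpy as np

# Made-up data roughly following y = 2 + 3x.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 100)
y = 2.0 + 3.0 * x + rng.normal(0, 0.1, 100)

beta0, beta1 = 0.0, 0.0  # arbitrary starting weights
alpha = 0.01             # learning rate, within the typical range [0.0001, 0.01]
r_max = 5000             # stopping criterion: maximum number of iterations

for r in range(r_max):
    y_hat = beta0 + beta1 * x
    # Gradients of the squared-error loss (3) w.r.t. beta0 and beta1.
    d0 = -2.0 * np.sum(y - y_hat)
    d1 = -2.0 * np.sum((y - y_hat) * x)
    # Update rule: w(r+1) = w(r) - alpha * d(r); dividing by n keeps the step size stable.
    beta0 -= alpha * d0 / len(x)
    beta1 -= alpha * d1 / len(x)

print(beta0, beta1)  # should approach (2, 3)
```

With a much larger alpha the updates would overshoot the minimum; with a much smaller one, far more than 5000 iterations would be needed, which is exactly the trade-off described above.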
Training, validation, and testing

We split the data into three distinct data sets. The training set is used to train the model, i.e. to minimize the loss function and find the optimized weights $w^\ell_{ij}$. After the training step, the model produces output based on the input in the validation set; the weights are not updated in this step. The output from the validation set is then used to adjust the learning parameters and to check whether the model is overfitting. If the prediction results on the validation set are considerably worse than those on the training set, the model is said to be overfitting.

The concept of overfitting boils down to the fact that the model has learned the features of the training set extremely well, but if we give the model any data that deviates even slightly from the exact data used during training, it is unable to generalize and accurately predict the output.

There are several common ways to reduce overfitting: adding more data to the training set; data augmentation, i.e. creating additional augmented data by reasonably modifying the data in our training set (e.g. rotating, flipping, or zooming the images in an image data set); reducing the model complexity (e.g. reducing the number of hidden layers or the number of neurons in some layers); and dropout, i.e. randomly ignoring a proportion of the nodes in a given layer during training.

There is also the problem of underfitting, when the model is not able to predict even the data it was trained on, that we may need to deal with. There are several ways to reduce underfitting: increase the complexity of the model; add more features to the input samples; reduce the proportion of dropout.

Once overfitting or underfitting is encountered, one needs to try the measures suggested above and retrain the model. After training and validating the model, the final step is testing the model. The test set provides a final check that the model generalizes well before it is deployed to production (a minimal splitting sketch follows at the end of this section). Note that a rule of thumb indicates that a neural network with 3 to 5 hidden layers can solve most high-dimensional problems.

The prediction performance on the test set is the final assessment of the model before it is deployed to production. No matter how good or bad the performance on the test set is, one cannot go back to adjust the structure of the model and retrain it; that should be done during training and validation.
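To make the three-way split concrete, here is a minimal Python sketch. The 70/15/15 proportions are an illustrative assumption, not from the slides:

```python
import numpy as np

def train_val_test_split(X, y, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle the data and split it into training, validation, and test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(len(X) * test_frac)
    n_val = int(len(X) * val_frac)
    test_idx = idx[:n_test]
    val_idx = idx[n_test:n_test + n_val]
    train_idx = idx[n_test + n_val:]
    return (X[train_idx], y[train_idx]), (X[val_idx], y[val_idx]), (X[test_idx], y[test_idx])

X = np.arange(100).reshape(100, 1).astype(float)
y = 2.0 * X[:, 0] + 1.0
train, val, test = train_val_test_split(X, y)
print(len(train[0]), len(val[0]), len(test[0]))  # -> 70 15 15
```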
