1_1_Artificial Neural Networks.pdf

Karunya Institute of Technology and Sciences



1_1_Introduction to Neural Networks

Machine Learning Models
Machine learning models are a way to compute a function that maps some inputs to their corresponding outputs. Functions consist of mathematical operations such as addition, multiplication, etc. These operations are combined with non-linear activations and stacked in layers to learn complex relationships.

Neural Network
Artificial Neural Networks (ANNs)
○ A class of machine learning models.
○ Inspired by the central nervous system of mammals.
○ Made up of several interconnected neurons organized in layers.
○ Neurons in one layer pass messages to neurons in the next layer.
○ A neural network is a function that learns the expected output for a given input from training datasets.
Perceptron
○ A two-layer network used for simple operations.
Back Propagation Algorithm
○ Used for efficient multi-layer network training.
Deep Learning
○ A class of neural networks characterized by a significant number of neurons, able to learn sophisticated models based on progressive levels of abstraction.
○ Deep => a larger number of layers.
○ Inspiration: the human visual system.

Perceptron
○ A model with one single linear layer.
○ A simple algorithm that outputs 1 (yes) or 0 (no) for an input vector x of m values (x1, x2, …, xm).

Multilayer Perceptron (MLP)
○ A perceptron with multiple layers.
○ Input and output layers are visible from the outside; the other layers are hidden.
○ Linear function in each layer.

Problems with Linearity => Activation Functions
○ With a hard jump from 0 to 1, progressive learning (learning little-by-little) is not likely; a big jump will not help in learning.
○ Solution: a function that changes smoothly from 0 to 1, with no discontinuity => an activation function.
○ Mathematically, we need a continuous function that allows us to compute the derivative. The derivative is the amount by which a function changes at a given point.

Sigmoid Activation Function
○ The sigmoid function ɸ(z) = 1 / (1 + e^(-z)) produces small output changes in the range (0, 1) when the input varies in the range (-∞, ∞).
○ If z = wx + b is very large and positive, e^(-z) is close to zero and ɸ(z) -> 1.
○ If z = wx + b is very large and negative, e^(-z) approaches ∞ and ɸ(z) -> 0.
○ So, for a neuron with a sigmoid activation function, the changes are gradual, and the output can be any value between 0 and 1.

Sigmoid…
○ Data Type: numerical features derived from raw data, typically standardized or normalized.
○ Target Variable: binary, representing two classes (e.g., spam or not spam).
○ Sigmoid Function: converts raw scores to probabilities, facilitating decision-making based on a threshold.
○ Example: detecting whether an email is spam or not.

tanh Activation Function
○ Output: -1 to +1.
○ Data Type: numerical features derived from historical and current data, typically standardized or normalized.
○ Target Variable: can take on positive and negative values, representing changes or differences (e.g., stock price changes).
○ Tanh Function: normalizes outputs to the range -1 to 1, aiding in centering the data and improving model performance.
○ Example: predicting stock price changes.

ReLU Activation Function
○ Rectified Linear Unit.
○ ReLU addresses some optimization problems observed with sigmoids.
○ ReLU is defined as f(x) = max(0, x).
○ The function is zero for negative values and grows linearly for positive values.
○ Suitable for handling numerical data where the values can range widely, but the primary concern is to pass only non-negative values through the activation function.
○ ReLU is simple to implement. (A small sketch of these three activation functions follows.)
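To make the three activation functions above concrete, here is a minimal sketch, assuming NumPy is available; the variable names and sample values are illustrative and not from the slides. It computes a single perceptron-style pre-activation z = w·x + b and passes it through sigmoid, tanh, and ReLU:

```python
import numpy as np

def sigmoid(z):
    # Smoothly maps any real input to the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Maps any real input to the range (-1, 1), centred at 0.
    return np.tanh(z)

def relu(z):
    # Zero for negative inputs, identity for positive inputs.
    return np.maximum(0.0, z)

# A single neuron: z = w.x + b, followed by an activation.
x = np.array([0.5, -1.2, 3.0])   # input vector (x1, x2, x3)
w = np.array([0.4, 0.1, -0.7])   # weights
b = 0.2                          # bias

z = np.dot(w, x) + b
print("sigmoid:", sigmoid(z))    # value in (0, 1)
print("tanh:   ", tanh(z))       # value in (-1, 1)
print("relu:   ", relu(z))       # max(0, z)
```

Feeding the same pre-activation z through each function shows the three output ranges described above: (0, 1) for sigmoid, (-1, 1) for tanh, and [0, ∞) for ReLU.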
ReLU…
○ Data Type: numerical features, often non-negative or sparsely populated.
○ Target Variable: can be categorical (in classification tasks) or continuous (in regression tasks).
○ ReLU Function: activates neurons by setting negative inputs to zero and passing positive inputs unchanged, promoting sparsity and efficiency.
○ Applications:
  ○ Image Classification: used in convolutional layers to process pixel intensities.
  ○ Object Detection: helps in identifying and classifying objects within images.
  ○ Regression Tasks: effective in handling non-linear relationships in data with positive values.

Issues in Deep Neural Networks
Vanishing Gradient Problem
○ Gradients are used to update the parameters (weights) of the neural network.
○ The vanishing gradient problem happens when gradients become very small.
○ This causes the early layers of the neural network to learn slowly or to stop learning.
○ Prevalent with sigmoid and tanh.
Dying ReLU Problem
○ Occurs when neurons with the ReLU activation function output zero for all inputs.
○ Such neurons become inactive (they "die") and stop contributing to the learning of the network.
ReLU Pros
○ It avoids and rectifies the vanishing gradient problem.
○ ReLU is less computationally expensive than tanh and sigmoid because it involves simpler mathematical operations.
ReLU Cons
○ One of its limitations is that it should only be used within the hidden layers of a neural network model.
○ Some gradients can be fragile during training and can die: a weight update can make a neuron never activate on any data point again. In other words, ReLU can result in dead neurons, since for activations in the region x < 0 the gradient is 0 and those weights are no longer adjusted during descent.
○ Another problem is that ReLU can blow up the activation, since its output range is [0, ∞).

Leaky ReLU
○ Leaky ReLU is a variant of ReLU. Instead of being 0 when z < 0, it outputs a small negative-slope value (for example f(z) = αz with a small constant such as α = 0.01), so negative inputs still produce a gradient and neurons are less likely to die.

Overfitting
○ Goal of training => a low error rate on training data => achieved by minimizing the loss function.
○ A model can become excessively complex when it tries to capture all the relations inherently expressed by the training data.
○ Increase of complexity has two negative consequences:
  ○ Increase in execution time.
  ○ Good performance on training data but not so good on validation data. Reason: overfitting.
What is overfitting?
○ The problem of a model losing its ability to generalize is called overfitting.
○ Reason: the model is able to contrive relationships between many parameters in the specific context, but these relationships do not exist in a more generalized context.
Causes of Overfitting
○ Too many parameters.
○ Insufficient training data.
○ Complex models: models that are too complex relative to the simplicity of the underlying data structure are prone to overfitting.
How to identify that your model is overfitting? Thumb rule:
○ If during training we see that the loss increases on validation data, after an initial decrease, then we have a problem of model complexity.
Methods to Prevent Overfitting
○ Simplify the model: reduce its complexity.
  ○ Select fewer features.
  ○ Reduce the number of model parameters.
○ Cross validation: cross validate with subsets of data and ensure that the model's performance is consistent across them.
○ Apply regularization techniques.
○ Apply pruning techniques for decision tree models: remove branches that lack significant importance.
○ Increase training data.
○ Early stopping: monitor the model's performance on a validation set and stop training when the performance starts to degrade (a minimal sketch follows this list).
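The early stopping rule just described can be sketched as a small training loop that watches the validation loss. This is an illustrative sketch only: `train_one_epoch`, `validate`, and the simulated loss curve are hypothetical stand-ins, and the `patience` value is an assumed choice, not something prescribed by the notes.

```python
import numpy as np

def early_stopping_training(train_one_epoch, validate, max_epochs=100, patience=5):
    """Stop training when the validation loss has not improved for `patience` epochs."""
    best_val_loss = np.inf
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_one_epoch()
        val_loss = validate()
        if val_loss < best_val_loss:
            best_val_loss = val_loss          # performance still improving
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1   # performance starting to degrade
            if epochs_without_improvement >= patience:
                print(f"Stopping early at epoch {epoch}: validation loss stopped improving.")
                break
    return best_val_loss

# Toy stand-ins: training is a no-op, and the validation loss first falls, then rises
# (the classic overfitting curve from the thumb rule above).
simulated_losses = iter(list(np.linspace(1.0, 0.3, 20)) + list(np.linspace(0.3, 0.8, 30)))
best = early_stopping_training(train_one_epoch=lambda: None,
                               validate=lambda: next(simulated_losses))
print("Best validation loss:", best)
```

Because the simulated curve decreases and then increases, training stops shortly after the validation loss starts climbing, which is exactly the symptom the thumb rule looks for.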
Regularization Techniques
L1 Regularization (LASSO)
○ The complexity of the model is expressed as the sum of the absolute values of the weights.
L2 Regularization (Ridge)
○ The complexity of the model is expressed as the sum of the squares of the weights.
Elastic Regularization
○ The complexity of the model is captured by a combination of the preceding two techniques.
Batch Normalization
○ Another form of regularization.
○ It accelerates training, often halving the number of training epochs needed.
○ Without it, each layer has to re-adjust its weights to a different input distribution for every batch; batch normalization makes layer inputs more similar in distribution, batch after batch and epoch after epoch.

Hyperparameter Tuning and Auto ML
Hyperparameters
○ Parameters of the network: weights and biases.
○ Parameters that can be optimized (hyperparameters):
  ○ Number of hidden neurons
  ○ Batch size
  ○ Number of epochs
  ○ Learning rate
Hyperparameter Tuning
○ The process of finding the combination of those hyperparameters that minimizes the cost function.
Auto ML
○ A set of research techniques aiming at both automatically tuning hyperparameters and automatically searching for an optimal network architecture.

Back Propagation
○ The process through which multilayer perceptrons learn from training data is called back propagation: a way of progressively correcting mistakes as they are detected.
○ Initially, all the weights have a random assignment.
○ The net is activated for each input in the training set: values are propagated forward from the input stage through the hidden stages to the output stage, where a prediction is made.
○ The predicted output is compared with the true value and the error is calculated.
○ The error is propagated back using an optimizer algorithm (e.g., gradient descent) to adjust the neural network weights with the goal of reducing the error.
○ Forward and backward propagation are repeated until the error (loss function) falls below a predefined threshold.
○ The network progressively adjusts its internal weights so that its predictions increase the number of correctly forecasted labels. (A minimal end-to-end sketch of this loop is given below.)
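The forward/backward cycle described above can be condensed into a short NumPy sketch. The XOR dataset, network size, learning rate, and L2 penalty below are illustrative assumptions; convergence depends on the random initialization, so treat this as a demonstration of the mechanics rather than a tuned implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: learn XOR, a classic problem that a single linear layer cannot solve.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer of 8 neurons; weights start with a random assignment.
W1, b1 = rng.normal(0, 1, (2, 8)), np.zeros((1, 8))
W2, b2 = rng.normal(0, 1, (8, 1)), np.zeros((1, 1))

learning_rate, l2_lambda = 0.5, 1e-4

for epoch in range(10000):
    # Forward propagation: input -> hidden -> output.
    h = sigmoid(X @ W1 + b1)
    y_hat = sigmoid(h @ W2 + b2)

    # Mean squared error plus an L2 (ridge) penalty on the weights.
    loss = np.mean((y_hat - y) ** 2) + l2_lambda * (np.sum(W1**2) + np.sum(W2**2))

    # Backward propagation: apply the chain rule layer by layer.
    d_out = 2 * (y_hat - y) / len(X) * y_hat * (1 - y_hat)
    dW2 = h.T @ d_out + 2 * l2_lambda * W2
    db2 = d_out.sum(axis=0, keepdims=True)
    d_hidden = d_out @ W2.T * h * (1 - h)
    dW1 = X.T @ d_hidden + 2 * l2_lambda * W1
    db1 = d_hidden.sum(axis=0, keepdims=True)

    # Gradient descent update: move weights against the gradient to reduce the error.
    W1 -= learning_rate * dW1; b1 -= learning_rate * db1
    W2 -= learning_rate * dW2; b2 -= learning_rate * db2

    if loss < 0.01:          # stop once the loss falls below a predefined threshold
        break

print("final loss:", round(float(loss), 4))
print("predictions:", y_hat.round(3).ravel())
```

Each iteration performs one forward pass, computes the loss (including a ridge penalty, as in the L2 regularization notes above), back-propagates the error, and applies a gradient-descent update, mirroring the loop described in the Back Propagation section.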
