Enhancing CNN Performance (PDF)
Summary
This document provides an overview of techniques used to enhance the performance of Convolutional Neural Networks (CNNs). It details concepts like dropout, residual connections, and batch normalization, emphasizing their role in preventing overfitting, improving gradient flow, and stabilizing training processes.
Full Transcript
The field of CNNs has evolved significantly, incorporating a range of advanced techniques to enhance their performance and adaptability. The core building blocks are: dropout, a regularization technique used to prevent overfitting; residual connections, a technique to improve gradient flow across layers; and batch normalization, a technique to stabilize the training process.

DROPOUT.
As said before, it is a regularization technique used to prevent overfitting. A neuron is the fundamental computational unit that processes information by receiving inputs, applying a mathematical operation, and producing an output. Neurons are inspired by the biological neurons in the human brain. A neuron receives multiple input signals (often represented as numerical values), which are weighted and summed. A bias term is added to this weighted sum, and the result is passed through an activation function, which introduces non-linearity into the network. This process enables the network to learn complex patterns and relationships in the data.

EXAMPLE. In a simple feed-forward neural network, suppose we have three inputs x1 = 0.5, x2 = -0.2, x3 = 0.8, corresponding weights w1 = 0.6, w2 = -0.1, w3 = 0.3, and a bias term b = 0.2. The output of the neuron is calculated as z = w1*x1 + w2*x2 + w3*x3 + b = 0.30 + 0.02 + 0.24 + 0.20 = 0.76. If a ReLU activation function is applied, the final output is max(0, z) = 0.76.

Let's remember that overfitting means a model becomes excessively tailored to the training data, capturing not only the underlying patterns but also the noise and details specific to the training set (irrelevant patterns). The result is a model that performs exceptionally well on the training data but struggles to generalize to unseen data (poor performance on the validation set).

During training, dropout randomly deactivates a fraction of the neurons in the layer it is applied to. Neurons are dropped by setting their activations to zero. The probability of a neuron being dropped is defined by the dropout rate p, where 0 < p < 1; if p = 0.5, each neuron has a 50% chance of being dropped. Neurons are dropped only during training. During inference (test or validation), no neurons are dropped, because the goal of inference is to produce a stable and deterministic output with the trained model. Despite all of this, dropout has its limitations: it may not significantly improve performance in models where overfitting is not a concern (for example, with extremely large datasets).

So, as said before, during training the deactivation is achieved by temporarily setting the output of the selected neurons to zero, effectively removing their contribution to both the forward pass and the backward pass of that iteration. By randomly deactivating neurons, the network is forced to distribute learning across multiple pathways rather than depending on specific neurons; this prevents overfitting and makes the network more robust. The neurons to drop are chosen using Bernoulli random variables. To maintain the same magnitude of activations during inference, each neuron's activation is multiplied by (1 - p); this scaling ensures that the average contribution of a neuron during inference matches its average contribution during training, despite the change in the number of active neurons. In inverted dropout, the scaling is instead applied during training (the kept activations are divided by (1 - p)) and NOT during inference; this keeps the expected value of the activations consistent between training and inference, so there is no need to scale at inference time.
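To make the mechanics concrete, here is a minimal sketch of inverted dropout in NumPy, applied to the worked neuron example above. The function name inverted_dropout and the choice of NumPy are illustrative assumptions, not something prescribed by these notes.

```python
import numpy as np

def inverted_dropout(activations, p, training=True, rng=None):
    """Drop each unit with probability p during training and scale the kept
    units by 1/(1-p), so no rescaling is needed at inference time."""
    if not training or p == 0.0:
        return activations                          # inference: all neurons active, no scaling
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.binomial(1, 1.0 - p, size=activations.shape)   # Bernoulli keep-mask
    return activations * mask / (1.0 - p)           # zero out dropped units, rescale the rest

# Worked neuron example from above: weighted sum plus bias, then ReLU.
x = np.array([0.5, -0.2, 0.8])
w = np.array([0.6, -0.1, 0.3])
b = 0.2
z = w @ x + b                  # 0.30 + 0.02 + 0.24 + 0.20 = 0.76
a = np.maximum(0.0, z)         # ReLU -> 0.76
layer = np.array([a, a, a, a])                            # a tiny "layer" of four activations
h_train = inverted_dropout(layer, p=0.5)                  # roughly half are zeroed, the rest doubled
h_test = inverted_dropout(layer, p=0.5, training=False)   # unchanged: no dropout, no scaling
```

Because the kept activations are divided by (1 - p) at training time, the expected value of h_train equals layer, which is why nothing needs to be rescaled at test time.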
VANISHING GRADIENT PROBLEM.
This problem is a fundamental challenge in training deep neural networks, particularly those with many layers. It arises during backpropagation, when gradients are computed and propagated backward through the network from the output layer to the earlier layers. In deep networks, as the gradients are successively multiplied by the weights of each layer during backpropagation, they tend to diminish exponentially. This occurs because many activation functions squash their inputs into a small range; consequently, their derivatives also fall within a small range. When gradients are multiplied layer after layer, they shrink to values near zero for the earlier layers. Some issues caused by the vanishing gradient problem are:
Slow convergence. Gradients in the earlier layers are nearly zero, so the network's optimization algorithm struggles to make meaningful updates to the weights in those layers. This slows down the training process.
Poor training. Deep networks fail to learn robust representations when the earlier layers do not receive a sufficient gradient signal. This prevents the network from building a strong hierarchical understanding of the data.
Reduced performance. The earlier layers are inadequately trained, so the network is unable to extract high-quality low-level features.
Residual connections (skip connections) were introduced in ResNet to address these problems. A residual connection skips one or more layers by adding the input of a layer directly to its output. Each layer adds something to the previous value rather than producing an entirely new one. Residual connections therefore play a crucial role in addressing the vanishing gradient problem by allowing gradients to bypass intermediate layers during backpropagation; this "shortcut" prevents the gradient from diminishing excessively. Furthermore, when the residual function F(x) = 0 (at initialization, or when the layers are not yet fully optimized), the input x is passed directly to the output without modification, making it easy for the network to represent the identity function (a short code sketch appears after the batch normalization section below).

STANDARD SCALING.
A preprocessing technique used in machine learning to standardize the features in a dataset, ensuring that each feature has a mean of zero and a variance of one. The primary aim of standard scaling is to normalize the distribution of feature values so that all features contribute equally to the model's training process: x' = (x - μ) / σ, where μ is the feature mean and σ its standard deviation.

BATCH NORMALIZATION.
A technique in deep learning that normalizes the outputs of each layer (or block) in a neural network during training. Batch normalization adjusts the distribution of the layer outputs, ensuring they have a mean of zero and a standard deviation of one; it normalizes activations during training. The method calculates the mean and the variance of the activations within a mini-batch during training and uses these statistics to normalize the output.
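To illustrate residual connections and batch normalization together, here is a minimal ResNet-style block, assuming PyTorch; the class name ResidualBlock, the channel count, and the kernel sizes are illustrative choices rather than anything specified in these notes.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """A simplified residual block: output = F(x) + x, with batch
    normalization after each convolution."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)    # normalizes activations using mini-batch statistics
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                           # the skip connection keeps the original input
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out + identity                   # residual addition: gradients can bypass the convolutions
        return self.relu(out)

# Usage: a batch of 8 feature maps with 16 channels of size 32x32 keeps its shape.
block = ResidualBlock(channels=16)
y = block(torch.randn(8, 16, 32, 32))
```

Because the output is F(x) + x, the gradient flowing back to x always includes a direct identity term from the skip path, which is the "shortcut" described above; the BatchNorm2d layers use mini-batch statistics during training and running averages at inference.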
DATA AUGMENTATION.
A technique used to enhance the performance and generalization ability of models. It artificially expands the size and diversity of a training dataset by applying various transformations to the existing data: rotations, translations, scaling, flipping, etc. The primary goal is to create new, varied examples from the original data that maintain the original class labels or characteristics, enabling the model to become more robust to variations and better equipped to generalize to unseen data. The steps of this method are:
1) Sample a mini-batch of examples (a small, randomly selected subset of the data) from the dataset.
2) For each example, apply one or more random transformations.
3) Train the model on the transformed (augmented) mini-batch.
By introducing randomness, data augmentation effectively enables the model to learn features that are invariant to these transformations, resulting in better generalization and robustness to small variations and improving performance on new, unseen examples.
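As a sketch of steps 1-3, the following assumes torchvision is available; the specific transforms and their parameters (flip probability, rotation range, crop size) are illustrative choices, not prescribed by these notes.

```python
import numpy as np
import torch
from PIL import Image
from torchvision import transforms

# Each call to augment(...) draws fresh random parameters, so every pass over the
# data sees slightly different versions of each image; the class labels are untouched.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),              # flipping
    transforms.RandomRotation(degrees=15),                # small rotations
    transforms.RandomResizedCrop(32, scale=(0.8, 1.0)),   # random crop + rescale (translation/scaling)
    transforms.ToTensor(),
])

# Step 1: sample a mini-batch (dummy random images stand in for a real dataset here).
mini_batch = [Image.fromarray(np.random.randint(0, 255, (32, 32, 3), dtype=np.uint8))
              for _ in range(8)]
# Step 2: apply one or more random transformations to each example.
augmented = [augment(img) for img in mini_batch]
# Step 3: stack the augmented tensors and feed them to the model for training.
batch_tensor = torch.stack(augmented)
```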