Lecture 3

Convolutional Neural Network (Basics) Part 1

Motivations for CNNs
Image data is large. Consider a 64 x 64 RGB image:
○ The image contains 64 * 64 * 3 = 12,288 integers
○ If we fed it to a Dense layer that outputs 128 neurons, the weight matrix would be 12,288 x 128 ⇒ 1,572,864 parameters (plus 128 biases) for the first layer alone
Problems:
○ A high number of parameters increases the risk of overfitting
○ Such layers are difficult to train on memory-constrained GPUs
CNNs have been proposed as a possible solution.

How do Convolutional kernels work
[Figure not reproduced]

Convolutional kernels (Edge Detector)
An edge detector kernel produces an output that highlights the presence of an edge.

Learning Kernels
In the past, convolution kernels were handcrafted.
With CNNs, the idea is to let the neural network “learn” its own kernels via gradient descent.
This removes the need for handcrafted kernels.

Padding
As the previous slides show, convolutions reduce the output size.
Padding adds zeros around the input to preserve its size.
There are typically 2 options:
○ “Valid”: do not add zeros to the input
○ “Same”: add zeros to the input such that the output size is the same as the input size (when the stride is 1)

Strided Convolutions
The stride controls how many steps the kernel moves at a time.
Motivation:
○ Quicker downsampling, which reduces the chance of overfitting
○ Computational efficiency: the output does not have to be computed at every single position
○ Larger increase in receptive field with each layer
Receptive field:
○ Defines how much information from the raw image a particular pixel contains (the region of the input that influences it)
○ Akin to “memory”

Receptive Field
[Figure not reproduced]

Dilated Convolutions
Another way to increase the receptive field
Inserts 0’s within the kernel to artificially increase its size

Calculation of output dimensions: Conv1D
n: original input size
p: padding amount (zeros added to each side)
k: kernel size
d: dilation rate
s: stride
Output size = floor((n + 2p - d(k - 1) - 1) / s) + 1
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv1D

Calculation of output dimensions: Conv2D
The same formula is applied independently to the height and to the width.
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2D

Convolutions with Channels (Single filter)
Up to now, our inputs have had only 1 channel.
What happens when we process images with 3 channels (e.g. RGB images)?
○ Each kernel / filter spans all input channels and produces a single output channel
○ The number of output channels therefore depends on the number of kernels / filters

Convolutions with Channels (Multiple filters)
To get more channels in the output, we simply increase the number of filters.
The outputs of the filters are concatenated along the channel axis.

Example: Number of parameters in a single Conv layer
We have a convolution kernel of size (3,3) and we output 10 filters. The layer operates on RGB images and uses a bias. How many parameters does this layer have?
Each filter has 3 * 3 * 3 = 27 weights plus 1 bias, so the layer has (27 + 1) * 10 = 280 parameters.
This is significantly fewer than the ~1.6 million parameters of the Dense layer in the motivation example.

Typical CNN structure
Kernels are typically of size 3, 5 or 7; these sizes are chosen somewhat arbitrarily.
Strides should be less than the kernel size.
Height and width should decrease as the number of layers increases, while the number of channels should increase.
Flattening and Dense layers at the end facilitate downstream classification or even regression tasks:
○ https://www.tensorflow.org/api_docs/python/tf/keras/layers/Flatten
○ https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense

Pooling Layers
Pooling layers are also common in CNN architectures.
○ A pooling window slides across the input like a kernel, but instead of learning weights it simply takes the maximum or the average of the values it covers
Max pooling: https://www.tensorflow.org/api_docs/python/tf/keras/layers/MaxPooling2D
Average pooling: https://www.tensorflow.org/api_docs/python/tf/keras/layers/AveragePooling2D
○ Pooling is done on a per-channel basis ⇒ no change in the number of channels
Advantages:
○ Reduces the size of the input without the need to learn parameters

Putting it all together
Feature extractor using:
○ CNN layers
○ Pooling layers
Feature analyser using:
○ Dense and Dropout layers
End-to-end training

Why are CNNs good for images
Translational Invariance
○ Kernels “slide” across the input
○ They detect features even if these appear in another location, e.g. edge detectors
Sparsity of connections
○ Each pixel in the output summarizes information from only a small number of input pixels
○ This helps prevent overfitting

Convolutional Neural Network (Basics) Part 2

Residual Networks
Motivation:
○ We never really know how many layers are optimal for a particular problem
○ The usual strategy builds larger and larger networks with more layers, but beyond a certain depth performance was observed to drop, even on the training data
○ This deterioration was attributed to backpropagation challenges in deeper networks: it becomes harder to propagate gradients through the network as it gets deeper
Vanishing gradients problem: a phenomenon that occurs during the training of deep neural networks, where the gradients used to update the network become extremely small or "vanish" as they are backpropagated from the output layers to the earlier layers.
ResNets were created to address this issue.

Residual Network
Addition of a skip connection:
○ Makes it easy for a block to learn the identity function
○ The skip connection provides another route for gradients to propagate backwards, mitigating the vanishing gradients problem
Drawbacks:
○ The output size cannot change, otherwise the addition operation would not work
Typical network design:
○ A number of skip connections between layers
○ These are followed by pooling layers to downsample the input size

Computational cost of CNNs
We have seen that CNNs significantly reduce the number of model parameters.
The trade-off is computational cost.
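The trade-off can be made concrete by counting parameters and multiplications. The sketch below is plain Python written for illustration; the helper names are hypothetical, not Keras APIs. It uses the (3,3), 10-filter RGB layer from Part 1 and assumes a 64 x 64 input with "valid" padding and stride 1 (assumptions, not stated on the slides). Each weight is reused at every output position, so the compute grows with the output area even though the parameter count does not.

```python
def conv_output_size(n, k, p=0, s=1, d=1):
    """Output length along one axis: floor((n + 2p - d(k - 1) - 1) / s) + 1.

    n: input size, k: kernel size, p: zeros added to EACH side (assumed
    symmetric), s: stride, d: dilation rate.
    """
    return (n + 2 * p - d * (k - 1) - 1) // s + 1


def conv2d_params(k_h, k_w, c_in, c_out, use_bias=True):
    """Each of the c_out filters has k_h * k_w * c_in weights, plus one bias."""
    return (k_h * k_w * c_in + (1 if use_bias else 0)) * c_out


def conv2d_mults(out_h, out_w, k_h, k_w, c_in, c_out):
    """Every output value is a dot product of length k_h * k_w * c_in."""
    return out_h * out_w * c_out * k_h * k_w * c_in


# Example layer from Part 1: 3 x 3 kernel, RGB input (3 channels), 10 filters, bias.
params = conv2d_params(3, 3, 3, 10)

# Assumption: 64 x 64 input, "valid" padding (p = 0), stride 1.
out = conv_output_size(64, 3)
mults = conv2d_mults(out, out, 3, 3, 3, 10)

print(f"parameters: {params:,}")
print(f"output size: {out} x {out}")
print(f"multiplications: {mults:,}")
```

With the same helper, `conv_output_size(64, 3, p=1)` gives 64 (the "same" size), `conv_output_size(64, 3, p=1, s=2)` gives 32, and `conv_output_size(64, 3, d=2)` gives 60. Keras computes "same" padding itself and may put the extra zero on one side when the total padding is odd, so the symmetric-`p` formula is a simplification.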
The numbers of input and output filters (channels) both contribute to the high computational cost.

Computational Cost Mitigations
○ 1 x 1 Convolution layer
○ Depthwise Separable Convolutions

1 x 1 Convolutions
○ A 1 x 1 convolution can first reduce the number of channels (a "bottleneck") so that the following, larger convolution is much cheaper
○ This can give a nearly 10 times reduction in computational cost

Depthwise Separable Convolution
○ Depthwise convolution: each input channel is convolved with its own k x k filter, with no mixing across channels
○ This is followed by a 1 x 1 (pointwise) convolution that mixes the channels
○ Compared with a normal convolution, this needs far fewer multiplications
[Figure: Depthwise Convolution vs Normal Convolution]
https://www.geeksforgeeks.org/depth-wise-separable-

Data Augmentation
We can perform data augmentation on the fly to help the model generalise better to real-life data:
○ tf.image.stateless_random_brightness
○ tf.image.stateless_random_contrast
○ tf.image.stateless_random_crop
○ tf.image.stateless_random_flip_left_right
○ tf.image.stateless_random_flip_up_down
○ tf.image.stateless_random_hue
○ tf.image.stateless_random_jpeg_quality
○ tf.image.stateless_random_saturation

Image Autoencoders
Idea of autoencoders:
○ The encoder summarizes the input into embeddings (i.e. compresses the information into an embedding space)
○ The decoder decodes the embeddings and recreates the original input

Image Autoencoders (Example)
[Figure not reproduced]

Autoencoders
Uses:
○ Data exploration: checking whether there is any innate structure within the data, even without labels
○ Denoising autoencoders: the encoder is trained to extract non-noisy features and the decoder to reconstruct the non-noisy image; outcome: removal of noise from images
○ Basis for many semi-supervised learning algorithms: a way to create embeddings without the need for labels
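The savings from the two computational cost mitigations (1 x 1 bottleneck and depthwise separable convolution) can be estimated by counting multiplications. This is a pure-Python sketch with hypothetical helper names; the layer shapes (a 28 x 28 x 192 input, a 5 x 5 kernel, 32 output channels, a 16-channel bottleneck, output kept at 28 x 28) are assumptions chosen for illustration, not values from the slides. Biases and activations are ignored.

```python
def conv_mults(h, w, k, c_in, c_out):
    """Multiplications for a standard k x k convolution producing an h x w x c_out output."""
    return h * w * c_out * k * k * c_in


def bottleneck_mults(h, w, k, c_in, c_mid, c_out):
    """1 x 1 conv (c_in -> c_mid channels), then k x k conv (c_mid -> c_out)."""
    return conv_mults(h, w, 1, c_in, c_mid) + conv_mults(h, w, k, c_mid, c_out)


def depthwise_separable_mults(h, w, k, c_in, c_out):
    """Depthwise k x k conv (one filter per input channel) + pointwise 1 x 1 conv."""
    depthwise = h * w * k * k * c_in  # each channel is filtered separately
    pointwise = h * w * c_in * c_out  # the 1 x 1 conv mixes the channels
    return depthwise + pointwise


# Assumed illustrative shapes (not taken from the lecture).
H = W = 28
standard = conv_mults(H, W, 5, 192, 32)
bottleneck = bottleneck_mults(H, W, 5, 192, 16, 32)
separable = depthwise_separable_mults(H, W, 5, 192, 32)

print(f"standard:            {standard:,}")
print(f"1x1 bottleneck:      {bottleneck:,} ({standard / bottleneck:.1f}x fewer)")
print(f"depthwise separable: {separable:,} ({standard / separable:.1f}x fewer)")
```

With these assumed shapes the bottleneck needs roughly 9.7 times fewer multiplications, in line with the "nearly 10 times" figure, and the depthwise separable version about 14 times fewer; in general the separable saving is 1 / (1/C_out + 1/k²). Neither replacement computes exactly the same function as the standard convolution, so the effect on accuracy still has to be checked on the task.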
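The padding, stride and dilation options and pooling from Part 1, together with the residual networks' requirement that a block keep its output size, can all be seen in a naive 1D convolution. The sketch below is plain Python for illustration; the function names are hypothetical, not Keras APIs. Like deep-learning libraries, it computes a cross-correlation (the kernel is not flipped). Its "same" padding splits the zeros between both ends, which should agree with the usual TensorFlow convention for stride 1; for larger strides it gives the same output length but may place the zeros differently. The residual block is deliberately simplified: F is a single "same" convolution followed by ReLU, whereas real ResNet blocks stack more layers.

```python
def conv1d(x, kernel, padding="valid", stride=1, dilation=1):
    """Naive 1D convolution (cross-correlation, as in deep-learning libraries).

    padding="valid": no zeros added; padding="same": zeros added so that, with
    stride 1, the output has the same length as the input.
    """
    k_eff = dilation * (len(kernel) - 1) + 1  # span covered by the dilated kernel
    if padding == "same":
        total = k_eff - 1
        left = total // 2
        x = [0.0] * left + list(x) + [0.0] * (total - left)
    elif padding != "valid":
        raise ValueError("padding must be 'valid' or 'same'")
    out = []
    for start in range(0, len(x) - k_eff + 1, stride):
        # Kernel taps are `dilation` apart, i.e. zeros are "inserted" between them.
        out.append(sum(w * x[start + i * dilation] for i, w in enumerate(kernel)))
    return out


def max_pool1d(x, pool_size=2, stride=None):
    """Like a kernel with no learned weights: keep the maximum of each window."""
    stride = stride or pool_size  # non-overlapping windows by default
    return [max(x[i:i + pool_size]) for i in range(0, len(x) - pool_size + 1, stride)]


def residual_block(x, kernel):
    """Simplified residual block: y = F(x) + x, with F = "same" conv + ReLU."""
    fx = [max(0.0, v) for v in conv1d(x, kernel, padding="same")]
    # The addition only works because "same" padding keeps the length unchanged.
    assert len(fx) == len(x)
    return [a + b for a, b in zip(fx, x)]


def receptive_field(layers):
    """Receptive field (in input pixels) of one output pixel after stacked layers.

    layers: list of (kernel_size, stride, dilation) tuples, first layer first.
    """
    rf, jump = 1, 1  # jump: input-pixel distance between neighbouring outputs
    for k, s, d in layers:
        rf += (k - 1) * d * jump
        jump *= s
    return rf


signal = [0, 0, 0, 1, 1, 1]  # a step: the "edge" lies between index 2 and 3
edge_kernel = [-1, 0, 1]     # hand-crafted difference kernel (edge detector)

print(conv1d(signal, edge_kernel))                            # "valid": length 4
print(conv1d(signal, edge_kernel, padding="same"))            # length 6
print(conv1d(signal, edge_kernel, padding="same", stride=2))  # length 3
print(conv1d(signal, edge_kernel, dilation=2))                # length 2
print(max_pool1d([1, 3, 2, 5, 4, 0]))                         # maximum per window
print(residual_block([1.0, 2.0, 3.0], [0.0, 0.0, 0.0]))       # zero kernel: identity
print(receptive_field([(3, 1, 1)] * 3), receptive_field([(3, 2, 1)] * 3))
```

With a zero kernel the residual block returns its input unchanged, which is why a block with a skip connection can easily represent the identity function. The trailing -1 in the "same" output is an artificial edge created by the zero padding at the right border. The last line shows that three 3-wide layers see 7 input pixels with stride 1 but 15 with stride 2.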
