Convolutional Neural Nets (CNN)

Convolutional Neural Nets
Computer Vision
Abhijit Boruah

Motivation: deep learning in computer vision
▫ Helps self-driving cars identify the other cars and pedestrians around them.
▫ Advanced facial recognition, enabling face-unlock features in gadgets.
▫ Picture suggestions in apps and web pages.
▫ New art creation, e.g. the Google Magenta project (a research project exploring the role of machine learning in the process of creating art and music; it uses CNNs).

Computer Vision Problems
▫ Example: detect the cars, and how many of them there are, for a self-driving car.

Deep learning on larger images
▫ Small image: input features = 64*64*3 = 12288.
▫ Large image: input features = 1000*1000*3 = 3 million (3m).
▫ Dimensions of W? A (1000, 3m) matrix, i.e. about 3 billion weights for a first layer of 1000 units.
▫ Challenge: memory requirements.

Edge Detection
The first step in detecting objects may be to detect edges.

How to detect edges
For vertical edge detection, construct a 3x3 filter and convolve the 6x6x1 (greyscale) image with it. The result is a 4x4 matrix:

-5 -4 0 8
-10 -2 2 3
0 -2 -4 -7
-3 -2 -3 -16

How is each value in this matrix obtained? Place the filter over a 3x3 patch of the image, take the element-wise product, and add up the nine terms. The filter has a column of 1s, a column of 0s, and a column of -1s, so the first output value is:

3*1 + 1*1 + 2*1 + 0*0 + 5*0 + 7*0 + 1*-1 + 8*-1 + 2*-1 = -5

Functions that implement the convolution operator:
▫ Python: conv_forward
▫ TensorFlow: tf.nn.conv2d
▫ Keras: Conv2D

Vertical Edge Detection

Horizontal Edge Detection
▫ A 3x3 filter for horizontal edge detection.
▫ Figure labels: strong +ve edge, strong -ve edge.

More Filters
What are the best numbers to use in a filter?
▫ Sobel filter:
1 0 -1
2 0 -2
1 0 -1
Puts more weight on the central row, which makes it more robust.
▫ Scharr filter:
3 0 -3
10 0 -10
3 0 -3
▫ Best option: learn the 9 numbers by backpropagation, treating them as parameters, to get a good edge detector.

Padding

Strided Convolutions
Strides are another basic building block of convolution.
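The element-wise product and sum described above, together with the stride idea, can be sketched in plain Python. This is a minimal illustration, not the implementation behind conv_forward, tf.nn.conv2d, or Conv2D. The function name `conv2d` and the 6x6 `image` values are assumptions: the slide's input image did not survive extraction, so the values below were chosen to be consistent with the first-value calculation and the 4x4 output shown above.

```python
def conv2d(image, kernel, stride=1):
    """Slide a square kernel over a 2D image with no padding.

    At each position, multiply element-wise and sum the products
    (the operation the slides call convolution).
    """
    f = len(kernel)
    n_h, n_w = len(image), len(image[0])
    out_h = (n_h - f) // stride + 1
    out_w = (n_w - f) // stride + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for a in range(f):
                for b in range(f):
                    total += image[i * stride + a][j * stride + b] * kernel[a][b]
            row.append(total)
        out.append(row)
    return out


# Assumed 6x6 greyscale image (hypothetical values, see the note above).
image = [
    [3, 0, 1, 2, 7, 4],
    [1, 5, 8, 9, 3, 1],
    [2, 7, 2, 5, 1, 3],
    [0, 1, 3, 1, 7, 8],
    [4, 2, 1, 6, 2, 8],
    [2, 4, 5, 2, 3, 9],
]

# Vertical edge filter: a column of 1s, a column of 0s, a column of -1s.
vertical = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]

edges = conv2d(image, vertical)               # 4x4 result, stride 1
strided = conv2d(image, vertical, stride=2)   # 2x2 result, stride 2

for row in edges:
    print(row)
```

With stride 1 the result is 4x4 and its first entry is -5, matching the hand calculation. With stride 2 the filter hops two pixels at a time, so the same image yields only a 2x2 output.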
Example: a 7x7 image convolved with stride = 2 (the filter hops two steps at a time) gives a 3x3 output:

91 100 83
69 91 127
44 72 74

Convolutions over RGB images
▫ 6(h) x 6(w) x 3(c) image * 3(h) x 3(w) x 3(c) filter = 4 x 4 output.
▫ The number of channels in the image and in the filter must match.
▫ Take the 27 numbers of the filter, multiply them with the values in the corresponding channels of the image, and add up the products to get the first value of the output matrix.

Multiple Filters (to detect all types of edges)
▫ Apply a vertical filter and a horizontal filter to the same image. Stacking the two 4x4 outputs gives a 4*4*2 volume, where 2 is the number of filters used.

One Layer of a CNN
▫ For each filter, add a bias to the convolution output and apply ReLU: ReLU(output 1 + b1), ReLU(output 2 + b2).
▫ This mirrors Z = WA + b followed by A = ReLU(Z), with the filters playing the role of W.
▫ This computation, from a 6x6x3 input to a 4x4x2 output, is one layer of a CNN.
▫ Number of filters = number of feature detectors.

Number of parameters in one layer
▫ Q: If you have 10 filters that are 3x3x3 in one layer of a neural network, how many parameters does that layer have?
▫ Ans: parameters per filter = 3*3*3 + 1 (bias) = 28, so the total number of parameters = 28*10 = 280.

Summary of notations

Example of a ConvNet
▫ Input: 39 x 39 x 3 (nH = nW = 39, nc = 3).
▫ Layer 1: f = 3, s = 1, p = 0, 10 filters. Output: 37 x 37 x 10 (nH = nW = 37, nc = 10).
▫ Layer 2: f = 5, s = 2, p = 0, 20 filters. Output: 17 x 17 x 20 (nH = nW = 17, nc = 20).
▫ Layer 3: f = 5, s = 2, p = 0, 40 filters. Output: 7 x 7 x 40.
▫ Take all 7*7*40 = 1960 numbers, unroll them into one large vector, and feed it into a logistic regression (LogReg) function to get the final output.
▫ The output depends heavily on the choice of hyperparameters such as the number of filters, p, and s.

Types of layers in a ConvNet
▫ Convolution (Conv): application of a filter to an input.
▫ Pooling (Pool): reduces the dimensions of the feature maps.
▫ Fully Connected (FC): as in feed-forward neural nets.
Although it is possible to design a CNN using only convolution layers, most architectures also have a few pooling and FC layers.

Pooling
ConvNets often also use pooling layers to reduce the size of the representation, to speed up computation, and to make some of the detected features a bit more robust.
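The layer sizes and the parameter count quoted in the ConvNet example above can be checked with a few lines of Python. This sketch assumes the standard output-size rule floor((n + 2p - f) / s) + 1, which the surviving slide text does not state explicitly. The helper names `conv_output_size` and `conv_layer_params` are hypothetical.

```python
def conv_output_size(n, f, s=1, p=0):
    """Height (or width) after a convolution: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1


def conv_layer_params(f, n_c_prev, n_filters):
    """Each filter has f*f*n_c_prev weights plus one bias."""
    return (f * f * n_c_prev + 1) * n_filters


# (f, s, p, number of filters) for the three layers of the example.
layers = [(3, 1, 0, 10), (5, 2, 0, 20), (5, 2, 0, 40)]

n, n_c = 39, 3  # input is 39 x 39 x 3
shapes = []
for f, s, p, n_filters in layers:
    n = conv_output_size(n, f, s, p)
    n_c = n_filters  # the channel count of the output equals the number of filters
    shapes.append((n, n, n_c))

flattened = n * n * n_c

print(shapes)                        # [(37, 37, 10), (17, 17, 20), (7, 7, 40)]
print(flattened)                     # 1960
print(conv_layer_params(3, 3, 10))   # 280
```

The parameter count depends only on the filter size, the number of input channels, and the number of filters, not on the image size. That is why a convolution layer needs far less memory than the fully connected (1000, 3m) weight matrix discussed earlier.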
Max Pooling
▫ Intuition: a high number indicates that a feature was detected.
▫ Hyperparameters: f = 2, s = 2. A 4x4 input gives a 2x2 output:
9 2
6 3
▫ Fixed computation; no parameters to learn.

Average Pooling
▫ Take the average of the numbers instead of the maximum.

Neural Net Example
▫ Input: 32 x 32 x 3.
▫ Layer 1: Conv1 (f = 5, s = 1, 6 filters) gives 28 x 28 x 6, followed by Pool1 (max pool, f = 2, s = 2).
▫ Layer 2: its output is unrolled into a vector of 400 values.
▫ FC3: 120 units, W = (120, 400), b = 120.
▫ FC4: 84 units.
▫ Softmax function with 10 outputs.

The SoftMax Activation Function
The softmax activation function is designed for multi-class classification tasks, where an input needs to be assigned to one of several classes. It transforms raw, unbounded scores (often called logits) into a probability distribution over the classes, so each class gets a probability indicating how likely it is that the input belongs to it. The softmax function takes a vector of real numbers as input and returns another vector of the same dimension, with every value between 0 and 1.

Mathematical Formulae
For a vector of logits z = (z_1, ..., z_K):

$\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$, for i = 1, ..., K

How Softmax Works in Neural Networks
▫ Output layer: in a neural network for classification, the final layer typically has one neuron for each class. Its outputs are the logits.
▫ Conversion to probabilities: the softmax function converts these logits into probabilities that sum to 1. This makes it easier to interpret the output as the network's confidence in each class.
▫ Training: during training, the loss function is computed from these probabilities, and the model weights are updated to improve its predictions.

Use Cases
The softmax function is commonly used in:
▫ Image classification: assigning probabilities to different classes (e.g., dog, cat, airplane).
▫ Natural Language Processing (NLP): classifying text into different categories (e.g., sentiment analysis).
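The conversion from logits to probabilities described above can be sketched with the standard library alone. The function name `softmax` and the example logits are illustrative assumptions. Subtracting the largest logit before exponentiating is a common numerical-stability choice that the slides do not mention, and it does not change the result.

```python
import math


def softmax(logits):
    """Turn a list of raw scores into probabilities that sum to 1."""
    largest = max(logits)
    # Shifting by the largest logit avoids overflow in exp() without changing the output.
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three classes (e.g., dog, cat, airplane).
logits = [2.0, 1.0, 0.1]
probs = softmax(logits)

predicted_class = probs.index(max(probs))  # index of the most likely class

print(probs)
print(sum(probs))
print(predicted_class)
```

The largest logit always receives the largest probability, so picking the predicted class from the probabilities gives the same answer as picking it from the logits. The probabilities matter mainly for the loss during training and for reading the output as confidence.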
