Deep Learning and Variants_Lecture 8_20240217.pdf
Presents: Deep Learning & its variants
GGU DBA
Convolutional Neural Network
Dr. Anand Jayaraman
Professor, upGrad; Chief Data Scientist, Agastya Data Solutions
[email protected]

Why images are hard
setosa.io/ev/image-kernels

Understanding images
If each pixel is considered an attribute, a 1024x1024-resolution colour image has roughly 3 million inputs (1024 x 1024 x 3, as there are 3 channels: R, G, and B). This is an extremely large set of inputs for any machine learning problem.

Fashion MNIST
70k images, 10 classes, 28x28 pixels.
Building a classification model using a traditional neural network works reasonably well. But it requires all test images to also be 28x28 grayscale pixels and to be centered. What about images that are not?

Manual feature extraction
- A shirt is something with two sleeves and a big body.
- Shoes have a sole and a covered top.
- Etc.

Traditional approaches
Use a domain expert to identify important features that help perform a desired task on the image (e.g. classification). Mathematicians designed (hand-engineered) filters (typically matrices) that can extract features from an image.

Traditional vision
- A filter extracts the presence or absence of a feature at a location.
- Create second-level features (the most important regions where the features are present, the intensity of the features, etc.).
- Build an ML model on top of these features.

Limitations
- Not aligned with the way the brain seems to handle vision (we do not do explicit feature engineering in most cases).
- Accuracies were not encouraging.

VISION: DEEP LEARNING APPROACHES

Hubel and Wiesel
The Hubel and Wiesel experiments greatly expanded the scientific knowledge of sensory processing; they won a Nobel Prize in 1981 for this work. In 1959, they inserted a microelectrode into the primary visual cortex of an anesthetized cat and projected patterns of light and dark on a screen in front of the cat. They found that:
- Some neurons fired rapidly when presented with lines at one angle, while others responded best to another angle. They called these neurons "simple cells."
- Still other neurons, which they termed "complex cells," responded best to lines of a certain angle moving in one direction.
- These studies showed how the visual system builds an image up from simple stimuli into more complex representations.

Architecture inspired by the brain
- Local connectivity (as shown by more recent MRI scans)
- Hierarchical feature development (as shown by Hubel and Wiesel)

Our objectives
Deep neural nets with:
- local connectivity
- automatic hierarchical feature engineering
- a manageable number of weights

Computer Vision – Growth
ILSVRC (ImageNet Large-Scale Visual Recognition Challenge) error rates:
- 2011: machine error ~30% (expert human error ~5%)
- 2012: AlexNet, 15.4%
- 2013: ZFNet, 11.2% (less skilled human ~10%)
- 2014: winner GoogLeNet, 6.7%; runner-up VGGNet, 7.3%
- 2015: winner Microsoft ResNet, 3.6%

CONVOLUTION TO ACHIEVE THEM ALL

Convolution
Let us understand how convolution extracts features and then apply it to neural nets. Formally, convolution is defined as

(f * g)(t) = ∫ f(τ) g(t − τ) dτ

Convolution involves a function that is convolved (the input) and a function that is convolving (known as the feature, or filter).

Monochromatic image and filter
Image (5x5):
1 1 1 0 0
0 1 1 1 0
0 0 1 1 1
0 0 1 1 0
0 1 1 0 0
Filter (3x3):
1 0 1
0 1 0
1 0 1

Convolution layer
- Local connectivity
- Spatial arrangement
- Parameter sharing

Convolution operation on an image
Feature map: every cell in the feature map is the result of applying (as a dot product) a kernel/filter/weight matrix to a specific region of the input. A minimal sketch of this operation follows below.
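As a concrete illustration, here is a minimal NumPy sketch of the sliding dot product described above, applied to the 5x5 image and 3x3 filter from the slide. The function name conv2d and the stride-1, no-padding choices are our own illustrative assumptions, not from the lecture.

```python
import numpy as np

# The 5x5 monochromatic image and 3x3 filter from the slide.
image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])

kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])

def conv2d(img, k):
    """Slide the kernel over the image (stride 1, no padding);
    each output cell is the dot product of the kernel with one
    image patch -- exactly the 'feature map' described above."""
    kh, kw = k.shape
    oh = img.shape[0] - kh + 1
    ow = img.shape[1] - kw + 1
    out = np.zeros((oh, ow), dtype=img.dtype)
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * k)
    return out

print(conv2d(image, kernel))
# [[4 3 4]
#  [2 4 3]
#  [2 3 4]]
```

Each of the nine output cells corresponds to one placement of the kernel on the image, matching the feature-map definition above.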
An input image is represented as a matrix of pixel values. (In the animation: blue is the input, green is the resultant feature map, and the shaded blue region is the kernel.)

Convolution with the right features engineers features
setosa.io/ev/image-kernels
Instead of sending the raw image to a dense layer, we send features extracted from the image.

Feature maps / multi-channel images as input to a convolution layer
What happens if the input is not a single-channel image but an RGB image, or a set of feature maps (the output of the previous layer)? The kernel then has one slice per input channel, and the products from all channels are summed into a single output cell.

Convolution in 3 layers (channels)
For example, a 2x2 kernel applied across 3 channels multiplies 12 input values by 12 weights and sums them:
(1*0)+(1*0)+(1*1)+(1*1) + (0*1)+(1*0)+(1*1)+(0*0) + (1*1)+(2*1)+(3*1)+(4*1) = 13
Even a 3D convolution gives a 2D output.
Spreadsheet 3D convolution visualization.

Intuitively, the network will learn filters that activate when they see some type of visual feature, such as an edge of some orientation or a blotch of some color on the first layer, or eventually entire honeycomb or wheel-like patterns on higher layers of the network. As we slide a filter over the width and height of the input volume, it produces a 2-dimensional activation map that gives the responses of that filter at every spatial position. Each CONV layer has an entire set of filters (e.g. 6 filters), and each of them produces a separate 2-dimensional activation map. We stack these activation maps along the depth dimension to produce the output volume.

Pooling operation
A max-pooling layer provides:
- local translational invariance
- dimensionality reduction
(See the pooling sketch at the end of this section.)

Pooling operation: local translational invariance
[Figure: two input images contain the same 3x3 activation pattern (1 3 1 / 2 8 2 / 1 3 1) at different positions in a field of zeros. After applying the filter/kernel and max-pooling, both images produce the same response, 8: the pooled feature map is insensitive to the pattern's exact location.]

Pooling operation: dimensionality reduction
[Figure: pooling also shrinks the spatial dimensions of the feature map, reducing the number of values passed on to later layers.]

INTERPRETING CONVOLUTION AS NEURAL NETWORKS

Convolution1D operation in ANN
X is an input record/sample from our dataset/distribution, with components x1 ... x7. Three neurons apply the same filter at three different locations of the input, with a little overlap.

Convolution2D operation in ANN
Input size 5x5, conv window size 3x3, feature map size 3x3. Each output cell is the dot product of one input window x11, x12, x13, x21, x22, x23, x31, x32, x33 with the weights w11, w12, w13, w21, w22, w23, w31, w32, w33.

Filters are weights
Sliding a filter over the image is equivalent to a layer of neurons in which each neuron is connected only to a small patch of the input; one output cell, for example, is 1x1 + 0x0 + 1x0 + 1x1 = 2. We can consider the nodes to be locally connected, or equivalently, most of the weights to be constrained to be zero (a form of regularization).

Weights are shared
The same weights W1, W2, W3, W4 are reused at every position as the filter slides, so all neurons producing a feature map share one small set of weights. (A sketch of this equivalence follows below.)
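To make the "filters are weights / weights are shared" view above concrete, here is a small NumPy sketch (our own construction, not from the lecture) that builds the fully connected weight matrix equivalent to the 5x5-input, 3x3-kernel convolution used earlier. Each output neuron's weight row is mostly zeros, and the same 9 kernel values are reused in every row.

```python
import numpy as np

# Same 5x5 image and 3x3 kernel as in the earlier sketch.
image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]], dtype=float)
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]], dtype=float)

# Build the equivalent fully connected weight matrix: one row per
# output cell (9 rows), one column per input pixel (25 columns).
# Each row holds the 9 kernel values at the patch positions that
# neuron covers, and zeros everywhere else -- "most weights
# constrained to be zero", with the same 9 values shared by all rows.
W = np.zeros((9, 25))
for i in range(3):          # output row
    for j in range(3):      # output column
        row = np.zeros((5, 5))
        row[i:i+3, j:j+3] = kernel
        W[i * 3 + j] = row.ravel()

feature_map = (W @ image.ravel()).reshape(3, 3)
print(feature_map)   # same 3x3 feature map as the sliding-window view

print(f"dense weights stored: {W.size}, distinct conv weights: {kernel.size}")
# dense weights stored: 225, distinct conv weights: 9
```

This is why a convolution layer has a manageable number of weights: the unconstrained dense layer above would need 225 parameters, while the convolution reuses the same 9.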
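And for the pooling operation described earlier, here is a minimal NumPy sketch of non-overlapping max-pooling; the pool size (2x2) and the example feature-map values are our own illustrative choices. It shows both properties from the slide: the pooled map is 4x smaller, and shifting the strong activation within a pooling window leaves the pooled output unchanged.

```python
import numpy as np

def max_pool(fmap, size=2):
    """Non-overlapping max pooling: keep the largest value in each
    size x size block, shrinking each spatial axis by a factor of 'size'."""
    h, w = fmap.shape
    out = fmap[:h - h % size, :w - w % size]
    out = out.reshape(h // size, size, w // size, size)
    return out.max(axis=(1, 3))

# Two 4x4 feature maps: the strong activation (8) sits at slightly
# different positions inside the same 2x2 pooling window.
fmap1 = np.array([[0, 0, 0, 0],
                  [0, 8, 0, 0],
                  [0, 0, 0, 1],
                  [0, 0, 1, 0]])
fmap2 = np.array([[8, 0, 0, 0],
                  [0, 0, 0, 0],
                  [0, 0, 0, 1],
                  [0, 0, 1, 0]])

print(max_pool(fmap1))   # [[8 0], [0 1]]
print(max_pool(fmap2))   # [[8 0], [0 1]] -- same, despite the shift
```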
Remember the Fashion MNIST example?
(Slide images from: https://www.youtube.com/watch?v=x_VrgWTKkiM&t=4s)
Before CNNs we did this: flatten the image and send the raw pixels straight into dense layers. Instead of sending the image to a dense layer, we now send features extracted from the image. With a CNN we do this: convolution and pooling layers extract the features, and dense layers classify them.

Supervised learning: Convolutional Neural Networks (CNN)

An animation of a 2D CNN: https://www.youtube.com/watch?v=CXOGvCMLrkA
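To make the "with CNN" pipeline for Fashion MNIST concrete, here is a minimal sketch assuming TensorFlow/Keras; the layer sizes and training settings are our own illustrative choices, not the exact ones from the lecture.

```python
import tensorflow as tf

# Fashion MNIST: 70k 28x28 grayscale images, 10 classes.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
x_train = x_train[..., None] / 255.0   # (60000, 28, 28, 1), scaled to [0, 1]
x_test = x_test[..., None] / 255.0

model = tf.keras.Sequential([
    # Convolution layers learn the filters instead of hand-engineering them.
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D((2, 2)),   # translational invariance + downsampling
    tf.keras.layers.Conv2D(64, (3, 3), activation="relu"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),              # extracted features -> dense classifier
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),   # 10 Fashion MNIST classes
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))
```

The dense layers at the end are the "before CNN" classifier; the convolution and pooling layers in front of them replace the manual feature engineering discussed earlier in the lecture.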