Image Classification / Regression and More (2024)
Centre Universitaire Aflou
2024
SELLAM Abdellah
Summary
These lecture notes from the University Center of Aflou (2024) cover Computer Vision topics: image classification and regression, feature extraction with handcrafted descriptors such as Local Binary Patterns (LBP) and Histogram of Oriented Gradients (HOG), and feature learning with neural networks.
3. Image Classification / Regression and More
Computer Vision
SELLAM Abdellah
University Center of Aflou
2024

Table of Contents
1 Introduction
  The complex nature of input images
  Classification vs Regression
2 General Pipeline
  General Image Classification / Regression Architecture
3 Handcrafted Descriptors
  Overview
  Local Binary Patterns (LBP)
  Histogram of Oriented Gradients (HOG)
4 Feature Learning
  Overview
  Supervised Feature Learning
  Unsupervised Feature Learning
  Transfer Learning

The structure of images

After the complex process of image formation presented in the previous chapter, we end up with a multidimensional array (tensor) representing the image.
- Each element of this array is called a pixel. A pixel contains numerical information about the light collected over a specific rectangular area of the image.
- A pixel can contain a single value indicating the intensity of the light regardless of its color. In this case, we call the image a gray-scale image.
- Or it can contain three values indicating the intensities of the red, green, and blue light. In this case, we call the image an RGB image.
- In signal processing, a digital image is considered a discrete 2D signal that has a width, a height, and either a single channel (gray-scale) or three channels (RGB).

[Figure: the image as a 3D array (height × width × channels) or a 2D signal]

Note that the algorithms we will see in this chapter differ in their handling of images.
- Some algorithms work on single-channel images or on each channel separately.
- Other approaches work on multiple channels simultaneously and produce multichannel results.
- Some approaches produce feature-maps that are very similar in nature to images.
- In the literature, several terms are used to refer to both feature-maps and images, including: 2D Signal, Multidimensional Array, Volume, ...

Prediction, Classification, and Regression

Prediction is the task of assigning semantic labels to input data (in our case, images).
- Classification: if the assigned labels are discrete (integer) values corresponding to groups, classes, categories, or tags, then the prediction task is called classification. Examples: Face-based Identification, Emotion Recognition, Iris-based Identification, Image Segmentation, Object Recognition.
- Regression: if the assigned labels are continuous (real) values corresponding to the output of some underlying function or property, then the prediction task is called regression. Examples: Face-based Age Estimation, 3D Depth Estimation, Object Localization, Heatmap Regression.
The labels should be inferred semantically from the input images (they should reflect the meaning of the scenes depicted by the images).
Typical Pipeline

[Figure: the Feature Extraction step produces a feature vector (f1, f2, f3, ..., fM), which the Prediction step maps to the output]

Most Image Classification / Regression approaches have the architecture depicted by the figure above. This architecture comprises two steps:
- Feature Extraction step.
- Prediction step.

Feature Extraction

Images are 2D signals captured using electronic sensors. They are highly complex, which makes it very hard for most state-of-the-art classifiers to process them directly. Aspects of this complexity include:
- Illumination changes, different environments, and weather conditions.
- Geometrical transformations of objects in the image (translation, rotation, scale, etc.).
- Noise and undesirable artifacts.
- Different camera angles and positions, which change the signal shape drastically.
- Partial occlusion hiding parts of the objects of interest.
As a result, Feature Extraction is used in computer vision tasks to produce features that reduce the effects of these changes and artifacts. We can classify Feature Extraction methods into two categories:
1. Handcrafted Descriptors.
2. Feature Learning.

Prediction

The prediction step consists of learning how to produce labels from the feature-vectors produced by the feature extraction algorithm.
This step can be carried out using any state-of-the-art classifier or regression method.
These methods include: SVM, KNN, Decision Trees, Neural Networks, Random Forest, etc.
The prediction algorithm can also be either:
- Feed-Forward, in the case of fixed-length output: scalar, vector, matrix, ...
- Recurrent, in the case of variable-length output (sequence): time-series, text, ...
In this course, we will focus on the Feature Extraction algorithms because they are more closely related to Computer Vision.

Handcrafted Descriptors

Over the years, computer vision scientists have proposed several handcrafted algorithms for extracting feature-vectors from images.
- Each of these algorithms performs a fixed set of operations on input images to produce feature-vectors, or descriptors.
- These descriptors were designed to be robust against illumination and shape variations.
- Note that most of these descriptors operate on gray-scale images or on each color channel separately.
- Examples of such descriptors include:
  - Local Binary Patterns (LBP).
  - Histogram of Oriented Gradients (HOG).
  - Local Phase Quantization (LPQ).
  - Scale-Invariant Feature Transform (SIFT).
  - Weber Local Descriptor (WLD).
  - Binarized Statistical Image Features (BSIF).
  - ...
LBP

- LBP is a texture descriptor introduced in 1994 by Ojala et al.
- LBP works on single-channel images.
- LBP extracts a local feature at each pixel by comparing it to its surrounding pixels, sampled from a Neighborhood Circle.
- LBP has two parameters that control its behavior:
  1. r: the radius (in pixels) of the Neighborhood Circle.
  2. n: the number of pixels sampled from the Neighborhood Circle.
- The example below illustrates the operation of LBP with r = 1 and n = 8 on a single pixel (center value 12):

  Neighborhood      Comparisons
  35 47 33          1 1 1
   8 12 10          0   0
  17 21  5          1 1 0

  LBP code: 11100110b = 230

- The LBP operation is executed on each pixel of the image, which results in an integer matrix of LBP codes.

To calculate the LBP matrix of an input image I, we use the following steps:
1. For each pixel I(x, y), calculate its LBP code:
   1. Sample n pixels from a ring of radius r centered at pixel I(x, y).
   2. Compare each sampled pixel with the center pixel I(x, y).
   3. Convert the n comparisons into an n-bit string using the following equation, where d is the difference between the currently sampled pixel and the center pixel:
      $f(d) = \begin{cases} 1 & \text{if } d \ge 0 \\ 0 & \text{otherwise} \end{cases}$
   4. Convert the n-bit string into an n-bit unsigned integer code (the LBP code).
2. Aggregate the LBP codes of all pixels to form a matrix (the LBP image).
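The LBP steps above can be sketched in a few lines of NumPy. This is a minimal illustration for r = 1 and n = 8 only, not a reference implementation. The function names `lbp_code_3x3` and `lbp_image` are hypothetical, and the sampling order (clockwise from the top-left neighbor, first sample as the most significant bit) is an assumption: the slides do not state the order, although this choice does reproduce the code 230 of the single-pixel example. Border pixels are simply skipped.

```python
import numpy as np

# Offsets (row, col) of the 8 neighbors for r = 1, listed clockwise from the
# top-left neighbor. This ordering is an assumption, not fixed by the slides.
NEIGHBOR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                    (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code_3x3(patch):
    """Return the LBP code of the center pixel of a 3x3 patch."""
    patch = np.asarray(patch)
    center = patch[1, 1]
    code = 0
    for dr, dc in NEIGHBOR_OFFSETS:
        # f(d) = 1 if d >= 0, with d = neighbor - center
        bit = 1 if patch[1 + dr, 1 + dc] >= center else 0
        # Shift in the new bit: the first sampled neighbor ends up as the MSB.
        code = (code << 1) | bit
    return code


def lbp_image(image):
    """Return the matrix of LBP codes for all interior pixels of a 2D image."""
    image = np.asarray(image)
    h, w = image.shape
    codes = np.zeros((h - 2, w - 2), dtype=np.uint8)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            codes[y - 1, x - 1] = lbp_code_3x3(image[y - 1:y + 2, x - 1:x + 2])
    return codes


# The single-pixel example from the slides.
example = [[35, 47, 33],
           [8, 12, 10],
           [17, 21, 5]]
print(lbp_code_3x3(example))  # 230, i.e. 0b11100110
```

Skipping the border keeps the sketch short; padding the image first would be an equally reasonable choice if an LBP code is needed for every pixel.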
- The histogram of the LBP image can then be used as the LBP feature-vector.
- Most approaches divide the LBP image into a grid of non-overlapping cells and extract a histogram from each cell. The concatenation of all LBP histograms is used as the feature-vector.

[Figure: an LBP matrix (left) and its histogram over the codes 0 to 7 (right)]

Exercise 1

Calculate the LBP descriptor of the gray-scale image below with n = 8 and r = 1.

7 4 0 6 7 9
3 2 5 4 2 0
4 1 0 0 2 2
3 8 2 8 8 0
7 4 5 3 8 8
2 1 9 5 1 2

Uniform LBP

LBP codes extracted using the procedure presented in the previous section can be classified into two categories: uniform codes and non-uniform codes.
An LBP code is said to be uniform if and only if there are at most two 0 ↔ 1 transitions (changes from 1 to 0 or from 0 to 1) in its bit string.

Uniform:      00111000  11110011  00000111  11111111
Non-uniform:  01011100  01001001  10011011  11010111

The uniform LBP feature-vector is extracted by building a histogram of the uniform LBP codes only, or by grouping all non-uniform LBP codes into one histogram bin.
The advantages of Uniform LBP include:
- Uniform codes correspond to more natural and smooth features (several 0 ↔ 1 transitions are indicators of noise or a stochastic texture).
- Uniform LBP reduces the size of the feature-vector.

Exercise 2

Calculate the Uniform LBP descriptor of the gray-scale image from Exercise 1 with the same parameters.

Rotation Invariant LBP

Rotation Invariant LBP groups together all LBP codes that can be obtained from one another using bit rotations.
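Both the uniformity test of Uniform LBP and the bit-rotation grouping of Rotation Invariant LBP reduce to simple bit manipulations on the n-bit code. The sketch below is illustrative only. The helper names `rotate_left`, `count_transitions`, `is_uniform`, and `rotation_invariant_code` are hypothetical, the bit string is treated as circular when counting transitions (an assumption; the slides do not say whether the wrap-around counts), and the smallest rotation is used as the representative of each group, which is one common convention rather than something stated in the slides.

```python
def rotate_left(code, n_bits=8):
    """Rotate an n-bit code left by one position."""
    mask = (1 << n_bits) - 1
    return ((code << 1) & mask) | (code >> (n_bits - 1))


def count_transitions(code, n_bits=8):
    """Count 0<->1 transitions between circularly adjacent bits."""
    # XOR with the rotated code sets a 1 wherever two adjacent bits differ.
    return bin(code ^ rotate_left(code, n_bits)).count("1")


def is_uniform(code, n_bits=8):
    """A code is uniform if it has at most two 0<->1 transitions."""
    return count_transitions(code, n_bits) <= 2


def rotation_invariant_code(code, n_bits=8):
    """Map a code to the smallest value among all of its bit rotations."""
    best = code
    for _ in range(n_bits - 1):
        code = rotate_left(code, n_bits)
        best = min(best, code)
    return best


print(is_uniform(0b00111000), is_uniform(0b01011100))  # True False
print({rotation_invariant_code(c) for c in (117, 93, 174)})  # {87}
```

With n = 8, only 58 of the 256 possible codes are uniform, so the histogram shrinks to 59 bins when all non-uniform codes share a single bin.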
For example, the codes 117 (0111 0101b), 93 (0101 1101b), and 174 (1010 1110b) are considered the same.
Bit rotations of the LBP code correspond to rotations of the neighborhood circle of pixels with respect to the central pixel.
Consequently, assigning the same code to all bit rotations gives the resulting feature-vector some degree of rotation invariance.

Exercise 3

Calculate the Rotation Invariant LBP descriptor of the gray-scale image from Exercise 1 with the same parameters.

HOG

HOG is a texture descriptor introduced in 2005 by Dalal et al.
To calculate the HOG descriptor of an image I, the following steps are performed:
1. Computing the horizontal and vertical gradients.
2. Computing the oriented gradients image.
3. Binning the oriented gradients.
4. Constructing the histogram.

[Figure: an input image and its histogram of oriented gradients]

Computing horizontal and vertical gradients

We need to compute the gradient of the image I along the x and y axes, resulting in two images Gx and Gy.
The gradients can be computed by convolving the original image I with:
- The two kernels $[-1\ 0\ 1]$ and $[-1\ 0\ 1]^\top$,
- Or more complex filters such as the Sobel filter.
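The four HOG steps listed above can be sketched end to end with NumPy. This is a hedged illustration under several assumptions that are not spelled out in the slides: the function names are hypothetical, the [-1 0 1] kernels are applied as a sliding window without flipping (so Gx at a pixel is its right neighbor minus its left neighbor), border pixels get a zero gradient, orientations are taken as arctan(Gy / Gx) in degrees within [-90, 90], and the bins have equal width.

```python
import numpy as np


def gradients(image):
    """Horizontal and vertical gradients with the [-1 0 1] kernels.

    Each gradient is the difference between the two neighbors on either
    side of a pixel; border pixels are left at 0 (an assumption).
    """
    image = np.asarray(image, dtype=float)
    gx = np.zeros_like(image)
    gy = np.zeros_like(image)
    gx[:, 1:-1] = image[:, 2:] - image[:, :-2]
    gy[1:-1, :] = image[2:, :] - image[:-2, :]
    return gx, gy


def orientations_deg(gx, gy):
    """Gradient orientation arctan(Gy / Gx) in degrees, in [-90, 90]."""
    gx = np.asarray(gx, dtype=float)
    gy = np.asarray(gy, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        theta = np.degrees(np.arctan(gy / gx))
    # 0 / 0 (no gradient at all) yields NaN; map it to 0 degrees.
    return np.where(np.isnan(theta), 0.0, theta)


def bin_orientations(theta, n_bins=8):
    """Assign each orientation to one of n_bins equal-width bins."""
    width = 180.0 / n_bins
    bins = np.floor((theta + 90.0) / width).astype(int)
    # Clipping puts +90 degrees in the last bin and guards against rounding.
    return np.clip(bins, 0, n_bins - 1)


def hog_histogram(bins, n_bins=8):
    """Histogram of the binned orientations."""
    return np.bincount(bins.ravel(), minlength=n_bins)


# Gradient values taken from the worked example in these slides.
gx = np.array([[16, -5, -9, 6], [6, -16, -3, 22],
               [-23, -8, -4, -6], [0, 9, 22, -28]])
gy = np.array([[0, 11, -13, 10], [1, -25, -2, -20],
               [6, -12, 31, 13], [-17, 6, 8, 24]])
bins = bin_orientations(orientations_deg(gx, gy))
print(bins)
print(hog_histogram(bins))
```

A full descriptor would apply `hog_histogram` to each cell of a grid and concatenate the results; that step is left out to keep the sketch short.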
[Figure: the image I convolved with the kernels [-1 0 1] and [-1 0 1]⊤ to produce the gradient images Gx and Gy]

Computing the oriented gradients

To compute the orientations of the gradients, we use the formula:
$\Theta = \tan^{-1}\left(\frac{G_y}{G_x}\right)$

Gx:
 16  -5  -9   6
  6 -16  -3  22
-23  -8  -4  -6
  0   9  22 -28

Gy:
  0  11 -13  10
  1 -25  -2 -20
  6 -12  31  13
-17   6   8  24

Θ (in degrees):
  0 -66  55  59
  9  57  34 -42
-15  56 -83 -65
-90  34  20 -41

Binning the orientations

As we can see, the orientation values are in the range [-90°, +90°].
Binning consists of dividing this range into bins, then assigning each orientation an integer code corresponding to its bin.
The example below shows the operation of binning with 8 bins.

Θ (in degrees):
  0 -66  55  59
  9  57  34 -42
-15  56 -83 -65
-90  34  20 -41

B:
4 1 6 6
4 6 5 2
3 6 0 1
0 5 4 2

Constructing the histogram of oriented gradients

The final HOG descriptor is obtained by taking the histogram of the binned orientations.
Note that most approaches divide the HOG binning image into a grid of non-overlapping cells and extract a histogram from each cell. The concatenation of all HOG histograms is used as the feature-vector.

[Figure: the binning image B and its histogram over the bins 0 to 7]

Exercise 4

Calculate the HOG descriptor of the gray-scale image below with 8 bins and a cell size of 4 × 4 pixels.
0 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0
0 0 1 1 1 0 0 0
0 1 1 1 1 1 0 0
0 0 1 1 1 0 0 0
0 0 0 1 0 2 2 0
0 0 0 0 0 2 2 0
0 0 0 0 0 0 0 0

Motivation

Handcrafted image descriptors have shown good performance over the years. However, they suffer from several drawbacks, including:
- They are not very robust to changes in lighting and to geometrical transformations.
- They do not generalize well to large datasets of images.
- They are hard to derive and invent.
- They are hard to tune (the optimal values of their parameters change from one problem to another).
Hence, there is a clear need for a feature extraction method that can adapt itself to various problems.
Such a category of feature extraction approaches is known as Feature Learning.

Neural Networks for Feature Learning

Before Deep Learning, attempts to learn features were in fact learning mid-level features from handcrafted features instead of learning from raw inputs.
However, with the recent:
- Improvement of computing hardware,
- Introduction of several large labeled datasets,
- Improvements to Neural Networks, including Convolutional and Pooling layers, ReLU activation, Batch Normalization, ...
Neural Networks have achieved major advances in Feature Learning.
Note that Deep Learning is founded mainly on the ideas of Feature Learning and of processing raw inputs.
The Artificial Neuron: The Basic Unit of a Neural Network

- An artificial neuron is a simulation of the biological neuron.
- The neuron is connected to a set of n inputs $x_1, x_2, \ldots, x_n$.
- The n connections to the inputs are associated with a set of real weights $w_1, w_2, \ldots, w_n$.
- The neuron computes a weighted sum of the inputs and weights, then adds a value b called the bias to obtain a real value $h = \sum_{i=1}^{n} w_i x_i + b$.
- The output of the neuron is obtained by applying a non-linear activation σ to the value h.

[Figure: an artificial neuron with inputs x1, ..., xn, weights w1, ..., wn, and output $\sigma\left(\sum_{i=1}^{n} w_i x_i + b\right)$]

The Neural Network

- The weights $w_1, w_2, \ldots, w_n$ and the bias b of the neuron are called parameters.
- These parameters can be adjusted to change the function represented by the neuron.
- However, the set of functions that a single neuron can represent is very limited and cannot solve very complex tasks.
- Hence, we generally connect several neurons together by making the outputs of some neurons the inputs of others.

The Layered Neural Network

- In practice, the neurons of an Artificial Neural Network are organized into layers.
- Each layer is fed the output of another layer as input.
- If the layers are stacked one after the other without cycles or loops, we call the network Feed-Forward.
- If the connections between layers introduce cycles, we call the network Recurrent.

[Figure: a Feed-Forward Neural Network (left) and a Recurrent Neural Network (right), each shown between its input and output]

Activation Functions

As we saw earlier, an activation function is a non-linear function applied to the output of the neurons. Some of the most used activations are:
- ReLU (Rectified Linear Unit): used in the hidden layers to mitigate the problem of Vanishing Gradients.
  $\mathrm{ReLU}(x) = \begin{cases} 0 & \text{if } x < 0 \\ x & \text{otherwise} \end{cases}$
- Sigmoid: ensures that its output is in the range ]0, 1[. It is used in the output layer for Binary Classification or Image-to-Image Regression.
  $\sigma(x) = \frac{1}{1 + e^{-x}}$
- Softmax: ensures that the activations of a layer's neurons sum to 1 and pushes the activation of the neuron with the maximum value closer to 1. This activation is used in Multiclass Classification. Given that the pre-activation values of the neurons of the target layer are $x_1, x_2, \ldots, x_n$, the activation of the j-th neuron is:
  $\mathrm{Softmax}(x_j) = \frac{e^{x_j}}{\sum_{i=1}^{n} e^{x_i}}$

Neural Network Training

- Each neuron in the network has a set of parameters, namely its weights and bias.
- The process of training consists of adjusting the parameters of all neurons so that the network returns the correct outputs for all inputs in a training dataset.
- This process relies on the Gradient Descent algorithm and its variants.

[Figure: a small network of neurons N1, N2, N3 with their weights w11 ... w32 and biases b1, b2, b3]

Loss Functions and Metrics

Loss functions model the prediction error that is minimized to optimize the weights during training. There are several losses depending on the prediction task, among them:
- Binary Cross-Entropy. This loss is used in Binary Classification problems. It requires a single neuron in the output layer with a Sigmoid activation function.
- Categorical Cross-Entropy. This loss is used in Multi-class Classification problems. It requires two or more neurons in the output layer with a Softmax activation function.
- Mean Squared Error (MSE). This loss is used in Regression problems. It is proportional to the squared L2 distance between the predicted and ground-truth labels.
- Mean Absolute Error (MAE). This loss is used in Regression problems. It is proportional to the L1 distance between the predicted and ground-truth labels.

Feature Learning using a Neural Network

- The last layer of a Neural Network is called the Output Layer.
- Intermediary layers before the Output Layer are called Hidden Layers.
- Hidden Layers map the input of the network to another space; in other words, they extract features.
- Using the training algorithm of a Neural Network, we can adapt both the Prediction and the Feature Extraction parts of the network to the task we want to solve.
- This results in a Feature Extraction algorithm tailored to the task at hand; in other words, Feature Learning.

[Figure: Input Layer, Hidden Layer 1, and Hidden Layer 2 form the Feature Extraction part; the Output Layer forms the Prediction part]

Although neural networks have outstanding representational power, they were not successfully used for feature learning from raw images until recently, for the following reasons:
1. The curse of dimensionality. Images have a large number of pixels, which results in a large number of weights and requires a huge number of training samples.
2. High memory usage. We need to keep lots of data and parameters in memory.
3. High computational power.
   A large number of computations is needed to deal with the huge amount of data and parameters.
4. Vanishing gradients. The use of the sigmoid or tanh activation functions in hidden layers limited the number of layers in the networks, because these functions have flat regions and thus very small gradient values. This problem was resolved by using the ReLU activation in hidden layers.

The Dense (Fully-Connected) Layer

- The simplest type of layer is the Dense (Fully-Connected) layer.
- Each neuron in such a layer is connected to all neurons of the input layer.
- Number of parameters = (number of input neurons + 1) × number of output neurons.

[Figure: a Dense layer connecting every input neuron to every output neuron]

The Convolutional Layer

The problem with Dense layers is their large number of parameters, which grows with the size of the input and output. This makes them unsuitable for images as inputs, because images have a large number of pixels.
Convolutional layers solve this problem by introducing two concepts:
1. Sparse Connectivity: each neuron in layer i is connected only to a subset of neighboring neurons from layer i - 1.
2. Weight Sharing: a Convolutional layer applies the same weights to all the output neurons produced by the same kernel.
Consequently, the number of parameters in a Convolutional layer does not depend on the sizes of the input and output, but rather on the number of kernels, the size of the kernels, and the number of input channels.
We have two main types of Convolutional layers:
1. 2D Convolutions: process 2D signals (e.g. images).
2. 1D Convolutions: process 1D signals (e.g. audio).
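To make the growth of Dense-layer parameters concrete, the sketch below implements the forward pass of a Dense layer (a weighted sum plus bias per neuron, followed by an activation) together with the parameter-count formula given above. It is a minimal illustration: the function names are hypothetical, ReLU is chosen as the activation only for the example, and the 224 × 224 × 3 image feeding 1000 neurons is an arbitrary size picked for illustration.

```python
import numpy as np


def relu(h):
    """ReLU activation: 0 for negative inputs, identity otherwise."""
    return np.maximum(h, 0.0)


def dense_forward(x, weights, bias, activation=relu):
    """Forward pass of a Dense layer.

    x:       input vector of shape (n_in,)
    weights: matrix of shape (n_out, n_in), one row of weights per neuron
    bias:    vector of shape (n_out,), one bias per neuron
    """
    h = weights @ x + bias  # weighted sum plus bias for every neuron
    return activation(h)


def dense_param_count(n_in, n_out):
    """(number of input neurons + 1) x number of output neurons."""
    return (n_in + 1) * n_out


x = np.array([1.0, -2.0, 3.0])
weights = np.array([[0.5, 0.0, 1.0],
                    [1.0, 1.0, 0.0]])
bias = np.array([-1.0, 0.5])
print(dense_forward(x, weights, bias))  # values 2.5 and 0.0

# A flattened 224 x 224 RGB image fed to 1000 neurons (illustrative sizes).
print(dense_param_count(224 * 224 * 3, 1000))  # 150529000
```

A single Dense layer on a modest image already needs about 150 million parameters, which is the motivation for sparse connectivity and weight sharing.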
[Figure: connections of a Fully Connected layer (left) vs a Convolutional layer (right) when applied to an image (2D signal)]

The 2D Convolutional Layer

An illustration of the algorithm of a 2D Convolutional Layer, using a 6 × 6 input I, one 3 × 3 kernel, a bias of -2, and a ReLU activation:

Input I:
9 6 2 3 0 8
3 8 9 3 0 9
7 6 3 0 0 2
7 9 4 1 0 5
1 2 1 1 3 3
0 2 0 9 4 7

Kernel:
-1  0 -1
 1  1  0
 0 -1  1

Bias: -2

The kernel slides over the input. At each position, the weighted sum of the covered pixels is computed, the bias is added, and the activation is applied. For the top-left position, the weighted sum is -3, adding the bias gives -5, and ReLU gives 0. For the next position to the right, the weighted sum is 5, adding the bias gives 3, and ReLU gives 3. Repeating this for all positions gives the output:

0 3 8 0
0 0 0 0
3 5 2 0
0 0 0 0

- A convolutional layer can have several kernels, not just one.
- Each kernel has its own bias.
- The number of channels in the kernel is equal to the number of channels in the input of the convolutional layer.
- Each kernel produces one channel in the output of the convolutional layer.
- $\text{Number of parameters} = (K_w \times K_h \times K_c + 1) \times O_c$
  Where:
  - $K_w$ and $K_h$ are the width and height of the convolutional kernels, respectively.
  - $K_c$ is the number of channels of the convolutional kernels (the same as the number of channels in the input, $I_c$).
  - $O_c$ is the number of channels in the output (the same as the number of kernels in the convolutional layer, $N_K$).

[Figure: an illustration of the algorithm of a 2D Convolutional Layer with 2 kernels of size 3 × 3 on an input tensor of 4 channels]
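The sliding-window computation illustrated for the 2D Convolutional Layer, and its parameter-count formula, can be sketched as follows. This is a minimal single-kernel, single-channel illustration, not a full layer: the function names are hypothetical, and the sketch assumes stride 1, no padding, and a kernel applied without flipping, which are the settings that reproduce the numbers of the 6 × 6 example in these slides.

```python
import numpy as np


def conv2d_single(image, kernel, bias):
    """Slide one kernel over a single-channel image (stride 1, no padding),
    add the bias, and apply ReLU."""
    image = np.asarray(image, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for y in range(out_h):
        for x in range(out_w):
            window = image[y:y + kh, x:x + kw]
            # Weighted sum of the covered pixels, plus the kernel's bias.
            out[y, x] = np.sum(window * kernel) + bias
    return np.maximum(out, 0.0)  # ReLU activation


def conv_param_count(k_w, k_h, k_c, o_c):
    """(Kw x Kh x Kc + 1) x Oc."""
    return (k_w * k_h * k_c + 1) * o_c


image = [[9, 6, 2, 3, 0, 8],
         [3, 8, 9, 3, 0, 9],
         [7, 6, 3, 0, 0, 2],
         [7, 9, 4, 1, 0, 5],
         [1, 2, 1, 1, 3, 3],
         [0, 2, 0, 9, 4, 7]]
kernel = [[-1, 0, -1],
          [1, 1, 0],
          [0, -1, 1]]
print(conv2d_single(image, kernel, bias=-2))

# 2 kernels of size 3 x 3 on an input of 4 channels.
print(conv_param_count(3, 3, 4, 2))  # 74
```

The parameter count depends only on the kernel size, the number of input channels, and the number of kernels, so it stays at 74 whatever the width and height of the input.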

Padding (1 / 3)
As we saw earlier, applying a convolutional kernel reduces the vertical and horizontal dimensions of the input. This is problematic because stacking several convolutional layers keeps shrinking the input until it is reduced to a single value or nothing is left. The shrinkage happens because the values on the borders of the input are missing some neighbors, so the kernel cannot be centered on them.
To avoid this, we can pad the input with extra values that make up for the missing neighbors. We can either pad with zeros (Zero Padding) or duplicate the values on the border themselves (Mirror Padding).
Note that the size of the padding (the number of padded pixels) depends on the size of the convolution kernel. For example, with a stride of 1, a 3 × 3 kernel needs 1 padded pixel on each side to keep the output the same size as the input.

Padding (2 / 3)
Zero Padding:
0 0 0 0 0 0 0 0
0 9 6 2 3 0 8 0
0 3 8 9 3 0 9 0
0 7 6 3 0 0 2 0
0 7 9 4 1 0 5 0
0 1 2 1 1 3 3 0
0 0 2 0 9 4 7 0
0 0 0 0 0 0 0 0

Mirror Padding:
9 9 6 2 3 0 8 8
9 9 6 2 3 0 8 8
3 3 8 9 3 0 9 9
7 7 6 3 0 0 2 2
7 7 9 4 1 0 5 5
1 1 2 1 1 3 3 3
0 0 2 0 9 4 7 7
0 0 2 0 9 4 7 7

[Figure: The two main types of Padding used in CNNs]
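The two padding schemes can be sketched in a few lines of plain Python. The function name pad2d and its mode strings are hypothetical, and the 6 × 6 input is the one read from the Padding (2 / 3) slide. "Mirror Padding" is implemented here as replication of the nearest border value, which is what the slide's figure shows for a one-pixel border.

```python
def pad2d(image, pad, mode="zero"):
    """Pad a 2D list of numbers by `pad` pixels on every side.

    mode="zero" fills the border with zeros; mode="replicate" repeats the
    nearest border value, which is what the slides call Mirror Padding.
    """
    if mode not in ("zero", "replicate"):
        raise ValueError("mode must be 'zero' or 'replicate'")
    h, w = len(image), len(image[0])
    padded = []
    for i in range(-pad, h + pad):
        row = []
        for j in range(-pad, w + pad):
            if 0 <= i < h and 0 <= j < w:
                row.append(image[i][j])
            elif mode == "zero":
                row.append(0)
            else:
                # Clamp the coordinates to the nearest valid pixel.
                ci = min(max(i, 0), h - 1)
                cj = min(max(j, 0), w - 1)
                row.append(image[ci][cj])
        padded.append(row)
    return padded


image = [
    [9, 6, 2, 3, 0, 8],
    [3, 8, 9, 3, 0, 9],
    [7, 6, 3, 0, 0, 2],
    [7, 9, 4, 1, 0, 5],
    [1, 2, 1, 1, 3, 3],
    [0, 2, 0, 9, 4, 7],
]

kernel_size = 3
pad = (kernel_size - 1) // 2  # 1 pixel per side for a 3 x 3 kernel

zero_padded = pad2d(image, pad, mode="zero")
mirror_padded = pad2d(image, pad, mode="replicate")

for row in mirror_padded:
    print(*row)
```

The padding width is derived from the kernel size, so a 5 × 5 kernel would use pad = 2 under the same stride-1 assumption.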

Padding (3 / 3)
Zero-padded input (8 × 8):
0 0 0 0 0 0 0 0
0 9 6 2 3 0 8 0
0 3 8 9 3 0 9 0
0 7 6 3 0 0 2 0
0 7 9 4 1 0 5 0
0 1 2 1 1 3 3 0
0 0 2 0 9 4 7 0
0 0 0 0 0 0 0 0

Kernel (3 × 3):
-1  0 -1
 1  1  0
 0 -1  1

Bias: -2
Activation: ReLU

Output (6 × 6):
12 14  0  0 10  0
 0  0  3  8  0  5
 0  0  0  0  0  0
 0  3  5  2  0  0
 0  0  0  0  0  0
 0  0  0  3  7  6

[Figure: Zero Padding in a 2D Convolutional Layer]

The Pooling Layer (1 / 10)
A pooling layer applies a fixed operation to a small window (region) of the image. The pooling layer has three main hyper-parameters:
1. Pooling Size: the dimensions of the pooling window, typically set to 2.
2. Strides: the spacing (step) between successive pooling windows, typically set to 2.
3. Type: the fixed operation applied to the window, either Max or Average (mean). The most widely used is Max Pooling.

The 2D Max Pooling Layer (2 / 10) to (4 / 10)
Input (6 × 6), Size = 2, Strides = 2:
12 14  0  0 10  0
 0  0  3  8  0  5
 0  0  0  0  0  0
 0  3  5  2  0  0
 0  0  0  0  0  0
 0  0  0  3  7  6

Output after the first three windows (first output row):
14 8 10

[Figure: An illustration of the algorithm of a 2D Max Pooling Layer]
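The zero-padded convolution and the max pooling windows shown on these slides can be reproduced with a short plain-Python sketch. The function names conv2d_same and max_pool2d are hypothetical, and the input, kernel and bias values are the ones read from the Padding (3 / 3) slide. The kernel is applied without flipping (cross-correlation), which is the reading that matches the slide's numbers.

```python
def conv2d_same(image, kernel, bias):
    """Zero-pad a 2D image, slide a square odd-sized kernel over it with
    stride 1, add the bias and apply ReLU. Output size equals input size."""
    k = len(kernel)
    pad = (k - 1) // 2
    h, w = len(image), len(image[0])
    # Build the zero-padded copy of the input.
    padded = [[0] * (w + 2 * pad) for _ in range(h + 2 * pad)]
    for i in range(h):
        for j in range(w):
            padded[i + pad][j + pad] = image[i][j]
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            total = bias
            for di in range(k):
                for dj in range(k):
                    total += padded[i + di][j + dj] * kernel[di][dj]
            row.append(max(0, total))  # ReLU
        out.append(row)
    return out


def max_pool2d(feature_map, size=2, stride=2):
    """Keep the maximum of each size x size window, moving by `stride`."""
    h, w = len(feature_map), len(feature_map[0])
    return [
        [
            max(feature_map[i + di][j + dj]
                for di in range(size) for dj in range(size))
            for j in range(0, w - size + 1, stride)
        ]
        for i in range(0, h - size + 1, stride)
    ]


image = [
    [9, 6, 2, 3, 0, 8],
    [3, 8, 9, 3, 0, 9],
    [7, 6, 3, 0, 0, 2],
    [7, 9, 4, 1, 0, 5],
    [1, 2, 1, 1, 3, 3],
    [0, 2, 0, 9, 4, 7],
]
kernel = [[-1, 0, -1], [1, 1, 0], [0, -1, 1]]

features = conv2d_same(image, kernel, bias=-2)
pooled = max_pool2d(features, size=2, stride=2)

print(features[0])  # [12, 14, 0, 0, 10, 0]
print(pooled)       # [[14, 8, 10], [3, 5, 0], [0, 3, 7]]
```

Thanks to the padding, the 6 × 6 input yields a 6 × 6 feature map, and the 2 × 2 pooling with a stride of 2 then halves each spatial dimension to 3 × 3.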

The 2D Max Pooling Layer (5 / 10) to (10 / 10)
With Size = 2 and Strides = 2, the remaining windows complete the 3 × 3 output:
14  8 10
 3  5  0
 0  3  7

[Figure: An illustration of the algorithm of a 2D Max Pooling Layer]

Putting it all together: CNN (1 / 4)
Now that we have listed all its components, we can build a complete CNN (Convolutional Neural Network).
First, we need to stack a set of Convolutional layers to filter the input image and yield a Feature Volume. Then, a Pooling layer is added to reduce the horizontal and vertical dimensions of the Feature Volume. This combination of Convolutional and Pooling layers is repeated a number of times to achieve a higher level of Feature Extraction. The next step is to Flatten the Feature Volume into a 1D Feature Vector. Finally, a set of Fully Connected layers is used to carry out the prediction.

Putting it all together: CNN (2 / 4)
Input: 32 × 32 × 1
Conv.2D 5 × 5 → 28 × 28 × 6
Avg.Pool.2D 2 × 2 (s=2) → 14 × 14 × 6
Conv.2D 5 × 5 → 10 × 10 × 16
Avg.Pool.2D 2 × 2 (s=2) → 5 × 5 × 16
Flatten → 400
Fully Connected → 120
Fully Connected → 84
Fully Connected (Softmax) → 10
[Figure: LeNet-5 CNN architecture used for the recognition of handwritten digits.]

Putting it all together: CNN (3 / 4)
[Figure: An illustration of CNN's Feature Learning: RGB Input → Conv2D + ReLU → Max Pooling 2D → Conv2D + ReLU → Max Pooling 2D → Conv2D + ReLU → Max Pooling 2D]

Putting it all together: CNN (4 / 4)
[Figure: Features extracted by a CNN from real face images (panels: Input, Conv. Layer 1, Conv. Layer 2, Conv. Layer 3)]
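The feature-map sizes of the LeNet-5 pipeline can be checked with a little shape arithmetic. The sketch below is only a shape tracer, not a trained network; the function names conv_out and lenet5_shapes are hypothetical, and no padding is assumed, which is the setting consistent with the sizes listed for LeNet-5.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Output length along one spatial axis of a convolution or pooling."""
    return (size + 2 * pad - kernel) // stride + 1


def lenet5_shapes(height=32, width=32, channels=1):
    """Trace the tensor shape through the LeNet-5 layers."""
    shapes = [("input", (height, width, channels))]
    # (name, window size, stride, number of kernels or None for pooling)
    layers = [
        ("conv 5x5, 6 kernels", 5, 1, 6),
        ("avg pool 2x2, s=2", 2, 2, None),
        ("conv 5x5, 16 kernels", 5, 1, 16),
        ("avg pool 2x2, s=2", 2, 2, None),
    ]
    for name, kernel, stride, n_kernels in layers:
        height = conv_out(height, kernel, stride)
        width = conv_out(width, kernel, stride)
        if n_kernels is not None:  # pooling keeps the channel count
            channels = n_kernels
        shapes.append((name, (height, width, channels)))
    shapes.append(("flatten", (height * width * channels,)))
    for units in (120, 84, 10):
        shapes.append((f"fully connected, {units} units", (units,)))
    return shapes


for name, shape in lenet5_shapes():
    print(f"{name:32s} {shape}")
```

The flattened vector has 5 × 5 × 16 = 400 elements, which is the input size of the first Fully Connected layer.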

Unsupervised Feature Learning
In unsupervised learning, an unlabeled set of images is used instead of a labeled dataset. Consequently, we cannot directly use the usual losses and objectives. Instead, the unlabeled high-dimensional inputs are analyzed to discover some hidden (latent) structure or unknown function underlying them.
Several ideas have been proposed in the deep learning literature to achieve this goal, including:
- Restricted Boltzmann Machines (RBM).
- Autoencoders.
- Generative Adversarial Networks (GAN).

Autoencoder (1 / 4)
An encoder-decoder is a neural network architecture consisting of two parts:
1. Encoder. Reduces the dimensionality of the input and outputs a compact encoding (the latent features).
2. Decoder. Maps the compact encoding back to an output of a size similar to that of the input.
An autoencoder is an encoder-decoder network whose target output is its own input. This allows us to train the autoencoder on an unlabeled set of images: each unlabeled image is fed to the network both as the input and as the expected output. A reconstruction loss, which measures how different the network's output is from its input, is then used to optimize the autoencoder.

Autoencoder (2 / 4)
The reconstruction loss is given by the equation:
$$L = \sum_{i=1}^{N} \lVert X_i - (D \circ E)(X_i) \rVert^2$$
Where:
D is the decoder function.
E is the encoder function.
X_i is the i-th unlabeled image and N is the number of images.

Autoencoder (3 / 4)
[Figure: An illustration of an autoencoder with four fully-connected layers: INPUT X → ENCODER → LATENT FEATURES → DECODER → OUTPUT X]
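To make the reconstruction loss concrete, the sketch below evaluates it for a toy encoder-decoder pair. Everything specific here is an assumption for illustration: the linear maps W_ENC and W_DEC, the 4-value flattened "images", and the reading of the norm as a squared Euclidean distance. No training is performed; the code only computes L for fixed weights.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def reconstruction_loss(images, encode, decode):
    """Sum over all inputs of the squared distance between an input and
    its reconstruction decode(encode(x))."""
    total = 0.0
    for x in images:
        x_hat = decode(encode(x))
        total += sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return total


# Hypothetical linear encoder (4 values -> 2) and decoder (2 -> 4).
W_ENC = [[0.5, 0.5, 0.0, 0.0],
         [0.0, 0.0, 0.5, 0.5]]
W_DEC = [[1.0, 0.0],
         [1.0, 0.0],
         [0.0, 1.0],
         [0.0, 1.0]]


def encode(x):
    """E: average each pair of values into one latent feature."""
    return matvec(W_ENC, x)


def decode(z):
    """D: copy each latent feature back to both positions of its pair."""
    return matvec(W_DEC, z)


# Flattened toy "images": the first is reconstructed exactly, the second is not.
images = [[2.0, 2.0, 6.0, 6.0],
          [1.0, 3.0, 0.0, 4.0]]

print(reconstruction_loss(images, encode, decode))  # 10.0
```

The first image loses nothing when compressed to 2 values, while the second cannot be recovered from its pair averages and contributes the whole loss. Training an autoencoder amounts to adjusting the encoder and decoder weights to reduce this quantity over the dataset.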

Autoencoder (4 / 4)
[Figure: autoencoder stages labeled FEATURE LEARNING, FLATTEN, COMPRESSION]
The incentive that makes autoencoders learn interesting latent representations is that they are forced to reconstruct a replica of their inputs from a much smaller set of elements.

Variational Autoencoder (1 / 2)
Variational autoencoders map their inputs to a distribution rather than to a single latent representation. Hence, the latent feature vector is replaced by two vectors corresponding to the mean and the standard deviation of a normal distribution.
The main hypothesis behind the variational autoencoder is that this forces the network to learn the most compact set of variables that control the generation of the unlabeled images.

Variational Autoencoder (2 / 2)
[Figure: An illustration of a variational autoencoder with four fully-connected layers: INPUT X → ENCODER → MEAN and STANDARD DEVIATION → SAMPLING → DECODER → OUTPUT X]

GANs (1 / 3)
[Figure: An illustration of a typical GAN. A RANDOM INPUT from the LOW DIMENSIONAL LATENT SPACE is fed to the GENERATOR, which produces a FAKE IMAGE. The DISCRIMINATOR receives either the FAKE IMAGE or a REAL IMAGE from the UNLABELED DATASET (HIGH DIMENSIONAL IMAGE SPACE) and outputs REAL / FAKE.]

GANs (2 / 3)
Generative Adversarial Networks are neural network systems used to generate images (or other types of data). They are able to learn very intricate latent representations of unlabeled data in a reversed manner: instead of taking an image as input and producing a feature vector, GANs take a feature vector as input and output a newly generated image that resembles the real images in the dataset. After training, we can use a GAN to generate images that do not exist in the training dataset.
More interestingly, we can train these networks on an unlabeled dataset of images.

GANs (3 / 3)
A Generative Adversarial Network is composed of two parts:
1. Generator. A CNN that uses Transposed Convolutional Layers, or Upsampling Layers followed by Convolutional Layers, to map a 1D vector into a 2D image.
2. Discriminator. A CNN that uses the typical Convolutional Layers and Max Pooling Layers to classify a 2D image as either Real or Fake.
The training of the GAN is performed in a Game-Theory-like manner in which:
1. The Generator is trained to get better at faking images by exposing it to the most recent version of the discriminator.
2. The Discriminator is trained to get better at distinguishing fake images by exposing it to fake images generated by the most recent version of the generator.
The generator and discriminator are trained in turns on small mini-batches so that both improve at a similar pace, without one of them overpowering the other.

Transfer Learning (1 / 3)
Several Deep Learning architectures have been proposed over the years. Most of these architectures were implemented and trained by large companies (Google, Facebook, OpenAI, ...) on large datasets using their large computational resources. Many of these models are publicly available on the web along with their pretrained weights.
We can use these models directly if they suit the needs of our application. However, in most cases we cannot use them as they are, because we either have a different task or a different output. Fortunately, most of these trained models are simple classification CNNs.

Transfer Learning (2 / 3)
As we saw earlier, classification CNNs are composed of a sequence of layers that can be divided into two parts:
1. Feature Extraction layers.
2. Classification layers.
Since most of the difficulty in training CNNs can be attributed to Feature Learning rather than Prediction, we can simply take these pretrained CNNs, chop off the prediction layers (generally the last layer only), and keep the Feature Extraction layers. Consequently, we obtain a very good Feature Extraction algorithm ready to be used on our own tasks. We call this algorithm the Backbone Model. It is, however, noteworthy that the images of our own task should not be very different from the ones on which the Backbone Model was trained. We call this technique Transfer Learning.

Transfer Learning (3 / 3)
VGG16 PRETRAINED ON IMAGENET
Input: 224 × 224 × 3
224 × 224 × 64, 224 × 224 × 64, 112 × 112 × 64
112 × 112 × 128, 112 × 112 × 128, 56 × 56 × 128
56 × 56 × 256, 56 × 56 × 256, 56 × 56 × 256, 28 × 28 × 256
28 × 28 × 512, 28 × 28 × 512, 28 × 28 × 512, 14 × 14 × 512
14 × 14 × 512, 14 × 14 × 512, 14 × 14 × 512, 7 × 7 × 512
Fully connected: 4096, 4096, 1000 (original output)
CUSTOM OUTPUT: CUSTOM SOFTMAX UNIT with 10 outputs
[Figure: An illustration of how we can use Transfer Learning to train a 10-class classifier backboned by the VGG16 model trained on the ImageNet dataset.]

References
Timo Ojala, Matti Pietikäinen, and David Harwood. Performance evaluation of texture measures with classification based on Kullback discrimination of distributions. In 12th IAPR International Conference on Pattern Recognition, Conference A: Computer Vision & Image Processing, ICPR 1994, Jerusalem, 9-13 October, 1994, Volume 1, pages 582–585. IEEE, 1994.
Navneet Dalal and Bill Triggs. Histograms of oriented gradients for human detection. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), volume 1, pages 886–893. IEEE, 2005.
Yann LeCun, Bernhard Boser, John S Denker, Donnie Henderson, Richard E Howard, Wayne Hubbard, and Lawrence D Jackel. Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4):541–551, 1989.
Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org.