Image Analysis Using Convolutional Neural Network (CNN) PDF
Jacksonville University
2024
Md. Rezaul Karim
Summary
This document is a chapter on image analysis using Convolutional Neural Networks (CNNs). It covers the motivation for using CNNs, their components, and applications to image classification. The chapter is written by Prof. Dr. Md. Rezaul Karim and is part of a data science course in Spring 2024.
Full Transcript
Chapter 8: Image Analysis using Convolutional Neural Network (CNN)
Prof. Dr. Md. Rezaul Karim (Professor, Dept. of Statistics and Data Science, JU)
MS in Data Science, Spring 2024

8 Chapter 8: Image Analysis using Convolutional Neural Network (CNN)
8.1 Problems & Motivation
8.2 Components of CNN
8.3 Max Pooling Operation
8.4 Padding in CNN
8.5 Stride in CNN
8.6 Flattening in CNN
8.7 Fully Connected Layer in CNN
8.8 Image classification using CNN

8.1 Problems & Motivation
[Figure: Handwritten Digits Detection Problem]
[Figure: How a Computer Reads an Image?]
Convolutional Neural Network (CNN)
In deep learning, a convolutional neural network (CNN, or ConvNet) is a class of deep neural network most commonly applied to analyzing visual imagery.
CNNs are inspired by the biological visual cortex:
  ▸ the cortex has small regions of cells that are sensitive to specific areas of the visual field
  ▸ researchers showed that some individual neurons in the brain activate (fire) only in the presence of edges of a particular orientation, such as vertical or horizontal edges
CNNs have applications in image and video recognition, recommender systems, image classification, image segmentation, medical image analysis, natural language processing, brain-computer interfaces, and financial time series.

Layer structure in a normal neural network
The neurons are arranged in layers and are fully interconnected: the output of each neuron in the input layer goes to every neuron in the subsequent layer.
This happens for every layer, as the model shown on the slide illustrates.
This kind of information propagation is highly inefficient for computer vision, because every pixel is connected to every neuron regardless of where it sits in the image.
If you must recognize trees in a photo, you do not need to look at the blue sky at the very top of the image.
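To make the inefficiency concrete, the sketch below counts the parameters that a single fully connected layer needs when every pixel value feeds every neuron. The image size (50 × 50 RGB) and the layer width (1000 units) are illustrative assumptions, and `dense_param_count` is a hypothetical helper, not part of any library.

```python
# Parameter count of one fully connected (dense) layer on a flattened image.
# The image size and layer width are illustrative assumptions.
def dense_param_count(n_inputs: int, n_units: int) -> int:
    """One weight per (input, unit) pair plus one bias per unit."""
    return n_inputs * n_units + n_units


height, width, channels = 50, 50, 3       # assumed RGB image
n_inputs = height * width * channels      # values after flattening the image
hidden_units = 1000                       # assumed layer width

total = dense_param_count(n_inputs, hidden_units)
print(f"{n_inputs} inputs -> {total} parameters in one dense layer")
```

Even for this small image the first layer alone needs several million parameters, which is the cost that the local filters of a CNN avoid.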
Layer structure in a CNN
Convolutional neural networks are specifically designed for computer vision tasks, so their design is optimized for processing images.
In CNNs, the layers are three-dimensional: the neurons are arranged in a shape of the form (width, height, depth).
If we have a 50 × 50 pixel image encoded as RGB (red, green, blue), the shape of the input layer will be (50, 50, 3).
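A convolutional filter looks only at a small local patch of the input at a time. The following is a minimal NumPy sketch of that idea on a single-channel toy image; the 6 × 6 image, the hand-picked vertical-edge kernel, and the helper name `conv2d_valid` are assumptions chosen for illustration, and a real CNN learns its kernel values during training.

```python
import numpy as np


def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide `kernel` over `image` with stride 1 and no padding.

    This is the cross-correlation that deep learning libraries call convolution.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Each output value depends only on one local kh x kw patch.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


# Toy 6x6 grayscale image: dark left half (0), bright right half (1).
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-picked vertical-edge kernel (an assumption for illustration).
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

feature_map = conv2d_valid(image, kernel)
print(feature_map.shape)
print(feature_map)
```

The feature map is non-zero only in the columns where the window straddles the dark-to-bright boundary, which mirrors the edge-sensitive neurons described earlier.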
Details on data augmentation techniques can be found at https://medium.com/ymedialabs-innovation/data-augmentation-techniques-in-cnn-using-tensorflow-371ae43d

8.2 Components of CNN
Convolutional layers
ReLU layers
Pooling layers
Fully connected layers

8.3 Max Pooling Operation
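The max pooling slide consists of a figure only, so here is a hedged NumPy sketch of one common formulation: a 2 × 2 window moved with a stride of 2, keeping the largest value in each window. The window size, the stride, the toy input, and the helper name `max_pool2d` are assumptions for illustration.

```python
import numpy as np


def max_pool2d(x: np.ndarray, pool: int = 2, stride: int = 2) -> np.ndarray:
    """Keep the maximum of each pool x pool window (no padding)."""
    out_h = (x.shape[0] - pool) // stride + 1
    out_w = (x.shape[1] - pool) // stride + 1
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = x[r:r + pool, c:c + pool].max()
    return out


# Toy 4x4 feature map (values chosen arbitrarily).
feature_map = np.array([[1, 3, 2, 4],
                        [5, 6, 1, 2],
                        [7, 2, 9, 0],
                        [1, 8, 3, 4]])

pooled = max_pool2d(feature_map)
print(pooled)
```

Pooling halves each spatial dimension here (4 × 4 becomes 2 × 2) while keeping the strongest response in every window.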
8.4 Padding in CNN

8.5 Stride in CNN
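Since the padding and stride slides are figures, the sketch below illustrates both ideas with NumPy: zero padding adds a border of zeros around the input, and the stride is the step between successive window positions. The 3 × 3 toy input, the padding width of 1, and the helper `window_starts` are assumptions for illustration.

```python
import numpy as np


def window_starts(n: int, k: int, stride: int) -> list:
    """Start indices at which a window of size k fits inside a length-n axis."""
    return list(range(0, n - k + 1, stride))


x = np.arange(1, 10).reshape(3, 3)

# Zero padding of width 1 on every side: 3x3 becomes 5x5.
padded = np.pad(x, pad_width=1, mode="constant", constant_values=0)
print(padded)

# A 3x3 filter on the padded 5x5 input, along one axis:
starts_stride_1 = window_starts(padded.shape[0], k=3, stride=1)
starts_stride_2 = window_starts(padded.shape[0], k=3, stride=2)
print(starts_stride_1, starts_stride_2)
```

The number of start positions along an axis is the output size along that axis, so a larger stride produces a smaller feature map, while padding lets the filter cover the border pixels.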
An input image has been converted into a matrix of size 12 × 12, and a filter of size 3 × 3 is applied with a stride of 1. Determine the size of the convolved matrix.

Exercise
Consider a convolutional neural network (CNN) architecture with the following specifications for its convolutional layer:
Input image size: 32 × 32 pixels
Number of filters: 16
Filter size: 5 × 5 pixels
Stride: 1 pixel
Padding: Same
Calculate the total number of parameters (weights and biases) in the convolutional layer.
Answer: To calculate the total number of parameters in the convolutional layer, we count the weights and biases associated with each filter. The exercise does not state the number of input channels, so the count below assumes a single-channel (grayscale) input; with C input channels each filter would have 5 × 5 × C weights.
1 Weights Calculation:
  ▸ Each filter has a size of 5 × 5 pixels.
  ▸ Since the input image has 32 × 32 pixels, the stride is 1, and padding is set to 'same', the output feature map has the same spatial dimensions as the input image. This affects the output size, not the number of parameters.
  ▸ Therefore, each filter has 5 × 5 = 25 weights.
  ▸ Since there are 16 filters, the total number of weights is 16 × 25 = 400.
2 Biases Calculation:
  ▸ Each filter has one bias term associated with it.
  ▸ Since there are 16 filters, the total number of biases is 16.
3 Total Parameters:
  ▸ Adding the weights and biases together, the total number of parameters in the convolutional layer is 400 (weights) + 16 (biases) = 416.
Therefore, the total number of parameters in the convolutional layer of the CNN architecture is 416.

Question: You are designing a Convolutional Neural Network (CNN) for image classification tasks. The input images have dimensions of 64x64 pixels, and you plan to use a single convolutional layer followed by a max-pooling layer and a fully connected layer as the output layer. The specifications for the convolutional layer are as follows:
Number of filters: 16
Filter size: 5x5 pixels
Stride: 2 pixels
Padding: Valid (no padding)
Activation function: ReLU (Rectified Linear Unit)
a) Calculate the dimensions of the output feature maps after applying the convolutional layer and the max-pooling layer.
b) Determine the total number of parameters (weights and biases) in the convolutional layer.

Answer:
a) Output Feature Maps Dimensions:
1 Convolutional Layer:
  ▸ With a filter size of 5x5 pixels, a stride of 2 pixels, and padding set to 'valid', the output feature map dimensions can be calculated using the formula:
    \[ \text{Output Size} = \left\lfloor \frac{\text{Input Size} - \text{Filter Size} + 2 \times \text{Padding}}{\text{Stride}} \right\rfloor + 1 \]
  ▸ For input images of size 64x64 pixels with no padding, a filter size of 5x5 pixels, and a stride of 2 pixels:
    \[ \text{Output Size} = \left\lfloor \frac{64 - 5 + 2 \times 0}{2} \right\rfloor + 1 = \lfloor 29.5 \rfloor + 1 = 29 + 1 = 30 \]
  ▸ So, the output feature maps would have dimensions of 30x30 pixels. The division is rounded down because a window that does not fit completely inside the image is discarded.
2 Max-Pooling Layer:
  ▸ With a pool size of 2x2 pixels and a stride of 2 pixels, the output feature map dimensions after max-pooling can be calculated using the same formula.
  ▸ For the 30x30 feature maps from the convolutional layer, the output size after max-pooling would be:
    \[ \text{Output Size} = \left\lfloor \frac{30 - 2 + 2 \times 0}{2} \right\rfloor + 1 = 14 + 1 = 15 \]
  ▸ So, the output feature maps would have dimensions of 15x15 pixels.
b) Total Parameters in Convolutional Layer (assuming a single input channel, since the question does not state the number of channels):
1 Weights Calculation:
  ▸ Each filter has a size of 5x5 pixels.
  ▸ Since there are 16 filters, the total number of weights is 16 × 5 × 5 = 400.
2 Biases Calculation:
  ▸ Each filter has one bias term associated with it.
  ▸ Since there are 16 filters, the total number of biases is 16.
3 Total Parameters:
  ▸ Adding the weights and biases together, the total number of parameters in the convolutional layer is 400 (weights) + 16 (biases) = 416.
In summary, the output feature maps after applying the convolutional layer and max-pooling layer would have dimensions of 15x15 pixels, and the convolutional layer would have a total of 416 parameters, including weights and biases.

Question: You are designing a Convolutional Neural Network (CNN) for image classification tasks. The input images have dimensions of 64x64 pixels, and you plan to use a single convolutional layer followed by a max-pooling layer and a fully connected layer as the output layer. The specifications for the convolutional layer are as follows:
Number of filters: 16
Filter size: 5x5 pixels
Stride: 2 pixels
Padding: Same
Activation function: ReLU (Rectified Linear Unit)
a) Calculate the dimensions of the output feature maps after applying the convolutional layer and the max-pooling layer.
b) Determine the total number of parameters (weights and biases) in the convolutional layer.
Answer:
a) Output Feature Maps Dimensions:
1 Convolutional Layer:
  ▸ With a filter size of 5x5 pixels, a stride of 2 pixels, and padding set to 'same', the input is zero-padded so that the output size depends only on the input size and the stride:
    \[ \text{Output Size} = \left\lceil \frac{\text{Input Size}}{\text{Stride}} \right\rceil = \left\lceil \frac{64}{2} \right\rceil = 32 \]
  ▸ This agrees with the general formula when a padding of 2 pixels is added on each side:
    \[ \text{Output Size} = \left\lfloor \frac{\text{Input Size} - \text{Filter Size} + 2 \times \text{Padding}}{\text{Stride}} \right\rfloor + 1 = \left\lfloor \frac{64 - 5 + 2 \times 2}{2} \right\rfloor + 1 = \lfloor 31.5 \rfloor + 1 = 32 \]
  ▸ So, the output feature maps would have dimensions of 32x32 pixels.
2 Max-Pooling Layer:
  ▸ With a pool size of 2x2 pixels, a stride of 2 pixels, and no padding, the output feature map dimensions after max-pooling can be calculated using the same general formula.
  ▸ For the 32x32 feature maps from the convolutional layer, the output size after max-pooling would be:
    \[ \text{Output Size} = \left\lfloor \frac{32 - 2 + 2 \times 0}{2} \right\rfloor + 1 = 15 + 1 = 16 \]
  ▸ So, the output feature maps would have dimensions of 16x16 pixels.
b) Total Parameters in Convolutional Layer (assuming a single input channel, since the question does not state the number of channels):
1 Weights Calculation:
  ▸ Each filter has a size of 5x5 pixels.
  ▸ Since there are 16 filters, the total number of weights is 16 × 5 × 5 = 400.
2 Biases Calculation:
  ▸ Each filter has one bias term associated with it.
  ▸ Since there are 16 filters, the total number of biases is 16.
3 Total Parameters:
  ▸ Adding the weights and biases together, the total number of parameters in the convolutional layer is 400 (weights) + 16 (biases) = 416.
In summary, the output feature maps after applying the convolutional layer and max-pooling layer would have dimensions of 16x16 pixels, and the convolutional layer would have a total of 416 parameters, including weights and biases.
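The two worked questions can be checked numerically. The sketch below is a plain-Python calculation, not the author's code; it assumes the Keras/TensorFlow conventions for 'valid' and 'same' padding and a 2 × 2 max-pooling window with stride 2, and the helper names (`conv_output_size`, `pool_output_size`, `conv_param_count`) are hypothetical.

```python
import math


def conv_output_size(n: int, k: int, stride: int, padding: str) -> int:
    """Output size along one axis, using Keras-style padding names (assumed)."""
    if padding == "valid":                 # no padding, partial windows dropped
        return (n - k) // stride + 1
    if padding == "same":                  # padded so that size = ceil(n / stride)
        return math.ceil(n / stride)
    raise ValueError(f"unknown padding: {padding!r}")


def pool_output_size(n: int, pool: int = 2, stride: int = 2) -> int:
    """Output size of max pooling without padding."""
    return (n - pool) // stride + 1


def conv_param_count(filters: int, k: int, in_channels: int = 1) -> int:
    """k*k*in_channels weights per filter plus one bias per filter."""
    return filters * (k * k * in_channels) + filters


results = {}
for padding in ("valid", "same"):
    conv = conv_output_size(64, 5, stride=2, padding=padding)
    results[padding] = (conv, pool_output_size(conv))
    print(padding, "conv:", conv, "pooled:", results[padding][1])

params_gray = conv_param_count(filters=16, k=5)                 # 1 input channel
params_rgb = conv_param_count(filters=16, k=5, in_channels=3)   # 3 input channels
print(params_gray, params_rgb)
```

The padding choice changes the feature map size but not the parameter count, whereas the number of input channels changes the parameter count but not the feature map size.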
8.6 Flattening in CNN

8.7 Fully Connected Layer in CNN

8.8 Image classification using CNN
CIFAR-10 dataset: https://www.cs.toronto.edu/~kriz/cifar.html

Quick Review: Steps of CNN

Python code: Image classification using CNN

    import tensorflow as tf
    from tensorflow.keras import datasets, layers, models
    import matplotlib.pyplot as plt
    import numpy as np

    # Load the CIFAR-10 dataset
    (X_train, y_train), (X_test, y_test) = datasets.cifar10.load_data()

    # See training data shape
    X_train.shape
    # See test data shape
    X_test.shape

    # Show the first training image (imshow needs a single image, not the whole array)
    plt.figure(figsize=(15, 2))
    plt.imshow(X_train[0])

    # Reshape the labels from shape (n, 1) to (n,)
    y_train = y_train.reshape(-1,)
    y_train[:5]
    y_test = y_test.reshape(-1,)

    # Class names in label order. The slide is cut off after "frog";
    # the last three entries are the standard CIFAR-10 class names.
    classes = ["airplane", "automobile", "bird", "cat", "deer",
               "dog", "frog", "horse", "ship", "truck"]

Rezaul Karim # See some plot def plot_sample(X, y, index): plt.figure(figsize = (15,2)) plt.imshow(X[index]) plt.xlabel(classes[y[index]]) ## see plot plot_sample(X_train, y_train, 1) Dr. Md. Rezaul Karim (Professor, Dept. of Statistics and Data Science, JU) MS in Data Science Spring - 2024 541 / 883 Chapter 8: Image Analysis using Convolutional Neural Network (CNN) Image classication using CNN # Normalizing the training data X_train = X_train / 255.0 X_test = X_test / 255.0 # Build simple artificial neural network (ANN) for image classification ann = models.Sequential([ layers.Flatten(input_shape=(32,32,3)), layers.Dense(3000, activation='relu'), layers.Dense(1000, Prof.activation='relu'), Dr. Md. Rezaul Karim layers.Dense(10, activation='softmax') ]) ann.compile(optimizer='SGD', loss='sparse_categorical_crossentropy', metrics=['accuracy']) ann.fit(X_train, y_train, epochs=5) # See the report from sklearn.metrics import confusion_matrix MS in Data Science , Spring - 2024 Dr. Md. Rezaul Karim (Professor, Dept. of Statistics and Data Science, JU) 542 / 883 Chapter 8: Image Analysis using Convolutional Neural Network (CNN) Image classication using CNN # build a convolutional neural network cnn = models.Sequential([ layers.Conv2D(filters=32, kernel_size=(3, 3), activation='relu', input_shape=(32, 32, 3)), layers.MaxPooling2D((2, 2)), layers.Conv2D(filters=64, kernel_size=(3, 3), activation='relu'), layers.MaxPooling2D((2, 2)), Prof. Dr. Md. Rezaul Karim layers.Flatten(), layers.Dense(64, activation='relu'), layers.Dense(10, activation='softmax') ]) cnn.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy']) Dr. Md. Rezaul Karim (Professor, Dept. 
    # Train the model. This call does not appear on the slides; epochs=5 is an
    # assumption that mirrors the ANN example.
    cnn.fit(X_train, y_train, epochs=5)

    # Evaluate the model
    cnn.evaluate(X_test, y_test)
    y_pred = cnn.predict(X_test)
    y_pred[:5]

    # Predicted class index = position of the largest softmax probability
    y_classes = [np.argmax(element) for element in y_pred]
    y_classes[:5]
    y_test[:5]

    # Class names of the first five predictions (a list cannot be indexed by a list)
    [classes[i] for i in y_classes[:5]]

Exam Questions
1 What do you mean by Convolutional Neural Network? What are the different layers in a CNN? Explain the different layers in CNN.
2 What is pooling in CNN, and how does it work?
3 Why do we prefer Convolutional Neural Networks (CNN) over Artificial Neural Networks (ANN) for image data as input?
4 What is pooling in CNN, and how does it work? What are the different types of pooling? Explain their characteristics.
5 Explain the terms Valid Padding and Same Padding in CNN. Explain these with an example.
6 What is data augmentation in Deep Learning?
7 Why is a convolutional neural network preferred over a dense neural network for an image classification task? Explain the role of the Convolution Layer in CNN.
8 Explain the significance of the ReLU activation function in a Convolutional Neural Network.
9 What is the size of the feature map for a given input image size, filter size, stride, and padding amount?
10 What is stride? What is the effect of a high stride on the feature map?
11 Explain the role of the flattening layer in CNN.
What is the role of the Fully Connected (FC) layer in CNN? Briefly explain the two major steps of CNN, i.e., Feature Learning and Classification.

Chapter 9: Stochastic Optimization Algorithms for Big Data Analytics

9.1 Base Optimizer in Machine Learning Algorithm
9.2 Gradient Descent (GD)
9.3 Stochastic Gradient Descent (SGD)
9.4 Big Data Optimizers
9.5 Mini-batch Gradient Descent (GD)
9.6 SGD with Momentum
9.7 Adaptive Gradient Algorithm (Adagrad)
9.8 Adaptive delta Learning Rate Method (Adadelta)
9.9 Root Mean Square Propagation (RMSprop)
9.10 Adaptive Moment Estimation (Adam)

Base Optimizer in Machine Learning Algorithm
1 Gradient Descent (GD) optimization
2 Stochastic Gradient Descent (SGD)
▸ SGD is a simple yet very efficient approach to fitting linear classifiers and regressors under convex loss functions, such as (linear) Support Vector Machines and Logistic Regression
Gradient Descent (GD)
• gradient descent uses the whole training data for each update of the weights and biases
• the GD equation updates a weight (w) in a neural network
• the update is applied in the backward pass, with backpropagation used to calculate the gradient:
$$w_{\text{new}} = w_{\text{old}} - \eta \cdot \underbrace{\frac{\partial \ell(w_{\text{old}}; x, y)}{\partial w_{\text{old}}}}_{\text{Backpropagation}}$$
where
▸ $\eta$ is the learning rate (eta); $\alpha$ or $\gamma$ is sometimes used instead
▸ $\frac{\partial \ell(w_{\text{old}}; x, y)}{\partial w_{\text{old}}}$ is the gradient of the loss function $\ell$, computed on the whole training data

Disadvantages of Gradient Descent for Big Data
• each update is computationally expensive, because the whole training sample is processed at once
• with millions of records, training becomes very slow and computationally very expensive
• the training data may not fit in memory
• if there are multiple local minima, there is no guarantee that the procedure will find the global minimum
• what was the solution?
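The GD update rule can be illustrated on a toy problem with a single weight. The sketch below is an illustration under stated assumptions, not the author's code: the model y = w·x, the mean squared error loss, the data, and the names `full_batch_gradient` and `gradient_descent` are all hypothetical choices. The point is only that every update touches the whole training set.

```python
def full_batch_gradient(w, xs, ys):
    """Gradient of the mean squared error (1/n) * sum((w*x - y)^2) with respect to w."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def gradient_descent(xs, ys, w0=0.0, eta=0.05, steps=200):
    w = w0
    for _ in range(steps):
        g = full_batch_gradient(w, xs, ys)  # uses the whole training data
        w = w - eta * g                     # w_new = w_old - eta * gradient
    return w

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]  # toy data generated from y = 3x
w_gd = gradient_descent(xs, ys)
print(w_gd)  # approaches the true slope 3.0
```

With four records the full pass is cheap; the cost of `full_batch_gradient` grows linearly with the number of records, which is the bottleneck listed in the disadvantages above.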
Stochastic Gradient Descent (SGD)
• instead of using the entire dataset to compute the gradient in each iteration, SGD updates the parameters for each training record individually
• the SGD equation updates a weight (w) in a neural network
• the update is applied in the backward pass, with backpropagation used to calculate the gradient for the i-th record:
$$w_{\text{new}} = w_{\text{old}} - \eta \cdot \underbrace{\frac{\partial \ell(w_{\text{old}}; x_i, y_i)}{\partial w_{\text{old}}}}_{\text{Backpropagation}}$$
where
▸ $\frac{\partial \ell(w_{\text{old}}; x_i, y_i)}{\partial w_{\text{old}}}$ is the gradient of the loss function $\ell$ for the i-th record ($i = 1, 2, \ldots, n$)

Demerits of Stochastic Gradient Descent for Big Data
SGD solved the Gradient Descent problem by using only a single record to update the parameters, but still:
1 SGD is slow to converge, because it needs a forward and a backward propagation for every record
2 the path to the global minimum becomes very noisy
3 because the error gradient is noisy rather than stable, the weights can keep fluctuating around a minimum, so the state of convergence is not always the best the model can achieve
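For comparison with full-batch GD, the per-record SGD update can be sketched in the same toy setting. The model y = w·x, the squared-error loss, the data, and the names `record_gradient` and `sgd` are assumptions made for illustration; the shuffling with a fixed seed is likewise a choice of this sketch, not something stated on the slides.

```python
import random

def record_gradient(w, x, y):
    """Gradient of the squared error (w*x - y)^2 for a single record."""
    return 2.0 * (w * x - y) * x

def sgd(xs, ys, w0=0.0, eta=0.01, epochs=50, seed=0):
    rng = random.Random(seed)      # fixed seed so the run is repeatable
    w = w0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)       # visit the records in random order
        for i in indices:
            g = record_gradient(w, xs[i], ys[i])  # gradient for the i-th record only
            w = w - eta * g                       # one update per record
    return w

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]  # toy data generated from y = 3x
w_sgd = sgd(xs, ys)
print(w_sgd)  # approaches the true slope 3.0
```

The toy data are noise-free, so the path is smooth here; with noisy records each single-record gradient points in a slightly different direction, which is the noisy path mentioned in the demerits.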
Big Data Optimizers

Stochastic Optimizer in Deep Learning Algorithm
1 Mini-batch GD
2 SGD with Momentum
3 Adaptive Gradient Algorithm (Adagrad)
4 Adaptive delta Learning Rate Method (Adadelta)
5 Root Mean Square Propagation (RMSprop)
6 Adaptive Moment Estimation (Adam)
7 ...

Mini-batch Gradient Descent (GD)
• mini-batch gradient descent training is a combination of batch gradient descent and stochastic gradient descent training
• it is a variation of the gradient descent algorithm that splits the training dataset into small batches, which are used to calculate the model error and update the model coefficients
• it seeks a balance between the robustness of stochastic gradient descent and the efficiency of batch gradient descent

Advantages and Disadvantages of Mini-batch GD
Advantages
• faster convergence compared to GD, especially for large datasets, as it processes smaller batches of data in each iteration
• requires less memory than GD, since it doesn't need to store the entire dataset in memory
• easier to parallelize across multiple processors or devices, allowing for efficient distributed computing
• provides a balance between the noise of SGD and the stability of GD, often converging faster than pure SGD
Disadvantages
• the choice of learning rate becomes crucial, and tuning may be required to achieve optimal convergence
• the mini-batch size is a hyperparameter that needs careful consideration, and the optimal size may vary for different datasets and models

In the neural network terminology
• a batch is the complete dataset
• its size is the total number of training examples in the available dataset
• mini-batch size is the number of examples the learning algorithm processes in a single pass (forward and backward)
• a mini-batch is a small part of the dataset of the given mini-batch size
• iteration
▸ the number of iterations is the number of mini-batches the algorithm has processed, i.e., the number of forward and backward passes, each on one mini-batch
▸ so, every time you pass a batch of data through the NN, you complete an iteration
• epoch
▸ in the context of machine learning, an epoch is one complete pass through the entire training dataset during the training of a model
▸ so, during one epoch, the learning algorithm is exposed to each training example once
▸ the number of epochs may not be equal to the number of iterations, because the dataset can be processed in mini-batches; in essence, a single pass may process only a part of the dataset
• for example, with 10000 training instances and a batch size of 1000, it takes 10000/1000 = 10 iterations to complete 1 epoch

SGD with Momentum
• SGD with Momentum is an optimization algorithm used in machine learning to accelerate convergence and overcome oscillations by incorporating a moving average of past gradients into the parameter updates
• in the ball analogy, a ball rolling down the loss surface builds up speed; likewise, momentum helps us progress faster, in the negative or the positive direction, through the neural network's loss surface
• this helps us get to a local minimum faster
• momentum adds a temporal element, that is, an element of time, to the equation for updating the parameters of a neural network
Update rule for Stochastic Gradient Descent (SGD) with momentum
The update rule for SGD with momentum for a single weight w in a neural network is given as follows. Let:
▸ $w$ be the weight being updated,
▸ $\eta$ be the learning rate,
▸ $\beta$ be the momentum parameter,
▸ $g_t$ be the gradient at time step t,
▸ $v_t$ be the momentum term at time step t.
The SGD with momentum update rule is:
$$v_t = \beta \cdot v_{t-1} + (1 - \beta) \cdot g_t,$$
$$w_t = w_{t-1} - \eta \cdot v_t,$$
where the gradient is evaluated at the current weight:
$$g_t = \left. \frac{\partial L}{\partial w} \right|_{w = w_{t-1}}$$

Advantages and Disadvantages of SGD with Momentum
Advantages
• momentum helps the optimizer move faster in the relevant directions, leading to quicker convergence
• reduces oscillations or zig-zagging during optimization, providing smoother and more stable updates
Disadvantages
• requires tuning the momentum hyperparameter, and an inappropriate choice may hinder convergence
• results may not be exactly reproducible unless the momentum state is saved and restored along with the weights
• while momentum helps with learning rate sensitivity, the overall performance still depends on an appropriately chosen learning rate

In summary, SGD with Momentum is a modification of standard SGD that enhances convergence speed and stability by introducing momentum. The choice of hyperparameters, particularly the momentum term, remains crucial for optimal performance.
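The momentum update rule can be combined with mini-batches in a short sketch, which also makes the iteration and epoch terminology concrete: every mini-batch processed counts as one iteration, and one pass over all mini-batches is one epoch. The toy model y = w·x, the data, the hyperparameter values, and the names `minibatch_gradient` and `sgd_momentum` are assumptions made for illustration only.

```python
def minibatch_gradient(w, batch):
    """Mean gradient of (w*x - y)^2 over one mini-batch of (x, y) pairs."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)

def sgd_momentum(data, batch_size, w0=0.0, eta=0.05, beta=0.9, epochs=300):
    w, v = w0, 0.0
    iterations = 0
    for _ in range(epochs):
        # one epoch = one pass over all mini-batches
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            g = minibatch_gradient(w, batch)
            v = beta * v + (1.0 - beta) * g  # v_t = beta * v_{t-1} + (1 - beta) * g_t
            w = w - eta * v                  # w_t = w_{t-1} - eta * v_t
            iterations += 1                  # one mini-batch processed = one iteration
    return w, iterations

data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0), (4.0, 12.0)]  # toy data from y = 3x
w_mom, iterations = sgd_momentum(data, batch_size=2)
print(w_mom, iterations)  # weight near 3.0 after 300 epochs * 2 iterations = 600 iterations
```

With 4 records and a mini-batch size of 2, each epoch consists of 4/2 = 2 iterations, in the same way that 10000 records with a batch size of 1000 give 10 iterations per epoch.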
Adaptive Gradient Algorithm (Adagrad)
• Adagrad is an optimization algorithm used in machine learning that adapts the learning rate of each parameter based on its historical gradients
• it aims to address the challenges posed by sparse data

[Figure slide: Adagrad vs SGD]

Advantages of Adagrad
• adapts learning rates individually for each parameter, allowing for larger updates for infrequently occurring features and smaller updates for frequently occurring ones
• well-suited for sparse data and features, as it automatically adjusts learning rates based on the gradient magnitudes
• works well with convex optimization problems and simple models
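The text describes Adagrad only in words, so the following is a minimal sketch of the commonly cited Adagrad rule for a single weight: accumulate the sum of squared gradients G and divide the learning rate by its square root. The formula, the toy loss (w − 3)², the hyperparameter values, and the name `adagrad_step` are assumptions of this sketch rather than something taken from the slides.

```python
import math

def adagrad_step(w, g, G, eta=0.5, eps=1e-8):
    """One Adagrad update for a single weight; G is the running sum of squared gradients."""
    G = G + g * g                              # accumulate the squared gradient
    effective_lr = eta / (math.sqrt(G) + eps)  # per-parameter learning rate
    w = w - effective_lr * g
    return w, G, effective_lr

w, G = 0.0, 0.0
rates = []
for _ in range(5):
    g = 2.0 * (w - 3.0)  # gradient of the toy loss (w - 3)^2
    w, G, lr = adagrad_step(w, g, G)
    rates.append(lr)
w_adagrad = w
print(rates)  # the effective learning rate shrinks at every step
```

Because G only ever grows, the effective learning rate in `rates` is strictly decreasing, which is the behaviour that motivates the later variants with decaying averages.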
Disadvantages of Adagrad
• the learning rate of each parameter decreases monotonically over time, which can lead to very small learning rates in the later stages of training
• the squared gradients are accumulated over time in the denominator, potentially leading to exploding denominators and overly small effective learning rates
• Adagrad may perform poorly on non-convex optimization problems and deep neural networks, where other adaptive algorithms like RMSprop and Adam are preferred

Adaptive delta Learning Rate Method (Adadelta)
Adadelta is an optimization algorithm used in machine learning that extends the idea of adaptive learning rates from Adagrad. It addresses some of Adagrad's limitations by dynamically adjusting learning rates and dealing with the monotonic decrease issue.
The Adadelta update rule adapts the learning rates based on the historical information of both the squared gradients and the squared parameter updates. For a single weight w in a neural network, the update rule is given by:
$$\Delta w_t = -\frac{\sqrt{E[\Delta w^2]_{t-1} + \epsilon}}{\sqrt{E[g^2]_t + \epsilon}} \cdot g_t \tag{1}$$
$$w_t = w_{t-1} + \Delta w_t \tag{2}$$
where:
▸ $w$ is the weight being updated,
▸ $\beta$ is the decay rate,
▸ $\epsilon$ is a small constant for numerical stability,
▸ $E[g^2]_t$ is the exponentially decaying average of squared gradients at time step t, updated as $E[g^2]_t = \beta \cdot E[g^2]_{t-1} + (1 - \beta) \cdot g_t^2$,
▸ $E[\Delta w^2]_t$ is the exponentially decaying average of squared parameter updates at time step t, updated as $E[\Delta w^2]_t = \beta \cdot E[\Delta w^2]_{t-1} + (1 - \beta) \cdot \Delta w_t^2$,
▸ $\Delta w_t$ is the parameter update at time step t,
▸ $g_t$ is the gradient at time step t.

This update rule adapts the learning rate for each weight based on the historical information of the squared gradients and squared parameter updates. The running averages $E[g^2]_t$ and $E[\Delta w^2]_t$ are updated at each time step to capture the long-term behavior of the gradients and parameter updates. The ratio of the two square roots normalizes the updates and addresses the issue of the monotonic decrease in learning rates.

Advantages and Disadvantages of Adadelta
Advantages
• adjusts learning rates individually for each parameter, mitigating Adagrad's issue of monotonic decrease in learning rates
• unlike Adagrad, Adadelta does not accumulate all past squared gradients, so the denominator does not grow without bound
• Adadelta is less sensitive to the choice of initial conditions and hyperparameters than some other optimization algorithms
Disadvantages
• the algorithm involves additional parameters and computations, making it more complex than simpler optimization algorithms
• while Adadelta addresses some of Adagrad's limitations, it may not always outperform more recent algorithms like Adam in certain scenarios
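Equations (1) and (2) translate into a few lines of Python for a single weight. This is a sketch under assumptions: the two running averages are updated as exponential moving averages with decay rate β (the standard Adadelta definition), and the toy loss (w − 3)², the values β = 0.9 and ε = 1e-6, and the name `adadelta_step` are illustrative choices.

```python
import math

def adadelta_step(w, g, avg_g2, avg_dw2, beta=0.9, eps=1e-6):
    """One Adadelta update for a single weight, following equations (1) and (2)."""
    avg_g2 = beta * avg_g2 + (1.0 - beta) * g * g                 # E[g^2]_t
    dw = -math.sqrt(avg_dw2 + eps) / math.sqrt(avg_g2 + eps) * g  # equation (1), uses E[dw^2]_{t-1}
    avg_dw2 = beta * avg_dw2 + (1.0 - beta) * dw * dw             # E[dw^2]_t
    return w + dw, avg_g2, avg_dw2                                # equation (2)

w, avg_g2, avg_dw2 = 0.0, 0.0, 0.0
for _ in range(100):
    g = 2.0 * (w - 3.0)  # gradient of the toy loss (w - 3)^2
    w, avg_g2, avg_dw2 = adadelta_step(w, g, avg_g2, avg_dw2)
w_adadelta = w
print(w_adadelta)  # has moved from 0.0 towards the minimum at 3.0
```

Note that no learning rate η appears: the step size is set by the ratio of the two running averages, so the first steps are small (of the order of sqrt(ε)) and grow as the history of updates builds up.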
Root Mean Square Propagation (RMSprop)
• RMSprop is an optimization algorithm used in machine learning, specifically designed to address some issues with Adagrad
• it adapts learning rates individually for each parameter by using a moving average of squared gradients

The RMSprop update rule for a single weight w in a neural network is given as follows. Let:
▸ $w$ be the weight being updated,
▸ $\eta$ be the learning rate,
▸ $\epsilon$ be a small constant for numerical stability,
▸ $E[g^2]_t$ be the exponentially decaying average of squared gradients at time step t,
▸ $g_t$ be the gradient at time step t.

The RMSprop update rule is as follows:
$$E[g^2]_t = \beta \cdot E[g^2]_{t-1} + (1 - \beta) \cdot g_t^2 \tag{3}$$
$$w_t = w_{t-1} - \frac{\eta}{\sqrt{E[g^2]_t + \epsilon}} \cdot g_t \tag{4}$$
This update rule adapts the learning rates based on the square root of the moving average of squared gradients. The decay rate $\beta$ determines how quickly the past squared gradients decay, and $\epsilon$ is added for numerical stability. This allows the optimizer to scale the learning rate for each weight individually, providing adaptability to different parameters and addressing some of the challenges associated with fixed learning rates.
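Equations (3) and (4) can be sketched for a single weight as follows. The toy loss (w − 3)², the hyperparameter values (η = 0.01, β = 0.9, ε = 1e-8), and the name `rmsprop_step` are assumptions made for illustration, not values given in the text.

```python
import math

def rmsprop_step(w, g, avg_g2, eta=0.01, beta=0.9, eps=1e-8):
    """One RMSprop update for a single weight."""
    avg_g2 = beta * avg_g2 + (1.0 - beta) * g * g  # equation (3)
    w = w - eta / math.sqrt(avg_g2 + eps) * g      # equation (4)
    return w, avg_g2

w, avg_g2 = 0.0, 0.0
for _ in range(1000):
    g = 2.0 * (w - 3.0)  # gradient of the toy loss (w - 3)^2
    w, avg_g2 = rmsprop_step(w, g, avg_g2)
w_rmsprop = w
print(w_rmsprop)  # close to the minimum at 3.0
```

Because the gradient is divided by its own running root mean square, each step has a size of roughly η regardless of the gradient's scale. With a fixed η the weight therefore ends up hovering in a small neighbourhood of the minimum rather than converging exactly.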
Advantages and Disadvantages of RMSprop
Advantages
• adjusts learning rates individually, providing adaptability to different parameters
• addresses the problem of a monotonic decrease in learning rates by using a moving average of squared gradients
• unlike Adagrad, RMSprop uses a decaying average, which prevents the denominator from growing without bound
• performs well in scenarios with non-stationary or changing data distributions
Disadvantages
• performance can be sensitive to the choice of hyperparameters, such as the decay rate
• similar to other adaptive methods, RMSprop introduces noise in parameter updates, which may impact convergence stability
• while effective in many cases, RMSprop may not always outperform other optimization algorithms, such as Adam, depending on the specific problem and data characteristics
Adaptive Moment Estimation (Adam)
The Adam (Adaptive Moment Estimation) update rule for a neural network adapts the learning rate for each weight based on both the first-order moment (average gradient) and the second-order moment (average squared gradient).
The Adam update rule for a single weight w in a neural network is given as follows. Let:
▸ $w$ be the weight being updated,
▸ $\eta$ be the learning rate,
▸ $\beta_1$ and $\beta_2$ be the exponential decay rates for the first-order moment and the second-order moment, respectively,
▸ $\epsilon$ be a small constant for numerical stability,
▸ $m_t$ be the first-order moment (average gradient) at time step t,
▸ $v_t$ be the second-order moment (average squared gradient) at time step t,
▸ $g_t$ be the gradient at time step t.

The Adam update rule is as follows:
$$m_t = \beta_1 \cdot m_{t-1} + (1 - \beta_1) \cdot g_t,$$
$$v_t = \beta_2 \cdot v_{t-1} + (1 - \beta_2) \cdot g_t^2,$$
$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t},$$
$$\hat{v}_t = \frac{v_t}{1 - \beta_2^t},$$
$$w_t = w_{t-1} - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon} \cdot \hat{m}_t.$$
This update rule moves the weight along the bias-corrected average gradient $\hat{m}_t$ and scales the learning rate by the square root of the bias-corrected moving average of squared gradients $\hat{v}_t$. The decay rates $\beta_1$ and $\beta_2$ control the decay of past information in the moments.

Algorithm of Adaptive Moment Estimation (Adam)
The full algorithm for Adaptive Moment Estimation (Adam) is:
1 Initialize model parameters w.
2 Initialize the first-order moment vector m and the second-order moment vector v to zeros: m = 0, v = 0.
3 Set time step t = 0.
4 Choose hyperparameters: learning rate ($\eta$), exponential decay rates for the first ($\beta_1$) and second ($\beta_2$) moments, and a small constant ($\epsilon$) for numerical stability.
For each iteration:
1 Increment the time step: t = t + 1.
2 Compute the gradient $g_t$ of the cost function with respect to the parameters w.
3 Update the first-order moment $m_t$ and the second-order moment $v_t$:
$$m_t = \beta_1 \cdot m_{t-1} + (1 - \beta_1) \cdot g_t$$
$$v_t = \beta_2 \cdot v_{t-1} + (1 - \beta_2) \cdot g_t^2$$
4 Correct the bias in the first-order moment $m_t$ and the second-order moment $v_t$:
$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t}$$
$$\hat{v}_t = \frac{v_t}{1 - \beta_2^t}$$
5 Update the parameters w:
$$w_t = w_{t-1} - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon} \cdot \hat{m}_t$$

Disadvantages of using the Adam optimizer
• Sensitivity to hyperparameters.
• Increased memory requirements compared with plain SGD (two moment estimates are stored per parameter).
• Non-monotonic behavior in the objective function.
• Sensitivity to noisy gradients.
• Possible overfitting on small datasets.
• Complexity compared to simpler optimization methods.

Advantages of using the Adam optimizer
• Adaptive learning rates for each parameter.
• Efficient handling of sparse gradients.
• Combination of momentum and RMSprop.
• Computationally efficient.
• Modest memory requirements (the extra state is two values per parameter).
• Effective for noisy or varying datasets.
• Default parameters often effective.
• Suitability for a wide range of architectures.
• Fast convergence during training.

Details of this chapter can be found at https://ruder.io/optimizing-gradient-descent/

Exam Questions
1 What is Gradient Descent? What is the difference between batch Gradient Descent and Stochastic Gradient Descent?
2 What is the difference between epoch, batch, and iteration in Deep Learning?
3 What are the reasons for mini-batch gradient descent being so useful?
4 Explain the Adam optimization algorithm.
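As a closing illustration for the chapter, the Adam algorithm (initialization followed by the five per-iteration steps) can be sketched for a single weight. The toy loss (w − 3)², the hyperparameter values (η = 0.1, β1 = 0.9, β2 = 0.999, ε = 1e-8), and the names `adam` and `toy_grad` are assumptions of this sketch; ε is placed outside the square root, which is one common convention.

```python
import math

def adam(grad, w0, eta=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimize a one-dimensional function with Adam, given its gradient function."""
    w = w0
    m, v = 0.0, 0.0                                # moments initialized to zero
    for t in range(1, steps + 1):                  # step 1: t = t + 1
        g = grad(w)                                # step 2: gradient at the current weight
        m = beta1 * m + (1.0 - beta1) * g          # step 3: first-order moment
        v = beta2 * v + (1.0 - beta2) * g * g      #         second-order moment
        m_hat = m / (1.0 - beta1 ** t)             # step 4: bias correction
        v_hat = v / (1.0 - beta2 ** t)
        w = w - eta / (math.sqrt(v_hat) + eps) * m_hat  # step 5: parameter update
    return w

def toy_grad(w):
    """Gradient of the toy loss (w - 3)^2."""
    return 2.0 * (w - 3.0)

w_one_step = adam(toy_grad, w0=0.0, steps=1)
w_adam = adam(toy_grad, w0=0.0)
print(w_one_step, w_adam)  # first step has size about eta; the final weight is near 3.0
```

The bias correction is visible in the first step: since m and v start at zero, the corrected moments equal g and g², so the first update has magnitude close to η whatever the scale of the gradient.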