COMP9517_24T2W7_Deep_Learning_Part_1-2.pdf
UNSW Sydney
2024
COMP9517 Computer Vision
2024 Term 2 Week 7
Dr Dong Gong

Deep Learning
Image Classification using CNNs

Copyright (C) UNSW COMP9517 24T2W7 Deep Learning I

Introduction

Image Classification
- Dataset: CIFAR10 [Alex Krizhevsky, “Learning Multiple Layers of Features from Tiny Images”, Technical Report, 2009.]
- Image classification with a linear classifier
- An image example with 4 pixels and 3 classes
- Interpreting a linear classifier

Role of CNNs in Image Classification
- CNNs are designed specifically for processing grid-like data, making them well-suited for images.
- They can automatically learn relevant features from raw pixel data, eliminating the need for handcrafted feature engineering.
- CNNs use convolutional layers to detect local patterns, followed by pooling layers that reduce spatial dimensions and help reduce overfitting.
- The fully connected layers at the end of a CNN make the final classification decision based on the learned features.

Advantages of CNNs in Image Classification
- Automatic Feature Extraction
- Hierarchical Feature Learning
- Weight Sharing and Parameter Efficiency
- Transfer Learning and Pretrained Models
- Translation Invariance
- Superior Performance
- Robustness to Variations
- Scalability

Datasets

MNIST Dataset
Overview: MNIST, short for "Modified National Institute of Standards and Technology," is a widely used dataset in the field of machine learning and computer vision. It consists of a collection of handwritten digit images that are extensively used for tasks such as digit recognition and image classification.
Key Characteristics:
- 70,000 grayscale images.
- 28x28 pixels in size.
- Each image contains a single digit (0 through 9).
- Labeled dataset: each image is associated with a corresponding digit label.
Applications: MNIST is used for various machine learning tasks, including:
- Digit recognition.
- Handwriting analysis.
- Image classification.
- Benchmarking and testing machine learning algorithms.

CIFAR-10 Dataset
Overview: CIFAR-10 stands for the “Canadian Institute For Advanced Research - 10,” and it is a widely used dataset for image classification tasks in the field of machine learning and computer vision.
Key Characteristics:
- CIFAR-10 consists of 60,000 color images.
- These images are divided into 10 different classes, each representing a distinct object or category.
- The dataset is split into 50,000 training images and 10,000 testing images.
- Each image is 32x32 pixels in size.
- The 10 classes include objects like airplanes, automobiles, birds, cats, and more.
- A 100-class version, CIFAR-100, is also available.
Applications: CIFAR-10 is used for a variety of machine learning tasks, including:
- Image classification.
- Object recognition.
- Transfer learning.
- Benchmarking and testing of convolutional neural networks (CNNs).
https://paperswithcode.com/dataset/cifar-10

ImageNet
Consists of 14 million images in more than 21,000 classes; about 1 million images have bounding box annotations.
- Annotated by humans using the crowdsourcing platform “Amazon Mechanical Turk”
- ImageNet Large-Scale Visual Recognition Challenge (ILSVRC)
  - Annual competition to foster the development and benchmarking of state-of-the-art algorithms in computer vision
  - Led to improvements in architectures and techniques at the intersection of CV and DL
Image Credit: Synced.
https://syncedreview.com/2020/06/23/google-deepmind-researchers-revamp-imagenet/

Classical CNN Models

LeNet
- First developed by Yann LeCun in 1989 for digit recognition
- First time backprop was used to automatically learn visual features
- Two convolutional layers, three fully connected layers (32 x 32 input, 6 and 12 feature maps (FMs), 5 x 5 filters)
- Stride = 2 is used to reduce image dimensions
- Scaled tanh activation function
- Uniform random weight initialization
Source: LeCun et al. (1998). Gradient-based learning applied to document recognition.

AlexNet
- Added several important heuristics
  - ReLU non-linearity
  - Local Response Normalization
  - Data Augmentation
  - Dropout
- Winner of the 2012 ILSVRC challenge
ImageNet Large Scale Visual Recognition Challenge (ILSVRC): https://www.image-net.org/challenges/LSVRC/
Image Credit: KDnuggets, Deep Learning’s Most Important Ideas. https://www.kdnuggets.com/2020/09/deep-learnings-most-important-ideas.html

VGG
- Developed at the Visual Geometry Group (Oxford) by Simonyan and Zisserman
- 1st runner-up (classification) and winner (localization) of the ILSVRC 2014 competition
- VGG-19 has about 144 million parameters
Image Credit: Medium. https://medium.com/coinmonks/paper-review-of-vggnet-1st-runner-up-of-ilsvlc-2014-image-classification-d02355543a11

VGG-16
Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).
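
The parameter count quoted for VGG-19 above can be checked with a little arithmetic. The sketch below assumes the standard configuration from the Simonyan and Zisserman paper (3x3 convolutions with biases, a 224x224 RGB input, five stages each ending in 2x2 max pooling, and fully connected layers of 4096, 4096 and 1000 units). The function and variable names are illustrative and do not come from the lecture.

```python
# Output channels of the 3x3 conv layers in each of the five VGG stages
# (assumed from the configuration tables in the VGG paper).
VGG16_STAGES = [[64, 64], [128, 128], [256, 256, 256],
                [512, 512, 512], [512, 512, 512]]
VGG19_STAGES = [[64, 64], [128, 128], [256, 256, 256, 256],
                [512, 512, 512, 512], [512, 512, 512, 512]]


def count_vgg_parameters(stages, in_channels=3, image_size=224,
                         fc_sizes=(4096, 4096, 1000)):
    """Count weights and biases of a VGG-style network."""
    total = 0
    channels = in_channels
    size = image_size
    for stage in stages:
        for out_channels in stage:
            # One 3x3 kernel per (input, output) channel pair,
            # plus one bias per output channel.
            total += 3 * 3 * channels * out_channels + out_channels
            channels = out_channels
        size //= 2  # 2x2 max pooling halves the resolution, adds no parameters
    features = channels * size * size  # flattened input to the first FC layer
    for units in fc_sizes:
        total += features * units + units  # dense weights plus biases
        features = units
    return total


if __name__ == "__main__":
    print("VGG-16:", count_vgg_parameters(VGG16_STAGES))
    print("VGG-19:", count_vgg_parameters(VGG19_STAGES))
```

Under these assumptions the count is 138,357,544 for VGG-16 and 143,667,240 for VGG-19, which rounds to the 144 million quoted above. Most of these parameters sit in the first fully connected layer rather than in the convolutions.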

GoogLeNet
- A 22-layer CNN developed by researchers at Google
- Deeper networks are prone to overfitting and suffer from the exploding or vanishing gradient problem
- Core idea: the “Inception module”
  - Multi-branch, multi-size kernels
- Adds an auxiliary loss as extra supervision
- Winner of the 2014 ILSVRC challenge
Source: Convolutional Neural Networks. https://medium.com/@rajat.k.91/convolutional-neural-networks-why-what-and-how-f8f6dbebb2f9

ResNet
- Developed by researchers at Microsoft (Kaiming He et al.)
- Core idea: “residual connections” or “skip connections” to preserve the gradient
- The identity mapping carries the input forward, which avoids the loss of information (the data vanishing problem)
Image Credit: Medium. https://medium.com/@pierre_guillou/understand-how-works-resnet-without-talking-about-residual-64698f157e0c

SENet (Squeeze-and-Excitation Network)
- CNNs fuse spatial and channel information to extract features for the task
- Before SENet, networks weighted each channel equally when creating the output feature maps
- SENets added a content-aware mechanism to weight each channel adaptively
- The SE block improves the representational power of the network: it models channel dependencies better and has access to global information
Source: Convolutional Neural Networks. https://medium.com/@rajat.k.91/convolutional-neural-networks-why-what-and-how-f8f6dbebb2f9

DenseNet
- “Dense block” vs “Res block”
  - More flexible connections
- A transition layer is used between dense blocks
  - Reduces dimensionality and computation
Huang, Gao, et al. "Densely connected convolutional networks." Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR). 2017.
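
The skip-connection idea from the ResNet slide above can be illustrated with a toy example. This is a minimal sketch using scalars rather than feature maps, and `weak_layer` is a hypothetical stand-in for a layer whose output is much smaller than its input; none of the names come from the lecture. A residual block computes x + F(x) instead of F(x), so its derivative with respect to x is 1 + F'(x), and the identity term keeps both the signal and the gradient from shrinking with depth.

```python
def plain_stack(x, layer, depth):
    """Apply `layer` repeatedly without skip connections: x -> F(x)."""
    for _ in range(depth):
        x = layer(x)
    return x


def residual_stack(x, layer, depth):
    """Apply `layer` repeatedly with skip connections: x -> x + F(x)."""
    for _ in range(depth):
        x = x + layer(x)  # the identity path carries x forward unchanged
    return x


def weak_layer(x):
    """Toy layer (an assumption for illustration) that shrinks its input."""
    return 0.1 * x


if __name__ == "__main__":
    # Without skips the signal collapses towards zero after 20 layers.
    print("plain:   ", plain_stack(1.0, weak_layer, 20))
    # With skips the original input is still present in the output.
    print("residual:", residual_stack(1.0, weak_layer, 20))
```

In this toy setting the plain stack multiplies the input by 0.1 at every layer, while the residual stack multiplies it by 1.1, so the input never vanishes. Real residual blocks apply the same pattern to tensors, with F built from convolution, normalization and activation layers.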

Transfer Learning and Pre-training
- Pre-trained models from (large-scale) datasets
- Transferring the learned knowledge to new tasks
  - New data with different distributions (synthetic data to real data)
  - New tasks (classification to segmentation)
  - Segmentation for scenes from different distributions

Class Incremental Learning
- Incrementally seeing different classes
- Continual learning
- Let DNNs accumulate knowledge from different datasets
- Learning like a human being

Demo Examples

Demo Example 1
- AlexNet (PyTorch): https://d2l.ai/chapter_convolutional-modern/alexnet.html
- VGG16: https://d2l.ai/chapter_convolutional-modern/vgg.html
- ResNet: https://d2l.ai/chapter_convolutional-modern/resnet.html
  https://colab.research.google.com/github/d2l-ai/d2l-pytorch-colab/blob/master/chapter_convolutional-modern/resnet.ipynb
- https://d2l.ai/index.html

Key takeaways
- Training methodology
  - Split data into training (such as 70%), validation (10%), and testing (20%) sets
  - Take care to avoid data leakage
  - Check the distribution of classes; work on balanced datasets (ideally)
  - Find and develop baseline models first
  - Tune hyperparameters on the validation set. Save the best model and run inference on the test set (once)
  - Don’t use an off-the-shelf model blindly.
    Do ablation studies to understand how it works
  - Data augmentation techniques are not standardized
  - Get input from experts to learn which data augmentations make sense in the domain
- Results
  - Use multiple metrics rather than a single metric to report results (often they are complementary)
  - Show both qualitative and quantitative results (e.g., image segmentation)

Acknowledgements
Slides from:
- https://syncedreview.com/2020/06/23/google-deepmind-researchers-revamp-imagenet/
- LeCun et al. (1998). Gradient-based learning applied to document recognition.
- Deep Learning’s Most Important Ideas. https://www.kdnuggets.com/2020/09/deep-learnings-most-important-ideas.html
- https://medium.com/coinmonks/paper-review-of-vggnet-1st-runner-up-of-ilsvlc-2014-image-classification-d02355543a11
- Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).
- Convolutional Neural Networks. https://medium.com/@rajat.k.91/convolutional-neural-networks-why-what-and-how-f8f6dbebb2f9
- https://medium.com/@pierre_guillou/understand-how-works-resnet-without-talking-about-residual-64698f157e0c

Example exam question
What does transfer learning with CNNs involve?
A. Training a model from scratch for each new task.
B. Using a pretrained model and fine-tuning it for a new task.
C. Converting image data to text data to facilitate learning.
D. Combining multiple CNNs into a single model for better performance.
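
As a supplement to the training methodology in the key takeaways, the 70% / 10% / 20% split can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name, the fixed seed, and the use of integer IDs as stand-ins for samples are assumptions made for the example, not part of the lecture.

```python
import random


def split_dataset(items, train_frac=0.7, val_frac=0.1, seed=0):
    """Shuffle once, then cut into disjoint train/validation/test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed keeps the split reproducible
    n_train = round(len(items) * train_frac)
    n_val = round(len(items) * val_frac)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # the remainder (about 20% by default)
    return train, val, test


if __name__ == "__main__":
    train, val, test = split_dataset(range(1000))
    print(len(train), len(val), len(test))
    # No sample may appear in more than one split; overlap would be
    # one simple form of data leakage.
    assert not set(train) & set(val)
    assert not set(train) & set(test)
    assert not set(val) & set(test)
```

Hyperparameters would then be tuned against `val` only, with `test` evaluated once at the end. A random split like this does not balance classes; for imbalanced data a per-class (stratified) split is the usual refinement.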