CNN Architectures: Datasets, LeNet, AlexNet, VGG

Questions and Answers

What is a primary characteristic of the MNIST dataset?

It includes localization and detection data.
It contains color images.
It consists of grayscale images of handwritten digits. (correct)
It is mainly used for classifying clothing items.

What distinguishes Fashion MNIST from the original MNIST dataset?

Fashion MNIST contains only numerical data.
Fashion MNIST includes images of clothing items. (correct)
Fashion MNIST has a higher resolution.
Fashion MNIST is used for object detection, not classification.

What is the total number of images in the CIFAR dataset?

10,000
100,000
60,000 (correct)
50,000

What is the primary focus of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC)?

Advancing image classification and object detection techniques.

Which of the following best describes the COCO dataset?

A dataset for object detection, keypoints, and captions.

What is the primary task for which the LeNet-5 architecture was designed?

Handwritten digit classification.

Which characteristic is associated with LeNet-5's architecture?

Application of pooling (subsampling) layers for downsampling. The original LeNet-5 used average-style subsampling; max pooling appears in many later reimplementations.

What differentiates LeNet-5 from more modern CNN architectures in terms of connectivity between feature maps?

LeNet-5 uses sparse connectivity, selectively connecting feature maps to encourage learning diverse features.

What is the main purpose of max pooling in CNNs?

To reduce the computational load and extract dominant features.
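
As a rough illustration of the answer above, here is a minimal sketch of 2x2 max pooling with stride 2. The function name `max_pool_2x2` and the nested-list feature map are assumptions made for this example, not part of the lesson.

```python
def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2 on a 2-D list with even height and width."""
    h, w = len(fmap), len(fmap[0])
    return [
        [
            # Keep only the strongest activation in each 2x2 window.
            max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
            for c in range(0, w, 2)
        ]
        for r in range(0, h, 2)
    ]


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 0, 5, 6],
    [1, 2, 7, 8],
]
pooled = max_pool_2x2(feature_map)  # 4x4 input becomes a 2x2 output
```

The output has a quarter of the input's values, which is where the reduction in computation for later layers comes from.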

Which of the following is an advantage of using the Tanh activation function over the Sigmoid activation function?

Tanh is symmetric around zero, which may lead to faster convergence.

How does the ReLU activation function address the vanishing gradient problem?

By outputting the input directly if it is positive, preventing saturation.
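
To make the saturation argument concrete, the sketch below compares the derivative of the sigmoid with the derivative of ReLU for a large positive input. The helper names `sigmoid_grad` and `relu_grad` are hypothetical and chosen only for this example.

```python
import math


def sigmoid_grad(x):
    """Derivative of the sigmoid: s(x) * (1 - s(x)). Never larger than 0.25."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


# For a large positive input the sigmoid gradient is almost zero (saturation),
# while the ReLU gradient stays at 1.
saturated = sigmoid_grad(10.0)
unsaturated = relu_grad(10.0)
```

Multiplying many near-zero gradients across layers is what makes gradients vanish; a gradient of exactly 1 on the active side avoids that shrinkage.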

What is a key architectural difference that distinguishes AlexNet from LeNet-5?

AlexNet involves 'parallel' networks and more parameters.

Which activation function was used in AlexNet to improve training performance compared to earlier models using Tanh or Sigmoid?

ReLU

What is the purpose of Dropout in the AlexNet architecture?

To prevent overfitting by randomly dropping units (and their connections) during training.
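
A minimal sketch of dropout on a list of activations follows. It assumes the "inverted" variant, which rescales the surviving activations during training; this is a common modern formulation and not necessarily the exact scheme used in AlexNet, which instead scaled outputs at test time. The function name and the `rng` parameter are assumptions for this example.

```python
import random


def dropout(xs, p=0.5, training=True, rng=random):
    """Inverted dropout: zero each activation with probability p (0 <= p < 1)."""
    if not training or p == 0.0:
        # At inference time every unit is kept and nothing is rescaled.
        return list(xs)
    scale = 1.0 / (1.0 - p)  # keeps the expected activation unchanged
    return [x * scale if rng.random() >= p else 0.0 for x in xs]


activations = [1.0] * 1000
dropped = dropout(activations, p=0.5, rng=random.Random(0))
```

Because a different random subset of units is silenced on every training step, the network cannot rely on any single unit, which is the regularising effect described above.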

What is a key architectural characteristic of VGG networks?

The use of small (3x3) convolutional filters in deeper networks.

What problem do Batch Normalization layers address in deep neural networks?

Vanishing/exploding gradients, improving convergence during training.

What is the primary purpose of residual connections in Residual Networks?

To allow the network to learn an identity function, aiding the training of very deep networks.
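
The identity argument can be sketched in a few lines. Here `residual_block` and the branch function `f` are hypothetical names; a real residual branch would be a small stack of convolutions rather than a Python callable on a list.

```python
def residual_block(x, f):
    """Return f(x) + x elementwise, where f is the learned residual branch."""
    fx = f(x)
    return [a + b for a, b in zip(fx, x)]


def zero_branch(x):
    """A residual branch whose weights have collapsed to zero."""
    return [0.0 for _ in x]


# If the residual branch outputs zero, the block passes its input through unchanged.
identity_out = residual_block([1.0, 2.0, 3.0], zero_branch)
```

The block only has to learn the difference from the identity, so extra layers can fall back to "do nothing" instead of degrading the signal.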

What is the key idea behind autoencoders?

To compress input data into a lower-dimensional space and then reconstruct it.

What distinguishes undercomplete autoencoders from overcomplete autoencoders?

Undercomplete autoencoders have fewer dimensions in the encoding than in the input, which encourages learning salient features; overcomplete autoencoders have the same number of dimensions or more.

What is a primary application of autoencoders?

Dimensionality reduction and denoising of images or signals.

In the context of autoencoders, what does balancing sensitivity and insensitivity refer to?

Balancing the trade-off between reconstruction accuracy and overfitting.

What is the purpose of skip connections in U-Nets?

To concatenate feature maps from the encoding path to the decoding path, preserving local detail.

For what type of task are U-Nets commonly used?

Biomedical image segmentation.

What is the task of image translation?

Converting an image from one domain to another (e.g., edges to photo).

What is the primary goal of image inpainting?

To restore missing or damaged parts of an image.

In the R-CNN family of object detection models, what is the initial step?

Generating region proposals.

Objectness, Category-independent object proposals, and Selective search are all examples of what?

Region proposal algorithms.

What is a key difference between R-CNN and Fast R-CNN?

R-CNN runs the CNN on each region proposal independently, while Fast R-CNN feeds the entire image into a CNN once and then extracts the features for each proposal from the shared feature map.

What is the purpose of Intersection over Union (IoU) in object detection?

To measure the overlap between the predicted bounding box and the ground truth bounding box.
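
The overlap measure can be sketched as follows, assuming boxes are given as `(x1, y1, x2, y2)` corner coordinates with `x1 < x2` and `y1 < y2`. The box format and the function name `iou` are assumptions for this example.

```python
def iou(box_a, box_b):
    """Intersection over Union for two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle, if there is one.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp at zero so that disjoint boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


predicted = (0, 0, 2, 2)
ground_truth = (1, 1, 3, 3)
overlap = iou(predicted, ground_truth)  # intersection 1, union 7
```

A value of 1 means the boxes coincide and 0 means they do not overlap at all; a detection is typically counted as correct when its IoU with a ground-truth box exceeds a chosen threshold.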

What is Non-Maximum Suppression (NMS) used for in object detection?

To remove redundant, overlapping bounding boxes predicting the same object.
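
One common way to do this is greedy NMS, sketched below under stated assumptions: boxes are `(x1, y1, x2, y2)` tuples, each box has a confidence score, and the 0.5 threshold is an illustrative default rather than a value from the lesson. The names `nms` and `_iou` are hypothetical.

```python
def _iou(a, b):
    """Intersection over Union for two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return the indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Discard every remaining box that overlaps the kept box too much.
        order = [i for i in order if _iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # the second box overlaps the first and is suppressed
```

In practice NMS is usually run separately for each object class, so that overlapping boxes of different classes do not suppress each other.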

What is a primary drawback of R-CNN?

Its high computational cost and slow processing speed.

In Faster R-CNN, what replaces the selective search algorithm used in R-CNN for generating region proposals?

A convolutional neural network (CNN), namely the Region Proposal Network (RPN).

What is the main advantage of Faster R-CNN over Fast R-CNN?

Faster R-CNN integrates region proposal directly into the network, rather than depending on an external algorithm.

What does 'precision' measure in the context of evaluating object detection models?

The proportion of detected objects that are actually correct.

What does 'recall' measure in the context of evaluating object detection models?

The proportion of actual objects that are correctly detected by the model.

What are the general characteristics of YOLO?

Fast computation, lower accuracy.

How does Single Shot Detector (SSD) compare to YOLO in object detection?

SSD predicts object detections at multiple scales, while the original YOLO predicts at a single scale.

In the context of evaluating an object detection system, how are true positives (TP) and false positives (FP) marked?

Detections where IoU > thresh2

In the general Batch Normalisation equation, what is the function of the variables γ and β?

They are learnable parameters that scale and shift the normalised data.

Which is the correct Batch Normalisation equation from these options?

$\gamma \cdot \frac{x - \mu_B}{\sqrt{\sigma_B^2 + \varepsilon}} + \beta$
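
The equation in the answer above can be sketched for a single feature over one mini-batch. The function name `batch_norm`, the default `eps`, and the use of a plain list are assumptions for this example; real layers also track running statistics for use at inference time, which is omitted here.

```python
import math


def batch_norm(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalise one feature over a mini-batch, then scale by gamma and shift by beta."""
    mu = sum(xs) / len(xs)                          # batch mean
    var = sum((x - mu) ** 2 for x in xs) / len(xs)  # batch variance (biased)
    return [gamma * (x - mu) / math.sqrt(var + eps) + beta for x in xs]


batch = [1.0, 2.0, 3.0, 4.0]
normalised = batch_norm(batch)                    # roughly zero mean, unit variance
rescaled = batch_norm(batch, gamma=2.0, beta=3.0)  # mean shifted to about 3
```

With `gamma=1` and `beta=0` the output is simply standardised; learning `gamma` and `beta` lets the network undo the normalisation where that helps.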

What is a key characteristic of the Fashion MNIST dataset that distinguishes it from the original MNIST?

It contains images of clothing items across 10 different classes.

What is the size of images in the CIFAR-10 dataset?

32 x 32 x 3

What type of data, other than labeled images, does the ImageNet dataset also contain?

Localization and detection data and short video clips.

What is the approximate number of object instances in the COCO dataset?

1.5 million

What type of loss function did the LeNet-5 architecture utilize?

MSE loss

What are some of the reasons that LeNet-5 did not have all feature maps fully connected between layers?

To avoid the larger computational cost of full connectivity and to break symmetry between feature maps.

What is the primary advantage of using max pooling in neural networks?

Reducing resolution while preserving activated features and decreasing computational cost.

What key architectural innovation did AlexNet introduce, enabling more efficient GPU parallelization during training?

Two 'parallel' networks

What is the primary goal of using dropout during the training phase in AlexNet?

To prevent overfitting by randomly dropping units (and their connections).

Which of the following is the main advantage of using smaller convolutional filters (e.g., 3x3) in VGG networks?

They reduce the number of parameters while maintaining a high representational capacity.
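
The parameter saving can be checked with simple arithmetic. The sketch below assumes every layer has the same number of input and output channels, stride 1, and no biases; the helper names and the choice of 64 channels are illustrative assumptions.

```python
def conv_weights(kernel, channels, layers=1):
    """Weight count (biases ignored) for a stack of kernel x kernel convolutions
    with `channels` input and output channels in every layer."""
    return layers * kernel * kernel * channels * channels


def receptive_field(kernel, layers):
    """Receptive field of `layers` stacked stride-1 convolutions of size `kernel`."""
    return 1 + layers * (kernel - 1)


C = 64
stacked_3x3 = conv_weights(3, C, layers=3)  # 27 * C^2 = 110,592 weights
single_7x7 = conv_weights(7, C, layers=1)   # 49 * C^2 = 200,704 weights

# Both options see the same 7x7 region of the input.
same_view = receptive_field(3, 3) == receptive_field(7, 1)
```

Three stacked 3x3 layers cover the same receptive field as one 7x7 layer with roughly 45% fewer weights, and they apply three non-linearities instead of one.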

How does Batch Normalization contribute to training deeper neural networks?

By allowing higher learning rates due to more stable gradients.

What is the most significant advantage of using residual connections in deep neural networks?

Enabling the training of much deeper networks by alleviating the vanishing gradient problem.

What is the key characteristic of an undercomplete autoencoder?

The encoding has fewer dimensions than the input, forcing it to learn a compressed representation.

How do autoencoders achieve a balance between sensitivity and insensitivity?

By optimizing a reconstruction loss while regularising to prevent memorisation or overfitting.

What is the main purpose of skip connections in U-Nets in the context of image segmentation?

To combine feature maps to recover local detail lost during downsampling.

What is the general goal of 'image translation' in the context of CNNs?

To convert an image from one representation to another (e.g., labels to a street scene).

What is the use of non-CNN based algorithms in the R-CNN family of object detection models?

To generate region proposals.

What is the main goal of Non-Maximum Suppression (NMS) in object detection tasks?

To discard redundant, overlapping bounding boxes and retain the most accurate ones.

What is a significant limitation of R-CNN that Faster R-CNN addresses?

The reliance on fixed, non-trainable region proposal algorithms.

In the context of object detection, how is the 'precision' of a model defined?

The proportion of true positive detections among all detections made by the model.

What is a characteristic feature of the YOLO (You Only Look Once) object detection system?

It is lightweight, fast, and has lower accuracy.

In the context of evaluating object detection systems, which calculation accurately defines 'recall'?

True Positives / (True Positives + False Negatives)
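
Both metrics follow directly from the counts of true positives, false positives, and false negatives. The sketch below uses a hypothetical helper `precision_recall` and made-up counts purely for illustration.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Illustrative counts: 8 correct detections, 2 spurious detections, 4 missed objects.
precision, recall = precision_recall(tp=8, fp=2, fn=4)
```

Here precision is 8 / 10 and recall is 8 / 12: the detector is right most of the time when it fires, but it misses a third of the objects.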

What is the purpose of optimizing a reconstruction loss function in autoencoders?

To make the reconstructed output as close as possible to the original input.

What does 'image inpainting' primarily involve?

Filling in missing or damaged regions of an image.

How does the Single Shot Detector (SSD) approach object detection, especially when compared to YOLO?

SSD is a one-stage detector, similar to YOLO.

What is the relationship between the number of layers and performance in the plain networks described prior to the introduction of residual networks?

Deeper plain networks saturate in performance and then degrade with more layers.

What is the significance of the Visual Geometry Group (VGG) at the University of Oxford in the context of CNNs?

They developed VGG networks, which won the localization task and placed second in classification at the 2014 ImageNet Challenge.

What are the two learnable parameters which Batch Normalization utilizes?

γ (gamma), β (beta)

Which of the following is the correct function for Batch Normalization? (x is the input batch, µ is the batch mean, σ² is the batch variance, γ and β are learnable parameters)

$\gamma \cdot \frac{x - \mu}{\sqrt{\sigma^2 + \varepsilon}} + \beta$

What is the scaled hyperbolic tangent activation function as used in LeNet-5?

$g(x) = A \tanh(Sx)$
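
A small sketch of this activation follows. The constants `A = 1.7159` and `S = 2/3` are the values commonly cited from the LeNet-5 paper and are treated here as assumptions, since the lesson does not state them.

```python
import math

# Assumed constants: the values commonly cited for LeNet-5.
A = 1.7159
S = 2.0 / 3.0


def scaled_tanh(x, a=A, s=S):
    """Scaled hyperbolic tangent: g(x) = a * tanh(s * x)."""
    return a * math.tanh(s * x)


# With these constants g(1) is approximately 1 and g(-1) is approximately -1.
at_one = scaled_tanh(1.0)
at_minus_one = scaled_tanh(-1.0)
```

The output saturates at plus or minus `A` rather than plus or minus 1, which keeps typical target values of 1 and -1 inside the steeper, non-saturated part of the curve.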

What is a common purpose of the Leaky ReLU activation function?

To reduce the impact of dying ReLU (some neurons never activate).
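
The idea can be sketched as below. The slope of 0.01 for negative inputs is a commonly used default and an assumption here, as are the function names.

```python
def leaky_relu(x, negative_slope=0.01):
    """Leaky ReLU: pass positive inputs through, scale negative inputs by a small slope."""
    return x if x > 0 else negative_slope * x


def leaky_relu_grad(x, negative_slope=0.01):
    """Gradient is 1 on the positive side and negative_slope (not 0) on the negative side."""
    return 1.0 if x > 0 else negative_slope


# A unit stuck in the negative region still receives a small, non-zero gradient.
negative_output = leaky_relu(-2.0)
negative_gradient = leaky_relu_grad(-2.0)
```

Because the gradient never drops to exactly zero, a neuron whose inputs are always negative can still be updated and recover, unlike with plain ReLU.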

What is the number of parameters in AlexNet?

60 million parameters

Flashcards

What is the MNIST dataset?

The Modified NIST database for handwritten digit recognition, containing 60,000 training images and 10,000 test images.

What is Fashion MNIST?

A dataset similar to MNIST that contains images of clothing items instead of digits. It was proposed as a replacement because MNIST is considered overused and too easy.

What is the CIFAR Dataset?

A dataset from the Canadian Institute for Advanced Research (CIFAR), containing 60,000 images in total: 50,000 training images and 10,000 test images. Images are 32x32 pixels and are split into either 10 classes (CIFAR-10) or 100 classes (CIFAR-100).

What is the ImageNet Dataset?

A dataset of about 15 million labeled high-resolution images that also includes localization and detection data and video; it is used in the yearly ILSVRC competition.