Questions and Answers
FCNs (Fully Connected Networks) follow a structured layered architecture. Which of the following is NOT a layer type in FCNs?
- Convolutional Layer (correct)
- Output Layer
- Hidden Layers
- Input Layer
Progressive feature extraction in FCNs allows deeper layers to capture lower-level abstractions, enabling the network to understand fine-grained details first before grasping broader concepts.
False (B)
What key properties must a good activation function possess for effective neural network training?
non-linearity, differentiability, computational efficiency, gradient behavior
The sigmoid activation function maps its output to a range between ______ and ______, making it suitable for binary classification tasks.
Match each activation function with its potential issue in deep neural networks:
To address non-linear decision boundaries, you can use:
Deep Neural Networks' parameter efficiency is improved when pixel values are extracted as features.
What does flattening an image cause, and what can be done to keep its spatial structure intact?
In situations where there are excessive parameters the model can ______ the training data, rather than learning.
Match the following parameters with their respective challenges:
What design does a Convolutional Neural Network have?
Each layer in the brain is required to recognize objects all at once.
To improve the CNN, rather than using backpropagation as in modern deep learning models, what should be done instead?
The first use of ______ in neural networks can be tracked back to work of Paul Werbos in 1974.
Match:
The use of LeNet-5's architecture was critical to what particular action concerning image classification tasks?
AlexNet decreased the error rates of traditional computer vision approaches by a minuscule margin.
What parameters make CNNs unique compared to fully connected neural networks?
Using CNNs, spatial information is ______ while promoting weight ______.
What is the correct order?
The mathematical operation that combines two signals is called:
Convolutions, in simple terms, only help detect patterns.
For a kernel/filter K of a given size, what does the sum in the convolution represent?
If an image is grayscale with shape 28 × 28 × 1, how many channels will the filters have?
Why is padding used generally?
The number of channels in the input feature map is always the same as in the output.
Regarding the benefits of convolution, what does the convolution operation focus on in the output?
In the Pooling Operation, it is used after convolution to reduce ______ dimensions.
Match each model by the number of parameters it requires and the amount of training data needed:
Which of the following is a drawback of the Step Function (Heaviside Step Function)?
The Sigmoid function avoids the vanishing gradient problem, making it effective in deep networks and hidden layers.
How does the ReLU activation function address both efficient computation and potential sparsity in neural networks?
Unlike ReLU, Leaky ReLU allows a small, non-zero gradient for ______ values, which helps avoid the '______' problem.
Match the type of activation function to its application:
What is the primary function of the Softmax function in the output layer of a neural network?
Fully Connected Networks are known to be parametrically efficient, especially in image recognition tasks.
Why is a smaller output feature map, i.e. dimensionally reduced input features after convolution, more efficient for CNNs?
With excessive ______, the model can memorize the training data instead of learning general patterns.
What is the proper match:
Flashcards
Fully Connected Networks (FCN)
Structured layered architecture for feature extraction and classification; consists of input, hidden, and output layers.
Activation Functions
Group of functions introducing non-linearity to neuron output to enable complex decision boundaries. Examples: ReLU, Sigmoid, Tanh.
Non-Linearity
Allows the network to learn complex patterns and decision boundaries.
Differentiability
Required so gradients can be computed for gradient-based optimization during training.
ReLU (Rectified Linear Unit)
Outputs the input directly if it is positive, otherwise zero; efficient and avoids vanishing gradients for positive values.
Leaky ReLU
ReLU variant that allows a small, non-zero slope for negative inputs, mitigating the dying ReLU problem.
Softmax Function
Output-layer function that converts raw scores (logits) into a probability distribution summing to 1, used for multi-class classification.
Loss of Spatial Information
Occurs when an image is flattened into a vector, discarding the 2D arrangement of its pixels.
Overfitting
When a model with excessive parameters memorizes the training data instead of learning general patterns.
Convolutional Neural Network (CNN)
Neural network designed for image inputs; uses convolution, activation, pooling, and fully connected layers to preserve spatial information and promote weight sharing.
Convolution
Mathematical operation combining two signals; in CNNs, a small filter slides over the image to detect patterns such as edges.
Kernel (Filter)
Small matrix of learnable weights that slides over the input to produce a feature map.
Stride
Number of pixels by which the filter shifts over the input at each step.
Padding
Extra pixels added around the image boundaries to control the size of the output.
Pooling Layer
Layer used after convolution to reduce spatial dimensions while retaining essential information; has no learnable parameters.
Parameter Sharing
Applying the same filter weights at different locations of the input, reducing the total number of parameters.
Local Connectivity
Each neuron connects only to a local region of the input rather than to every pixel.
Study Notes
- Lecture 05 covers an introduction to Convolutional Neural Networks (CNNs) for image classification
- Aim: To address the challenges discussed for image classification, understand CNNs, the layers of a CNN, how to train a CNN, and the parameter-sharing efficiency compared to FCNs
- Goal: Solve the challenges discussed for image classification by training such a model
Challenges Recap
- Challenge 2 highlighted the non-linear decision boundary problem
- While a fully connected neural network improves upon a single perceptron, some misclassification remains for complex data
- Challenge 1 addresses extracting features automatically i.e. dealing with high dimensionality of extracted pixel values, especially for larger images
Fully Connected Neural Networks
- FCNs have a hierarchical structure
- FCNs follow a structured, layered architecture, consisting of input, hidden, and output layers
- The Input layer receives raw data or features
- Hidden layers extract and refine hierarchical feature representations
- The output layer produces predictions or classifications
- The design enables progressive feature extraction; deeper layers capture higher-level abstractions
- Key factors, like layered representations and non-linear activation functions, enhance hierarchical learning
- Layered representations lead to each hidden layer learning complex patterns, from low- to high-level features
- Non-linear activation functions (ReLU, Sigmoid, Tanh rather than step functions) are used to model complex decision boundaries
- This architecture mimics human cognition, enabling intricate relationship learning in data
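To make the layered input → hidden → output structure concrete, here is a minimal NumPy sketch of an FCN forward pass (not from the lecture): the layer sizes, random weights, and the ReLU/softmax choices are illustrative assumptions.

```python
import numpy as np

def relu(x):
    # Non-linear activation applied to the hidden layer
    return np.maximum(0.0, x)

def softmax(z):
    # Output layer: convert raw scores into class probabilities
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)

# Assumed sizes: 784 input features (e.g. a flattened 28x28 image),
# one hidden layer of 128 units, 10 output classes
W1, b1 = rng.normal(0, 0.01, (128, 784)), np.zeros(128)
W2, b2 = rng.normal(0, 0.01, (10, 128)), np.zeros(10)

x = rng.random(784)              # input layer: raw features
h = relu(W1 @ x + b1)            # hidden layer: refined representation
y_hat = softmax(W2 @ h + b2)     # output layer: predictions
print(y_hat.shape, y_hat.sum())  # (10,) and a sum of ~1.0
```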
Activation Functions
- Activation functions introduce non-linearity to the output of neurons, enabling the learning of complex decision boundaries
- A good activation function must have:
- Non-linearity
- Differentiability
- Computational Efficiency
- Gradient Behavior
- Types of Modern Activation Functions
- Step Functions
- Outputs 1 if the input meets the threshold
- Outputs 0 if it does not meet the threshold
- Easy to compute
- Effective for binary classification problems
- Can be used in threshold-based decision making
- Non-differentiable at x = 0 which makes gradient-based optimization algorithms difficult
- Saturated Output
- Not Smooth
- Never used in Image Classification
- Sigmoid Function
- The Sigmoid activation function is differentiable everywhere, allowing for gradient-based optimization. It maps its output to a range between 0 and 1, which makes it suitable for binary classification tasks
- The Sigmoid activation function suffers from Vanishing Gradient Problem; for very high or low inputs, the gradient becomes very small, slowing down the learning process and can lead to a loss of information in deeper networks
- The output is not zero centred
- The function is computationally expensive
- Useful for binary classification when you need a probability output
- Rarely ever used in deep networks and hidden layers
- The sigmoid activation function is given by: σ(z) = 1 / (1 + e^(−z))
- The sigmoid's derivative, used to compute the gradients, is: σ'(z) = σ(z)(1 − σ(z)), where σ(z) is the sigmoid output
- When z approaches ∞, σ(z) approaches 1
- σ'(z) then becomes very small; for σ(z) = 1 the derivative is σ'(z) = σ(z)(1 − σ(z)) = 1 × (1 − 1) = 0
- As z approaches −∞, σ(z) approaches 0, and hence σ'(z) again approaches 0
- The vanishing gradient causes the weights to update and change only minimally, which slows learning
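A minimal sketch (not from the lecture) illustrating the vanishing-gradient behaviour described above: for large |z| the derivative σ'(z) = σ(z)(1 − σ(z)) collapses toward 0. The sample z values are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    # Derivative sigma'(z) = sigma(z) * (1 - sigma(z))
    s = sigmoid(z)
    return s * (1.0 - s)

for z in [-10.0, -2.0, 0.0, 2.0, 10.0]:
    # The gradient peaks at 0.25 for z = 0 and vanishes as |z| grows
    print(f"z={z:6.1f}  sigma={sigmoid(z):.5f}  grad={sigmoid_grad(z):.6f}")
```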
- ReLU (Rectified Linear Unit)
- ReLU outputs the input directly if it's positive; otherwise, it outputs zero
- Efficient Computation
- Easy to compute
- No Vanishing Gradients for positive values.
- The gradient is 1 for positive inputs, so ReLU doesn't suffer from vanishing gradients there
- Since negative inputs are set to zero, ReLU produces sparse activations, reducing overfitting and improving model efficiency
- Suffers from the Dying ReLU problem: when inputs are negative, ReLU outputs zero
- If this happens often, some neurons may never activate
- Dying ReLU makes those neurons useless during training
- Not zero centered
- Useful for hidden layers in deep networks, and is the go-to activation function for CNNs
- Leaky ReLU
- Unlike ReLU, Leaky ReLU allows a small, non-zero gradient for negative values, which avoids the 'dying ReLU' problem
- Allows small slope for negative inputs mitigating neurons dying
- Is still Not Zero Centred
- The α (alpha) hyperparameter needs to be set and tuned, which may lead to suboptimal performance
- Should be used in hidden layers of deep networks, or in CNNs when overfitting is an issue
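A short sketch of ReLU and Leaky ReLU as described above; the α = 0.01 slope is a commonly used default assumed here, not a value from the lecture.

```python
import numpy as np

def relu(x):
    # Passes positive inputs through, zeroes out negatives (which can "die")
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Keeps a small slope alpha for negative inputs, avoiding dead neurons
    return np.where(x > 0, x, alpha * x)

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(relu(x))        # [0.   0.   0.   0.5  3. ]
print(leaky_relu(x))  # [-0.03  -0.005  0.     0.5    3.   ]
```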
- Softmax for Multi-class Classification on the Output Layer
- It is commonly used in the output layer of a neural network for multi class classification problems
- Transforms raw scores to probabilities by exponentiating/normalizing
- Outputs a probability distribution between 0 and 1
- Where the sum of all the outputs is 1
- f(z_i) = e^(z_i) / ∑_{j=1}^{C} e^(z_j) [Here: z_i → raw score or logit for the i-th class, C → number of classes]
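A minimal softmax sketch matching the formula above; subtracting the maximum logit is a standard numerical-stability trick assumed here, and the example logits are arbitrary.

```python
import numpy as np

def softmax(z):
    # Exponentiate and normalize so the C outputs form a probability distribution
    e = np.exp(z - np.max(z))   # subtract max(z) for numerical stability
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])   # raw scores z_i for C = 3 classes
probs = softmax(logits)
print(probs, probs.sum())            # probabilities summing to 1.0
```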
Limitations of Fully Connected Networks
- Fully Connected Networks have limitations
- How are features extracted? Deep neural networks take vectors as input, so two-dimensional arrays of data are converted into vectors
- Which extracted features do we use, and how do we use them? Pixel values from the data, converted to CSV files
- Limitations: loss of spatial information when the image is flattened into vector format, and too many parameters
Fully Connected Limitations Solved
- FCNs are not designed for image recognition tasks, so they cannot automatically learn features
- FCNs treat each pixel as an independent feature, which requires many parameters and ignores local structure in images (a worked parameter count follows this list)
- FCNs cannot extract textures, shapes, or edges; classical computer vision approaches like edge detection could be used instead
- Handcrafted features may not generalize well and require domain expertise
- An algorithm is required that promotes parameter sharing while retaining dimensionality and keeping the most significant features
- Challenges of Processing Image data: Viewpoint variation, deformation, intra-class variation, occlusion, background clutter
- Solutions are inspired by how the brain recognizes images, leveraging edges, texture, and shapes
- The solution is a Convolutional Neural Network (CNN)
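To make the "too many parameters" limitation concrete, here is a back-of-the-envelope sketch; the 224×224×3 image size and 1000 hidden units are illustrative assumptions, not numbers from the lecture.

```python
# Parameter count for one fully connected layer over a flattened colour image
height, width, channels = 224, 224, 3      # assumed image size
hidden_units = 1000                        # assumed hidden-layer width

flattened = height * width * channels      # 150,528 input features after flattening
weights = flattened * hidden_units         # one weight per (input, hidden unit) pair
biases = hidden_units
print(f"{weights + biases:,} parameters")  # 150,529,000 parameters for a single layer
```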
Convolutional Neural Networks (CNN)
- The Convolutional Neural Network is inspired by the brain, which processes images in layers, from basic features to shapes
- The Neocognitron, a hierarchical neural network for pattern recognition created in 1980, was inspired by these findings and used multiple layers to extract features
- It uses local receptive fields to detect simple patterns
- And weight sharing to recognize shapes
- It depended on a self-organizing, feed-forward structure for learning, without backpropagation
- Werbos introduced backpropagation in 1974, which helps train multilayer perceptrons
- Hinton's demonstration that backpropagation can efficiently train deep neural networks is considered a seminal work in AI research
- Convolutional neural networks with convolutional and pooling layers later helped train deep neural networks
- LeNet demonstrated that CNNs were a critical step in image classification
- In 2012, AlexNet brought CNNs to large-scale image classification, winning the ImageNet competition
- AlexNet used ReLU activations, dropout, and data augmentation to train a deep network
- Convolutional Neural Networks (CNNs, aka convnets) are a special case of fully connected neural networks
- Similar to neural networks, made of neurons with learnable weights and biases
- The essential difference is that CNNs are designed with the implicit assumption that the inputs are images
- This lets the architecture encode certain properties to extract features, preserve spatial information, and promote weight sharing
- CNNs achieve this via convolution, activation, pooling, and fully connected layers, as sketched below
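A hedged sketch of the four-stage pipeline just listed (convolution, activation, pooling, fully connected) as a minimal PyTorch model; the channel counts and the 28×28 grayscale input size are illustrative assumptions, not the lecture's exact architecture.

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1),   # convolution
            nn.ReLU(),                                    # activation
            nn.MaxPool2d(2),                              # pooling: 28x28 -> 14x14
        )
        self.classifier = nn.Linear(8 * 14 * 14, num_classes)  # fully connected

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, start_dim=1)   # flatten the feature maps per sample
        return self.classifier(x)

model = TinyCNN()
dummy = torch.randn(1, 1, 28, 28)   # batch of one 28x28 grayscale image
print(model(dummy).shape)           # torch.Size([1, 10])
```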
CNN Components: Convolution
- The Convolution is a mathematical operation that combines two signals
- This is often done to apply a kernel/filter in Image processing for edge detection, etc
- Using a small filter/matrix that slides over the image and performing mathematical operations helps convolutional neural network models detect patterns
- The convolution operation between an image I and a filter (kernel) K is defined as: (I * K)(x, y) = ∑_{i=0}^{m−1} ∑_{j=0}^{n−1} I(x + i, y + j) · K(i, j)
- Where (x, y) → the coordinates of the output feature map (the convolved image)
- m, n → the height and width of the kernel/filter K
- Filters must have the same number of color channels as the input
- The sum represents a weighted sum of the pixel values in the image I at positions that correspond to the filter K
- Stride refers to the number of pixels by which the filter shifts over the input matrix
- Padding is added to the image around the boundaries
- Example: at one position the convolution evaluates to 1 × 1 + 1 × 0 + 1 × 1 + 0 × 0 + 1 × 1 + 1 × 0 + 0 × 1 + 0 × 0 + 1 × 1 = 4 (see the sketch after this list)
- Helps extract edges such as horizontal and vertical edges
- Filters can be combined to extract more advanced features
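A minimal NumPy sketch of the stride-1, no-padding convolution formula above; the 5×5 image and 3×3 filter values are illustrative assumptions chosen so the top-left output reproduces the worked sum 1×1 + 1×0 + … = 4.

```python
import numpy as np

def convolve2d(image, kernel, stride=1):
    """Valid convolution matching (I*K)(x,y) = sum_i sum_j I(x+i, y+j) * K(i, j)."""
    m, n = kernel.shape
    out_h = (image.shape[0] - m) // stride + 1
    out_w = (image.shape[1] - n) // stride + 1
    out = np.zeros((out_h, out_w))
    for x in range(out_h):
        for y in range(out_w):
            window = image[x * stride:x * stride + m, y * stride:y * stride + n]
            out[x, y] = np.sum(window * kernel)   # weighted sum over the window
    return out

# Assumed 5x5 binary image and 3x3 filter
I = np.array([[1, 1, 1, 0, 0],
              [0, 1, 1, 1, 0],
              [0, 0, 1, 1, 1],
              [0, 0, 1, 1, 0],
              [0, 1, 1, 0, 0]])
K = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 0, 1]])
# Top-left entry: 1*1 + 1*0 + 1*1 + 0*0 + 1*1 + 1*0 + 0*1 + 0*0 + 1*1 = 4
print(convolve2d(I, K))   # 3x3 feature map
```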
Convolutional Dimensions and Benefits
- In the example, two filters are used, each with 3 channels
- When working with a color image, the filters should also use 3 channels
- After the convolution, an activation function is applied; the usual choice is ReLU
- The output from convolution is the convolved feature map
- Convolutions have several advantages
- A key takeaway is that the same filter is applied at different locations, reducing the number of parameters
- The input of dimension 6 × 6 × 3 can be reduced to 3 × 3 × 2 (without padding)
- A convolutional layer is defined by its hyperparameters and applies feature detectors (filters)
- Given an input of size W_in × H_in × C_in, the output volume dimensions are W_out = (W_in − F + 2P) / S + 1 (and similarly for H_out), while the output depth equals K, the number of filters (a worked example follows this list)
- Local connectivity and parameter sharing are other benefits
- The benefits of using a deep convolutional network are that:
- Basic features are learned first, such as edges and dark spots
- More advanced shapes are learned next, such as eyes, ears, and noses
- Even more complex structures, such as facial structure, are then learned, allowing layered representations
- One objective is to reduce the spatial dimensions while increasing the channel depth
- A pooling layer is typically used after convolution to reduce dimensions while retaining essential information
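A quick sketch applying the output-size formula above, W_out = (W_in − F + 2P)/S + 1; the example numbers (a 32×32×3 input and six 5×5 filters) are assumptions for illustration only.

```python
def conv_output_size(w_in, h_in, f, p, s, k):
    """Output volume of a conv layer with filter size f, padding p, stride s, k filters."""
    w_out = (w_in - f + 2 * p) // s + 1
    h_out = (h_in - f + 2 * p) // s + 1
    return w_out, h_out, k    # output depth equals the number of filters K

# Assumed example: 32x32x3 input, six 5x5 filters, no padding, stride 1
print(conv_output_size(32, 32, f=5, p=0, s=1, k=6))   # (28, 28, 6)
```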
CNN Components: Pooling Layer
- The pooling layer addresses one of the objectives of the convolution operation: reducing the spatial dimension (height × width)
- It reduces the spatial dimension while retaining essential information
- Types of pooling:
- Max pooling takes the maximum value in each window
- Average pooling takes the average value in each window
- The Pooling Layer does not have learnable parameters, so no weight updates are required
- In the demo, a 6 × 4 input is reduced to a final output of dimension 3 × 2
- Pooling is achieved via a window-sliding operation
- Hyperparameters of the pooling layer include the size of the filter, typically 2 × 2
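A minimal NumPy sketch of 2×2 max pooling with stride 2, as described above; the 6×4 input mirrors the demo dimensions, though the values themselves are assumed.

```python
import numpy as np

def max_pool(x, size=2, stride=2):
    # Slide a window over the input and keep the maximum in each window
    out_h = (x.shape[0] - size) // stride + 1
    out_w = (x.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = x[i * stride:i * stride + size,
                          j * stride:j * stride + size].max()
    return out

x = np.arange(24).reshape(6, 4)    # assumed 6x4 input, as in the demo
print(max_pool(x).shape)           # (3, 2): spatial dimensions are halved
```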
- The CNN is trained through a forward pass followed by backpropagation of the error, using gradient descent
Using CNN to Train Image Classification
- Learn features in the input through convolution; the feature learning happens via the convolution operation
- Introduce non-linearity through activation.
- Reduce dimensions and preserve spatial invariance in the output (pooling)
- The Fully Connected Layer classifies the input by looking at the learned image features; the whole pipeline is called an "end-to-end" model
- Final step is backward pass which has two main steps
- Step 1: compute the gradient with respect to the filter, which determines how much each filter contributes to the loss; Step 2: compute the gradient with respect to the input via a convolution, which is also achieved by window sliding
Backward Pass for CNN
- Backpropagation at the Convolutional Layer
- The Fully Connected (FC) layer receives the flattened feature vector, which needs to be reshaped during the backward pass
- Follows chain rule:
- Hidden layers: δ^l = (W^(l+1))^T δ^(l+1) ⊙ g'(z^l), an elementwise product with the derivative of the activation function
- The term (W^(l+1))^T δ^(l+1) propagates the error from the next layer
- Output layer, which applies Softmax with a Cross-Entropy loss:
- δ^L = ∂L/∂z = ŷ − y (the elementwise error term)
- The backward pass of the convolution layer performs two main steps: (1) computing the gradient w.r.t. the filter F, which tells how much each filter contributes to the loss, by convolving the input X with the error; (2) computing the gradient w.r.t. the input X, also by a convolution
- The final gradient formula: ∂E/∂X = δ * F; like the forward pass, it is computed as a convolution operation
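A hedged NumPy sketch of step (1) of the convolutional backward pass described above: the gradient with respect to the filter is itself computed as a convolution of the input X with the upstream error δ. The shapes and random values are illustrative assumptions.

```python
import numpy as np

def conv2d(image, kernel):
    # Valid convolution, the same sliding-window operation as the forward pass
    m, n = kernel.shape
    out = np.zeros((image.shape[0] - m + 1, image.shape[1] - n + 1))
    for x in range(out.shape[0]):
        for y in range(out.shape[1]):
            out[x, y] = np.sum(image[x:x + m, y:y + n] * kernel)
    return out

X = np.random.rand(5, 5)          # assumed input patch
F = np.random.rand(3, 3)          # assumed filter
delta = np.random.rand(3, 3)      # upstream error, same shape as the output map

# Step (1): gradient w.r.t. the filter -- convolve the input with the error:
# dL/dF[i, j] = sum_{x, y} delta[x, y] * X[x + i, y + j]
dF = conv2d(X, delta)
print(dF.shape)                   # (3, 3), same shape as the filter F
```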