CNNs for Image Classification


Questions and Answers

FCNs (Fully Connected Networks) follow a structured layered architecture. Which of the following is NOT a layer type in FCNs?

  • Convolutional Layer (correct)
  • Output Layer
  • Hidden Layers
  • Input Layer

Progressive feature extraction in FCNs allows deeper layers to capture lower-level abstractions, enabling the network to understand fine-grained details first before grasping broader concepts.

False (B)

What key properties must a good activation function possess for effective neural network training?

non-linearity, differentiability, computational efficiency, gradient behavior

The sigmoid activation function maps its output to a range between ______ and ______, making it suitable for binary classification tasks.

0, 1

Match each activation function with its potential issue in deep neural networks:

Sigmoid = Vanishing gradient problem
ReLU = Dying ReLU problem
Leaky ReLU = Suboptimal performance due to incorrect hyperparameter

To address non-linear decision boundaries, you can use:

Fully Connected Neural Network (C)

Deep Neural Networks' parameter efficiency is improved when raw pixel values are extracted as features.

False (B)

What does flattening an image cause, and what can be done to prevent this so that the spatial structure remains intact?

Loss of spatial information and spatial relationships; to prevent this, process local regions of the image

In situations where there are excessive parameters the model can ______ the training data, rather than learning.

memorize

Match the following parameters with their respective challenges:

Viewpoint variation = Pixel values change with camera angle
Intra-class variation = Algorithm that leverages the local structures of an image to extract features, promotes parameter sharing locally, and preserves important spatial information

What design does a Convolutional Neural Network have?

Leverages local structures (D)

Each layer in the brain is required to recognize objects all at once.

False (B)

Rather than using backpropagation like modern deep learning models, what should be relied on instead?

Rely on a self-organizing forward structure

The first use of ______ in neural networks can be traced back to the work of Paul Werbos in 1974.

backpropagation

Match the authors with their work:

Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner = Gradient-Based Learning Applied to Document Recognition

What was the use of LeNet-5's architecture critical for, concerning image classification tasks?

To facilitate the application of CNNs to real-world image classification tasks (C)

AlexNet decreased the error rates of traditional computer vision approaches by only a minuscule margin.

False (B)

What makes CNNs unique compared to fully connected neural networks?

The implicit assumption that inputs are in image format

Using CNN, spatial information is ______ while promoting weight ______.

preserved, sharing

What is the correct order?

[INPUT → Convolution → Activation → Pooling → FCN] = CNN canonical structure

The mathematical operation that combines two signals (functions) to produce a third is called:

Convolution (D)

Convolutions, in simple terms, only help detect patterns.

False (B)

In the convolution with a filter (kernel) K, what does the sum represent?

weighted sum of the pixel values

If an image is grayscale with shape 28 × 28 × 1, what shape will the filters have?

5 × 5 × 1

Why is padding used generally?

To address spatial dimensions (D)

The number of channels in the feature map is always the same as in the output.

False (B)

Among the benefits of convolution, what does each output value focus on?

small local regions

The Pooling operation is used after convolution to reduce ______ dimensions.

spatial

Match the network type with the amount of training data it requires:

Deeper Neural Net = More labelled points

Which of the following is a drawback of the Step Function (Heaviside Step Function)?

Non-differentiable at x = 0 (A)

The Sigmoid function avoids the vanishing gradient problem, making it effective in deep networks and hidden layers.

False (B)

How does the ReLU activation function address both efficient computation and potential sparsity in neural networks?

thresholding operation, sparse activations

Unlike ReLU, Leaky ReLU allows a small, non-zero gradient for ______ values, which helps avoid the '______' problem.

negative, dying ReLU

Match the type of activation function to its application:

Sigmoid = Binary classification probability outputs
ReLU = Hidden layers of deep networks

What is the primary function of the Softmax function in the output layer of a neural network?

To convert raw scores into probabilities for multi-class classification (C)

Fully Connected Networks are known to be parametrically efficient, especially in image recognition tasks.

False (B)

Why is a smaller output feature map (i.e., dimensionally reduced input features after convolution) more efficient for CNNs?

It reduces the number of parameters at the input layer

With excessive ______, the model can memorize the training data instead of learning general patterns.

parameters

What is the proper match:

Convolutional Neural Networks = An algorithm that leverages the local structure of an image to extract features

Flashcards

Fully Connected Networks (FCN)

Structured layered architecture for feature extraction and classification; consists of input, hidden, and output layers.

Activation Functions

Group of functions introducing non-linearity to neuron output to enable complex decision boundaries. Examples: ReLU, Sigmoid, Tanh.

Non-Linearity

Allows the network to learn complex patterns and decision boundaries.

Differentiability

Must be differentiable to enable gradient-based optimizations, like gradient descent.


ReLU (Rectified Linear Unit)

Activation function that outputs the input directly if it's positive, otherwise, it outputs zero.


Leaky ReLU

A ReLU variation that allows a small, non-zero gradient for negative values, helping to avoid the 'dying ReLU' problem.


Softmax Function

Transforms raw scores into a probability distribution across multiple classes.


Loss of Spatial Information

Loss of spatial relationships between pixels when an image is converted to a single stream of numbers.


Overfitting

When a model has too many parameters and memorizes the training data instead of learning general patterns.


Convolutional Neural Network (CNN)

Algorithm leveraging local image structures for feature extraction with local parameter sharing.


Convolution

Mathematical operation applying a filter to an image to extract features.


Kernel (Filter)

A small matrix used in convolution to extract specific image features.


Stride

The number of pixels the filter shifts over the input matrix during convolution.


Padding

Technique of adding extra pixels around an image to avoid losing information during convolution.


Pooling Layer

Reducing the spatial dimensions of feature maps, retaining essential information.


Parameter Sharing

The same filter is applied across different parts of the input, reducing the number of unique parameters and computational cost.


Local Connectivity

Unlike fully connected layers, each neuron in a convolutional layer connects only to a small local region of the input volume.


Study Notes

  • Lecture 05 covers an introduction to Convolutional Neural Networks (CNNs) for image classification
  • Aim: to address the challenges discussed for image classification, understand CNNs, the layers of a CNN, how to train a CNN, and parameter-sharing efficiency compared to FCNs
  • Goal: solve the image classification challenges discussed by training such a model

Challenges Recap

  • Challenge 2 highlighted the non-linear decision boundary problem
  • While a fully connected neural network improves upon a single perceptron, some misclassification remains for complex data
  • Challenge 1 addresses extracting features automatically, i.e., dealing with the high dimensionality of extracted pixel values, especially for larger images

Fully Connected Neural Networks

  • FCNs have a hierarchical structure
  • FCNs follow a structured, layered architecture, consisting of input, hidden, and output layers
  • The Input layer receives raw data or features
  • Hidden layers extract and refine hierarchical feature representations
  • The output layer produces predictions or classifications
  • The design enables progressive feature extraction; deeper layers capture higher-level abstractions
  • Key factors, like layered representations and non-linear activation functions, enhance hierarchical learning
  • Layered representations lead to each hidden layer learning complex patterns
  • From low-level to high-level features
  • Non-linear activation functions (ReLU, Sigmoid, Tanh rather than step functions) are used to model complex decision boundaries
  • This architecture mimics human cognition, enabling intricate relationship learning in data
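
To make the layered FCN structure concrete, here is a minimal NumPy sketch of a forward pass through input, hidden, and output layers. The `fcn_forward` helper, the layer sizes, and the choice of ReLU between layers are illustrative assumptions, not taken from the lecture.

```python
import numpy as np

def relu(x):
    # Non-linear activation applied after each hidden layer
    return np.maximum(0.0, x)

def fcn_forward(x, layers):
    """Forward pass through a stack of fully connected layers.
    `layers` is a list of (W, b) tuples; x is a flat feature vector."""
    a = x
    for i, (W, b) in enumerate(layers):
        z = W @ a + b                                   # linear transform
        a = relu(z) if i < len(layers) - 1 else z       # last layer left as raw scores
    return a

# Toy example: 4 input features -> 8 hidden units -> 3 output classes
rng = np.random.default_rng(0)
layers = [(rng.standard_normal((8, 4)), np.zeros(8)),
          (rng.standard_normal((3, 8)), np.zeros(3))]
scores = fcn_forward(rng.standard_normal(4), layers)
print(scores.shape)   # (3,)
```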

Activation Functions

  • Activation functions introduce non-linearity to the output of neurons, enabling the learning of complex decision boundaries
  • A good activation function must have:
    • Non-linearity
    • Differentiability
    • Computational Efficiency
    • Gradient Behavior
  • Types of Modern Activation Functions
    • Step Functions
      • Outputs 1 if the input meets the threshold
      • Outputs 0 if it does not meet the threshold
      • Easy to compute
      • Effective for binary classification problems
      • Can be used in threshold-based decision making
      • Non-differentiable at x = 0 which makes gradient-based optimization algorithms difficult
      • Saturated Output
      • Not Smooth
      • Never used in Image Classification
    • Sigmoid Function
      • The Sigmoid activation function is differentiable everywhere, allowing for gradient-based optimization. It maps the output to a range between 0 and 1, which makes it suitable for binary classification tasks
      • The Sigmoid activation function suffers from the Vanishing Gradient Problem: for very high or low inputs the gradient becomes very small, which slows down the learning process and can lead to a loss of information in deeper networks
      • The output is not zero centred
      • The function is computationally expensive
      • Useful for binary classification when you need a probability output
      • Rarely ever used in deep networks and hidden layers
      • The sigmoid activation function is given by: σ(z) = 1 / (1 + e^(−z))
      • The sigmoid's derivative, used to compute the gradients: σ'(z) = σ(z) · (1 − σ(z)), where σ(z) is the output
      • When z approaches ∞, σ(z) approaches 1
      • σ'(z) becomes very small; for σ(z) = 1 the derivative is σ'(z) = σ(z) · (1 − σ(z)) = 1 · (1 − 1) = 0
      • When z approaches −∞, σ(z) approaches 0, and hence σ'(z) again approaches 0
      • The vanishing gradient causes weights to update and change only minimally, which slows learning
    • ReLU (Rectified Linear Unit)
      • ReLU outputs the input directly if it's positive; otherwise, it outputs zero
      • Efficient Computation
      • Easy to compute
      • No Vanishing Gradients for positive values.
      • The gradient is 1 for positive inputs, so ReLU doesn't suffer from vanishing gradients there
      • Since negative inputs are set to zero, ReLU produces sparse activations, reducing overfitting and improving model efficiency
      • Suffers from the Dying ReLU problem: when inputs are negative, ReLU outputs zero
      • If this happens often, some neurons may never activate
      • Dying ReLU makes those neurons useless during training
      • Not zero centered
      • Useful for hidden layers in deep networks, and is the go-to activation function for CNNs
    • Leaky ReLU
      • Unlike ReLU, Leaky ReLU allows a small, non-zero gradient for negative values, avoiding the 'dying ReLU' problem
      • Allows small slope for negative inputs mitigating neurons dying
      • Is still Not Zero Centred
      • The α (alpha) hyperparameter needs to be set via tuning, and an incorrect value may lead to suboptimal performance
      • Should be used in the hidden layers of a deep network, or in a CNN if overfitting is an issue
    • Softmax for Multi-class Classification on the Output Layer
      • It is commonly used in the output layer of a neural network for multi-class classification problems
      • Transforms raw scores to probabilities by exponentiating/normalizing
      • Outputs a probability distribution between 0 and 1
      • Where the sum of all the outputs is 1
      • f(z_i) = e^(z_i) / Σ_j e^(z_j) [here z_i → raw score (logit) for the i-th class; the sum runs over all C classes]
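
The activation functions above can be summarized in a short NumPy sketch. The function names and test values are illustrative, and the max-subtraction in `softmax` is a standard numerical-stability trick not mentioned in the notes.

```python
import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^(-z)); maps the output to (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)                    # approaches 0 for very large or very small z

def relu(z):
    return np.maximum(0.0, z)               # zero for negatives -> sparse activations

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)    # small slope for negatives avoids dying ReLU

def softmax(z):
    e = np.exp(z - np.max(z))               # subtract the max for numerical stability
    return e / e.sum()                      # probability distribution summing to 1

z = np.array([-8.0, -1.0, 0.0, 2.0, 8.0])
print(sigmoid_grad(z))                      # nearly zero at both extremes: the vanishing gradient
print(softmax(z).sum())                     # 1.0
```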

Limitations of Fully Connected Networks

  • Fully Connected Networks have limitations
  • How are features extracted? Deep Neural Networks take vectors as input, so two-dimensional arrays of data are converted (flattened) into vectors
  • What extracted features do we use, and how do we use them? Pixel values from the data, converted to CSV files
  • Limitations: loss of spatial information when the image is flattened into vector format, and too many parameters
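
To see why flattening leads to "too many parameters", here is a back-of-the-envelope count for a single fully connected layer; the 224 × 224 × 3 image size and 1000 hidden units are assumed purely for illustration.

```python
# Parameter count when a flattened image feeds one fully connected layer.
# The 224x224x3 image size and 1000 hidden units are assumed for illustration.
height, width, channels = 224, 224, 3
hidden_units = 1000

flattened = height * width * channels      # 150_528 input features after flattening
weights = flattened * hidden_units         # one weight per (input feature, hidden unit) pair
biases = hidden_units
print(weights + biases)                    # 150_529_000 parameters in the first layer alone
```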

Fully Connected Limitations Solved

  • FCNs are not designed for image recognition tasks, so they cannot automatically learn features
  • FCNs treat each pixel as an independent feature, which requires many parameters and ignores the local structure of images
  • FCNs cannot extract textures, shapes, or edges; classical computer vision approaches like edge detection could be used instead
  • Handcrafted features may not generalize well or require domain expertise
  • An algorithm is required that promotes parameter sharing while retaining dimensionality and keeping the most significant features
  • Challenges of Processing Image data: Viewpoint variation, deformation, intra-class variation, occlusion, background clutter
  • Solutions are inspired by how the brain recognizes images, leveraging edges, textures, and shapes
  • The solution is a Convolutional Neural Network (CNN)

Convolutional Neural Networks (CNN)

  • The Convolutional Neural Network is inspired by the brain, which processes visual information in layers, from basic edges up to shapes
  • The Neocognitron, a hierarchical neural network for pattern recognition created in 1980, was inspired by these findings and used multiple layers to extract features
    • It uses local receptive fields to detect simple patterns
    • And weight sharing to recognize shapes
    • Depended on a self-organizing forward structure for learning, without backpropagation
  • Werbos's backpropagation (1974) helped train multilayer perceptrons
  • Hinton's demonstration that backpropagation can efficiently train deep neural networks is considered seminal work in AI research
  • Convolutional Neural Networks, with convolutional and pooling layers, later helped train deep neural networks
  • LeNet demonstrated that CNNs were a critical step in image classification
  • In 2012, AlexNet made large strides by applying CNNs to large-scale image classification, winning the ImageNet competition
  • AlexNet used ReLU activations, dropout, and data augmentation to train a deep network
  • CNNs (aka convnets) can be viewed as a special case of fully connected neural networks
  • Like other neural networks, they are made of neurons with learnable weights and biases
  • The essential difference is that CNNs are designed with implicit assumptions that inputs are in the image format
  • This lets the architecture encode certain properties: extracting features, preserving spatial information, and promoting weight sharing
  • CNNs achieve this via Convolution, Activation, Pooling, and Fully Connected layers

CNN Components: Convolution

  • Convolution is a mathematical operation that combines two signals
  • In image processing this is often done by applying a kernel/filter, e.g., for edge detection
  • A small filter/matrix slides over the image and performs mathematical operations, helping Convolutional Neural Network models detect patterns
  • The convolution operation between an image I and a filter (kernel) K is defined as: (I * K)(x, y) = Σ_{i=0..m−1} Σ_{j=0..n−1} I(x + i, y + j) · K(i, j)
  • Where (x, y) → are the output feature map coordinate (convolved image)
  • m, n → the height and width of the kernel/filter K
  • Filters must have same color channels dimensions as inputs
  • The sum represents a weighted sum of the pixel values in the image I at positions corresponding to the filter K
  • Stride refers to the number of pixels the filter shifts over the input matrix
  • Padding is extra pixels added to the image around the boundaries
  • At one position, the convolution works out to 1 × 1 + 1 × 0 + 1 × 1 + 0 × 0 + 1 × 1 + 1 × 0 + 0 × 1 + 0 × 0 + 1 × 1 = 4
  • Helps extract edges such as horizontal and vertical edges
  • Filters can be combined to extract more advanced features
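
A minimal NumPy sketch of the convolution operation described above. The `conv2d` helper is illustrative, and the 5 × 5 binary image and 3 × 3 kernel are assumed so that the top-left position reproduces the worked sum 1 × 1 + 1 × 0 + … + 1 × 1 = 4.

```python
import numpy as np

def conv2d(image, kernel, stride=1, padding=0):
    """2D convolution in the cross-correlation form used by CNNs."""
    if padding:
        image = np.pad(image, padding)
    m, n = kernel.shape
    out_h = (image.shape[0] - m) // stride + 1
    out_w = (image.shape[1] - n) // stride + 1
    out = np.zeros((out_h, out_w))
    for x in range(out_h):
        for y in range(out_w):
            region = image[x*stride:x*stride + m, y*stride:y*stride + n]
            out[x, y] = np.sum(region * kernel)   # weighted sum of pixel values under the filter
    return out

# Assumed 5x5 binary image and 3x3 kernel; the first position reproduces the worked sum (= 4)
image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])
print(conv2d(image, kernel))   # 3x3 convolved feature map; top-left entry is 4
```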

Convolutional Dimensions and Benefits

  • The example uses two filter maps, each with 3 channels
  • When working with a color image, each filter must also have 3 channels
  • After the convolution there is an activation function; the usual choice is ReLU
  • The output from convolution is the convolved feature map
  • Convolution has several advantages
  • A key takeaway is that the same filter is applied at different locations, reducing the number of parameters
  • An input of dimension 6 × 6 × 3 can be reduced to 3 × 3 × 2 (without padding)
  • A convolutional layer is defined by its hyperparameters (filter size F, number of filters K, stride S, padding P) and applies these feature detectors to the input
  • Given an input of W_in × H_in × C_in, the output width is (W_in − F + 2P)/S + 1 (and similarly for the height), while the output depth equals the number of filters K (see the worked example after this list)
  • Local connectivity, and Parameter sharing are other benefits
  • Benefits of using a Deep Convolutional Network:
    • Some basic features are learned, such as edges and dark spots
    • Some more advanced shapes are learned, such as eyes, ears, and noses
    • More complex structures, such as overall facial structure, enabling layered representations
  • One objective is to reduce the spatial dimensions while increasing the channel depth
  • A pooling layer is typically used after convolution to reduce dimensions and retains essential information
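
A small worked example of the output-volume formula, as referenced in the list above. The 4 × 4 filter size is an assumption chosen so that a 6 × 6 × 3 input with two filters, no padding, and stride 1 indeed yields 3 × 3 × 2.

```python
def conv_output_dims(w_in, h_in, f, p, s, k):
    """(W_in - F + 2P)/S + 1 for width and height; the output depth equals the number of filters K."""
    w_out = (w_in - f + 2 * p) // s + 1
    h_out = (h_in - f + 2 * p) // s + 1
    return w_out, h_out, k

# Assumed 4x4 filters: a 6x6x3 input with 2 filters, no padding, stride 1 -> 3x3x2
print(conv_output_dims(6, 6, f=4, p=0, s=1, k=2))   # (3, 3, 2)
```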

CNN Components: Pooling Layer

  • One objective after the convolution operation is to reduce the spatial dimensions (height × width) while retaining essential information; this is the role of the pooling layer
  • Types of pooling:
    • Max pooling takes the maximum value in each window
    • Average pooling takes the average value in each window
  • The Pooling Layer does not have learnable parameters, so no weight updates are required
  • In the demo, a 6 × 4 feature map is reduced to a final output of 3 × 2
  • Pooling is achieved via a sliding-window operation
  • Pooling layer hyperparameters: the filter size (typically 2 × 2) and the stride
  • The CNN is trained through a forward pass, error computation, and backpropagation using gradient descent
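
A minimal max-pooling sketch matching the description above; the `max_pool` name, the 2 × 2 window with stride 2, and the 6 × 4 toy feature map are illustrative assumptions.

```python
import numpy as np

def max_pool(feature_map, size=2, stride=2):
    """Max pooling: slide a window over the feature map and keep the maximum in each region.
    There are no learnable parameters, so nothing is updated during training."""
    h = (feature_map.shape[0] - size) // stride + 1
    w = (feature_map.shape[1] - size) // stride + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            region = feature_map[i*stride:i*stride + size, j*stride:j*stride + size]
            out[i, j] = region.max()
    return out

fmap = np.arange(24, dtype=float).reshape(6, 4)   # a toy 6x4 feature map
print(max_pool(fmap).shape)                       # (3, 2): spatial dimensions reduced
```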

Using CNN to Train Image Classification

  • Learn features in the input through convolution; the feature learning happens via the convolution operation
  • Introduce non-linearity through activation.
  • Reduce dimensions with pooling while preserving spatial invariance in the output
  • The Fully Connected Layer classifies the input by looking at the learned image features; the whole pipeline is called an "end-to-end" model (see the forward-pass sketch after this list)
  • The final step is the backward pass, which has two main steps
  • Step 1 computes the gradient of each filter, to determine how much that filter contributes to the loss; Step 2 computes the gradient with respect to the input; both are again achieved via window sliding
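
Putting the pieces together, here is a rough end-to-end forward pass following the canonical INPUT → Convolution → Activation → Pooling → FC order. All sizes (28 × 28 input, two 5 × 5 filters, 10 classes) and helper names are assumptions for illustration, and no training is performed here.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def conv2d(img, k):
    m, n = k.shape
    H, W = img.shape[0] - m + 1, img.shape[1] - n + 1
    return np.array([[np.sum(img[i:i+m, j:j+n] * k) for j in range(W)] for i in range(H)])

def max_pool(fm, s=2):
    H, W = fm.shape[0] // s, fm.shape[1] // s
    return np.array([[fm[i*s:(i+1)*s, j*s:(j+1)*s].max() for j in range(W)] for i in range(H)])

def cnn_forward(image, kernels, W_fc, b_fc):
    """INPUT -> Convolution -> Activation (ReLU) -> Pooling -> Flatten -> FC -> Softmax."""
    maps = [max_pool(relu(conv2d(image, k))) for k in kernels]   # one pooled map per filter
    flat = np.concatenate([fm.ravel() for fm in maps])           # flatten for the FC layer
    return softmax(W_fc @ flat + b_fc)                           # class probabilities

rng = np.random.default_rng(0)
image = rng.random((28, 28))                                 # grayscale 28x28 input
kernels = [rng.standard_normal((5, 5)) for _ in range(2)]    # two assumed 5x5 filters
flat_dim = 2 * 12 * 12                                       # (28-5+1)=24, pooled to 12, times 2 maps
W_fc, b_fc = rng.standard_normal((10, flat_dim)), np.zeros(10)
probs = cnn_forward(image, kernels, W_fc, b_fc)
print(probs.shape, probs.sum())                              # (10,) 1.0
```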

Backward Pass for CNN

  • Backpropagation at the Convolutional Layer
  • The Fully Connected (FC) layer receives the flattened feature vector, which needs to be reshaped back during the backward pass
  • Follows the chain rule:
    • Hidden layers: δ^l = ((W^(l+1))^T δ^(l+1)) ⊙ g'(z^l), an elementwise product with the derivative of the activation function
      • (W^(l+1))^T δ^(l+1) → propagates the error from the next layer
    • The output layer applies Softmax with a Cross-Entropy loss:
      • δ^L = ∂L/∂z = ŷ − y (elementwise error term)
    • The backward pass of the convolution layer performs two main steps: (1) computing the gradient w.r.t. the filter F, which tells how much each filter contributes to the loss, by convolving the input X with the error term δ; (2) computing the gradient w.r.t. the input X by convolving δ with the filter F
    • The final gradient formula: ∂E/∂X = δ * F; both gradient computations look like the forward convolution operation
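
A sketch of the two gradient computations for a single-channel, stride-1 convolution layer, assuming the cross-correlation form of convolution used in CNNs. The helper names and toy shapes are illustrative, and the filter flip plus zero-padding used for the input gradient is the standard "full convolution" identity rather than something stated explicitly in the notes.

```python
import numpy as np

def conv2d_valid(a, b):
    """Valid cross-correlation, used for the forward pass and reused for the gradients."""
    m, n = b.shape
    H, W = a.shape[0] - m + 1, a.shape[1] - n + 1
    return np.array([[np.sum(a[i:i+m, j:j+n] * b) for j in range(W)] for i in range(H)])

def conv_backward(X, F, delta):
    """Gradients of the loss for one conv layer; delta = dL/d(output), same shape as the output."""
    dF = conv2d_valid(X, delta)             # (1) dL/dF: convolve the input with the error term
    padded = np.pad(delta, F.shape[0] - 1)  # zero-pad so the next step is a 'full' convolution
    dX = conv2d_valid(padded, np.flip(F))   # (2) dL/dX: convolve the error with the flipped filter
    return dF, dX

rng = np.random.default_rng(0)
X = rng.random((6, 6))          # toy input
F = rng.random((3, 3))          # toy filter
delta = np.ones((4, 4))         # pretend upstream gradient (forward output was 4x4)
dF, dX = conv_backward(X, F, delta)
print(dF.shape, dX.shape)       # (3, 3) (6, 6): same shapes as F and X
```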
