Questions and Answers
FCNs (Fully Connected Networks) follow a structured layered architecture. Which of the following is NOT a layer type in FCNs?
- Convolutional Layer (correct)
- Output Layer
- Hidden Layers
- Input Layer
Progressive feature extraction in FCNs allows deeper layers to capture lower-level abstractions, enabling the network to understand fine-grained details first before grasping broader concepts.
False (B)
What key properties must a good activation function possess for effective neural network training?
non-linearity, differentiability, computational efficiency, gradient behavior
The sigmoid activation function maps its output to a range between ______ and ______, making it suitable for binary classification tasks.
Match each activation function with its potential issue in deep neural networks:
To address non-linear decision boundaries, you can use:
Deep Neural Networks' parameter efficiency is improved when pixel values are extracted as features.
What does flattening an image cause, and what can be done to keep its spatial structure intact?
In situations where there are excessive parameters the model can ______ the training data, rather than learning.
Match the following parameters with their respective challenges:
What design does a Convolutional Neural Network have?
Each layer in the brain is required to recognize objects all at once.
To improve the CNN, rather than using backpropagation as in modern deep learning models, what should be done instead?
The first use of ______ in neural networks can be tracked back to work of Paul Werbos in 1974.
Match:
The use of LeNet-5's architecture was critical to what particular action concerning image classification tasks?
AlexNet decreased the error rates of traditional computer vision approaches by a minuscule margin.
What parameters make CNNs unique compared to fully connected neural networks?
Using CNNs, spatial information is ______ while promoting weight ______.
What is the correct order?
The mathematical operation that combines two signals is called:
Convolutions, in simple terms, only help detect patterns.
For a kernel/filter K of a given size, what does the sum in the convolution represent?
If an image is grayscale with shape 28 × 28 × 1, how many channels will the filters have?
Why is padding used generally?
The number of channels in the input feature map is always the same as in the output.
Regarding the benefits of convolution, what does the convolution operation focus on in the output?
In the Pooling Operation, it is used after convolution to reduce ______ dimensions.
Match each model by the number of parameters it requires and the amount of training data needed:
Which of the following is a drawback of the Step Function (Heaviside Step Function)?
The Sigmoid function avoids the vanishing gradient problem, making it effective in deep networks and hidden layers.
How does the ReLU activation function address both efficient computation and potential sparsity in neural networks?
Unlike ReLU, Leaky ReLU allows a small, non-zero gradient for ______ values, which helps avoid the '______' problem.
Match the type of activation function to its application:
What is the primary function of the Softmax function in the output layer of a neural network?
Fully Connected Networks are known to be parametrically efficient, especially in image recognition tasks.
Why is a smaller output feature map, i.e. dimensionally reduced input features after convolution, more efficient for CNNs?
With excessive ______, the model can memorize the training data instead of learning general patterns.
What is the proper match:
Flashcards
Fully Connected Networks (FCN)
Structured layered architecture for feature extraction and classification; consists of input, hidden, and output layers.
Activation Functions
Group of functions introducing non-linearity to neuron output to enable complex decision boundaries. Examples: ReLU, Sigmoid, Tanh.
Non-Linearity
Allows the network to learn complex patterns and decision boundaries.
Differentiability
Required so gradients can be computed for gradient-based optimization during training.
ReLU (Rectified Linear Unit)
Outputs the input directly if it is positive, otherwise zero; efficient and avoids vanishing gradients for positive values.
Leaky ReLU
ReLU variant that allows a small, non-zero slope for negative inputs, mitigating the dying ReLU problem.
Softmax Function
Output-layer function that converts raw scores (logits) into a probability distribution summing to 1, used for multi-class classification.
Loss of Spatial Information
Occurs when an image is flattened into a vector, discarding the 2D arrangement of its pixels.
Overfitting
When a model with excessive parameters memorizes the training data instead of learning general patterns.
Convolutional Neural Network (CNN)
Neural network designed for image inputs; uses convolution, activation, pooling, and fully connected layers to preserve spatial information and promote weight sharing.
Convolution
Mathematical operation combining two signals; in CNNs, a small filter slides over the image to detect patterns such as edges.
Kernel (Filter)
Small matrix of learnable weights that slides over the input to produce a feature map.
Stride
Number of pixels by which the filter shifts over the input at each step.
Padding
Extra pixels added around the image boundaries to control the size of the output.
Pooling Layer
Layer used after convolution to reduce spatial dimensions while retaining essential information; has no learnable parameters.
Parameter Sharing
Applying the same filter weights at different locations of the input, reducing the total number of parameters.
Local Connectivity
Each neuron connects only to a local region of the input rather than to every pixel.
Study Notes
- Lecture 05 covers an introduction to Convolutional Neural Networks (CNNs) for image classification
- Aim: To address the challenges discussed for image classification, understand CNNs, the layers of a CNN, how to train a CNN, and the parameter-sharing efficiency compared to FCNs
- Goal: Solve the challenges discussed for image classification by training such a model
Challenges Recap
- Challenge 2 highlighted the non-linear decision boundary problem
- While a fully connected neural network improves upon a single perceptron, some misclassification remains for complex data
- Challenge 1 addresses extracting features automatically i.e. dealing with high dimensionality of extracted pixel values, especially for larger images
Fully Connected Neural Networks
- FCNs have a hierarchical structure
- FCNs follow a structured, layered architecture, consisting of input, hidden, and output layers
- The Input layer receives raw data or features
- Hidden layers extract and refine hierarchical feature representations
- The output layer produces predictions or classifications
- The design enables progressive feature extraction; deeper layers capture higher-level abstractions
- Key factors, like layered representations and non-linear activation functions, enhance hierarchical learning
- Layered representations lead to each hidden layer learning complex patterns, from low- to high-level features
- Non-linear activation functions (ReLU, Sigmoid, Tanh rather than step functions) are used to model complex decision boundaries
- This architecture mimics human cognition, enabling intricate relationship learning in data
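To make the layered input → hidden → output structure concrete, here is a minimal NumPy sketch of an FCN forward pass (not from the lecture): the layer sizes, random weights, and the ReLU/softmax choices are illustrative assumptions.

```python
import numpy as np

def relu(x):
    # Non-linear activation applied to the hidden layer
    return np.maximum(0.0, x)

def softmax(z):
    # Output layer: convert raw scores into class probabilities
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)

# Assumed sizes: 784 input features (e.g. a flattened 28x28 image),
# one hidden layer of 128 units, 10 output classes
W1, b1 = rng.normal(0, 0.01, (128, 784)), np.zeros(128)
W2, b2 = rng.normal(0, 0.01, (10, 128)), np.zeros(10)

x = rng.random(784)              # input layer: raw features
h = relu(W1 @ x + b1)            # hidden layer: refined representation
y_hat = softmax(W2 @ h + b2)     # output layer: predictions
print(y_hat.shape, y_hat.sum())  # (10,) and a sum of ~1.0
```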
Activation Functions
- Activation functions introduce non-linearity to the output of neurons, enabling the learning of complex decision boundaries
- A good activation function must have:
- Non-linearity
- Differentiability
- Computational Efficiency
- Gradient Behavior
- Types of Modern Activation Functions
- Step Functions
- Outputs 1 if the input meets the threshold
- Outputs 0 if it does not meet the threshold
- Easy to compute
- Effective for binary classification problems
- Can be used in threshold-based decision making
- Non-differentiable at x = 0 which makes gradient-based optimization algorithms difficult
- Saturated Output
- Not Smooth
- Never used in Image Classification
- Sigmoid Function
- The Sigmoid activation function is differentiable everywhere, allowing for gradient-based optimization. It maps its output to a range between 0 and 1, which makes it suitable for binary classification tasks
- The Sigmoid activation function suffers from Vanishing Gradient Problem; for very high or low inputs, the gradient becomes very small, slowing down the learning process and can lead to a loss of information in deeper networks
- The output is not zero centred
- The function is computationally expensive
- Useful for binary classification when you need a probability output
- Rarely ever used in deep networks and hidden layers
- The sigmoid activation function is given by: σ(z) = 1 / (1 + e^(−z))
- The sigmoid's derivative, used to compute the gradients, is: σ'(z) = σ(z)(1 − σ(z)), where σ(z) is the sigmoid output
- When z approaches ∞, σ(z) approaches 1
- σ'(z) then becomes very small; for σ(z) = 1 the derivative is σ'(z) = σ(z)(1 − σ(z)) = 1 × (1 − 1) = 0
- As z approaches −∞, σ(z) approaches 0, and hence σ'(z) again approaches 0
- The vanishing gradient causes the weights to update and change only minimally, which slows learning
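A minimal sketch (not from the lecture) illustrating the vanishing-gradient behaviour described above: for large |z| the derivative σ'(z) = σ(z)(1 − σ(z)) collapses toward 0. The sample z values are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    # Derivative sigma'(z) = sigma(z) * (1 - sigma(z))
    s = sigmoid(z)
    return s * (1.0 - s)

for z in [-10.0, -2.0, 0.0, 2.0, 10.0]:
    # The gradient peaks at 0.25 for z = 0 and vanishes as |z| grows
    print(f"z={z:6.1f}  sigma={sigmoid(z):.5f}  grad={sigmoid_grad(z):.6f}")
```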
- ReLU (Rectified Linear Unit)
- ReLU outputs the input directly if it's positive; otherwise, it outputs zero
- Efficient Computation
- Easy to compute
- No Vanishing Gradients for positive values.
- The gradient is 1 for positive inputs, so ReLU doesn't suffer from vanishing gradients there
- Since negative inputs are set to zero, ReLU produces sparse activations, reducing overfitting and improving model efficiency
- Suffers from the Dying ReLU problem: when inputs are negative, ReLU outputs zero
- If this happens often, some neurons may never activate
- Dying ReLU makes those neurons useless during training
- Not zero centered
- Useful for hidden layers in deep networks, and is the go-to activation function for CNNs
- Leaky ReLU
- Unlike ReLU, Leaky ReLU allows a small, non-zero gradient for negative values, which avoids the 'dying ReLU' problem
- Allows small slope for negative inputs mitigating neurons dying
- Is still Not Zero Centred
- The α (alpha) hyperparameter needs to be set and tuned, which may lead to suboptimal performance
- Should be used in hidden layers of deep networks, or in CNNs when overfitting is an issue
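A short sketch of ReLU and Leaky ReLU as described above; the α = 0.01 slope is a commonly used default assumed here, not a value from the lecture.

```python
import numpy as np

def relu(x):
    # Passes positive inputs through, zeroes out negatives (which can "die")
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Keeps a small slope alpha for negative inputs, avoiding dead neurons
    return np.where(x > 0, x, alpha * x)

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(relu(x))        # [0.   0.   0.   0.5  3. ]
print(leaky_relu(x))  # [-0.03  -0.005  0.     0.5    3.   ]
```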
- Softmax for Multi-class Classification on the Output Layer
- It is commonly used in the output layer of a neural network for multi class classification problems
- Transforms raw scores to probabilities by exponentiating/normalizing
- Outputs a probability distribution between 0 and 1
- Where the sum of all the outputs is 1
- f(z_i) = e^(z_i) / ∑_{j=1}^{C} e^(z_j) [Here: z_i → raw score or logit for the i-th class, C → number of classes]
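A minimal softmax sketch matching the formula above; subtracting the maximum logit is a standard numerical-stability trick assumed here, and the example logits are arbitrary.

```python
import numpy as np

def softmax(z):
    # Exponentiate and normalize so the C outputs form a probability distribution
    e = np.exp(z - np.max(z))   # subtract max(z) for numerical stability
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])   # raw scores z_i for C = 3 classes
probs = softmax(logits)
print(probs, probs.sum())            # probabilities summing to 1.0
```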
Limitations of Fully Connected Networks
- Fully Connected Networks have limitations
- How are features extracted? Deep neural networks take vectors as input, so two-dimensional arrays of data are converted into vectors
- Which extracted features do we use, and how do we use them? Pixel values from the data, converted to CSV files
- Limitations: loss of spatial information when the image is flattened into vector format, and too many parameters
Fully Connected Limitations Solved
- FCNs are not designed for image recognition tasks, so they cannot automatically learn features
- FCNs treat each pixel as an independent feature, which requires many parameters and ignores local structure in images (a worked parameter count follows this list)
- FCNs cannot extract textures, shapes, or edges; classical computer vision approaches like edge detection could be used instead
- Handcrafted features may not generalize well and require domain expertise
- An algorithm is required that promotes parameter sharing while retaining dimensionality and keeping the most significant features
- Challenges of Processing Image data: Viewpoint variation, deformation, intra-class variation, occlusion, background clutter
- Solutions are inspired by how the brain recognizes images, leveraging edges, texture, and shapes
- The solution is a Convolutional Neural Network (CNN)
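To make the "too many parameters" limitation concrete, here is a back-of-the-envelope sketch; the 224×224×3 image size and 1000 hidden units are illustrative assumptions, not numbers from the lecture.

```python
# Parameter count for one fully connected layer over a flattened colour image
height, width, channels = 224, 224, 3      # assumed image size
hidden_units = 1000                        # assumed hidden-layer width

flattened = height * width * channels      # 150,528 input features after flattening
weights = flattened * hidden_units         # one weight per (input, hidden unit) pair
biases = hidden_units
print(f"{weights + biases:,} parameters")  # 150,529,000 parameters for a single layer
```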
Convolutional Neural Networks (CNN)
- The Convolutional Neural Network is inspired by the brain, which processes images in layers, from basic features to shapes
- The Neocognitron, a hierarchical neural network for pattern recognition created in 1980, was inspired by these findings and used multiple layers to extract features
- It uses local receptive fields to detect simple patterns
- And weight sharing to recognize shapes
- It depended on a self-organizing, feed-forward structure for learning, without backpropagation
- Werbos introduced backpropagation in 1974, which helps train multilayer perceptrons
- Hinton's demonstration that backpropagation can efficiently train deep neural networks is considered a seminal work in AI research
- Convolutional neural networks with convolutional and pooling layers later helped train deep neural networks
- LeNet demonstrated that CNNs were a critical step in image classification
- In 2012, AlexNet brought CNNs to large-scale image classification, winning the ImageNet competition
- AlexNet used ReLU activations, dropout, and data augmentation to train a deep network
- Convolutional Neural Networks (CNNs, aka convnets) are a special case of fully connected neural networks
- Similar to neural networks, made of neurons with learnable weights and biases
- The essential difference is that CNNs are designed with the implicit assumption that the inputs are images
- This lets the architecture encode certain properties to extract features, preserve spatial information, and promote weight sharing
- CNNs achieve this via convolution, activation, pooling, and fully connected layers, as sketched below
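A hedged sketch of the four-stage pipeline just listed (convolution, activation, pooling, fully connected) as a minimal PyTorch model; the channel counts and the 28×28 grayscale input size are illustrative assumptions, not the lecture's exact architecture.

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1),   # convolution
            nn.ReLU(),                                    # activation
            nn.MaxPool2d(2),                              # pooling: 28x28 -> 14x14
        )
        self.classifier = nn.Linear(8 * 14 * 14, num_classes)  # fully connected

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, start_dim=1)   # flatten the feature maps per sample
        return self.classifier(x)

model = TinyCNN()
dummy = torch.randn(1, 1, 28, 28)   # batch of one 28x28 grayscale image
print(model(dummy).shape)           # torch.Size([1, 10])
```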
CNN Components: Convolution
- The Convolution is a mathematical operation that combines two signals
- This is often done to apply a kernel/filter in Image processing for edge detection, etc
- Using a small filter/matrix that slides over the image and performing mathematical operations helps convolutional neural network models detect patterns
- The convolution operation between an image I and a filter (kernel) K is defined as: (I * K)(x, y) = ∑_{i=0}^{m−1} ∑_{j=0}^{n−1} I(x + i, y + j) · K(i, j)
- Where (x, y) → the coordinates of the output feature map (the convolved image)
- m, n → the height and width of the kernel/filter K
- Filters must have the same number of color channels as the input
- The sum represents a weighted sum of the pixel values in the image I at positions that correspond to the filter K
- Stride refers to the number of pixels by which the filter shifts over the input matrix
- Padding is added to the image around the boundaries
- Example: at one position the convolution evaluates to 1 × 1 + 1 × 0 + 1 × 1 + 0 × 0 + 1 × 1 + 1 × 0 + 0 × 1 + 0 × 0 + 1 × 1 = 4 (see the sketch after this list)
- Helps extract edges such as horizontal and vertical edges
- Filters can be combined to extract more advanced features
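A minimal NumPy sketch of the stride-1, no-padding convolution formula above; the 5×5 image and 3×3 filter values are illustrative assumptions chosen so the top-left output reproduces the worked sum 1×1 + 1×0 + … = 4.

```python
import numpy as np

def convolve2d(image, kernel, stride=1):
    """Valid convolution matching (I*K)(x,y) = sum_i sum_j I(x+i, y+j) * K(i, j)."""
    m, n = kernel.shape
    out_h = (image.shape[0] - m) // stride + 1
    out_w = (image.shape[1] - n) // stride + 1
    out = np.zeros((out_h, out_w))
    for x in range(out_h):
        for y in range(out_w):
            window = image[x * stride:x * stride + m, y * stride:y * stride + n]
            out[x, y] = np.sum(window * kernel)   # weighted sum over the window
    return out

# Assumed 5x5 binary image and 3x3 filter
I = np.array([[1, 1, 1, 0, 0],
              [0, 1, 1, 1, 0],
              [0, 0, 1, 1, 1],
              [0, 0, 1, 1, 0],
              [0, 1, 1, 0, 0]])
K = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 0, 1]])
# Top-left entry: 1*1 + 1*0 + 1*1 + 0*0 + 1*1 + 1*0 + 0*1 + 0*0 + 1*1 = 4
print(convolve2d(I, K))   # 3x3 feature map
```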
Convolutional Dimensions and Benefits
- In the example, two filters are used, each with 3 channels
- When working with a color image, the filters should also use 3 channels
- After the convolution, an activation function is applied; the usual choice is ReLU
- The output from convolution is the convolved feature map
- Convolutions have several advantages
- A key takeaway is that the same filter is applied at different locations, reducing the number of parameters
- The input of dimension 6 × 6 × 3 can be reduced to 3 × 3 × 2 (without padding)
- A convolutional layer is defined by its hyperparameters and applies feature detectors (filters)
- Given an input of size W_in × H_in × C_in, the output volume dimensions are W_out = (W_in − F + 2P) / S + 1 (and similarly for H_out), while the output depth equals K, the number of filters (a worked example follows this list)
- Local connectivity and parameter sharing are other benefits
- The benefits of using a deep convolutional network are that:
- Basic features are learned first, such as edges and dark spots
- More advanced shapes are learned next, such as eyes, ears, and noses
- Even more complex structures, such as facial structure, are then learned, allowing layered representations
- One objective is to reduce the spatial dimensions while increasing the channel depth
- A pooling layer is typically used after convolution to reduce dimensions while retaining essential information
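A quick sketch applying the output-size formula above, W_out = (W_in − F + 2P)/S + 1; the example numbers (a 32×32×3 input and six 5×5 filters) are assumptions for illustration only.

```python
def conv_output_size(w_in, h_in, f, p, s, k):
    """Output volume of a conv layer with filter size f, padding p, stride s, k filters."""
    w_out = (w_in - f + 2 * p) // s + 1
    h_out = (h_in - f + 2 * p) // s + 1
    return w_out, h_out, k    # output depth equals the number of filters K

# Assumed example: 32x32x3 input, six 5x5 filters, no padding, stride 1
print(conv_output_size(32, 32, f=5, p=0, s=1, k=6))   # (28, 28, 6)
```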
CNN Components: Pooling Layer
- The pooling layer addresses one of the objectives of the convolution operation: reducing the spatial dimension (height × width)
- It reduces the spatial dimension while retaining essential information
- Types of pooling:
- Max pooling takes the maximum value in each window
- Average pooling takes the average value in each window
- The Pooling Layer does not have learnable parameters, so no weight updates are required
- In the demo, a 6 × 4 input is reduced to a final output of dimension 3 × 2
- Pooling is achieved via a window-sliding operation
- Hyperparameters of the pooling layer include the size of the filter, typically 2 × 2
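A minimal NumPy sketch of 2×2 max pooling with stride 2, as described above; the 6×4 input mirrors the demo dimensions, though the values themselves are assumed.

```python
import numpy as np

def max_pool(x, size=2, stride=2):
    # Slide a window over the input and keep the maximum in each window
    out_h = (x.shape[0] - size) // stride + 1
    out_w = (x.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = x[i * stride:i * stride + size,
                          j * stride:j * stride + size].max()
    return out

x = np.arange(24).reshape(6, 4)    # assumed 6x4 input, as in the demo
print(max_pool(x).shape)           # (3, 2): spatial dimensions are halved
```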
- The CNN is trained through a forward pass followed by backpropagation of the error, using gradient descent
Using CNN to Train Image Classification
- Learn features in the input through convolution; the feature learning happens via the convolution operation
- Introduce non-linearity through activation.
- Reduce dimensions and preserve spatial invariance in the output (pooling)
- The Fully Connected Layer classifies the input by looking at the learned image features; the whole pipeline is called an "end-to-end" model
- Final step is backward pass which has two main steps
- Step 1: compute the gradient with respect to the filter, which determines how much each filter contributes to the loss; Step 2: compute the gradient with respect to the input via a convolution, which is also achieved by window sliding
Backward Pass for CNN
- Backpropagation at the Convolutional Layer
- The Fully Connected (FC) layer receives the flattened feature vector, which needs to be reshaped during the backward pass
- Follows chain rule:
- Hidden layers: δ^l = (W^(l+1))^T δ^(l+1) ⊙ g'(z^l), an elementwise product with the derivative of the activation function
- The term (W^(l+1))^T δ^(l+1) propagates the error from the next layer
- Output layer, which applies Softmax with a Cross-Entropy loss:
- δ^L = ∂L/∂z = ŷ − y (the elementwise error term)
- The backward pass of the convolution layer performs two main steps: (1) computing the gradient w.r.t. the filter F, which tells how much each filter contributes to the loss, by convolving the input X with the error; (2) computing the gradient w.r.t. the input X, also by a convolution
- The final gradient formula: ∂E/∂X = δ * F; like the forward pass, it is computed as a convolution operation
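A hedged NumPy sketch of step (1) of the convolutional backward pass described above: the gradient with respect to the filter is itself computed as a convolution of the input X with the upstream error δ. The shapes and random values are illustrative assumptions.

```python
import numpy as np

def conv2d(image, kernel):
    # Valid convolution, the same sliding-window operation as the forward pass
    m, n = kernel.shape
    out = np.zeros((image.shape[0] - m + 1, image.shape[1] - n + 1))
    for x in range(out.shape[0]):
        for y in range(out.shape[1]):
            out[x, y] = np.sum(image[x:x + m, y:y + n] * kernel)
    return out

X = np.random.rand(5, 5)          # assumed input patch
F = np.random.rand(3, 3)          # assumed filter
delta = np.random.rand(3, 3)      # upstream error, same shape as the output map

# Step (1): gradient w.r.t. the filter -- convolve the input with the error:
# dL/dF[i, j] = sum_{x, y} delta[x, y] * X[x + i, y + j]
dF = conv2d(X, delta)
print(dF.shape)                   # (3, 3), same shape as the filter F
```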