Questions and Answers
Which aspect of neural networks is inspired by the human brain?
- The ability to store large amounts of data.
- The method of processing data. (correct)
- The use of transistors for computation.
- The physical structure resembling a brain.
How are pixels utilized in the initial steps of a neural network when processing an image?
- They are grouped into larger segments for analysis.
- They are compressed to reduce computational load.
- They are analyzed for color balance.
- Each pixel is fed as an individual input to a neuron. (correct)
What role do 'channels' play in the operation of a neural network?
- They connect each neuron to adjacent neurons. (correct)
- They regulate the speed of data processing.
- They filter input data to remove noise.
- They define the architecture of the network layers.
What is the primary function of the 'activation function' in a neural network?
In neural networks, what is the purpose of 'Forward Propagation'?
What is the role of 'Back Propagation' in the training of a neural network?
What is a key characteristic of hidden layers in a neural network?
What is the correct sequence of steps in a neural network's operation?
Which of the following is an example of a neural network application?
What is the primary role of an activation function?
Why is it important for activation functions to introduce non-linearity?
What is the purpose of activation functions?
What benefit does introducing an activation function provide to a model?
Which of the following activation functions outputs a probability between 0 and 1?
Which activation function is commonly used in hidden layers and is particularly effective for binary classification problems?
What is a key characteristic of the TanH activation function?
In what range does the TanH activation function output its values?
Which activation function is known for improving the learning of neural networks due to its simplicity?
What issue does the Leaky ReLU activation function address?
What specific problem is the Leaky ReLU activation function designed to address?
What is the primary purpose of the Softmax function?
For what type of problem is the Softmax function best suited?
What type of task is the Softmax function primarily utilized for?
When training deep neural networks, what is identified as a tricky problem that makes lower level layers very hard to train?
What is one of the potential risks when training a model with millions of parameters?
What is the vanishing gradient problem in neural networks?
What causes the 'vanishing gradients' problem in deep neural networks?
What are 'fan-in' and 'fan-out'?
In the context of neural networks, what does 'He Initialization' aim to ensure?
What is ReLU known not to do?
What is the range of the slope/gradient for the Leaky ReLU, for z < 0?
What is a key advantage of Leaky ReLU activation functions?
Which version of Leaky ReLU can be modified during backpropagation?
Which activation function has a risk of overfitting the training set on smaller training datasets?
Which initialization method should be used when applying the logistic activation function?
What is Xavier initialization after?
What were the vanishing/exploding gradients problems in part due to?
What should an activation function never do in order to behave better in deep neural networks?
Which statement best describes the vanishing gradients problem?
What is a result of the 'vanishing gradients' problem?
Flashcards
What is a neural network?
A method in artificial intelligence that teaches computers to process data inspired by the human brain.
How is an image input?
Splitting an image into pixels, feeding each pixel as an input to each neuron.
What are channels?
Connections between adjacent neurons in a neural network.
Forward propagation
The process of activated neurons transmitting data to the next layer of neurons over channels.
Activation function
A threshold function applied to the weighted inputs plus bias; its result decides whether the neuron activates.
Hidden layers
The layers of neurons between the input layer and the output layer.
Back propagation
The process of propagating the error gradient from the output layer back toward the input layer to update the weights when the result is wrong.
What does an activation function do?
Applies a non-linear transformation and decides whether a neuron should be activated.
Step function
Outputs 1 if the input meets the threshold: f(x) = 1 if x ≥ 0, otherwise 0.
Sigmoid function
Outputs a probability between 0 and 1: f(x) = 1/(1 + e⁻ˣ).
TanH function
A common choice for hidden layers; outputs values between -1 and 1.
ReLU function
The most common choice for hidden layers: f(x) = max(0, x).
Leaky ReLU function
Works like ReLU for positive numbers but applies a small scaling factor to negative numbers, addressing the dying ReLU problem.
Softmax Function
Squashes input numbers into probabilities that sum to 1; used in the last layer for multiclass classification problems.
Vanishing gradients problem
Gradients get smaller and smaller as backpropagation progresses down to the lower layers, leaving their weights virtually unchanged so training never converges to a good solution.
Exploding gradients problem
Gradients grow bigger and bigger, so many layers get insanely large weight updates and the algorithm diverges; mostly encountered in recurrent neural networks.
Study Notes
Supervised Learning: Classification
- Introduction to supervised learning classification, specifically for CS 770 Machine Learning
Neural Networks
- Neural networks are a method in artificial intelligence.
- They teach computers to process data in a way inspired by how the human brain functions.
- Neural Networks can be classified into different types based on requirements and inputs.
- Neural Networks can be further sub-classified as well.
Neural Network Types:
- Feed Forward Neural Nets
- Multilayer Perceptron Neural Nets
- Convolutional Neural Nets
- Radial Basis Function Neural Nets
- Recurrent Neural Nets
- Sequence to Sequence models
- Modular Neural Network
Neural Network Steps
- Consider an image as input.
- The image is split into pixels, and each pixel is fed as an input to each neuron.
- Each neuron connects to an adjacent neuron via webs, known as channels.
- The inputs are multiplied by the weights and added to the bias.
- The obtained value passes through a threshold function called the activation function.
- The result of the activation function determines if the neuron activates.
- Activated neurons transmit data to the next layer of neurons over channels.
- The process of transmitting data is known as Forward Propagation.
- In the output layer, neurons with the highest values fire and determine the output.
- If the result is wrong, back propagation is used to adjust the weights (a minimal sketch of a single forward pass follows this list).
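A minimal NumPy sketch of the steps above (weighted inputs plus bias, an activation function, and forward propagation through the layers). The layer sizes, random weights, and the ReLU choice here are illustrative assumptions, not values from the notes.

```python
import numpy as np

def forward_layer(inputs, weights, bias, activation):
    # Multiply the inputs by the weights, add the bias,
    # then pass the result through the activation function.
    return activation(weights @ inputs + bias)

# Illustrative sizes: 4 input pixels, 3 hidden neurons, 2 output neurons.
rng = np.random.default_rng(0)
x = rng.random(4)                       # pixel values fed as inputs
W1, b1 = rng.random((3, 4)), rng.random(3)
W2, b2 = rng.random((2, 3)), rng.random(2)

relu = lambda z: np.maximum(0, z)
hidden = forward_layer(x, W1, b1, relu)        # forward propagation to the hidden layer
output = forward_layer(hidden, W2, b2, relu)   # forward propagation to the output layer
print("Predicted class:", int(np.argmax(output)))  # the neuron with the highest value fires
```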
Neural Network Components:
- Neurons
- Input
- Output
- Pixels (e.g., a 28 x 28 image yields 784 pixel inputs)
- Hidden Layers
- Channels
- Bias
- Activation Function
The process of Activated Neurons:
- Implemented through forward propagation.
The process of Output Prediction:
- Occurs before Back Propagation.
Model Training:
- Using back propagation.
Neural Network Examples:
- Face recognition.
- Fingerprint recognition.
- Music composition.
- Speech recognition.
Neural Network Applications:
- Growth in the field is foreseen by big names like Amazon, Google, and Nvidia.
- Utilizing libraries.
- Predictive models.
- Intuitive GPUs.
Activation Functions
- Applies a non-linear transformation function
- Decides whether a neuron should be activated.
- Without an activation function, the model is just a linear regression model.
- A linear regression model is not able to learn complex patterns.
- Introducing an activation function lets the model learn and understand complex patterns and work more efficiently (a quick check follows this list).
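A quick NumPy check of the point above: without an activation function, stacking layers collapses into a single linear transformation. The matrix sizes here are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.random(4)
W1, W2 = rng.random((5, 4)), rng.random((3, 5))

# Two "layers" with no activation function...
two_linear_layers = W2 @ (W1 @ x)
# ...are equivalent to one linear layer with weights W2 @ W1.
one_linear_layer = (W2 @ W1) @ x
print(np.allclose(two_linear_layers, one_linear_layer))  # True
```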
Types of Activation Function:
- Step function
- Sigmoid function
- TanH function
- ReLU function
- Leaky ReLU function
- Softmax function
Step Function:
- If the input is greater than the threshold, the neuron activates.
- Its formula is f(x) = 1 if x ≥ 0, otherwise 0.
- This function is simple and historically common, but it is too simple and is rarely used in practice today.
Sigmoid Function:
- Outputs a probability between 0 and 1.
- If the input is a negative number, the sigmoid outputs a number close to 0.
- If the input is a positive number, the sigmoid outputs a number close to 1.
- It is used in hidden layers and mostly in the last layer for binary classification problems.
- The formula is f(x) = 1/(1 + e⁻ˣ)
TanH Function:
- Common choice for hidden layers.
- Outputs between -1 and 1.
- Formula: σ(z) = (eᶻ-e⁻ᶻ)/(eᶻ+e⁻ᶻ)
ReLU Function:
- Most common choice in the hidden layer.
- Looks simple, but improves the learning of neural networks.
- Formula: f(x) = max(0, x)
Leaky ReLU Function:
- Used when we encounter the dying ReLU problem.
- A neuron can reach a dead state where no more updates take place in terms of weight.
- This function works the same as ReLU in case of positive numbers.
- In the case of negative numbers, a scaling factor is used.
Softmax Function:
- Squashes input numbers into output values that sum to 1, so they can be interpreted as probabilities.
- Used in the last layer for multiclass classification problems.
- Formula: S(yᵢ) = e^(yᵢ) / Σⱼ e^(yⱼ)
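Minimal NumPy sketches of the activation functions listed above, following the formulas in these notes; the 0.01 leak factor and the example input vector are assumed values for illustration only.

```python
import numpy as np

def step(x):                 # 1 if x >= 0, otherwise 0
    return np.where(x >= 0, 1.0, 0.0)

def sigmoid(x):              # outputs a probability between 0 and 1
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):                 # outputs between -1 and 1
    return np.tanh(x)

def relu(x):                 # most common hidden-layer choice
    return np.maximum(0.0, x)

def leaky_relu(x, a=0.01):   # small slope a for negative inputs
    return np.where(x > 0, x, a * x)

def softmax(y):              # squashes a vector into probabilities that sum to 1
    e = np.exp(y - np.max(y))          # shift for numerical stability
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # approx. [0.659 0.242 0.099]
```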
Typical Regression MLP Architecture
- Input neurons: One per input feature
- Hidden layers: Typically 1 to 5 depending on the problem
- Neurons per hidden layer: Typically 10 to 100 depending on the problem
- Output neurons: 1 per prediction dimension
- Hidden activation: ReLU (or SELU)
- Output activation: None or ReLU/Softplus (if positive outputs) or Logistic/Tanh (if bounded outputs)
- Loss function: MSE or MAE/Huber (if outliers)
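A hedged Keras sketch of the typical regression MLP described above; the feature count, the 50-neuron layer width, and the Adam optimizer are illustrative assumptions, not prescriptions from the notes.

```python
import tensorflow as tf

n_features = 8  # assumed number of input features

# One input neuron per feature, two ReLU hidden layers, one output neuron,
# no output activation, and MSE loss -- matching the table above.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_features,)),
    tf.keras.layers.Dense(50, activation="relu"),
    tf.keras.layers.Dense(50, activation="relu"),
    tf.keras.layers.Dense(1),          # 1 neuron per prediction dimension
])
model.compile(loss="mse", optimizer="adam")
# model.fit(X_train, y_train, epochs=20, validation_data=(X_valid, y_valid))
```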
Training Neural Networks
- Deep neural networks (DNNs) are artificial neural networks with many hidden layers.
- Training a much deeper DNN, with many layers each containing hundreds of neurons connected by hundreds of thousands of connections, raises new difficulties.
- The backpropagation algorithm propagates the error gradient by going from the output layer to the input layer
- Training might be extremely slow due to the vanishing gradients problem.
- Deep neural networks are affected by this, which makes the lower layers very hard to train.
Training Data:
- You might not have enough training data for such a large network, or it might be too costly to label.
- A model with millions of parameters would severely risk overfitting the training set, especially if there are not enough training instances or they are too noisy.
Vanishing/Exploding Gradients Problems
- The backpropagation algorithm propagates the error by going from the output layer to the input layer.
Gradient Descent:
- Once the algorithm has computed the gradient of the cost function with regard to each parameter in the network, it uses these gradients to update each parameter with a Gradient Descent step.
- Gradients often get smaller and smaller as the algorithm progresses down to the lower layers.
- The Gradient Descent update then leaves the lower-layer connection weights virtually unchanged, and training never converges to a good solution; this is known as the Vanishing Gradients Problem (a small numeric illustration follows this section).
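A tiny numeric illustration of the vanishing gradients problem: the sigmoid's derivative never exceeds 0.25, so under the simplifying assumption of roughly one such factor per layer, the gradient shrinks exponentially on its way down to the lower layers.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)      # never exceeds 0.25

grad = 1.0
for layer in range(20):       # 20 layers, each contributing roughly one derivative factor
    grad *= sigmoid_derivative(0.0)   # 0.25, the best case at x = 0
print(grad)                   # ~9.1e-13: virtually nothing left for the lower layers
```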
Large Weight Updates can lead to diverging Algorithms
- Gradients can grow bigger and bigger, so many layers get insanely large weight updates, and the algorithm diverges.
- This is the exploding gradients problem, mostly encountered in recurrent neural networks.
- When inputs become large (negative or positive), the sigmoid (logistic) function saturates at 0 or 1, with a derivative extremely close to 0.
Saturation Leaves Nothing to Propagate
- When backpropagation kicks in, it has virtually no gradient to propagate back through the network.
- What little gradient exists keeps getting diluted as backpropagation progresses down through the top layers, so there is really nothing left for the lower layers.
He Initialization:
- For the signal to flow properly in both directions (forward when making predictions, and in reverse when backpropagating gradients), the variance of the outputs of each layer should be equal to the variance of its inputs.
- The gradients should also have equal variance before and after flowing through a layer in the reverse direction, which is only possible when a layer has an equal number of inputs and neurons.
- The connection weights of each layer must be initialized randomly.
- This initialization strategy is called Xavier initialization / Glorot initialization (a minimal sketch follows this list).
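A minimal NumPy sketch of random weight initialization using the standard Glorot (Xavier) and He variance formulas; the fan_in/fan_out sizes below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
fan_in, fan_out = 300, 100    # illustrative layer sizes (inputs and neurons of the layer)

# Xavier / Glorot initialization: keeps the variance of the signal
# (and of the gradients) roughly equal across the layer in both directions.
glorot_std = np.sqrt(2.0 / (fan_in + fan_out))
W_glorot = rng.normal(0.0, glorot_std, size=(fan_out, fan_in))

# He initialization: the variant recommended for ReLU-family activations,
# scaled only by the number of inputs (fan_in).
he_std = np.sqrt(2.0 / fan_in)
W_he = rng.normal(0.0, he_std, size=(fan_out, fan_in))

print(W_glorot.std(), W_he.std())   # close to glorot_std and he_std
```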
ReLU Activation Function:
- A solution to Vanishing / Exploding Gradients Problems.
- Other activation functions behave much better in deep neural networks.
- ReLU activation function does not saturate for positive values.
- A variant of the ReLU function is the Leaky ReLU, defined as LeakyReLUₐ(z) = max(az, z).
- The hyperparameter a defines how much the function "leaks" and is typically set to 0.01.
- A small slope ensures that leaky ReLUs never die and have a chance to eventually wake up.
Parametric leaky ReLU (PReLU)
- Here a is authorized to be learned during training: instead of being a hyperparameter, it becomes a parameter that can be modified by backpropagation like any other parameter.
ReLU vs PReLU:
- PReLU was reported to strongly outperform ReLU on large image datasets, but on smaller datasets it runs the risk of overfitting the training set.
ELU Function:
- The exponential linear unit (ELU) is another ReLU variant.
- Formula: ELUα(z) = α(exp(z) − 1) if z < 0, z if z ≥ 0
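A hedged Keras sketch of how these ReLU variants are typically wired into a model; the layer sizes are illustrative, and the exact default leak slope depends on the Keras version rather than the 0.01 mentioned in these notes.

```python
import tensorflow as tf

# Leaky ReLU: applied as a separate layer after a linear Dense layer.
leaky = tf.keras.Sequential([
    tf.keras.layers.Dense(50),
    tf.keras.layers.LeakyReLU(),   # fixed small slope for z < 0 (configurable)
])

# PReLU: the slope for z < 0 is a trainable parameter, learned by backpropagation.
prelu = tf.keras.Sequential([
    tf.keras.layers.Dense(50),
    tf.keras.layers.PReLU(),
])

# ELU: available directly as an activation string.
elu = tf.keras.layers.Dense(50, activation="elu")
```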