CS 770: Supervised Learning Classification

Questions and Answers

Which aspect of neural networks is inspired by the human brain?

  • The ability to store large amounts of data.
  • The method of processing data. (correct)
  • The use of transistors for computation.
  • The physical structure resembling a brain.

How are pixels utilized in the initial steps of a neural network when processing an image?

  • They are grouped into larger segments for analysis.
  • They are compressed to reduce computational load.
  • They are analyzed for color balance.
  • Each pixel is fed as an individual input to a neuron. (correct)

What role do 'channels' play in the operation of a neural network?

  • They connect each neuron to adjacent neurons. (correct)
  • They regulate the speed of data processing.
  • They filter input data to remove noise.
  • They define the architecture of the network layers.

What is the primary function of the 'activation function' in a neural network?

To determine if a neuron should be activated. (D)

In neural networks, what is the purpose of 'Forward Propagation'?

To transmit data from activated neurons to the next layer. (A)

What is the role of 'Back Propagation' in the training of a neural network?

To correct errors by adjusting weights in the network. (D)

What is a key characteristic of hidden layers in a neural network?

They process data between the input and output layers. (A)

What is the correct sequence of steps in a neural network's operation?

Pixels → Channels → Hidden Layers → Activation → Forward Propagation → Output (C)

Which of the following is an example of a neural network application?

Facial recognition software. (A)

What is the primary role of an activation function?

To introduce non-linearity into the neural network. (B)

Why is it important for activation functions to introduce non-linearity?

To enable the neural network to learn complex patterns. (C)

What is the purpose of activation functions?

Apply a non-linear transformation and decide whether a neuron should be activated. (C)

What benefit does introducing an activation function provide to a model?

It enables the model to learn and understand complex patterns more efficiently. (C)

Which of the following activation functions outputs a probability between 0 and 1?

Sigmoid (B)

Which activation function is commonly used in hidden layers and is particularly effective for binary classification problems?

Sigmoid (B)

What is a key characteristic of the TanH activation function?

It outputs values between -1 and 1. (B)

In what range does the TanH activation function output its values?

-1 to 1 (B)

Which activation function is known for improving the learning of neural networks due to its simplicity?

ReLU (B)

What issue does the Leaky ReLU activation function address?

The dying ReLU problem. (C)

What specific problem is the Leaky ReLU activation function designed to address?

Neurons becoming inactive during training. (A)

What is the primary purpose of the Softmax function?

To normalize the output into a probability distribution. (A)

For what type of problem is the Softmax function best suited?

Multiclass classification. (A)

What type of task is the Softmax function primarily utilized for?

Handling multi-class classification scenarios. (A)

When training deep neural networks, what is identified as a tricky problem that makes lower level layers very hard to train?

It is the exploding gradients problem. (C)

What is one of the potential risks when training a model with millions of parameters?

It can severely risk overfitting the training set. (D)

What is the vanishing gradient problem in neural networks?

The early layers of the network can no longer be trained. (C)

What causes the 'vanishing gradients' problem in deep neural networks?

Gradients get smaller and smaller as the algorithm progresses down to the lower layers. (D)

What are 'fan-in' and 'fan-out'?

The number of inputs and the number of neurons in a layer, respectively. (C)

In the context of neural networks, what does 'He Initialization' aim to ensure?

Equal variance of each layer's inputs and outputs. (C)

What is ReLU known not to do?

Saturate for positive values. (D)

What is the range of the slope/gradient for the Leaky ReLU, for z < 0?

Typically set to roughly 0.01. (A)

What is a key advantage of Leaky ReLU activation functions?

Guaranteed never to die. (B)

Which version of Leaky ReLU has a parameter that can be modified during backpropagation?

Parametric leaky ReLU (PReLU). (A)

Which activation function has a risk of overfitting the training set on smaller training datasets?

PReLU. (C)

Which initialization method should be used when applying the logistic activation function?

Glorot initialization. (C)

What is Xavier initialization named after?

The author's first name. (A)

What were the vanishing/exploding gradients problems partly due to?

Poor choice of activation function. (D)

What should an activation function never do in order to behave better in deep neural networks?

Saturate for positive values. (D)

Which statement best describes the vanishing gradients problem?

Gradients will often get smaller and smaller as the algorithm progresses down to the lower layers. (A)

What is a result of the 'vanishing gradients' problem?

Gradient Descent update leaves the lower layer connection weights virtually unchanged. (A)

Flashcards

What is a neural network?

A method in artificial intelligence that teaches computers to process data inspired by the human brain.

How is an image input?

The image is split into pixels, and each pixel is fed as an input to a neuron.

What are channels?

Connections between adjacent neurons in a neural network.

Forward propagation

The process where activated neurons transmit data through channels to the next layer.

Activation function

A function that decides if a neuron should be activated or not.

Hidden layers

Neurons that lie between the input and output layers in a neural network.

Back propagation

Adjusting the weights in a neural network to reduce the error.

What does an activation function do?

A function that applies a non-linear transformation and decides whether a neuron should be activated or not

Step function

If the input is greater than the threshold, the neuron is activated

Sigmoid function

Outputs a probability between 0 and 1, used in hidden layers, mostly in the last layer for binary classification problems

TanH function

Common choice for hidden layers, outputs between -1 and 1

ReLU function

Most common choice in the hidden layer, looks simple but improves learning of neural networks

Leaky ReLU function

A variant of ReLU that addresses the 'dying ReLU' problem by having a small slope for negative inputs.

Softmax Function

Squashes the input number to an output number between 0 and 1 and produces a probability distribution. Used in the last layer for multi-class classification problems

Vanishing gradients problem

The problem where gradients shrink exponentially as they propagate backward.

Exploding gradients problem

The problem where gradients grow exponentially as they propagate backward.

Study Notes

Supervised Learning: Classification

  • Introduction to supervised learning classification, specifically for CS 770 Machine Learning

Neural Networks

  • Neural networks are a method in artificial intelligence.
  • They teach computers to process data in a way inspired by how the human brain works.
  • Neural Networks can be classified into different types based on requirements and inputs.
  • Neural Networks can be further sub-classified as well.

Neural Network Types:

  • Feed Forward Neural Nets
  • Multilayer Perceptron Neural Nets
  • Convolutional Neural Nets
  • Radial Basis Function Neural Nets
  • Recurrent Neural Nets
  • Sequence to Sequence models
  • Modular Neural Network

Neural Network Steps

  • Consider an image as input.
  • The image is split into pixels, and each pixel is fed as an input to each neuron.
  • Each neuron is connected to adjacent neurons via links known as channels.
  • The weights are multiplied with inputs and added to bias.
  • The obtained value passes through a threshold function called the activation function.
  • The result of the activation function determines if the neuron activates.
  • Activated neurons transmit data to the next layer of neurons over channels.
  • The process of transmitting data is known as Forward Propagation.
  • In the output layer, neurons with the highest values fire and determine the output.
  • If the result is wrong, back propagation is used to correct the error.
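
A minimal NumPy sketch of these steps, using a hypothetical 784-16-10 network (one hidden layer) with sigmoid activations; the sizes and the activation choice are illustrative, not part of the lesson:

```python
import numpy as np

rng = np.random.default_rng(0)
pixels = rng.random(784)               # a 28 x 28 image flattened to 784 pixel inputs
W_hidden = rng.normal(size=(16, 784))  # channel weights into the hidden layer
b_hidden = np.zeros(16)                # bias of each hidden neuron
W_out = rng.normal(size=(10, 16))      # channel weights into the output layer
b_out = np.zeros(10)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Weights are multiplied with the inputs and added to the bias, then passed
# through the activation function; the result flows to the next layer
# (forward propagation).
hidden = sigmoid(W_hidden @ pixels + b_hidden)
output = sigmoid(W_out @ hidden + b_out)

# In the output layer, the neuron with the highest value determines the prediction.
prediction = int(np.argmax(output))
```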

Neural Network Components:

  • Neurons
  • Input
  • Output
  • Pixels (e.g., a 28 x 28 image gives 784 pixel inputs)
  • Hidden Layers
  • Channels
  • Bias
  • Activation Function

The process of Activated Neurons:

  • Implemented through forward propagation.

The process of Output Prediction:

  • Occurs before Back Propagation.

Model Training:

  • Using back propagation.

Neural Network Examples:

  • Face recognition.
  • Fingerprint recognition.
  • Music composition.
  • Speech recognition.

Neural Network Applications:

  • Growth in the field is foreseen by big names like Amazon, Google, and Nvidia.
  • They support it with libraries, predictive models, and GPUs.

Activation Functions

  • Applies a non-linear transformation function
  • Decides whether a neuron should be activated.
  • Without an activation function, the model is just a linear regression model.
  • A linear regression model is not able to learn complex patterns.
  • Introducing an activation function lets the model learn and understand complex patterns and work more efficiently.
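
A small NumPy check of the point above: without an activation function, stacking two linear layers collapses into a single linear layer, so depth alone adds no expressive power (the matrices here are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(4)
W1 = rng.normal(size=(8, 4))   # "hidden" layer weights, no activation applied
W2 = rng.normal(size=(3, 8))   # "output" layer weights

two_linear_layers = W2 @ (W1 @ x)
one_linear_layer = (W2 @ W1) @ x   # a single equivalent linear model

print(np.allclose(two_linear_layers, one_linear_layer))  # True
```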

Types of Activation Function:

  • Step function
  • Sigmoid function
  • TanH function
  • ReLU function
  • Leaky ReLU function
  • Softmax function

Step Function:

  • If the input is greater than the threshold, the neuron activates.
  • Its formula is f(x) = 1 if x ≥ 0, otherwise 0.
  • This function is too simple and is not used in practice.
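
A one-line NumPy version of the step function, assuming the threshold of 0 from the formula above:

```python
import numpy as np

def step(x):
    # Neuron activates (outputs 1) when the input reaches the threshold of 0.
    return np.where(x >= 0, 1, 0)

print(step(np.array([-1.0, 0.0, 2.0])))  # [0 1 1]
```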

Sigmoid Function:

  • Outputs a probability between 0 and 1.
  • If the input is a negative number, the sigmoid outputs a number close to 0.
  • If the input is a positive number, the sigmoid outputs a number close to 1.
  • It is used in hidden layers and mostly in the last layer for binary classification problems.
  • The formula is f(x) = 1 / (1 + e⁻ˣ)
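
A minimal NumPy sketch of the sigmoid, matching the formula above:

```python
import numpy as np

def sigmoid(x):
    # f(x) = 1 / (1 + e^(-x)): negative inputs map close to 0, positive inputs close to 1.
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(np.array([-5.0, 0.0, 5.0])))  # ~[0.0067, 0.5, 0.9933]
```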

TanH Function:

  • Common choice for hidden layers.
  • Outputs between -1 and 1.
  • Formula: σ(z) = (eᶻ-e⁻ᶻ)/(eᶻ+e⁻ᶻ)
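
The same formula in NumPy (np.tanh computes it directly, but the explicit form is shown for clarity):

```python
import numpy as np

def tanh(z):
    # (e^z - e^-z) / (e^z + e^-z), bounded between -1 and 1
    return (np.exp(z) - np.exp(-z)) / (np.exp(z) + np.exp(-z))

print(tanh(np.array([-2.0, 0.0, 2.0])))  # ~[-0.964, 0.0, 0.964]
```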

ReLU Function:

  • Most common choice in the hidden layer.
  • Looks simple, but improves the learning of neural networks.
  • Formula: f(x) = max(0, x)
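
A minimal NumPy version of ReLU:

```python
import numpy as np

def relu(x):
    # f(x) = max(0, x): positive values pass through, negatives become 0.
    return np.maximum(0.0, x)

print(relu(np.array([-3.0, 0.0, 2.5])))  # [0.  0.  2.5]
```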

Leaky ReLU Function:

  • Used when we encounter the dying ReLU problem.
  • A neuron can reach a dead state where no more updates take place in terms of weight.
  • This function works the same as ReLU in case of positive numbers.
  • In the case of negative numbers, a scaling factor is used.
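
A minimal NumPy sketch of Leaky ReLU, with the commonly used scaling factor of 0.01 for negative inputs:

```python
import numpy as np

def leaky_relu(z, alpha=0.01):
    # Same as ReLU for positive inputs; negative inputs keep a small slope,
    # so the neuron is never completely shut off (the "dying ReLU" problem is avoided).
    return np.where(z > 0, z, alpha * z)

print(leaky_relu(np.array([-10.0, 3.0])))  # [-0.1  3. ]
```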

Softmax Function:

  • Squashes input numbers to output numbers so that a probability can be obtained.
  • Used in the last layer for multiclass classification problems.
  • Formula: S(yᵢ) = exp(yᵢ) / Σⱼ exp(yⱼ)
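
A minimal NumPy version of the softmax; subtracting the maximum is a common numerical-stability trick that does not change the result:

```python
import numpy as np

def softmax(y):
    e = np.exp(y - np.max(y))      # shift for numerical stability
    return e / e.sum()             # probabilities that sum to 1

scores = np.array([2.0, 1.0, 0.1])
print(softmax(scores))             # ~[0.659, 0.242, 0.099]
```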

Typical Regression MLP Architecture

  • Input neurons: One per input feature
  • Hidden layers: Typically 1 to 5 depending on the problem
  • Neurons per hidden layer: Typically 10 to 100 depending on the problem
  • Output neurons: 1 per prediction dimension
  • Hidden activation: ReLU (or SELU)
  • Output activation: None or ReLU/Softplus (if positive outputs) or Logistic/Tanh (if bounded outputs)
  • Loss function: MSE or MAE/Huber (if outliers)
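
A minimal Keras sketch of such a regression MLP, assuming TensorFlow is installed; the 8 input features and the two hidden layers of 50 ReLU neurons are illustrative picks from the typical ranges above:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),                    # one input neuron per feature
    tf.keras.layers.Dense(50, activation="relu"),  # hidden layer 1
    tf.keras.layers.Dense(50, activation="relu"),  # hidden layer 2
    tf.keras.layers.Dense(1),                      # one output neuron, no activation
])
model.compile(loss="mse", optimizer="adam")        # MSE loss for regression
```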

Training Neural Networks

  • Harder problems may require training deep neural networks (DNNs).
  • A much deeper DNN can have many layers, each containing hundreds of neurons, connected by hundreds of thousands of connections.
  • The backpropagation algorithm propagates the error gradient by going from the output layer to the input layer
  • Training might be extremely slow due to the vanishing gradients problem.
    • Deep neural networks are affected, and the lower layers become very hard to train.

Training Data:

  • You might not have enough training data for such a large network, or it might be too costly to label.
  • A model with millions of parameters would severely risk overfitting the training set, especially if there are not enough training instances or if they are too noisy.

Vanishing/Exploding Gradients Problems

  • The backpropagation algorithm propagates the error by going from the output layer to the input layer.

Gradient Descent:

  • Once the algorithm has computed the gradient of the cost function with regard to each parameter in the network, it uses these gradients to update each parameter with a Gradient Descent step.
  • Gradients often get smaller and smaller as the algorithm progresses down to the lower layers.
  • As a result, the Gradient Descent update leaves the lower layer connection weights virtually unchanged, and training never converges to a good solution. This is known as the Vanishing Gradients Problem.
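
A rough NumPy illustration of the effect, assuming a hypothetical 20-layer network in which every unit sits at the sigmoid's steepest point (derivative 0.25); backpropagation multiplies one such factor per layer, leaving almost nothing for the lowest layers:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = 0.0
local_grad = sigmoid(z) * (1.0 - sigmoid(z))  # 0.25, the sigmoid's largest derivative

grad = 1.0
for _ in range(20):          # one multiplication per layer on the way down
    grad *= local_grad
print(grad)                  # ~9.1e-13: the lower layers barely get updated
```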

Large Weight Updates can lead to diverging Algorithms

  • Gradients can grow bigger and bigger, so many layers get insanely large weight updates, and the algorithm diverges.
  • This is mostly encountered in recurrent neural networks and is known as the exploding gradients problem.
  • With the logistic (sigmoid) activation function, when inputs become large (negative or positive) the function saturates at 0 or 1, with a derivative extremely close to 0.

No Gradient Left to Propagate

  • When backpropagation kicks in, it has virtually no gradient to propagate back through the network.
  • What little gradient exists keeps getting diluted as backpropagation progresses down through the top layers, so there is really nothing left for the lower layers.

Glorot and He Initialization:

  • The signal needs to flow properly in both directions: in the forward direction when making predictions, and in the reverse direction when backpropagating gradients.
  • The variance of the outputs of each layer should be equal to the variance of its inputs.
  • The gradients should also have equal variance before and after flowing through a layer in the reverse direction; this can only hold exactly when a layer has an equal number of inputs and neurons.
  • The connection weights of each layer must be initialized randomly.
  • The initialization strategy based on these ideas is called Xavier initialization / Glorot initialization; He initialization is the related strategy recommended for ReLU and its variants.
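
A minimal NumPy sketch of the two strategies, using the usual normal-distribution variants (standard deviation sqrt(2 / (fan_in + fan_out)) for Glorot, sqrt(2 / fan_in) for He); the layer sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)

def glorot_normal(fan_in, fan_out):
    # Glorot / Xavier initialization: keeps the variance of a layer's outputs
    # close to the variance of its inputs (suited to logistic/tanh activations).
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=(fan_out, fan_in))

def he_normal(fan_in, fan_out):
    # He initialization: the related strategy recommended for ReLU and its variants.
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_out, fan_in))

W1 = glorot_normal(fan_in=784, fan_out=100)
W2 = he_normal(fan_in=100, fan_out=10)
```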

ReLU Activation Function:

  • A solution to Vanishing / Exploding Gradients Problems.
  • Other activation functions behave much better in deep neural networks.
  • ReLU activation function does not saturate for positive values.
  • A variant of the ReLU function is the Leaky ReLU, defined as LeakyReLUα(z) = max(αz, z).
    • The hyperparameter α defines how much the function "leaks".
    • It is typically set to 0.01.
  • A small slope ensures that leaky ReLUs never die and have a chance to eventually wake up.

Parametric leaky ReLU (PReLU)

  • The coefficient α is learned during training: instead of being a hyperparameter, it becomes a parameter that can be modified by backpropagation like any other parameter.
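
A minimal NumPy sketch of the idea, with illustrative data and learning rate: the gradient of the PReLU output with respect to α is z for negative inputs and 0 otherwise, so α can be nudged by gradient descent like any other weight:

```python
import numpy as np

def prelu(z, alpha):
    return np.where(z > 0, z, alpha * z)

z = np.array([-2.0, -0.5, 1.0, 3.0])
alpha = 0.01
upstream_grad = np.ones_like(z)                            # pretend gradient from the next layer
grad_alpha = np.sum(upstream_grad * np.where(z < 0, z, 0.0))
alpha -= 0.001 * grad_alpha                                # one illustrative update step
```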

ReLU v PReLU

  • PReLU was reported to strongly outperform ReLU on large image datasets, but on smaller datasets it runs the risk of overfitting the training set.

ELU Function:

  • Formula: ELUα(z) = α(exp(z) - 1) if z < 0, z if z ≥ 0
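
A minimal NumPy version of this formula (α = 1 is an illustrative default):

```python
import numpy as np

def elu(z, alpha=1.0):
    # ELU_alpha(z) = alpha * (exp(z) - 1) if z < 0, else z
    return np.where(z < 0, alpha * (np.exp(z) - 1.0), z)

print(elu(np.array([-2.0, 0.0, 2.0])))  # ~[-0.865, 0.0, 2.0]
```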
