Questions and Answers
What is the primary goal of using gradient descent in the context of training a neural network?
- To increase the magnitude of the gradient of the error function.
- To find the best set of weights and biases that minimize a loss function. (correct)
- To maximize the accuracy of predictions by adjusting the learning rate.
- To simplify the network architecture by reducing the number of layers.
In the context of gradient descent, what does the term 'gradient' refer to?
- The rate at which the learning rate should be adjusted.
- The magnitude of the error between predicted and actual values.
- The direction in which the function's output increases most rapidly.
- The slope of the cost function with respect to the weights and biases. (correct)
Why is it important to adjust weights and biases in the opposite direction of the gradient in gradient descent?
- To ensure the cost function increases with each step.
- To prevent the model from converging too quickly.
- To maintain the stability of the learning rate.
- To decrease the cost function and move towards the minimum. (correct)
What is a common cost function used in regression problems that gradient descent aims to minimize?
In gradient descent, if the derivative for a weight is positive, which direction should the algorithm move?
What is the role of the 'learning rate' in gradient descent?
What might happen if the learning rate is set too high during gradient descent?
What is the effect of a learning rate that is set too small?
What are 'epochs' in the context of gradient descent?
What is the primary characteristic of Batch Gradient Descent?
What scenario poses a significant drawback for using Batch Gradient Descent?
How does Stochastic Gradient Descent (SGD) differ from Batch Gradient Descent?
What is a key advantage of Stochastic Gradient Descent (SGD) compared to Batch Gradient Descent when dealing with large datasets?
Which of the following is a characteristic of Stochastic Gradient Descent due to its random nature?
What is a potential benefit of Stochastic Gradient Descent (SGD) compared to Batch Gradient Descent, in terms of finding the optimal solution?
How does Mini-batch Gradient Descent work?
What is the role of 'batch_size' in Mini-batch Gradient Descent?
How does Mini-batch Gradient Descent progress through parameter space?
What is a potential disadvantage of Mini-batch Gradient Descent compared to Stochastic Gradient Descent?
What preprocessing step is important to consider when using Gradient Descent?
Which gradient descent method updates parameters based on a single observation at a time?
Which gradient descent method is computationally most expensive per iteration?
What is the primary trade-off between Batch Gradient Descent and Stochastic Gradient Descent?
Which of the following gradient descent algorithms would likely be the most suitable if you have a very large dataset that doesn't fit into memory?
Which factor has the greatest effect on the 'smoothness' of the cost function reductions during gradient descent?
Flashcards
Gradient Descent
An optimization algorithm that finds the best weights and biases for a neural network to make accurate predictions.
Gradient
The direction in which the function increases most quickly; the steepness of a slope.
Gradient Descent Function
Adjusts weights and biases by calculating the gradients (slopes) of the cost function with respect to each weight and bias.
Learning Rate
A value that determines the size of the steps taken during gradient descent to reach a minimum of a cost function.
Hyperparameter
A parameter whose value is set before the learning process begins.
Epoch
One complete pass through the entire training dataset during the training of a neural network.
Batch Gradient Descent
Performed over the entire training set in an iterative manner.
Stochastic Gradient Descent
Picks a random instance in the training set at every step and computes the gradients based only on that single instance.
Mini-Batch Gradient Descent
Computes the gradients on small random sets of instances (mini-batches) to update model parameters.
Batch Gradient Descent Characteristics
Updates based on the entire dataset; high computational cost, and the cost function reduces slowly.
Stochastic Gradient Descent Characteristics
Updates based on a single observation; fast per update, but it can take longer to converge, with high variation in the cost function.
Mini-batch Gradient Descent Characteristics
Updates based on a subset of data; lower cost than batch gradient descent and faster than SGD, with a smoother cost function compared to SGD.
Study Notes
- Lecture 18 focuses on Gradient Descent within COSC 202 Data Science and AI
- The lecture discusses how to train a Neural Network and the different gradient descent algorithms, including:
- Gradient Descent
- Stochastic Gradient Descent
- Mini-batch Gradient Descent
How to Train a Neural Network?
- Machine learning aims to find the best model for a given situation
- The best model minimizes the error
- The best model represents the solution to an optimization problem
- Gradient Descent is an optimization algorithm to find the best weights and biases for accurate predictions in neural networks
- The network calculates the gradient (the partial derivatives, in the sense of calculus) of the error
- The error is the difference between the network's predictions and the actual values, and the gradient is taken with respect to the weights and biases
- The gradient indicates the direction in which the function increases most quickly
Understanding Gradient Descent
- Gradient descent is like descending a hill to reach the lowest point (the valley).
- You can only feel the slope beneath your feet; you cannot see the entire landscape
- It involves taking small steps in the direction of the steepest descent to reach the valley
- Gradient descent adjusts weights and biases by computing the gradients (slopes) of the cost function relative to each weight and bias
- The algorithm reduces the cost at each step
- Updates to weights and biases happen in the direction opposite to the gradient
- A common cost function is mean squared error, appropriate for regression problems
- When the gradient is zero, you have reached a minimum of the cost (ideally the global minimum; a tiny worked example follows this list)
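To make the idea concrete, here is a tiny worked example of one gradient descent step. The cost function C(w) = (w - 3)^2, the starting weight, and the learning rate are made-up values for illustration, not taken from the lecture.

```python
# Tiny worked example (made-up cost and numbers): one gradient descent step on
# the one-weight cost C(w) = (w - 3)**2, whose derivative is 2*(w - 3).
w, alpha = 5.0, 0.1
grad = 2 * (w - 3)                      # = 4.0; positive, so step to the left
w_new = w - alpha * grad                # = 5.0 - 0.1 * 4.0 = 4.6
print((w - 3) ** 2, (w_new - 3) ** 2)   # cost drops from 4.0 to 2.56
```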
Gradient Descent Details
- When f(z) is a multivariable function, it possesses multiple partial derivatives; each shows how f(z) changes with small changes in a single input variable
- Assume a perceptron with two inputs and a ReLU activation function
- Loss function is based on Mean Squared Error
- The goal is to minimize loss by tweaking the weights and bias using gradient descent
- ReLU activation function is used
- The loss function is MSE = (1/n) * Σ_i (y_i - max(0, w1*x1_i + w2*x2_i + b))^2
- To find the slope, find the partial derivative with respect to the weights and with respect to the bias, since these are the values you are tweaking
- The ReLU function is not differentiable at zero (its derivative is a step function), so a subgradient is used at that point in practice
- The new weight becomes w_next_step = w - α∇MSE, where α is the learning rate, a hyperparameter representing the step size (a NumPy sketch of one such update follows this list)
- When the derivative is positive for one of the weights, a step is taken to the left
- After reducing the weight, there is now a lower loss
- The derivative is then recomputed, another step is taken toward the local minimum, and the process repeats
- When the derivative is negative, a step is taken to the right
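The bullets above can be turned into a short NumPy sketch. The data, weights, and learning rate below are assumed toy values, and the subgradient (z > 0) stands in for the ReLU derivative at zero; this is an illustrative sketch, not the lecture's code.

```python
# Sketch (assumed toy data): one gradient descent step for a two-input
# perceptron with ReLU activation and MSE loss.
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def relu_grad(z):
    # Subgradient of ReLU: 1 where z > 0, else 0 (ReLU is not differentiable at 0).
    return (z > 0).astype(float)

X = np.array([[1.0, 2.0], [2.0, 1.0], [0.5, 3.0]])  # n samples, two inputs each
y = np.array([3.0, 2.5, 4.0])                       # target values

w = np.array([0.1, -0.2])   # weights w1, w2
b = 0.0                     # bias
alpha = 0.01                # learning rate (step size)

z = X @ w + b               # w1*x1 + w2*x2 + b for every sample
pred = relu(z)              # perceptron output
err = y - pred
mse = np.mean(err ** 2)     # the loss being minimized

# Partial derivatives of the MSE with respect to w1, w2, and b (chain rule).
dz = -2.0 * err * relu_grad(z) / len(y)
grad_w = X.T @ dz
grad_b = dz.sum()

# Step in the opposite direction of the gradient: w_next_step = w - alpha * grad.
w = w - alpha * grad_w
b = b - alpha * grad_b
print(mse, w, b)
```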
The Learning Rate and Initialization State
- The size of the steps (learning rate) and the initial state are hyperparameters when training a neural network
- A learning rate that is too small will require many iterations to converge
- A learning rate that is too high might jump across the minimum and end up higher than before; this can make the algorithm diverge, with larger and larger values, and fail to find a good solution (see the toy comparison after this list)
- If the initialization state starts on the left, then it converges to a local minimum, not the global minimum
- If it starts on the right, it will take a very long time to cross the plateau
- Stopping too early prevents reaching the global minimum
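The effect of the step size can be seen on a toy one-dimensional cost. The function cost(w) = w^2 and the learning rates below are assumed for illustration only.

```python
# Toy comparison (assumed cost and values): how the learning rate affects
# gradient descent on cost(w) = w**2, whose derivative is 2*w.
def run_gd(alpha, w=5.0, steps=20):
    for _ in range(steps):
        w = w - alpha * 2 * w   # step opposite the gradient
    return w

print(run_gd(alpha=0.01))   # too small: after 20 steps w is still far from 0
print(run_gd(alpha=0.5))    # reasonable: reaches the minimum at w = 0
print(run_gd(alpha=1.1))    # too large: |w| grows every step, so it diverges
```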
Batch Gradient Descent
- Batch gradient descent is performed over the whole training set in an iterative manner
- The number of epochs is a hyperparameter: an epoch is one complete pass through the entire training dataset, and the model parameters are updated iteratively across multiple epochs
- This is slow, especially for big datasets, since every update requires processing the entire dataset (a minimal sketch follows this list)
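A minimal sketch of Batch Gradient Descent for a linear model with MSE loss follows; the toy data, learning rate, and epoch count are assumed values for illustration.

```python
# Sketch (assumed toy data): Batch Gradient Descent for a linear model with MSE.
# Every parameter update is computed from the ENTIRE training set.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))              # full training set
y = X @ np.array([3.0, -1.0]) + 0.5         # targets from a known linear rule

w, b, alpha, epochs = np.zeros(2), 0.0, 0.1, 50
n = len(y)

for epoch in range(epochs):                 # one epoch = one pass over all data
    err = X @ w + b - y                     # predictions minus targets, all n rows
    grad_w = 2.0 / n * (X.T @ err)          # gradient averaged over the whole set
    grad_b = 2.0 / n * err.sum()
    w -= alpha * grad_w                     # one smooth but expensive update
    b -= alpha * grad_b
```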
Stochastic Gradient Descent
- Stochastic Gradient Descent picks a random instance in the training set at every step and computes the gradients based only on that single instance, making it faster since very little data is processed per iteration
- Makes it possible to train on huge training sets
- Its stochastic (random) nature makes the algorithm less regular than Batch Gradient Descent:
- Instead of decreasing gently to the minimum, the cost function decreases on average by bouncing up and down
- Over time, the function ends up close to the minimum but bounces around instead of settling.
- Stochastic Gradient Descent has a better chance of finding the global minimum than Batch Gradient Descent, because its random jumps can help it escape local minima (a minimal sketch follows this list)
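For comparison, here is a minimal Stochastic Gradient Descent sketch for the same kind of linear model; the toy data and hyperparameter values are assumed for illustration.

```python
# Sketch (assumed toy data): Stochastic Gradient Descent for a linear model with
# MSE. Each update uses ONE randomly picked training instance.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
y = X @ np.array([3.0, -1.0]) + 0.5

w, b, alpha, epochs, n = np.zeros(2), 0.0, 0.01, 5, len(y)

for epoch in range(epochs):
    for _ in range(n):                     # n cheap, noisy updates per epoch
        i = rng.integers(n)                # random instance at every step
        err_i = X[i] @ w + b - y[i]
        w -= alpha * 2.0 * err_i * X[i]    # gradient based only on instance i
        b -= alpha * 2.0 * err_i
```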
Mini-Batch Gradient Descent
- Mini-batch GD computes the gradients on small random sets of instances called mini-batches
- batch_size is a hyperparameter
- Algorithm progresses through parameter space less erratically than Stochastic GD, especially with relatively large mini-batches
- Mini-batch GD walks around a bit closer to the minimum than Stochastic GD, but it can struggle to escape from local minima (see the sketch after this list)
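A minimal Mini-batch Gradient Descent sketch follows; the toy data, batch_size, and other hyperparameter values are assumed for illustration.

```python
# Sketch (assumed toy data): Mini-batch Gradient Descent for a linear model with
# MSE. Each update uses a small random subset of batch_size instances.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
y = X @ np.array([3.0, -1.0]) + 0.5

w, b, alpha, epochs = np.zeros(2), 0.0, 0.05, 10
n, batch_size = len(y), 32                          # batch_size is a hyperparameter

for epoch in range(epochs):
    order = rng.permutation(n)                      # reshuffle the data every epoch
    for start in range(0, n, batch_size):
        idx = order[start:start + batch_size]       # one mini-batch of indices
        err = X[idx] @ w + b - y[idx]
        w -= alpha * 2.0 / len(idx) * (X[idx].T @ err)  # averaged over the batch
        b -= alpha * 2.0 / len(idx) * err.sum()
```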
Summary of Algorithms
- Batch Gradient Descent:
- Updates based on the entire dataset
- High computational cost
- Cost function reduces slowly
- Stochastic Gradient Descent
- Updates based on a single observation
- Fast update, but it can take longer to converge
- High variation in the cost function
- Mini-batch Gradient Descent
- Updates based on a subset of data (batch size)
- Lower cost than batch gradient descent and faster than SGD
- Smoother cost function compared to SGD