32 Questions
What is a key benefit of using the ReLU activation function in deep neural networks?
It helps prevent vanishing gradient problems
Which activation function has been the most popular for deep neural networks since 2012?
ReLU
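The two answers above can be illustrated with a minimal sketch. The helper names `relu` and `relu_grad` are illustrative, not from the original material; the point is that ReLU's gradient is exactly 1 for active units, so it does not shrink as it is backpropagated through many layers.

```python
import numpy as np

def relu(x):
    # ReLU passes positive inputs through unchanged and zeroes out negatives
    return np.maximum(0.0, x)

def relu_grad(x):
    # The gradient is exactly 1 for positive inputs, so repeated backprop
    # through many layers does not shrink the signal (no vanishing gradient)
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))       # negatives clipped to 0
print(relu_grad(x))  # 1 wherever the unit is active
```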
What is a common issue with gradient descent when the learning rate is very small?
It settles at local minima too easily
What makes a large learning rate problematic in gradient descent?
It overshoots and bounces across the optimization path
Why is weight initialization important in deep neural networks?
It affects how quickly the network learns
What problem does ReLU help address in deep neural networks?
Vanishing gradient problem
What two hyperparameters are used in the Adam (adaptive moment) update?
Momentum (beta 1) and squared momentum (beta 2)
In the Adam update equation, what does $w_{t+1}$ represent?
The updated weights
What is the primary purpose of applying bias corrections to $v_{t+1}$ and $r_{t+1}$ in the Adam update equation?
To stabilize the algorithm convergence
What role does the hyperparameter squared momentum (beta 2) play in the Adam update?
It controls the decay rate of historical gradients
Which component in the Adam update equation is responsible for incorporating past gradients into the optimization process?
Momentum (beta 1)
What distinguishes the Adam update from RMSProp in terms of optimization performance?
Adam adds a momentum term (an EMA of past gradients), which RMSProp lacks
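The Adam questions above can be summarized in one sketch, assuming the standard formulation: $v$ is the EMA of gradients (momentum, beta 1), $r$ is the EMA of squared gradients (squared momentum, beta 2), both are bias-corrected by dividing by $1-\beta^t$ to stabilize early steps, and $w_{t+1}$ is the updated weight. The function name `adam_step` is illustrative.

```python
import numpy as np

def adam_step(w, grad, v, r, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update at step t (t starts at 1)."""
    v = beta1 * v + (1 - beta1) * grad           # EMA of gradients (momentum)
    r = beta2 * r + (1 - beta2) * grad**2        # EMA of squared gradients
    v_hat = v / (1 - beta1**t)                   # bias correction
    r_hat = r / (1 - beta2**t)                   # bias correction
    w = w - lr * v_hat / (np.sqrt(r_hat) + eps)  # w_{t+1}
    return w, v, r

w, v, r = np.array([1.0]), np.zeros(1), np.zeros(1)
w, v, r = adam_step(w, np.array([0.5]), v, r, t=1)
print(w)  # first step moves by roughly lr, regardless of gradient scale
```

After bias correction, the very first step has size close to the learning rate even though the EMAs start at zero, which is exactly why the corrections stabilize convergence.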
What is a key issue with weight initialization where all weights are set to zero?
All updates will be the same because all outputs will be the same
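The symmetry problem with all-zero initialization can be shown directly: every hidden unit computes the same output, so every unit receives an identical gradient and identical update, and the units never differentiate. This is a minimal sketch with a single tanh layer; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4,))          # one input sample
W = np.zeros((3, 4))               # three hidden units, all weights zero
h = np.tanh(W @ x)                 # every unit outputs the same value (0)
upstream = np.ones(3)              # pretend gradient from the next layer
grad_W = np.outer(upstream * (1 - h**2), x)  # backprop through tanh

# Every row of grad_W is identical, so every unit gets the same update
print(grad_W)
```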
Which regularization technique prevents overfitting by randomly deactivating units, effectively injecting noise into the network during training?
Dropout
What is the purpose of Xavier/Glorot weight initialization in deep neural networks?
To avoid exploding/vanishing gradients
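A minimal sketch of Xavier/Glorot uniform initialization: weights are drawn from $U(-\ell, \ell)$ with $\ell = \sqrt{6/(\text{fan\_in} + \text{fan\_out})}$, which keeps activation variance roughly constant across layers and so helps avoid exploding/vanishing gradients. The function name `xavier_uniform` is illustrative.

```python
import numpy as np

def xavier_uniform(fan_in, fan_out, rng):
    # limit chosen so that Var(W) = 2 / (fan_in + fan_out)
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

rng = np.random.default_rng(42)
W = xavier_uniform(256, 128, rng)
print(W.std())  # close to sqrt(2 / (256 + 128))
```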
What does L1 regularization encourage in neural networks?
Feature selection by driving some weights to exactly zero
How does batch normalization help with training deep neural networks?
By normalizing the input to each layer of the network
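A minimal sketch of the batch-norm forward pass (training mode, inference-time running statistics omitted): each feature is normalized over the batch to zero mean and unit variance, then rescaled by a learnable `gamma` and shifted by a learnable `beta`.

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    # Normalize each feature (column) over the batch dimension,
    # then apply the learnable scale and shift.
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 8))  # shifted, scaled inputs
y = batch_norm(x)
print(y.mean(axis=0))  # ~0 per feature
print(y.std(axis=0))   # ~1 per feature
```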
What is the main risk associated with using too many $\beta_i$ terms in a regression model?
Overfitting the data by capturing noise rather than signal
What does momentum in gradient descent indicate?
How much importance is given to past values
What is the purpose of implementing RMS Prop in SGD with Momentum?
To scale down the update for weights whose recent gradients have been large
How does RMS Prop differ from taking just the gradient for updating weights?
It considers EMA of gradient square instead of just gradient
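A minimal sketch of the RMSProp update described above: instead of stepping with the raw gradient, it keeps an EMA of the squared gradient and divides by its square root, so weights with large recent gradients get smaller updates. The function name `rmsprop_step` is illustrative.

```python
import numpy as np

def rmsprop_step(w, grad, r, lr=0.01, rho=0.9, eps=1e-8):
    r = rho * r + (1 - rho) * grad**2      # EMA of squared gradient
    w = w - lr * grad / (np.sqrt(r) + eps) # per-weight scaled update
    return w, r

w, r = np.array([1.0, 1.0]), np.zeros(2)
grad = np.array([10.0, 0.1])  # one large gradient, one small
w, r = rmsprop_step(w, grad, r)
print(w)  # both weights move by roughly the same amount
```

Note how the per-weight scaling equalizes the step sizes despite a 100x difference in raw gradient magnitude.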
In gradient descent with momentum, what does the term 'rho' represent?
The decay factor that controls how much weight past gradients receive in the running average
Why is it important to avoid zig-zag movements in gradient descent?
To improve convergence speed
What is the role of momentum in updating weights in gradient descent?
To take the average of past few gradients
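The momentum questions above can be sketched as follows: the velocity `v` is a decaying average of past gradients (with `rho` setting how much the past counts), so an oscillating zig-zag gradient component largely cancels out instead of being followed step by step. The function name `momentum_step` is illustrative.

```python
import numpy as np

def momentum_step(w, grad, v, lr=0.1, rho=0.9):
    # v accumulates a decaying average of past gradients; zig-zag
    # components cancel, consistent components build up speed.
    v = rho * v + grad
    w = w - lr * v
    return w, v

w, v = np.array([0.0]), np.zeros(1)
for g in [1.0, -1.0, 1.0, -1.0]:   # oscillating gradient component
    w, v = momentum_step(w, np.array([g]), v)
print(v)  # velocity stays small: the oscillation has been damped
```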
What is the purpose of Regularization in Standard Least Squares Regression?
To minimize the sum of squared errors
In the context of Regularization, what does penalizing large coefficients help prevent?
Overfitting
What is the key difference between Ridge and Lasso regularization functions?
Ridge penalizes the sum of squared coefficients (L2), shrinking weights but rarely to zero; Lasso penalizes the sum of absolute values (L1), which can drive weights exactly to zero
What happens to the model complexity as lambda value increases in Regularization?
Model complexity decreases
Which term refers to the vector of all training responses in the context of Regularization?
$y$
What does N represent in the equation for Regularization?
Number of training samples
What is the formula used to minimize the sum of squared errors with regularization included?
$\min_{\beta}\left(\sum_{i=1}^{N}(y_i - x_i^\top\beta)^2 + \lambda\sum_{j}\beta_j^2\right)$ (Ridge; Lasso uses $|\beta_j|$ in place of $\beta_j^2$)
How does high lambda value affect the complexity of models in Regularization?
Decreases model complexity
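The regularization questions above can be demonstrated with Ridge regression, which has the closed form $\beta = (X^\top X + \lambda I)^{-1} X^\top y$: as $\lambda$ increases, the coefficients shrink and model complexity decreases. This is a minimal sketch on synthetic data; the names and the true coefficient vector are illustrative.

```python
import numpy as np

def ridge(X, y, lam):
    # Closed-form Ridge solution: (X^T X + lam * I)^{-1} X^T y
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(1)
N, p = 50, 5
X = rng.normal(size=(N, p))
y = X @ np.array([2.0, -1.0, 0.5, 0.0, 0.0]) + 0.1 * rng.normal(size=N)

b_small = ridge(X, y, lam=0.01)
b_large = ridge(X, y, lam=100.0)
print(np.linalg.norm(b_small), np.linalg.norm(b_large))  # larger lambda, smaller coefficients
```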
Explore the differences between RMSProp and Adam updates in neural networks, including their implementation and performance in tackling saddle points. Learn about the hyperparameters involved in Adam update for improved optimization.