Questions and Answers
According to Mitchell's definition, what is essential for a computer program to be considered as 'learning'?
- The capacity to store and retrieve large amounts of data.
- Improvement in performance at a task T, as measured by P, with experience E. (correct)
- The ability to solve complex mathematical problems.
- The capability to mimic human behavior.
What distinguishes deep learning from traditional machine learning?
- Deep learning is primarily used for data storage and retrieval.
- Deep learning uses simpler algorithms that are easier to train.
- Deep learning extracts patterns from data using neural networks with multiple layers. (correct)
- Deep learning relies solely on explicitly programmed rules.
How do deep neural networks handle complex tasks compared to shallow networks, assuming they express the same function?
- Deeper networks and shallow networks require the same amount of neurons.
- Deeper networks do not have the capacity to handle complex tasks.
- Deeper networks require exponentially more neurons than shallow networks.
- Deeper networks typically require exponentially fewer neurons than shallow networks. (correct)
Which of the following could negatively affect the ability of a training algorithm to learn a function, even if a large Multi-Layer Perceptron (MLP) is capable of representing that function?
What is a key characteristic of a feed-forward neural network architecture?
What does the Universal Approximation Theorem for neural networks mainly imply?
Which factor primarily contributes to the current effectiveness of deep learning?
What is the role of Keras in the context of TensorFlow?
What is the purpose of an activation function in a neural network?
Why are non-linear activation functions necessary in deep neural networks?
What is the main function of a loss function in machine learning?
In the context of a classification problem, what does the Cross-Entropy Loss measure?
What is the primary role of the Softmax function in neural networks?
During backpropagation, what is the main purpose of computing the gradient of the loss function with respect to the weights?
Why does the backpropagation algorithm require that the loss function be continuous and differentiable?
Which of the following is a typical step in the backpropagation algorithm after the initialization of weights?
In the context of training neural networks, what is the purpose of gradient descent?
What characterizes the 'stochastic' aspect of stochastic gradient descent (SGD)?
What is a potential problem with gradient descent, especially in deep networks, that adaptive learning rules aim to address?
What is the main issue with using Stochastic Gradient Descent (SGD) on non-convex loss functions?
Why is initializing weights to small random values important when training feedforward neural networks using SGD?
What is the key difference between batch mode and sequential mode (on-line) gradient descent?
What does 'regularization' primarily aim to prevent in the context of machine learning?
How does dropout regularization work in neural networks?
What is one of the main benefits of using dropout regularization in neural networks?
In the context of neural network training, what is 'early stopping' used for?
Which of the following network architectures is best suited for processing sequential data, like time series?
If you want to generate new data that resembles your training dataset, which unsupervised learning architecture could be useful?
Which type of neural network architecture is often used for tasks that involve making decisions or taking actions in an environment to maximize a reward?
Which of the following are valid reasons why deep learning is so popular now?
What qualities should a neural network have?
What activation function is defined as $g(z) = \max(0, z)$?
What functions does Backpropagation compose?
What is the meaning of $W \leftarrow W - \eta \frac{dJ(W)}{dW}$?
What factors could affect the rate at which learning occurs?
What is the sequential mode of training also known as?
What is batch mode?
If a model does not have enough capacity to fully learn the data, what is this called?
What is the ideal initialization of biases for SGD?
What do you call the process of modifying a learning algorithm so as to prevent overfitting?
Flashcards
ML Algorithm
An ML algorithm is able to learn from data.
Deep Learning
An ML technique that employs deep neural networks.
A Deep Neural Network
A multi-layered neural network that contains two or more hidden layers.
Expressibility
The class of functions a neural network can express.
Efficiency in NNs
The resources required to approximate a given function.
Learnability in NNs
How fast a neural network learns good parameters for approximating a function.
Loss Function
Quantifies the gap between a prediction and the ground truth.
Backpropagation Algorithm
Finds a minimum of the loss function in weight space by gradient descent.
Universal Approximation Theorem
A feedforward neural network with a single hidden layer can approximate any continuous function to arbitrary precision.
Robustness in Neural Networks
Avoiding excessive inter-dependencies between nodes, so the network still performs the task when some nodes are lost.
Regularization
Modifying a learning algorithm to prevent overfitting.
Compute Gradient
The backpropagation step that computes the gradient of the loss function with respect to the weights.
Update Weights
The gradient-descent step that adjusts the weights in the direction that reduces the loss.
Backpropagation
Applies the chain rule to compute the derivative of the loss function with respect to each weight.
Dropout Regularization
Randomly drops units from the neural network during training.
Convex optimization algorithms
Optimization algorithms with global convergence guarantees, used for logistic regression or SVMs.
Dropout
A training strategy that ignores a fraction of hidden neurons in each iteration, setting their activations to zero.
Machine Learning
Learning from experience E with respect to task T and performance measure P, such that performance at T, as measured by P, improves with E.
Algorithm Stoppage
Training is stopped once the error function value is sufficiently small.
Mean squared error
The loss function typically used for regression problems.
Study Notes
- Deep Learning 1 is taught by Dr. Shabnam N. Kadir at the University of Hertfordshire; these notes correspond to the lecture of March 9, 2025.
References
- Some useful links for further study include:
- https://d2l.ai/
- https://www.deeplearningbook.org/
- https://machinelearningmastery.com/inspirational-applications-deep-learning/
- https://machinelearningmastery.com/tutorial-first-neural-network-python-keras/
- https://playground.tensorflow.org/ (Tinker with a neural network)
Introduction to Tensorflow
- Python and Tensorflow will be used for implementation.
- Further information can be found at:
- https://www.tensorflow.org/overview
- https://blog.tensorflow.org/2019/02/introducing-tensorflow-datasets.html
- Jupyter notebooks will be used.
- Useful link: https://colab.research.google.com/notebooks/welcome.ipynb
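- As a quick sanity check of the toolchain, here is a minimal, hypothetical Keras example (random toy data and arbitrary layer sizes, purely for illustration):

```python
# Minimal Keras sketch: a tiny classifier trained on random data,
# just to confirm the toolchain works (e.g. inside a Colab notebook).
import numpy as np
import tensorflow as tf

X = np.random.rand(100, 4).astype("float32")   # 100 toy examples, 4 features
y = np.random.randint(0, 3, size=(100,))       # 3 arbitrary classes

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=16, verbose=0)
```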
Practical Overview
- The practicals cover neural networks, optimization theory (loss functions, gradients), and implementation in a programming language such as Python.
Machine Learning Algorithms
- Machine Learning (ML) algorithm is able to learn from data.
- Mitchell (1997) defined a computer program as learning from experience E, with respect to task T and performance measure P, if its performance at task T improves with experience E, as measured by P.
- ML allows for solutions to complex tasks that are hard to solve with fixed, human-designed programs.
What is Deep Learning?
- Deep learning extracts patterns from data using neural networks.
Deep Learning
- Deep learning is a machine learning technique using deep neural networks.
- Deep neural networks are multi-layered, containing two or more hidden layers.
- The weights of these networks must be adjusted to minimize a loss/cost/error function.
Neurons and the Perceptron
- Diagram of the basic mathematical model of a neuron depicting inputs, weights, summation, and output.
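- In code, the same model is only a few lines of numpy; the input, weight, and bias values below are made up for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])   # inputs
w = np.array([0.4, 0.1, -0.6])   # weights
b = 0.2                          # bias

z = np.dot(w, x) + b             # summation: weighted sum of inputs plus bias
output = sigmoid(z)              # activation applied to the sum gives the output
```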
Feed-forward Neural Network Architecture
- Key feature: no feedback connections where outputs are fed back into the network.
- Object recognition is an important application.
- Convolutional Neural Networks (CNNs) are a specialized type of feed-forward neural network inspired by the visual system of the brain.
Deep Neural Networks
- Deep neural networks pass an input through successive layers, extracting increasingly abstract features of the input.
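- For concreteness, a sketch of such a network in Keras (layer sizes are arbitrary; the input size 784 assumes flattened 28×28 images):

```python
import tensorflow as tf

# A feed-forward network (no feedback connections) with two hidden layers,
# i.e. a "deep" network by the definition above.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),                     # e.g. a flattened 28x28 image
    tf.keras.layers.Dense(64, activation="relu"),     # hidden layer 1
    tf.keras.layers.Dense(32, activation="relu"),     # hidden layer 2
    tf.keras.layers.Dense(10, activation="softmax"),  # output layer
])
```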
Why Now?
- The core algorithms to train deep neural networks have existed for decades.
Big Data Impact
- The availability of larger datasets facilitates deep learning.
- Easier collection and storage of data are enablers.
Hardware Advancements
- Graphics Processing Units (GPUs) provide the necessary computational power.
- GPUs allow for massively parallelizable computations.
Software Improvements
- There are now improved techniques, new models, and toolboxes.
Universal Approximation Theorem
- A feedforward neural network with a single hidden layer can approximate any continuous function to arbitrary precision (Hornik 1989, Cybenko 1989).
Universal Approximation Theorem: Implications
- A large Multi-Layer Perceptron (MLP) can represent a wide range of functions, according to the universal approximation theorem.
- Achieving learnability with an MLP is not guaranteed and can fail for a few reasons.
- Training optimization algorithms might fail to find the correct parameter values.
- The chosen training algorithm might result in overfitting.
The Unreasonable Effectiveness of Deep Learning
- A shallow network has few layers and many neurons per layer, which makes it computationally intensive.
- A deep network has many layers and relatively few neurons per layer achieving high levels of abstraction.
Quality of a Neural Network
- The quality of a neural network depends on expressibility, efficiency and learnability.
- Expressibility is the class of functions the neural network can express.
- Efficiency is the resources required to approximate a given function.
- Learnability refers to how fast the neural network learns good parameters for approximation.
The Unreasonable Effectiveness of Deep Learning
- Deeper neural networks often require fewer neurons than shallow networks to express the same function.
Choice of Activation Functions
- Activation function choices include Sigmoid, Hyperbolic Tangent or ReLU.
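- For reference, the three activation functions written out in numpy:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    return np.tanh(z)

def relu(z):
    return np.maximum(0.0, z)   # g(z) = max(0, z)
```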
Why Non-Linear Activation Functions?
- Linear activation functions produce linear decisions, regardless of network size.
- Non-linearities enable the approximation of arbitrarily complex functions.
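- A tiny numerical illustration of the first point: composing two purely linear layers collapses into a single linear layer, so depth alone adds no expressive power (shapes and values below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3))   # "layer 1" weights (no activation)
W2 = rng.standard_normal((2, 4))   # "layer 2" weights (no activation)
x = rng.standard_normal(3)

# Two stacked linear layers are equivalent to one linear layer:
deep_linear = W2 @ (W1 @ x)
single_linear = (W2 @ W1) @ x
assert np.allclose(deep_linear, single_linear)
```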
Loss Functions
- Loss functions quantify the gap between prediction and ground truth.
- For Regression use Mean Squared Error (MSE).
- For classification use Cross Entropy Loss.
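- A minimal numpy sketch of both losses, using toy numbers:

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean Squared Error, typically used for regression.
    return np.mean((y_true - y_pred) ** 2)

def cross_entropy(y_true_onehot, y_prob):
    # Cross-entropy loss over predicted class probabilities (classification).
    eps = 1e-12                  # avoids log(0)
    return -np.mean(np.sum(y_true_onehot * np.log(y_prob + eps), axis=1))

print(mse(np.array([1.0, 2.0]), np.array([1.5, 1.0])))                    # 0.625
print(cross_entropy(np.array([[0, 1, 0]]), np.array([[0.2, 0.7, 0.1]])))  # ~0.357
```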
Softmax Function
- It converts a set of outputs into a probability distribution.
- Typically utilized in the final layer of neural network classifiers.
- Formula: $y_i = \frac{e^{h_i}}{\sum_j e^{h_j}}$
- Good reference: https://towardsdatascience.com/cross-entropy-loss-function-f38c4ec8643e
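- A direct implementation of the formula (subtracting the maximum is a standard numerical-stability trick, not part of the definition):

```python
import numpy as np

def softmax(h):
    e = np.exp(h - np.max(h))   # subtracting max(h) avoids overflow, result unchanged
    return e / np.sum(e)        # y_i = e^{h_i} / sum_j e^{h_j}

scores = np.array([2.0, 1.0, 0.1])   # arbitrary output-layer values
probs = softmax(scores)              # non-negative and sums to 1
```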
Backpropagation
- The backpropagation algorithm finds the minimum of the loss function in weight space.
- Method of gradient descent.
- Minimizing the loss function with an appropriate combination of weights can be considered a solution.
- Computation of the gradient of the loss function at each iteration requires that the loss function be continuous and differentiable.
Backpropagation: Function Composition Algorithm
- Decompose the algorithm into the following steps after random initialization of weights:
- Feed-forward computation.
- Backpropagation to the output layer.
- Backpropagation to each hidden layer.
- Weight updates.
- The algorithm is stopped once the error function value is sufficiently small.
Backpropagation Algorithm
- Initialize weights randomly.
- Loop until convergence.
- Compute gradient.
- Update weights.
- Return weights.
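- A minimal sketch of this loop using TensorFlow's automatic differentiation; the toy data, tiny model, and fixed iteration count stand in for a real convergence test:

```python
import tensorflow as tf

# Toy data and a tiny linear model, purely to illustrate the loop structure.
X = tf.random.normal((32, 3))
y = tf.random.normal((32, 1))
model = tf.keras.Sequential([tf.keras.Input(shape=(3,)), tf.keras.layers.Dense(1)])
loss_fn = tf.keras.losses.MeanSquaredError()
eta = 0.01                                                  # learning rate

for step in range(100):                                     # "loop until convergence"
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(X))                         # feed-forward computation
    grads = tape.gradient(loss, model.trainable_variables)  # compute gradient
    for w, g in zip(model.trainable_variables, grads):      # update weights
        w.assign_sub(eta * g)
```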
Backpropagation: Chain Rule
- The input sum of neuron k in layer l is the weighted sum of the outputs of the neurons j in the previous layer.
- Apply an activation function (σ, ReLU, etc.) to the weighted sum.
- Carry this calculation through each subsequent layer.
- Computing the derivative of the loss function with respect to each weight requires use of the Chain Rule.
- Update the weights through gradient descent (a worked sketch follows below).
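- A worked numpy example of the chain rule on a tiny one-hidden-layer network (sigmoid hidden units, one linear output, squared-error loss; all sizes and values are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.standard_normal(3)                           # one input example
y = 1.0                                              # its target value
W1, b1 = rng.standard_normal((4, 3)), np.zeros(4)    # hidden layer parameters
W2, b2 = rng.standard_normal(4), 0.0                 # output layer parameters

# Forward pass: the network is a composition of layer functions.
z1 = W1 @ x + b1
a1 = sigmoid(z1)
y_hat = W2 @ a1 + b2
loss = 0.5 * (y_hat - y) ** 2

# Backward pass: apply the chain rule from the loss back towards the input.
d_yhat = y_hat - y                   # dL/dy_hat
dW2 = d_yhat * a1                    # dL/dW2
db2 = d_yhat                         # dL/db2
d_a1 = d_yhat * W2                   # dL/da1
d_z1 = d_a1 * a1 * (1 - a1)          # through the sigmoid: da1/dz1 = a1 * (1 - a1)
dW1 = np.outer(d_z1, x)              # dL/dW1
db1 = d_z1                           # dL/db1

eta = 0.1                            # gradient-descent update of every parameter
W2 -= eta * dW2; b2 -= eta * db2
W1 -= eta * dW1; b1 -= eta * db1
```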
Gradient Descent Intuition
- Optimization is achieved through gradient descent.
Gradient Descent, Chain Rule
- Helpful link: https://kratzert.github.io/2016/02/12/understanding-the-gradient-flow-through-the-batch-normalization-layer.html
Vanishing/Exploding Gradient Problems
- Gradients can either explode or vanish, which poses challenges.
Gradient Descent
- Loss functions can be difficult to optimize.
- The non-linearity of activation functions causes most interesting loss functions to become non-convex.
Adaptive Learning Rules
- Learning rates are not fixed but can be adjusted based on:
- Size of the gradient.
- Size of particular weights.
- How fast learning is occurring.
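- Adam is one widely used adaptive rule (named here as an example, not taken from the slides); in Keras it can be selected when compiling a model:

```python
import tensorflow as tf

# Adam adapts a per-parameter step size from the history of gradients,
# instead of using one fixed global learning rate.
model = tf.keras.Sequential([tf.keras.Input(shape=(3,)), tf.keras.layers.Dense(1)])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3), loss="mse")
```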
SGD and Non-Convexity
- Convex optimization algorithms with global convergence guarantees are used for logistic regression or SVMs.
- Stochastic gradient descent (SGD) applied to non-convex loss functions has no such convergence guarantee and is sensitive to the values of the initial parameters.
- SGD is only guaranteed to converge at a local minimum.
- Overfitting can be a problem.
SGD initialization of weights
- For feedforward neural networks, initialize all weights to small random values.
- Initialize biases to zero or to small positive values.
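- In Keras this can be stated explicitly when building a layer (the stddev below is illustrative; Keras also provides sensible defaults such as Glorot initialization):

```python
import tensorflow as tf

# Small random weights, zero biases.
layer = tf.keras.layers.Dense(
    units=32,
    kernel_initializer=tf.keras.initializers.RandomNormal(mean=0.0, stddev=0.01),
    bias_initializer="zeros",
)
```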
Gradient Descent: Sequential vs. Batch Modes
- Sequential training mode is also known as on-line, pattern, or stochastic mode, where weights are updated after each example.
- The term "stochastic" refers to the gradient (based on a single training sample) being a "stochastic approximation" of the "true" cost gradient.
- Batch mode updates weights only after the complete presentation of the training set during each sweep or epoch and is impractical for very large datasets.
- Further details: https://machinelearningmastery.com/how-to-control-the-speed-and-stability-of-training-neural-networks-with-gradient-descent/
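- With Keras, the update granularity is controlled by the batch_size argument of model.fit; the snippet below assumes a compiled model and training arrays already exist:

```python
# Assumes a compiled `model` and arrays `X_train`, `y_train` already exist.
model.fit(X_train, y_train, batch_size=1, epochs=10)             # sequential / on-line / stochastic mode
model.fit(X_train, y_train, batch_size=len(X_train), epochs=10)  # batch mode: one update per epoch
model.fit(X_train, y_train, batch_size=32, epochs=10)            # mini-batch: the usual compromise
```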
Regularization
- Modifies a learning algorithm to prevent overfitting.
Dropout Regularization
- Randomly drops units from the neural network during training.
- Dropout is a training strategy that ignores a fraction of hidden neurons during training: their weights are not updated and their activations are set to zero.
- Each iteration drops a different set of neurons.
- Reference: Srivastava et al. 2014
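- In Keras, dropout is applied as a layer between two other layers; the rate and layer sizes below are illustrative:

```python
import tensorflow as tf

# Dropout as a layer: during training, 50% of the preceding layer's activations
# are randomly zeroed each iteration; at inference time dropout is switched off.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(10, activation="softmax"),
])
```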
Benefits of Dropout Regularization
- It forces networks not to rely on any one node, discouraging memorization.
- Robustness: prevents excessive inter-dependencies from emerging between nodes, which allows the network to learn more robust relationships.
- Similar to brain function where losing a few neurons still allows task completion.
- Computationally cheaper (time & storage) than averaging a committee of networks.
Regularization: Early Stopping
- Stopping training when performance on a validation set starts to degrade.
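- A sketch of early stopping with the Keras EarlyStopping callback; the model, training data, and validation data are assumed to exist already:

```python
import tensorflow as tf

# Assumes `model`, `X_train`, `y_train`, `X_val`, `y_val` already exist.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",            # watch validation performance
    patience=5,                    # stop after 5 epochs without improvement
    restore_best_weights=True)     # roll back to the best weights seen

model.fit(X_train, y_train,
          validation_data=(X_val, y_val),
          epochs=200,
          callbacks=[early_stop])
```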
Architectural Paradigms
- Common neural network architectures include feedforward networks, convolutional networks, recurrent networks, autoencoders, generative adversarial networks, and networks for actions, values, policies and models.
- Useful reference: https://blog.tensorflow.org/2019/02/mit-deep-learning-basics-introduction-tensorflow.html