Neural Networks & Activation Functions

Questions and Answers

Which activation function is generally recommended as the default for hidden layers in modern neural networks?

Softmax
ReLU or GELU (correct)
Sigmoid
Tanh
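
Both recommended hidden-layer defaults fit in a few lines. This is a minimal plain-Python sketch, not a library implementation. It uses the exact erf-based form of GELU; some frameworks also offer a faster tanh-based approximation. The function names are illustrative.

```python
import math

def relu(x: float) -> float:
    """ReLU: passes positive inputs through unchanged and maps negatives to 0."""
    return max(0.0, x)

def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"x={x:+.1f}  relu={relu(x):.4f}  gelu={gelu(x):.4f}")
```

Unlike ReLU, GELU is smooth and lets small negative values through, for example `gelu(-0.5)` is slightly below zero.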

For regression problems, Sigmoid is the recommended activation function for the output layer of a neural network.

False

In multi-class classification problems, what type of activation function is typically applied to the output layer?

Softmax
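
Softmax turns the output layer's raw scores into a probability distribution over the classes. A minimal plain-Python sketch; subtracting the largest score first is a standard numerical-stability trick that leaves the result unchanged.

```python
import math

def softmax(logits):
    """Map raw class scores to probabilities that are positive and sum to 1."""
    # Shifting by the max does not change the result but keeps exp() from overflowing.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # → [0.659, 0.242, 0.099]
```

The predicted class is the index with the largest probability, here index 0.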

In binary classification, the cost function aims to minimize the predicted probability $p_i$ if $y_i = 0$ and to ________ it if $y_i = 1$.

maximize
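
The quiz does not name the cost function. Assuming it is the usual binary cross-entropy, with $p_i$ the predicted probability that $y_i = 1$, a short sketch shows the behaviour in the question: the loss falls as $p_i$ falls when $y_i = 0$, and falls as $p_i$ rises when $y_i = 1$. The clipping constant `eps` is an illustrative choice.

```python
import math

def binary_cross_entropy(y: int, p: float, eps: float = 1e-12) -> float:
    """Loss for one example with label y in {0, 1} and predicted P(y = 1) = p."""
    p = min(max(p, eps), 1.0 - eps)  # clip so log() never receives 0
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

# y = 0: a smaller p gives a smaller loss, so training pushes p down.
print(binary_cross_entropy(0, 0.9), binary_cross_entropy(0, 0.1))
# y = 1: a larger p gives a smaller loss, so training pushes p up.
print(binary_cross_entropy(1, 0.1), binary_cross_entropy(1, 0.9))
```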

According to the Universal Approximation Theorem, what is a primary characteristic of feedforward networks with hidden layers?

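They provide a universal approximation framework: given enough hidden units and a non-linear activation, a network with even a single hidden layer can approximate any continuous function on a compact (closed and bounded) domain to arbitrary accuracy.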

Match the use case with the appropriate activation function:

Binary Classification = Sigmoid
Multiclass Classification = Softmax
Hidden Layers = ReLU/GELU
Regression = Linear

What is the primary limitation of a single-layer perceptron?

It can only solve linearly separable problems.
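
To see the limitation concretely, a minimal sketch of the classic perceptron learning rule (the learning rate and epoch cap are illustrative assumptions) finds a separating line for AND but never stops making mistakes on XOR, whose positive and negative points cannot be split by any single straight line.

```python
def step(z: float) -> int:
    """Threshold unit: 1 if the weighted sum is positive, else 0."""
    return 1 if z > 0 else 0

def train_perceptron(data, epochs=100, lr=1.0):
    """Perceptron learning rule; returns (weights, bias, converged)."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), target in data:
            err = target - step(w[0] * x1 + w[1] * x2 + b)
            if err != 0:
                errors += 1
                # Nudge the decision line toward the misclassified point.
                w[0] += lr * err * x1
                w[1] += lr * err * x2
                b += lr * err
        if errors == 0:  # a full pass with no mistakes: the data is separated
            return w, b, True
    return w, b, False

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

print("AND converged:", train_perceptron(AND)[2])  # → True
print("XOR converged:", train_perceptron(XOR)[2])  # → False
```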

Moravec's Paradox highlights that machines are better at tasks requiring sensory perception than at logical reasoning.

False

Briefly describe the core idea behind 'Path 1: Better Inputs' in machine learning.

Encode domain knowledge to help machine learning algorithms.

Deep learning models are massively optimized with ______ to encode domain knowledge.

stochastic gradient descent

Match the following historical challenges faced by early neural networks with their corresponding descriptions:

Lack of processing power = Limited computational resources hindered the training of complex models.
Overfitting = Models learned the training data too well, leading to poor generalization on new data.
Vanishing gradients = Gradients became too small during training, preventing weights from updating effectively in deeper layers.
Lack of data = Insufficient amounts of data made it difficult to train robust and generalizable models.

Which of the following is a characteristic of deep learning, as described in the content?

Parametric, non-linear and hierarchical

Why are activation functions necessary in neural networks?

To introduce non-linearity, allowing the network to learn complex patterns.
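
Without a non-linearity, stacking layers buys nothing: two affine layers compose into a single affine layer with $W = W_2 W_1$ and $b = W_2 b_1 + b_2$. The sketch checks this numerically with small hand-picked (hypothetical) weights stored in plain Python lists.

```python
def matvec(W, x):
    """Matrix-vector product W @ x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def matmul(A, B):
    """Matrix product A @ B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def vadd(u, v):
    return [a + b for a, b in zip(u, v)]

W1, b1 = [[1.0, 2.0], [3.0, -1.0]], [0.5, -0.5]
W2, b2 = [[2.0, 0.0], [1.0, 1.0]], [1.0, 0.0]
x = [1.0, 2.0]

# Two stacked layers with no activation function in between...
two_layers = vadd(matvec(W2, vadd(matvec(W1, x), b1)), b2)
# ...give exactly the same output as one collapsed linear layer.
W, b = matmul(W2, W1), vadd(matvec(W2, b1), b2)
one_layer = vadd(matvec(W, x), b)
print(two_layers, one_layer)  # → [12.0, 6.0] [12.0, 6.0]
```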

Using different activation functions in each hidden layer of a neural network is a common practice to optimize performance.

False

What is a key characteristic that activation functions must possess for use in neural networks?

differentiable

Activation functions with a limited output range are often called ' ______ functions'.

squashing

In deep feedforward networks, what is the primary goal?

To approximate a function $f$ and learn the best parameters $\theta$ for that approximation.

Which of the following is an advantage of using the Tanh activation function over the Sigmoid function in hidden layers?

Tanh outputs are centered around 0, which can lead to stronger gradients.
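
A short numerical comparison, assuming a handful of illustrative inputs symmetric around zero: both functions squash their inputs into a bounded range, but tanh outputs average to about 0 while sigmoid outputs average to about 0.5, and tanh's derivative peaks at 1 versus 0.25 for sigmoid.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def d_sigmoid(x: float) -> float:
    s = sigmoid(x)
    return s * (1.0 - s)

def d_tanh(x: float) -> float:
    return 1.0 - math.tanh(x) ** 2

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]  # inputs symmetric around 0

# Centering: sigmoid's range (0, 1) sits around 0.5, tanh's range (-1, 1) around 0.
mean_sigmoid = sum(sigmoid(x) for x in xs) / len(xs)
mean_tanh = sum(math.tanh(x) for x in xs) / len(xs)
print(mean_sigmoid, mean_tanh)

# Both slopes peak at x = 0, where tanh's is 4x larger than sigmoid's.
print(d_sigmoid(0.0), d_tanh(0.0))  # → 0.25 1.0
```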

Recurrent neural networks (RNNs) are characterized by the absence of feedback connections, distinguishing them from feedforward networks.

False

ReLU activation functions are zero-centered.

False
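
A quick check with illustrative inputs symmetric around zero: ReLU produces only non-negative outputs, so their mean is positive rather than zero.

```python
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]   # inputs centered on 0
outs = [max(0.0, x) for x in xs]   # ReLU
mean_out = sum(outs) / len(outs)
print(outs, mean_out)  # → [0.0, 0.0, 0.0, 1.0, 2.0] 0.6
```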

In the context of neural networks, what is a 'module'?

A building block or transformation, such as a function, that receives input and returns an output based on its activation function.

What is a potential problem associated with ReLU neurons, where they become inactive for all inputs?

dead neurons
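
The mechanism behind dead neurons is that ReLU's gradient is exactly zero for negative pre-activations. In this sketch, a hypothetical neuron's bias has drifted so far negative (for instance after one oversized update) that every input in the assumed data range produces a negative pre-activation, so no gradient reaches its weight or bias and it cannot recover.

```python
def relu_grad(z: float) -> float:
    """Derivative of ReLU w.r.t. its input (taking 0 at z == 0, a common convention)."""
    return 1.0 if z > 0 else 0.0

# Hypothetical neuron parameters after a bad update.
w, b = 0.5, -10.0
inputs = [-3.0, 0.0, 1.0, 4.0, 8.0]   # assumed range of the training data

preacts = [w * x + b for x in inputs]
grads = [relu_grad(z) for z in preacts]
print(preacts)  # all negative
print(grads)    # all 0.0, so gradient descent never changes w or b
```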

During the training of Multilayer Perceptrons (MLPs), weights and biases are learned through 'forward-backward' propagation, which involves mapping input to predicted output, comparing this output to the ground truth, and then propagating __________ to correct predictions.

gradients
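
A minimal sketch of the forward-backward loop, assuming a toy network with a single tanh hidden unit, a linear output, a squared-error loss, and a single training pair; all values are illustrative. Gradients are derived by hand with the chain rule rather than by a framework's automatic differentiation.

```python
import math

# Toy network: x -> h = tanh(w1*x + b1) -> y_hat = w2*h + b2
w1, b1, w2, b2 = 0.5, 0.0, 0.5, 0.0
x, y = 1.0, 0.8   # one (input, ground-truth) pair
lr = 0.1

for _ in range(200):
    # Forward: map the input to a predicted output.
    z = w1 * x + b1
    h = math.tanh(z)
    y_hat = w2 * h + b2
    loss = 0.5 * (y_hat - y) ** 2   # compare the prediction with the ground truth

    # Backward: propagate gradients from the loss to every parameter.
    d_yhat = y_hat - y              # dL/dy_hat
    d_w2 = d_yhat * h
    d_b2 = d_yhat
    d_h = d_yhat * w2
    d_z = d_h * (1.0 - h ** 2)      # tanh'(z) = 1 - tanh(z)^2
    d_w1 = d_z * x
    d_b1 = d_z

    # Update: one gradient-descent step per parameter.
    w1 -= lr * d_w1
    b1 -= lr * d_b1
    w2 -= lr * d_w2
    b2 -= lr * d_b2

print(f"final loss: {loss:.2e}")
```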

Which activation function is designed to address the 'dead neurons' problem by allowing a small, positive gradient when the unit is not active?

Leaky ReLU
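
Leaky ReLU keeps a small slope `alpha` on the negative side, so the gradient there is small but never zero. The value 0.01 is a commonly used default and is an assumption here; the pre-activations are illustrative.

```python
def leaky_relu(x: float, alpha: float = 0.01) -> float:
    """Identity for x > 0; a small slope alpha (instead of a flat 0) for x <= 0."""
    return x if x > 0 else alpha * x

def leaky_relu_grad(x: float, alpha: float = 0.01) -> float:
    """Derivative: 1 on the active side, alpha (never exactly 0) on the inactive side."""
    return 1.0 if x > 0 else alpha

# Negative pre-activations that would give a plain ReLU neuron zero gradient:
preacts = [-11.5, -10.0, -6.0]
print([leaky_relu_grad(z) for z in preacts])  # → [0.01, 0.01, 0.01]
```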

Match each component with its corresponding description in the context of neural networks:

Input ($x$) = Data fed into the first layer of the network
Parameters ($\theta$) = Values that are learned during training to optimize the network's function approximation
Activation Function ($h$) = A (non-)linear function applied to the output of each layer
Output ($a$) = The result produced by a module based on its activation function

Which of the following is NOT a requirement for activation functions in neural networks?

Must be computationally expensive.

In Parametric ReLU (PReLU), the slope of the inactive part of the function is treated as a ______ parameter.

learnable
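
In PReLU the negative-side slope `a` is updated by backpropagation like any other weight: its gradient is the input itself on the inactive side and zero on the active side. In this sketch, the starting value 0.25, the learning rate, and the upstream gradients are hypothetical numbers chosen for illustration.

```python
def prelu(x: float, a: float) -> float:
    """PReLU: like Leaky ReLU, but the negative-side slope `a` is learned."""
    return x if x > 0 else a * x

def prelu_grad_a(x: float) -> float:
    """d prelu / d a: 0 on the active side, x on the inactive side."""
    return 0.0 if x > 0 else x

a, lr = 0.25, 0.1
xs = [-2.0, -1.0, 3.0]
upstream = [0.5, -0.2, 1.0]   # hypothetical dL/d(output) values from backpropagation

# Chain rule: dL/da = sum over inputs of dL/d(output) * d(output)/da.
grad_a = sum(g * prelu_grad_a(x) for g, x in zip(upstream, xs))
a -= lr * grad_a
print(a)  # ≈ 0.33
```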

Which of the following activation functions is most suitable for the output layer when the task is to emulate probabilities?

Sigmoid

What must be done when there are cycles in the architecture of blocks in a recurrent network?

Unfold (unroll) the graph across time steps so that it becomes acyclic; architectures with such cycles are referred to as recurrent networks.
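
A minimal sketch of unfolding, assuming a scalar RNN with update $h_t = \tanh(w_x x_t + w_h h_{t-1} + b)$: the feedback loop becomes a chain of copies of the same block, one per time step, all sharing the same parameters. The parameter values are hypothetical.

```python
import math

def rnn_unrolled(xs, h0, w_x, w_h, b):
    """Unfold h_t = tanh(w_x * x_t + w_h * h_{t-1} + b) over the sequence xs.

    Each loop iteration is one copy of the recurrent block; every copy reuses
    the same parameters (w_x, w_h, b), and the output of step t feeds step t+1.
    """
    hs = []
    h = h0
    for x in xs:
        h = math.tanh(w_x * x + w_h * h + b)
        hs.append(h)
    return hs

states = rnn_unrolled([1.0, 0.0, -1.0], h0=0.0, w_x=0.5, w_h=0.8, b=0.0)
print(states)
```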

In feedforward networks, layers apply a series of functions. The notation $a^L = f(x; \theta) = (h^L \circ h^{L-1} \circ \dots \circ h^1)(x)$ shows that each function $h^l$ is parameterized by its own parameters ________.

$\theta^l$
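
The composition can be sketched directly: each module $h^l$ owns its parameters $\theta^l$, and the network applies $h^1$ first and $h^L$ last. This assumes scalar inputs and hand-picked (hypothetical) parameters to stay short; real layers use weight matrices.

```python
import math
from functools import reduce

def make_module(theta_l, act):
    """Module h^l: an affine map with its own parameters theta^l = (w, b), then act."""
    w, b = theta_l
    return lambda a: act(w * a + b)

# Hypothetical scalar parameters theta^1, theta^2, theta^3 for a 3-layer network.
thetas = [(2.0, -1.0), (0.5, 0.25), (3.0, 0.0)]
acts = [lambda z: max(0.0, z), math.tanh, lambda z: z]  # ReLU, tanh, linear output
modules = [make_module(t, act) for t, act in zip(thetas, acts)]

def f(x: float) -> float:
    # a^L = (h^L ∘ ... ∘ h^1)(x): h^1 is applied first, h^L last.
    return reduce(lambda a, h_l: h_l(a), modules, x)

print(f(1.0))  # 3 * tanh(0.75) ≈ 1.905
```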

Flashcards

Perceptron

A single-layer neural network for binary classification. It multiplies inputs by weights, sums them, adds a bias, and outputs 1 if above a threshold, 0 otherwise.
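
A perceptron's forward pass is short. The weights below are hand-picked (hypothetical) so that it computes logical AND, with the bias effectively setting the threshold.

```python
def perceptron(x, w, b, threshold=0.0):
    """Weighted sum of inputs plus bias; output 1 if it exceeds the threshold, else 0."""
    s = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if s > threshold else 0

w, b = [1.0, 1.0], -1.5
print([perceptron(x, w, b) for x in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # → [0, 0, 0, 1]
```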

Moravec's Paradox

The observation that tasks easy for humans (perception, motor skills) are hard for machines, and vice versa (logical reasoning).

Path 1: Better Inputs

One approach to improve machine learning by creating better input features, often encoding domain knowledge.

Path 2: Neural Networks

An approach to improve machine learning that focuses on increasing the complexity of neural networks beyond a single layer.