Questions and Answers
Which of the following is a key aspect of sequence learning?
- Designing algorithms for sequential data (correct)
- Treating all inputs as independent
- Focusing solely on static data points
- Ignoring the order of data
In sequence learning, what is a common aim regarding input and output data?
- To keep input and output in the exact same domain.
- To maximize data redundancy.
- To turn an input sequence into an output sequence in a different domain. (correct)
- To eliminate sequential dependencies.
Which task is given as an example of sequential learning?
- Network configuration
- Speech recognition (correct)
- Image compression
- Database management
What type of learning is sentiment analysis considered within the context of sequential learning?
In machine translation, what characteristic describes the input and output sequences?
Which of the following is true regarding the application of ANNs to video prediction?
According to the features of sequential learning, what can be said about the elements of the sequence?
What does weight sharing within a sequence allow?
In RNNs, which weights are typically shared across different input frames belonging to the same sequence?
Where is recursion added in RNNs in addition to a new set of weights?
Which term describes the standard RNN diagram?
During forward propagation in RNNs, what does the internal state depend on?
In a many-to-many sequence model, when is the loss computed?
During BPTT, what must the gradient of the loss function be calculated with respect to?
What issue arises if each partial derivative is less than unity in a deep network?
What type of memory is an RNN biased to capture?
What is one solution to the problem of vanishing gradients?
How do Rectified Linear Units saturate?
What can be used to mitigate the exploding gradient problem?
How does the GRU improve upon basic RNNs?
In GRUs, the cell state calculation is a weighted sum of which of the following?
What does the forget gate in LSTM control?
What does an input gate value near zero mean?
What does the output gate determine?
What can an LSTM do?
What are sequence-to-sequence models composed of?
What does the attention mechanism attend to?
What do transformers rely on?
What is meant by full observability of the environment?
What does the problem of trajectory prediction estimate?
Which of the following are extracted from tracking data?
Which is a baseline model?
The Vanilla LSTM model is...
The attention mechanism in LSTMs allows...
Which algorithms can outperform standard algorithms for trajectory prediction in autonomous driving applications?
What is a key feature of the attention mechanism compared to an LSTM?
What is used to control what information should pass through the LSTM cells?
Which factors are required for sequence learning using RNNs, GRUs, and LSTMs?
Which of these are typical applications of RNNs?
Which of the following is not a step in the prediction pipeline for vehicle trajectory prediction?
What kind of assumption about future trajectories can explain poor prediction results?
What is a constant velocity model primarily used for?
What does the GRU mitigate?
What is the primary use of linear regression?
What function does logistic regression use?
For what type of task are CNNs primarily designed?
What happens when a model overfits?
What is a key consideration regarding input and output sequences in sequential learning?
What assumption do traditional neural networks make about inputs?
What is the primary reason for the poor performance of vanilla RNNs when learning long sequences?
What is a common solution to the vanishing gradient problem in RNNs?
Why should machine learning be used?
What is one way to mitigate the exploding gradient problem?
What does GRU stand for?
What is a key feature of LSTMs that helps alleviate the vanishing gradient problem?
When extracting input feature sequences for trajectory prediction, where do the features come from?
What does the Constant Velocity Model assume?
For video prediction, what can be used to extract features?
For processing video data, what is a downside of using ANNs?
What characterizes the length of input and output sequences in sequential learning tasks?
What is the term $W_{ax}$ related to?
How is total loss calculated in a many-to-many sequence model?
Which type of memory does the RNN tend to capture?
What is the result of high values in the forget gate?
In sequence-to-sequence models, what role do LSTMs play?
For trajectory prediction, what constitutes a full observation of the environment?
According to the text, which model performs better over long horizons?
Flashcards
When to use machine learning?
ML can be used when a task can't be easily defined mathematically or lacks a closed-form solution.
What is linear regression?
A regression model that estimates the linear relationship between a variable and a vector of features.
What is sequence learning?
A (typically many-to-many) model that turns sequential input data into an output sequence in a different domain.
Why not use standard neural networks for sequential data?
What is Weight sharing in RNN?
What is an RNN?
What is Forward Pass?
What is Back-propagation through time (BPTT)?
What are Vanishing gradient problems?
What are Exploding gradient problems?
What is LSTM?
What is GRU?
What is Attention Mechanism?
What are Transformers?
What are Sequence-to-sequence (Seq2Seq) models?
What is constant velocity model?
Why are RNNs well suited for sequence learning?
Study Notes
Recap
- Machine learning is used when tasks can't be mathematically formulated or are too complex for closed-form solutions. Linear regression uses a mathematical model to approximate the relationship between the predicted variable and its predictors.
- Logistic regression is a classifier built on the sigmoid function and forms the basis of feedforward neural networks (FFNNs).
- Convolutional neural networks (CNNs), designed for image processing, extract spatial hierarchies with fewer parameters than fully connected neural networks.
- Overfitting happens when a model fits its training data too closely and fails to generalise.
Learning Outcomes
- Standard FFNNs and CNNs aren't effective for modeling sequential data.
- Recurrent neural networks' (RNNs) architecture is capable of sequential learning.
- Vanilla RNNs struggle with the vanishing gradient problem, impacting performance.
- State-of-the-art vehicle trajectory prediction models use transformers to outperform common models.
Outline of Topics
- Sequence learning fundamentals
- Shortcomings of standard neural networks for sequence learning
- Importance of weight sharing in RNNs
- Forward pass, Back-propagation through time (BPTT), and the challenges of vanishing and exploding gradients in RNNs
- LSTM and GRU architectures
- Utilisation of Attention Mechanisms & Transformers
- Real-world application examples of RNNs
- Using RNNs for vehicle trajectory prediction
Sequential Learning
- Focuses on machine learning algorithms designed for sequential data.
- Transforms input sequential data into an output sequence in a different domain.
- Speech recognition (audio to text) is an example of many-to-many learning.
- Sentiment analysis and abuse language identification are applications of sequential learning.
- Many-to-one learning is when the input is a sequence but the output is not (as in sentiment analysis).
- Machine translation is a many-to-many paradigm, which allows input and output sequences of variable lengths.
- Frame-level video classification is another example of a many-to-many model.
- Image captioning: one-to-many learning approach
Video Prediction with ANNs
- A number of past frames is fed to the network to predict the next frame.
- A very large number of parameters is needed to train such a model.
- Scalability suffers, and the temporal correlations needed for sequential learning are poorly captured, compromising accuracy.
Features of Sequential Learning
- Sequential learning involves input and output sequences, whose lengths can vary across tasks.
- Traditional networks treat inputs as independent, unlike sequential data where strong correlations exist.
- The ordering of inputs matters significantly as changes to the order will impact the meaning.
- Long-term dependencies are a key factor.
Weight Sharing in RNN
- In standard neural networks, sequences of frames are entered as flattened vectors.
- RNNs exploit the content shared between consecutive frames to capture temporal relations.
- Weight sharing allows generalisation to sequences of variable length.
- The input weights $W_{ax}$ are shared across frames in the same sequence.
- Recursion is added at the hidden layer with a new set of weights $W_{aa}$, also shared within the training sequence.
- The RNN architecture extends to cover many-to-many models.
- Weight matrices such as $W_{aa}$, $W_{ax}$, and $W_{ya}$ are shared across time.
- The compact (folded) RNN diagram on the left-hand side of the computation graph unrolls into one copy of the cell per time step.
Forward Propagation in RNN
- Forward propagation is similar to standard FFNNs.
- The internal state at each time step depends on the current input and all previous inputs of the sequence (a minimal sketch of the forward pass follows).
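A minimal sketch of the forward pass under the notation above ($W_{ax}$, $W_{aa}$, $W_{ya}$); the tanh/softmax choices, biases, and dimensions are illustrative assumptions rather than part of the notes.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_forward(xs, W_ax, W_aa, W_ya, b_a, b_y):
    """Run a vanilla RNN over a sequence xs of input vectors.

    The same weight matrices are reused at every time step (weight sharing),
    so the model generalises to sequences of any length.
    """
    a = np.zeros(W_aa.shape[0])                      # initial hidden state a<0>
    ys = []
    for x in xs:                                     # one step per frame/token
        a = np.tanh(W_ax @ x + W_aa @ a + b_a)       # a<t> depends on x<t> and a<t-1>
        ys.append(softmax(W_ya @ a + b_y))           # y<t> is read out from a<t>
    return ys
```

In a many-to-many model, a per-step loss $L_t$ would be computed on each `ys[t]` and summed to give the total loss, as described next.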
Forward Propagation in Many-to-Many RNN
- A loss $L_t$ is computed at each time step, and the total loss is the sum over time steps: $L = \sum_t L_t$.
Back-Propagation Through Time (BPTT)
- An RNN with a few cells and a single output is used to illustrate the procedure.
BPTT Details
- During training, the gradient of the loss function must be calculated with respect to all weights: $W_{ya}$, $W_{ax}$, $W_{aa}$.
- The loss gradient must be propagated back through the previous time steps of the sequence.
BPTT calculations
- $a^{\langle 3\rangle} = \tanh(W_{ax}\, x^{\langle 3\rangle} + W_{aa}\, a^{\langle 2\rangle})$
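As an illustration, for the three-cell example with a single output at the last step, unrolling the chain rule for the gradient with respect to $W_{aa}$ gives (biases omitted, matching the formula above; this expansion is implied by the notes rather than written out in them):

$$
\frac{\partial L}{\partial W_{aa}} \;=\; \sum_{t=1}^{3} \frac{\partial L}{\partial y^{\langle 3\rangle}}\,
\frac{\partial y^{\langle 3\rangle}}{\partial a^{\langle 3\rangle}}
\left(\prod_{k=t+1}^{3}\frac{\partial a^{\langle k\rangle}}{\partial a^{\langle k-1\rangle}}\right)
\frac{\partial a^{\langle t\rangle}}{\partial W_{aa}}
$$

The repeated factors $\partial a^{\langle k\rangle}/\partial a^{\langle k-1\rangle}$ are what shrink or grow with sequence length, leading to the vanishing and exploding gradients discussed below.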
Vanishing Gradients
- Gradients vanish when a large sequence with long-term dependencies is used.
- The chain rule dictates repeated multiplications occur in calculating the loss gradient relative to the weights Wax or Waa.
- If each partial derivative is below one and the number of multiplied factors n is large, their product approaches zero and the weights are effectively no longer updated.
- The vanishing gradient problem occurs with both the tanh() and sigmoid functions.
- RNNs tend to capture short-term memory due to vanishing gradients.
- Solutions include different activation functions such as ReLU, using weight initialisation, or using gates.
Vanishing and Exploding Gradients
- The Rectified Linear Unit (ReLU) saturates in only one direction (for negative inputs), which lessens the vanishing gradient problem.
- Gradients might also explode.
- Gradient clipping mitigates the exploding gradient problem.
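A minimal sketch of gradient clipping by global norm, assuming the gradients are plain NumPy arrays; the threshold value is illustrative.

```python
import numpy as np

def clip_gradients(grads, max_norm=5.0):
    """Rescale a list of gradient arrays so their global L2 norm is at most max_norm."""
    total_norm = np.sqrt(sum(float((g ** 2).sum()) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]   # shrink all gradients by the same factor
    return grads
```

Clipping leaves the gradient direction unchanged and only limits its magnitude, which is why it targets exploding rather than vanishing gradients.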
Gated Recurrent Unit (GRU)
- A GRU modifies the hidden layer of the basic RNN.
- Improves in capturing long-term dependencies and mitigating the vanishing gradient problem.
- Hidden state formula: $c^{\langle t\rangle} = \tanh(W_{xc}\, x^{\langle t\rangle} + W_{cc}\, c^{\langle t-1\rangle})$
GRU cont'd
- The GRU adds a sigmoid update gate $\Gamma_u$ together with an extra set of weights.
- It calculates a new cell state $c^{\langle t\rangle}$ differently from the standard RNN.
- The candidate $\tilde{c}^{\langle t\rangle}$ keeps the standard RNN calculation: $\tilde{c}^{\langle t\rangle} = \tanh(W[c^{\langle t-1\rangle}, x^{\langle t\rangle}] + b_c)$.
- The cell state is a weighted sum of the previous state and the candidate: $c^{\langle t\rangle} = \Gamma_u\, \tilde{c}^{\langle t\rangle} + (1 - \Gamma_u)\, c^{\langle t-1\rangle}$.
- By keeping a fraction of the previous state $c^{\langle t-1\rangle}$, the GRU gains extra memory compared with the basic RNN.
- The value of $\Gamma_u$ determines how much of the candidate $\tilde{c}^{\langle t\rangle}$ is stored and how much old information is treated as irrelevant.
- The GRU and the basic RNN both compute the hidden state from the previous one, but in different ways.
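A sketch of a single GRU step following the equations above; only the update gate $\Gamma_u$ is modelled (a full GRU also has a relevance/reset gate, which the notes do not cover), and the weight shapes are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(c_prev, x, W_c, b_c, W_u, b_u):
    """One simplified GRU step: candidate state gated against the previous state."""
    concat = np.concatenate([c_prev, x])
    c_tilde = np.tanh(W_c @ concat + b_c)                # candidate, same form as the RNN update
    gamma_u = sigmoid(W_u @ concat + b_u)                # update gate, values in (0, 1)
    c = gamma_u * c_tilde + (1.0 - gamma_u) * c_prev     # weighted sum keeps old memory
    return c
```

When $\Gamma_u$ is near zero, the old state is carried forward almost unchanged, which is how the GRU preserves long-term dependencies.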
Long Short-Term Memory (LSTM)
- The LSTM was proposed in 1997 and uses three sigmoid gates in addition to the tanh() activation.
- The gates work in tandem to control what information is let through.
- A forget-gate value $f_t$ near 1 passes most of the previous cell state $c_{t-1}$ on to $c_t$.
- When the input gate $i_t$ is close to zero, the new candidate value $\tilde{c}_t$ is not passed on to the next cell state $c_t$.
- The relationship between the current and previous cell state is additive: $c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$.
- The output gate determines how much of the cell state is output to the next hidden state.
- The hidden state $h_t$ is a filtered version of the cell state $c_t$.
- The memory $h_t$ is used for the readout $y_t$.
- The LSTM is capable of forgetting, storing, updating, and outputting information.
- Because the cell-state update is additive, gradients flow more easily, which helps with vanishing and exploding gradients.
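A compact sketch of one LSTM step showing the three sigmoid gates and the additive cell-state update described above; weight names and shapes are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(h_prev, c_prev, x, W_f, b_f, W_i, b_i, W_o, b_o, W_c, b_c):
    """One LSTM step: forget, input and output gates around an additive cell state."""
    z = np.concatenate([h_prev, x])
    f = sigmoid(W_f @ z + b_f)         # forget gate: how much of c_prev to keep
    i = sigmoid(W_i @ z + b_i)         # input gate: how much of the candidate to store
    o = sigmoid(W_o @ z + b_o)         # output gate: how much of the cell to expose
    c_tilde = np.tanh(W_c @ z + b_c)   # candidate cell value
    c = f * c_prev + i * c_tilde       # additive update eases gradient flow
    h = o * np.tanh(c)                 # hidden state is a filtered version of c
    return h, c
```

The readout $y_t$ would be computed from the returned hidden state $h$, for example with a softmax layer.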
LSTM vs Attention Mechanism
- Sequence-to-sequence (Seq2Seq) models consist of an encoder and a decoder.
- One LSTM encodes the input sentence and another LSTM decodes it to produce the translation.
- Relying only on the encoder's last hidden and cell states may not be sufficient for translation.
- The attention mechanism makes predictions at the current decoder time step by attending to all hidden states of the encoder.
- The attention weights differ at each time step, defining what is important at that step.
Attention Mechanism
- The weight $a_{t,j}$ shows the relevance of encoder hidden state $h_j$ to the decoder at time step $t$.
- The weights are obtained with a softmax over alignment scores between the previous decoder state $s_{t-1}$ and the encoder states.
- The context vector combines them: $c_t = \sum_{j=1}^{T} a_{t,j}\, h_j$.
- The decoder state $s_t$ is produced by a small feedforward neural network whose inputs are the context vector $c_t$ and the previous state $s_{t-1}$.
- Instead of compressing the whole input into a single hidden state, the decoder leverages the encoder hidden states, selecting from them adaptively until termination.
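A sketch of the attention step described above, assuming a simple dot-product alignment score between the previous decoder state and each encoder hidden state (the notes do not specify the scoring function, so this choice is an assumption).

```python
import numpy as np

def attention_context(s_prev, encoder_states):
    """Compute attention weights a_{t,j} and the context vector c_t."""
    scores = np.array([s_prev @ h_j for h_j in encoder_states])   # alignment scores
    scores -= scores.max()                                         # numerical stability
    a = np.exp(scores) / np.exp(scores).sum()                      # softmax over encoder steps
    c_t = sum(a_j * h_j for a_j, h_j in zip(a, encoder_states))    # c_t = sum_j a_{t,j} h_j
    return a, c_t
```

The resulting context vector $c_t$, together with $s_{t-1}$, would feed the small feedforward network that produces the next decoder state $s_t$.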
Transformers
- Transformers outperform models such as LSTMs.
- They employ self-attention and cross-attention to find relationships within and between the input and output sequences.
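For illustration, a minimal sketch of scaled dot-product self-attention, the core operation transformers rely on; the projection matrices `W_q`, `W_k`, `W_v` and the shapes are illustrative assumptions.

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over a sequence X of shape (T, d_model)."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v                        # queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])                     # pairwise relevance between positions
    scores -= scores.max(axis=-1, keepdims=True)                # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
    return weights @ V                                           # each position mixes information from all others
```

Cross-attention has the same form, except that the queries come from the output (decoder) sequence while the keys and values come from the input (encoder) sequence.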
Typical Applications
- Includes Video, Audio and Text.
- Video Action Recognition
- Speech Recognition
- Sentiment Analysis
- Video classification
- Video prediction
- Language modeling
- Image captioning
- Machine translation
Video Prediction
- Features can be extracted from each frame with a pre-trained CNN.
- The pre-trained CNN is used at the inference stage with its existing training, so no retraining is needed.
Vehicle Trajectory Prediction
- Detecting the surrounding vehicles' location is insufficient for safe decision making.
- The problem is to estimate the target vehicle's (TV) future positions over a prediction window, $Y_{TV} = \{(x_1, y_1), \ldots, (x_{T_{pred}}, y_{T_{pred}})\}$, from information about the TV gathered during the observation window.
- The observation window has length $T_{obs}$.
- The prediction pipeline has three stages: inputs, an encoder, and a decoder.
- Baseline models: the Constant Velocity (CV) model assumes the TV keeps the same speed it had during the observation window (a minimal sketch follows this list).
- The Vanilla LSTM baseline (V-LSTM) makes predictions from the TV's observed trajectory alone.
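A minimal sketch of the Constant Velocity baseline under the stated assumption; here the velocity is taken from the last two observed positions, while averaging over the observation window is an equally valid variant.

```python
import numpy as np

def constant_velocity_predict(observed_xy, t_pred):
    """Extrapolate future (x, y) positions assuming the last observed velocity is kept."""
    observed_xy = np.asarray(observed_xy, dtype=float)   # shape (T_obs, 2)
    velocity = observed_xy[-1] - observed_xy[-2]         # displacement per time step
    last = observed_xy[-1]
    return np.array([last + (k + 1) * velocity for k in range(t_pred)])
```

For a vehicle observed at (0, 0), (1, 0), (2, 0), the model predicts (3, 0), (4, 0), (5, 0) for the next three steps.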
Model
- The Highway Drone Dataset (highD) consists of more than 16 hours of recordings.
- Root Mean Squared Error (RMSE) is used as an evaluation metric (see the sketch after this list).
- Final Displacement Error (FDE): this metric computes the distance between the predicted and ground-truth position of the TV at the final time step of the prediction window.
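A sketch of the two metrics as described, assuming predicted and ground-truth trajectories are given as arrays of (x, y) positions over the prediction window; exact definitions (e.g. the averaging order for RMSE) can vary between papers.

```python
import numpy as np

def rmse(pred, gt):
    """Root Mean Squared Error over all time steps of the prediction window."""
    pred, gt = np.asarray(pred, dtype=float), np.asarray(gt, dtype=float)
    return np.sqrt(((pred - gt) ** 2).sum(axis=1).mean())   # mean squared Euclidean distance, then root

def fde(pred, gt):
    """Final Displacement Error: distance between predicted and true final positions."""
    pred, gt = np.asarray(pred, dtype=float), np.asarray(gt, dtype=float)
    return np.linalg.norm(pred[-1] - gt[-1])
```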
Qualitative Results
- Results are shown for lane-change scenarios using a 3-second prediction window.
Reflection Questions
- Why are standard FFNN and CNN models inefficient for sequential data?
- What are the key features of sequential learning?
- Can you explain how vanilla RNNs learn sequentially?
- What is the primary reason for an RNN's poor learning on long sequences?
- How do attention mechanisms improve learning with LSTMs?
- Why do attention mechanisms and transformers outperform standard algorithms for trajectory prediction in autonomous driving applications?
Summary
- MLPs and FFNNs are inadequate for sequential data, mainly for scalability reasons.
- RNNs struggle with long-term dependencies because of vanishing and exploding gradients.
- LSTMs provide solutions to the memory issues; LSTMs and GRUs use pointwise multiplication and addition to gate information.
- RNNs, GRUs, and LSTMs require high memory and computation.
- Training is difficult because of the sequential nature of the computations.
- Encoding very long sequences can miss certain temporal dependencies and inputs in the sequence.
Summary
- The attention mechanism lets the model focus on a subset of the inputs, enhancing the vanilla LSTM.
- Transformers perform well in driving simulations.
- Assuming independence of future trajectories can explain poor long-horizon results.
- Current models are adequate for simple driving scenarios but not for more difficult ones.
- Predictions and motion planning should be integrated with the sensory pipeline.
Description
Explore recurrent neural networks (RNNs) for sequential data modeling, addressing the limitations of FFNNs and CNNs. Understand the challenges of vanishing gradients in vanilla RNNs. Learn how transformers enhance vehicle trajectory prediction, surpassing traditional models.