Recurrent Neural Networks (RNNs)

Questions and Answers

Which of the following is a key aspect of sequence learning?

  • Designing algorithms for sequential data (correct)
  • Treating all inputs as independent
  • Focusing solely on static data points
  • Ignoring the order of data

In sequence learning, what is a common aim regarding input and output data?

  • To keep input and output in the exact same domain.
  • To maximize data redundancy.
  • To turn an input sequence into an output sequence in a different domain. (correct)
  • To eliminate sequential dependencies.

Which task is given as an example of sequential learning?

  • Network configuration
  • Speech recognition (correct)
  • Image compression
  • Database management

What type of learning is sentiment analysis considered within the context of sequential learning?

  • Many-to-one learning (correct)

In machine translation, what characteristic describes the input and output sequences?

  • They have different and variable lengths. (correct)

Which of the following is true regarding the application of ANNs to video prediction?

  • ANNs face a trade-off between scalability and accuracy. (correct)

According to the features of sequential learning, what can be said about the elements of the sequence?

  • Inputs can be strongly correlated within a sequence. (correct)

What does weight sharing within a sequence allow?

  • Generalizing to sequences of variable lengths (correct)

In RNNs, which weights are typically shared across different input frames belonging to the same sequence?

  • $W_{ax}$ (correct)

Where is recursion added in RNNs in addition to a new set of weights?

  • The hidden layer (correct)

Which term describes the standard RNN diagram?

  • Computation graph (correct)

During forward propagation in RNNs, what does the internal state depend on?

  • All earlier inputs of the sequence. (correct)

In a many-to-many sequence model, when is the loss computed?

  • At each time step (correct)

During BPTT, what must the gradient of the loss function be calculated with respect to?

  • All of the weights (correct)

What issue arises if each partial derivative is less than unity in a deep network?

  • Vanishing gradients (correct)

What type of memory is an RNN biased to capture?

  • Short-term (correct)

What is one solution to the problem of vanishing gradients?

  • Both using another activation function and using gates (correct)

How do Rectified Linear Units saturate?

  • Towards one direction only (correct)

What can be used to mitigate the exploding gradient problem?

  • Gradient clipping (correct)

How does the GRU improve upon basic RNNs?

  • By capturing long-term dependencies (correct)

In GRUs, the cell state calculation is a weighted sum of which of the following?

  • Previous cell state and candidate cell state (correct)

What does the forget gate in LSTM control?

  • The amount of past information to erase from the cell state (correct)

What does an input gate value near zero mean?

  • The new cell state won't be passed onto the next cell state. (correct)

What does the output gate determine?

  • How much of the new cell state will be output to the next hidden layer. (correct)

What can LSTM do?

  • Forget, store, output and update. (correct)

What are sequence-to-sequence models composed of?

  • An encoder and a decoder. (correct)

What does the attention mechanism attend to?

  • All of the encoder’s hidden states. (correct)

What do transformers rely on?

  • Self-attention and cross-attention mechanisms. (correct)

In trajectory prediction, what does full observability of the environment refer to?

  • The tracking of other road users. (correct)

What does the trajectory prediction problem estimate?

  • The sequence of the TV's future x-y positions. (correct)

Which of the following are extracted from tracking data?

  • The input feature sequences. (correct)

Which of the following is a baseline model?

  • Constant Velocity Model (correct)

What is the Vanilla LSTM model?

  • An encoder-decoder LSTM model. (correct)

What does the attention mechanism in LSTMs allow?

  • Relating a subset of inputs. (correct)

Which algorithms can outperform standard algorithms for trajectory prediction in autonomous driving applications?

  • Attention mechanisms and Transformers (correct)

What is a key feature the attention mechanism has compared to LSTM?

  • It does not encode the whole input sentence into a single fixed-length vector. (correct)

What is used to control what information should pass through the LSTM cells?

  • Gates using pointwise multiplication and addition (correct)

Which factors are required for sequence learning using RNN, GRUs and LSTM?

  • High memory and computations (correct)

Which of these are typical applications of RNNs?

  • All of the above (correct)

Which of the following is not a step in the prediction pipeline for vehicle trajectory prediction?

  • Generating random noise (correct)

What kind of assumption about future trajectories can explain poor prediction results?

  • Independence (correct)

What is a constant velocity model used for primarily?

  • Providing a general solution (correct)

What does the GRU mitigate?

  • The short-term memory problem (correct)

What is the primary use of linear regression?

  • To estimate the linear relationship between variables. (correct)

What function does logistic regression use?

  • Sigmoid (correct)

For what type of task are CNNs primarily designed?

  • Image processing. (correct)

What happens when a model overfits?

  • It learns the training data too well. (correct)

What is a key consideration regarding input and output sequences in sequential learning?

  • They may have different and variable lengths (correct)

What assumption do traditional neural networks make about inputs?

  • Inputs are independent of each other. (correct)

What is the primary reason for the poor performance of vanilla RNNs when learning long sequences?

  • Vanishing gradient problem (correct)

What is a common solution to the vanishing gradient problem in RNNs?

  • Using gate mechanisms. (correct)

Why should machine learning be used?

  • When a task cannot be easily formulated mathematically. (correct)

What is one way to mitigate the exploding gradient problem?

  • Gradient Clipping (correct)

What does GRU stand for?

  • Gated Recurrent Unit (correct)

What is a key feature of LSTMs that helps alleviate the vanishing gradient problem?

  • Additive relations (correct)

When extracting input feature sequences for trajectory prediction, what are the features extracted from?

  • Tracking data (correct)

What does the Constant Velocity Model assume?

  • The TV will have the same speed. (correct)

For video prediction, what can be used to extract features?

  • CNNs (correct)

For processing video data, what is a downside of using ANNs?

  • ANNs face a trade-off between scalability and accuracy. (correct)

What characterizes the length of input and output sequences in sequential learning tasks?

  • They may be different and variable. (correct)

What is the term $W_{ax}$ related to?

  • Weight sharing (correct)

How is total loss calculated in a many-to-many sequence model?

  • Based on all the losses (correct)

Which type of memory is an RNN biased toward capturing?

  • Short-term (correct)

What is the result of high values in the forget gate?

  • Most of the past state is passed on to the next cell state (correct)

In sequence-to-sequence models, what are LSTMs?

  • Encoders and decoders (correct)

For trajectory prediction, how is full observation of the environment handled?

  • It is assumed. (correct)

According to the text, which model can perform better over long horizons?

  • Transformer (correct)

Flashcards

When to use machine learning?

ML can be used when a task can't be easily defined mathematically or lacks a closed-form solution.

What is linear regression?

A regression model to estimate the linear relationship between a variable and a vector of features.

What is sequence learning?

Learning with algorithms designed for sequential data, which turn an input sequence into an output sequence in a different domain.

Why not use standard neural networks for sequential data?

Standard neural networks process each input independently, ignoring sequential relationships, leading to inefficiency.

What is Weight sharing in RNN?

Sharing weights across different positions in the sequence to capture temporal correlations and generalise to variable lengths.

What is an RNN?

A type of neural network designed to process sequential data by maintaining a hidden state that captures information about past elements in the sequence.

What is Forward Pass?

RNNs process inputs sequentially. At each step, the input and the previous hidden state are used to update the current hidden state, which then produces an output.

What is Back-propagation through time (BPTT)?

BPTT is a technique used to train RNNs, where the gradient of the loss function is calculated with respect to the weights by propagating errors backward through time.

What are Vanishing gradient problems?

Vanishing gradients can occur when training RNNs: due to repeated multiplication, the gradients become too small to update the weights effectively.

What are Exploding gradient problems?

Exploding gradients can occur when training RNNs because the gradients become too large, leading to unstable training.

What is LSTM?

A type of recurrent neural network architecture that aims to address the vanishing gradient problem by using memory cells and gates. These gates regulate the flow of information into and out of the cell.

What is GRU?

A simplified version of the LSTM that addresses the vanishing gradient problem, using update and reset gates.

What is Attention Mechanism?

A mechanism that allows a neural network to focus on different parts of the input sequence when making predictions, improving performance on tasks such as machine translation.

What are Transformers?

A model architecture relying entirely on attention mechanisms, allowing parallelization; it has achieved state-of-the-art results in various NLP tasks.

What are Sequence-to-sequence (Seq2Seq) models?

An encoder-decoder architecture that processes the input sequence and produces an output sequence.

What is Attention mechanism?

A technique used with RNNs (especially LSTMs and GRUs) where the hidden states from multiple time steps are considered to improve performance.

What is constant velocity model?

The constant velocity model assumes the TV will continue to have the same speed as its speed during the observation window.

Why are RNNs well suited for sequence learning?

These models are well-suited for sequence learning. However, they suffer from long-term memory problems, such as vanishing and exploding gradients.

Study Notes

Recap

  • Machine learning is used when tasks can't be easily formulated mathematically or are too complex for closed-form solutions. Linear regression uses a mathematical model to approximate the relationship between the predicted variable and its predictors.
  • Logistic regression is a classifier built on the sigmoid function and forms the basis of feedforward neural networks (FFNNs).
  • Convolutional neural networks (CNNs), designed for image processing, extract spatial hierarchies with fewer parameters than fully connected neural networks.
  • Overfitting happens when a model learns the training data too well and fails to generalise.

Learning Outcomes

  • Standard FFNNs and CNNs aren't effective for modeling sequential data.
  • Recurrent neural networks' (RNNs) architecture is capable of sequential learning.
  • Vanilla RNNs struggle with the vanishing gradient problem, impacting performance.
  • State-of-the-art vehicle trajectory prediction models use transformers to outperform common models.

Outline of Topics

  • Sequence learning fundamentals
  • Shortcomings of standard neural networks for sequence learning
  • Importance of weight sharing in RNNs
  • Forward pass, Back-propagation through time (BPTT), and the challenges of vanishing and exploding gradients in RNNs
  • LSTM and GRU architectures
  • Utilisation of Attention Mechanisms & Transformers
  • Real-world application examples of RNNs
  • Using RNNs for vehicle trajectory prediction

Sequential Learning

  • Focuses on machine learning algorithms designed for sequential data.
  • Transforms input sequential data into an output sequence in a different domain.
  • Speech recognition (audio to text) is an example of many-to-many learning.
  • Sentiment analysis and abuse language identification are applications of sequential learning.
  • Many-to-one learning is when the input is a sequence but the output is not.
  • Machine translation is a many-to-many paradigm, which allows input and output sequences of variable lengths.
  • Frame-level video classification is another example of a many-to-many model.
  • Image captioning: one-to-many learning approach

Video Prediction with ANNs

  • Given a window of frames over a certain time span, predict the next frame.
  • Lots of parameters are needed to train the model.
  • Scalability and accuracy are compromised when capturing temporal correlations for sequential learning.

Features of Sequential Learning

  • Sequential learning involves input and output sequences, whose lengths can vary across tasks.
  • Traditional networks treat inputs as independent, unlike sequential data where strong correlations exist.
  • The ordering of inputs matters significantly as changes to the order will impact the meaning.
  • Long-term dependencies are a key factor.

Weight Sharing in RNN

  • Sequences of frames are entered as vectors in standard neural networks.
  • RNNs take into account pixels shared among frames to find temporal relations.
  • Weight sharing allows generalisation to sequences of variable lengths.
  • The weights Wax are shared across frames belonging to the same sequence.
  • Recursion is added at the hidden layer with a new set of weights Waa shared within the training sequence.
  • RNN architecture extends to cover many-to-many models.
  • Weight matrices such as Waa, Wax, and Wya, are shared across time.
  • The standard RNN diagram is referred to as a computation graph.

Forward Propagation in RNN

  • Forward propagation is similar to standard FFNNs.
  • The internal state depends on all earlier inputs of the sequence (see the sketch below).
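
A minimal sketch of this forward pass, assuming NumPy and a tanh hidden activation; the weight names Wax, Waa, Wya follow the notes, while the function and argument names are illustrative:

```python
import numpy as np

def rnn_forward(x_seq, W_ax, W_aa, W_ya, b_a, b_y):
    """Run a vanilla RNN over a sequence; the same weights are reused at every time step."""
    a = np.zeros(W_aa.shape[0])                    # initial hidden state a<0>
    outputs = []
    for x_t in x_seq:                              # x_seq: iterable of input vectors x<t>
        a = np.tanh(W_ax @ x_t + W_aa @ a + b_a)   # new state depends on all earlier inputs
        outputs.append(W_ya @ a + b_y)             # per-step output y<t>
    return outputs, a
```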

Forward Propagation in Many-to-Many RNN

  • A loss Lt is computed at each time step, and the total loss L is derived from all the per-time-step losses.
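
One way to express this total loss as a sum of per-time-step losses Lt; a sketch assuming a softmax cross-entropy per step, with illustrative names:

```python
import numpy as np

def total_loss(logits_seq, target_seq):
    """Total loss L is the sum of the per-time-step losses Lt (softmax cross-entropy here)."""
    L = 0.0
    for z_t, y_t in zip(logits_seq, target_seq):   # one (logits array, target class) pair per step
        p_t = np.exp(z_t - z_t.max())
        p_t /= p_t.sum()                           # softmax over output classes
        L += -np.log(p_t[y_t])                     # per-step cross-entropy Lt
    return L
```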

Back-Propagation Through Time (BPTT)

  • An RNN with a few cells and a single output is used to illustrate the idea.

BPTT Details

  • During training, the gradient of the loss function must be calculated with respect to all the weights: Wya, Wax, Waa.
  • The loss gradient must be propagated back over the previous time steps in the sequence.

BPTT calculations

  • $a^{<3>} = \tanh(W_{ax}\, x^{<3>} + W_{aa}\, a^{<2>})$
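
Unrolling this recursion during BPTT, the gradient of a loss at step 3 with respect to the shared weights chains through all earlier hidden states; as a sketch in the same notation,

$\frac{\partial L^{<3>}}{\partial W_{aa}} = \frac{\partial L^{<3>}}{\partial a^{<3>}} \sum_{k=1}^{3} \left( \prod_{t=k+1}^{3} \frac{\partial a^{<t>}}{\partial a^{<t-1>}} \right) \frac{\partial a^{<k>}}{\partial W_{aa}}$

so every additional time step contributes another multiplicative factor to the product.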

Vanishing Gradients

  • Gradients vanish when a large sequence with long-term dependencies is used.
  • The chain rule dictates repeated multiplications occur in calculating the loss gradient relative to the weights Wax or Waa.
  • If each partial derivative is below one and the sequence length n is large, the gradient product vanishes and the weights are barely updated.
  • The vanishing gradient problem occurs with both the tanh() and sigmoid functions.
  • RNNs tend to capture short-term memory due to vanishing gradients.
  • Solutions include different activation functions such as ReLU, using weight initialisation, or using gates.
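
A tiny numeric sketch of why the gradient signal vanishes: if each per-step derivative factor has magnitude below one, the product over a long sequence collapses towards zero (the factor value here is purely illustrative):

```python
factor = 0.9                 # a per-step derivative magnitude below one
for n in (10, 50, 100):
    print(n, factor ** n)    # ~0.35, ~5.2e-03, ~2.7e-05: the gradient signal vanishes
```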

Vanishing and Exploding Gradients

  • The Rectified Linear Unit (ReLU) saturates in one direction only.
  • Gradients might also explode.
  • Gradient clipping mitigates the exploding gradient problem.
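
A minimal sketch of norm-based gradient clipping, assuming NumPy; the threshold is an arbitrary illustrative value:

```python
import numpy as np

def clip_gradient(grad, max_norm=5.0):
    """Rescale the gradient if its L2 norm exceeds max_norm, leaving its direction unchanged."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad
```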

Gated Recurrent Unit (GRU)

  • A GRU modifies the hidden layer of the basic RNN.
  • Improves in capturing long-term dependencies and mitigating the vanishing gradient problem.
  • Basic RNN hidden state formula: $c^{<t>} = \tanh(W_{xc}\, x^{<t>} + W_{cc}\, c^{<t-1>})$

GRU cont'd

  • GRU adds a sigmoid gate Γu alongside extra weights.
  • It calculates a new cell state c different to that of standard RNNs.
  • The candidate cell state č follows the standard RNN calculation: $\tilde{c}^{<t>} = \tanh(W[c^{<t-1>}, x^{<t>}] + b_c)$
  • The cell state is a weighted sum of the previous state and the candidate č.
  • $c^{<t>} = \Gamma_u\, \tilde{c}^{<t>} + (1 - \Gamma_u)\, c^{<t-1>}$
  • The GRU adds memory by keeping a fraction of the previous cell state c.
  • The update gate's value determines which information is irrelevant and how much of the candidate č should be stored.
  • Both the GRU and the basic RNN compute the hidden state from the previous one, but in different ways.
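
A sketch of one simplified GRU step matching the formulas above (NumPy); the reset gate of the full GRU is omitted because the notes' formulas only use the update gate, and the weight names are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(c_prev, x_t, Wc, bc, Wu, bu):
    """One simplified GRU step: blend the previous cell state with a candidate via the update gate."""
    concat = np.concatenate([c_prev, x_t])
    c_tilde = np.tanh(Wc @ concat + bc)                   # candidate cell state (standard RNN update)
    gamma_u = sigmoid(Wu @ concat + bu)                   # update gate, elementwise in (0, 1)
    return gamma_u * c_tilde + (1.0 - gamma_u) * c_prev   # weighted sum of candidate and old state
```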

Long Short-Term Memory (LSTM)

  • The LSTM was proposed in 1997 and uses three sigmoid gates in addition to the tanh().
  • The gates work in tandem to control what information is let through.
  • A forget gate value ft near 1 passes most of the cell state ct-1 on to the next cell state ct.
  • When the input gate it is close to zero, the new candidate value c̃t is not passed on to the next cell state ct.
  • The relation between the current and previous cell states is additive.
  • The output gate determines how much gets output to the next hidden layer.
  • The hidden state ht is a filtered version of the cell state ct.
  • The memory ht is used for the readout yt.
  • The LSTM is capable of forgetting, storing, outputting and updating data.
  • The additive relationship between the current and previous cell states helps with the vanishing and exploding gradient problems.
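
A corresponding sketch of one LSTM step with the forget, input, and output gates described above (NumPy; weight names and shapes are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(c_prev, h_prev, x_t, Wf, bf, Wi, bi, Wo, bo, Wc, bc):
    """One LSTM step: forget, store, update the cell state, then output a filtered version."""
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(Wf @ z + bf)                 # forget gate: how much of c_{t-1} to keep
    i_t = sigmoid(Wi @ z + bi)                 # input gate: how much of the candidate to store
    o_t = sigmoid(Wo @ z + bo)                 # output gate: how much of the cell state to expose
    c_tilde = np.tanh(Wc @ z + bc)             # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde         # additive update helps gradients flow
    h_t = o_t * np.tanh(c_t)                   # hidden state: filtered cell state, used for readout
    return c_t, h_t
```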

LSTM vs Attention Mechanism

  • Sequence-to-sequence (Seq2Seq) models consist of an encoder and a decoder.
  • One LSTM encodes the input sentence and another LSTM produces the translation.
  • The encoder's last hidden and cell states alone may not be sufficient for the translation.
  • The attention mechanism makes predictions at the current decoder time-step by attending to all hidden states of the encoder.
  • Each decoder time step uses different attention weights, defining what is important at that step.

Attention Mechanism

  • The weight at,j indicates the relevance of the encoder's hidden state hj.
  • The weights are obtained via a softmax that relates each input xj (through its hidden state) to the previous decoder state st-1.
  • The context vector combines them: $c_t = \sum_{j=1}^{T} a_{t,j}\, h_j$
  • The decoder state st is produced by a small feedforward neural network that takes the context ct and the previous state st−1 as inputs.
  • Instead of encoding the whole input into a single hidden state, the mechanism leverages all the hidden states, selecting among them adaptively until termination.
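
A sketch of computing one decoder step's context vector (NumPy). The dot-product scoring used here is an assumption for brevity; the notes describe a small feedforward network producing the alignment scores instead:

```python
import numpy as np

def attention_context(s_prev, encoder_states):
    """Weight every encoder hidden state by its relevance to the previous decoder state."""
    # assumes the decoder state and encoder states share the same dimension
    scores = np.array([s_prev @ h_j for h_j in encoder_states])   # alignment scores
    alphas = np.exp(scores - scores.max())
    alphas /= alphas.sum()                                        # softmax -> weights a_{t,j}
    c_t = sum(a * h_j for a, h_j in zip(alphas, encoder_states))  # c_t = sum_j a_{t,j} h_j
    return c_t, alphas
```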

Transformers

  • Transformers outperform models such as LSTMs.
  • They employ self-attention and cross-attention to find relationships within and between the input and output sequences.
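
A minimal scaled dot-product self-attention sketch (NumPy), the core operation that lets transformers relate all positions of a sequence in parallel; the projection matrices are illustrative:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence X of shape (T, d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv          # project inputs to queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # similarity of every position with every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                        # each output mixes values from all positions
```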

Typical Applications

  • Includes Video, Audio and Text.
  • Video Action Recognition
  • Speech Recognition
  • Sentiment Analysis
  • Video classification
  • Video prediction
  • Language modeling
  • Image captioning
  • Machine translation

Video Prediction

  • Features can be extracted from the video frames with CNNs.
  • A pre-trained CNN can be used for this purpose.
  • The CNN is used during the inference stage with its existing training.

Vehicle Trajectory Prediction

  • Detecting the surrounding vehicles' location is insufficient for safe decision making.
  • The problem involves estimating the sequence of the TV's future positions over a prediction window, $Y_{TV} = \{(x_1, y_1), \ldots, (x_{T_{pred}}, y_{T_{pred}})\}$, from information about the TV during the observation window.
  • The observation window has length Tobs.
  • The prediction pipeline has three stages: inputs, an encoder, and a decoder.
  • Baseline models: the Constant Velocity (CV) model assumes the TV keeps the same speed it had during the observation window (see the sketch after this list).
  • The Vanilla LSTM baseline (V-LSTM) is an encoder-decoder LSTM model that predicts from the TV's observed trajectory.
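
A minimal sketch of the Constant Velocity baseline under this assumption (NumPy; the sampling interval dt, array shapes, and function name are illustrative):

```python
import numpy as np

def constant_velocity_predict(observed_xy, T_pred, dt=0.2):
    """Extrapolate the TV's last observed velocity over the prediction window."""
    observed_xy = np.asarray(observed_xy)                  # shape (T_obs, 2): observed x-y positions
    velocity = (observed_xy[-1] - observed_xy[-2]) / dt    # last observed velocity estimate
    steps = np.arange(1, T_pred + 1)[:, None]
    return observed_xy[-1] + steps * velocity * dt         # shape (T_pred, 2): predicted positions
```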

Model

  • The Highway Drone Dataset (highD) consists of more than 16 hours of recordings.
  • Metrics: Root Mean Squared Error (RMSE) and Final Displacement Error (FDE).
  • The FDE computes the distance between the predicted and ground-truth positions of the TV at the final time-step of the prediction window.
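
Hedged sketches of the two metrics, assuming NumPy and predicted/ground-truth trajectories of shape (T_pred, 2); the RMSE shown is one common formulation (mean squared Euclidean distance over the prediction steps, then a square root):

```python
import numpy as np

def rmse(pred_xy, true_xy):
    """Root Mean Squared Error over all predicted time steps."""
    return np.sqrt(np.mean(np.sum((pred_xy - true_xy) ** 2, axis=-1)))

def fde(pred_xy, true_xy):
    """Final Displacement Error: distance between predicted and true positions at the last step."""
    return np.linalg.norm(pred_xy[-1] - true_xy[-1])
```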

Qualitative Results

  • Results are shown for lane change detection with a 3-second window.

Reflection Questions

  • Why are standard FFNN and CNN models inefficient for sequential data?
  • What are the key features of sequential learning?
  • Can you explain how vanilla RNNs learn sequentially?
  • Primary reason for the RNN's poor learning for long sequences?
  • How do attention mechanisms improve learning in LSTMs?
  • Why do attention mechanisms and transformers outperform standard algorithms for trajectory prediction in autonomous driving applications?

Summary

  • MLPs and FFNNs are inadequate for sequential data due to scalability issues.
  • RNNs struggle with long-term memory and suffer from vanishing and exploding gradients.
  • LSTMs provide solutions to the memory issues; LSTMs and GRUs use gates built on pointwise multiplication and addition to control what information passes through.
  • RNNs, GRUs and LSTMs require high memory and computation.
  • This difficulty arises from the sequential nature of their computations.
  • Very long encodings might miss certain temporal dependencies and inputs in the sequence.

Summary

  • The attention mechanism allows relating a subset of the inputs, enhancing the vanilla LSTM.
  • Transformers perform well in driving simulations.
  • Independence assumptions about future trajectories can explain poor long-horizon results.
  • Simple models are sufficient for ordinary driving scenarios but not for more difficult ones.
  • Prediction and motion planning should be implemented using sensory inputs.


Description

Explore recurrent neural networks (RNNs) for sequential data modeling, addressing the limitations of FFNNs and CNNs. Understand the challenges of vanishing gradients in vanilla RNNs. Learn how transformers enhance vehicle trajectory prediction, surpassing traditional models.
