Questions and Answers
Which of the following is a key aspect of sequence learning?
- Designing algorithms for sequential data (correct)
- Treating all inputs as independent
- Focusing solely on static data points
- Ignoring the order of data
In sequence learning, what is a common aim regarding input and output data?
- To keep input and output in the exact same domain.
- To maximize data redundancy.
- To turn an input sequence into an output sequence in a different domain. (correct)
- To eliminate sequential dependencies.
Which task is given as an example of sequential learning?
- Network configuration
- Speech recognition (correct)
- Image compression
- Database management
What type of learning is sentiment analysis considered within the context of sequential learning?
In machine translation, what characteristic describes the input and output sequences?
Which of the following is true regarding the application of ANNs to video prediction?
According to the features of sequential learning, what can be said about the elements of the sequence?
What does weight sharing within a sequence allow?
In RNNs, which weights are typically shared across different input frames belonging to the same sequence?
Where is recursion added in RNNs in addition to a new set of weights?
Which term describes the standard RNN diagram?
During forward propagation in RNNs, what does the internal state depend on?
In a many-to-many sequence model, when is the loss computed?
During BPTT, what must the gradient of the loss function be calculated with respect to?
What issue arises if each partial derivative is less than unity in a deep network?
What type of memory is an RNN biased to capture?
What is one solution to the problem of vanishing gradients?
How do Rectified Linear Units saturate?
What can be used to mitigate the exploding gradient problem?
How does the GRU improve upon basic RNNs?
In GRUs, the cell state calculation is a weighted sum of which of the following?
What does the forget gate in LSTM control?
What does an input gate value near zero mean?
What does the output gate determine?
What can an LSTM do?
What are sequence-to-sequence models composed of?
What does the attention mechanism attend to?
What do transformers rely on?
What is meant by full observability of the environment?
What does the problem of trajectory prediction estimate?
Which of the following are extracted from tracking data?
Which is a baseline model?
The Vanilla LSTM model is...
The attention mechanism in LSTMs allows...
Which algorithms can outperform standard algorithms for trajectory prediction in autonomous driving applications?
What is a key feature of the attention mechanism compared to an LSTM?
What is used to control what information should pass through the LSTM cells?
Which factors are required for sequence learning using RNNs, GRUs, and LSTMs?
Which of these are typical applications of RNNs?
Which of the following is not a step in the prediction pipeline for vehicle trajectory prediction?
What kind of assumption about future trajectories can explain poor prediction results?
What is a constant velocity model primarily used for?
What does the GRU mitigate?
What is the primary use of linear regression?
What function does logistic regression use?
For what type of task are CNNs primarily designed?
What happens when a model overfits?
What is a key consideration regarding input and output sequences in sequential learning?
What assumption do traditional neural networks make about inputs?
What is the primary reason for the poor performance of vanilla RNNs when learning long sequences?
What is a common solution to the vanishing gradient problem in RNNs?
Why should machine learning be used?
What is one way to mitigate the exploding gradient problem?
What does GRU stand for?
What is a key feature of LSTMs that helps alleviate the vanishing gradient problem?
When extracting input feature sequences for trajectory prediction, where do the features come from?
What does the Constant Velocity Model assume?
For video prediction, what can be used to extract features?
For processing video data, what is a downside of using ANNs?
What characterizes the length of input and output sequences in sequential learning tasks?
What is the term $W_{ax}$ related to?
How is total loss calculated in a many-to-many sequence model?
Which type of memory does the RNN tend to capture?
What is the result of high values in the forget gate?
In sequence-to-sequence models, what role do LSTMs play?
For trajectory prediction, what constitutes a full observation of the environment?
According to the text, which model performs better over long horizons?
Flashcards
When to use machine learning?
ML can be used when a task can't be easily defined mathematically or lacks a closed-form solution.
What is linear regression?
A regression model that estimates the linear relationship between a variable and a vector of features.
What is sequence learning?
A (typically many-to-many) model that turns sequential input data into an output sequence in a different domain.
Why not use standard neural networks for sequential data?
What is Weight sharing in RNN?
What is an RNN?
What is Forward Pass?
What is Back-propagation through time (BPTT)?
What are Vanishing gradient problems?
What are Exploding gradient problems?
What is LSTM?
What is GRU?
What is Attention Mechanism?
What are Transformers?
What are Sequence-to-sequence (Seq2Seq) models?
What is constant velocity model?
Why are RNNs well suited for sequence learning?
Study Notes
Recap
- Machine learning is used when tasks can't be mathematically formulated or are too complex for closed-form solutions. Linear regression uses a mathematical model to approximate the relationship between the predicted variable and its predictors.
- Logistic regression is a classifier built on the sigmoid function and forms the basis of feedforward neural networks (FFNNs).
- Convolutional neural networks (CNNs), designed for image processing, extract spatial hierarchies with fewer parameters than fully connected neural networks.
- Overfitting happens when a model fits its training data too closely and fails to generalise.
Learning Outcomes
- Standard FFNNs and CNNs aren't effective for modeling sequential data.
- Recurrent neural networks' (RNNs) architecture is capable of sequential learning.
- Vanilla RNNs struggle with the vanishing gradient problem, impacting performance.
- State-of-the-art vehicle trajectory prediction models use transformers to outperform common models.
Outline of Topics
- Sequence learning fundamentals
- Shortcomings of standard neural networks for sequence learning
- Importance of weight sharing in RNNs
- Forward pass, Back-propagation through time (BPTT), and the challenges of vanishing and exploding gradients in RNNs
- LSTM and GRU architectures
- Utilisation of Attention Mechanisms & Transformers
- Real-world application examples of RNNs
- Using RNNs for vehicle trajectory prediction
Sequential Learning
- Focuses on machine learning algorithms designed for sequential data.
- Transforms input sequential data into an output sequence in a different domain.
- Speech recognition (audio to text) is an example of many-to-many learning.
- Sentiment analysis and abuse language identification are applications of sequential learning.
- Many-to-one learning is when the input is a sequence but the output is not (as in sentiment analysis).
- Machine translation is a many-to-many paradigm, which allows input and output sequences of variable lengths.
- Frame-level video classification is another example of a many-to-many model.
- Image captioning: one-to-many learning approach
Video Prediction with ANNs
- A number of past frames is fed to the network to predict the next frame.
- A very large number of parameters is needed to train such a model.
- Scalability suffers, and the temporal correlations needed for sequential learning are poorly captured, compromising accuracy.
Features of Sequential Learning
- Sequential learning involves input and output sequences, whose lengths can vary across tasks.
- Traditional networks treat inputs as independent, unlike sequential data where strong correlations exist.
- The ordering of inputs matters significantly as changes to the order will impact the meaning.
- Long-term dependencies are a key factor.
Weight Sharing in RNN
- In standard neural networks, sequences of frames are entered as flattened vectors.
- RNNs exploit the content shared between consecutive frames to capture temporal relations.
- Weight sharing allows generalisation to sequences of variable length.
- The input weights $W_{ax}$ are shared across frames in the same sequence.
- Recursion is added at the hidden layer with a new set of weights $W_{aa}$, also shared within the training sequence.
- The RNN architecture extends to cover many-to-many models.
- Weight matrices such as $W_{aa}$, $W_{ax}$, and $W_{ya}$ are shared across time.
- The compact (folded) RNN diagram on the left-hand side of the computation graph unrolls into one copy of the cell per time step.
Forward Propagation in RNN
- Forward propagation is similar to standard FFNNs.
- The internal state at each time step depends on the current input and all previous inputs of the sequence (a minimal sketch of the forward pass follows).
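A minimal sketch of the forward pass under the notation above ($W_{ax}$, $W_{aa}$, $W_{ya}$); the tanh/softmax choices, biases, and dimensions are illustrative assumptions rather than part of the notes.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_forward(xs, W_ax, W_aa, W_ya, b_a, b_y):
    """Run a vanilla RNN over a sequence xs of input vectors.

    The same weight matrices are reused at every time step (weight sharing),
    so the model generalises to sequences of any length.
    """
    a = np.zeros(W_aa.shape[0])                      # initial hidden state a<0>
    ys = []
    for x in xs:                                     # one step per frame/token
        a = np.tanh(W_ax @ x + W_aa @ a + b_a)       # a<t> depends on x<t> and a<t-1>
        ys.append(softmax(W_ya @ a + b_y))           # y<t> is read out from a<t>
    return ys
```

In a many-to-many model, a per-step loss $L_t$ would be computed on each `ys[t]` and summed to give the total loss, as described next.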
Forward Propagation in Many-to-Many RNN
- A loss $L_t$ is computed at each time step, and the total loss is the sum over time steps: $L = \sum_t L_t$.
Back-Propagation Through Time (BPTT)
- An RNN with a few cells and a single output is used to illustrate the procedure.
BPTT Details
- During training, the gradient of the loss function must be calculated with respect to all weights: $W_{ya}$, $W_{ax}$, $W_{aa}$.
- The loss gradient must be propagated back through the previous time steps of the sequence.
BPTT calculations
- $a^{\langle 3\rangle} = \tanh(W_{ax}\, x^{\langle 3\rangle} + W_{aa}\, a^{\langle 2\rangle})$
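As an illustration, for the three-cell example with a single output at the last step, unrolling the chain rule for the gradient with respect to $W_{aa}$ gives (biases omitted, matching the formula above; this expansion is implied by the notes rather than written out in them):

$$
\frac{\partial L}{\partial W_{aa}} \;=\; \sum_{t=1}^{3} \frac{\partial L}{\partial y^{\langle 3\rangle}}\,
\frac{\partial y^{\langle 3\rangle}}{\partial a^{\langle 3\rangle}}
\left(\prod_{k=t+1}^{3}\frac{\partial a^{\langle k\rangle}}{\partial a^{\langle k-1\rangle}}\right)
\frac{\partial a^{\langle t\rangle}}{\partial W_{aa}}
$$

The repeated factors $\partial a^{\langle k\rangle}/\partial a^{\langle k-1\rangle}$ are what shrink or grow with sequence length, leading to the vanishing and exploding gradients discussed below.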
Vanishing Gradients
- Gradients vanish when a large sequence with long-term dependencies is used.
- The chain rule dictates repeated multiplications occur in calculating the loss gradient relative to the weights Wax or Waa.
- If each partial derivative is below one and the number of multiplied factors n is large, their product approaches zero and the weights are effectively no longer updated.
- The vanishing gradient problem occurs with both the tanh() and sigmoid functions.
- RNNs tend to capture short-term memory due to vanishing gradients.
- Solutions include different activation functions such as ReLU, using weight initialisation, or using gates.
Vanishing and Exploding Gradients
- The Rectified Linear Unit (ReLU) saturates in only one direction (for negative inputs), which lessens the vanishing gradient problem.
- Gradients might also explode.
- Gradient clipping mitigates the exploding gradient problem.
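A minimal sketch of gradient clipping by global norm, assuming the gradients are plain NumPy arrays; the threshold value is illustrative.

```python
import numpy as np

def clip_gradients(grads, max_norm=5.0):
    """Rescale a list of gradient arrays so their global L2 norm is at most max_norm."""
    total_norm = np.sqrt(sum(float((g ** 2).sum()) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]   # shrink all gradients by the same factor
    return grads
```

Clipping leaves the gradient direction unchanged and only limits its magnitude, which is why it targets exploding rather than vanishing gradients.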
Gated Recurrent Unit (GRU)
- A GRU modifies the hidden layer of the basic RNN.
- Improves in capturing long-term dependencies and mitigating the vanishing gradient problem.
- Hidden state formula: $c^{\langle t\rangle} = \tanh(W_{xc}\, x^{\langle t\rangle} + W_{cc}\, c^{\langle t-1\rangle})$
GRU cont'd
- The GRU adds a sigmoid update gate $\Gamma_u$ together with an extra set of weights.
- It calculates a new cell state $c^{\langle t\rangle}$ differently from the standard RNN.
- The candidate $\tilde{c}^{\langle t\rangle}$ keeps the standard RNN calculation: $\tilde{c}^{\langle t\rangle} = \tanh(W[c^{\langle t-1\rangle}, x^{\langle t\rangle}] + b_c)$.
- The cell state is a weighted sum of the previous state and the candidate: $c^{\langle t\rangle} = \Gamma_u\, \tilde{c}^{\langle t\rangle} + (1 - \Gamma_u)\, c^{\langle t-1\rangle}$.
- By keeping a fraction of the previous state $c^{\langle t-1\rangle}$, the GRU gains extra memory compared with the basic RNN.
- The value of $\Gamma_u$ determines how much of the candidate $\tilde{c}^{\langle t\rangle}$ is stored and how much old information is treated as irrelevant.
- The GRU and the basic RNN both compute the hidden state from the previous one, but in different ways.
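A sketch of a single GRU step following the equations above; only the update gate $\Gamma_u$ is modelled (a full GRU also has a relevance/reset gate, which the notes do not cover), and the weight shapes are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(c_prev, x, W_c, b_c, W_u, b_u):
    """One simplified GRU step: candidate state gated against the previous state."""
    concat = np.concatenate([c_prev, x])
    c_tilde = np.tanh(W_c @ concat + b_c)                # candidate, same form as the RNN update
    gamma_u = sigmoid(W_u @ concat + b_u)                # update gate, values in (0, 1)
    c = gamma_u * c_tilde + (1.0 - gamma_u) * c_prev     # weighted sum keeps old memory
    return c
```

When $\Gamma_u$ is near zero, the old state is carried forward almost unchanged, which is how the GRU preserves long-term dependencies.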
Long Short-Term Memory (LSTM)
- The LSTM was proposed in 1997 and uses three sigmoid gates in addition to the tanh() activation.
- The gates work in tandem to control what information is let through.
- A forget-gate value $f_t$ near 1 passes most of the previous cell state $c_{t-1}$ on to $c_t$.
- When the input gate $i_t$ is close to zero, the new candidate value $\tilde{c}_t$ is not passed on to the next cell state $c_t$.
- The relationship between the current and previous cell state is additive: $c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$.
- The output gate determines how much of the cell state is output to the next hidden state.
- The hidden state $h_t$ is a filtered version of the cell state $c_t$.
- The memory $h_t$ is used for the readout $y_t$.
- The LSTM is capable of forgetting, storing, updating, and outputting information.
- Because the cell-state update is additive, gradients flow more easily, which helps with vanishing and exploding gradients.
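A compact sketch of one LSTM step showing the three sigmoid gates and the additive cell-state update described above; weight names and shapes are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(h_prev, c_prev, x, W_f, b_f, W_i, b_i, W_o, b_o, W_c, b_c):
    """One LSTM step: forget, input and output gates around an additive cell state."""
    z = np.concatenate([h_prev, x])
    f = sigmoid(W_f @ z + b_f)         # forget gate: how much of c_prev to keep
    i = sigmoid(W_i @ z + b_i)         # input gate: how much of the candidate to store
    o = sigmoid(W_o @ z + b_o)         # output gate: how much of the cell to expose
    c_tilde = np.tanh(W_c @ z + b_c)   # candidate cell value
    c = f * c_prev + i * c_tilde       # additive update eases gradient flow
    h = o * np.tanh(c)                 # hidden state is a filtered version of c
    return h, c
```

The readout $y_t$ would be computed from the returned hidden state $h$, for example with a softmax layer.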
LSTM vs Attention Mechanism
- Sequence-to-sequence (Seq2Seq) models consist of an encoder and a decoder.
- One LSTM encodes the input sentence and another LSTM decodes it to produce the translation.
- Relying only on the encoder's last hidden and cell states may not be sufficient for translation.
- The attention mechanism makes predictions at the current decoder time step by attending to all hidden states of the encoder.
- The attention weights differ at each time step, defining what is important at that step.
Attention Mechanism
- The weight $a_{t,j}$ shows the relevance of encoder hidden state $h_j$ to the decoder at time step $t$.
- The weights are obtained with a softmax over alignment scores between the previous decoder state $s_{t-1}$ and the encoder states.
- The context vector combines them: $c_t = \sum_{j=1}^{T} a_{t,j}\, h_j$.
- The decoder state $s_t$ is produced by a small feedforward neural network whose inputs are the context vector $c_t$ and the previous state $s_{t-1}$.
- Instead of compressing the whole input into a single hidden state, the decoder leverages the encoder hidden states, selecting from them adaptively until termination.
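A sketch of the attention step described above, assuming a simple dot-product alignment score between the previous decoder state and each encoder hidden state (the notes do not specify the scoring function, so this choice is an assumption).

```python
import numpy as np

def attention_context(s_prev, encoder_states):
    """Compute attention weights a_{t,j} and the context vector c_t."""
    scores = np.array([s_prev @ h_j for h_j in encoder_states])   # alignment scores
    scores -= scores.max()                                         # numerical stability
    a = np.exp(scores) / np.exp(scores).sum()                      # softmax over encoder steps
    c_t = sum(a_j * h_j for a_j, h_j in zip(a, encoder_states))    # c_t = sum_j a_{t,j} h_j
    return a, c_t
```

The resulting context vector $c_t$, together with $s_{t-1}$, would feed the small feedforward network that produces the next decoder state $s_t$.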
Transformers
- Transformers outperform models such as LSTMs.
- They employ self-attention and cross-attention to find relationships within and between the input and output sequences.
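For illustration, a minimal sketch of scaled dot-product self-attention, the core operation transformers rely on; the projection matrices `W_q`, `W_k`, `W_v` and the shapes are illustrative assumptions.

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over a sequence X of shape (T, d_model)."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v                        # queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])                     # pairwise relevance between positions
    scores -= scores.max(axis=-1, keepdims=True)                # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
    return weights @ V                                           # each position mixes information from all others
```

Cross-attention has the same form, except that the queries come from the output (decoder) sequence while the keys and values come from the input (encoder) sequence.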
Typical Applications
- Includes Video, Audio and Text.
- Video Action Recognition
- Speech Recognition
- Sentiment Analysis
- Video classification
- Video prediction
- Language modeling
- Image captioning
- Machine translation
Video Prediction
- Features can be extracted from each frame with a pre-trained CNN.
- The pre-trained CNN is used at the inference stage with its existing training, so no retraining is needed.
Vehicle Trajectory Prediction
- Detecting the surrounding vehicles' location is insufficient for safe decision making.
- The problem is to estimate the target vehicle's (TV) future positions over a prediction window, $Y_{TV} = \{(x_1, y_1), \ldots, (x_{T_{pred}}, y_{T_{pred}})\}$, from information about the TV gathered during the observation window.
- The observation window has length $T_{obs}$.
- The prediction pipeline has three stages: inputs, an encoder, and a decoder.
- Baseline models: the Constant Velocity (CV) model assumes the TV keeps the same speed it had during the observation window (a minimal sketch follows this list).
- The Vanilla LSTM baseline (V-LSTM) makes predictions from the TV's observed trajectory alone.
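A minimal sketch of the Constant Velocity baseline under the stated assumption; here the velocity is taken from the last two observed positions, while averaging over the observation window is an equally valid variant.

```python
import numpy as np

def constant_velocity_predict(observed_xy, t_pred):
    """Extrapolate future (x, y) positions assuming the last observed velocity is kept."""
    observed_xy = np.asarray(observed_xy, dtype=float)   # shape (T_obs, 2)
    velocity = observed_xy[-1] - observed_xy[-2]         # displacement per time step
    last = observed_xy[-1]
    return np.array([last + (k + 1) * velocity for k in range(t_pred)])
```

For a vehicle observed at (0, 0), (1, 0), (2, 0), the model predicts (3, 0), (4, 0), (5, 0) for the next three steps.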
Model
- The Highway Drone Dataset (highD) consists of more than 16 hours of recordings.
- Root Mean Squared Error (RMSE) is used as an evaluation metric (see the sketch after this list).
- Final Displacement Error (FDE): this metric computes the distance between the predicted and ground-truth position of the TV at the final time step of the prediction window.
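A sketch of the two metrics as described, assuming predicted and ground-truth trajectories are given as arrays of (x, y) positions over the prediction window; exact definitions (e.g. the averaging order for RMSE) can vary between papers.

```python
import numpy as np

def rmse(pred, gt):
    """Root Mean Squared Error over all time steps of the prediction window."""
    pred, gt = np.asarray(pred, dtype=float), np.asarray(gt, dtype=float)
    return np.sqrt(((pred - gt) ** 2).sum(axis=1).mean())   # mean squared Euclidean distance, then root

def fde(pred, gt):
    """Final Displacement Error: distance between predicted and true final positions."""
    pred, gt = np.asarray(pred, dtype=float), np.asarray(gt, dtype=float)
    return np.linalg.norm(pred[-1] - gt[-1])
```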
Qualitative Results
- Results are shown for lane-change scenarios using a 3-second prediction window.
Reflection Questions
- Why are standard FFNN and CNN models inefficient for sequential data?
- What are the key features of sequential learning?
- Can you explain how vanilla RNNs learn sequentially?
- What is the primary reason for an RNN's poor learning on long sequences?
- How do attention mechanisms improve learning with LSTMs?
- Why do attention mechanisms and transformers outperform standard algorithms for trajectory prediction in autonomous driving applications?
Summary
- MLPs and FFNNs are inadequate for sequential data, mainly for scalability reasons.
- RNNs struggle with long-term dependencies because of vanishing and exploding gradients.
- LSTMs provide solutions to the memory issues; LSTMs and GRUs use pointwise multiplication and addition to gate information.
- RNNs, GRUs, and LSTMs require high memory and computation.
- Training is difficult because of the sequential nature of the computations.
- Encoding very long sequences can miss certain temporal dependencies and inputs in the sequence.
Summary
- The attention mechanism lets the model focus on a subset of the inputs, enhancing the vanilla LSTM.
- Transformers perform well in driving simulations.
- Assuming independence of future trajectories can explain poor long-horizon results.
- Current models are adequate for simple driving scenarios but not for more difficult ones.
- Predictions and motion planning should be integrated with the sensory pipeline.
Description
Explore recurrent neural networks (RNNs) for sequential data modeling, addressing the limitations of FFNNs and CNNs. Understand the challenges of vanishing gradients in vanilla RNNs. Learn how transformers enhance vehicle trajectory prediction, surpassing traditional models.