Mastering Attention Mechanisms in Sequential Decoders

Questions and Answers

Which of the following best describes the purpose of attention in a sequential decoder?

  • To alleviate the vanishing gradient problem
  • To compute the alignment model f
  • To focus on the most relevant parts of the input sequence for each output (correct)
  • To compute the context vector c

What is the formula for computing the attention score αᵢⱼ in the context of attention?

  • αᵢⱼ = softmax(eⱼ) (correct)
  • αᵢⱼ = softmax(f(i, j))
  • αᵢⱼ = softmax(e)
  • αᵢⱼ = softmax(hⱼ)
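
To make the softmax step concrete, here is a minimal Python sketch (the score values are invented for illustration) of turning raw alignment scores eⱼ into attention weights αᵢⱼ that are non-negative and sum to 1:

```python
import numpy as np

def softmax(scores):
    """Numerically stable softmax over a vector of alignment scores."""
    shifted = scores - np.max(scores)   # subtract max to avoid overflow in exp
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()

# Hypothetical raw alignment scores e_j for one output position i,
# one score per input position j (values made up for demonstration).
e = np.array([1.2, -0.3, 2.5, 0.1])

alpha = softmax(e)             # attention weights alpha_ij
print(alpha, alpha.sum())      # the weights sum to 1.0
```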

What does the alignment model f represent in the attention mechanism?

  • The amount of attention the ith output should pay to the jth input
  • The scores of how well the inputs around position j and the output at position i match (correct)
  • The hidden state from the previous timestep
  • The encoder state for the jth input

How can the alignment model f be approximated?

  • By using a small neural network (correct)
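
The quiz does not spell out the network's form; one common choice, sketched below as an assumed example in the style of additive (Bahdanau-style) attention, is a one-hidden-layer network that scores the previous decoder state sᵢ₋₁ against encoder state hⱼ. All dimensions and weights here are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
d_dec, d_enc, d_att = 8, 8, 16   # hypothetical state/hidden sizes

# Parameters of a small one-hidden-layer network; randomly
# initialized here, learned jointly with the model in practice.
W = rng.normal(size=(d_att, d_dec))
U = rng.normal(size=(d_att, d_enc))
v = rng.normal(size=(d_att,))

def alignment(s_prev, h_j):
    """f(i, j) = v^T tanh(W s_{i-1} + U h_j): how well the input
    around position j matches the output at position i."""
    return v @ np.tanh(W @ s_prev + U @ h_j)

s_prev = rng.normal(size=(d_dec,))   # decoder state before emitting output i
H = rng.normal(size=(5, d_enc))      # encoder states h_j for 5 input positions
e = np.array([alignment(s_prev, h) for h in H])
print(e)                             # one raw score per input position
```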

What is the purpose of the context vector c in the attention mechanism?

  • To summarize the relevant inputs as a weighted sum of encoder states, c = Σⱼ αᵢⱼ hⱼ, used to produce the ith output (correct)
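
Tying the pieces together, here is a minimal sketch (random states and invented dimensions, continuing the NumPy examples above) of how the attention weights combine the encoder states into the context vector:

```python
import numpy as np

def softmax(scores):
    shifted = scores - np.max(scores)
    return np.exp(shifted) / np.exp(shifted).sum()

rng = np.random.default_rng(1)
H = rng.normal(size=(5, 8))   # encoder states h_j: 5 inputs, dimension 8
e = rng.normal(size=(5,))     # raw alignment scores for one output position i

alpha = softmax(e)            # attention weights alpha_ij
c = alpha @ H                 # context vector c_i = sum_j alpha_ij * h_j
print(c.shape)                # (8,) -- same dimension as one encoder state
```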
