Podcast
Questions and Answers
Which of the following best describes the purpose of attention in a sequential decoder?
- To alleviate the vanishing gradient problem
- To compute the alignment model f
- To focus on the most relevant parts of the input sequence for each output (correct)
- To compute the context vector c
What is the formula for computing the attention score αᵢⱼ in the context of attention?
- αᵢⱼ = softmax(eᵢⱼ) (correct)
- αᵢⱼ = softmax(f(i, j))
- αᵢⱼ = softmax(e)
- αᵢⱼ = softmax(hⱼ)
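A minimal NumPy sketch of the correct formula above, assuming the alignment scores eᵢⱼ have already been produced; the `scores` array and its shape are made up for illustration:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating, for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical alignment scores e[i, j]: 2 output positions, 4 input positions.
scores = np.array([[2.0, 0.5, 0.1, -1.0],
                   [0.2, 1.5, 2.5,  0.3]])

# alpha[i, j] = softmax over j of e[i, j]: each row sums to 1 and tells
# the ith output how much attention to pay to the jth input.
alpha = softmax(scores, axis=1)
print(alpha)
print(alpha.sum(axis=1))  # -> [1. 1.]
```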
What does the alignment model f in the context of attention represent?
- The amount of attention the ith output should pay to the jth input
- The scores of how well the inputs around position j and the output at position i match (correct)
- The hidden state from the previous timestep
- The encoder state for the jth input
How can the alignment model f be approximated?
- By a single-layer feedforward neural network trained jointly with the other components of the model (correct)
- By a fixed lookup table of precomputed scores
- By averaging the encoder hidden states
- By applying softmax directly to the encoder states
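A minimal NumPy sketch of that single-layer feedforward approximation, which also shows what f represents: scoring how well the jth input matches the output at position i. The weight names (`W_s`, `W_h`, `v`), dimensions, and random values are illustrative assumptions, not from the source:

```python
import numpy as np

rng = np.random.default_rng(0)
d_dec, d_enc, d_att = 8, 8, 16  # illustrative dimensions

# Parameters of the feedforward alignment model, learned jointly in practice.
W_s = rng.normal(size=(d_att, d_dec))  # projects the decoder state
W_h = rng.normal(size=(d_att, d_enc))  # projects an encoder state
v = rng.normal(size=(d_att,))          # reduces the hidden layer to a scalar

def f(s_prev, h_j):
    """Score how well the jth input matches the output at position i,
    given the decoder's hidden state s_prev from the previous timestep."""
    return v @ np.tanh(W_s @ s_prev + W_h @ h_j)

s_prev = rng.normal(size=d_dec)              # decoder state s_{i-1}
H = rng.normal(size=(5, d_enc))              # encoder states h_1..h_5
e = np.array([f(s_prev, h_j) for h_j in H])  # scores e_{ij} for j = 1..5
print(e)
```

Passing these scores through the softmax from the earlier sketch yields the attention weights αᵢⱼ.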
What is the purpose of the context vector c in the context of attention?
- To summarize the relevant input information for the ith output as the attention-weighted sum of the encoder states, cᵢ = Σⱼ αᵢⱼhⱼ (correct)
- To store the hidden state from the previous timestep
- To normalize the attention scores
- To compute the alignment model f
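A minimal NumPy sketch of the correct option, with illustrative attention weights and encoder states; in a real model, `alpha` would come from the softmax over the alignment scores shown above:

```python
import numpy as np

rng = np.random.default_rng(1)
H = rng.normal(size=(5, 8))  # encoder states h_1..h_5, each of dimension 8
alpha = np.array([0.05, 0.1, 0.6, 0.2, 0.05])  # attention weights for output i

# c_i = sum_j alpha[i, j] * h_j: a summary of the inputs, weighted by how
# relevant each one is to producing the ith output.
c = alpha @ H
print(c.shape)  # -> (8,)
```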