Questions and Answers
What does the input to the transformer consist of?
How does the encoder enrich the embedding vectors?
What is the purpose of the start-of-sentence marker?
What is the processing pattern used by the decoder?
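For reference while answering the questions above, a minimal sketch of the transformer's input path, assuming a PyTorch-style setup; the vocabulary size and token IDs are made up for illustration. It shows source tokens being mapped to d_model-dimensional embedding vectors and the decoder starting from a start-of-sentence marker.

```python
# Minimal sketch of the transformer's input path (hypothetical sizes and token IDs).
import torch
import torch.nn as nn

vocab_size = 10000   # assumed vocabulary size, for illustration only
d_model = 512        # embedding width used in the base transformer

embedding = nn.Embedding(vocab_size, d_model)

# Source sentence as token IDs (made-up IDs standing in for real subword tokens).
src_ids = torch.tensor([[5, 42, 137, 9]])          # shape: (batch=1, src_len=4)
src_embeddings = embedding(src_ids)                # shape: (1, 4, 512)

# The decoder begins from a start-of-sentence marker so it has a first input
# to condition on before any target token has been produced.
SOS_ID = 1                                          # assumed ID of the <sos> token
tgt_ids = torch.tensor([[SOS_ID]])                  # shape: (1, 1)
tgt_embeddings = embedding(tgt_ids)                 # shape: (1, 1, 512)

print(src_embeddings.shape, tgt_embeddings.shape)
```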
What is the output of the decoder?
What is the role of positional encodings in the input embeddings?
What is the primary purpose of the decoder in the transformer architecture?
What is the dimension of the output vector from the decoder after the linear transformation?
What is the purpose of the softmax layer in the transformer architecture?
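A minimal sketch of the decoder's output head, with assumed sizes: the decoder's final d_model-dimensional vector is projected by a linear layer to vocabulary size, and a softmax turns the resulting scores into a probability distribution over the next token.

```python
# Sketch of the output head after the decoder (hypothetical sizes).
import torch
import torch.nn as nn

d_model = 512        # width of the decoder's output vector
vocab_size = 10000   # assumed target vocabulary size

output_head = nn.Linear(d_model, vocab_size)

decoder_output = torch.randn(1, d_model)     # stand-in for the decoder's last position
logits = output_head(decoder_output)         # shape: (1, vocab_size)
probs = torch.softmax(logits, dim=-1)        # probabilities over the vocabulary, summing to 1

next_token = torch.argmax(probs, dim=-1)     # most likely next token (greedy choice)
print(probs.shape, next_token)
```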
What is the purpose of the beam search approach?
What is the greedy method used for in the transformer architecture?
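To make the greedy/beam-search distinction concrete, a generic greedy decoding sketch (not tied to any particular implementation; the special-token IDs and the decoder stub are assumptions). At each step the decoder is fed the start marker plus everything predicted so far, and the single highest-probability token is appended; beam search would instead keep the k best partial sequences at every step rather than only one.

```python
# Sketch of greedy autoregressive decoding (illustrative only).
import torch

SOS_ID, EOS_ID = 1, 2      # assumed special-token IDs
MAX_LEN = 20

def decode_step(generated_ids):
    """Stand-in for a real decoder forward pass: returns scores over the vocabulary."""
    vocab_size = 10000
    return torch.randn(vocab_size)        # random logits, for illustration only

generated = [SOS_ID]                      # decoder input starts as just the <sos> marker
for _ in range(MAX_LEN):
    logits = decode_step(generated)
    next_id = int(torch.argmax(logits))   # greedy: take the single best token
    generated.append(next_id)             # the next step sees all tokens predicted so far
    if next_id == EOS_ID:                 # stop once end-of-sentence is produced
        break

print(generated)
```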
What is the main idea behind the paper title 'Attention Is All You Need'?
What is the input to the decoder after the first token is predicted?
Why do we need to encode word positions in machine translation models?
Where does the positional encoding happen in the transformer architecture?
What is the dimensionality of the positional encoding in the base transformer?
Why do we use element-wise addition to combine word embeddings and positional encodings?
What is an alternative to element-wise addition for combining word embeddings and positional encodings?
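For the positional-encoding questions, a minimal sketch of the sinusoidal encodings from "Attention Is All You Need", with illustrative sizes: each position gets a vector of the same width as the word embedding (512 in the base transformer), so the two can be combined by element-wise addition without growing the model width, whereas concatenation would.

```python
# Sketch of sinusoidal positional encodings (sizes illustrative).
import torch

def positional_encoding(seq_len, d_model=512):
    position = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)        # (seq_len, 1)
    div_term = torch.exp(
        torch.arange(0, d_model, 2, dtype=torch.float32)
        * (-torch.log(torch.tensor(10000.0)) / d_model)
    )                                                                          # (d_model/2,)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)   # even dimensions use sine
    pe[:, 1::2] = torch.cos(position * div_term)   # odd dimensions use cosine
    return pe

seq_len, d_model = 4, 512
word_embeddings = torch.randn(seq_len, d_model)                    # stand-in word embeddings
inputs = word_embeddings + positional_encoding(seq_len, d_model)   # element-wise addition
print(inputs.shape)   # still (4, 512): the dimensionality is unchanged
```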