18 Questions
What does the input to the transformer consist of?
Sequence of embedding vectors
How does the encoder enrich the embedding vectors?
Through multiple encoder blocks
What is the purpose of the start-of-sentence marker?
To signal the decoder to begin generating the translation
What is the processing pattern used by the decoder?
Auto-regressive
What is the output of the decoder?
The output sequence, produced one token at a time
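The auto-regressive pattern above can be sketched with a toy loop. The "decoder" here is a hypothetical stand-in (a real transformer decoder would attend to the encoder output); the point is only the feedback loop: each predicted token is appended to the input for the next step.

```python
# Hypothetical toy vocabulary for illustration only.
VOCAB = ["<sos>", "the", "cat", "sat", "<eos>"]

def toy_decoder_step(tokens):
    # Stand-in for a decoder forward pass: deterministically pick the
    # next vocabulary entry after the last generated token.
    last = VOCAB.index(tokens[-1])
    return VOCAB[min(last + 1, len(VOCAB) - 1)]

def autoregressive_decode(max_len=10):
    # Start with the start-of-sentence marker, then feed each newly
    # predicted token back in as input -- the auto-regressive pattern.
    tokens = ["<sos>"]
    for _ in range(max_len):
        nxt = toy_decoder_step(tokens)
        tokens.append(nxt)
        if nxt == "<eos>":  # stop once the end marker is produced
            break
    return tokens

print(autoregressive_decode())  # ['<sos>', 'the', 'cat', 'sat', '<eos>']
```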
What is the role of positional encodings in the input embeddings?
To capture token positions
What is the primary purpose of the decoder in the transformer architecture?
To generate the output sequence one token at a time, using the encoder's enriched representations
What is the dimension of the output vector from the decoder after the linear transformation?
The size of the vocabulary
What is the purpose of the softmax layer in the transformer architecture?
To predict the probabilities of the entire vocabulary
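The last two answers can be sketched together: a linear projection maps the decoder's output vector to one logit per vocabulary entry, and softmax turns those logits into probabilities. The sizes and random weights below are illustrative, not trained values.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab_size = 512, 10_000  # illustrative sizes

# Final decoder hidden vector for the current position (made-up values).
h = rng.standard_normal(d_model)

# Learned output projection: maps d_model -> vocabulary size.
W = rng.standard_normal((d_model, vocab_size)) * 0.02
logits = h @ W  # one score per vocabulary entry

# Softmax turns the scores into a probability distribution
# over the entire vocabulary.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

print(probs.shape)  # (10000,)
```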
What is the purpose of the beam search approach?
To look for the best combination of tokens
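A minimal beam-search sketch, assuming a hypothetical hard-coded next-token distribution (a real model would supply these probabilities). Keeping the `beam_width` best prefixes at each step can find a higher-probability sequence than committing to the single best token early.

```python
import math

# Hypothetical next-token distribution: given a prefix, return
# (token, probability) pairs. Hard-coded purely for illustration.
def next_token_probs(prefix):
    table = {
        (): [("a", 0.6), ("b", 0.4)],
        ("a",): [("x", 0.3), ("y", 0.3)],
        ("b",): [("x", 0.9), ("y", 0.1)],
    }
    return table.get(prefix, [("<eos>", 1.0)])

def beam_search(beam_width=2, steps=2):
    # Each beam entry is (log-probability, prefix); we keep the
    # beam_width best prefixes at every step instead of just one.
    beams = [(0.0, ())]
    for _ in range(steps):
        candidates = []
        for logp, prefix in beams:
            for tok, p in next_token_probs(prefix):
                candidates.append((logp + math.log(p), prefix + (tok,)))
        beams = sorted(candidates, reverse=True)[:beam_width]
    return beams

best_logp, best_seq = beam_search()[0]
print(best_seq)  # ('b', 'x'): 0.4*0.9 = 0.36 beats the greedy path 0.6*0.3 = 0.18
```

Note that a purely greedy decoder would pick "a" first (probability 0.6) and end with at most 0.18 total, while the beam keeps "b" alive and finds the better sequence.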
What is the greedy method used for in the transformer architecture?
To select the most probable token at each step
What is the main idea behind the paper title 'Attention Is All You Need'?
Recurrence and convolution are unnecessary for sequence-to-sequence tasks.
What is the input to the decoder after the first token is predicted?
The tokens predicted so far, together with the encoder's output features
Why do we need to encode word positions in machine translation models?
To distinguish sentences that contain the same words in a different order.
Where does the positional encoding happen in the transformer architecture?
After the input word embedding and before the encoder.
What is the dimensionality of the positional encoding in the base transformer?
512 elements.
Why do we use element-wise addition to combine word embeddings and positional encodings?
Because the positional encodings have the same dimension as the embeddings.
What is an alternative to element-wise addition for combining word embeddings and positional encodings?
Vector concatenation.
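The positional-encoding answers above can be tied together in one sketch: the sinusoidal encoding from "Attention Is All You Need" produces a vector of the same dimension as the word embedding (512 in the base transformer), so the two combine by element-wise addition. The embeddings here are random placeholders.

```python
import numpy as np

def positional_encoding(seq_len, d_model=512):
    # Sinusoidal encoding:
    #   PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    #   PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    pos = np.arange(seq_len)[:, None]
    i = np.arange(0, d_model, 2)[None, :]
    angles = pos / (10000 ** (i / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

seq_len, d_model = 6, 512
embeddings = np.random.default_rng(0).standard_normal((seq_len, d_model))
pe = positional_encoding(seq_len, d_model)

# Same shape, so the two combine by element-wise addition
# (concatenation would instead double the dimensionality).
x = embeddings + pe
print(x.shape)  # (6, 512)
```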
Learn about the transformer's encoder-decoder architecture and how it uses attention mechanisms without recurrence and convolution. Understand the importance of positional information in machine translation models.