Transformer Architecture in Machine Translation

18 Questions

What does the input to the transformer consist of?

Sequence of embedding vectors
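
As a minimal sketch of that answer, the snippet below turns a tokenized source sentence into a sequence of embedding vectors by table lookup; the vocabulary size, token ids, and the 512-dimensional model width are illustrative assumptions, not values taken from this quiz.

```python
import numpy as np

vocab_size, d_model = 10_000, 512                 # hypothetical vocab; base-transformer width
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(vocab_size, d_model))  # learned in a real model

token_ids = np.array([5, 42, 7, 99])              # e.g. a 4-token source sentence
embeddings = embedding_table[token_ids]           # shape (4, 512): the transformer's input
print(embeddings.shape)                           # (4, 512)
```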

How does the encoder enrich the embedding vectors?

Through multiple encoder blocks
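
To make "enrichment through multiple encoder blocks" concrete, here is a toy single-head sketch of one encoder block (self-attention plus a feed-forward layer) applied repeatedly; it deliberately omits multi-head attention, layer normalization, and dropout, and reuses one weight set across blocks, so it is an assumption-laden illustration rather than the real architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def encoder_block(x, Wq, Wk, Wv, W1, W2):
    # Single-head self-attention: every position attends to every other position.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1])) @ v
    x = x + attn                               # residual connection
    # Position-wise feed-forward network (ReLU), again with a residual connection.
    x = x + np.maximum(x @ W1, 0) @ W2
    return x

d_model, seq_len = 512, 4
rng = np.random.default_rng(0)
x = rng.normal(size=(seq_len, d_model))        # input embeddings (+ positional encodings)
weights = [rng.normal(size=s) * 0.02 for s in
           [(d_model, d_model)] * 3 + [(d_model, 2048), (2048, d_model)]]
for _ in range(6):                             # the base transformer stacks 6 encoder blocks
    x = encoder_block(x, *weights)             # each pass further enriches the vectors
```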

What is the purpose of the start-of-sentence marker?

To initiate the translation process

What is the processing pattern used by the decoder?

Auto-regressive

What is the output of the decoder?

One token at a time
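
The start-of-sentence marker, the auto-regressive pattern, and "one token at a time" fit together in a decoding loop like the sketch below; decode_step, SOS_ID, and EOS_ID are hypothetical stand-ins for the real decoder and its special-token ids.

```python
import numpy as np

SOS_ID, EOS_ID, VOCAB_SIZE = 1, 2, 10_000       # hypothetical ids and vocabulary size
rng = np.random.default_rng(0)

def decode_step(encoder_output, generated_ids):
    # Stand-in for the real decoder: it would attend to the encoder output and to all
    # previously generated tokens, then return a distribution over the vocabulary.
    logits = rng.normal(size=VOCAB_SIZE)
    return np.exp(logits) / np.exp(logits).sum()

encoder_output = rng.normal(size=(7, 512))      # enriched source representation (7 tokens)
generated = [SOS_ID]                            # decoding starts from the SOS marker
for _ in range(50):                             # cap the translation length
    probs = decode_step(encoder_output, generated)
    next_id = int(np.argmax(probs))             # pick the next token
    generated.append(next_id)                   # fed back as decoder input on the next step
    if next_id == EOS_ID:
        break
print(generated)
```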

What is the role of positional encodings in the input embeddings?

To capture token positions

What is the primary purpose of the decoder in the transformer architecture?

To generate the translated sequence one token at a time, using the encoder's enriched representations

What is the dimension of the output vector from the decoder after the linear transformation?

The size of the vocabulary

What is the purpose of the softmax layer in the transformer architecture?

To predict the probabilities of the entire vocabulary
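
The two answers above (a vocabulary-sized output from the linear transformation, followed by a softmax over the whole vocabulary) can be sketched as below; the 512-dimensional decoder output and the random projection matrix are placeholders for the learned quantities.

```python
import numpy as np

d_model, vocab_size = 512, 10_000
rng = np.random.default_rng(0)

decoder_output = rng.normal(size=d_model)               # one decoder position, 512-dim
W_out = rng.normal(size=(d_model, vocab_size)) * 0.02   # learned projection in a real model

logits = decoder_output @ W_out                         # one score per vocabulary entry
probs = np.exp(logits - logits.max())
probs /= probs.sum()                                    # softmax: probabilities over the vocab
print(probs.shape, probs.sum())                         # (10000,) 1.0
```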

What is the purpose of the beam search approach?

To look for the best combination of tokens
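
A minimal beam-search sketch over a dummy next-token distribution is shown below; the beam width, length limit, and cumulative log-probability scoring are illustrative choices, and next_log_probs is a stand-in for the actual decoder.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB_SIZE, EOS_ID, BEAM_WIDTH, MAX_LEN = 50, 2, 3, 10

def next_log_probs(prefix):
    # Stand-in for the decoder: log-probabilities of the next token given a prefix.
    logits = rng.normal(size=VOCAB_SIZE)
    return logits - np.log(np.exp(logits).sum())

beams = [([1], 0.0)]                         # (token ids starting with SOS, cumulative log-prob)
for _ in range(MAX_LEN):
    candidates = []
    for prefix, score in beams:
        if prefix[-1] == EOS_ID:             # finished hypotheses are carried over unchanged
            candidates.append((prefix, score))
            continue
        log_p = next_log_probs(prefix)
        for tok in np.argsort(log_p)[-BEAM_WIDTH:]:
            candidates.append((prefix + [int(tok)], score + float(log_p[tok])))
    # keep only the BEAM_WIDTH best token combinations found so far
    beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:BEAM_WIDTH]

best_tokens, best_score = beams[0]
print(best_tokens, best_score)
```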

What is the greedy method used for in the transformer architecture?

To select the most probable token at a time
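
For contrast with beam search, greedy decoding keeps only the single most probable token at each step, which is locally optimal but can miss the best overall combination; the probability values here are made up for illustration.

```python
import numpy as np

probs = np.array([0.05, 0.10, 0.60, 0.25])   # hypothetical next-token probabilities
next_token = int(np.argmax(probs))           # greedy: take the single most probable token
print(next_token)                            # 2
```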

What is the main idea behind the paper title 'Attention Is All You Need'?

Recurrence and convolution are unnecessary for sequence-to-sequence tasks.

What is the input to the decoder after the first token is predicted?

The predicted token and the input features

Why do we need to encode word positions in machine translation models?

To distinguish sentences that contain the same words in a different order, and therefore differ in meaning.

Where does the positional encoding happen in the transformer architecture?

After the input word embedding and before the encoder.

What is the dimensionality of the positional encoding in the base transformer?

512 elements.

Why do we use element-wise addition to combine word embeddings and positional encodings?

Because the positional encodings have the same dimension as the embeddings.

What is an alternative to element-wise addition for combining word embeddings and positional encodings?

Vector concatenation.
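
Pulling the last few answers together, the sketch below builds sinusoidal positional encodings of 512 elements per position and combines them with the word embeddings by element-wise addition, with concatenation shown as the alternative; the word embeddings themselves are random placeholders for learned ones.

```python
import numpy as np

def positional_encoding(seq_len, d_model=512):
    # Sinusoidal encoding from the base transformer: even indices use sin, odd use cos.
    pos = np.arange(seq_len)[:, None]                      # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]                   # (1, d_model/2)
    angles = pos / np.power(10_000, 2 * i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

seq_len, d_model = 4, 512
rng = np.random.default_rng(0)
word_embeddings = rng.normal(size=(seq_len, d_model))      # placeholder learned embeddings

pe = positional_encoding(seq_len, d_model)                 # same shape as the embeddings
encoder_input = word_embeddings + pe                       # element-wise addition is possible
concatenated = np.concatenate([word_embeddings, pe], axis=-1)  # the alternative: wider vectors
print(encoder_input.shape, concatenated.shape)             # (4, 512) (4, 1024)
```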

Learn about the transformer's encoder-decoder architecture and how it uses attention mechanisms without recurrence or convolution. Understand the importance of positional information in machine translation models.
