18 Questions
What does the input to the transformer consist of?
Sequence of embedding vectors
How does the encoder enrich the embedding vectors?
Through multiple encoder blocks
What is the purpose of the start-of-sentence marker?
To signal the decoder to begin generating the translation
What is the processing pattern used by the decoder?
Auto-regressive
What is the output of the decoder?
The output sequence, produced one token at a time
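The auto-regressive pattern above can be sketched with a toy loop. The "decoder" here is a hypothetical stand-in (a real transformer decoder would attend to the encoder output); the point is only the feedback loop: each predicted token is appended to the input for the next step.

```python
# Hypothetical toy vocabulary for illustration only.
VOCAB = ["<sos>", "the", "cat", "sat", "<eos>"]

def toy_decoder_step(tokens):
    # Stand-in for a decoder forward pass: deterministically pick the
    # next vocabulary entry after the last generated token.
    last = VOCAB.index(tokens[-1])
    return VOCAB[min(last + 1, len(VOCAB) - 1)]

def autoregressive_decode(max_len=10):
    # Start with the start-of-sentence marker, then feed each newly
    # predicted token back in as input -- the auto-regressive pattern.
    tokens = ["<sos>"]
    for _ in range(max_len):
        nxt = toy_decoder_step(tokens)
        tokens.append(nxt)
        if nxt == "<eos>":  # stop once the end marker is produced
            break
    return tokens

print(autoregressive_decode())  # ['<sos>', 'the', 'cat', 'sat', '<eos>']
```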
What is the role of positional encodings in the input embeddings?
To capture token positions
What is the primary purpose of the decoder in the transformer architecture?
To generate the output sequence one token at a time, using the encoder's enriched representations
What is the dimension of the output vector from the decoder after the linear transformation?
The size of the vocabulary
What is the purpose of the softmax layer in the transformer architecture?
To predict the probabilities of the entire vocabulary
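The last two answers can be sketched together: a linear projection maps the decoder's output vector to one logit per vocabulary entry, and softmax turns those logits into probabilities. The sizes and random weights below are illustrative, not trained values.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab_size = 512, 10_000  # illustrative sizes

# Final decoder hidden vector for the current position (made-up values).
h = rng.standard_normal(d_model)

# Learned output projection: maps d_model -> vocabulary size.
W = rng.standard_normal((d_model, vocab_size)) * 0.02
logits = h @ W  # one score per vocabulary entry

# Softmax turns the scores into a probability distribution
# over the entire vocabulary.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

print(probs.shape)  # (10000,)
```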
What is the purpose of the beam search approach?
To look for the best combination of tokens
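A minimal beam-search sketch, assuming a hypothetical hard-coded next-token distribution (a real model would supply these probabilities). Keeping the `beam_width` best prefixes at each step can find a higher-probability sequence than committing to the single best token early.

```python
import math

# Hypothetical next-token distribution: given a prefix, return
# (token, probability) pairs. Hard-coded purely for illustration.
def next_token_probs(prefix):
    table = {
        (): [("a", 0.6), ("b", 0.4)],
        ("a",): [("x", 0.3), ("y", 0.3)],
        ("b",): [("x", 0.9), ("y", 0.1)],
    }
    return table.get(prefix, [("<eos>", 1.0)])

def beam_search(beam_width=2, steps=2):
    # Each beam entry is (log-probability, prefix); we keep the
    # beam_width best prefixes at every step instead of just one.
    beams = [(0.0, ())]
    for _ in range(steps):
        candidates = []
        for logp, prefix in beams:
            for tok, p in next_token_probs(prefix):
                candidates.append((logp + math.log(p), prefix + (tok,)))
        beams = sorted(candidates, reverse=True)[:beam_width]
    return beams

best_logp, best_seq = beam_search()[0]
print(best_seq)  # ('b', 'x'): 0.4*0.9 = 0.36 beats the greedy path 0.6*0.3 = 0.18
```

Note that a purely greedy decoder would pick "a" first (probability 0.6) and end with at most 0.18 total, while the beam keeps "b" alive and finds the better sequence.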
What is the greedy method used for in the transformer architecture?
To select the most probable token at each step
What is the main idea behind the paper title 'Attention Is All You Need'?
Recurrence and convolution are unnecessary for sequence-to-sequence tasks.
What is the input to the decoder after the first token is predicted?
The tokens predicted so far, together with the encoder's output features
Why do we need to encode word positions in machine translation models?
To distinguish sentences that contain the same words in a different order.
Where does the positional encoding happen in the transformer architecture?
After the input word embedding and before the encoder.
What is the dimensionality of the positional encoding in the base transformer?
512 elements.
Why do we use element-wise addition to combine word embeddings and positional encodings?
Because the positional encodings have the same dimension as the embeddings.
What is an alternative to element-wise addition for combining word embeddings and positional encodings?
Vector concatenation.
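The positional-encoding answers above can be tied together in one sketch: the sinusoidal encoding from "Attention Is All You Need" produces a vector of the same dimension as the word embedding (512 in the base transformer), so the two combine by element-wise addition. The embeddings here are random placeholders.

```python
import numpy as np

def positional_encoding(seq_len, d_model=512):
    # Sinusoidal encoding:
    #   PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    #   PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    pos = np.arange(seq_len)[:, None]
    i = np.arange(0, d_model, 2)[None, :]
    angles = pos / (10000 ** (i / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

seq_len, d_model = 6, 512
embeddings = np.random.default_rng(0).standard_normal((seq_len, d_model))
pe = positional_encoding(seq_len, d_model)

# Same shape, so the two combine by element-wise addition
# (concatenation would instead double the dimensionality).
x = embeddings + pe
print(x.shape)  # (6, 512)
```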
Learn about the transformer's encoder-decoder architecture and how it uses attention mechanisms without recurrence and convolution. Understand the importance of positional information in machine translation models.