Transformer Architecture in Machine Translation
18 Questions

Questions and Answers

What does the input to the transformer consist of?

  • Tokenized words of the input sentence
  • Token frequencies in the input text
  • Sequence of embedding vectors (correct)
  • Characters of the input text
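
A rough illustration of that first step (the toy vocabulary, tokenizer output, and embedding table below are made up for the example): each token of the input sentence is mapped to an id, and each id is looked up in an embedding table to produce one dense vector per token.

```python
import numpy as np

# Hypothetical toy vocabulary and embedding table (d_model = 4 here; 512 in the base transformer).
vocab = {"<sos>": 0, "<eos>": 1, "the": 2, "cat": 3, "sleeps": 4}
d_model = 4
embedding_table = np.random.randn(len(vocab), d_model)

tokens = ["the", "cat", "sleeps"]        # tokenized input sentence
token_ids = [vocab[t] for t in tokens]   # word -> id
embeddings = embedding_table[token_ids]  # id -> embedding vector

print(embeddings.shape)  # (3, 4): one embedding vector per token
```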

How does the encoder enrich the embedding vectors?

  • Through frequency analysis
  • Through complex neural networks
  • Through dimensionality reduction
  • Through multiple encoder blocks (correct)
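
A minimal sketch of that stacking idea, where `encoder_block` is only a stand-in for the real self-attention plus feed-forward block: each block takes the vectors produced by the previous one and refines them further, without changing their shape.

```python
import numpy as np

def encoder_block(x):
    # Placeholder for self-attention + feed-forward; here just a toy non-linear map.
    return np.tanh(x @ np.random.randn(x.shape[-1], x.shape[-1]))

def encode(x, num_blocks=6):
    # The base transformer stacks 6 encoder blocks; each pass re-enriches the vectors.
    for _ in range(num_blocks):
        x = encoder_block(x)
    return x

enriched = encode(np.random.randn(3, 512))  # 3 tokens, d_model = 512
print(enriched.shape)                       # (3, 512): same shape, richer content
```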

What is the purpose of the start-of-sentence marker?

  • To denote the importance of the sentence
  • To separate sentences in a paragraph
  • To indicate the end of a sentence
  • To initiate the translation process (correct)

What is the processing pattern used by the decoder?

  • Auto-regressive (correct)
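
One way to picture that auto-regressive pattern (the `decoder_step` function below is a random stand-in for the real decoder pass): generation starts from the start-of-sentence marker, the most probable next token is picked greedily, and each prediction is appended to the decoder input for the following step, together with the encoder output.

```python
import numpy as np

VOCAB_SIZE = 8
SOS, EOS = 0, 1

def decoder_step(encoder_output, generated_ids):
    # Stand-in for the real decoder + softmax: a probability distribution over the
    # vocabulary for the next token, given the encoder output and the tokens so far.
    logits = np.random.randn(VOCAB_SIZE)
    return np.exp(logits) / np.exp(logits).sum()

def greedy_decode(encoder_output, max_len=10):
    generated = [SOS]                    # translation is initiated by the <sos> marker
    for _ in range(max_len):
        probs = decoder_step(encoder_output, generated)
        next_id = int(np.argmax(probs))  # greedy: most probable token at this step
        generated.append(next_id)        # fed back in on the next step
        if next_id == EOS:
            break
    return generated

print(greedy_decode(encoder_output=np.random.randn(3, 512)))
```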

What is the output of the decoder?

  • One token at a time (correct)

What is the role of positional encodings in the input embeddings?

  • To capture token positions (correct)
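
A sketch of the sinusoidal positional encodings used in the original paper: every position gets its own d_model-dimensional vector (512 in the base transformer), built from sines and cosines of different frequencies.

```python
import numpy as np

def positional_encoding(max_len, d_model=512):
    # PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    # PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    positions = np.arange(max_len)[:, None]           # (max_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]          # (1, d_model / 2)
    angles = positions / np.power(10000, dims / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = positional_encoding(max_len=50)
print(pe.shape)  # (50, 512): one position vector per token position
```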

What is the primary purpose of the decoder in the transformer architecture?

  • To enrich the embedding vector with contextual information (correct)

What is the dimension of the output vector from the decoder after the linear transformation?

  • The size of the vocabulary (correct)
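
A sketch of that final linear layer (the vocabulary size below is an illustrative figure, not taken from the lesson): the decoder's d_model-dimensional output vector is projected to a vector with one score per vocabulary entry.

```python
import numpy as np

d_model, vocab_size = 512, 32000           # vocab_size chosen for illustration
W = np.random.randn(d_model, vocab_size)   # weights of the final linear layer
b = np.zeros(vocab_size)

decoder_output = np.random.randn(d_model)  # one vector per decoding step
logits = decoder_output @ W + b
print(logits.shape)                        # (32000,): one score per vocabulary word
```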

What is the purpose of the softmax layer in the transformer architecture?

  • To predict the probabilities of the entire vocabulary (correct)
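
The softmax layer then turns those vocabulary-sized scores into a probability distribution, which is what the decoding strategy samples or searches over:

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()  # subtract the max for numerical stability
    exp = np.exp(z)
    return exp / exp.sum()

probs = softmax(np.random.randn(32000))
print(probs.sum())             # 1.0: a probability for every vocabulary word
```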

What is the purpose of the beam search approach?

  • To look for the best combination of tokens (correct)
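
A simplified beam search sketch (the `step_probs` function stands in for a full decoder-plus-softmax pass): instead of committing to the single most probable token at each step, it keeps the k best partial translations and scores whole combinations of tokens.

```python
import numpy as np

VOCAB_SIZE = 8
SOS, EOS = 0, 1

def step_probs(prefix):
    # Stand-in for the decoder + softmax: next-token probabilities given a prefix.
    logits = np.random.randn(VOCAB_SIZE)
    return np.exp(logits) / np.exp(logits).sum()

def beam_search(beam_width=3, max_len=10):
    beams = [([SOS], 0.0)]                          # (token ids, log-probability)
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            if prefix[-1] == EOS:
                candidates.append((prefix, score))  # finished hypotheses are kept as-is
                continue
            probs = step_probs(prefix)
            for tok in range(VOCAB_SIZE):
                candidates.append((prefix + [tok], score + np.log(probs[tok])))
        # Keep only the beam_width best-scoring partial translations.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams[0][0]                              # best-scoring combination of tokens

print(beam_search())
```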

What is the greedy method used for in the transformer architecture?

  • To select the most probable token at each step (correct)

What is the main idea behind the paper title 'Attention Is All You Need'?

  • Recurrence and convolution are unnecessary for sequence-to-sequence tasks (correct)

What is the input to the decoder after the first token is predicted?

  • The predicted token and the input features (correct)

Why do we need to encode word positions in machine translation models?

  • To differentiate between sentences that contain the same words in a different order (correct)
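
A small illustration of why positions must be encoded (toy, made-up embedding table): without positional information the model effectively sees only a set of word vectors, so two sentences with the same words in a different order look identical to it.

```python
import numpy as np

embed = {"man": np.array([1.0, 0.0]),
         "bites": np.array([0.0, 1.0]),
         "dog": np.array([1.0, 1.0])}

a = [embed[w] for w in ["man", "bites", "dog"]]
b = [embed[w] for w in ["dog", "bites", "man"]]

# An order-insensitive summary of the two sentences is exactly the same,
# even though the sentences mean different things.
print(np.allclose(sum(a), sum(b)))  # True
```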

Where does the positional encoding happen in the transformer architecture?

  • After the input word embedding and before the encoder (correct)
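
A sketch of where this sits in the pipeline: the positional encodings are added to the word embeddings right after the embedding lookup, and the sum is what enters the first encoder block.

```python
import numpy as np

d_model, seq_len = 512, 3
embedding_table = np.random.randn(100, d_model)  # toy table for the example
token_ids = [2, 3, 4]                            # ids of the tokenized sentence

word_embeddings = embedding_table[token_ids]     # (3, 512) after the embedding lookup

# Sinusoidal positional encodings, one vector per position (same shape as the embeddings).
pos = np.arange(seq_len)[:, None]
dims = np.arange(0, d_model, 2)[None, :]
angles = pos / np.power(10000, dims / d_model)
pe = np.zeros((seq_len, d_model))
pe[:, 0::2], pe[:, 1::2] = np.sin(angles), np.cos(angles)

encoder_input = word_embeddings + pe             # element-wise addition
print(encoder_input.shape)                       # (3, 512) goes into the first encoder block
```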

What is the dimensionality of the positional encoding in the base transformer?

  • 512 elements (correct)

Why do we use element-wise addition to combine word embeddings and positional encodings?

  • Because the positional encodings have the same dimension as the embeddings (correct)

What is an alternative to element-wise addition for combining word embeddings and positional encodings?

  • Vector concatenation (correct)
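
A sketch contrasting the two options: element-wise addition requires both vectors to have the same dimension and keeps d_model unchanged, while concatenation joins them into a wider vector.

```python
import numpy as np

d_model = 512
word_embedding = np.random.randn(d_model)
position_vector = np.random.randn(d_model)     # same dimension as the embedding

added = word_embedding + position_vector       # element-wise addition: stays (512,)
concatenated = np.concatenate([word_embedding, position_vector])  # (1024,)

print(added.shape, concatenated.shape)         # (512,) (1024,): concatenation doubles the width
```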
