Transformer Architecture in Machine Translation

Questions and Answers

What does the input to the transformer consist of?

  • Tokenized words of the input sentence
  • Token frequencies in the input text
  • Sequence of embedding vectors (correct)
  • Characters of the input text
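
As a minimal sketch of the correct answer above (in PyTorch; the toy vocabulary, token ids, and sizes are assumptions for illustration, not taken from the lesson): tokenized words are first mapped to ids, and what the transformer itself receives is the corresponding sequence of embedding vectors.

```python
import torch
import torch.nn as nn

# Hypothetical toy vocabulary; a real system would use a learned subword tokenizer.
vocab = {"<sos>": 0, "<eos>": 1, "the": 2, "cat": 3, "sat": 4}
d_model = 512  # embedding size of the base transformer

embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=d_model)

token_ids = torch.tensor([[vocab["the"], vocab["cat"], vocab["sat"]]])  # shape (1, 3)
x = embedding(token_ids)                                                # shape (1, 3, 512)
print(x.shape)  # a sequence of embedding vectors, not raw tokens or frequencies
```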

How does the encoder enrich the embedding vectors?

  • Through frequency analysis
  • Through complex neural networks
  • Through dimensionality reduction
  • Through multiple encoder blocks (correct)
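
A sketch of that stacking in PyTorch; the layer count and sizes match the base model from 'Attention Is All You Need', the rest is assumed. Each encoder block applies self-attention plus a feed-forward network, and passing the embeddings through several blocks in sequence enriches each vector with more and more sentence-level context.

```python
import torch
import torch.nn as nn

d_model, n_heads, n_layers = 512, 8, 6  # base-transformer sizes

# One encoder block: self-attention + feed-forward, with residual connections and layer norm.
block = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads, batch_first=True)
encoder = nn.TransformerEncoder(block, num_layers=n_layers)  # the stack of identical blocks

x = torch.randn(1, 3, d_model)  # embedding vectors (positional encoding already added)
enriched = encoder(x)           # same shape; each vector now carries contextual information
print(enriched.shape)           # torch.Size([1, 3, 512])
```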

What is the purpose of the start-of-sentence marker?

  • To denote the importance of the sentence
  • To separate sentences in a paragraph
  • To indicate the end of a sentence
  • To initiate the translation process (correct)

What is the processing pattern used by the decoder?

Auto-regressive

What is the output of the decoder?

One token at a time
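
The questions above and the ones that follow describe one decoding loop: translation is initiated by the start-of-sentence marker, the decoder emits one token per step, and each prediction is appended to the decoder input for the next step. Below is a runnable sketch using PyTorch's nn.Transformer; the vocabulary size, special-token ids, and layer sizes are made up for illustration.

```python
import torch
import torch.nn as nn

# Assumed toy sizes and special-token ids, purely for illustration.
vocab_size, d_model, sos_id, eos_id, max_len = 10, 32, 0, 1, 8

src_emb = nn.Embedding(vocab_size, d_model)
tgt_emb = nn.Embedding(vocab_size, d_model)
transformer = nn.Transformer(d_model=d_model, nhead=4,
                             num_encoder_layers=2, num_decoder_layers=2,
                             batch_first=True).eval()
to_vocab = nn.Linear(d_model, vocab_size)        # projects decoder output to vocabulary size

src = torch.tensor([[2, 3, 4, 5]])               # already-tokenized source sentence
memory = transformer.encoder(src_emb(src))       # encoder features, computed once

tgt = torch.tensor([[sos_id]])                   # decoding starts from the <sos> marker
for _ in range(max_len):
    causal = transformer.generate_square_subsequent_mask(tgt.size(1))
    out = transformer.decoder(tgt_emb(tgt), memory, tgt_mask=causal)
    probs = to_vocab(out[:, -1]).softmax(dim=-1)  # distribution for the next token only
    next_id = probs.argmax(dim=-1, keepdim=True)  # greedy pick; beam search is the alternative
    tgt = torch.cat([tgt, next_id], dim=1)        # feed the prediction back in: auto-regressive
    if next_id.item() == eos_id:
        break

print(tgt)  # <sos> followed by one predicted token per loop iteration
```

With untrained weights the tokens are meaningless; the point is the control flow: the encoder runs once, while the decoder is re-run on the growing prefix (the predicted tokens plus the encoder features) until <eos> or a length limit is reached.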

What is the role of positional encodings in the input embeddings?

To capture token positions

What is the primary purpose of the decoder in the transformer architecture?

To enrich the embedding vector with contextual information

What is the dimension of the output vector from the decoder after the linear transformation?

The size of the vocabulary

What is the purpose of the softmax layer in the transformer architecture?

To predict the probabilities of the entire vocabulary
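
A short sketch of those two final steps (sizes assumed): a linear layer maps the decoder's output vector from the model dimension to the size of the vocabulary, and the softmax converts those scores into probabilities over the entire vocabulary.

```python
import torch
import torch.nn as nn

d_model, vocab_size = 512, 32000           # assumed sizes for illustration
decoder_out = torch.randn(1, d_model)      # decoder output vector at the current position

to_vocab = nn.Linear(d_model, vocab_size)  # linear transformation: 512 -> vocabulary size
logits = to_vocab(decoder_out)             # one score per vocabulary entry
probs = logits.softmax(dim=-1)             # softmax turns the scores into probabilities

print(probs.shape)                         # torch.Size([1, 32000])
print(round(probs.sum().item(), 4))        # 1.0 -- a probability for every vocabulary entry
```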

What is the purpose of the beam search approach?

To look for the best combination of tokens
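
A minimal sketch of the idea, with a toy scoring function standing in for the decoder (the vocabulary size, beam width, and token ids are all assumptions): rather than committing to a single token at each step, beam search keeps the best few partial translations and finally returns the best-scoring combination of tokens.

```python
import torch

# Stand-in for one decoder step: given a prefix, return log-probabilities over a toy
# vocabulary of 5 tokens. In a real system this is the decoder + linear layer + softmax.
def next_token_logprobs(prefix):
    torch.manual_seed(sum(prefix) + len(prefix))       # deterministic toy distribution
    return torch.randn(5).log_softmax(dim=-1)

def beam_search(sos_id, eos_id, beam_width=3, max_len=6):
    beams = [(0.0, [sos_id])]                          # (accumulated log-prob, token sequence)
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            if seq[-1] == eos_id:                      # finished hypotheses are carried over
                candidates.append((score, seq))
                continue
            for tok, lp in enumerate(next_token_logprobs(seq).tolist()):
                candidates.append((score + lp, seq + [tok]))
        # Keep only the beam_width best partial translations instead of one greedy pick.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
    return max(beams, key=lambda c: c[0])

print(beam_search(sos_id=0, eos_id=1))                 # best (score, token sequence) found
```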

What is the greedy method used for in the transformer architecture?

To select the most probable token at each step
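
For contrast, a tiny snippet with a made-up next-token distribution: the greedy method commits to the single most probable token at each step, while beam search keeps several of the top candidates alive.

```python
import torch

probs = torch.tensor([0.05, 0.40, 0.35, 0.20])    # made-up next-token probabilities

greedy_pick = int(probs.argmax())                 # greedy: take the single best token now
beam_candidates = torch.topk(probs, k=3).indices  # a beam of width 3 would keep all of these

print(greedy_pick)                # 1
print(beam_candidates.tolist())   # [1, 2, 3]
```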

What is the main idea behind the paper title 'Attention Is All You Need'?

Recurrence and convolution are unnecessary for sequence-to-sequence tasks.

What is the input to the decoder after the first token is predicted?

The predicted token and the input features

Why do we need to encode word positions in machine translation models?

To distinguish sentences that contain the same words in a different order.
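
A small experiment that makes this concrete, using PyTorch's nn.MultiheadAttention with arbitrary sizes: without positional information, attention sees only a set of word embeddings, so shuffling the words merely shuffles the outputs in the same way and the two word orders become indistinguishable to the model.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
attn = nn.MultiheadAttention(embed_dim=8, num_heads=2, batch_first=True)

x = torch.randn(1, 4, 8)            # 4 word embeddings with no positional information
perm = torch.tensor([2, 0, 3, 1])   # the same words in a different order

out_original, _ = attn(x, x, x)
out_shuffled, _ = attn(x[:, perm], x[:, perm], x[:, perm])

# Reordering the input only reorders the output rows; nothing about the content changes.
print(torch.allclose(out_original[:, perm], out_shuffled, atol=1e-5))  # True
```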

Where does the positional encoding happen in the transformer architecture?

After the input word embedding and before the encoder.

What is the dimensionality of the positional encoding in the base transformer?

512 elements.
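
A sketch of the sinusoidal positional encoding from the base transformer, which yields one 512-element vector per position, the same dimensionality as the word embeddings (the max_len below is an arbitrary choice for the example):

```python
import math
import torch

def sinusoidal_positional_encoding(max_len, d_model=512):
    # PE[pos, 2i] = sin(pos / 10000^(2i/d_model)), PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model))
    position = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)   # (max_len, 1)
    freqs = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32)
                      * (-math.log(10000.0) / d_model))                  # (d_model/2,)
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(position * freqs)
    pe[:, 1::2] = torch.cos(position * freqs)
    return pe

pe = sinusoidal_positional_encoding(max_len=50)
print(pe.shape)  # torch.Size([50, 512]) -- one 512-element encoding per position
```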

Why do we use element-wise addition to combine word embeddings and positional encodings?

Because the positional encodings have the same dimension as the embeddings.

What is an alternative to element-wise addition for combining word embeddings and positional encodings?

Vector concatenation.
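
A sketch of both options on random tensors (the shapes are the point; there are no trained weights here): element-wise addition works because the positional encodings share the 512-element dimension of the embeddings and it keeps the model width unchanged, whereas concatenation would double the width that every later layer has to handle.

```python
import torch

seq_len, d_model = 3, 512
word_emb = torch.randn(seq_len, d_model)  # output of the input embedding layer
pos_enc = torch.randn(seq_len, d_model)   # positional encodings, same shape by design

# Element-wise addition: possible because both tensors have d_model elements per position.
added = word_emb + pos_enc
print(added.shape)         # torch.Size([3, 512]) -- model dimension unchanged

# The alternative, concatenation along the feature dimension, doubles the width.
concatenated = torch.cat([word_emb, pos_enc], dim=-1)
print(concatenated.shape)  # torch.Size([3, 1024])
```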
