Transformers and Sequence Embeddings
10 Questions

Questions and Answers

What is the shape of Emb in the equation Emb = [SOS] · X?

  • (4096, 4096)
  • (1, 1)
  • (4096, 1) (correct)
  • (1, 4096)
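
A minimal sketch of one way to read this, assuming X is an embedding table with one 4096-dimensional entry per token and [SOS] selects its entry (the vocabulary size and token id are illustrative, not from the lesson); the looked-up embedding is reshaped into the (4096, 1) column given as the answer above:

```python
import numpy as np

d_model, vocab_size = 4096, 32000          # 4096 matches the lesson; vocab size is illustrative
X = np.random.randn(vocab_size, d_model)   # embedding table: one row per token

sos_id = 1                                 # hypothetical id of the [SOS] token
emb = X[sos_id].reshape(d_model, 1)        # look up [SOS] and treat it as a column vector
print(emb.shape)                           # -> (4096, 1)
```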

What is the output shape of the first attention block?

  • (4096, 4096)
  • (4096, 1)
  • (1, 1)
  • (1, 4096) (correct)

What is the formula for calculating attention in the self-attention mechanism?

  • softmax(QKᵀ / √dₖ) V (correct)
  • softmax(QK)ᵀ / √dₖ · V
  • softmax(QKᵀ) / √dₖ · V
  • softmax(QKᵀ) V
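
The correct option is the scaled dot-product form. A minimal NumPy sketch of it (shapes chosen to match the lesson's single-query step; this is not the lesson's own code):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V -- the formula in the correct option above."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                            # (n_q, n_k) similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                         # (n_q, d_v) weighted sum of values

# Illustrative shapes: one query attending over two cached positions.
Q = np.random.randn(1, 4096)
K = np.random.randn(2, 4096)
V = np.random.randn(2, 4096)
print(scaled_dot_product_attention(Q, K, V).shape)             # -> (1, 4096)
```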

What is the purpose of the cache in the self-attention mechanism?

To reduce the computational cost of the attention mechanism
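
A minimal sketch of the idea (illustrative code, not the lesson's implementation): with a key/value cache, each decoding step projects only the newest token and appends it, so earlier keys and values are never recomputed and the cached sequence grows one row per step.

```python
import numpy as np

d_model = 4096
W_k = np.random.randn(d_model, d_model)    # illustrative projection weights
W_v = np.random.randn(d_model, d_model)

k_cache, v_cache = [], []

def decode_step(new_token_emb):
    """Project only the newest token and append it to the cache."""
    k_cache.append(new_token_emb @ W_k)    # one projection per step instead of t
    v_cache.append(new_token_emb @ W_v)
    K = np.vstack(k_cache)                 # (t, 4096): all keys seen so far
    V = np.vstack(v_cache)                 # (t, 4096): all values seen so far
    return K, V

for t in range(1, 5):
    K, V = decode_step(np.random.randn(1, d_model))
    print(t, K.shape)                      # (1, 4096), (2, 4096), (3, 4096), (4, 4096)
```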

What is the shape of the input sequence to the second attention block?

(2, 4096)

What is the output shape of the second attention block?

(2, 4096)

What is the type of attention used in the decoder?

Scaled dot-product attention

What is the purpose of the softmax function in the attention mechanism?

To normalize the attention scores into weights that sum to 1
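
A small sketch of that normalization: softmax maps raw attention scores to positive weights summing to 1, so the attended output is a weighted average of the value vectors.

```python
import numpy as np

def softmax(x):
    """Turn raw scores into positive weights that sum to 1."""
    e = np.exp(x - x.max())
    return e / e.sum()

scores = np.array([2.0, 0.5, -1.0])
weights = softmax(scores)
print(weights, weights.sum())   # weights are positive and sum to 1.0
```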

What is the shape of the input sequence to the third attention block?

(3, 4096)

What is the output shape of the fourth attention block?

(4, 4096)
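
A short sketch tying the shape questions together (all sizes illustrative; the function repeats the scaled dot-product form above): self-attention maps an input sequence of shape (t, 4096) to an output of the same shape, which is why the second, third, and fourth blocks above report (2, 4096), (3, 4096), and (4, 4096).

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention, as in the formula sketch above."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

# The output of a self-attention block has the same shape as its input sequence.
for t in (2, 3, 4):
    x = np.random.randn(t, 4096)
    print(t, attention(x, x, x).shape)     # -> (2, 4096), (3, 4096), (4, 4096)
```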
