Transformers and Sequence Embeddings
10 Questions

Questions and Answers

What is the shape of Emb in the equation Emb = [SOS] X?

  • (4096, 4096)
  • (1, 1)
  • (4096, 1) (correct)
  • (1, 4096)
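
The embedding shape can be checked with a small NumPy sketch. Assume (these layouts are not stated in the lesson) that X is an embedding matrix of shape (4096, vocab_size) and [SOS] is a one-hot column vector; the product then has shape (4096, 1), matching the marked answer.

```python
import numpy as np

vocab_size = 32000          # assumed vocabulary size (not given in the lesson)
d_model = 4096              # embedding dimension used throughout this quiz

# Hypothetical embedding matrix: one d_model-dimensional column per token id.
X = np.random.randn(d_model, vocab_size)

# One-hot column vector selecting the [SOS] token (assume it has id 1).
sos_id = 1
onehot = np.zeros((vocab_size, 1))
onehot[sos_id, 0] = 1.0

Emb = X @ onehot            # selects the [SOS] column of X
print(Emb.shape)            # (4096, 1)
```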

What is the output shape of the first attention block?

  • (4096, 4096)
  • (4096, 1)
  • (1, 1)
  • (1, 4096) (correct)

What is the formula for calculating attention in the self-attention mechanism?

  • softmax(QKᵀ / √dₖ) V (correct)
  • softmax(QK)ᵀ / √dₖ V
  • softmax(QKᵀ) / √dₖ V
  • softmax(QKᵀ) V
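
A minimal NumPy sketch of scaled dot-product attention, softmax(QKᵀ / √dₖ) V. With a single 4096-dimensional query token (the sizes are the quiz's; a single head and random values are assumptions for illustration), the output has shape (1, 4096), which also matches the first attention block's answer above.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # shift for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q Kᵀ / √d_k) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)           # (n_queries, n_keys)
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ V                        # (n_queries, d_v)

d_model = 4096
Q = np.random.randn(1, d_model)               # single-token query
K = np.random.randn(1, d_model)               # one key
V = np.random.randn(1, d_model)               # one value

out = attention(Q, K, V)
print(out.shape)                              # (1, 4096)
```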

What is the purpose of the cache in the self-attention mechanism?

  • To reduce the computational cost of the attention mechanism (correct)
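
One common reading of this answer is the key/value cache used in autoregressive decoding: keys and values of earlier positions are stored so each new step only computes attention for the newest query instead of re-running attention over the whole prefix. A hedged sketch (the function name, cache layout, and random stand-in vectors are illustrative, not from the lesson):

```python
import numpy as np

d_model = 4096

def decode_step(q_new, k_new, v_new, K_cache, V_cache):
    """Attend from the newest token only, reusing cached keys/values."""
    K_cache = np.vstack([K_cache, k_new])          # append this step's key
    V_cache = np.vstack([V_cache, v_new])          # append this step's value
    scores = q_new @ K_cache.T / np.sqrt(d_model)  # (1, steps so far)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    out = weights @ V_cache                        # (1, 4096)
    return out, K_cache, V_cache

K_cache = np.empty((0, d_model))
V_cache = np.empty((0, d_model))
for step in range(4):
    q = k = v = np.random.randn(1, d_model)        # stand-ins for projected states
    out, K_cache, V_cache = decode_step(q, k, v, K_cache, V_cache)
    print(step + 1, K_cache.shape, out.shape)      # cache grows; output stays (1, 4096)
```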

What is the shape of the input sequence to the second attention block?

  • (2, 4096) (correct)

What is the output shape of the second attention block?

  • (2, 4096) (correct)

What is the type of attention used in the decoder?

  • Scaled dot-product attention (correct)

What is the purpose of the softmax function in the attention mechanism?

  • To normalize the output of the attention mechanism (correct)
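
A tiny numeric illustration of that normalization (the score values are made up): softmax exponentiates the raw attention scores and rescales them so they are non-negative and sum to 1, giving a proper weighting over the value vectors.

```python
import numpy as np

scores = np.array([[2.0, 1.0, 0.1]])     # made-up raw attention scores for one query
e = np.exp(scores - scores.max())        # subtract max for numerical stability
weights = e / e.sum()
print(weights)                           # approx. [[0.659 0.242 0.099]]
print(weights.sum())                     # 1.0 — a proper weighting over the keys
```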

What is the shape of the input sequence to the third attention block?

  • (3, 4096) (correct)

What is the output shape of the fourth attention block?

  • (4, 4096) (correct)
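
The shape answers for the second, third, and fourth blocks follow one pattern: if a block self-attends over the n tokens produced so far, both its input and its output have shape (n, 4096). A short sketch of that progression (the random inputs and the single-head attention helper are assumptions for illustration):

```python
import numpy as np

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

d_model = 4096
for n in range(1, 5):                    # sequence lengths 1, 2, 3, 4
    X = np.random.randn(n, d_model)      # stand-in for the block's input sequence
    out = attention(X, X, X)             # self-attention over the whole prefix
    print(n, X.shape, out.shape)         # (1,4096) ... (4,4096): input and output match
```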
