Transformers and Sequence Embeddings


Questions and Answers

What is the shape of Emb in the equation Emb = [SOS] X?

(4096, 1)
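As an illustration, a minimal NumPy sketch of this lookup, assuming a small illustrative vocabulary, hidden size 4096, and an [SOS] token id of 1 (all values chosen for the example): multiplying a one-hot row vector by the embedding matrix X simply selects one row of X.

```python
import numpy as np

vocab_size, d_model = 1_000, 4096   # illustrative sizes; real vocabularies are much larger

# Learned embedding matrix X: one row per vocabulary token.
rng = np.random.default_rng(0)
X = rng.standard_normal((vocab_size, d_model), dtype=np.float32)

# One-hot row vector selecting the [SOS] token (id 1 is an assumption for the example).
sos_id = 1
one_hot = np.zeros((1, vocab_size), dtype=np.float32)
one_hot[0, sos_id] = 1.0

# Emb = [SOS] X: the matrix product just picks out row `sos_id` of X.
Emb = one_hot @ X
print(Emb.shape)   # (1, 4096); transpose to (4096, 1) for the column-vector convention
```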

What is the output shape of the first attention block?

(1, 4096)

What is the formula for calculating attention in the self-attention mechanism?

softmax(QKᵀ / √d_k) V
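A minimal NumPy sketch of scaled dot-product attention (single head, no masking); the shapes below are illustrative:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q Kᵀ / √d_k) V — minimal sketch, no masking or multiple heads."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (seq_q, seq_k) similarity scores
    # Row-wise softmax turns scores into attention weights that sum to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                              # weighted sum of value vectors

# Example: one query attending over a 3-token sequence of width 4096.
rng = np.random.default_rng(0)
Q = rng.standard_normal((1, 4096))
K = rng.standard_normal((3, 4096))
V = rng.standard_normal((3, 4096))
print(scaled_dot_product_attention(Q, K, V).shape)  # (1, 4096)
```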

What is the purpose of the cache in the self-attention mechanism?

To reduce the computational cost of the attention mechanism
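A minimal sketch of a key/value cache during decoding, assuming a single head and random stand-ins for the per-token Q/K/V projections: each step appends only the new token's key and value and reuses the cached rows instead of recomputing them for the whole prefix.

```python
import numpy as np

d_model = 4096
rng = np.random.default_rng(0)

k_cache = np.empty((0, d_model), dtype=np.float32)  # grows by one row per decoded token
v_cache = np.empty((0, d_model), dtype=np.float32)

for step in range(1, 5):
    # New token's query/key/value (random stand-ins for the real projections).
    q, k, v = (rng.standard_normal((1, d_model), dtype=np.float32) for _ in range(3))

    # Append only the new K/V; earlier rows are reused from the cache, not recomputed.
    k_cache = np.vstack([k_cache, k])
    v_cache = np.vstack([v_cache, v])

    scores = q @ k_cache.T / np.sqrt(d_model)       # d_model doubles as d_k in this sketch
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    out = weights @ v_cache                         # (1, 4096) output for the new token

    print(step, k_cache.shape, out.shape)           # cache: (1,4096) ... (4,4096)
```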

What is the shape of the input sequence to the second attention block?

(2, 4096)

What is the output shape of the second attention block?

(2, 4096)

What is the type of attention used in the decoder?

Scaled dot-product attention

What is the purpose of the softmax function in the attention mechanism?

To normalize the attention scores into weights that sum to 1
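A quick numeric example of the softmax step (subtracting the maximum score is a standard numerical-stability trick):

```python
import numpy as np

scores = np.array([2.0, 1.0, 0.1])        # raw attention scores for one query
weights = np.exp(scores - scores.max())   # shift by the max for numerical stability
weights /= weights.sum()
print(weights, weights.sum())             # approx. [0.659 0.242 0.099], sums to 1.0
```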

What is the shape of the input sequence to the third attention block?

(3, 4096)

What is the output shape of the fourth attention block?

(4, 4096)
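Tying the shapes together: a minimal sketch, assuming each decoding step re-runs self-attention over the full sequence generated so far (no cache), which reproduces the (1, 4096) through (4, 4096) progression above; the appended token embeddings are random stand-ins.

```python
import numpy as np

d_model = 4096
rng = np.random.default_rng(0)
seq = rng.standard_normal((1, d_model))    # starts with the [SOS] embedding

for step in range(1, 5):
    # Self-attention over the whole sequence so far: Q = K = V = seq.
    scores = seq @ seq.T / np.sqrt(d_model)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    out = weights @ seq                     # output has the same shape as the input
    print(f"block {step}: in {seq.shape}, out {out.shape}")

    # Append the newly generated token's embedding (stand-in) for the next step.
    seq = np.vstack([seq, rng.standard_normal((1, d_model))])
```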
