Transformers and Sequence Embeddings
10 Questions

Questions and Answers

What is the shape of Emb in the equation Emb = [SOS] X?

  • (4096, 4096)
  • (1, 1)
  • (4096, 1) (correct)
  • (1, 4096)
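
The embedding shape can be checked with a small NumPy sketch. Assume (these layouts are not stated in the lesson) that X is an embedding matrix of shape (4096, vocab_size) and [SOS] is a one-hot column vector; the product then has shape (4096, 1), matching the marked answer.

```python
import numpy as np

vocab_size = 32000          # assumed vocabulary size (not given in the lesson)
d_model = 4096              # embedding dimension used throughout this quiz

# Hypothetical embedding matrix: one d_model-dimensional column per token id.
X = np.random.randn(d_model, vocab_size)

# One-hot column vector selecting the [SOS] token (assume it has id 1).
sos_id = 1
onehot = np.zeros((vocab_size, 1))
onehot[sos_id, 0] = 1.0

Emb = X @ onehot            # selects the [SOS] column of X
print(Emb.shape)            # (4096, 1)
```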

What is the output shape of the first attention block?

  • (4096, 4096)
  • (4096, 1)
  • (1, 1)
  • (1, 4096) (correct)

What is the formula for calculating attention in the self-attention mechanism?

  • softmax(QKᵀ / √dₖ) V (correct)
  • softmax(QK)ᵀ / √dₖ V
  • softmax(QKᵀ) / √dₖ V
  • softmax(QKᵀ) V
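
A minimal NumPy sketch of scaled dot-product attention, softmax(QKᵀ / √dₖ) V. With a single 4096-dimensional query token (the sizes are the quiz's; a single head and random values are assumptions for illustration), the output has shape (1, 4096), which also matches the first attention block's answer above.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # shift for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q Kᵀ / √d_k) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)           # (n_queries, n_keys)
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ V                        # (n_queries, d_v)

d_model = 4096
Q = np.random.randn(1, d_model)               # single-token query
K = np.random.randn(1, d_model)               # one key
V = np.random.randn(1, d_model)               # one value

out = attention(Q, K, V)
print(out.shape)                              # (1, 4096)
```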

What is the purpose of the cache in the self-attention mechanism?

  • To reduce the computational cost of the attention mechanism (correct)
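
One common reading of this answer is the key/value cache used in autoregressive decoding: keys and values of earlier positions are stored so each new step only computes attention for the newest query instead of re-running attention over the whole prefix. A hedged sketch (the function name, cache layout, and random stand-in vectors are illustrative, not from the lesson):

```python
import numpy as np

d_model = 4096

def decode_step(q_new, k_new, v_new, K_cache, V_cache):
    """Attend from the newest token only, reusing cached keys/values."""
    K_cache = np.vstack([K_cache, k_new])          # append this step's key
    V_cache = np.vstack([V_cache, v_new])          # append this step's value
    scores = q_new @ K_cache.T / np.sqrt(d_model)  # (1, steps so far)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    out = weights @ V_cache                        # (1, 4096)
    return out, K_cache, V_cache

K_cache = np.empty((0, d_model))
V_cache = np.empty((0, d_model))
for step in range(4):
    q = k = v = np.random.randn(1, d_model)        # stand-ins for projected states
    out, K_cache, V_cache = decode_step(q, k, v, K_cache, V_cache)
    print(step + 1, K_cache.shape, out.shape)      # cache grows; output stays (1, 4096)
```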

What is the shape of the input sequence to the second attention block?

  • (2, 4096) (correct)

What is the output shape of the second attention block?

  • (2, 4096) (correct)

What is the type of attention used in the decoder?

  • Scaled dot-product attention (correct)

What is the purpose of the softmax function in the attention mechanism?

  • To normalize the output of the attention mechanism (correct)
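
A tiny numeric illustration of that normalization (the score values are made up): softmax exponentiates the raw attention scores and rescales them so they are non-negative and sum to 1, giving a proper weighting over the value vectors.

```python
import numpy as np

scores = np.array([[2.0, 1.0, 0.1]])     # made-up raw attention scores for one query
e = np.exp(scores - scores.max())        # subtract max for numerical stability
weights = e / e.sum()
print(weights)                           # approx. [[0.659 0.242 0.099]]
print(weights.sum())                     # 1.0 — a proper weighting over the keys
```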

What is the shape of the input sequence to the third attention block?

  • (3, 4096) (correct)

What is the output shape of the fourth attention block?

  • (4, 4096) (correct)
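
The shape answers for the second, third, and fourth blocks follow one pattern: if a block self-attends over the n tokens produced so far, both its input and its output have shape (n, 4096). A short sketch of that progression (the random inputs and the single-head attention helper are assumptions for illustration):

```python
import numpy as np

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

d_model = 4096
for n in range(1, 5):                    # sequence lengths 1, 2, 3, 4
    X = np.random.randn(n, d_model)      # stand-in for the block's input sequence
    out = attention(X, X, X)             # self-attention over the whole prefix
    print(n, X.shape, out.shape)         # (1,4096) ... (4,4096): input and output match
```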
