Podcast
Questions and Answers
What is the shape of Emb in the equation Emb = [SOS] X?
- (4096, 4096)
- (1, 1)
- (4096, 1) (correct)
- (1, 4096)
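A shape of (4096, 1) falls out of a column-vector convention: assuming the embedding matrix X has shape (4096, vocab_size) and [SOS] is a one-hot column over the vocabulary, the product selects one embedding column. A toy NumPy sketch (the vocabulary size and token index are illustrative, not from the source; the multiplication order is the one that yields (4096, 1) under this convention):

```python
import numpy as np

vocab_size = 10                  # toy vocabulary; 4096 is the model dimension
X = np.random.randn(4096, vocab_size)   # assumed embedding matrix

sos = np.zeros((vocab_size, 1))
sos[0, 0] = 1.0                  # [SOS] assumed to be token index 0 (one-hot)

Emb = X @ sos                    # picks out one embedding column
print(Emb.shape)                 # (4096, 1)
```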
What is the output shape of the first attention block?
- (4096, 4096)
- (4096, 1)
- (1, 1)
- (1, 4096) (correct)
What is the formula for calculating attention in the self-attention mechanism?
- softmax(QKᵀ / √dₖ) V (correct)
- softmax(QK)ᵀ / √dₖ V
- softmax(QKᵀ) / √dₖ V
- softmax(QKᵀ) V
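The options above differ only in where the transpose and the 1/√dₖ scaling sit relative to the softmax. A minimal NumPy sketch of scaled dot-product attention, with toy shapes and random inputs (the dimensions are illustrative, not from the source):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # scale inside the softmax: softmax(QKᵀ/√dₖ)V
    return softmax(scores) @ V

Q = np.random.randn(1, 64)   # one query
K = np.random.randn(5, 64)   # five keys
V = np.random.randn(5, 64)   # five values
out = attention(Q, K, V)
print(out.shape)             # (1, 64): one output row, same width as V
```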
What is the purpose of the cache in the self-attention mechanism?
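In autoregressive decoding, the cache stores the keys and values of tokens already processed, so each new step only computes K and V for the newest token instead of the whole prefix. A toy sketch of that append-only pattern (the dimension and step count are illustrative, not from the source):

```python
import numpy as np

d = 8                               # toy head dimension
K_cache = np.empty((0, d))          # keys of all past tokens
V_cache = np.empty((0, d))          # values of all past tokens

for step in range(3):
    k_new = np.random.randn(1, d)   # K/V computed only for the new token
    v_new = np.random.randn(1, d)
    K_cache = np.vstack([K_cache, k_new])   # append instead of recomputing
    V_cache = np.vstack([V_cache, v_new])

print(K_cache.shape)                # (3, 8): one cached row per generated token
```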
What is the shape of the input sequence to the second attention block?
What is the output shape of the second attention block?
What is the type of attention used in the decoder?
What is the purpose of the softmax function in the attention mechanism?
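The softmax turns raw attention scores into nonnegative weights that sum to 1, i.e. a probability distribution over positions. A tiny NumPy check with arbitrary scores (the values are illustrative):

```python
import numpy as np

scores = np.array([2.0, 1.0, 0.1])          # raw (unnormalized) attention scores
weights = np.exp(scores - scores.max())     # max-shift for numerical stability
weights /= weights.sum()                    # normalize to a distribution

print(weights)        # all entries in [0, 1], largest score gets largest weight
print(weights.sum())  # sums to 1 (up to floating-point rounding)
```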
What is the shape of the input sequence to the third attention block?
What is the output shape of the fourth attention block?