Questions and Answers
What is the loss after epoch 1?
3.38
What is the perplexity after epoch 1?
29.50
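The loss and perplexity answers are linked by the usual definition of perplexity: the exponential of the mean per-token cross-entropy. The sketch below assumes the loss is measured in nats (natural log), which is the default for PyTorch's `nn.CrossEntropyLoss`. The helper names are hypothetical. It shows that the small gap between e^3.38 ≈ 29.37 and the reported 29.50 is consistent with the displayed loss having been rounded from about 3.384.

```python
import math

def perplexity(loss: float) -> float:
    """Perplexity = exp(mean per-token cross-entropy), assuming the loss is in nats."""
    return math.exp(loss)

def loss_from_perplexity(ppl: float) -> float:
    """Inverse relation: mean per-token cross-entropy = ln(perplexity)."""
    return math.log(ppl)

print(f"{perplexity(3.38):.2f}")             # 29.37 (from the already-rounded loss)
print(f"{loss_from_perplexity(29.50):.3f}")  # 3.384 (loss implied by the reported perplexity)
```

The same relation converts any validation loss into a validation perplexity.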
What is the valid loss after epoch 2?
1.83
What is the valid perplexity after epoch 3?
Why isn't the log-softmax function applied in this case?
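A likely context for the log-softmax question, assuming the model is trained with PyTorch's `nn.CrossEntropyLoss`: that loss expects unnormalized logits and internally applies log-softmax followed by negative log-likelihood. The pure-Python sketch below (function names and values are hypothetical) mimics that behavior for one example. It shows that adding log-softmax in the model is redundant, because log-softmax is shift-invariant and yields the same loss. Applying a plain softmax first would change the loss.

```python
import math

def log_softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_norm for x in logits]

def cross_entropy(logits, target):
    # What a fused loss such as nn.CrossEntropyLoss computes for one example:
    # log-softmax over the raw scores, then the negative log-probability of the target.
    return -log_softmax(logits)[target]

logits = [2.0, 0.5, -1.0]  # made-up raw scores for a 3-word vocabulary
target = 0

loss_raw = cross_entropy(logits, target)                 # intended usage: raw logits in
loss_logsm = cross_entropy(log_softmax(logits), target)  # redundant extra log-softmax: same value
probs = [math.exp(v) for v in log_softmax(logits)]
loss_sm = cross_entropy(probs, target)                   # softmax first: a different (wrong) value
print(round(loss_raw, 4), round(loss_logsm, 4), round(loss_sm, 4))
```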
What is the shape of the src tensor?
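For the `src` shape, a common layout is sequence-first, `[seq_len, batch_size]`, which matches the default `batch_first=False` convention of PyTorch's transformer modules. The sketch below assumes that layout. It uses pure Python with hypothetical `batchify` and `get_batch` helpers: a token stream is split into `bsz` columns, and `src` is a window of `bptt` rows whose target is the same window shifted down by one row.

```python
def batchify(tokens, bsz):
    # Drop trailing tokens that don't fill a complete column, then lay the
    # stream out as bsz columns: result[t][b] is step t of column b.
    seq_len = len(tokens) // bsz
    tokens = tokens[: seq_len * bsz]
    return [[tokens[b * seq_len + t] for b in range(bsz)] for t in range(seq_len)]

def get_batch(data, i, bptt):
    # src covers rows i .. i+seq_len-1; target is the same window shifted by one
    # row, so each src token's label is the next token in its column.
    seq_len = min(bptt, len(data) - 1 - i)
    src = data[i : i + seq_len]
    target = data[i + 1 : i + 1 + seq_len]
    return src, target

data = batchify(list(range(26)), bsz=4)  # 26 tokens -> 6 rows x 4 columns (2 dropped)
src, target = get_batch(data, 0, bptt=3)
print(len(src), len(src[0]))             # 3 4, i.e. [seq_len, batch_size]
```

Laying the stream out this way is what batching buys: the columns are treated as independent sequences and can be processed in parallel. The trade-off is that the model never sees dependencies that cross a column boundary.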
What is the purpose of positional encodings?
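Self-attention by itself ignores token order, so positional encodings inject position information into each token's representation. A minimal sketch follows, assuming the sinusoidal scheme from the original Transformer paper, which the PyTorch tutorial also uses. Learned position embeddings are a common alternative. The function name and sizes are illustrative.

```python
import math

def sinusoidal_positional_encoding(max_len, d_model):
    # pe[pos][2k]   = sin(pos / 10000**(2k / d_model))
    # pe[pos][2k+1] = cos(pos / 10000**(2k / d_model))
    pe = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for i in range(0, d_model, 2):  # i plays the role of 2k
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe

pe = sinusoidal_positional_encoding(max_len=50, d_model=8)

# Each row is added element-wise to the embedding of the token at that position.
token_embedding = [0.1] * 8                            # made-up embedding for one token
x = [e + p for e, p in zip(token_embedding, pe[3])]    # same token, placed at position 3
```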
What does batching enable?
What is the main difference between the transformer model and recurrent neural networks (RNNs)?
What is the purpose of the square attention mask in the language modeling task?
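In language modeling, the square mask stops each position from attending to later positions, so the prediction for token t+1 depends only on tokens up to t. The sketch below assumes the additive float-mask convention: 0.0 keeps a score and -inf blocks it. This is one of the mask formats PyTorch's attention modules accept, and a boolean mask is another. Function names and scores are hypothetical.

```python
import math

def square_subsequent_mask(size):
    # Row i = query position, column j = key position.
    # 0.0 keeps the score; -inf removes it, so position i sees only j <= i.
    return [[0.0 if j <= i else float("-inf") for j in range(size)] for i in range(size)]

def masked_softmax(scores, mask_row):
    # Adding -inf before the softmax turns a blocked score into weight 0.
    shifted = [s + m for s, m in zip(scores, mask_row)]
    top = max(shifted)
    exps = [math.exp(v - top) for v in shifted]
    total = sum(exps)
    return [e / total for e in exps]

mask = square_subsequent_mask(4)
weights = masked_softmax([1.0, 2.0, 3.0, 4.0], mask[1])  # query at position 1
print(weights)  # only positions 0 and 1 receive nonzero weight
```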
How does the model produce a probability distribution over output words?
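A typical way such a model produces a distribution over output words, assuming the usual design: a final linear layer (in PyTorch, an `nn.Linear` from `d_model` to the vocabulary size) maps each position's output vector to one score per vocabulary word. A softmax then turns those scores into probabilities. During training the raw scores are usually fed straight to the loss. The toy vocabulary, weights, and helper names below are hypothetical.

```python
import math

def linear(x, weight, bias):
    # One weight row per vocabulary word: logits[v] = dot(weight[v], x) + bias[v].
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

vocab = ["the", "cat", "sat"]                   # toy vocabulary (made up)
hidden = [0.5, -1.0]                            # model output at one position, d_model = 2
weight = [[1.0, 0.0], [0.0, -1.0], [0.5, 0.5]]  # made-up [vocab_size x d_model] projection
bias = [0.0, 0.0, 0.0]

probs = softmax(linear(hidden, weight, bias))   # one probability per vocabulary word
best = vocab[max(range(len(vocab)), key=lambda v: probs[v])]
```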