Podcast
Questions and Answers
What is the loss after epoch 1?
3.38
What is the perplexity after epoch 1?
29.50
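The two numbers above are consistent: perplexity is the exponential of the average cross-entropy loss, so a loss near 3.38 corresponds to a perplexity near 29.5. A minimal standard-library check (the loss value below is the rounded figure from above, not an exact log):

```python
import math

# Perplexity = exp(cross-entropy loss in nats).
# A loss of ~3.384 corresponds to a perplexity of ~29.5.
loss = 3.384
perplexity = math.exp(loss)
print(round(perplexity, 1))  # → 29.5
```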
What is the valid loss after epoch 2?
1.83
What is the valid perplexity after epoch 3?
Why isn't the log-softmax function applied in this case?
Because training uses CrossEntropyLoss, which expects raw (unnormalized) logits and applies log-softmax internally; applying it in the model as well would be redundant.
What is the shape of the src tensor?
[seq_len, batch_size]: each column is one sequence in the batch.
What is the purpose of positional encodings?
Since the transformer has no recurrence, positional encodings inject information about each token's position in the sequence so the model can make use of word order.
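The sinusoidal positional encoding can be sketched without any deep-learning library. The `max_len` and `d_model` values below are illustrative, not the tutorial's actual configuration:

```python
import math

def positional_encoding(max_len: int, d_model: int) -> list[list[float]]:
    """Sinusoidal positional encodings: sin on even dims, cos on odd dims."""
    pe = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe

pe = positional_encoding(max_len=4, d_model=6)
# Position 0 encodes as [0, 1, 0, 1, 0, 1], since sin(0)=0 and cos(0)=1;
# every other position gets a distinct pattern of sines and cosines.
```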
What does batching enable?
More parallelizable processing: several columns of the token stream are processed at once, though the model treats each column as an independent sequence.
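The batching idea (trim the flat token stream, then arrange it into `batch_size` parallel columns) can be sketched in plain Python; the token ids here are made up for illustration:

```python
def batchify(data: list[int], batch_size: int) -> list[list[int]]:
    """Split a flat token stream into batch_size independent columns.

    Row t holds token t of every column, so the model can advance
    batch_size sequences in parallel at each time step.
    """
    seq_len = len(data) // batch_size      # drop tokens that don't fit evenly
    data = data[: seq_len * batch_size]
    # Column b is the contiguous chunk data[b*seq_len : (b+1)*seq_len].
    return [[data[b * seq_len + t] for b in range(batch_size)]
            for t in range(seq_len)]

batches = batchify(list(range(26)), batch_size=4)
# 26 tokens with a batch of 4 -> 6 rows of 4 tokens; 2 leftovers dropped.
```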
What is the main difference between the transformer model and recurrent neural networks (RNNs)?
The transformer relies on self-attention instead of recurrence, so it can process all positions of a sequence in parallel rather than one step at a time.
What is the purpose of the square attention mask in the language modeling task?
It masks out future positions so the prediction for position i can depend only on tokens at earlier positions, which is required for left-to-right language modeling.
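The square mask is an upper-triangular matrix: -inf above the diagonal (future positions a token may not attend to) and 0 elsewhere. A plain-Python sketch of the same pattern that `nn.Transformer.generate_square_subsequent_mask` produces:

```python
def square_subsequent_mask(size: int) -> list[list[float]]:
    """0.0 where attention is allowed (j <= i), -inf where it is masked."""
    return [[0.0 if j <= i else float("-inf") for j in range(size)]
            for i in range(size)]

mask = square_subsequent_mask(3)
# Row i: position i may attend to positions 0..i, never to the future.
# Adding -inf to an attention score zeroes it out after softmax.
```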
How does the model produce a probability distribution over output words?
A final linear layer projects each hidden state to vocabulary size, and a softmax over those logits (applied implicitly by CrossEntropyLoss during training) yields the distribution.
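Turning the final linear layer's logits into probabilities is a softmax over the vocabulary. A standard-library sketch with made-up logits:

```python
import math

def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])   # one raw score per vocabulary word
# All probabilities are positive and sum to 1; the largest logit
# gets the largest probability.
```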