Language Modeling with nn.Transformer and torchtext

What is the loss after epoch 1?

3.38

What is the perplexity after epoch 1?

29.50

What is the valid loss after epoch 2?

1.83

What is the valid perplexity after epoch 3?

3.60

Why isn't the log-softmax function applied in this case?

The log-softmax function isn't applied here because nn.CrossEntropyLoss is used later, and it expects unnormalized logits as input.
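
A minimal sketch of this pattern, assuming illustrative tensor sizes (the variable names are hypothetical, not taken from the tutorial):

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration only.
vocab_size, seq_len, batch_size = 1000, 35, 20

# The model's final linear layer produces raw, unnormalized scores (logits).
logits = torch.randn(seq_len, batch_size, vocab_size)
targets = torch.randint(0, vocab_size, (seq_len, batch_size))

# nn.CrossEntropyLoss applies log-softmax and negative log-likelihood internally,
# so the model itself must NOT apply log-softmax to its outputs.
criterion = nn.CrossEntropyLoss()
loss = criterion(logits.view(-1, vocab_size), targets.view(-1))
print(loss.item())
```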

What is the shape of the src tensor?

The shape of the src tensor is [seq_len, batch_size].
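
A rough sketch of how such a slice could be produced; the bptt length and the get_batch helper follow the tutorial's conventions, but the exact code here is a simplified approximation:

```python
import torch

bptt = 35  # maximum sequence length per training step

def get_batch(source: torch.Tensor, i: int):
    """source has shape [full_seq_len, batch_size]; returns a src slice of
    shape [seq_len, batch_size] plus flattened next-word targets."""
    seq_len = min(bptt, len(source) - 1 - i)
    src = source[i : i + seq_len]                     # [seq_len, batch_size]
    target = source[i + 1 : i + 1 + seq_len].reshape(-1)
    return src, target

data = torch.randint(0, 1000, (200, 20))              # toy token ids: [200, 20]
src, target = get_batch(data, 0)
print(src.shape)                                       # torch.Size([35, 20])
```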

What is the purpose of positional encodings?

The positional encodings inject some information about the relative or absolute position of the tokens in the sequence.
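
One common way to add this information is a fixed sinusoidal encoding added to the token embeddings, as in the following sketch (hyperparameters such as max_len are illustrative, and d_model is assumed even):

```python
import math
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    """Adds fixed sinusoidal position information to token embeddings."""
    def __init__(self, d_model: int, dropout: float = 0.1, max_len: int = 5000):
        super().__init__()
        self.dropout = nn.Dropout(p=dropout)
        position = torch.arange(max_len).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, 1, d_model)
        pe[:, 0, 0::2] = torch.sin(position * div_term)  # even dimensions
        pe[:, 0, 1::2] = torch.cos(position * div_term)  # odd dimensions
        self.register_buffer('pe', pe)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [seq_len, batch_size, d_model]
        x = x + self.pe[:x.size(0)]
        return self.dropout(x)
```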

What does batching enable?

Batching enables more parallelizable processing.
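
As a sketch, a 1-D stream of token ids can be folded into batch_size independent columns that the model processes in parallel (the batchify name mirrors the tutorial, but this version is simplified):

```python
import torch

def batchify(data: torch.Tensor, batch_size: int) -> torch.Tensor:
    """Trim a 1-D stream of token ids so it divides evenly into batch_size
    columns, then reshape it to [full_seq_len, batch_size]."""
    seq_len = data.size(0) // batch_size
    data = data[: seq_len * batch_size]
    return data.view(batch_size, seq_len).t().contiguous()

stream = torch.arange(26)          # toy "corpus" of 26 token ids
batched = batchify(stream, 4)      # 4 columns processed in parallel
print(batched.shape)               # torch.Size([6, 4])
```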

What is the main difference between the transformer model and recurrent neural networks (RNNs)?

The transformer model has proven superior in quality for many sequence-to-sequence tasks while being more parallelizable than RNNs.

What is the purpose of the square attention mask in the language modeling task?

The square attention mask is required because the self-attention layers are only allowed to attend to earlier positions in the sequence; it masks out any tokens at future positions.
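
A minimal sketch of such a causal mask, built with torch.triu (the helper name is illustrative; the -inf entries mark the masked future positions):

```python
import torch

def generate_square_subsequent_mask(sz: int) -> torch.Tensor:
    """Upper-triangular mask: position i may attend only to positions <= i.
    Future positions are set to -inf so softmax assigns them zero weight."""
    return torch.triu(torch.full((sz, sz), float('-inf')), diagonal=1)

print(generate_square_subsequent_mask(4))
# tensor([[0., -inf, -inf, -inf],
#         [0.,   0., -inf, -inf],
#         [0.,   0.,   0., -inf],
#         [0.,   0.,   0.,   0.]])
```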

How does the model produce a probability distribution over output words?

The model's output is passed through a linear layer to produce unnormalized logits over the vocabulary; applying a softmax to these logits yields the probability distribution over output words.
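
A small sketch of that final step at inference time, assuming hypothetical shapes:

```python
import torch
import torch.nn.functional as F

vocab_size = 1000
# Suppose `logits` are the model's unnormalized scores for the last position
# in the sequence: one score per vocabulary word (shape is illustrative).
logits = torch.randn(vocab_size)

probs = F.softmax(logits, dim=-1)   # probability distribution over words
next_word = torch.argmax(probs)     # greedy choice of the next word
print(probs.sum().item(), next_word.item())
```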

Test your understanding of language modeling with nn.Transformer and torchtext by taking this quiz. Assess your knowledge of training a model to predict the next word in a sequence using the nn.Transformer module, and learn how it compares to RNNs.
