Podcast
Questions and Answers
What is the loss after epoch 1?
3.38
What is the perplexity after epoch 1?
29.50
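The two numbers above are consistent: perplexity is the exponential of the average cross-entropy loss, so a loss near 3.38 corresponds to a perplexity near 29.5. A minimal standard-library check (the loss value below is the rounded figure from above, not an exact log):

```python
import math

# Perplexity = exp(cross-entropy loss in nats).
# A loss of ~3.384 corresponds to a perplexity of ~29.5.
loss = 3.384
perplexity = math.exp(loss)
print(round(perplexity, 1))  # → 29.5
```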
What is the valid loss after epoch 2?
1.83
What is the valid perplexity after epoch 3?
Why isn't the log-softmax function applied in this case?
Because training uses CrossEntropyLoss, which expects raw (unnormalized) logits and applies log-softmax internally; applying it in the model as well would be redundant.
What is the shape of the src tensor?
[seq_len, batch_size]: each column is one sequence in the batch.
What is the purpose of positional encodings?
Since the transformer has no recurrence, positional encodings inject information about each token's position in the sequence so the model can make use of word order.
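The sinusoidal positional encoding can be sketched without any deep-learning library. The `max_len` and `d_model` values below are illustrative, not the tutorial's actual configuration:

```python
import math

def positional_encoding(max_len: int, d_model: int) -> list[list[float]]:
    """Sinusoidal positional encodings: sin on even dims, cos on odd dims."""
    pe = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe

pe = positional_encoding(max_len=4, d_model=6)
# Position 0 encodes as [0, 1, 0, 1, 0, 1], since sin(0)=0 and cos(0)=1;
# every other position gets a distinct pattern of sines and cosines.
```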
What does batching enable?
More parallelizable processing: several columns of the token stream are processed at once, though the model treats each column as an independent sequence.
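The batching idea (trim the flat token stream, then arrange it into `batch_size` parallel columns) can be sketched in plain Python; the token ids here are made up for illustration:

```python
def batchify(data: list[int], batch_size: int) -> list[list[int]]:
    """Split a flat token stream into batch_size independent columns.

    Row t holds token t of every column, so the model can advance
    batch_size sequences in parallel at each time step.
    """
    seq_len = len(data) // batch_size      # drop tokens that don't fit evenly
    data = data[: seq_len * batch_size]
    # Column b is the contiguous chunk data[b*seq_len : (b+1)*seq_len].
    return [[data[b * seq_len + t] for b in range(batch_size)]
            for t in range(seq_len)]

batches = batchify(list(range(26)), batch_size=4)
# 26 tokens with a batch of 4 -> 6 rows of 4 tokens; 2 leftovers dropped.
```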
What is the main difference between the transformer model and recurrent neural networks (RNNs)?
The transformer relies on self-attention instead of recurrence, so it can process all positions of a sequence in parallel rather than one step at a time.
What is the purpose of the square attention mask in the language modeling task?
It masks out future positions so the prediction for position i can depend only on tokens at earlier positions, which is required for left-to-right language modeling.
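The square mask is an upper-triangular matrix: -inf above the diagonal (future positions a token may not attend to) and 0 elsewhere. A plain-Python sketch of the same pattern that `nn.Transformer.generate_square_subsequent_mask` produces:

```python
def square_subsequent_mask(size: int) -> list[list[float]]:
    """0.0 where attention is allowed (j <= i), -inf where it is masked."""
    return [[0.0 if j <= i else float("-inf") for j in range(size)]
            for i in range(size)]

mask = square_subsequent_mask(3)
# Row i: position i may attend to positions 0..i, never to the future.
# Adding -inf to an attention score zeroes it out after softmax.
```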
How does the model produce a probability distribution over output words?
A final linear layer projects each hidden state to vocabulary size, and a softmax over those logits (applied implicitly by CrossEntropyLoss during training) yields the distribution.
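Turning the final linear layer's logits into probabilities is a softmax over the vocabulary. A standard-library sketch with made-up logits:

```python
import math

def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])   # one raw score per vocabulary word
# All probabilities are positive and sum to 1; the largest logit
# gets the largest probability.
```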