Neural Machine Translation Components Overview

Questions and Answers

What is the purpose of the self-attention layer in the encoding component?

To help the encoder look at other words in the input sentence as it encodes a specific word
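
A minimal sketch of that idea, assuming PyTorch and single-head scaled dot-product attention; the projection matrices `w_q`, `w_k`, `w_v` are illustrative placeholders, not part of the quiz material:

```python
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    """Minimal scaled dot-product self-attention for one sentence.

    x: (seq_len, d_model) embeddings; w_q / w_k / w_v: (d_model, d_k) projections.
    Each position's output is a weighted mix of every position's values,
    which is how encoding one word can "look at" the other words.
    """
    q = x @ w_q                                # queries
    k = x @ w_k                                # keys
    v = x @ w_v                                # values
    scores = q @ k.T / (k.shape[-1] ** 0.5)    # (seq_len, seq_len) affinities
    weights = F.softmax(scores, dim=-1)        # attention weights per position
    return weights @ v                         # blended representations
```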

What is the primary function of the embedding layer in a Transformer model?

Capture the meaning of each word or token in a vector space
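
For example, a PyTorch embedding table (sizes here are purely illustrative) maps each token id to a dense vector, and it is those vectors that carry the word's meaning through the model:

```python
import torch
import torch.nn as nn

vocab_size, d_model = 10_000, 512           # illustrative sizes
embed = nn.Embedding(vocab_size, d_model)   # lookup table: token id -> vector

token_ids = torch.tensor([[5, 42, 7]])      # a toy 3-token sentence (batch of 1)
vectors = embed(token_ids)                  # shape (1, 3, 512): one vector per token
```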

Where does the embedding algorithm operate in the encoder-decoder model described?

In the bottom-most encoder

In the Transformer architecture, what are the two sub-layers present in each encoder or decoder layer?

Self-attention mechanism and feedforward neural network
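
A sketch of one such layer, assuming PyTorch; the residual connections and layer normalization follow the standard Transformer, and the dimensions are placeholders:

```python
import torch.nn as nn

class EncoderLayer(nn.Module):
    """One encoder layer: self-attention followed by a feed-forward network."""
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):                     # x: (batch, seq_len, d_model)
        attn_out, _ = self.attn(x, x, x)      # sub-layer 1: self-attention
        x = self.norm1(x + attn_out)
        x = self.norm2(x + self.ff(x))        # sub-layer 2: feed-forward
        return x
```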

How does the multi-head attention mechanism in Transformers handle attending to different parts of the input sequence simultaneously?

By applying multiple self-attention mechanisms in parallel
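
A simplified sketch of that parallelism, assuming PyTorch: the model dimension is split across heads, each head attends independently, and the per-head results are concatenated. The learned per-head projection matrices of a real implementation are omitted here for brevity:

```python
import torch
import torch.nn.functional as F

def multi_head_attention(x, n_heads=8):
    """x: (seq_len, d_model). Each head attends within its own sub-space."""
    seq_len, d_model = x.shape
    d_head = d_model // n_heads
    # (n_heads, seq_len, d_head): one slice of the representation per head
    heads = x.view(seq_len, n_heads, d_head).transpose(0, 1)
    scores = heads @ heads.transpose(-2, -1) / (d_head ** 0.5)
    weights = F.softmax(scores, dim=-1)       # each head has its own weights
    out = weights @ heads                     # attend within each head, in parallel
    return out.transpose(0, 1).reshape(seq_len, d_model)  # concatenate heads
```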

What is the purpose of the attention layer between the decoder's self-attention and feed-forward layers?

To help the decoder focus on relevant parts of the input sentence
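
A hedged sketch of that encoder-decoder attention, assuming PyTorch: queries come from the decoder's states, while keys and values come from the encoder output; the weight matrices are illustrative placeholders:

```python
import torch.nn.functional as F

def cross_attention(decoder_state, encoder_output, w_q, w_k, w_v):
    """decoder_state: (tgt_len, d_model); encoder_output: (src_len, d_model)."""
    q = decoder_state @ w_q                  # queries from the decoder
    k = encoder_output @ w_k                 # keys from the encoder output
    v = encoder_output @ w_v                 # values from the encoder output
    scores = q @ k.T / (k.shape[-1] ** 0.5)
    weights = F.softmax(scores, dim=-1)      # (tgt_len, src_len): which source
    return weights @ v                       # words each target position focuses on
```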

What is common to all the encoders described in the text?

They receive a list of vectors, each of size 512

What is the purpose of the feedforward neural network component in the Transformer architecture?

Applying non-linear transformations to the input
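
For instance, the position-wise feed-forward sub-layer can be sketched in PyTorch as two linear maps with a ReLU non-linearity in between (the 512/2048 widths are the base-Transformer values, used here only as an example):

```python
import torch.nn as nn

# Applied identically and independently at every position of the sequence.
feed_forward = nn.Sequential(
    nn.Linear(512, 2048),
    nn.ReLU(),               # the non-linear transformation
    nn.Linear(2048, 512),
)
```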

How does the self-attention mechanism in Transformers allow the model to focus on different parts of the input sequence?

By learning and calculating attention weights for each position
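
In the standard scaled dot-product formulation, those weights are the softmax of scaled query-key scores, and the full operation is:

```latex
\alpha_{ij} = \operatorname{softmax}_{j}\!\left(\frac{q_i \cdot k_j}{\sqrt{d_k}}\right),
\qquad
\mathrm{Attention}(Q, K, V) = \operatorname{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right) V
```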

How does each word in the input sequence flow through an encoder?

Each word flows through each of the encoder's two sub-layers (self-attention, then feed-forward)

Which component of Transformer models helps in capturing the semantic meaning of individual words or tokens?

Embedding Layer

What determines the length of the list of vectors received by each encoder?

The length of the longest sentence in the training dataset

What is the purpose of the Output layer in the described model architecture?

Converting the encoded representation into word probabilities
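
A minimal sketch, assuming PyTorch: a linear projection produces one logit per vocabulary word and a softmax turns the logits into probabilities (all sizes are illustrative):

```python
import torch
import torch.nn as nn

d_model, vocab_size = 512, 10_000             # illustrative sizes
to_logits = nn.Linear(d_model, vocab_size)    # project decoder output to vocab

decoder_out = torch.randn(1, 7, d_model)      # (batch, tgt_len, d_model)
logits = to_logits(decoder_out)               # one score per vocabulary word
probs = torch.softmax(logits, dim=-1)         # word probabilities per position
next_word = probs[:, -1].argmax(dim=-1)       # most likely next word id
```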

What role does the Decoder stack play in the processing of the target sequence?

It processes the encoded representation from the Encoder stack

How does a pre-trained model benefit downstream NLP tasks?

By being fine-tuned on a specific downstream task
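
A hypothetical sketch in PyTorch: `pretrained_encoder`, the model dimension, and the label count are stand-ins rather than a real checkpoint; the point is that a small task-specific head is trained on top of the general language representations learned during pre-training:

```python
import torch.nn as nn

class SentimentClassifier(nn.Module):
    """Illustrative downstream model built on a pre-trained encoder."""
    def __init__(self, pretrained_encoder, d_model=512, num_labels=2):
        super().__init__()
        self.encoder = pretrained_encoder           # general language knowledge
        self.head = nn.Linear(d_model, num_labels)  # new, task-specific layer

    def forward(self, x):
        hidden = self.encoder(x)           # contextual representations (batch, seq, d_model)
        return self.head(hidden[:, 0])     # classify from the first position
```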

In the described model architecture, what happens after taking the last word of the output sequence as the predicted word?

The word is filled into the second position of the Decoder input sequence
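
An illustrative greedy decoding loop in PyTorch (`encoder`, `decoder`, and the special token ids are placeholders): the source sentence is encoded once outside the loop, and each predicted word is appended to the decoder input for the next step, which is also why the encoder pass does not need to be repeated every iteration:

```python
import torch

def greedy_decode(encoder, decoder, src_ids, bos_id, eos_id, max_len=50):
    memory = encoder(src_ids)                    # computed once, reused below
    out_ids = torch.tensor([[bos_id]])           # decoder input starts with <bos>
    for _ in range(max_len):
        logits = decoder(out_ids, memory)        # (1, cur_len, vocab)
        next_id = logits[:, -1].argmax(dim=-1, keepdim=True)  # last position = prediction
        out_ids = torch.cat([out_ids, next_id], dim=1)        # feed it back in
        if next_id.item() == eos_id:             # stop once end-of-sentence is produced
            break
    return out_ids
```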

What is the primary purpose of training a model on a general task before fine-tuning it on a specific downstream task?

To learn general language representations

Why is it unnecessary to repeat steps #1 and #2 for each iteration in the described model architecture?

Because the Encoder output sequence remains unchanged across iterations
