27 - Decoder Architectures and Retrieval

What is the purpose of using a modified softmax in decoder-only models?

To control the entropy of the output distribution, typically by dividing the logits by a temperature parameter before applying the softmax
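
One common form of this modification is a temperature-scaled softmax. The sketch below (NumPy only; the function name and temperature values are illustrative) shows how a low temperature sharpens the distribution and a high temperature flattens it.

```python
import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    """Temperature-scaled softmax: T < 1 sharpens the distribution
    (lower entropy), T > 1 flattens it (higher entropy)."""
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled -= scaled.max()              # subtract max for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()

logits = [2.0, 1.0, 0.5, -1.0]
print(softmax_with_temperature(logits, temperature=0.5))  # peaked
print(softmax_with_temperature(logits, temperature=2.0))  # flatter
```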

What is the solution to the issue of sampling a poor word in decoder-only models?

Consider more than one candidate, such as using beam search or top-k sampling
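
A minimal top-k sampling sketch (NumPy only; the helper name is illustrative) shows the idea of keeping several candidates: only the k most probable tokens are retained and the next token is drawn from the renormalized distribution.

```python
import numpy as np

def top_k_sample(probs, k=3, rng=None):
    """Keep only the k most probable tokens, renormalize, and sample one."""
    rng = rng or np.random.default_rng()
    probs = np.asarray(probs, dtype=np.float64)
    top = np.argsort(probs)[-k:]             # indices of the k best tokens
    renorm = probs[top] / probs[top].sum()   # renormalized distribution
    return int(top[rng.choice(len(top), p=renorm)])

vocab_probs = [0.40, 0.30, 0.10, 0.10, 0.05, 0.05]
print(top_k_sample(vocab_probs, k=3))        # token index drawn from the 3 best
```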

What is the key task in the pre-training phase of Generative Pre-trained Transformer (GPT)?

A generative language modelling task: predicting the next token given the preceding tokens
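
As a rough illustration of that objective (NumPy only, toy probabilities), the loss is the average negative log-likelihood of each observed next token under the model's predicted distribution.

```python
import numpy as np

def lm_loss(pred_probs, target_ids):
    """Average negative log-likelihood of the observed next tokens
    under the model's predicted distributions (one row per position)."""
    pred_probs = np.asarray(pred_probs, dtype=np.float64)
    picked = pred_probs[np.arange(len(target_ids)), target_ids]
    return float(-np.log(picked).mean())

# Toy example: 3 positions, vocabulary of 4 tokens.
probs = [[0.7, 0.1, 0.1, 0.1],
         [0.2, 0.5, 0.2, 0.1],
         [0.1, 0.1, 0.1, 0.7]]
print(lm_loss(probs, target_ids=[0, 1, 3]))
```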

What is the purpose of finetuning in the context of Generative Pre-trained Transformer (GPT)?

Better generalization and faster convergence
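
A schematic sketch of such a combined finetuning objective (hypothetical loss values; lam is the weighting parameter mentioned in the summary at the end of this quiz), in which an auxiliary language-modelling loss is added to the supervised task loss:

```python
def finetune_loss(task_loss, lm_loss, lam=0.5):
    """Supervised task loss plus a weighted auxiliary language-modelling
    loss; the weighting parameter lam trades off the two objectives."""
    return task_loss + lam * lm_loss

print(finetune_loss(task_loss=0.8, lm_loss=2.3, lam=0.5))
```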

What is the significance of using a delimiter token in GPT for separating structured data?

To separate the parts of a structured input, such as a question and its possible answers, within a single input sequence
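
For example (the token strings here are illustrative, not the exact tokens used by GPT), a question and its candidate answers can be packed into single sequences using start, delimiter, and extract tokens:

```python
def format_multiple_choice(context, question, answers,
                           start="<s>", delim="<$>", extract="<e>"):
    """Pack the context, question, and each candidate answer into one
    sequence per answer, separated by a delimiter token."""
    return [f"{start} {context} {delim} {question} {delim} {a} {extract}"
            for a in answers]

for seq in format_multiple_choice("Some passage.", "What is X?",
                                  ["Option A", "Option B"]):
    print(seq)
```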

How does Transformer-XL extend the context from a fixed size to a variable size?

By introducing recurrence into self-attention
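
Schematically, each layer caches the hidden states of the previous segment and prepends them, without gradient flow, to the current segment's states, extending the attention context. A simplified NumPy sketch of this caching step (function name illustrative):

```python
import numpy as np

def extend_context(h_current, memory):
    """Transformer-XL-style recurrence (simplified): cached hidden states
    of the previous segment are prepended to the current segment's states,
    so attention can reach beyond the fixed segment length."""
    if memory is None:
        return h_current
    # The memory is treated as a constant; no gradients flow through it.
    return np.concatenate([memory, h_current], axis=0)

segment_1 = np.random.randn(4, 8)    # 4 positions, hidden size 8
segment_2 = np.random.randn(4, 8)
context = extend_context(segment_2, memory=segment_1)
print(context.shape)                  # (8, 8): previous + current positions
```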

What distinguishes GPT-2 and GPT-3 in terms of architecture?

GPT-3 has far more parameters (175B) than GPT-2 (1.5B)

How does the Transformer-XL deal with avoiding temporal confusion when reusing hidden states?

By introducing relative positional encodings, so that attention depends on the distance between positions rather than on absolute position indices
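
The idea, roughly: attention is indexed by the relative distance between query and key positions rather than by absolute position ids, so reused hidden states remain temporally consistent across segments. A toy sketch of the relative-distance matrix:

```python
import numpy as np

def relative_position_matrix(query_len, key_len):
    """Relative distances (key position minus query position) used to
    index positional information, instead of absolute position ids."""
    q = np.arange(query_len)[:, None]
    k = np.arange(key_len)[None, :]
    return k - q

print(relative_position_matrix(3, 5))
```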

Why are start, end, and extract special tokens randomly initialized during finetuning?

To prevent bias and ensure flexibility in learning

Explain why Generative AI models 'hallucinate' and how it affects their performance.

Generative AI models 'hallucinate' because they model word probabilities rather than grounded 'facts', and attempts to suppress this by minimizing sampling randomness can also degrade the quality of their output.

What are the challenges associated with scaling language models?

The cost of larger models grows quickly, leading to increased latency. Additionally, training data needs to be scaled along with the models.

Explain the concept of Retriever Augmented Generation (RAG) and its approach.

RAG is an encoder-decoder approach that combines a BERT-based retriever with a BART generator: it indexes short text passages, retrieves the documents most relevant to the user's query, and conditions the generated word probabilities on those retrieved documents.
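
A minimal retrieval sketch (toy embeddings; the embed function below is a stand-in for the learned query/passage encoders, not the actual RAG code) shows the basic flow: embed the query, score the indexed passages, and hand the top results to the generator as extra context.

```python
import numpy as np

def embed(text, dim=16):
    """Stand-in for a learned encoder: a toy embedding derived from the
    text's hash (NOT a real BERT encoder)."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(dim)

def retrieve(query, passages, top_k=2):
    """Score indexed passages against the query embedding and return the
    top_k most relevant ones, to be prepended to the generator's input."""
    q = embed(query)
    scored = sorted(((float(embed(p) @ q), p) for p in passages), reverse=True)
    return [p for _, p in scored[:top_k]]

passages = ["Decoder-only models generate text token by token.",
            "Transformer-XL introduces recurrence into self-attention.",
            "RAG conditions generation on retrieved documents."]
print(retrieve("How does retrieval augmented generation work?", passages))
```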

Why is it challenging to train Generative Pre-trained Transformer (GPT) models on new information?

Training GPT models on new information is challenging due to the limited availability of training data for such information.

What is the significance of connecting models to a database or search in the context of Generative AI?

Connecting models to a database or search is important for accessing domain-specific data that may be limited but crucial for generating accurate outputs.

How does the use of domain-specific data impact the performance of language models?

Utilizing domain-specific data is essential for improving the accuracy and relevance of language models to specific tasks or topics.

Learn about finetuning a model using supervised learning with labelled data sets, decoder blocks, task-specific linear layers, and weighting parameters for better generalization and faster convergence. Discover how delimiter tokens are used to separate structured data like questions and answers in a single sequence input.
