27 - Decoder Architectures and Retrieval
15 Questions


Questions and Answers

What is the purpose of using a modified softmax in decoder-only models?

To control the entropy of the distribution
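A minimal sketch (plain Python, illustrative logit values) of how dividing logits by a temperature before the softmax controls the entropy of the output distribution:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax: T < 1 sharpens the distribution
    (lower entropy), T > 1 flattens it (higher entropy)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)                        # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

probs_sharp = softmax([2.0, 1.0, 0.0], temperature=0.5)
probs_flat = softmax([2.0, 1.0, 0.0], temperature=2.0)
# The low-temperature distribution puts more mass on the top logit.
```

At temperature 0.5 the top token's probability rises well above its value at temperature 2.0, which is why low temperatures make generation more deterministic.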

What is the solution to the issue of sampling a poor word in decoder-only models?

Consider more than one candidate, such as using beam search or top-k sampling
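A sketch of top-k sampling under stated assumptions (plain Python, toy probabilities): the distribution is truncated to the k most probable tokens and renormalized before sampling, so a very unlikely word can never be drawn.

```python
import random

def top_k_sample(probs, k, rng=random):
    """Keep only the k most probable tokens, renormalize, then sample.
    This avoids committing to a low-probability ('poor') word."""
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in top)
    r = rng.random() * total               # sample within the truncated mass
    for i in top:
        r -= probs[i]
        if r <= 0:
            return i
    return top[-1]

probs = [0.5, 0.3, 0.15, 0.04, 0.01]
token = top_k_sample(probs, k=2)           # only indices 0 or 1 can be drawn
```

Beam search addresses the same issue differently: instead of sampling, it keeps the k highest-scoring partial sequences at each step.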

What is the key task in the pre-training phase of Generative Pre-trained Transformer (GPT)?

Generative language modelling task
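The pre-training objective can be sketched as next-token prediction: for each position t, maximize log p(x[t+1] | x[1..t]). A toy loss over assumed per-position predicted distributions (the model itself is stubbed out):

```python
import math

def lm_loss(token_ids, next_token_probs):
    """Average negative log-likelihood of each actual next token, given
    per-position predicted distributions (hypothetical inputs here)."""
    nll = 0.0
    for t in range(len(token_ids) - 1):
        nll -= math.log(next_token_probs[t][token_ids[t + 1]])
    return nll / (len(token_ids) - 1)

# Toy vocabulary of size 3; uniform predictions give loss = ln(3).
uniform = [[1 / 3, 1 / 3, 1 / 3]] * 3
loss = lm_loss([0, 1, 2, 0], uniform)
```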

What is the purpose of finetuning in the context of Generative Pre-trained Transformer (GPT)?

Better generalization and faster convergence

What is the significance of using a delimiter token in GPT for separating structured data?

To separate structured data like questions and possible answers
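A hypothetical sketch of GPT-style input formatting for a multiple-choice task: the structured fields are flattened into one token sequence, with special tokens marking the boundaries (the names "&lt;start&gt;", "&lt;delim&gt;", "&lt;extract&gt;" are illustrative, not the model's actual vocabulary entries).

```python
def format_choice(context, answer):
    # Delimiter tokens let a single sequence carry structured fields.
    return f"<start> {context} <delim> {answer} <extract>"

question = "What is the capital of France?"
candidates = ["Paris", "London"]
sequences = [format_choice(question, a) for a in candidates]
# Each sequence is scored by the model; the highest-scoring answer wins.
```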

How does Transformer-XL extend the context from a fixed size to a variable size?

By introducing recurrence into self-attention
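A pure-Python sketch of segment-level recurrence, with the attention layer stubbed out: hidden states from earlier segments are cached (without gradient in a real implementation) and prepended to the current segment, so attention can look back past the fixed segment length.

```python
def process_segments(segments, layer, mem_len):
    """Toy segment-level recurrence; `layer` stands in for self-attention."""
    memory = []                               # cached states from earlier segments
    outputs = []
    for seg in segments:
        context = memory + seg                # attend over memory + current segment
        hidden = layer(context)[-len(seg):]   # keep outputs for current positions
        memory = (memory + hidden)[-mem_len:] # slide the memory window
        outputs.extend(hidden)
    return outputs

identity = lambda xs: list(xs)                # stand-in for a real attention layer
out = process_segments([[1, 2], [3, 4], [5, 6]], identity, mem_len=2)
```

The sliding `mem_len` window is what turns a fixed context into an effectively variable one.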

What distinguishes GPT-2 and GPT-3 in terms of architecture?

GPT-3 has more parameters (175B) compared to GPT-2 (1.5B)

How does the Transformer-XL deal with avoiding temporal confusion when reusing hidden states?

By introducing relative positional encoding, so positions are consistent across reused segments

Why are start, end, and extract special tokens randomly initialized during finetuning?

To prevent bias and ensure flexibility in learning

Explain why Generative AI models 'hallucinate' and how it affects their performance.

Generative AI models 'hallucinate' because they lack a grounded store of 'facts'; suppressing randomness to curb hallucination can also degrade the quality of their output.

What are the challenges associated with scaling language models?

The cost of larger models grows quickly, increasing both training expense and inference latency. Additionally, training data needs to be scaled along with the models.

Explain the concept of Retrieval-Augmented Generation (RAG) and its approach.

RAG is an encoder-decoder approach combining a BERT-based retriever with a BART generator: it indexes short passages, retrieves documents relevant to the user's query, and conditions the generator's word probabilities on the retrieved documents.
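The retrieve-then-generate flow can be sketched as below. This is only illustrative: real RAG scores passages with dense (BERT-based) embeddings and conditions a BART generator on them, whereas here retrieval is naive word overlap and "generation" just surfaces the best-matching passage.

```python
def retrieve(query, index, k=1):
    """Rank indexed passages by word overlap with the query (toy retriever)."""
    def overlap(doc):
        return len(set(query.lower().split()) & set(doc.lower().split()))
    return sorted(index, key=overlap, reverse=True)[:k]

def generate(query, docs):
    # A real generator conditions p(word | query, docs);
    # this stub simply returns the top retrieved passage.
    return docs[0]

index = [
    "GPT-3 has 175 billion parameters.",
    "Transformer-XL introduces recurrence into self-attention.",
]
query = "How many parameters does GPT-3 have?"
answer = generate(query, retrieve(query, index))
```

Conditioning generation on retrieved text is what lets the model answer from documents it was never trained on.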

Why is it challenging to train Generative Pre-trained Transformer (GPT) models on new information?

Training GPT models on new information is challenging due to the limited availability of training data for such information.

What is the significance of connecting models to a database or search in the context of Generative AI?

Connecting models to a database or search is important for accessing domain-specific data that may be limited but crucial for generating accurate outputs.

How does the use of domain-specific data impact the performance of language models?

Utilizing domain-specific data is essential for improving the accuracy and relevance of language models on specific tasks or topics.
