27 - Decoder Architectures and Retrieval
15 Questions


Questions and Answers

What is the purpose of using a modified softmax in decoder-only models?

To control the entropy of the distribution
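A minimal sketch (plain Python, illustrative logit values) of how dividing logits by a temperature before the softmax controls the entropy of the output distribution:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax: T < 1 sharpens the distribution
    (lower entropy), T > 1 flattens it (higher entropy)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)                        # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

probs_sharp = softmax([2.0, 1.0, 0.0], temperature=0.5)
probs_flat = softmax([2.0, 1.0, 0.0], temperature=2.0)
# The low-temperature distribution puts more mass on the top logit.
```

At temperature 0.5 the top token's probability rises well above its value at temperature 2.0, which is why low temperatures make generation more deterministic.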

What is the solution to the issue of sampling a poor word in decoder-only models?

Consider more than one candidate, such as using beam search or top-k sampling
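A sketch of top-k sampling under stated assumptions (plain Python, toy probabilities): the distribution is truncated to the k most probable tokens and renormalized before sampling, so a very unlikely word can never be drawn.

```python
import random

def top_k_sample(probs, k, rng=random):
    """Keep only the k most probable tokens, renormalize, then sample.
    This avoids committing to a low-probability ('poor') word."""
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in top)
    r = rng.random() * total               # sample within the truncated mass
    for i in top:
        r -= probs[i]
        if r <= 0:
            return i
    return top[-1]

probs = [0.5, 0.3, 0.15, 0.04, 0.01]
token = top_k_sample(probs, k=2)           # only indices 0 or 1 can be drawn
```

Beam search addresses the same issue differently: instead of sampling, it keeps the k highest-scoring partial sequences at each step.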

What is the key task in the pre-training phase of Generative Pre-trained Transformer (GPT)?

Generative language modelling task
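The pre-training objective can be sketched as next-token prediction: for each position t, maximize log p(x[t+1] | x[1..t]). A toy loss over assumed per-position predicted distributions (the model itself is stubbed out):

```python
import math

def lm_loss(token_ids, next_token_probs):
    """Average negative log-likelihood of each actual next token, given
    per-position predicted distributions (hypothetical inputs here)."""
    nll = 0.0
    for t in range(len(token_ids) - 1):
        nll -= math.log(next_token_probs[t][token_ids[t + 1]])
    return nll / (len(token_ids) - 1)

# Toy vocabulary of size 3; uniform predictions give loss = ln(3).
uniform = [[1 / 3, 1 / 3, 1 / 3]] * 3
loss = lm_loss([0, 1, 2, 0], uniform)
```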

What is the purpose of finetuning in the context of Generative Pre-trained Transformer (GPT)?

Better generalization and faster convergence

What is the significance of using a delimiter token in GPT for separating structured data?

To separate structured data like questions and possible answers
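A hypothetical sketch of GPT-style input formatting for a multiple-choice task: the structured fields are flattened into one token sequence, with special tokens marking the boundaries (the names "&lt;start&gt;", "&lt;delim&gt;", "&lt;extract&gt;" are illustrative, not the model's actual vocabulary entries).

```python
def format_choice(context, answer):
    # Delimiter tokens let a single sequence carry structured fields.
    return f"<start> {context} <delim> {answer} <extract>"

question = "What is the capital of France?"
candidates = ["Paris", "London"]
sequences = [format_choice(question, a) for a in candidates]
# Each sequence is scored by the model; the highest-scoring answer wins.
```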

How does Transformer-XL extend the context from a fixed size to a variable size?

By introducing recurrence into self-attention
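A pure-Python sketch of segment-level recurrence, with the attention layer stubbed out: hidden states from earlier segments are cached (without gradient in a real implementation) and prepended to the current segment, so attention can look back past the fixed segment length.

```python
def process_segments(segments, layer, mem_len):
    """Toy segment-level recurrence; `layer` stands in for self-attention."""
    memory = []                               # cached states from earlier segments
    outputs = []
    for seg in segments:
        context = memory + seg                # attend over memory + current segment
        hidden = layer(context)[-len(seg):]   # keep outputs for current positions
        memory = (memory + hidden)[-mem_len:] # slide the memory window
        outputs.extend(hidden)
    return outputs

identity = lambda xs: list(xs)                # stand-in for a real attention layer
out = process_segments([[1, 2], [3, 4], [5, 6]], identity, mem_len=2)
```

The sliding `mem_len` window is what turns a fixed context into an effectively variable one.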

What distinguishes GPT-2 and GPT-3 in terms of architecture?

GPT-3 has more parameters (175B) compared to GPT-2 (1.5B)

How does the Transformer-XL deal with avoiding temporal confusion when reusing hidden states?

By introducing relative positional encoding, so positions are consistent across reused segments

Why are start, end, and extract special tokens randomly initialized during finetuning?

To prevent bias and ensure flexibility in learning

Explain why Generative AI models 'hallucinate' and how it affects their performance.

Generative AI models 'hallucinate' because they lack a grounded store of 'facts'; suppressing randomness to curb hallucination can also degrade the quality of their output.

What are the challenges associated with scaling language models?

The cost of larger models grows quickly, increasing both training expense and inference latency. Additionally, training data needs to be scaled along with the models.

Explain the concept of Retrieval-Augmented Generation (RAG) and its approach.

RAG is an encoder-decoder approach combining a BERT-based retriever with a BART generator: it indexes short passages, retrieves documents relevant to the user's query, and conditions the generator's word probabilities on the retrieved documents.
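The retrieve-then-generate flow can be sketched as below. This is only illustrative: real RAG scores passages with dense (BERT-based) embeddings and conditions a BART generator on them, whereas here retrieval is naive word overlap and "generation" just surfaces the best-matching passage.

```python
def retrieve(query, index, k=1):
    """Rank indexed passages by word overlap with the query (toy retriever)."""
    def overlap(doc):
        return len(set(query.lower().split()) & set(doc.lower().split()))
    return sorted(index, key=overlap, reverse=True)[:k]

def generate(query, docs):
    # A real generator conditions p(word | query, docs);
    # this stub simply returns the top retrieved passage.
    return docs[0]

index = [
    "GPT-3 has 175 billion parameters.",
    "Transformer-XL introduces recurrence into self-attention.",
]
query = "How many parameters does GPT-3 have?"
answer = generate(query, retrieve(query, index))
```

Conditioning generation on retrieved text is what lets the model answer from documents it was never trained on.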

Why is it challenging to train Generative Pre-trained Transformer (GPT) models on new information?

Training GPT models on new information is challenging due to the limited availability of training data for such information.

What is the significance of connecting models to a database or search in the context of Generative AI?

Connecting models to a database or search is important for accessing domain-specific data that may be limited but crucial for generating accurate outputs.

How does the use of domain-specific data impact the performance of language models?

Utilizing domain-specific data is essential for improving the accuracy and relevance of language models on specific tasks or topics.
