28 - Finetuning, Beyond Transformers and Limitations

Questions and Answers

What is the purpose of using 16-bit floats in neural networks?

Reduce memory by 50%
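
A minimal sketch of why this helps, assuming PyTorch (the lesson names no particular framework): casting a weight tensor from 32-bit to 16-bit floats halves its memory footprint.

```python
# Casting weights from float32 to float16 halves memory use.
# PyTorch is assumed here purely for illustration.
import torch

weights_fp32 = torch.randn(4096, 4096)           # 4096 * 4096 * 4 bytes = 64 MiB
weights_fp16 = weights_fp32.to(torch.float16)    # 4096 * 4096 * 2 bytes = 32 MiB

bytes_fp32 = weights_fp32.element_size() * weights_fp32.nelement()
bytes_fp16 = weights_fp16.element_size() * weights_fp16.nelement()
print(f"fp32: {bytes_fp32 / 2**20:.0f} MiB, fp16: {bytes_fp16 / 2**20:.0f} MiB")
# -> fp32: 64 MiB, fp16: 32 MiB (a 50% reduction)
```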

How can quantized weights be converted to a larger type during matrix multiplication on smaller GPUs?

Using a lookup table on the GPU
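
An illustrative sketch of the lookup-table idea, with made-up sizes and names: the weights are stored as small integer codes, and a table maps each code back to a 16-bit float just before the matrix multiplication (real GPU kernels fuse this step into the matmul itself).

```python
# 4-bit weight codes are stored in memory; a 16-entry lookup table maps each
# code back to a larger float type right before the matmul. All values here
# are synthetic and only illustrate the mechanism.
import torch

codebook = torch.linspace(-1.0, 1.0, steps=16, dtype=torch.float16)   # lookup table
codes = torch.randint(0, 16, (1024, 1024), dtype=torch.uint8)         # 4-bit codes

def dequantize(codes: torch.Tensor, table: torch.Tensor) -> torch.Tensor:
    """Map integer codes to the larger float type via the lookup table."""
    return table[codes.long()]

x = torch.randn(8, 1024)
w = dequantize(codes, codebook).float()   # fp16 on a GPU; fp32 here so it runs on CPU
y = x @ w                                 # matmul runs in the larger type
print(y.shape)                            # torch.Size([8, 1024])
```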

What technique can be combined with LoRA for finetuning?

Quantization
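
A minimal sketch of combining quantization with LoRA (the QLoRA idea): the base weight stays frozen, and is quantized in practice, while only two small low-rank factors are trained. Names and sizes below are illustrative, not a real implementation.

```python
# Frozen (in practice 4-bit quantized) base weight plus trainable low-rank
# factors A and B; only A and B receive gradients during finetuning.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, rank=8, alpha=16):
        super().__init__()
        # frozen base weight (a quantized tensor in a real QLoRA setup)
        self.weight = nn.Parameter(torch.randn(out_features, in_features),
                                   requires_grad=False)
        # trainable low-rank update: delta_W = B @ A
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x):
        base = x @ self.weight.T
        update = (x @ self.lora_A.T) @ self.lora_B.T * self.scaling
        return base + update

layer = LoRALinear(1024, 1024)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable}")   # only the two LoRA factors
```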

What method does InstructGPT use to 'follow the user's intent'?

Reinforcement Learning from Human Feedback (RLHF)

What do most AI engineers prefer for chat optimization?

Direct Preference Optimization (DPO)
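
A sketch of the DPO objective: the policy is nudged to rank the human-preferred answer above the rejected one, relative to a frozen reference model, without training a separate reward model. The log-probabilities below are toy placeholder values.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss over a batch of preference pairs (summed sequence log-probs)."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# toy sequence log-probabilities standing in for real model outputs
loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-14.0]))
print(loss.item())
```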

What is the key concept behind Reinforcement Learning from Human Feedback?

Collect prompts and human example answers
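
A sketch of one later piece of the RLHF pipeline: after collecting prompts and human example answers for supervised finetuning, annotators rank pairs of model answers and a reward model is trained so the preferred answer scores higher (a pairwise, Bradley-Terry style loss). The scores below are placeholders.

```python
import torch
import torch.nn.functional as F

def reward_model_loss(score_preferred, score_rejected):
    """Pairwise ranking loss: the preferred answer should get the higher reward."""
    return -F.logsigmoid(score_preferred - score_rejected).mean()

# toy reward-model outputs for two (prompt, answer) comparisons
loss = reward_model_loss(torch.tensor([1.3, 0.2]), torch.tensor([0.9, 0.5]))
print(loss.item())
```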

How many self-attention heads can be removed from BERT without a noticeable effect on BLEU/accuracy scores?

Up to 40%
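
An illustrative sketch of head pruning using the Hugging Face `transformers` package (an assumption; the lesson does not prescribe a tool). Which heads to drop is arbitrary here.

```python
# prune_heads removes the parameters of the listed attention heads
# ({layer_index: [head_indices]}); the choice of heads below is arbitrary.
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")
model.prune_heads({0: [0, 1, 2, 3], 1: [0, 1, 2, 3]})
print(model.config.pruned_heads)
```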

According to BhPaGo20, what can Transformers (and RNNs) simulate?

Turing machines

What have recent methods on benchmark leaderboards been criticized for?

Being empirical with little supporting theory

What is a key concern when it comes to the memory efficiency of large models?

Memory conservation

According to HeHo23, what can be omitted in Transformers to simplify the architecture?

Skip connections, value and projection parameters, and positional encoding

What did TaCh18 conclude about the ability of current neural network models to capture the semantics of natural language inference?

They are not able to generalize

What is the main idea behind Fourier networks (FNet)?

The main idea is to replace self-attention with a non-parametric Fast Fourier Transform to reveal periodicities in the input.
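
A minimal sketch of the FNet token-mixing step, with illustrative shapes: the self-attention sublayer is replaced by a parameter-free 2D Fourier transform over the sequence and hidden dimensions, keeping only the real part.

```python
import torch

def fourier_mixing(x: torch.Tensor) -> torch.Tensor:
    """x: (batch, seq_len, hidden) -> same shape, mixed by FFT instead of attention."""
    return torch.fft.fft2(x, dim=(-2, -1)).real

x = torch.randn(2, 128, 256)     # (batch, tokens, hidden)
print(fourier_mixing(x).shape)   # torch.Size([2, 128, 256])
```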

How does doubling the model size affect the training data requirement empirically?

Doubling the model size empirically requires double the training data.

What is one of the challenges mentioned with training data used for large models?

Some of the training data is copyrighted, making it unclear whether it can be used at all, especially as news outlets now disallow GPT crawlers.

How did OpenAI reduce toxicity in ChatGPT?

OpenAI hired low-wage workers in developing countries to scrub toxic content from ChatGPT.

What is a common practice with non-English models during training?

Non-English models are often trained on poorly translated data.

What type of data do open-source models often rely on for training?

Open-source models often rely on training data generated by GPT, or have GPT produce the 'human' feedback.
