28 - Finetuning, Beyond Transformers and Limitations

18 Questions

Questions and Answers

What is the purpose of using 16-bit floats in neural networks?

Reduce memory by 50% (compared with 32-bit floats)
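
The 50% figure follows from the storage size per value: 4 bytes for a 32-bit float versus 2 bytes for a 16-bit float. A minimal sketch using NumPy, where the matrix shape and variable names are assumptions chosen only for illustration:

```python
import numpy as np

# Hypothetical weight matrix; the shape is an assumption for illustration only.
rng = np.random.default_rng(0)
weights_fp32 = rng.standard_normal((1024, 1024)).astype(np.float32)
weights_fp16 = weights_fp32.astype(np.float16)

bytes_fp32 = weights_fp32.nbytes  # 4 bytes per value
bytes_fp16 = weights_fp16.nbytes  # 2 bytes per value
saving = 1 - bytes_fp16 / bytes_fp32

# The price of the smaller type is rounding error in each stored value.
max_abs_error = float(np.max(np.abs(weights_fp32 - weights_fp16.astype(np.float32))))

print(f"float32: {bytes_fp32} bytes, float16: {bytes_fp16} bytes, saving: {saving:.0%}")
```

The saving applies to stored values only; the rounding error measured in `max_abs_error` is the trade-off.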

How can we translate weights to larger types during matrix multiplication on smaller GPUs?

Using a lookup table in the GPU
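
One way to picture the lookup-table idea is sketched below. Weights are stored as small integer codes, and each code is translated to a larger float type by indexing a table just before the matrix multiplication. This runs on the CPU with NumPy and does not show how any particular GPU kernel works; the 16-entry table, the function names, and the shapes are all assumptions.

```python
import numpy as np

def quantize_to_codes(weights, table):
    """Map each weight to the index of the nearest table entry."""
    distances = np.abs(weights[..., None] - table)
    return np.argmin(distances, axis=-1).astype(np.uint8)

def dequantize(codes, table):
    """Translate small integer codes to a larger float type by table lookup."""
    return table[codes]

# Hypothetical 16-entry (4-bit) table of evenly spaced values in [-1, 1].
table = np.linspace(-1.0, 1.0, 16).astype(np.float16)

rng = np.random.default_rng(0)
weights = rng.uniform(-1.0, 1.0, size=(4, 8)).astype(np.float32)
inputs = rng.standard_normal((8, 3)).astype(np.float32)

codes = quantize_to_codes(weights, table)                # stored form: values 0..15
restored = dequantize(codes, table).astype(np.float32)   # widened for the multiply
output = restored @ inputs
```

For simplicity each code occupies a full `uint8` here; a real 4-bit format would pack two codes per byte.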

What technique can be combined with LoRA for finetuning?

Quantization (as in QLoRA)
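
As a rough illustration of how the two ideas fit together: the base weights stay frozen in a low-precision format, and only a small low-rank adapter is trained in full precision. The sketch below uses float16 as a stand-in for real quantization. The dimensions, the rank, and all names are assumptions, not taken from the lesson.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank = 64, 64, 4  # hypothetical sizes

# Frozen base weights in low precision (float16 stands in for a quantized format).
base_lowp = rng.standard_normal((d_out, d_in)).astype(np.float16)

# Trainable low-rank adapter. B starts at zero, so the adapted layer
# initially computes exactly what the base layer computes.
lora_a = (rng.standard_normal((rank, d_in)) * 0.01).astype(np.float32)
lora_b = np.zeros((d_out, rank), dtype=np.float32)

def forward(x):
    """Base projection plus the low-rank update (B @ A) applied to x."""
    base = base_lowp.astype(np.float32)  # widen only for the computation
    return x @ base.T + (x @ lora_a.T) @ lora_b.T

x = rng.standard_normal((2, d_in)).astype(np.float32)
y = forward(x)

trainable = lora_a.size + lora_b.size  # parameters that would receive gradients
frozen = base_lowp.size                # parameters that stay fixed
```

With these sizes the adapter holds 512 values against 4096 frozen ones, which is where the memory saving during finetuning comes from.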

What method does InstructGPT use to 'follow the user's intent'?

Reinforcement Learning from Human Feedback (RLHF)

What do most AI engineers prefer for chat optimization?

Direct Preference Optimization (DPO)
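
For orientation, the DPO objective can be sketched as a logistic loss on how much more the finetuned policy prefers the chosen answer over the rejected one, relative to a frozen reference model. The log-probabilities below are made-up numbers, and the function name and `beta` value are assumptions.

```python
import numpy as np

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Mean DPO loss from per-example sequence log-probabilities."""
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    # -log(sigmoid(z)) equals log(1 + exp(-z)); logaddexp computes it stably.
    return float(np.mean(np.logaddexp(0.0, -beta * margin)))

# Made-up log-probabilities for two preference pairs.
policy_chosen = np.array([-4.0, -6.0])
policy_rejected = np.array([-9.0, -8.0])
ref_chosen = np.array([-5.0, -6.5])
ref_rejected = np.array([-7.0, -7.5])

loss = dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected)

# A policy identical to the reference has zero margin, so its loss is log(2).
baseline = dpo_loss(ref_chosen, ref_rejected, ref_chosen, ref_rejected)
```

Unlike RLHF, this needs no separate reward model or reinforcement learning loop, only preference pairs and a reference model.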

What is the key concept behind Reinforcement Learning from Human Feedback?

Collect prompts and human example answers

How many self-attention heads can be removed from BERT without a noticeable effect on BLEU/Accuracy scores?

Up to 40%

According to BhPaGo20, what can Transformers (and RNNs) simulate?

Turing machines

What have recent methods in benchmark leaderboards been criticized for?

Being empirical with little supporting theory

What is a key concern when it comes to the memory efficiency of large models?

Memory conservation

According to HeHo23, what can be omitted in Transformers to simplify the architecture?

Skip connections, value and projection parameters, and positional encoding

What did TaCh18 conclude about the ability of current neural network models to capture the semantics of natural language inference?

They are not able to generalize

What is the main idea behind Fourier networks (FNet)?

The main idea is to replace self-attention with non-parametric Fast Fourier Transform to reveal periodicities in the input.
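
A minimal sketch of this kind of token mixing is shown below: a discrete Fourier transform along the hidden dimension and then along the sequence dimension, keeping only the real part. The toy input, its size, and the function name are assumptions.

```python
import numpy as np

def fourier_mixing(x):
    """Mix tokens with a 2D DFT over (sequence, hidden); no trainable parameters.

    x has shape (seq_len, hidden). Only the real part is kept.
    """
    return np.fft.fft(np.fft.fft(x, axis=-1), axis=-2).real

seq_len, hidden = 8, 4
positions = np.arange(seq_len)

# Toy input: every hidden channel carries the same period-4 cosine over positions.
signal = np.cos(2 * np.pi * positions / 4)
x = np.tile(signal[:, None], (1, hidden))

mixed = fourier_mixing(x)

# The periodic input concentrates in frequency bins 2 and 6 (seq_len / period
# and its mirror) of hidden bin 0; every other entry is numerically zero.
peak_rows = np.flatnonzero(np.abs(mixed[:, 0]) > 1e-6)
```

Because the transform is fixed, the mixing step adds no weights to the model, in contrast to the learned query, key, and value projections of self-attention.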

How does doubling the model size affect the training data requirement empirically?

Doubling the model size empirically requires double the training data.

What is one of the challenges mentioned with training data used for large models?

Some of the training data used is copyrighted, making it unclear if it can be used, especially as news outlets now disallow GPT crawlers.

How did OpenAI reduce toxicity in ChatGPT?

OpenAI used low-wage workers in the third world to scrub toxicity from ChatGPT.

What is a common practice with non-English models during training?

Non-English models are often trained on poorly translated data.

What type of data do open-source models often rely on for training?

Open-source models often rely on training data generated by GPT or have GPT produce the 'human' feedback.