Podcast
Questions and Answers
What is the purpose of using 16-bit floats in neural networks?
Reduce memory by 50%
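The 50% figure follows directly from the storage width: a 16-bit float takes 2 bytes where a 32-bit float takes 4. The sketch below is only an illustration using Python's standard `struct` module; the parameter count and the helper names `weight_bytes` and `to_fp16` are assumptions made up for this example, not something from the source.

```python
import struct


def weight_bytes(num_params: int, fmt: str) -> int:
    """Bytes needed to store num_params weights in the given struct format."""
    return num_params * struct.calcsize(fmt)


def to_fp16(x: float) -> float:
    """Round-trip a Python float through IEEE 754 half precision."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Hypothetical model size, chosen only for illustration.
num_params = 7_000_000_000

fp32_bytes = weight_bytes(num_params, "f")  # 4 bytes per weight
fp16_bytes = weight_bytes(num_params, "e")  # 2 bytes per weight
saving = 1 - fp16_bytes / fp32_bytes        # fraction of memory saved

print(f"fp32: {fp32_bytes / 2**30:.1f} GiB, fp16: {fp16_bytes / 2**30:.1f} GiB")
print(f"saving: {saving:.0%}")

# The price of the smaller type is precision: 0.1 is no longer stored as closely.
print(to_fp16(0.1))
```

The saving applies to weight storage; the round-trip at the end shows that the smaller type stores values less precisely.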
How can we translate weights to larger types during matrix multiplication on smaller GPUs?
Using a lookup table in the GPU
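The lookup-table idea can be sketched on the CPU: weights are stored as small integer codes, and a table maps each code back to a float just before the multiplication. This is a minimal sketch under stated assumptions, not a GPU kernel: the 4-bit code width, the 16 evenly spaced levels in [-1, 1], and all function names are hypothetical choices for illustration, and real schemes may pick their levels differently.

```python
# Hypothetical table: 16 evenly spaced levels in [-1, 1], one per 4-bit code.
TABLE = [-1 + 2 * i / 15 for i in range(16)]


def quantize(weights, table):
    """Map each weight to the index of the nearest table entry."""
    return [min(range(len(table)), key=lambda i: abs(table[i] - w)) for w in weights]


def pack_nibbles(codes):
    """Pack pairs of 4-bit codes into single bytes (pads odd lengths with 0)."""
    if len(codes) % 2:
        codes = codes + [0]
    return bytes((codes[i] << 4) | codes[i + 1] for i in range(0, len(codes), 2))


def dequantize(packed, table, n):
    """Expand packed 4-bit codes back to floats via table lookup."""
    out = []
    for byte in packed:
        out.append(table[byte >> 4])    # high nibble
        out.append(table[byte & 0x0F])  # low nibble
    return out[:n]


def dot_quantized(packed_row, table, n, x):
    """Dot product of one packed weight row with a float input vector."""
    w = dequantize(packed_row, table, n)
    return sum(wi * xi for wi, xi in zip(w, x))


weights = [-1.0, 1.0, -1.0, 1.0]
codes = quantize(weights, TABLE)
packed = pack_nibbles(codes)
print(codes, packed.hex(), dequantize(packed, TABLE, len(weights)))
```

Four weights occupy two bytes in packed form; the float values exist only transiently while the product is computed.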
What technique can be used for finetuning in LoRA?
Quantization
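One way to picture how quantization combines with LoRA: the large base matrix stays frozen in a quantized form, and only two small low-rank matrices are trained in full precision. The sketch below assumes a very simple scheme (integer codes times one shared scale) and the usual LoRA scaling of alpha / r; the function names and the toy numbers are hypothetical and chosen only for illustration.

```python
def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def dequantize(W_codes, scale):
    """Assumed scheme: each frozen weight is an integer code times a shared scale."""
    return [[c * scale for c in row] for row in W_codes]


def lora_forward(W_codes, scale, A, B, x, alpha):
    """y = W x + (alpha / r) * B (A x), with W frozen and stored quantized."""
    r = len(A)                                # rank of the adapter
    base = matvec(dequantize(W_codes, scale), x)
    update = matvec(B, matvec(A, x))          # low-rank trainable path
    return [b + (alpha / r) * u for b, u in zip(base, update)]


# Toy example: 2x2 frozen weights, rank-1 adapter.
W_codes = [[1, -2], [3, 4]]   # frozen integer codes
scale = 0.5
A = [[1.0, 1.0]]              # r x d_in, trainable
B_zero = [[0.0], [0.0]]       # d_out x r, zero at the start of finetuning
B_trained = [[1.0], [2.0]]    # hypothetical values after some training
x = [1.0, 2.0]

print(lora_forward(W_codes, scale, A, B_zero, x, alpha=2.0))
print(lora_forward(W_codes, scale, A, B_trained, x, alpha=2.0))
```

With `B` at zero the adapter contributes nothing, so the output equals that of the frozen quantized model; training then changes only `A` and `B`.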
What method does InstructGPT use to 'follow the user's intent'?
What do most AI engineers prefer for chat optimization?
What is the key concept behind Reinforcement Learning from Human Feedback?
How many self-attention heads can be removed from BERT without a noticeable effect on BLEU/Accuracy scores?
According to BhPaGo20, what can Transformers (and RNN) simulate?
What have recent methods in benchmark leaderboards been criticized for?
What is a key concern when it comes to the memory efficiency of large models?
According to HeHo23, what can be omitted in Transformers to simplify the architecture?
What did TaCh18 conclude about the ability of current neural network models to capture the semantics of natural language inference?
What is the main idea behind Fourier networks (FNet)?
How does doubling the model size affect the training data requirement empirically?
What is one of the challenges mentioned with training data used for large models?
How did OpenAI reduce toxicity in ChatGPT?
What is a common practice with non-English models during training?
What type of data do open-source models often rely on for training?