Chapter 2: Understanding Foundation Models

Questions and Answers

What is a foundation model?

A foundation model is a large machine learning model trained on broad data at scale, typically via self-supervision, that can be adapted to a wide range of downstream tasks and applications.

Which of the following is NOT a common design decision for foundation models?

  • Training data
  • Model architecture
  • Model size
  • Number of GPUs used (correct)

Transformer architecture is the only architecture used in language-based foundation models.

False (B)

What are the two steps involved in the training process of a foundation model?

Model training is often divided into two steps: pre-training and post-training. Pre-training makes a model capable, but not necessarily safe or easy to use. Post-training is where the model is aligned with human preferences.

What is the difference between parameters and hyperparameters in a model?

Parameters are learned by the model during training, while hyperparameters are set by users to control how the model learns.

The scaling law states that the number of training tokens should be 20 times the model size for optimal performance.

True (A)
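
The 20:1 token-to-parameter ratio above can be turned into a quick back-of-the-envelope calculation (a minimal sketch; the function name is illustrative, not from the lesson):

```python
def chinchilla_optimal_tokens(num_params: int, tokens_per_param: int = 20) -> int:
    """Compute-optimal training tokens under the ~20:1 Chinchilla heuristic."""
    return num_params * tokens_per_param

# A 70B-parameter model would call for roughly 1.4 trillion training tokens.
print(chinchilla_optimal_tokens(70_000_000_000))  # 1400000000000
```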

What are the two main types of post-training?

Post-training is generally divided into two steps: Supervised Finetuning (SFT) and Preference Finetuning. SFT focuses on making the model better at understanding instructions and performing tasks, while Preference Finetuning focuses on aligning the model with human preferences.

How does the "best of N" method work for test time compute?

The "best of N" method involves generating multiple outputs from the model and then selecting the output that performs best based on a defined metric.
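
The procedure can be sketched as follows; `generate` and `score` are hypothetical stand-ins for sampling a model output and scoring it (e.g., with a reward model), not real APIs:

```python
import random

def generate(prompt: str, seed: int) -> str:
    """Hypothetical stand-in for sampling one model output."""
    random.seed(seed)
    return f"{prompt} -> candidate {random.randint(0, 99)}"

def score(output: str) -> float:
    """Hypothetical stand-in for a quality metric (placeholder criterion)."""
    return float(len(output))

def best_of_n(prompt: str, n: int = 4) -> str:
    """Sample n candidate outputs, keep the one the scorer ranks highest."""
    candidates = [generate(prompt, seed=i) for i in range(n)]
    return max(candidates, key=score)
```

The trade-off is linear extra compute at inference time in exchange for output quality, with no change to the model itself.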

Hallucinations are a major obstacle in training large language models but have no real-world impact when the model is deployed.

False (B)

What is the primary reason for the internet data bottleneck in the training of large language models?

Training datasets are growing faster than new internet data is being generated (C)

What is the most common category of tasks that require structured outputs?

Semantic parsing, which involves converting natural language to a structured, machine-readable format.

What is the purpose of constrained sampling?

Constrained sampling guides the model's generation process to ensure that the generated outputs adhere to specific format constraints.
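
At each generation step, constrained sampling restricts the choice to tokens the target format allows. A minimal sketch, assuming a toy vocabulary and hand-written logits (illustrative, not a real decoder):

```python
def constrained_argmax(logits: dict[str, float], allowed: set[str]) -> str:
    """Pick the highest-logit token, considering only format-legal tokens."""
    legal = {tok: lg for tok, lg in logits.items() if tok in allowed}
    return max(legal, key=legal.get)

# Suppose the format grammar only permits '{' or '[' here (start of JSON).
logits = {"{": 1.2, "hello": 3.5, "[": 0.4, "<eos>": -1.0}
print(constrained_argmax(logits, allowed={"{", "["}))  # {
```

Even though "hello" has the highest raw logit, it is masked out, so the output is guaranteed to stay within the format.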

Finetuning is the most effective and general approach to ensure that models generate structured outputs.

True (A)

The probabilistic nature of large language models is always a positive factor for their performance and reliability.

False (B)

What are the two main scenarios that demonstrate model inconsistency?

Model inconsistency can manifest in two ways: (1) same input, different outputs, where identical prompts produce differing responses; and (2) slightly different input, drastically different outputs, where minor changes in the prompt can result in significantly varied responses.
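
The first scenario follows directly from temperature sampling: the same logits, drawn from repeatedly, can yield different tokens. A toy sketch (the logits and `sample_token` are illustrative, not from the lesson):

```python
import math
import random

def sample_token(logits: dict[str, float], temperature: float, rng: random.Random) -> str:
    """Sample one token from temperature-scaled softmax probabilities."""
    scaled = {tok: lg / temperature for tok, lg in logits.items()}
    z = sum(math.exp(v) for v in scaled.values())
    probs = {tok: math.exp(v) / z for tok, v in scaled.items()}
    return rng.choices(list(probs), weights=list(probs.values()))[0]

logits = {"Paris": 2.0, "Lyon": 1.5, "Nice": 1.0}
rng = random.Random(0)
# The same input logits can yield different tokens across draws at temperature 1.0.
draws = [sample_token(logits, 1.0, rng) for _ in range(5)]
```

Setting temperature near 0 makes sampling nearly deterministic, which trades diversity for consistency.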

What two potential approaches can help mitigate hallucinations in language models?

Two primary approaches are: (1) incorporating factual and counterfactual signals in training data to encourage the model to rely on verified information, and (2) refining the model to provide more accurate information and flag uncertainties by prompting the model to say "I don't know" when necessary.

Flashcards

Post-Training

The process of adjusting a pre-trained model to produce outputs that align with human preferences.

Supervised Finetuning (SFT)

A process that uses high-quality instruction data to fine-tune a pre-trained model for conversational tasks.

Self-Supervised Pre-training

A type of machine learning where a model learns to predict the next token in a sequence based on previous tokens.
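
The next-token prediction objective can be sketched as building (context, target) training pairs from a token sequence; the labels come from the data itself, which is what makes it self-supervised (toy helper, illustrative only):

```python
def next_token_pairs(tokens: list[str]) -> list[tuple[list[str], str]]:
    """Build (context, target) pairs: each prefix predicts the next token."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

pairs = next_token_pairs(["I", "love", "street", "food"])
# (["I"], "love"), (["I", "love"], "street"), (["I", "love", "street"], "food")
```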

Reward Model (RM)

A key component of RLHF, it's a model trained to assess the quality of generated responses.

Reinforcement Learning from Human Feedback (RLHF)

A method of training a model to generate outputs that maximize scores from a reward model.

Demonstration Data

A collection of (prompt, response) pairs used to train a model to understand various requests and generate appropriate responses.

Model Capacity

The ability of a model to learn from its training data and generate responses that fit the context of the input.

Model Size

The number of parameters a model has.

Domain Specificity

A measure of how well a model can handle tasks in specific areas like coding, medicine, or law.

General-Purpose Model

A model designed to perform well on a wide range of tasks and domains.

Tokenization

The process of breaking down text into smaller units, called tokens.

Training Data

Data used for training a model, often collected from various sources like websites, blogs, and books.

Language Distribution

The distribution of languages present in a training dataset.

Low-Resource Languages

Languages that are under-represented in training datasets.

Model Architecture

The underlying structure that defines a model's computational process, i.e., how inputs are transformed into outputs.

Transformer Architecture

An architecture that relies on the attention mechanism, handling sequences of text more efficiently than previous models.

Attention Mechanism

A mechanism that enables a model to focus on specific parts of the input sequence during processing.
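
The scaled dot-product scores behind attention can be sketched in a few lines of pure Python (toy vectors, illustrative names):

```python
import math

def attention_weights(query: list[float], keys: list[list[float]]) -> list[float]:
    """Scaled dot-product attention scores, softmax-normalized over the keys."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    z = sum(math.exp(s) for s in scores)
    return [math.exp(s) / z for s in scores]

# The query attends most strongly to the key it aligns with best.
w = attention_weights([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

The weights sum to 1, and the output of an attention layer is the values averaged with these weights.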

Transformer Block

A building block of the transformer architecture, containing attention and feed-forward layers.

Recurrent Neural Network (RNN)

A type of neural network that processes input sequentially, considering previous inputs to generate the next output.

Sequence-to-Sequence (seq2seq) Architecture

A neural network architecture that uses an encoder to process inputs and a decoder to generate outputs.

Activation Functions

Non-linear functions used in neural networks to allow the model to learn complex patterns.

Mixture-of-Experts (MoE)

A type of sparse model that divides its parameters into groups, each specialized in a certain aspect.

FLOPS

A measure of the compute needed to train a model, expressed in floating point operations.
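
One widely used back-of-the-envelope estimate (an assumption here, not stated in these notes) is that training costs roughly 6 FLOPs per parameter per token:

```python
def train_flops(num_params: float, num_tokens: float) -> float:
    """Rough training compute: ~6 FLOPs per parameter per token."""
    return 6 * num_params * num_tokens

# e.g., a 70B-parameter model trained on 1.4T tokens: ~5.9e23 FLOPs
print(f"{train_flops(70e9, 1.4e12):.1e}")
```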

Chinchilla Scaling Law

A rule that helps determine the optimal model size and dataset size for a given compute budget.

Generalization

The ability of a model to perform well on unseen tasks that it has never encountered during training.

Emergent Abilities

A phenomenon where a model suddenly gains new abilities as its size and training data increase.

Scaling Extrapolation

The practice of predicting the optimal hyperparameters for large models based on observations from smaller models.

Data Bias

The tendency of a model to generate outputs that are consistent with the overall distribution of its training data but might not always be accurate or relevant.

Output Ranking

The process of using a model to generate multiple outputs and selecting the best one based on specific criteria.

Data Synthesis

A process for generating artificial data that mimics real-world data.

Text Completion

The process of using a model to complete a sequence by predicting the next token.

Study Notes

Chapter 2: Understanding Foundation Models

  • Foundation models are the basis on which AI applications are built
  • High-level understanding of models helps users choose and adapt
  • Model training is complex and costly, and its details are rarely disclosed publicly due to confidentiality
  • Downstream applications are impacted by design choices in foundation models
  • Training data, model architecture and size, and post-training alignment with human preferences differ between foundation models
  • Models learn from data, their training data reveal capabilities and limitations
  • Model developers curate training data, focusing on data distribution
  • Chapter 8 explores dataset engineering and techniques (data quality evaluation, data synthesis) in detail
  • Transformer architecture is the dominant architecture today
    • Transformer model size is a frequent concern for model users
    • Model developers determine the appropriate size using methods covered in this chapter
  • Model training is often split into pre-training and post-training stages
    • Pre-training makes models capable, but not necessarily usable
    • Post-training aims to align the model with human preferences
  • Model performance depends not only on how a model is trained but also on how it generates outputs
  • Sampling, the process by which a model chooses an output, is often overlooked despite its impact on performance
  • Concepts covered include training, sampling, and important considerations for deep learning model usage
  • Curating datasets for different domains and languages is an important consideration when building a successful model
  • English-language content heavily dominates internet data, while other languages may not have sufficient representation
  • Some teams use heuristics to filter internet data; for example, OpenAI used Reddit links with at least three upvotes to select training data for GPT-2
  • Models are sometimes better at tasks present in the training data than those not present
  • Models that are trained well on high-quality data may perform better than those trained on large quantities of poor-quality data

Training Data

  • AI model quality is directly proportional to the quality of the data it was trained on
  • If the model lacks data for a task, it won't perform well on that task
  • Using more, or better, training data improves a model's capability on a given task
  • Common Crawl is a source for training data on the internet
  • Common Crawl crawled roughly 2-3 billion web pages per month during 2022-2023
  • Data quality of resources like Common Crawl is questionable and might contain misinformation, propaganda, conspiracy, or other erroneous content
  • Common Crawl and variations continue to be used in many foundation models
  • Model developers often take available data, even when it doesn't align perfectly with their needs
  • Variations of Common Crawl are frequently used by companies such as OpenAI and Google

Multilingual Models

  • English content heavily dominates the internet
  • Almost half of Common Crawl is English-language content
  • English-language models are far more prevalent, and models perform much better in English than in under-represented, low-resource languages
