Questions and Answers
Select all that apply to overfitting. (Zero, one, or more choices can be correct.)
Given a sequence dataset with a vocabulary with V possible words, how many n-grams (of length n) could in theory be observed?
Given a sequence of length L, how many total n-grams are observed?
Given a dataset of N sequences and a total number of tokens T, we insert a special token <S> at the beginning of each sequence, and </S> at the end. How many total n-grams are observed in this dataset?
How many image patches of size (m x n) can be constructed from an image of size (W x H) if we form patches centered at a pixel in the image (using padding at the edges)?
We are trying to represent a conversation tree. The vocabulary has V possible words, and we want pairwise features of bigrams. An example feature would count when the bigram "do_you" appears in a parent message and “i_do” appears in a reply to that message. How many such features could, in theory, be observed?
Consider a particular edge in this conversation tree: the parent has A tokens and the reply has B tokens. How many pairwise bigram features (constructed as in the previous question) are observed?
When using a linear model, can we encode a sequence (text/dna) just as a vector of "word" indices e.g. [42, 11, 2, 7, 11, 2] with no padding? Answer with 'yes' or 'no' and explain why. Hint: consider two different sequences.
Study Notes
Quiz 2 - Study Notes
Question 1: Overfitting
- Overfitting occurs when a model learns the training data's characteristics excessively, leading to poor performance on new data.
- This happens when the model is overly complex, fitting the noise in the data.
- Overfitting contrasts with underfitting, where a model is too simple to capture the structure in the data.
- Sufficient regularization helps prevent overfitting by penalizing model complexity.
Question 2: N-grams
- Given a vocabulary of V words, the number of possible n-grams (sequences of length n) is V^n, since each of the n positions can independently take any of the V words.
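This count can be checked by brute-force enumeration for a small vocabulary (a sketch; the values of `V` and `n` here are illustrative):

```python
from itertools import product

V, n = 4, 3  # illustrative: vocabulary of 4 words, trigrams
vocab = [f"w{i}" for i in range(V)]

# Enumerate every possible n-gram over the vocabulary.
all_ngrams = list(product(vocab, repeat=n))
assert len(all_ngrams) == V ** n  # 4^3 = 64
```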
Question 3: Total N-grams
- In a sequence of length L, the total number of n-grams is L - (n-1).
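A sliding window makes the count concrete (a sketch with an illustrative sequence):

```python
def count_ngrams(seq, n):
    """Number of length-n windows that fit in a sequence."""
    return max(len(seq) - n + 1, 0)

seq = ["the", "cat", "sat", "on", "the", "mat"]  # L = 6
assert count_ngrams(seq, 2) == 5  # L - (n-1) = 6 - 1
assert count_ngrams(seq, 3) == 4  # L - (n-1) = 6 - 2
```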
Question 4: N-grams in a Dataset of Sequences
- In a dataset of N sequences with T total tokens, after adding the beginning and end special tokens, the total number of n-grams is T + N * (3 - n): a sequence of length L_i becomes length L_i + 2, contributing L_i + 2 - n + 1 = L_i + 3 - n n-grams, and summing over all sequences gives T + N * (3 - n).
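The formula can be verified directly on a toy dataset (a sketch; the sequences are illustrative):

```python
def total_ngrams(sequences, n):
    # Pad each sequence with <S> and </S>, then count its n-grams.
    return sum((len(s) + 2) - n + 1 for s in sequences)

seqs = [["a", "b", "c"], ["d", "e"]]  # N = 2 sequences, T = 5 tokens
N, T, n = len(seqs), sum(len(s) for s in seqs), 2
assert total_ngrams(seqs, n) == T + N * (3 - n)  # 5 + 2*1 = 7
```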
Question 5: Image Patches
- When a patch is centered at every pixel and edges are handled by padding, each of the W * H pixels yields one patch, so the count is W * H. (Without padding, only the (W - m + 1) * (H - n + 1) fully interior patches fit.)
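This can be checked by padding a small image and extracting one patch per pixel (a sketch; sizes are illustrative, and odd patch dimensions are assumed so centering is exact):

```python
import numpy as np

W, H, m, n = 5, 4, 3, 3  # illustrative image and patch sizes
img = np.arange(W * H).reshape(H, W)

# Zero-pad so a full patch can be centered at every pixel, even at edges.
pad_h, pad_w = m // 2, n // 2
padded = np.pad(img, ((pad_h, pad_h), (pad_w, pad_w)))

patches = [padded[i:i + m, j:j + n] for i in range(H) for j in range(W)]
assert len(patches) == W * H  # one patch per pixel: 5 * 4 = 20
```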
Question 6: Bigram Features in Conversation Tree
- With a vocabulary of V words there are V^2 possible bigrams, so pairing a parent bigram with a reply bigram gives V^2 * V^2 = V^4 possible features.
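Enumeration over a tiny vocabulary confirms the count (a sketch; `V` is illustrative):

```python
from itertools import product

V = 3
vocab = range(V)
bigrams = list(product(vocab, repeat=2))     # V^2 possible bigrams
features = list(product(bigrams, repeat=2))  # (parent bigram, reply bigram)
assert len(features) == V ** 4  # 3^4 = 81
```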
Question 7: Bigram Features in a Conversation Tree Edge
- With A tokens in the parent and B tokens in the reply, the number of pairwise bigram features is (A - 1) * (B - 1).
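A direct enumeration makes the (A - 1) * (B - 1) count concrete (a sketch; the example messages are illustrative):

```python
def pairwise_bigram_features(parent, reply):
    parent_bigrams = list(zip(parent, parent[1:]))  # A - 1 bigrams
    reply_bigrams = list(zip(reply, reply[1:]))     # B - 1 bigrams
    return [(p, r) for p in parent_bigrams for r in reply_bigrams]

parent = ["do", "you", "like", "it"]  # A = 4 tokens
reply = ["i", "do", "not"]            # B = 3 tokens
feats = pairwise_bigram_features(parent, reply)
assert len(feats) == (len(parent) - 1) * (len(reply) - 1)  # 3 * 2 = 6
```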
Question 8: Sequence Encoding with Linear Model
- No. Word indices are arbitrary categorical IDs, but a linear model computes a weighted sum of its input values, so it would treat index 42 as numerically "greater than" index 11, an ordering the vocabulary never implied. Two entirely different sequences can also produce vectors that a linear model scores identically, and without padding, sequences of different lengths yield vectors of different dimensionality. A representation such as one-hot or count (bag-of-words/n-gram) features avoids these problems.
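A small sketch of the failure mode: two sequences that share almost no words but receive the same linear score under raw index encoding, while a count encoding keeps them distinct (the index values are illustrative):

```python
import numpy as np

# Two sequences of word indices; the numeric values are arbitrary IDs.
seq_a = [42, 11, 2]
seq_b = [41, 12, 2]  # different words, but numerically similar indices

# A linear model scores w @ x, so raw indices give these two sequences
# identical scores (42+11+2 == 41+12+2) despite sharing only one word.
w = np.ones(3)
assert w @ np.array(seq_a) == w @ np.array(seq_b)

# A count (bag-of-words) encoding gives each word its own dimension.
def bag_of_words(seq, V=50):
    x = np.zeros(V)
    for idx in seq:
        x[idx] += 1
    return x

assert not np.array_equal(bag_of_words(seq_a), bag_of_words(seq_b))
```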
Description
Test your understanding of key machine learning concepts such as overfitting, n-grams, and image patches. This quiz covers fundamental principles that are crucial for developing robust models. Perfect for students looking to reinforce their learning in machine learning.