Summary

This quiz covers feature encoding for structured data. It includes questions on overfitting, n-gram counting, image patch construction, and pairwise bigram features over conversation trees, along with a free-response question on encoding sequences for linear models.

Full Transcript

Quiz 2   This is a preview of the published version of the quiz Started: 14 Dec at 8:36 Quiz instruc ons Quiz on feature encoding for structured data. Question 1...

Quiz 2   This is a preview of the published version of the quiz Started: 14 Dec at 8:36 Quiz instruc ons Quiz on feature encoding for structured data. Question 1 1 pts Select all that apply to overfitting. (Zero, one, or more choices can be correct.) Overfitting is when the model learns the characteristics of the training data so much that the model worsens its performance Overfitting is when the model is too simple and cannot learn the characteristics of the training data. Overfitting is when the model has too much regularization. Overfitting is when the model learns the noise of the training data, hence the model cannot generalize. Question 2 1 pts Given a sequence dataset with a vocabulary with V possible words, how many n-grams (of length n) could in theory be observed? V^n n! V choose n V n V*n Question 3 1 pts  Given a sequence of length L, how many total n-grams are observed? L*n L - (n - 1) L * V^n L^n Question 4 1 pts Given a dataset of N sequences, and total number of tokens T, we insert a special token at the beginning of each sequence, and at the end. How many total ngrams are observed in this dataset? T - N*(n-1) T + N*(3-n) T + 3N Question 5 1 pts How many image patches of size (m x n) can be constructed from an image of size (W x H) if we form patches centered at a pixel in the image (using padding at the edges)? mn WH (W-m)(H-n) W^2 * H^2 Question 6 1 pts  We are trying to represent a conversation tree. The vocabulary has V possible words, and we want pairwise features of bigrams. An example feature would count when the bigram “do_you” appears in a parent message and “i_do” appears in a reply to that message. How many such features could, in theory, be observed? V^2 V^3 V^4 V^6 Question 7 1 pts Consider a particular edge in this conversation tree: the parent has A tokens and the reply has B tokens. How many pairwise bigram features (constructed as in the previous question) are observed? AB A+B (A-1) + (B-1) (A-1) * (B-1) Question 8 1 pts When using a linear model, can we encode a sequence (text/dna) just as a vector of "word" indices e.g. [42, 11, 2, 7, 11, 2] with no padding? Answer with 'yes' or 'no' and explain why. Hint: consider two different sequences. Edit View Insert Format Tools Table 12pt Paragraph  p 0 words Not saved Submit quiz
