Questions and Answers
What is the main issue with One-Hot Encoding in terms of vocabulary size?
The size of a one-hot vector is directly proportional to the size of the vocabulary, which can be very large in real-world corpora.
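A minimal sketch of this proportionality, assuming a toy corpus tokenized on whitespace. The names `corpus`, `vocab`, `index`, and `one_hot` are illustrative and do not come from the source.

```python
# Toy corpus (an assumption for illustration only).
corpus = ["dog bites man", "man bites dog", "dog eats meat", "man eats food"]

# Vocabulary: every distinct token, sorted so the indices are reproducible.
vocab = sorted({word for doc in corpus for word in doc.split()})
index = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    """Return a vector with one slot per vocabulary word and a single 1."""
    vec = [0] * len(vocab)
    vec[index[word]] = 1
    return vec

# The vector length always equals the vocabulary size.
print(len(vocab), one_hot("dog"))
```

Each new distinct word in the corpus adds one dimension to every vector, so a real-world vocabulary of tens of thousands of words yields vectors that are just as long and almost entirely zeros.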
What is a major drawback of One-Hot Encoding in terms of word relationships?
It treats words as atomic units and has no notion of (dis)similarity between words.
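One way to see this is to compare one-hot vectors with a dot product. The sketch below assumes a hypothetical three-word vocabulary; `vocab`, `one_hot`, and `dot` are illustrative names.

```python
# Hypothetical vocabulary chosen so that two words are related and one is not.
vocab = ["car", "cat", "dog"]

def one_hot(word):
    """One-hot vector for a word in the toy vocabulary."""
    return [1 if w == word else 0 for w in vocab]

def dot(u, v):
    """Plain dot product, used here as a simple similarity score."""
    return sum(a * b for a, b in zip(u, v))

# Any two distinct words score 0, regardless of meaning.
cat_dog = dot(one_hot("cat"), one_hot("dog"))
cat_car = dot(one_hot("cat"), one_hot("car"))
print(cat_dog, cat_car)
```

Under this representation "cat" is exactly as far from "dog" as it is from "car", which is what the answer means by having no notion of similarity.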
What is the 'out of vocabulary' (OOV) problem in text representation?
It occurs when a model encounters a word that was not present in the training data.
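A small sketch of the problem, assuming a vocabulary built from hypothetical training text. Mapping an unseen word to an all-zero vector is only one possible workaround, shown here as an assumption rather than as a recommended method.

```python
# Vocabulary learned from hypothetical training data.
train_vocab = ["bites", "dog", "man"]
index = {word: i for i, word in enumerate(train_vocab)}

def encode(word):
    """One-hot encode a word; unseen words get an all-zero vector (an assumed fallback)."""
    vec = [0] * len(train_vocab)
    position = index.get(word)  # None when the word was never seen in training
    if position is not None:
        vec[position] = 1
    return vec

seen = encode("dog")
unseen = encode("cat")  # "cat" is out of vocabulary
print(seen, unseen)
```

The unseen word carries no information in the encoded vector, so the model cannot distinguish it from any other out-of-vocabulary word without retraining on an extended vocabulary.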
What is the main idea behind the Bag of Words (BoW) representation technique?
How does the Bag of Words (BoW) technique determine the class of a text piece?
What is the primary advantage of One-Hot Encoding over Bag of Words (BoW)?
How do One-Hot Encoding and Bag of Words (BoW) differ in their treatment of word order?
What is the relationship between the size of the vocabulary and the size of a one-hot vector?
How does the scikit-learn implementation of One-Hot Encoding address the limitations of this technique?
What is the main benefit of using a fixed-length representation for text?