Podcast
Questions and Answers
What is the purpose of tokenization in text preprocessing?
- To convert text into numerical vectors directly
- To remove all punctuation marks from the text
- To break the text up into individual tokens (correct)
- To combine words into phrases for better analysis
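To make the correct answer concrete, here is a minimal Python sketch of a simple tokenizer; the regex-based `tokenize` function and the sample sentence are illustrative assumptions, not part of the quiz material, and real tokenizers (e.g. in NLTK or spaCy) handle many more cases.

```python
import re

def tokenize(text):
    # Split raw text into individual word-level tokens:
    # keep alphanumeric runs, lowercase them, drop punctuation.
    return re.findall(r"\w+", text.lower())

sample = "Tokenization breaks text into tokens!"  # illustrative sample sentence
print(tokenize(sample))
# ['tokenization', 'breaks', 'text', 'into', 'tokens']
```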
What is a key step in constructing a vocabulary for text vectorization?
- Tokenization to form individual tokens (correct)
- Stemming and lemmatization of words
- Removing all stopwords from the text
- Converting text into numerical vectors
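The sketch below illustrates why tokenization comes first when building a vocabulary for vectorization: documents are tokenized, and each unique token is then assigned an integer index. The mini-corpus and the index assignment scheme are assumptions made for this example.

```python
# Illustrative mini-corpus; documents are tokenized first, then the
# vocabulary maps each unique token to an integer index.
corpus = ["the cat sat on the mat", "the dog sat on the log"]

vocabulary = {}
for document in corpus:
    for token in document.lower().split():  # tokenization step
        if token not in vocabulary:
            vocabulary[token] = len(vocabulary)

print(vocabulary)
# {'the': 0, 'cat': 1, 'sat': 2, 'on': 3, 'mat': 4, 'dog': 5, 'log': 6}
```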
What does the vocabulary of tokens represent in text preprocessing?
- The length of each token in the text data
- The position of each token in the original text
- The set of all unique tokens occurring in the data (correct)
- The frequency of each token in the text data
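A short sketch makes the last answer concrete: the vocabulary is the set of all unique tokens occurring in the data, independent of where tokens appear or how often. The example corpus is an assumption for illustration only.

```python
# Illustrative corpus; the vocabulary is simply the set of all unique
# tokens across the data, regardless of position or frequency.
corpus = ["to be or not to be", "to see or not to see"]

vocabulary = {token for document in corpus for token in document.lower().split()}

print(sorted(vocabulary))
# ['be', 'not', 'or', 'see', 'to']
```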