Questions and Answers
What is the purpose of stop word filtering in text preprocessing?
- Removing any special characters from the text
- Converting all text to lowercase
- Removing very common words that add little meaning, such as "the" and "is" (correct)
- Correcting spelling mistakes in the text
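A minimal Python sketch of stop word filtering, assuming the text has already been tokenised. The STOP_WORDS set and the remove_stop_words function are hypothetical and kept deliberately small. Libraries such as NLTK and spaCy ship much longer lists.

```python
# Hypothetical, deliberately small stop word list for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "are", "and", "of", "to", "in"}


def remove_stop_words(tokens):
    """Return only the tokens that are not stop words (case-insensitive check)."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]


print(remove_stop_words(["The", "cat", "is", "on", "the", "mat"]))
# ['cat', 'on', 'mat']
```

"on" survives only because it is not in this small list. Which words count as stop words depends on the list chosen for the task.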
Which tool can be used to create a document vector table for all the documents in the corpus?
- Tokenisation
- Bag of Words (correct)
- Sentence Segmentation
- Stemming
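Here is one way a Bag of Words document vector table could be built in Python. The sketch assumes each document is already normalised (lowercased, punctuation removed) and can be tokenised by splitting on whitespace. The function name document_vector_table is hypothetical.

```python
from collections import Counter


def document_vector_table(documents):
    """Build a Bag of Words table: one row per document, one column per word."""
    # Tokenise each (already normalised) document by splitting on whitespace.
    tokenised = [doc.split() for doc in documents]
    # Dictionary: every unique word in the corpus, sorted for a stable column order.
    vocabulary = sorted({word for tokens in tokenised for word in tokens})
    rows = []
    for tokens in tokenised:
        counts = Counter(tokens)  # how often each word occurs in this document
        rows.append([counts[word] for word in vocabulary])
    return vocabulary, rows


vocab, rows = document_vector_table(["the cat sat", "the dog sat on the mat"])
print(vocab)  # ['cat', 'dog', 'mat', 'on', 'sat', 'the']
for row in rows:
    print(row)
# [1, 0, 0, 0, 1, 1]
# [0, 1, 1, 1, 1, 2]
```

Each row is one document's vector. A 0 means the word does not occur in that document, and the 2 under "the" reflects its two occurrences in the second document.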
In the context of the corpus, what does 'TFIDF' stand for?
- Total Frequency of Important Document Features
- True Frequency of Document Files
- Text Feature Identification for Documents
- Term Frequency-Inverse Document Frequency (correct)
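The following sketches one common TF-IDF variant, TF(w, d) × log10(N / DF(w)). Here N is the number of documents and DF(w) is the number of documents that contain w. The base-10 logarithm and the absence of smoothing are assumptions. Libraries such as scikit-learn use a smoothed, natural-log variant by default. The function name tf_idf is hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {word: score} dict per document, using TF * log10(N / DF)."""
    tokenised = [doc.split() for doc in documents]
    n_docs = len(tokenised)
    # Document frequency: number of documents in which each word appears at least once.
    df = Counter(word for tokens in tokenised for word in set(tokens))
    scores = []
    for tokens in tokenised:
        tf = Counter(tokens)  # raw term frequency within this document
        scores.append({w: tf[w] * math.log10(n_docs / df[w]) for w in tf})
    return scores


scores = tf_idf(["the cat sat", "the dog sat on the mat"])
print(scores[0])
```

"the" and "sat" occur in both documents, so log10(2 / 2) = 0 and their scores vanish. Words that occur in only one document, such as "cat", get a positive weight.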
What is the purpose of Lemmatisation in text analysis?
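Lemmatisation maps each word to its dictionary form (its lemma). Stemming, by contrast, only strips affixes and may produce non-words; a stemmer may reduce "studies" to "studi". The lookup table below is a hypothetical toy that only illustrates the idea. Real lemmatisers rely on a full lexicon and usually on part-of-speech information.

```python
# Hypothetical, hand-made lemma table for illustration only.
LEMMAS = {
    "am": "be", "is": "be", "are": "be", "was": "be",
    "better": "good", "studies": "study", "studying": "study",
    "mice": "mouse", "ran": "run",
}


def lemmatise(tokens):
    """Lowercase each token and replace it by its lemma when the table knows it."""
    return [LEMMAS.get(t.lower(), t.lower()) for t in tokens]


print(lemmatise(["The", "mice", "are", "studying"]))
# ['the', 'mouse', 'be', 'study']
```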
What is the purpose of tokenisation in text normalisation?
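Below is a minimal regex-based tokeniser in Python. It assumes that a word token is a run of letters, digits or underscores and that each punctuation mark becomes a token of its own. The function name tokenise is hypothetical.

```python
import re


def tokenise(text):
    """Split text into word tokens and single-character punctuation tokens."""
    # \w+ matches a run of word characters; [^\w\s] matches one character
    # that is neither a word character nor whitespace (i.e. punctuation).
    return re.findall(r"\w+|[^\w\s]", text)


print(tokenise("Hello, world! NLP is fun."))
# ['Hello', ',', 'world', '!', 'NLP', 'is', 'fun', '.']
```

Once the text is a list of tokens, each word can be counted, filtered or looked up individually.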
Why are stopwords removed in the text normalisation process?
In text normalisation, what is the purpose of sentence segmentation?
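The next example is a naive sentence segmenter. It assumes that a sentence ends at ".", "!" or "?" followed by whitespace. The function name segment_sentences is hypothetical, and the rule will wrongly split after abbreviations such as "Dr.".

```python
import re


def segment_sentences(text):
    """Split text into sentences at '.', '!' or '?' followed by whitespace."""
    # The lookbehind (?<=[.!?]) keeps the punctuation attached to its sentence.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


print(segment_sentences("Raj likes NLP. Do you? Yes!"))
# ['Raj likes NLP.', 'Do you?', 'Yes!']
```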
What is the main advantage of working on a corpus after tokenisation compared to before?
In the context of creating document vectors, what is the purpose of the step 'Create Dictionary'?
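The "Create Dictionary" step can be sketched as collecting every unique word in the tokenised corpus exactly once. This Python sketch keeps words in the order of their first appearance. That ordering and the function name create_dictionary are assumptions.

```python
def create_dictionary(tokenised_docs):
    """Return every unique word in the corpus, in order of first appearance."""
    dictionary = []
    seen = set()
    for tokens in tokenised_docs:
        for word in tokens:
            if word not in seen:  # each word enters the dictionary only once
                seen.add(word)
                dictionary.append(word)
    return dictionary


corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
print(create_dictionary(corpus))  # ['the', 'cat', 'sat', 'dog']
```

Each dictionary entry later becomes one column of the document vector table.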
How does text normalisation affect the data before creating document vectors?
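The Python illustration below shows why normalisation matters before vectors are built. It uses lowercasing as the only normalisation step, which is an assumption; full pipelines also remove stop words and apply stemming or lemmatisation. Without normalisation, "The", "the" and "THE" become three separate dictionary entries and therefore three separate columns. The function name vocabulary is hypothetical.

```python
import re


def vocabulary(text, normalise):
    """Return the set of distinct word tokens, optionally lowercased first."""
    tokens = re.findall(r"\w+", text)
    if normalise:
        tokens = [t.lower() for t in tokens]
    return set(tokens)


text = "The cat chased the Cat. THE end."
print(len(vocabulary(text, normalise=False)))  # 7
print(len(vocabulary(text, normalise=True)))   # 4
```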
What does a value of '1' under a word in a document vector indicate?
Why is it important to increment the value by 1 for a word that appears more than once in a document?
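The following Python sketch fills in one document vector against an existing dictionary. Every entry starts at 0 and is increased by 1 each time the word occurs. A 1 therefore means the word appears exactly once, and a 2 means it appears twice. The function name document_vector is hypothetical, and words missing from the dictionary are simply ignored.

```python
def document_vector(tokens, dictionary):
    """Count each dictionary word in one document, adding 1 per occurrence."""
    vector = {word: 0 for word in dictionary}  # every word starts at 0
    for token in tokens:
        if token in vector:  # tokens not in the dictionary are skipped
            vector[token] += 1
    return [vector[word] for word in dictionary]


dictionary = ["the", "cat", "sat", "dog"]
print(document_vector(["the", "cat", "sat", "on", "the", "mat"], dictionary))
# [2, 1, 1, 0]
```

Without the increment, "the" would record only that it is present (1) and not that it occurs twice. That frequency information is exactly what Bag of Words and TF-IDF rely on.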