Natural Language Processing Quiz

Definition: A language model predicts the likelihood of a sequence of words (e.g., next-word prediction).
Types:
- Statistical Models: n-grams, which estimate the probability of a word from the preceding n-1 words using counts from a corpus.
- Neural Models: Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM) networks, Transformers.
Applications: Speech recognition, text generation, and autocomplete systems.
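
Example: the n-gram idea above can be sketched as a tiny bigram model. This is a minimal illustration, not a production implementation; the function names, the `<s>`/`</s>` boundary markers, and the three-sentence corpus are assumptions made for the example, and no smoothing is applied.

```python
from collections import Counter, defaultdict

def train_bigram_model(sentences):
    """Count adjacent word pairs and return P(next | previous) as nested dicts."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # <s> and </s> are assumed sentence-boundary markers.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    model = {}
    for prev, followers in counts.items():
        total = sum(followers.values())
        # Maximum-likelihood estimate: count(prev, next) / count(prev).
        model[prev] = {word: c / total for word, c in followers.items()}
    return model

def predict_next(model, word):
    """Return the most probable next word, or None if the word was never seen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return max(followers, key=followers.get)

# Toy corpus, invented for illustration.
corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog sat on the rug",
]
model = train_bigram_model(corpus)
print(predict_next(model, "sat"))   # on
print(model["the"]["cat"])          # 2 of the 6 words following "the" are "cat"
```

Because unseen word pairs get probability zero here, practical n-gram models usually add some form of smoothing.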

Definition: Machine translation is the automatic conversion of text from one language to another.
Approaches:
- Rule-Based: Uses linguistic rules and dictionaries.
- Statistical: Learns translation probabilities from parallel corpora (aligned text in both languages).
- Neural Machine Translation (NMT): Uses deep learning models for improved fluency and context understanding.
Challenges: Ambiguity, cultural context, idiomatic expressions.

Importance: Text preprocessing prepares raw text data for NLP tasks.
Steps:
- Tokenization: Splitting text into words or phrases.
- Normalization: Lowercasing, removing punctuation, stemming, and lemmatization.
- Stopword Removal: Eliminating commonly used words that may not add significant meaning.
- Vectorization: Converting text into numerical format (e.g., TF-IDF, word embeddings).
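
Example: the first three steps above can be sketched with the standard library alone. The stopword list and the suffix-stripping rule are deliberately tiny and hypothetical; a real pipeline would normally use an established stemmer or lemmatizer rather than `naive_stem`.

```python
import string

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "an", "is", "are", "on", "in", "of", "and", "to"}

def tokenize(text):
    """Split text into whitespace-separated tokens."""
    return text.split()

def normalize(tokens):
    """Lowercase each token, strip ASCII punctuation, and drop empty results."""
    table = str.maketrans("", "", string.punctuation)
    cleaned = (t.lower().translate(table) for t in tokens)
    return [t for t in cleaned if t]

def remove_stopwords(tokens):
    """Drop tokens that appear in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]

def naive_stem(token):
    """Crude suffix stripping for illustration; not a real stemming algorithm."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    tokens = normalize(tokenize(text))
    tokens = remove_stopwords(tokens)
    return [naive_stem(t) for t in tokens]

print(preprocess("The cats are chasing the mice, and the dog barked!"))
# ['cat', 'chas', 'mice', 'dog', 'bark']
```

The output shows why stemming is only an approximation: "chasing" becomes "chas", and "mice" is not reduced to "mouse", which is the kind of case lemmatization is meant to handle.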

Definition: Sentiment analysis identifies and categorizes the opinions or emotions expressed in text, typically as positive, negative, or neutral.
Approaches:
- Lexicon-Based: Uses predefined lists of words with associated sentiments.
- Machine Learning: Trains models on annotated datasets to classify sentiments.
- Deep Learning: Utilizes neural networks for feature extraction and sentiment classification.
Applications: Customer feedback, social media monitoring, brand analysis.
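
Example: a lexicon-based approach can be sketched in a few lines. The word scores and the negation rule below are hypothetical and chosen only for illustration; real sentiment lexicons are far larger and handle intensity and context more carefully.

```python
import re

# Hypothetical toy lexicon: positive scores for positive words, negative for negative.
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "terrible": -2, "hate": -2}
NEGATORS = {"not", "never", "no"}

def sentiment_score(text):
    """Sum lexicon scores; a negator flips only the word immediately after it."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    negate = False
    for token in tokens:
        if token in NEGATORS:
            negate = True
            continue
        value = LEXICON.get(token, 0)
        score += -value if negate else value
        negate = False  # negation scope ends after one word
    return score

def classify(text):
    """Map the numeric score to a coarse sentiment label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(classify("I love this phone"))          # positive
print(classify("The battery is not good"))    # negative
print(classify("It arrived on Tuesday"))      # neutral
```

The one-word negation scope is a known weakness of this sketch: "not very good" would be scored as positive, which is one reason trained classifiers often outperform plain lexicons.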

Definition: Named entity recognition (NER) identifies and classifies key entities in text (e.g., names of people, organizations, locations).
Approaches:
- Rule-Based: Uses predefined patterns or dictionaries.
- Statistical Models: Uses models such as conditional random fields (CRFs) or support vector machines (SVMs).
- Deep Learning: Employs neural networks such as LSTMs or Transformers.
Applications: Information extraction, question answering, and content recommendation.
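
Example: a rule-based recognizer can combine dictionaries (gazetteers) with a pattern. The entity lists, the title-based person pattern, and the label names ORG, LOC, and PER below are assumptions for this sketch only.

```python
import re

# Hypothetical gazetteers (dictionaries of known entity names).
ORGANIZATIONS = {"Acme Corp", "United Nations"}
LOCATIONS = {"Paris", "New York"}

# Assumed pattern: a title followed by one or two capitalized words.
PERSON_PATTERN = re.compile(r"\b(?:Mr|Ms|Dr)\. [A-Z][a-z]+(?: [A-Z][a-z]+)?")

def find_entities(text):
    """Return (entity text, label) pairs in order of appearance."""
    entities = []
    for label, names in (("ORG", ORGANIZATIONS), ("LOC", LOCATIONS)):
        for name in names:
            for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
                entities.append((match.start(), match.group(), label))
    for match in PERSON_PATTERN.finditer(text):
        entities.append((match.start(), match.group(), "PER"))
    entities.sort()  # order by position in the text
    return [(span, label) for _, span, label in entities]

text = "Dr. Ada Lovelace joined Acme Corp in Paris."
print(find_entities(text))
# [('Dr. Ada Lovelace', 'PER'), ('Acme Corp', 'ORG'), ('Paris', 'LOC')]
```

Rules like these are precise for names they know but miss anything outside the lists, which is the gap statistical and neural approaches aim to close.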

Language models assign a probability to a sequence of words, which also lets them predict which word is likely to come next.
Statistical models, such as n-grams, rely on probabilities estimated from word-sequence counts in a corpus.
Neural models, including RNNs, LSTMs, and Transformers, learn language patterns from data and can capture longer-range context than n-grams.
These models are used in applications such as speech recognition, text generation, and autocomplete systems.
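
Worked formula: for the bigram case (n = 2), the sequence probability can be written as follows. The notation is an assumption of this note: C(.) is a corpus count and w_0 is a sentence-start marker.

```latex
P(w_1, \dots, w_n)
  = \prod_{i=1}^{n} P(w_i \mid w_1, \dots, w_{i-1})
  \approx \prod_{i=1}^{n} P(w_i \mid w_{i-1}),
\qquad
P(w_i \mid w_{i-1}) = \frac{C(w_{i-1}\, w_i)}{C(w_{i-1})}
```

The first equality is the chain rule of probability; the approximation is the bigram assumption that each word depends only on the word before it.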

Machine translation automatically converts text from one language to another.
Rule-based methods use linguistic rules and dictionaries for translation.
Statistical methods learn translation probabilities from parallel corpora, that is, aligned text in both languages.
Neural Machine Translation (NMT) uses deep learning to achieve greater fluency and better use of context.
Challenges include ambiguity, cultural nuances, and idiomatic expressions.
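
Example: a toy dictionary-based translator shows why idiomatic expressions are hard. The English-to-Spanish entries and the whole-sentence idiom table are hypothetical and far too small for real use; the sketch ignores grammar, agreement, and word order.

```python
# Hypothetical toy English -> Spanish word dictionary.
WORDS = {
    "the": "el", "cat": "gato", "eats": "come", "fish": "pescado",
    "break": "romper", "a": "una", "leg": "pierna",
}

# Hypothetical idiom table, matched against the whole sentence.
IDIOMS = {"break a leg": "mucha suerte"}

def translate_word_by_word(sentence):
    """Replace each word independently; unknown words are kept in angle brackets."""
    return " ".join(WORDS.get(w, f"<{w}>") for w in sentence.lower().split())

def translate_with_idioms(sentence):
    """Check the idiom table first, then fall back to word-by-word lookup."""
    text = sentence.lower()
    if text in IDIOMS:
        return IDIOMS[text]
    return translate_word_by_word(text)

print(translate_word_by_word("The cat eats fish"))  # el gato come pescado
print(translate_word_by_word("Break a leg"))        # romper una pierna (literal, loses the meaning)
print(translate_with_idioms("Break a leg"))         # mucha suerte
```

The literal output for "Break a leg" is grammatical word substitution but not a translation of what the speaker meant, which is the problem the idiom table patches for one phrase only.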

Text preprocessing prepares raw text data for NLP tasks.
Tokenization involves splitting text into meaningful units, like words or phrases.
Normalization includes steps like lowercasing, removing punctuation, stemming, and lemmatization to standardize text.
Stopword removal eliminates commonly used words that don't contribute much to meaning.
Vectorization converts text into numerical representations, using techniques such as TF-IDF and word embeddings.
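
Example: TF-IDF can be sketched directly from its definition. This version assumes the plain formulas tf = count / document length and idf = log(N / document frequency); library implementations often use smoothed or normalized variants, so their numbers will differ.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per document.

    tf  = term count / number of tokens in the document
    idf = log(number of documents / number of documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

# Toy documents, invented for illustration.
docs = ["the cat sat", "the dog sat", "the cat purred"]
vectors = tf_idf(docs)
print(vectors[2])
```

In this toy corpus "the" appears in every document, so its weight is zero, while "purred" appears in only one document and receives the highest weight in the third vector.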

Sentiment analysis determines the opinion or emotion expressed in a text, typically as positive, negative, or neutral.
Lexicon-based methods rely on predefined lists of words with associated sentiments.
Machine learning methods train models on labeled data to classify sentiment.
Deep learning methods use neural networks for feature extraction and sentiment classification.
Applications include gauging customer feedback, monitoring social media, and analyzing brand sentiment.

Named entity recognition (NER) identifies and classifies key entities, such as names of people, organizations, and locations, within a text.
Rule-based approaches use predefined patterns or dictionaries to recognize entities.
Statistical approaches use models such as conditional random fields (CRFs) or support vector machines (SVMs).
Deep learning approaches rely on neural networks such as LSTMs or Transformers.
Applications include information extraction, question answering, and content recommendation.