Word Embeddings and Bag of Words Overview

Questions and Answers

What is the primary advantage of using subword embeddings like FastText?

  • They improve the speed of model training.
  • They can encode morphological information and handle OOV words. (correct)
  • They eliminate bias in word associations.
  • They effectively handle polysemy.

In the context of word embeddings, what does 'polysemy' refer to?

  • The improved speed of model training through word embeddings.
  • The presence of bias in the training data.
  • The ability of a word to have multiple meanings. (correct)
  • The issue of a model being unable to process out-of-vocabulary words.

Which task benefits most from word embeddings by capturing semantic similarities, leading to improved word alignment across languages?

  • Information retrieval.
  • Sentiment analysis.
  • Machine translation. (correct)
  • Named Entity Recognition.

What is a significant drawback of word embeddings related to training data?

They can reflect and perpetuate societal biases present in the dataset.

Which method is NOT mentioned as a way to handle the identified challenges of word embeddings?

Data augmentation to increase data volume.

What is the primary purpose of word embeddings in Natural Language Processing?

To transform words into dense vectors that capture semantic relationships.

Which of the following best describes the Bag of Words (BoW) model?

A method that represents text as a collection of its unique words, disregarding syntax.

In the Bag of Words model, what does the 'vector' represent for each document?

A representation of word counts or occurrences based on a vocabulary.

What is the essential first step when using Bag of Words on the provided text?

Tokenize the text into words and build a vocabulary.

What information does the Bag of Words model primarily preserve about a text?

The frequency of each word.

In the provided example, if the vocabulary is [“I”, “love”, “NLP”, “programming”, “in”, “Python”], what would be the vector representation for the document 'I love Python'?

[1, 1, 0, 0, 0, 1]

What is the primary purpose of using TF-IDF in text analysis?

To evaluate the importance of a word in a document relative to a corpus.

What do word embeddings aim to reflect about words?

Their semantic similarity based on context.

In the context of TF-IDF, what does the 'Term Frequency' (TF) measure?

How often a word appears in a specific document.

What is the primary goal of using vector arithmetic with word embeddings, as seen in the "analogies" section?

To understand semantic relationships between words.

What does the 'Inverse Document Frequency' (IDF) component of TF-IDF primarily measure?

The rarity or uniqueness of a term across the corpus.

Which formula correctly represents the calculation of the TF-IDF score for a term 't' in a document 'd'?

TF(t, d) * IDF(t, d)

In the context of Word2Vec, what is the main objective of the training process?

To maximize the likelihood of context words appearing given a target word.

What is the primary difference between the CBOW and Skip-gram approaches in Word2Vec?

CBOW predicts the target word from context, and Skip-gram predicts context words from the target word.

When building word embeddings using the skip-gram model, what is the purpose of selecting a vocabulary size M?

To reduce the complexity of the embedding task by focusing on the most frequent words.

In the IDF calculation, why is smoothing added when calculating the total number of documents and the number of documents containing a term?

To avoid division by zero when a term is not present in any document.

What is the primary purpose of setting a context window size in the skip-gram model?

To identify the words that are considered context for each target word.

In the skip-gram model, how is the co-occurrence dictionary built?

By recording the words that appear within the context window of each target word.

What is the significance of the embedding size 'N' when creating word embeddings?

It defines the number of dimensions in each word's vector representation.

During the training phase of the skip-gram model, what is the goal for positive and negative examples?

Adjusting the target and context word embeddings so that outputs are close to 1 for positive examples and close to 0 for negative examples.

How does GloVe (Global Vectors) differ from Word2Vec?

GloVe incorporates corpus-level co-occurrences in addition to local context.

What does the result Vector("king") - Vector("man") + Vector("woman") ≈ Vector("queen") in GloVe embeddings illustrate?

The encoding of analogies within the embedding space by GloVe.

What is the primary motivation behind the development of FastText?

To improve the handling of rare words and subword information.

How does FastText represent words differently from other methods?

It represents words as combinations of subword units (character n-grams).

What does the length of the vector between two words represent in the context of word embeddings?

The semantic dissimilarity between the words.

Given the word vectors 'dog' = (3, 4) and 'cat' = (1, 1), what is the difference vector 'dog' - 'cat'?

(2, 3)

Why are word embeddings an improvement over one-hot encoding?

They capture semantic similarity in a dense, low-dimensional space.

In the context of one-hot encoding, a categorical variable 'Fruit' with values ['apple', 'banana', 'cherry'] would be represented as vectors such as:

'apple' = [1, 0, 0], 'banana' = [0, 1, 0], 'cherry' = [0, 0, 1]

Which of the following best describes the core limitation of frequency-based word embeddings?

They do not encode deep semantic relationships between terms.

Which of these methods is NOT a frequency-based embedding technique?

GloVe

What is the primary focus of prediction-based word embeddings like Word2Vec?

Training models to predict context words or target words.

What is a key characteristic of the vectors produced by one-hot encoding a categorical variable?

They are sparse and of high dimensionality.

    Flashcards

Context Window Size (C)

A pre-defined number of words surrounding a target word that determines its context in a given document.

Co-occurrence Dictionary

Records the words that appear within the context window of each target word.

Embedding Size (N)

The number of dimensions used to represent a word's meaning in a vector space.

Skip-gram Model

A Word2Vec model that predicts context words from a target word, using gradient descent to adjust the embeddings based on co-occurrence patterns.

    GloVe (Global Vectors)

    A method for creating word embeddings based on the frequency of co-occurrences of words within a corpus.

    FastText

    A word embedding technique that considers subword information.

Word Embedding

A dense vector representation of a word in a multi-dimensional space that captures its meaning.

    Training

    The process of adjusting the parameters of a model to improve its accuracy.

    One-Hot Encoding

    A method to represent categorical data as numerical data. Each category is assigned a unique binary vector with all values set to 0 except for a single 1 indicating the presence of that category.

    TF-IDF (Term Frequency-Inverse Document Frequency)

    A statistical measure that quantifies the importance of a word in a document compared to a collection of documents (corpus). It gives more weight to words that are unique to a document and less weight to common words.

    Term Frequency (TF)

    The frequency of a term in a document. It is calculated by dividing the count of the term by the total number of terms in the document.

Euclidean Distance

The straight-line distance between two points in a vector space, often used to compare the similarity of words.

    Inverse Document Frequency (IDF)

    A measure of how unique a term is across a collection of documents (corpus). It is calculated using the logarithm of the ratio between the total number of documents and the number of documents containing the term.

    Frequency-Based Embeddings

    A type of word embedding that focuses on word occurrence and context. Examples include Count Vectorization and TF-IDF.

    Prediction-Based Embeddings

    Word embeddings created by training models to predict surrounding words or target words. Examples include Word2Vec and GloVe.

    TF-IDF Score

    The product of Term Frequency (TF) and Inverse Document Frequency (IDF). It combines the frequency of a term in a document with its uniqueness across a corpus, giving a score that represents the importance of the term in the document.

Word2Vec

A neural network model that learns word representations (embeddings) by predicting the target word from its context words (CBOW) or predicting the context words from a target word (Skip-gram).

    One-Hot Encoding Vectors

    A type of vector representation where each dimension corresponds to a unique category. These vectors are sparse, meaning most values are zero.

    Word Embeddings as Dense Vectors

    Dense, low-dimensional vectors that capture semantic information about words, allowing for better understanding of word relationships.

    Continuous Bag of Words (CBOW)

    The process of predicting the target word in a sentence given the surrounding context words. Example: predict "cat" from "The ... sits".

    Skip-gram

    The process of predicting the context words in a sentence given the target word. Example: predict "The" and "sits" from "cat".

    Importance of Word Embeddings

    Word embeddings have revolutionized NLP by enabling machines to understand the meaning of words in context, leading to improvements in various applications.

N-Gram Embeddings

N-gram embeddings capture the meaning of word sequences, such as the unigram 'cat' or the bigram 'the cat'.

Out-of-Vocabulary (OOV) Words

Words not seen during training; subword-based embeddings such as FastText can still represent them, improving model flexibility.

    Bias in Training Data

    Word embeddings can reflect societal biases present in the training data.

Polysemy

The ability of a word to have multiple meanings; contextual embeddings, like BERT, address it by considering word context to determine the intended meaning.

    Bag of Words (BoW)

    A representation of text as a collection of words, ignoring grammar, word order, and syntax, but preserving word frequency.

    GloVe

    A word embedding technique based on word co-occurrence statistics, capturing the relationships between words based on how often they appear together in a corpus.

    Textual Data Transformation

    The process of converting text into a numerical format suitable for machine learning algorithms.

    Analogies By Vector Arithmetic

    The ability of word embeddings to capture semantic relationships between words, allowing for analogies to be performed using vector operations.

    Semantic Similarity

    The process of identifying words with similar meanings through their proximity in the embedding space.

    Study Notes

    Word Embeddings

    • Word embeddings represent words as dense vectors.
    • These vectors capture semantic relationships in context.
    • They convert textual data into numerical formats, suitable for machine learning models.
    • Words with similar meanings (e.g., "king" and "queen") are positioned closer in the vector space.

    Bag of Words (BoW)

    • BoW models text as a collection of words, disregarding grammar, word order and syntax.
    • It preserves word frequencies.
    • Text representation is a vector of word counts or occurrences.
    • Vocabulary includes all unique words.
    • Each word gets a unique index.
    • Frequency count for each word in each document.
    • Preprocessing involves tokenization, converting to lowercase and removing punctuation/stopwords.

    Bag of Words (BoW) - How it Works

    • Tokenize the text into words.
    • Normalize the text (convert to lowercase, remove punctuation/stopwords).
    • Collect all unique words from the dataset.
    • Create a vector of word counts for each document, using the vocabulary.

    Bag of Words (BoW) - Example

    • Consider two documents: "I love NLP." and "I love programming in Python."
    • Build the vocabulary: "I", "love", "NLP", "programming", "in", "Python".
    • Count word frequencies.
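
A minimal Python sketch of the steps above, applied to the two example documents (helper names such as `bow_vector` are illustrative, not from any particular library):

```python
import re
from collections import Counter

documents = ["I love NLP.", "I love programming in Python."]

# Tokenize and normalize: lowercase, strip punctuation (stopwords kept for this tiny example).
def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

tokenized = [tokenize(doc) for doc in documents]

# Collect all unique words, in order of first appearance, to form the vocabulary.
vocabulary = []
for tokens in tokenized:
    for word in tokens:
        if word not in vocabulary:
            vocabulary.append(word)
# vocabulary == ['i', 'love', 'nlp', 'programming', 'in', 'python']

# Represent each document as a vector of word counts over the vocabulary.
def bow_vector(tokens, vocabulary):
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

for doc, tokens in zip(documents, tokenized):
    print(doc, "->", bow_vector(tokens, vocabulary))
# I love NLP. -> [1, 1, 1, 0, 0, 0]
# I love programming in Python. -> [1, 1, 0, 1, 1, 1]
```

With the same vocabulary, the quiz document "I love Python" maps to [1, 1, 0, 0, 0, 1].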

    Word Embedding

    • Illustrates words as points in a multi-dimensional space.
    • Shows semantic relationships visually.
    • Points close together represent words with similar meanings.
    • Example plots place words along coordinate axes such as gender and age, or age and royalty.

    Analogies by Vector Arithmetic

    • Analogies express relationships between concepts.
    • The vector offset from "man" to "king", applied to "woman", lands near "queen": Vector("king") - Vector("man") + Vector("woman") ≈ Vector("queen").
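
As a sketch of this idea, the toy 2-dimensional vectors below are invented purely for illustration (real embeddings have hundreds of dimensions); the nearest remaining word to king - man + woman by cosine similarity comes out as queen:

```python
import numpy as np

# Hand-made 2-D vectors (axes roughly: royalty, gender) -- illustrative only.
vectors = {
    "king":  np.array([1.0,  1.0]),
    "queen": np.array([1.0, -1.0]),
    "man":   np.array([0.0,  1.0]),
    "woman": np.array([0.0, -1.0]),
    "apple": np.array([-1.0, 0.0]),
}

def cosine(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

target = vectors["king"] - vectors["man"] + vectors["woman"]
candidates = [w for w in vectors if w not in {"king", "man", "woman"}]
print(max(candidates, key=lambda w: cosine(vectors[w], target)))  # queen
```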

    Measuring Euclidean Distance

    • The distance between two words is found by taking the difference of their vectors and measuring its length.
    • In two dimensions, the length of a vector (x, y) is sqrt(x² + y²).
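
For example, with the 2-D vectors 'dog' = (3, 4) and 'cat' = (1, 1) used in the quiz:

```python
import math

dog = (3, 4)
cat = (1, 1)

# Difference vector and its Euclidean length.
dx, dy = dog[0] - cat[0], dog[1] - cat[1]   # (2, 3)
distance = math.sqrt(dx ** 2 + dy ** 2)     # sqrt(2² + 3²) ≈ 3.61
print((dx, dy), round(distance, 2))
```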

    Why Are Word Embeddings Important?

    • Traditional NLP relied on sparse, high-dimensional vectors (one-hot encoding).
    • Word embeddings enhance machine translation, sentiment classification, and document retrieval.

    One-Hot Encoding

    • Converts categorical data (e.g. color) into numerical format for machine learning algorithms.
    • Each category gets a unique binary vector (e.g. "Red" = [1, 0, 0]; "Blue" = [0, 1, 0]).
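
A minimal sketch of one-hot encoding, using the 'Fruit' categories from the quiz above (the helper `one_hot` is illustrative, not a library function):

```python
categories = ["apple", "banana", "cherry"]

def one_hot(value, categories):
    # One position per category; a single 1 marks the category that is present.
    vec = [0] * len(categories)
    vec[categories.index(value)] = 1
    return vec

for fruit in categories:
    print(fruit, one_hot(fruit, categories))
# apple [1, 0, 0]
# banana [0, 1, 0]
# cherry [0, 0, 1]
```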

    Types of Word Embeddings

    • Frequency-based: E.g., Count Vectorization, TF-IDF. Focus on word co-occurrence and context but do not capture deep semantic relationships.
    • Prediction-based: E.g., Word2Vec (CBOW, Skip-Gram), GloVe, FastText. Train models to predict surrounding words and capture richer semantic information.

    Term Frequency-Inverse Document Frequency (TF-IDF)

    • A statistical measure to evaluate word importance in a document relative to a corpus.
    • Highlights words unique to a document, reducing the impact of common words.
    • Used for tasks such as search engines, text summarization, spam filtering, and recommendation systems.

    Term Frequency-Inverse Document Frequency (TF-IDF) - How it Works

    • Term Frequency (TF): Measures how often a term appears in a document.
    • Inverse Document Frequency (IDF): Measures how unique a term is across the entire corpus.
    • TF-IDF Score: Combines TF and IDF values to determine term importance in a document.

    Example of TF-IDF Calculation

    • Calculates TF-IDF for the term "cat" in documents "The cat sat on the mat" and "The dog barked at the cat".
    • Uses TF(t, d) = (occurrences of t in d) / (total terms in d) and IDF(t) = log(total documents / documents containing t); TF-IDF is their product.
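
A small sketch of this calculation using the plain (unsmoothed) formulas above; in practice a smoothing term is usually added so the IDF denominator can never be zero:

```python
import math

docs = {
    "d1": "The cat sat on the mat",
    "d2": "The dog barked at the cat",
}
tokenized = {name: text.lower().split() for name, text in docs.items()}

def tf(term, tokens):
    # Term frequency: occurrences of the term / total terms in the document.
    return tokens.count(term) / len(tokens)

def idf(term, all_docs):
    # Inverse document frequency: log(total documents / documents containing the term).
    containing = sum(1 for tokens in all_docs.values() if term in tokens)
    return math.log(len(all_docs) / containing)

for name, tokens in tokenized.items():
    score = tf("cat", tokens) * idf("cat", tokenized)
    print(name, round(score, 4))
# "cat" appears in both documents, so IDF = log(2/2) = 0 and its TF-IDF score is 0 in each.
```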

    Word2Vec

    • Predicts the target word given its context words (CBOW).
    • Predicts the context words given a target word (Skip-gram).
    • Aims to maximize the probability of context words appearing, given the target word.
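
A hedged sketch of training Word2Vec, assuming the gensim library (4.x API) is installed; the corpus here is far too small for meaningful vectors and is only meant to show the moving parts:

```python
from gensim.models import Word2Vec

sentences = [
    ["the", "cat", "sits", "on", "the", "mat"],
    ["the", "dog", "barked", "at", "the", "cat"],
    ["i", "love", "nlp", "and", "programming", "in", "python"],
]

# sg=1 selects Skip-gram (predict context from target); sg=0 selects CBOW.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, epochs=50)

print(model.wv["cat"][:5])           # first five dimensions of the 'cat' vector
print(model.wv.most_similar("cat"))  # nearest neighbours in the embedding space
```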

    How Word Embeddings are Created- Skip-gram

    • Build a text corpus (like Wikipedia, news, or Shakespeare's works).
    • Select a vocabulary size, keeping frequent words.
    • Set a context window size.
    • Build a co-occurrence dictionary.
    • Select embedding size (number of dimensions).
    • Create two tables (E for target words, U for context words) initialized with random numbers.
    • Train with gradient descent over the corpus, adjusting the target and context embeddings so that outputs are close to 1 for positive (observed) pairs and close to 0 for negative (randomly sampled) pairs.
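
A minimal NumPy sketch of these steps with negative sampling; the corpus, window, embedding size, and learning rate are illustrative, and real implementations add subsampling and other refinements:

```python
import random
import numpy as np

corpus = "the cat sits on the mat the dog sits on the rug".split()

# Vocabulary (here every word; in practice the M most frequent words).
vocab = sorted(set(corpus))
index = {w: i for i, w in enumerate(vocab)}

C, N = 2, 16                      # context window size and embedding size
rng = np.random.default_rng(0)
E = rng.normal(scale=0.1, size=(len(vocab), N))   # table for target words
U = rng.normal(scale=0.1, size=(len(vocab), N))   # table for context words

# Co-occurrence pairs: each target word with every word inside its window.
pairs = []
for i, target in enumerate(corpus):
    for j in range(max(0, i - C), min(len(corpus), i + C + 1)):
        if j != i:
            pairs.append((index[target], index[corpus[j]]))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

lr, k = 0.05, 3                   # learning rate and negatives per positive pair
for epoch in range(100):
    for t, c in pairs:
        negatives = [random.randrange(len(vocab)) for _ in range(k)]
        # Push outputs towards 1 for the observed pair, towards 0 for random pairs.
        for ctx, label in [(c, 1.0)] + [(n, 0.0) for n in negatives]:
            grad = sigmoid(E[t] @ U[ctx]) - label
            e_t = E[t].copy()                 # keep pre-update copy of the target vector
            E[t]   -= lr * grad * U[ctx]
            U[ctx] -= lr * grad * e_t

print(E[index["cat"]][:5])        # learned target embedding for 'cat' (first five dims)
```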

    GloVe (Global Vectors)

    • Uses word co-occurrence matrices across the entire corpus.
    • Captures both local (contextual) and global (corpus-wide) statistical information.
    • Incorporates corpus-level co-occurrences.
    • Examples illustrate how it captures relationships like "king"-"man"+"woman"≈"queen".
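
A short sketch of the corpus-level statistics GloVe starts from: a co-occurrence count for every word pair within a window, with distant pairs down-weighted by 1/distance. The factorization of this matrix into word vectors is omitted; the corpus and window size are illustrative:

```python
from collections import defaultdict

corpus = "the cat sits on the mat the dog barked at the cat".split()
window = 2

# Corpus-wide co-occurrence counts, weighted by 1/distance between the words.
cooccurrence = defaultdict(float)
for i, word in enumerate(corpus):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if j != i:
            cooccurrence[(word, corpus[j])] += 1.0 / abs(i - j)

print(cooccurrence[("the", "cat")])
```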

    FastText

    • Improves handling of rare words and subword information.
    • Represents words as combinations of subword units (character n-grams).
    • Builds embeddings for n-grams and combines them.
    • Handles out-of-vocabulary words.
    • Encodes morphological information.
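
An illustrative helper showing how a word is broken into character n-grams (only length 3 here; FastText typically uses lengths 3 to 6 plus the whole word, and a word's vector is the sum of its n-gram vectors):

```python
def char_ngrams(word, n=3):
    # FastText wraps each word in boundary markers before extracting n-grams.
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

print(char_ngrams("where"))
# ['<wh', 'whe', 'her', 'ere', 're>']
```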

    Example Applications

    • Enables models to analyze sentiment, translate text, and improve information retrieval.

    Challenges

    • Bias: Embeddings can reflect and perpetuate societal biases in the data (e.g. gender bias).
    • Polysemy: Words with multiple meanings can confuse models, needing more context-aware embedding methods.
    • Out-of-Vocabulary Words: Embeddings may struggle with unseen words, demanding strategies like subword embedding or context-aware models.

    Conclusion

    • Word embeddings are crucial for modern NLP.
    • They enable dense, semantic-rich representations, enabling sentiment analysis, translation, and improved information retrieval.
    • Addressing biases and polysemy is important for better performance.

    Description

    This quiz covers the essential concepts of Word Embeddings and Bag of Words (BoW) models in Natural Language Processing. You will learn how words are represented as vectors and how BoW preserves word frequencies while disregarding grammar. Test your understanding of these fundamental techniques used in machine learning.
