Questions and Answers
Which method ignores the context words in the input while predicting words from the paragraph?
Paragraph vectors are shared among all paragraphs in the text.
False
What is the primary purpose of the Paragraph Vector inference stage?
To get paragraph vectors for new paragraphs by adjusting the vector space.
The DSSM model was initially created for __________.
Match the following terms with their descriptions:
What is one of the criticisms of Latent Semantic Analysis (LSA)?
The Continuous Bag of Words (CBoW) model considers the order of words in a sentence.
What approach does word2vec use to handle out-of-vocabulary words?
The _____ method predicts a word based on its surrounding context.
Which variant of SVD is known for being fast?
The Continuous skip-gram model prefers distant words over closer words in its training.
What type of training method did the researchers use in word2vec?
Study Notes
Vector Space Modelling with ML
- Vector space modeling using machine learning (ML) techniques is discussed.
- Latent Semantic Analysis (LSA) is examined, including its speed limitations and model assumptions.
- Fast Randomized SVD (from Facebook) and Alternating Least Squares (ALS) are presented as alternatives to standard LSA, with distributed and streaming versions proposed as solutions to its speed limitations (a truncated-SVD sketch follows this list).
- Principal Component Analysis (PCA) is discussed, noting its assumption of normal data distribution and the global nature of both dimension reduction methods.
- Probabilistic Latent Semantic Analysis (PLSA) is mentioned, contrasting its treatment of statistical independence with the linear orthogonality of other methods.
- The presentation highlights the need for modeling approaches that can adapt to new data and aren't tied to strict data distributions, statistics or memory constraints.
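A minimal sketch of the truncated/randomized SVD idea behind LSA, using scikit-learn's TruncatedSVD. The toy corpus, component count, and library choice are illustrative assumptions, not taken from the presentation:

```python
# Term-document matrix -> truncated (randomized) SVD, i.e. an LSA-style reduction.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD

docs = [
    "cats chase mice",
    "dogs chase cats",
    "stocks rise on earnings",
    "markets fall on earnings",
]

X = CountVectorizer().fit_transform(docs)           # sparse term-document counts
lsa = TruncatedSVD(n_components=2, algorithm="randomized", random_state=0)
doc_vectors = lsa.fit_transform(X)                  # dense low-rank document vectors

print(doc_vectors.shape)  # (4, 2)
```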
Word2Vec
- Word2Vec (2013) is a group of methods used for predicting words within a given text.
- Rather than relying on co-occurrence counting and matrix decomposition, it learns word representations through prediction.
- Skip-grams predict words surrounding a given word.
- Continuous Bag-of-Words (CBOW) predicts a word based on surrounding words.
- BoW (Bag-of-Words) – a multiset of objects which disregards order, applicable to images and texts.
- CBOW uses a continuous sample of text within a window.
- The input uses one-hot encoding of the context words (vectors of dictionary size).
- The output is a probability distribution over the whole dictionary.
- A softmax output layer models the likelihood of the target word.
- Word embeddings are read off from the learned weight matrix rather than from the softmax outputs (see the sketch below).
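A minimal NumPy sketch of the CBOW forward pass described above: one-hot context vectors are projected and averaged, and a softmax over the dictionary models the target word. The toy vocabulary and dimensions are assumptions for illustration; the embeddings live in the input weight matrix `W_in`.

```python
import numpy as np

vocab = ["the", "cat", "sat", "on", "mat"]
V, H = len(vocab), 3                          # dictionary size, embedding size
rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(V, H))     # input->hidden weights (the embeddings)
W_out = rng.normal(scale=0.1, size=(H, V))    # hidden->output weights

def one_hot(word):
    v = np.zeros(V)
    v[vocab.index(word)] = 1.0
    return v

def cbow_predict(context_words):
    # Project the one-hot context vectors into the hidden layer and average them.
    h = np.mean([one_hot(w) @ W_in for w in context_words], axis=0)
    scores = h @ W_out
    probs = np.exp(scores - scores.max())
    return probs / probs.sum()                # softmax over the whole dictionary

print(cbow_predict(["the", "sat"]))           # distribution over the 5-word vocabulary
```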
Continuous Skip-Gram Model
- The Continuous Skip-gram model focuses on maximizing the probability of classifying a word based on another word in the same sentence. This method is distinct from CBOW.
- While increasing the word range can improve word vector quality, it also increases computational complexity.
- Strategies for reducing the weight of distant words during training are discussed (one such strategy is sketched below).
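One such strategy, used in the reference word2vec implementation, is to draw a smaller effective window at random for each position, so that nearby words are sampled more often than distant ones. A hedged sketch (function name and example sentence are illustrative):

```python
import random

def skipgram_pairs(sentence, max_window=5, seed=0):
    """Generate (center, context) pairs; distant words appear less often
    because the effective window is shrunk at random per position."""
    random.seed(seed)
    pairs = []
    for i, center in enumerate(sentence):
        window = random.randint(1, max_window)   # randomly reduced window
        for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
            if j != i:
                pairs.append((center, sentence[j]))
    return pairs

print(skipgram_pairs(["the", "quick", "brown", "fox", "jumps"]))
```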
Vector Space Arithmetics
- A visual representation of vector space arithmetic shows relationships between different word vectors (man, woman, king, queen).
- The examples demonstrate how relationships between words can be modeled and represented in a geometrical way in the vector space.
- The classic example is king - man + woman ≈ queen (see the sketch below).
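A small usage sketch of the same arithmetic with gensim's `most_similar` API, using a publicly available pretrained model from `gensim.downloader`. The specific model name is an illustrative choice, and the first call downloads the vectors:

```python
import gensim.downloader as api

# Any reasonably large pretrained embedding works for this demo.
wv = api.load("glove-wiki-gigaword-50")

# king - man + woman ≈ queen
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```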
Training and Quality
- Three training epochs were used with stochastic gradient descent and backpropagation.
- The learning rate started at 0.025 and gradually decreased to zero over the three training epochs (a possible schedule is sketched after this list).
- The dataset comprised 8869 semantic and 10675 syntactic questions.
- Specific word relationship models (RNNLM, NNLM, CBOW, Skip-gram) and their accuracies were provided for evaluation of different approaches and models.
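A sketch of the decaying learning-rate schedule mentioned above, assuming a linear decay with a small floor (as in the reference word2vec code); the step counts are illustrative:

```python
def learning_rate(step, total_steps, initial_lr=0.025):
    # Linear decay from the initial rate toward zero, with a small floor.
    return initial_lr * max(1e-4, 1.0 - step / total_steps)

for step in (0, 5_000, 9_999):
    print(learning_rate(step, total_steps=10_000))
```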
word2vec Problems
- Out-of-vocabulary (OOV) words (words not in the vocabulary) can be handled with sub-word tokenization (see the sketch after this list).
- The model doesn't handle aspects like grammar, abbreviations, forms and homographs.
- The model focuses on word level rather than considering larger text structures.
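The sub-word tokenization mentioned for OOV words can be illustrated with fastText-style character n-grams. The function below is a hedged sketch; the boundary markers and n-gram range are the commonly used defaults:

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Split a word into character n-grams so an out-of-vocabulary word
    can still be represented by the vectors of known sub-word pieces."""
    padded = f"<{word}>"                       # boundary markers
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams

print(char_ngrams("unwordly"))                 # an OOV word still yields known pieces
```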
doc2vec (Paragraph Vectors)
- Paragraph Vectors can be used to represent entire paragraphs as a vector.
- Paragraph vectors are generated using several word vectors combined with a paragraph vector.
- The model predicts the next word in the context, providing a unique approach to paragraph representation.
- Paragraphs vary in length and are not tied to a fixed structure (a gensim-based sketch follows this list).
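A minimal gensim-based sketch of Paragraph Vectors (doc2vec), including the inference stage in which a vector for a new, unseen paragraph is fitted while the rest of the model stays fixed. The corpus and hyperparameters are illustrative assumptions:

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

corpus = [
    TaggedDocument(words="the cat sat on the mat".split(), tags=["doc0"]),
    TaggedDocument(words="dogs chase cats in the yard".split(), tags=["doc1"]),
]

model = Doc2Vec(corpus, vector_size=50, window=2, min_count=1, epochs=40)

# Inference stage: fit a vector for a new paragraph without retraining the model.
new_vec = model.infer_vector("a cat in the yard".split())
print(new_vec.shape)  # (50,)
```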
DSSM - Deep Structured Semantic Model
- DSSM is a deep learning model trained to predict relevance via the cosine similarity of query and document representations.
- The input is hashed into letter trigrams (combinations of three consecutive letters); see the sketch after this list.
- Initially developed for search engines with a specific metric.
- The relatively small input size poses a challenge to the efficacy of deeper neural networks.
- Training data includes positive examples (clicked headers) and negative examples (headers that were shown but not clicked).
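A hedged sketch of the letter-trigram input representation and the cosine score that DSSM optimizes. The hashing dimension is an arbitrary illustrative choice, and the dense "towers" the real model applies before the cosine are omitted:

```python
import numpy as np

def letter_trigrams(text):
    # 'web' -> '#we', 'web', 'eb#' (word boundaries marked with '#')
    grams = []
    for word in text.lower().split():
        padded = f"#{word}#"
        grams.extend(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams

def trigram_vector(text, dim=3000):
    # Bag of letter trigrams via feature hashing into a fixed-size vector.
    v = np.zeros(dim)
    for g in letter_trigrams(text):
        v[hash(g) % dim] += 1.0
    return v

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

print(cosine(trigram_vector("deep learning model"),
             trigram_vector("deep learning models")))
```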
DSSM update by Yandex
- DSSM improvements include using trigrams and word bigrams as input features for an expanded vocabulary.
- This model has challenges with handling random and fake negatives in the training data.
- A mentioned solution is hard negative mining, which resembles GAN training while being simpler (a minimal in-batch sketch follows this list).
- Another training objective is to maximize dwell time as opposed to solely maximizing cosine similarity.
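A minimal in-batch sketch of the hard negative mining idea: assuming the i-th query matches the i-th document, the most similar non-matching documents are picked as negatives. All names and shapes are illustrative:

```python
import numpy as np

def hard_negatives(query_vecs, doc_vecs, k=1):
    """For each query, pick the k most similar *non-matching* documents
    in the batch as hard negatives (rows assumed L2-normalised)."""
    sims = query_vecs @ doc_vecs.T
    np.fill_diagonal(sims, -np.inf)           # exclude the true (positive) pair
    return np.argsort(-sims, axis=1)[:, :k]   # indices of the hardest negatives

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8)); q /= np.linalg.norm(q, axis=1, keepdims=True)
d = rng.normal(size=(4, 8)); d /= np.linalg.norm(d, axis=1, keepdims=True)
print(hard_negatives(q, d, k=2))
```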
BERT (Bidirectional Encoder Representations from Transformers)
- This model focuses on learning language to solve tasks related to language, through attention and self-attention mechanisms.
- About 15% of input tokens are masked, and the model is trained to predict them (sketched below).
- It also predicts the relationship between sentence pairs (next-sentence prediction).
- The main limitation of earlier pre-training approaches is their unidirectional nature, which restricts architecture choices; BERT's bidirectional attention addresses this.
- Different input layers, such as token embeddings, segment embeddings, and positional embeddings, are combined for efficient representations of words, phrases and context within a given input.
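A hedged sketch of the masked-word objective: roughly 15% of positions are selected and replaced with a mask token that the model must predict. The real BERT procedure also sometimes keeps or randomly swaps the selected token; that detail is omitted here:

```python
import random

def mask_tokens(tokens, mask_rate=0.15, mask_token="[MASK]", seed=0):
    """Select ~15% of positions as prediction targets and replace them
    with the mask token; return the masked sequence and the targets."""
    random.seed(seed)
    masked, targets = list(tokens), {}
    for i in range(len(tokens)):
        if random.random() < mask_rate:
            targets[i] = tokens[i]
            masked[i] = mask_token
    return masked, targets

print(mask_tokens("the cat sat on the mat".split()))
```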
Reading
- There are many related articles and papers. (Links are available within the presentation.)
Description
This quiz explores various machine learning techniques for vector space modeling, including Latent Semantic Analysis (LSA) and alternatives like Fast Randomized SVD and Alternating Least Squares. It delves into methods such as Principal Component Analysis and Probabilistic Latent Semantic Analysis, highlighting their assumptions and limitations in handling new data. Test your understanding of these concepts and their applications in natural language processing.