Questions and Answers
Which method ignores the context words in the input while predicting words from the paragraph?
Paragraph vectors are shared among all paragraphs in the text.
False
What is the primary purpose of the Paragraph Vector inference stage?
To get paragraph vectors for new paragraphs by adjusting the vector space.
The DSSM model was initially created for __________.
Match the following terms with their descriptions:
What is one of the criticisms of Latent Semantic Analysis (LSA)?
The Continuous Bag of Words (CBoW) model considers the order of words in a sentence.
What approach does word2vec use to handle out-of-vocabulary words?
The _____ method predicts a word based on its surrounding context.
Which variant of SVD is known for being fast?
The Continuous skip-gram model prefers distant words over closer words in its training.
What type of training method did the researchers use in word2vec?
Study Notes
Vector Space Modelling with ML
- Vector space modeling using machine learning (ML) techniques is discussed.
- Latent Semantic Analysis (LSA) is examined, including its speed limitations and model assumptions.
- Fast Randomized SVD (from Facebook) and Alternating Least Squares (ALS) are presented as alternatives to standard LSA, with distributed and streaming versions proposed as solutions to its speed limitations (a truncated-SVD sketch follows this list).
- Principal Component Analysis (PCA) is discussed, noting its assumption of normal data distribution and the global nature of both dimension reduction methods.
- Probabilistic Latent Semantic Analysis (PLSA) is mentioned, contrasting its treatment of statistical independence with the linear orthogonality of other methods.
- The presentation highlights the need for modeling approaches that can adapt to new data and aren't tied to strict data distributions, statistics or memory constraints.
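A minimal sketch of the truncated/randomized SVD idea behind LSA, using scikit-learn's TruncatedSVD. The toy corpus, component count, and library choice are illustrative assumptions, not taken from the presentation:

```python
# Term-document matrix -> truncated (randomized) SVD, i.e. an LSA-style reduction.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD

docs = [
    "cats chase mice",
    "dogs chase cats",
    "stocks rise on earnings",
    "markets fall on earnings",
]

X = CountVectorizer().fit_transform(docs)           # sparse term-document counts
lsa = TruncatedSVD(n_components=2, algorithm="randomized", random_state=0)
doc_vectors = lsa.fit_transform(X)                  # dense low-rank document vectors

print(doc_vectors.shape)  # (4, 2)
```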
Word2Vec
- Word2Vec (2013) is a group of methods used for predicting words within a given text.
- Rather than relying on co-occurrence counting and matrix decomposition, it learns word representations through prediction.
- Skip-grams predict words surrounding a given word.
- Continuous Bag-of-Words (CBOW) predicts a word based on surrounding words.
- BoW (Bag-of-Words) – a multiset of objects which disregards order, applicable to images and texts.
- CBOW uses a continuous sample of text within a window.
- The input uses one-hot encoding of the context words (vectors of dictionary size).
- The output is a probability distribution over the whole dictionary.
- A softmax output layer models the likelihood of the target word.
- Word embeddings are read off from the learned weight matrix rather than from the softmax outputs (see the sketch below).
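A minimal NumPy sketch of the CBOW forward pass described above: one-hot context vectors are projected and averaged, and a softmax over the dictionary models the target word. The toy vocabulary and dimensions are assumptions for illustration; the embeddings live in the input weight matrix `W_in`.

```python
import numpy as np

vocab = ["the", "cat", "sat", "on", "mat"]
V, H = len(vocab), 3                          # dictionary size, embedding size
rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(V, H))     # input->hidden weights (the embeddings)
W_out = rng.normal(scale=0.1, size=(H, V))    # hidden->output weights

def one_hot(word):
    v = np.zeros(V)
    v[vocab.index(word)] = 1.0
    return v

def cbow_predict(context_words):
    # Project the one-hot context vectors into the hidden layer and average them.
    h = np.mean([one_hot(w) @ W_in for w in context_words], axis=0)
    scores = h @ W_out
    probs = np.exp(scores - scores.max())
    return probs / probs.sum()                # softmax over the whole dictionary

print(cbow_predict(["the", "sat"]))           # distribution over the 5-word vocabulary
```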
Continuous Skip-Gram Model
- The Continuous Skip-gram model focuses on maximizing the probability of classifying a word based on another word in the same sentence. This method is distinct from CBOW.
- While increasing the word range can improve word vector quality, it also increases computational complexity.
- Strategies for reducing the weight of distant words during training are discussed (one such strategy is sketched below).
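One such strategy, used in the reference word2vec implementation, is to draw a smaller effective window at random for each position, so that nearby words are sampled more often than distant ones. A hedged sketch (function name and example sentence are illustrative):

```python
import random

def skipgram_pairs(sentence, max_window=5, seed=0):
    """Generate (center, context) pairs; distant words appear less often
    because the effective window is shrunk at random per position."""
    random.seed(seed)
    pairs = []
    for i, center in enumerate(sentence):
        window = random.randint(1, max_window)   # randomly reduced window
        for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
            if j != i:
                pairs.append((center, sentence[j]))
    return pairs

print(skipgram_pairs(["the", "quick", "brown", "fox", "jumps"]))
```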
Vector Space Arithmetics
- A visual representation of vector space arithmetic shows relationships between different word vectors (man, woman, king, queen).
- The examples demonstrate how relationships between words can be modeled and represented in a geometrical way in the vector space.
- The classic example is king - man + woman ≈ queen (see the sketch below).
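A small usage sketch of the same arithmetic with gensim's `most_similar` API, using a publicly available pretrained model from `gensim.downloader`. The specific model name is an illustrative choice, and the first call downloads the vectors:

```python
import gensim.downloader as api

# Any reasonably large pretrained embedding works for this demo.
wv = api.load("glove-wiki-gigaword-50")

# king - man + woman ≈ queen
print(wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```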
Training and Quality
- Three training epochs were used with stochastic gradient descent and backpropagation.
- The learning rate started at 0.025 and gradually decreased to zero over the three training epochs (a possible schedule is sketched after this list).
- The dataset comprised 8869 semantic and 10675 syntactic questions.
- Specific word relationship models (RNNLM, NNLM, CBOW, Skip-gram) and their accuracies were provided for evaluation of different approaches and models.
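A sketch of the decaying learning-rate schedule mentioned above, assuming a linear decay with a small floor (as in the reference word2vec code); the step counts are illustrative:

```python
def learning_rate(step, total_steps, initial_lr=0.025):
    # Linear decay from the initial rate toward zero, with a small floor.
    return initial_lr * max(1e-4, 1.0 - step / total_steps)

for step in (0, 5_000, 9_999):
    print(learning_rate(step, total_steps=10_000))
```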
word2vec Problems
- Out-of-vocabulary (OOV) words (words not in the vocabulary) can be handled with sub-word tokenization (see the sketch after this list).
- The model doesn't handle aspects like grammar, abbreviations, forms and homographs.
- The model focuses on word level rather than considering larger text structures.
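The sub-word tokenization mentioned for OOV words can be illustrated with fastText-style character n-grams. The function below is a hedged sketch; the boundary markers and n-gram range are the commonly used defaults:

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Split a word into character n-grams so an out-of-vocabulary word
    can still be represented by the vectors of known sub-word pieces."""
    padded = f"<{word}>"                       # boundary markers
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams

print(char_ngrams("unwordly"))                 # an OOV word still yields known pieces
```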
doc2vec (Paragraph Vectors)
- Paragraph Vectors can be used to represent entire paragraphs as a vector.
- Paragraph vectors are generated using several word vectors combined with a paragraph vector.
- The model predicts the next word in the context, providing a unique approach to paragraph representation.
- Paragraphs vary in length and are not tied to a fixed structure (a gensim-based sketch follows this list).
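A minimal gensim-based sketch of Paragraph Vectors (doc2vec), including the inference stage in which a vector for a new, unseen paragraph is fitted while the rest of the model stays fixed. The corpus and hyperparameters are illustrative assumptions:

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

corpus = [
    TaggedDocument(words="the cat sat on the mat".split(), tags=["doc0"]),
    TaggedDocument(words="dogs chase cats in the yard".split(), tags=["doc1"]),
]

model = Doc2Vec(corpus, vector_size=50, window=2, min_count=1, epochs=40)

# Inference stage: fit a vector for a new paragraph without retraining the model.
new_vec = model.infer_vector("a cat in the yard".split())
print(new_vec.shape)  # (50,)
```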
DSSM - Deep Structured Semantic Model
- DSSM is a deep learning model trained to predict relevance via the cosine similarity of query and document representations.
- The input is hashed into letter trigrams (combinations of three consecutive letters); see the sketch after this list.
- Initially developed for search engines with a specific metric.
- The relatively small input size poses a challenge to the efficacy of deeper neural networks.
- Training data includes positive examples (clicked headers) and negative examples (headers that were shown but not clicked).
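A hedged sketch of the letter-trigram input representation and the cosine score that DSSM optimizes. The hashing dimension is an arbitrary illustrative choice, and the dense "towers" the real model applies before the cosine are omitted:

```python
import numpy as np

def letter_trigrams(text):
    # 'web' -> '#we', 'web', 'eb#' (word boundaries marked with '#')
    grams = []
    for word in text.lower().split():
        padded = f"#{word}#"
        grams.extend(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams

def trigram_vector(text, dim=3000):
    # Bag of letter trigrams via feature hashing into a fixed-size vector.
    v = np.zeros(dim)
    for g in letter_trigrams(text):
        v[hash(g) % dim] += 1.0
    return v

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

print(cosine(trigram_vector("deep learning model"),
             trigram_vector("deep learning models")))
```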
DSSM update by Yandex
- DSSM improvements include using trigrams and word bigrams as input features for an expanded vocabulary.
- This model has challenges with handling random and fake negatives in the training data.
- A mentioned solution is hard negative mining, which resembles GAN training while being simpler (a minimal in-batch sketch follows this list).
- Another training objective is to maximize dwell time as opposed to solely maximizing cosine similarity.
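A minimal in-batch sketch of the hard negative mining idea: assuming the i-th query matches the i-th document, the most similar non-matching documents are picked as negatives. All names and shapes are illustrative:

```python
import numpy as np

def hard_negatives(query_vecs, doc_vecs, k=1):
    """For each query, pick the k most similar *non-matching* documents
    in the batch as hard negatives (rows assumed L2-normalised)."""
    sims = query_vecs @ doc_vecs.T
    np.fill_diagonal(sims, -np.inf)           # exclude the true (positive) pair
    return np.argsort(-sims, axis=1)[:, :k]   # indices of the hardest negatives

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8)); q /= np.linalg.norm(q, axis=1, keepdims=True)
d = rng.normal(size=(4, 8)); d /= np.linalg.norm(d, axis=1, keepdims=True)
print(hard_negatives(q, d, k=2))
```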
BERT (Bidirectional Encoder Representations from Transformers)
- This model focuses on learning language to solve tasks related to language, through attention and self-attention mechanisms.
- About 15% of input tokens are masked, and the model is trained to predict them (sketched below).
- It also predicts the relationship between sentence pairs (next-sentence prediction).
- The main limitation of earlier pre-training approaches is their unidirectional nature, which restricts architecture choices; BERT's bidirectional attention addresses this.
- Different input layers, such as token embeddings, segment embeddings, and positional embeddings, are combined for efficient representations of words, phrases and context within a given input.
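A hedged sketch of the masked-word objective: roughly 15% of positions are selected and replaced with a mask token that the model must predict. The real BERT procedure also sometimes keeps or randomly swaps the selected token; that detail is omitted here:

```python
import random

def mask_tokens(tokens, mask_rate=0.15, mask_token="[MASK]", seed=0):
    """Select ~15% of positions as prediction targets and replace them
    with the mask token; return the masked sequence and the targets."""
    random.seed(seed)
    masked, targets = list(tokens), {}
    for i in range(len(tokens)):
        if random.random() < mask_rate:
            targets[i] = tokens[i]
            masked[i] = mask_token
    return masked, targets

print(mask_tokens("the cat sat on the mat".split()))
```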
Reading
- There are many related articles and papers. (Links are available within the presentation.)
Description
This quiz explores various machine learning techniques for vector space modeling, including Latent Semantic Analysis (LSA) and alternatives like Fast Randomized SVD and Alternating Least Squares. It delves into methods such as Principal Component Analysis and Probabilistic Latent Semantic Analysis, highlighting their assumptions and limitations in handling new data. Test your understanding of these concepts and their applications in natural language processing.