N-Gram Language Models

Questions and Answers

Which of the following is true about N-gram models?

  • They estimate the probability of the first word of an n-gram given the previous words
  • They assign probabilities only to individual words
  • They estimate the probability of the last word of an n-gram given the previous words (correct)
  • They are the most complex language models

What is the Markov assumption in language modeling?

  • The probability of a word depends on both the previous and next words
  • The probability of a word depends only on the previous word (correct)
  • The probability of a word depends only on the next word
  • The probability of a word depends on all the words in the sentence

What is the best way to evaluate the performance of a language model?

  • Measuring the quality of individual words in the training corpus
  • Extrinsic evaluation (correct)
  • Measuring the number of words in the training corpus
  • Intrinsic evaluation

What is the purpose of language models?

To assign probabilities to sequences of words

What is an n-gram?

A sequence of n words

What is the Markov assumption in language modeling?

The probability of a word depends only on the previous word (written out as an equation after these questions)

Which of the following is true about N-gram models?

N-gram models assign probabilities to sequences of words.

What is the difference between intrinsic and extrinsic evaluation of a language model?

Intrinsic evaluation measures the quality of a model independent of any application, while extrinsic evaluation measures the performance of an LM embedded in an application.

What is the recommended data split for training, development, and test sets in language modeling?

80% training, 10% development, 10% test
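
The bigram factorization behind the Markov-assumption questions can be written out compactly. This is a minimal worked form of the chain rule, its bigram approximation, and the standard count-based estimate; the notation (w for words, C for corpus counts, <s> as an implicit sentence-start symbol for w_0) is a conventional choice, not something fixed by this lesson.

```latex
% Chain rule for a word sequence w_1 ... w_n
P(w_1, \dots, w_n) = \prod_{k=1}^{n} P(w_k \mid w_1, \dots, w_{k-1})

% Bigram (first-order Markov) approximation: condition only on the previous word
P(w_1, \dots, w_n) \approx \prod_{k=1}^{n} P(w_k \mid w_{k-1})

% Maximum-likelihood estimate of each bigram probability from training-corpus counts
P(w_k \mid w_{k-1}) = \frac{C(w_{k-1}\, w_k)}{C(w_{k-1})}
```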


Study Notes

Introduction to N-Gram Language Models

  • Probabilistic models of word sequences can distinguish more probable English phrases from less probable ones.
  • Language models (LMs) assign probabilities to sequences of words and are important for applications such as augmentative and alternative communication systems.
  • The n-gram is the simplest LM that assigns probabilities to sentences and sequences of words.
  • An n-gram is a sequence of n words: a bigram is a two-word sequence and a trigram is a three-word sequence.
  • N-gram models estimate the probability of the last word of an n-gram given the previous words, and from these estimates assign probabilities to entire sequences.
  • The Markov assumption is that the probability of a word depends only on the previous word (in a bigram model) rather than on the entire history.
  • Markov models predict the probability of some future unit without looking too far into the past.
  • The best way to evaluate the performance of an LM is to embed it in an application and measure how much the application improves (extrinsic evaluation).
  • Extrinsic evaluation is the only way to know whether a particular improvement in a component is really going to help the task at hand.
  • An intrinsic evaluation metric measures the quality of a model independent of any application.
  • The probabilities of an n-gram model come from the training corpus it is trained on, and its quality is measured by its performance on a test corpus (see the sketch after this list).
  • It is important to keep test sentences out of the training set to avoid training on the test set. The data is often divided into 80% training, 10% development, and 10% test.
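
As a concrete illustration of how an n-gram model's probabilities "come from the training corpus", here is a minimal bigram sketch in Python. The toy corpus, the <s>/</s> boundary markers, and the exact 80/10/10 split are illustrative assumptions; the estimate itself is the standard maximum-likelihood count ratio from the questions above.

```python
# Minimal bigram language model sketch (illustrative only; the corpus,
# tokenization, and split below are assumptions, not part of this lesson).
from collections import Counter, defaultdict
import random

# A toy corpus of already-tokenized sentences; a real model needs a large corpus.
corpus = [
    ["i", "want", "chinese", "food"],
    ["i", "want", "to", "eat"],
    ["you", "want", "to", "eat", "chinese", "food"],
    ["i", "want", "english", "food"],
    ["you", "want", "to", "eat", "lunch"],
    ["i", "want", "to", "eat", "lunch"],
    ["you", "want", "chinese", "food"],
    ["i", "want", "to", "eat", "breakfast"],
    ["you", "want", "english", "food"],
    ["i", "want", "lunch"],
]

# 80% training, 10% development, 10% test, as recommended in the notes.
random.seed(0)
random.shuffle(corpus)
n = len(corpus)
train = corpus[: int(0.8 * n)]
dev = corpus[int(0.8 * n): int(0.9 * n)]
test = corpus[int(0.9 * n):]

# Count contexts and bigrams on the training set only (never on the test set).
unigram_counts = Counter()
bigram_counts = defaultdict(Counter)
for sentence in train:
    tokens = ["<s>"] + sentence + ["</s>"]
    for prev, word in zip(tokens, tokens[1:]):
        unigram_counts[prev] += 1
        bigram_counts[prev][word] += 1

def bigram_prob(prev, word):
    """Maximum-likelihood estimate P(word | prev) = C(prev word) / C(prev)."""
    if unigram_counts[prev] == 0:
        return 0.0
    return bigram_counts[prev][word] / unigram_counts[prev]

def sentence_prob(sentence):
    """Probability of a whole sentence under the Markov (bigram) assumption."""
    tokens = ["<s>"] + sentence + ["</s>"]
    p = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        p *= bigram_prob(prev, word)
    return p

if __name__ == "__main__":
    print(bigram_prob("i", "want"))                        # P(want | i)
    print(sentence_prob(["i", "want", "chinese", "food"]))  # whole-sentence probability
```

Note that any bigram never seen in training gets probability zero under this estimate, which is one reason the quality of an n-gram model depends so heavily on the corpus it is trained on.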
