22 – Conditional Random Fields and Sequential Models: Conclusions

Full Transcript


Conditional Random Fields – Motivation

Hidden Markov Model: Directed graph, strong temporal order. Joint model of $P(x, y)$, where $y_t$ only interacts with $y_{t-1}$ and $x_t$, i.e.,

$$P(x, y) = \prod_{t=1}^{T} P(y_t \mid y_{t-1}) \, P(x_t \mid y_t)$$

But in NLP applications, co-occurrences of terms matter.

Maximum Entropy Markov Model: Conditional model for $P(y \mid x)$, non-generative. Use feature functions $f_k$ derived from $x$:

$$P(y_t \mid y_{t-1}, x) = \frac{1}{Z(y_{t-1}, x)} \exp\Big( \sum_k \lambda_k f_k(y_t, y_{t-1}, x_t) \Big)$$

Linear-Chain Conditional Random Field:

$$P(y \mid x) = \frac{1}{Z(x)} \prod_{t=1}^{T} \exp\Big( \sum_k \lambda_k f_k(y_t, y_{t-1}, x_t) \Big)$$

Normalization:

$$Z(x) = \sum_{y'} \prod_{t=1}^{T} \exp\Big( \sum_k \lambda_k f_k(y'_t, y'_{t-1}, x_t) \Big)$$

Normalization can be ignored in inference, because $Z(x)$ does not depend on $y$ and hence does not change the $\arg\max$.

Viterbi Algorithm for Linear-Chain CRF

For simplicity, we abbreviate

$$\Psi_t(y_t, y_{t-1}) = \sum_k \lambda_k f_k(y_t, y_{t-1}, x_t)$$

Then we can simplify

$$P(y \mid x) = \frac{1}{Z(x)} \exp\Big( \sum_{t=1}^{T} \Psi_t(y_t, y_{t-1}) \Big)$$

Because the exponential is monotone,

$$\arg\max_y P(y \mid x) = \arg\max_y \sum_{t=1}^{T} \Psi_t(y_t, y_{t-1})$$

➜ use the Viterbi algorithm to find the path with maximum total score (a minimal decoding sketch is given after the conclusions below).

Initial states: $\delta_1(j) = \Psi_1(j, y_0)$ for a fixed start state $y_0$.
Recurrence: $\delta_t(j) = \max_i \big[ \delta_{t-1}(i) + \Psi_t(j, i) \big]$, storing the maximizing $i$ as a back-pointer, as usual in Viterbi.
Best path by backtracking from the maximum.

Inference is hence similar to before, using the Viterbi algorithm, at least if we are only interested in finding the maximum, not the true probability.

Training

For CRFs, we do supervised learning by the usual means: we set the derivative of the (conditional) log-likelihood to zero, etc. We can drop the likelihood of $x$, which is constant with respect to the parameters. Summing over all training documents (and adding a regularization term, e.g., a Gaussian prior) we then get:

$$\ell(\lambda) = \sum_{d} \Big[ \sum_{t} \sum_k \lambda_k f_k\big(y^{(d)}_t, y^{(d)}_{t-1}, x^{(d)}_t\big) - \log Z\big(x^{(d)}\big) \Big] - \sum_k \frac{\lambda_k^2}{2\sigma^2}$$

This is a convex optimization problem, so we can apply a variety of techniques for numerical optimization (a sketch of the objective follows the conclusions below). Popular approaches:

▶ Improved Iterative Scaling [PiPiLa97]
▶ Conjugate gradient descent
▶ Limited-memory quasi-Newton approaches (e.g., L-BFGS)

Conclusions

▶ Hidden Markov Models are a standard technique for processing sequences.
▶ The Viterbi algorithm is used to find the optimal labeling (and has many other uses in other domains; it is a good example of dynamic programming).
▶ Learning is done using Baum–Welch, which is an Expectation-Maximization (EM) algorithm.
▶ HMMs work best for a small set of states and outputs; otherwise our estimates become too unreliable.
▶ Maximum Entropy Markov Models integrate feature functions to handle unknown words.
▶ With MEMMs and CRFs, the state is also directly influenced by neighboring words, so these models can take word co-occurrences into account.
▶ Conditional Random Fields (CRFs) eliminate the strict direction of HMM/MEMM.
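As a concrete illustration of the decoding step above, here is a minimal NumPy sketch of Viterbi for a linear-chain CRF. This is a sketch under assumptions, not code from the lecture: the feature scores are presumed to be precomputed into a tensor log_potentials[t, i, j] = Ψ_{t+1}(j, i) (0-based t), with state 0 acting as the fixed start state; the names viterbi_crf and log_potentials are made up for this example.

import numpy as np

def viterbi_crf(log_potentials):
    """Viterbi decoding for a linear-chain CRF.

    log_potentials[t, i, j] is the log-space score sum_k lambda_k * f_k
    for moving from state i to state j at (0-based) step t;
    log_potentials[0, 0, :] holds the initial scores Psi_1(j, y_0).
    Returns the highest-scoring state sequence.
    """
    T, S, _ = log_potentials.shape
    delta = np.empty((T, S))               # delta[t, j]: best score ending in j at t
    backptr = np.zeros((T, S), dtype=int)  # maximizing predecessor states

    delta[0] = log_potentials[0, 0]        # initial states
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_potentials[t]  # scores[i, j]
        backptr[t] = np.argmax(scores, axis=0)
        delta[t] = np.max(scores, axis=0)

    # Best path by backtracking from the maximum, as usual in Viterbi.
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]

Note that $Z(x)$ is never computed here: as stated above, normalization is irrelevant to the $\arg\max$.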
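For training, the objective above additionally needs $\log Z(x)$. The forward algorithm computes it with the same recursion as Viterbi, replacing max by logsumexp. Again a hypothetical sketch under the same tensor convention, with the regularization term omitted:

import numpy as np
from scipy.special import logsumexp

def log_partition(log_potentials):
    """log Z(x) for a linear-chain CRF via the forward algorithm."""
    T, _, _ = log_potentials.shape
    alpha = log_potentials[0, 0]           # forward scores at step 0
    for t in range(1, T):
        alpha = logsumexp(alpha[:, None] + log_potentials[t], axis=0)
    return logsumexp(alpha)

def neg_log_likelihood(log_potentials, y):
    """Negative conditional log-likelihood -log P(y | x) of a gold
    label sequence y; the convex objective minimized in training."""
    score = log_potentials[0, 0, y[0]]     # log-score of the gold path
    for t in range(1, len(y)):
        score += log_potentials[t, y[t - 1], y[t]]
    return log_partition(log_potentials) - score

Summing this over all training documents and adding the Gaussian penalty yields the objective above; minimizing it with, e.g., scipy.optimize.minimize(..., method='L-BFGS-B') corresponds to the limited-memory quasi-Newton approach listed among the popular techniques.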
