19 Hidden Markov Models
Markov Chains

Markov chains (Markov processes) are a popular model for discrete, finite, and memory-less processes. In a (first-order) Markov chain, the next state depends only on the current state; transitions are determined by the transition probabilities.

Markov property: P(X_{t+1} = x_{t+1} | X_1 = x_1, …, X_t = x_t) = P(X_{t+1} = x_{t+1} | X_t = x_t)

Probability of an entire sequence: P(x_1, …, x_T) = P(x_1) · ∏_{t=2}^{T} P(x_t | x_{t-1})

Higher-order Markov chains can be reduced to first-order chains by taking the cross product of the states. Example: the weather tomorrow depends on the current weather, not on last year’s weather.

Text Generation with Markov Chains

If each word is a state, we can generate random text with a Markov chain. Examples (usually using a second- or higher-order Markov model):
▶ MegaHAL chatbot [HuAl98] https://en.wikipedia.org/wiki/MegaHAL
▶ Garkov (Garfield + Markov) http://joshmillard.com/garkov/
▶ Automatic Donald Trump https://filiph.github.io/markov/
Transition probabilities are estimated from training data by counting.

Automatic Donald Trump

Fake tweets generated with hidden Markov models: more fun than useful by itself.

Hidden Markov Models [BaPe66]

We can now decouple the word generation from the state:
▶ the states follow a Markov process
▶ the words depend on the current state only
▶ the words are observable, the state is hidden
We now need two probability matrices depending on the state:
▶ A: the probability a_ij = P(q_{t+1} = s_j | q_t = s_i) to go to the next state s_j when in state s_i
▶ B: the probability b_i(o) = P(o | q_t = s_i) to produce the observation o when in state s_i

Hidden Markov Models – Motivation

A typical task in natural language processing is part-of-speech annotation:

Hidden:   NNS VBP VBG       TO VB   IN IN      TO VB
Observed: MPs are preparing to vote on whether to back

Hidden:   NNP     NNP POS NN   IN  VBG     DT  NNP      NNP
Observed: Theresa May ’s  deal for leaving the European Union.

▶ Words can have multiple types (a book vs. to book)
▶ Certain transitions are more common than others (e.g., NNP → VBP)
Given only the text, can we infer the part-of-speech annotation?
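Estimating the transition probabilities by counting, as described above, can be sketched as a tiny word-level Markov chain text generator. This is a minimal illustration under assumed names (`train`, `generate` are made up), not how MegaHAL or the linked generators are actually implemented:

```python
import random
from collections import defaultdict

def train(words, order=1):
    """Count transitions: a state is a tuple of `order` consecutive words."""
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        counts[state][words[i + order]] += 1
    return counts

def generate(counts, length=10, seed=None):
    """Sample a word sequence by following the counted transitions."""
    rng = random.Random(seed)
    state = rng.choice(list(counts))       # pick a random starting state
    out = list(state)
    for _ in range(length):
        successors = counts.get(state)
        if not successors:                 # dead end: this state was never continued
            break
        words, weights = zip(*successors.items())
        nxt = rng.choices(words, weights=weights)[0]
        out.append(nxt)
        state = state[1:] + (nxt,)         # shift the state window forward
    return " ".join(out)
```

Using `order=2` or higher, as the examples above do, makes the output locally more fluent at the cost of memorizing longer chunks of the training text.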
Hidden Markov Models – Definition

A hidden Markov model is a tuple λ = (S, O, A, B, π):
▶ S: a graph of hidden states s_1, …, s_N
▶ O: a set of possible observations
▶ A: the transition matrix, a_ij = P(q_{t+1} = s_j | q_t = s_i): the probability of changing to state s_j when in state s_i
▶ B: the observation probability matrix, b_i(o) = P(o | q_t = s_i): the probability of producing the observation o when in state s_i
▶ π: the starting probabilities, π_i = P(q_1 = s_i): the probability of s_i being the initial state

Three main problems to solve with an HMM:
▶ Evaluation problem: given λ and an observation sequence, compute the probability of this result
▶ Decoding problem: given an observation sequence, find the hidden states that most likely led to this observation
▶ Learning problem: given S, O, and output sequences, find the optimal A, B, π
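The decoding problem is classically solved with the Viterbi algorithm, a dynamic program that keeps, for each state, the most probable path ending there. A minimal sketch (the dictionary-based model format is an assumption for illustration, not the notation of the slides):

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden state sequence for `obs` (decoding problem).

    delta maps each state s to a pair (probability of the best path
    ending in s after the observations seen so far, that path itself).
    """
    delta = {s: (start_p[s] * emit_p[s].get(obs[0], 0.0), [s]) for s in states}
    for o in obs[1:]:
        new_delta = {}
        for s in states:
            # Best predecessor for s under the transition probabilities.
            prob, path = max(
                ((delta[p][0] * trans_p[p].get(s, 0.0), delta[p][1]) for p in states),
                key=lambda t: t[0])
            new_delta[s] = (prob * emit_p[s].get(o, 0.0), path + [s])
        delta = new_delta
    return max(delta.values(), key=lambda t: t[0])[1]
```

For example, with an invented two-state NN/VB toy model this returns the most probable tag sequence for a short sentence. For longer sequences one should sum log probabilities instead of multiplying, to avoid numeric underflow.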