Algorithms for Hidden Markov Models
Evaluation Problem: Naive Approach

Observations:
▶ the observation $x_t$ only depends on the current state $y_t$
▶ the next state $y_{t+1}$ only depends on the current state $y_t$ (Markov chain)

For simplicity of the equations, we now use $a_{ij} := P(y_{t+1}{=}j \mid y_t{=}i)$, $b_i(x) := P(x_t{=}x \mid y_t{=}i)$, and $\pi_i := P(y_1{=}i)$.

If we knew the states $y_1, \dots, y_T$, we could just compute
$P(X, Y \mid \lambda) = \pi_{y_1}\, b_{y_1}(x_1) \prod_{t=2}^{T} a_{y_{t-1} y_t}\, b_{y_t}(x_t)$
The answer then is just the sum over all possible hidden state sequences:
$P(X \mid \lambda) = \sum_Y P(X, Y \mid \lambda)$

Problem: Using $N$ states and $T$ steps, there are $N^T$ state sequences to sum over – combinatorial explosion!

Evaluation Problem: Forward Algorithm [BaEa67; Bake75]

Dynamic Programming solution: Because a Markov chain is “memoryless”, $P(x_1, \dots, x_t, y_t{=}i \mid \lambda)$ only depends on the current state $y_t{=}i$, not on the previous states (or the path taken)
⇒ merge all paths that end in the same state $y_t{=}i$ into a single case $\alpha_t(i)$
⇒ fill a table with $N$ rows, $T$ columns:

First column, for each $i$: $\alpha_1(i) = \pi_i\, b_i(x_1)$
Induction, for each $t$, for each $j$: $\alpha_{t+1}(j) = \big[\sum_i \alpha_t(i)\, a_{ij}\big]\, b_j(x_{t+1})$
The result is $P(X \mid \lambda) = \sum_i \alpha_T(i)$.

Forward-Backward Algorithm

The Forward algorithm computes $\alpha_t(i) = P(x_1, \dots, x_t, y_t{=}i \mid \lambda)$ using only the observations $x_1$ to $x_t$. We can do the same backwards, using only the observations $x_{t+1}$ to $x_T$ (denoted as backward variables $\beta_t(i) = P(x_{t+1}, \dots, x_T \mid y_t{=}i, \lambda)$):

Last value, for each $i$: $\beta_T(i) = 1$
Backwards induction, for each $t$, for each $i$: $\beta_t(i) = \sum_j a_{ij}\, b_j(x_{t+1})\, \beta_{t+1}(j)$

The Forward-Backward algorithm computes $\gamma_t(i) = P(y_t{=}i \mid X, \lambda)$. Using Bayes’ rule and the law of total probability, we can estimate
$\gamma_t(i) = \alpha_t(i)\, \beta_t(i) \,\big/\, \sum_j \alpha_t(j)\, \beta_t(j)$

But: if we estimate all $y_t$ this way, the result may not be the most likely sequence of labels. (This sequence might even contain invalid transitions with probability 0.)

Decoding Problem: Viterbi Algorithm [Forn73; Vite67]

The decoding problem is very similar to the evaluation problem:
▶ instead of the sum over all paths, we search the maximum
▶ use path probabilities $\delta_t(i)$ instead of $\alpha_t(i)$, and remember in $\psi_t(i)$ what the maximum was

First column, for each $i$: $\delta_1(i) = \pi_i\, b_i(x_1)$, $\psi_1(i) = 0$
Induction, for each $t$, for each $j$: $\delta_{t+1}(j) = \max_i \big[\delta_t(i)\, a_{ij}\big]\, b_j(x_{t+1})$, $\psi_{t+1}(j) = \arg\max_i \big[\delta_t(i)\, a_{ij}\big]$
Backtracking, for each $t = T{-}1, \dots, 1$: $y_T^* = \arg\max_i \delta_T(i)$, then $y_t^* = \psi_{t+1}(y_{t+1}^*)$

Learning Problem: Baum-Welch-Algorithm [Rabi89]

Optimize the parameters $\lambda = (A, B, \pi)$ using an EM [DeLaRu77] procedure:
▶ Begin with initial guesses / estimates of $\lambda$
▶ Estimate the hidden variables using the Forward-Backward estimates. Transition probability from $i$ to $j$:
$\xi_t(i,j) = \alpha_t(i)\, a_{ij}\, b_j(x_{t+1})\, \beta_{t+1}(j) \,\big/\, P(X \mid \lambda)$, with $\gamma_t(i) = \sum_j \xi_t(i,j)$
▶ Maximize by updating the values for $\lambda$:
$\bar\pi_i = \gamma_1(i)$; $\bar a_{ij} = \sum_{t=1}^{T-1} \xi_t(i,j) \,\big/\, \sum_{t=1}^{T-1} \gamma_t(i)$; $\bar b_i(k) = \sum_{t=1}^{T} \mathbb{1}[x_t{=}k]\, \gamma_t(i) \,\big/\, \sum_{t=1}^{T} \gamma_t(i)$
▶ Repeat with the new $\lambda$

Beam Search: Approximating Viterbi

The runtime complexity of Viterbi is $O(N^2 \cdot T)$ ($N$ = number of states, $T$ = number of steps). If we have many states, this becomes very expensive! With $k$ tags and a third-order model, we already have $N = k^3$ states.

Observation: many states will have zero, or close to zero, probability.
Idea: keep only the most likely states in each step of Viterbi.
⇝ Beam search [LoRe76] (with beam width $w$), complexity $O(w \cdot N \cdot T)$

Further alternatives: A*-search on the Viterbi graph (guaranteed to find the optimum result, but more complicated to implement: it needs an updatable priority heap).
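
The sketches below illustrate the algorithms in this transcript. They are minimal NumPy sketches under assumed conventions that are mine, not the slides': `pi` is the initial distribution $\pi$, `A` the $N \times N$ transition matrix $a_{ij}$, `B` the $N \times M$ emission matrix $b_i(k)$, and `obs` a sequence of integer symbol indices. First, the naive evaluation written directly as a sum over all $N^T$ state sequences, useful only to make the combinatorial explosion concrete:

```python
import numpy as np
from itertools import product

def naive_evaluate(obs, pi, A, B):
    """Brute-force P(X | lambda): sum P(X, Y | lambda) over all N^T
    hidden state sequences Y -- exponential, for illustration only."""
    N, total = len(pi), 0.0
    for states in product(range(N), repeat=len(obs)):   # N^T sequences!
        p = pi[states[0]] * B[states[0], obs[0]]        # pi_{y1} * b_{y1}(x1)
        for t in range(1, len(obs)):
            p *= A[states[t - 1], states[t]] * B[states[t], obs[t]]
        total += p
    return total
```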
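
The Forward algorithm's table-filling reduces this to $O(N^2 \cdot T)$; a sketch under the same assumed conventions, with a tiny made-up usage example:

```python
def forward(obs, pi, A, B):
    """Forward table; row t (0-based) holds the slides' alpha_{t+1}(i).
    P(X | lambda) is the sum of the last row."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                 # first column: pi_i * b_i(x_1)
    for t in range(1, T):
        # induction: sum over all predecessor states, then emit x_{t+1}
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha

# Tiny example: 2 states, 3 symbols (made-up numbers)
pi = np.array([0.6, 0.4])
A  = np.array([[0.7, 0.3],
               [0.4, 0.6]])
B  = np.array([[0.5, 0.4, 0.1],
               [0.1, 0.3, 0.6]])
print(forward([0, 1, 2], pi, A, B)[-1].sum())    # P(X | lambda)
```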
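
The backward variables and the state posteriors $\gamma_t(i)$ follow the same pattern; again a sketch, reusing the conventions above:

```python
def backward(obs, A, B):
    """Backward table; row t (0-based) holds the slides' beta_{t+1}(i)."""
    T, N = len(obs), A.shape[0]
    beta = np.zeros((T, N))
    beta[-1] = 1.0                               # base case: beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        # induction: beta_t(i) = sum_j a_ij * b_j(x_{t+1}) * beta_{t+1}(j)
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta

def state_posteriors(alpha, beta):
    """gamma[t, i] = P(y_t = i | X, lambda), normalized per time step."""
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)
```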
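
For the Viterbi algorithm the sum becomes a max, plus backpointers; a sketch with the same assumed conventions. (Real implementations usually work with log-probabilities instead, to avoid numeric underflow on long sequences.)

```python
def viterbi(obs, pi, A, B):
    """Most likely state sequence argmax_Y P(Y | X, lambda)."""
    T, N = len(obs), len(pi)
    delta = np.zeros((T, N))                     # best path probability so far
    psi = np.zeros((T, N), dtype=int)            # backpointers (the argmax)
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A       # scores[i, j] = delta_t(i) * a_ij
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    path = [int(delta[-1].argmax())]             # best final state
    for t in range(T - 1, 0, -1):                # follow the backpointers
        path.append(int(psi[t][path[-1]]))
    return path[::-1]
```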
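
One Baum-Welch iteration combines the two tables into $\xi_t(i,j)$ and $\gamma_t(i)$ and re-estimates $\lambda$. A sketch reusing `forward` and `backward` from above, for a single training sequence; practical implementations sum these statistics over many sequences and smooth the estimates:

```python
def baum_welch_step(obs, pi, A, B):
    """One EM iteration: E-step via forward-backward, M-step re-estimation."""
    obs = np.asarray(obs)
    alpha, beta = forward(obs, pi, A, B), backward(obs, A, B)
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)    # gamma_t(i) = P(y_t = i | X)
    # xi[t, i, j] proportional to alpha_t(i) * a_ij * b_j(x_{t+1}) * beta_{t+1}(j)
    xi = (alpha[:-1, :, None] * A[None, :, :]
          * (B[:, obs[1:]].T * beta[1:])[:, None, :])
    xi /= xi.sum(axis=(1, 2), keepdims=True)     # normalizing by P(X | lambda)
    new_pi = gamma[0]                            # pi_i = gamma_1(i)
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    # b_i(k) = sum_{t: x_t = k} gamma_t(i) / sum_t gamma_t(i)
    new_B = np.stack([gamma[obs == k].sum(axis=0) for k in range(B.shape[1])],
                     axis=1) / gamma.sum(axis=0)[:, None]
    return new_pi, new_A, new_B
```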
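
Beam search replaces the maximization over all $N$ predecessors by one over only the states kept in the previous step. A sketch of this approximation to `viterbi` above; the parameter name `width` (the beam width $w$) is mine:

```python
def viterbi_beam(obs, pi, A, B, width=8):
    """Viterbi with beam search: keep only the `width` most likely states
    per step -- approximate, but O(width * N * T) instead of O(N^2 * T)."""
    delta = pi * B[:, obs[0]]
    beam = np.argsort(delta)[-width:]            # indices of the kept states
    psi = []
    for t in range(1, len(obs)):
        scores = delta[beam][:, None] * A[beam]  # expand only the kept states
        psi.append(beam[scores.argmax(axis=0)])  # best predecessor per state
        delta = scores.max(axis=0) * B[:, obs[t]]
        beam = np.argsort(delta)[-width:]
    path = [int(delta.argmax())]
    for back in reversed(psi):                   # backtrack as in Viterbi
        path.append(int(back[path[-1]]))
    return path[::-1]
```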