I have the perfect paper for you: http://www.stat.columbia.edu/~liam/teaching/neurostat-fall17/papers/hmm/rabiner.pdf
Note that, as far as I know, there are newer Baum-Welch (BW) methods.
Attempted simple explanation for Viterbi:
The goal of the Viterbi algorithm is to find the most probable sequence of hidden states over some discrete period of time.
Start with the initial probabilities of being in each state $s$ at time $t=0$. Then increase $t$, and for each state at that time ($t=1$), find the best state at $t-1$ (best meaning the predecessor that yields the sequence with the highest probability). Save this information: for each state $s$ at $t=1$, record which state at $t=0$ is its best predecessor, along with the probability of that optimal sequence.
Then increase $t$ to $t=2$ and repeat the procedure, using the just-computed best probabilities as the probabilities for each state $s$ at $t-1$. Eventually you reach $t=T$, the last point in time. There you pick the state with the highest probability; since for each state at each time you have saved the best predecessor, you take the best state at $t=T$ and work your way backwards, always following the saved optimal predecessor.
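The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation; the two-state model and all its numbers are made up for the example.

```python
# Hypothetical HMM: two hidden states, two possible observations.
# All probabilities are invented purely for illustration.
states = ["Rainy", "Sunny"]
obs = [0, 1, 1]  # observation indices at t = 0, 1, 2
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
           "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit_p = {"Rainy": [0.1, 0.9], "Sunny": [0.6, 0.4]}

def viterbi(obs, states, start_p, trans_p, emit_p):
    # V[t][s]    = probability of the best sequence ending in state s at time t
    # back[t][s] = best predecessor of state s at time t (the saved information)
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            # find the predecessor that maximises the path probability
            prev = max(states, key=lambda r: V[t - 1][r] * trans_p[r][s])
            V[t][s] = V[t - 1][prev] * trans_p[prev][s] * emit_p[s][obs[t]]
            back[t][s] = prev
    # pick the best final state, then walk the saved predecessors backwards
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, V[-1][last]

path, prob = viterbi(obs, states, start_p, trans_p, emit_p)
# path == ["Sunny", "Rainy", "Rainy"] for this toy model
```

Note that real implementations usually work with log probabilities, since products of many probabilities underflow quickly.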
To elaborate on what it does at a more general level: if you have a set of states $S$, each of which can be occupied at any time over some duration $T$, a naive way to find the best sequence would be to enumerate all possible sequences and compute the probability of each (i.e. the product of the state/emission and transition probabilities along the sequence). The mistake here is that many calculations are duplicated: for example, two sequences $p_1, p_2$ (each $p$ a sequence of states) may differ only in their final state, yet this approach recomputes the probability of the shared prefix for both $P(p_1)$ and $P(p_2)$.
By taking the iterative approach used in Viterbi, no calculations are duplicated: instead of the exponential $O(|S|^T)$ cost of full enumeration, it runs in $O(T \cdot |S|^2)$ time, which is a lot more efficient.