Initial state recognition in HMM

Question

I am building a speech recognition system using a Hidden Markov Model (HMM) in Python. I referred to this and this question and their answers, which were very helpful.

In my approach, I split the continuous speech into separate words and plan to use an HMM to recognize each word, so the states of the HMM will be phones.

What I understand so far is that an HMM estimates the next state from the current state (phone). But I don't see how to estimate the first state of the HMM (i.e. the first phone of the word).

Can you suggest the best approach to using an HMM to achieve this?

Also, the states of the HMM will be phones, but I don't see what the observations should be in this problem. There are multiple frames in a single phone, and there is a feature vector corresponding to each frame. What should I use as the observation?

http://en.wikipedia.org/wiki/Baum%E2%80%93Welch_algorithm - Have you looked into this? – Phorce, Mar 23 '15 at 09:40
@Phorce I have seen it, but I didn't understand how to apply it to my problem, i.e. detecting the first state of the HMM (the first phone). – Gaurav Deshmukh, Mar 23 '15 at 10:07
The Baum-Welch algorithm will give you an approximation to what the next state is. I.e. given the word "Yes", it will predict that "Y" is followed by "E" and then "S", so you can predict the next state. The Viterbi score might be a better solution. – Phorce, Mar 23 '15 at 10:13
The $\pi$ from the Baum-Welch algorithm is the initial probabilities of the states (i.e. the states at start), which is close to what you're after I think. – Peter K., Mar 23 '15 at 21:02

score 3 · Accepted Answer · answered Mar 23 '15 at 20:47

The Baum-Welch algorithm is an instance of the EM (Expectation-Maximization) algorithm and estimates the model parameters $(T, E, \pi)$, where:

$T$: the transition probabilities
$E$: the emission probabilities
$\pi$: probability distribution on the states
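
To make the role of $\pi$ concrete, here is a minimal Viterbi sketch for a discrete HMM. It assumes $\pi$ is read as the distribution over the state at the first time step, and all numbers in the toy model are made up for illustration. It is not any implementation linked from this answer, only a sketch of the standard recursion: the first state is scored with $\pi$ and the first observation, and every later state is scored with $T$.

```python
import math


def viterbi(obs, pi, T, E):
    """Return the most likely state path for a discrete HMM (log domain).

    obs: list of observation symbol indices
    pi:  pi[s] is the probability of starting in state s
    T:   T[r][s] is the probability of moving from state r to state s
    E:   E[s][o] is the probability of emitting symbol o in state s
    """
    n_states = len(pi)

    def log(p):
        # log(0) is treated as minus infinity so impossible paths never win
        return math.log(p) if p > 0 else float("-inf")

    # Initialisation: the first state is scored with pi, not with T.
    delta = [log(pi[s]) + log(E[s][obs[0]]) for s in range(n_states)]
    backptr = []

    # Recursion: from the second observation on, transitions take over.
    for o in obs[1:]:
        prev = delta
        step_ptr, delta = [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda r: prev[r] + log(T[r][s]))
            step_ptr.append(best)
            delta.append(prev[best] + log(T[best][s]) + log(E[s][o]))
        backptr.append(step_ptr)

    # Backtrack from the best final state to recover the whole path.
    state = max(range(n_states), key=lambda s: delta[s])
    path = [state]
    for step_ptr in reversed(backptr):
        state = step_ptr[state]
        path.append(state)
    return path[::-1]


# Toy two-state model (illustrative numbers only).
pi = [0.8, 0.2]
T = [[0.7, 0.3], [0.0, 1.0]]
E = [[0.9, 0.1], [0.2, 0.8]]
path = viterbi([0, 0, 1, 1], pi, T, E)
single = viterbi([0], [0.0, 1.0], T, E)
```

With a single observation, the decoded state depends only on $\pi$ and $E$, which is the sense in which $\pi$ answers the "first state" question.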

Some years ago, I made the following quick-and-dirty implementation (may be fairly broken now), for the discrete case.

Hope this helps.

dohmatob


I thought that $\pi$ from the Baum-Welch algorithm was the *initial* probability distribution, which is close to what the OP is asking for? – Peter K. Mar 23 '15 at 21:01
@dohmatob as I said, the states of the HMM will be phones, but I don't see what the **observation** should be in this problem. There are multiple frames in a single phone, and there is a feature vector corresponding to each frame. What should I use as the observation? – Gaurav Deshmukh Apr 02 '15 at 07:41

score 0 · Answer 2 · answered Apr 02 '15 at 16:49

This is my understanding, which is probably incomplete: phones are usually modeled using multiple (~3) states. Observations are feature vectors and are often some variant of mel-frequency cepstral coefficients (MFCCs). Word models can then be constructed by concatenating several phone models together. More detailed information can be found in this PDF describing the applications of HMMs to speech recognition.
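
As a rough illustration of the concatenation idea, the sketch below builds a left-to-right word model by chaining phone models of three states each. The function name, the phone labels, and the self-loop probability are all assumptions made for this example. Each frame contributes one observation (its feature vector), so a phone that spans several frames is absorbed by the self-loops. In such a topology the initial distribution puts all its mass on the first state of the first phone, so the first state does not need to be estimated separately.

```python
def left_to_right_word_model(phones, states_per_phone=3, self_loop=0.6):
    """Build (state_names, pi, T) for one word by chaining phone models.

    Hypothetical helper for illustration; the self_loop value is arbitrary
    and would normally be learned from data (e.g. with Baum-Welch).
    """
    names = [f"{p}_{k}" for p in phones for k in range(states_per_phone)]
    n = len(names)

    # The word always starts in the first state of its first phone.
    pi = [1.0] + [0.0] * (n - 1)

    # Each state may stay put (more frames of the same sound) or move on
    # to the next state; no skips or backward jumps in this simple sketch.
    T = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i + 1 < n:
            T[i][i] = self_loop
            T[i][i + 1] = 1.0 - self_loop
        else:
            T[i][i] = 1.0  # the final state absorbs any remaining frames
    return names, pi, T


# Illustrative phone labels for the word "yes".
names, pi, T = left_to_right_word_model(["y", "eh", "s"])
```

The emission model is left out here: with MFCC-style feature vectors it would be a continuous density per state rather than a discrete table.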
