I am currently trying to understand how the parameters of hidden Markov models (HMMs) are re-estimated using expectation-maximization (EM).
What I have trouble understanding is what the symbol emission probability actually models. In the discrete case it contains the probability of seeing each symbol in a given state, which, by analogy, in the continuous case would be how likely it is to see a given continuous-valued observation in a given state.
The Gaussian mixture model that models this probability distribution is defined by the parameters $c_{jk},\mu_{jk},\Sigma_{jk}$ for each state $j$, where $c_{jk}$ is the weight of the $k$-th component density in the mixture for state $j$, and $\mu_{jk}$ and $\Sigma_{jk}$ are, similarly, the mean and covariance of that component.
The re-estimation of these parameters is defined as follows:
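Written out, I understand the emission density for state $j$ to be the usual mixture form. The symbols $b_j$ and $\mathcal{N}$ are my own notation here, and $M$ is the number of mixture components per state:

$$ b_j(\boldsymbol{o}) = \sum_{k=1}^{M} c_{jk}\,\mathcal{N}(\boldsymbol{o};\,\boldsymbol{\mu}_{jk},\Sigma_{jk}), \qquad c_{jk} \ge 0, \quad \sum_{k=1}^{M} c_{jk} = 1 $$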
\begin{equation}\tag{1}\label{1} \widetilde{c}_{jk} = \frac{\sum_{t=1}^{T}\gamma_{jk}(t)}{\sum_{t=1}^{T}\sum_{k=1}^{M}\gamma_{jk}(t)} \end{equation}
\begin{equation}\tag{2}\label{2} \widetilde{\mu}_{jk} = \frac{\sum_{t=1}^{T}\gamma_{jk}(t) \cdot \boldsymbol{o}_t}{\sum_{t=1}^{T}\gamma_{jk}(t)} \end{equation}
\begin{equation}\tag{3}\label{3} \widetilde{\Sigma}_{jk} = \frac{\sum_{t=1}^{T} \gamma_{jk}(t) \cdot (\boldsymbol{o}_t - \boldsymbol{\mu}_{jk})(\boldsymbol{o}_t - \boldsymbol{\mu}_{jk})^{\top}}{\sum_{t=1}^{T}\gamma_{jk}(t)} \end{equation}
Here $\gamma_{jk}(t)$ is the probability of being in state $j$ at time $t$ with the $k$-th mixture component accounting for the observation $\boldsymbol{o}_t$.
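To make the formulas concrete, here is a minimal NumPy sketch of how I read equations \eqref{1} to \eqref{3} for a single state $j$. The function name `reestimate_state` and the array layout are my own choices, not something from the source, and I am assuming that the mean inside \eqref{3} is the freshly updated one (the equation writes $\boldsymbol{\mu}_{jk}$ without a tilde, so this may not be what is intended).

```python
import numpy as np

def reestimate_state(gamma_j, obs):
    """Apply equations (1)-(3) for one state j (hypothetical helper).

    gamma_j : (T, M) array, gamma_j[t, k] stands for gamma_{jk}(t)
    obs     : (T, D) array, obs[t] stands for the observation o_t
    """
    # Expected number of times state j is occupied with component k
    occ = gamma_j.sum(axis=0)                      # shape (M,)

    # Eq. (1): component occupancy divided by total occupancy of state j
    c_new = occ / occ.sum()

    # Eq. (2): gamma-weighted average of the observations, per component
    mu_new = (gamma_j.T @ obs) / occ[:, None]      # shape (M, D)

    # Eq. (3): gamma-weighted average of outer products of deviations.
    # Assumption: deviations are taken from the updated mean mu_new.
    diff = obs[None, :, :] - mu_new[:, None, :]    # shape (M, T, D)
    sigma_new = np.einsum("tk,kti,ktj->kij", gamma_j, diff, diff)
    sigma_new /= occ[:, None, None]                # shape (M, D, D)

    return c_new, mu_new, sigma_new


# Toy data: 4 one-dimensional observations, 2 components, hard assignments
gamma_j = np.array([[1.0, 0.0],
                    [1.0, 0.0],
                    [0.0, 1.0],
                    [0.0, 1.0]])
obs = np.array([[0.0], [2.0], [10.0], [14.0]])

c_new, mu_new, sigma_new = reestimate_state(gamma_j, obs)
print(c_new)               # weights: 0.5 and 0.5
print(mu_new.ravel())      # means: 1 and 12
print(sigma_new.ravel())   # variances: 1 and 4
```

With hard 0/1 values of $\gamma_{jk}(t)$ the formulas reduce to the ordinary sample proportion, sample mean and sample covariance of the observations assigned to each component, which is at least consistent with how I read \eqref{1}.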
Equation \eqref{1} makes sense to me. It is the re-estimation formula for $c_{jk}$: the ratio between the expected number of times the system is in state $j$ using the $k$-th mixture component and the expected number of times the system is in state $j$ at all. So it looks the way I would expect.
What I don't get are the other two equations. Why are they defined this way? It is said that the observations weight each numerator term, but how does that bring the estimate closer to the mean of the observations?
The same question applies to the covariance matrix in \eqref{3}.
And how and why is $\gamma_{jk}(t)$ defined the way it is?
The source (PDF page 351) states that this is fairly straightforward, but I do not find it so.
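For reference, the definition I have in mind is the form usually given for Gaussian-mixture HMMs. I am assuming forward and backward variables $\alpha_j(t)$, $\beta_j(t)$ and $N$ states, which are my notation here and may differ from the source:

$$ \gamma_{jk}(t) = \left[\frac{\alpha_j(t)\,\beta_j(t)}{\sum_{i=1}^{N}\alpha_i(t)\,\beta_i(t)}\right] \left[\frac{c_{jk}\,\mathcal{N}(\boldsymbol{o}_t;\,\boldsymbol{\mu}_{jk},\Sigma_{jk})}{\sum_{m=1}^{M} c_{jm}\,\mathcal{N}(\boldsymbol{o}_t;\,\boldsymbol{\mu}_{jm},\Sigma_{jm})}\right] $$

As far as I can tell, the first factor is the probability of being in state $j$ at time $t$ given the whole observation sequence, and the second is the share of component $k$ in explaining $\boldsymbol{o}_t$ within that state.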