How to use GMMs for acoustic signal classification?

Question

There are a number of applications of Gaussian Mixture Models (GMMs) to acoustic/audio data for the purposes of classification (for example, paper1 and paper2). I can understand the use of GMMs for clustering and for position source generation.

What is unclear from various papers are the details of applying such a model, which does not explicitly represent temporal dependencies, when the data is produced with different temporal features. Consider questions such as 'what if the class source changes the rate of signal production?', or a question 'does the methodology examine only the latest temporal component (block/window)?'. These questions would change the parameterization of a GMM, would they not?

It also appears that the parameters $\mathbf{\mu}_i, \mathbf{\sigma}_i$ would have a dimensionality equal to the number of samples in the audio sequence under examination, correct? So that if $\mathbf{\mu}_i \in \mathbb{R}^\tau$, the signal data $D$ being examined is $D_{T-\tau,\ldots,T}$?

The question is, how can GMMs be adapted in practice for the purposes of signal classification?

score 1 · Answer 1 · answered Apr 13 '19 at 21:14

What is unclear from various papers are the details of applying such a model, which does not explicitly represent temporal dependencies, when the data is produced with different temporal features.

A GMM is a simplistic model of speech; indeed, it does not account for temporal dependencies. More advanced models, such as GMM-HMMs or LSTM neural networks, do.
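To make the lack of temporal modelling concrete, here is a minimal sketch in plain Python. It assumes a diagonal-covariance GMM scored frame by frame; the weights, means, variances and the random "frames" are made up for illustration and do not come from any real system. Because the sequence score is just a sum of per-frame log-likelihoods, shuffling the frames in time leaves it unchanged.

```python
import math
import random

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_frame_loglik(x, weights, means, variances):
    """log p(x) under the mixture, computed with log-sum-exp over components."""
    terms = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def gmm_sequence_loglik(frames, weights, means, variances):
    """Frames are treated as independent, so the score is a plain sum."""
    return sum(gmm_frame_loglik(x, weights, means, variances) for x in frames)

# Hypothetical 2-component model over 2-dimensional frame features.
weights = [0.6, 0.4]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [0.5, 0.5]]

# Made-up frame features, then the same frames in a random temporal order.
rng = random.Random(0)
frames = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(50)]
shuffled = frames[:]
rng.shuffle(shuffled)

ll_original = gmm_sequence_loglik(frames, weights, means, variances)
ll_shuffled = gmm_sequence_loglik(shuffled, weights, means, variances)

# The two scores agree up to floating-point rounding.
print(math.isclose(ll_original, ll_shuffled, rel_tol=1e-9))
```

This order invariance is what an HMM or a recurrent network removes: those models condition each frame on what came before it.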

Questions such as 'what if the class source changes the rate of signal production?'

It may or may not harm accuracy, depending on the change.

or a question 'does the methodology examine only the latest temporal component (block/window)?'

It depends on how you apply it. You can apply it to a window or to the whole audio.
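As a rough sketch of both options, assume one GMM per class and a single scalar feature per frame. The class names, the hand-set mixture parameters and the feature values below are hypothetical, chosen only to show the mechanics. The recording is assigned to the class whose GMM gives the highest average frame log-likelihood, either once for the whole recording or once per sliding window.

```python
import math

def gmm_loglik_1d(x, components):
    """log p(x) for a 1-D mixture given as (weight, mean, variance) triples."""
    terms = [
        math.log(w) - 0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        for w, m, v in components
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def classify(frames, models):
    """Return the class whose GMM gives the highest average frame log-likelihood."""
    def avg_ll(components):
        return sum(gmm_loglik_1d(x, components) for x in frames) / len(frames)
    return max(models, key=lambda name: avg_ll(models[name]))

def classify_windows(frames, models, win, hop):
    """Classify each sliding window of `win` frames, advancing by `hop` frames."""
    return [
        classify(frames[start:start + win], models)
        for start in range(0, len(frames) - win + 1, hop)
    ]

# Hypothetical, hand-set class models over one scalar feature per frame.
models = {
    "class_a": [(0.5, -1.0, 0.5), (0.5, 0.0, 0.5)],
    "class_b": [(0.5, 4.0, 0.5), (0.5, 5.0, 0.5)],
}

# A recording whose first half resembles class_a and second half class_b.
frames = [-1.0, -0.5, 0.0, -0.8, 0.1, -0.2, 4.2, 4.8, 5.1, 4.5, 4.0, 5.0]

print(classify(frames, models))                # one label for the whole recording
print(classify_windows(frames, models, 3, 3))  # one label per 3-frame window
```

Scoring the whole recording forces a single label even when the source changes halfway through, while window-level scoring can follow the change at the cost of noisier decisions on short windows.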

So that if $\mathbf{\mu}_i \in \mathbb{R}^\tau$, the signal data $D$ being examined is $D_{T-\tau,\ldots,T}$?

It usually considers the whole audio and simply matches the mixture components against the data inside it. For a practical example, see:

https://de.mathworks.com/company/newsletters/articles/developing-an-isolated-word-recognition-system-in-matlab.html
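On the dimensionality question: in the frame-based setup that is common for audio GMMs, the model is fitted to one short feature vector per frame, so $\mathbf{\mu}_i$ has the dimension of that feature vector rather than the number of samples $\tau$. The sketch below illustrates this in plain Python. The two features (log energy and zero-crossing rate) are toy stand-ins for spectral features such as MFCCs, and the sample rate, frame length and hop size are assumed values, not taken from the linked article.

```python
import math

def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames of `frame_len` samples."""
    return [
        signal[start:start + frame_len]
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]

def frame_features(frame):
    """Two toy features per frame: log energy and zero-crossing rate."""
    energy = sum(s * s for s in frame) / len(frame)
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0.0) != (b >= 0.0)
    )
    return [math.log(energy + 1e-12), crossings / (len(frame) - 1)]

def extract(signal, frame_len=400, hop=160):
    """Turn a raw signal into a list of per-frame feature vectors."""
    return [frame_features(f) for f in frame_signal(signal, frame_len, hop)]

# Two synthetic sine tones of different duration at an assumed 16 kHz rate.
rate = 16000
short = [math.sin(2 * math.pi * 440 * n / rate) for n in range(rate // 2)]
long_ = [math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]

feats_short, feats_long = extract(short), extract(long_)

# (number of frames, feature dimension) for each recording
print(len(feats_short), len(feats_short[0]))
print(len(feats_long), len(feats_long[0]))
```

A longer recording yields more frames, but every frame has the same feature dimension, so a single GMM can score recordings of any length.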

Does it require moving from the pressure signal to a spectrogram, so that clustering is based on the cell entries? - Vass, Apr 13 '19 at 22:22
Yes, it uses the spectrum. The "pressure" signal is usually called the "amplitude" or "raw signal". - Nikolay Shmyrev, Apr 13 '19 at 22:25
Can you include this in your answer, along with the process of going from the raw signal to the 'image' on which the GMM is used? Can you also elaborate on different 'image' transformations such as Mel etc.? - Vass, Apr 13 '19 at 22:28
