Let's say I have some [multivariate] data and want to fit a GMM to it. So I have $P(x)=\sum_{i=1}^{n}\alpha_i\,N(x;\theta_i)$, where $x$ is an observation from the data, $\theta_i$ contains the mean and covariance matrix parameters for the $i$th Gaussian, and $\sum_{i=1}^{n}\alpha_i=1$, i.e. a constraint that ensures the mixture of Gaussians is a valid probability distribution.
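To make the model concrete, here is a minimal sketch of how I picture evaluating that density (using scipy's `multivariate_normal`; the two-component parameters below are just made-up placeholders):

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_density(x, alphas, means, covs):
    # P(x) = sum_i alpha_i * N(x; mu_i, Sigma_i)
    return sum(a * multivariate_normal.pdf(x, mean=m, cov=c)
               for a, m, c in zip(alphas, means, covs))

# toy 2-D mixture with two components (placeholder parameters)
alphas = [0.3, 0.7]                                  # must sum to 1
means  = [np.zeros(2), np.array([3.0, 3.0])]
covs   = [np.eye(2), 2.0 * np.eye(2)]
print(gmm_density(np.array([1.0, 1.0]), alphas, means, covs))
```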
As we know, this would be easy to set up a likelihood for and optimize via maximum likelihood. The optimized result would be a local optimum and a valid probability distribution (I could ensure a global optimum by doing something like self-contrastive estimation as described by Ian Goodfellow), but admittedly I'm now a bit stuck on interpretation.
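For what it's worth, this is the kind of direct maximum-likelihood setup I have in mind, a sketch only: I assume diagonal covariances for simplicity, push the weights through a softmax so $\sum_i\alpha_i=1$ holds automatically, and hand the negative log-likelihood to a generic quasi-Newton optimizer rather than EM.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import multivariate_normal

def unpack(params, n_comp, dim):
    # unconstrained parameter vector -> (alphas, means, variances)
    logits = params[:n_comp]
    means = params[n_comp:n_comp + n_comp * dim].reshape(n_comp, dim)
    log_var = params[n_comp + n_comp * dim:].reshape(n_comp, dim)
    alphas = np.exp(logits - logits.max())
    alphas /= alphas.sum()                 # softmax: mixing weights sum to 1
    return alphas, means, np.exp(log_var)  # exp keeps variances positive

def neg_log_lik(params, X, n_comp):
    alphas, means, variances = unpack(params, n_comp, X.shape[1])
    dens = np.zeros(X.shape[0])
    for a, m, v in zip(alphas, means, variances):
        dens += a * multivariate_normal.pdf(X, mean=m, cov=np.diag(v))
    return -np.sum(np.log(dens + 1e-300))  # guard against log(0)

# toy data: two well-separated blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)),
               rng.normal(4.0, 1.0, size=(200, 2))])
n_comp, dim = 2, X.shape[1]
x0 = np.concatenate([np.zeros(n_comp),                              # equal weights
                     X[rng.choice(len(X), n_comp, replace=False)].ravel(),  # mean starts
                     np.zeros(n_comp * dim)])                        # unit variances
res = minimize(neg_log_lik, x0, args=(X, n_comp), method="L-BFGS-B")
alphas_hat, means_hat, vars_hat = unpack(res.x, n_comp, dim)
print(alphas_hat)
print(means_hat)
```

The softmax/log-variance reparameterization is only there to keep the optimizer unconstrained; with full covariance matrices I'd parameterize through something like a Cholesky factor instead.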
The mixing weights $\alpha_i$ seem like they would represent the marginal probability of each group, i.e. $\alpha_i = P(\text{group}_i)$. But then would the $i$th Gaussian be something like the probability of group $i$ given the data, i.e. $P(\text{group}_1 \mid x), P(\text{group}_2 \mid x)$, etc., which when summed would act like the normalizing constant for $P(x \mid \text{group}_1), P(x \mid \text{group}_2)$, etc.? Or is the output of the $i$th Gaussian $P(x \mid \text{group}_i)$ (which would make more sense)? If the latter, then since $P(\text{group}_i \mid x)=\frac{P(x \mid \text{group}_i)\,P(\text{group}_i)}{P(x)}$ is what the EM algorithm outputs (the responsibilities), I could seemingly back-calculate very easily what EM would provide.
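Spelling out the back-calculation I have in mind: if the $i$th Gaussian returns $P(x \mid \text{group}_i)$ and $\alpha_i = P(\text{group}_i)$, then Bayes' rule would give

$$P(\text{group}_i \mid x) \;=\; \frac{\alpha_i\, N(x;\theta_i)}{\sum_{k=1}^{n} \alpha_k\, N(x;\theta_k)},$$

with the denominator being exactly the fitted mixture density $P(x)$.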
Any ideas on whether I'm viewing this correctly? It seems that if I just want to fit a flexible PDF to my data, my method would work, correct? I'm still trying to reconcile the difference with applying the EM algorithm to the same type of problem, even if the method I describe is legitimate...