
I have some trouble understanding the definition of the EM-algorithm. On Wikipedia they write the following mathematical expression:

$$ E_{Z|X,\Theta^{(t)}}[\log L(\Theta^{(t)}; X, Z)|X=x] $$

Now, let us rename $\Theta^{(t)}$ to just $\Theta$, and let us also recall that (for now) we do not want to take the fully Bayesian approach, i.e. $\Theta$ is not a random variable but just a fixed (though arbitrary) constant. $L(\Theta; X, Z)$ is simply $f_{X, Z}(x,z)$, i.e. the joint density of the random variables $X$ and $Z$. We will abbreviate this as $p(x,z)$. Hence, what they want to compute is

$$E[\log p(X,Z)|X=x] $$

So I know that this should be equal to $E[\log p(x,Z)|X=x] = \int_{\mathcal{Z}} \log p(x,z)\, p(z|x)\, dz$ and so on, but I am stuck at the very beginning: in order to write down a conditional expectation of any random variable $Y$ given $X$, one needs to know that $Y$ is integrable (i.e. in $L^1(\Omega)$). Hence, we need to know that the variable $$ \omega \mapsto \log p(X(\omega), Z(\omega))$$ is integrable.

Question: Why is this true?

What I have achieved so far: when I restrict myself to mixture models with the EM algorithm, this boils down to showing that for any $\Theta$, $$\int_{\mathbb{R}} \log (f(x;\Theta)) \cdot f(x;\Theta)\, dx < \infty$$ where $f$ is the density one puts into the mixture model. For example, this works when one takes $f$ to be the density of a normal distribution, but the question is: does this work for a general density function?
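To make the normal case explicit (a standard differential-entropy computation, spelled out here for concreteness): if $f(\cdot\,;\Theta)$ is the $N(\mu,\sigma^2)$ density, then

$$ -\int_{\mathbb{R}} \log\big(f(x;\Theta)\big)\, f(x;\Theta)\, dx = \int_{\mathbb{R}} \left( \tfrac{1}{2}\log(2\pi\sigma^2) + \tfrac{(x-\mu)^2}{2\sigma^2} \right) f(x;\Theta)\, dx = \tfrac{1}{2}\log\big(2\pi e \sigma^2\big) < \infty, $$

which is precisely the differential entropy of the normal distribution and is finite for every $\mu$ and every $\sigma^2 > 0$.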

Regards & Thanks in advance,

FW

Fabian Werner

2 Answers


We do not have $-\int_\mathbb{R} f \log(f)\, dx < +\infty$ in general.

I added the minus sign to show the relation of this integral with the concept of differential entropy.

Searching for this (infinite differential entropy) on stackexchange turns up a detailed answer in this post: Is differential entropy always less than infinity? The proposed example is the density defined, for $x > 2$, by $$f(x) = \frac{\log(2)}{x \log(x)^2}.$$
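This is not part of the original counterexample, just a quick numerical sketch (assuming NumPy and SciPy are available) to illustrate the behaviour: the mass of $f$ on $(2, U)$ tends to $1$, while $-\int_2^{U} f \log f \, dx$ keeps growing with $U$ (roughly like $\log \log U$), so the full integral is $+\infty$.

```python
import numpy as np
from scipy.integrate import quad

# Density from the linked post: f(x) = log(2) / (x * log(x)^2) for x > 2.
def f(x):
    return np.log(2.0) / (x * np.log(x) ** 2)

def piecewise_quad(g, upper):
    """Integrate g from 2 to `upper`, splitting the range into geometrically
    growing pieces so adaptive quadrature resolves the slowly decaying tail."""
    edges, total = [2.0], 0.0
    while edges[-1] < upper:
        edges.append(min(edges[-1] * 10.0, upper))
    for a, b in zip(edges[:-1], edges[1:]):
        total += quad(g, a, b)[0]
    return total

for upper in [1e2, 1e4, 1e8, 1e16]:
    mass = piecewise_quad(f, upper)  # analytically: 1 - log(2)/log(upper) -> 1
    neg_ent = piecewise_quad(lambda x: -f(x) * np.log(f(x)), upper)
    print(f"upper={upper:.0e}  mass={mass:.4f}  -int f log f = {neg_ent:.3f}")
```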

For common distributions, however, we have $-\int_\mathbb{R} f \log(f)\, dx < +\infty$ and we can apply EM (see for example: https://en.wikipedia.org/wiki/Differential_entropy#Differential_entropies_for_various_distributions ).

ahstat
  • Ok, thanks... so, let me get this straight: essentially you are saying that we need to assume some stuff about the density, and this assumption is just missing in the Wikipedia article. However, this assumption holds for many "natural" distributions... is that correct? – Fabian Werner Apr 26 '17 at 04:21
  • I thought about it again, and I think we can generally apply the EM algorithm, even when $\int f \log f = \infty$. In the proof of convergence of EM (to a solution which is not necessarily the maximum of the likelihood), we don't use $\mathbb{E}[...]$ explicitly but only need the sum over $Z$ (which is typically discrete). – ahstat Apr 26 '17 at 06:25
  • Yes, that's what I thought as well: if the function is not in $L^1$ then we actually define the E step as what it should be, namely $E[\log p(x,Z)|X=x]$... I do not know the convergence proof so well, I need to study it. – Fabian Werner Apr 26 '17 at 06:31

I am writing another answer because I think there is a problem at the very beginning.

On Wikipedia, there is no final "$X=x$", and what we want to compute is: $$Q = E_{Z|X=x,\Theta=\theta^{(t)}}[\log p(X=x, Z|\Theta=\theta)].$$

We write it in this form to emphasize that the distribution over which we take the expectation is "$Z|X=x,\Theta=\theta^{(t)}$" (this conditional distribution exists because $Z$ and $X$ are random variables with a joint distribution).

If we write it explicitly, we obtain:

$$Q = \sum_z \log{p(X=x,Z=z|\Theta=\theta)}\, dP(Z=z | X=x,\Theta = \theta^{(t)})$$

I write it as a sum because we know that $Z$ is discrete. I add "$dP$" only to insist that this probability comes from the conditional distribution of $Z$ given $X=x$ and $\Theta=\theta^{(t)}$.
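To make the sum concrete, here is a minimal sketch (my own illustration, assuming NumPy/SciPy; the two-component Gaussian mixture and all numbers are hypothetical, not taken from the answer) of computing $Q$ for a single observation $x$ when $Z \in \{0, 1\}$ indicates the mixture component:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical two-component Gaussian mixture; theta = (weights, means, sds).
x = 1.3                              # a single observed data point
weights_t = np.array([0.4, 0.6])     # current parameters theta^(t)
means_t = np.array([-1.0, 2.0])
sds_t = np.array([1.0, 0.5])

# Joint density p(X=x, Z=z | theta^(t)) for z = 0, 1.
joint_t = weights_t * norm.pdf(x, means_t, sds_t)

# Conditional P(Z=z | X=x, theta^(t)): the weights appearing in the sum for Q.
post_t = joint_t / joint_t.sum()

# Q(theta | theta^(t)): finite sum over z of log p(X=x, Z=z | theta), weighted by post_t.
def Q(weights, means, sds):
    log_joint = np.log(weights) + norm.logpdf(x, means, sds)
    return np.sum(post_t * log_joint)

print(Q(weights_t, means_t, sds_t))  # Q evaluated at theta = theta^(t)
```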

Additional note: The expression $Q = E_{Z|X=x,\Theta=\theta^{(t)}}[\log p(X=x, Z|\Theta=\theta)]$ might be a little difficult to understand. We can think of it as follows (from a Monte Carlo point of view): we simulate a sample $z_1, ..., z_N$ from the random variable $Z':=\lbrace Z|X=x,\Theta=\theta^{(t)} \rbrace$, and then we evaluate the mean of $\log p(X=x, Z=z_i|\Theta=\theta)$ (with $i$ from $1$ to $N$). When $N$ goes to infinity, we obtain the expectation $Q$.
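Along these lines, a tiny simulation (again my own sketch, reusing the same hypothetical mixture as above, restated so it runs on its own): the empirical mean over the simulated $z_i$ approaches the exact finite sum.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Same hypothetical setup: one observation and current parameters theta^(t).
x = 1.3
weights_t = np.array([0.4, 0.6])
means_t = np.array([-1.0, 2.0])
sds_t = np.array([1.0, 0.5])

log_joint_t = np.log(weights_t) + norm.logpdf(x, means_t, sds_t)  # log p(X=x, Z=z | theta^(t))
post_t = np.exp(log_joint_t) / np.exp(log_joint_t).sum()          # P(Z=z | X=x, theta^(t))

# Exact Q at theta = theta^(t): a finite sum over z in {0, 1}.
Q_exact = np.sum(post_t * log_joint_t)

# Monte Carlo version: simulate z_1, ..., z_N from Z' = (Z | X=x, theta^(t))
# and average log p(X=x, Z=z_i | theta^(t)) over the sample.
N = 200_000
z_samples = rng.choice(2, size=N, p=post_t)
Q_mc = log_joint_t[z_samples].mean()

print(Q_exact, Q_mc)  # the two agree up to Monte Carlo error
```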

ahstat
  • Actually, expressions E[...|...] are perfectly well defined mathematical objects. What makes it difficult to understand is the E_... notation (this is not standard)... Anyhow, it could be the case that they actually mean E[log L(Theta, x, Z) |...] by emphasising the Z|... in the subscript. – Fabian Werner Apr 27 '17 at 12:05