
In its general form the E-step of the EM algorithm finds the expectation

$$ Q(\theta|\theta') =\int \log[ p(Y,Z | \theta)] p(Z|Y,\theta') d Z$$

where $Y$ is the data, $Z$ the latent variables, $\theta'$ the current parameter value, and $l(\theta|Y,Z) = \log p(Y,Z|\theta)$ the complete-data log-likelihood.
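For intuition, when $p(Z|Y,\theta')$ can be sampled, this expectation can be approximated by plain Monte Carlo. A minimal sketch with a toy Gaussian model (the model and all names in it are illustrative assumptions, not part of the question):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model (an illustrative assumption, not the question's model):
#   Z ~ N(theta, 1),  Y | Z ~ N(Z, 1)
# so that Z | Y, theta' ~ N((Y + theta')/2, 1/2) is easy to sample.

def log_complete_likelihood(y, z, theta):
    """log p(Y, Z | theta) up to additive constants."""
    return -0.5 * (y - z) ** 2 - 0.5 * (z - theta) ** 2

def Q_monte_carlo(theta, theta_prime, y, n_samples=100_000):
    """Monte Carlo estimate of Q(theta | theta') = E_{Z|Y,theta'}[log p(Y,Z|theta)]."""
    z = rng.normal((y + theta_prime) / 2.0, np.sqrt(0.5), size=n_samples)
    return log_complete_likelihood(y, z, theta).mean()

# Here the M-step has the closed form argmax_theta Q = E[Z | Y, theta'],
# so EM iterates theta <- (y + theta)/2 and converges to the MLE theta = y.
y, theta = 3.0, 0.0
for _ in range(50):
    theta = (y + theta) / 2.0
print(theta)  # -> approximately 3.0
```

The same Monte Carlo idea carries over whenever the conditional of the latent variables can be sampled, which is the crux of the question below.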

My general question is: does EM require us to know the joint conditional (predictive) distribution of the latent variables, $p(Z|Y,\theta)$?


Context. This question is phrased with more context as follows. Suppose $Z=[X,S]$ consists of two random variables parameterised by distinct vectors $\lambda$ and $\phi$, respectively; i.e., we know (can evaluate and sample from) $p(X,S|\lambda,\phi)=p(X|\lambda)p(S|\phi)$. For my problem, the data $Y$ are a deterministic function $Y=f(X,S)$.

Variables $X$ and $S$ are marginally independent; however, they are not independent conditional on $Y$. We also know the complete-data likelihood; since $Y=f(X,S)$ is deterministic given $(X,S)$, it is

$$p(Y,X,S|\lambda, \phi) = p(X,S|\lambda, \phi) = p(X|\lambda)p(S|\phi).$$

We also know the full conditionals $p(X|S,Y,\lambda)$ and $p(S|X,Y,\phi)$; these are truncated distributions that can be derived. Substituting into the E-step above gives

$$ \int \log[ p(X|\lambda)p(S|\phi) ] p(X,S|Y, \phi',\lambda') dXdS$$

As noted by Xi'an, the E-step then considers

$$ Q(\lambda, \phi|\lambda',\phi') = E_{X,S|Y,\lambda',\phi'}[l(\lambda|X)] +E_{X,S|Y,\lambda',\phi'}[l(\phi|S)] $$

which could be maximized in turn in an expectation conditional maximization (ECM) algorithm. However, for the problem I am working on, the joint predictive distribution $p(X,S|Y, \phi,\lambda)$ is not known. We can use the (known) truncated full conditionals to write

$$ p(X|S,Y, \lambda) p(S|Y,\phi) = p(S|X,Y, \phi) p(X|Y,\lambda)$$

However, the marginals $p(X|Y,\lambda)$ and $p(S|Y,\phi)$ are unknown for my problem. As Xi'an noted, the expected complete-data log-likelihood simplifies to

$$ Q(\lambda, \phi|\lambda',\phi') = E_{X|Y,\lambda'}[l(\lambda|X)] +E_{S|Y,\phi'}[l(\phi|S)] $$

involving the same unknown densities.

Is there still a way to use a type of EM algorithm to solve this estimation problem?


Speculative procedure. Suppose we knew $S=s_0$; then we could use EM with

$$ \tilde{Q}_1(\lambda, \phi|\lambda',\phi') = E_{X|Y,s_0,\lambda'}[l(\lambda|X)] $$

Similarly, if we knew $X=x_0$,

$$ \tilde{Q}_2(\lambda, \phi|\lambda',\phi') = E_{S|Y,x_0,\phi'}[l(\phi|S)] $$

So I surmise that a procedure with the following updates might be a valid EM:

  1. Sample $s_0 \sim p(S|\phi')$
  2. Update $\lambda'$ by maximizing $\tilde{Q}_1$
  3. Sample $x_0 \sim p(X|\lambda')$
  4. Update $\phi'$ by maximizing $\tilde{Q}_2$

Then iterate until convergence. This is not pure EM, since it uses a data-augmentation step, but perhaps something like this is a known method?
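For concreteness, steps 1-4 can be sketched under the same kind of illustrative stand-in model as assumed earlier ($X \sim \text{Exp}(\lambda)$, $S \sim \text{Exp}(\phi)$, $Y = \lfloor X + S \rfloor$; none of this comes from the actual problem). One wrinkle shows up immediately: a raw draw $s_0 \sim p(S|\phi')$ can be inconsistent with the observed $Y$, so the sketch rejects until it is, which effectively draws from a truncated version of the marginal:

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in model, purely an assumption for illustration:
#   X ~ Exp(lam), S ~ Exp(phi), Y = floor(X + S).

def sample_trunc_exp(rate, lo, hi):
    """Inverse-CDF draw from Exp(rate) truncated to [lo, hi]."""
    a, b = np.exp(-rate * lo), np.exp(-rate * hi)
    return -np.log(a - rng.uniform() * (a - b)) / rate

def mean_trunc_exp(rate, lo, hi, n=300):
    """Monte Carlo E[X] for Exp(rate) truncated to [lo, hi]."""
    return np.mean([sample_trunc_exp(rate, lo, hi) for _ in range(n)])

lam, phi = 1.0, 0.5
y_data = rng.integers(0, 6, size=40)     # stand-in observed data
for _ in range(20):
    # 1. sample s0 ~ p(S | phi'); a raw draw can be inconsistent with the
    #    observed y (it needs s0 < y + 1), so reject until it is consistent.
    # 2. update lam by maximizing Qtilde_1: lam <- 1 / mean of E[X | y, s0, lam'].
    ex = []
    for y in y_data:
        s0 = rng.exponential(1.0 / phi)
        while s0 >= y + 1.0:
            s0 = rng.exponential(1.0 / phi)
        ex.append(mean_trunc_exp(lam, max(0.0, y - s0), y + 1.0 - s0))
    lam = 1.0 / np.mean(ex)
    # 3. sample x0 ~ p(X | lam'); 4. update phi by maximizing Qtilde_2 analogously.
    es = []
    for y in y_data:
        x0 = rng.exponential(1.0 / lam)
        while x0 >= y + 1.0:
            x0 = rng.exponential(1.0 / lam)
        es.append(mean_trunc_exp(phi, max(0.0, y - x0), y + 1.0 - x0))
    phi = 1.0 / np.mean(es)
print(lam, phi)
```

Because each parameter update conditions on a single random imputation of the other latent variable, the iterates keep fluctuating rather than converging monotonically, which is consistent with the caveat that this resembles stochastic data augmentation more than EM proper.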

tomka
  • It all depends on the form of the full likelihood. If it breaks into terms depending only on $X$ and terms only depending on $S$ then marginals are good enough. Else you need the joint. Monte Carlo EM with Gibbs steps may be a practical solution to your difficulty. – Xi'an Feb 22 '20 at 08:38
  • @Xi'an In fact the full likelihood in my problem factors $p(Y,X,S|\phi,\lambda) = p(X|\lambda)\,p(S|\phi)$ - what would be the right approach then? I was thinking along the lines of holding $\phi$ and $S$ fixed and then use EM on $X$ and $\lambda$, then hold the result fixed and use EM for $\phi$ and $S$. Then iterate. Does this make sense? It is like expectation conditional maximization (ECM) I believe. – tomka Feb 22 '20 at 11:32
  • if you mean that $$p(Y,X,S|\phi,\lambda) = p(X,Y|\lambda)\times p(S,Y|\phi)$$ then $$\log p(Y,X,S|\phi,\lambda) = \log p(X,Y|\lambda)+\log p(S,Y|\phi)$$ and the posterior marginals are enough to run the E step of the EM algorithm, since $$\mathbb E[\log p(Y,X,S|\phi,\lambda)|Y]=\mathbb E[\log p(X,Y|\lambda)|Y]+\mathbb E[\log p(S,Y|\phi)|Y]$$ If this is not feasible, ECM can indeed be used, meaning iterating between EM steps on $X$ and on $S$. – Xi'an Feb 22 '20 at 18:19
  • @Xi'an I fully revised my question following our discussion and your remarks and would be happy if you could have another look. Thank you. – tomka Feb 24 '20 at 17:36
  • @Xi'an Yes, I say that actually: if $Y=f(X,S)$ then even if "Variables $X,S$ are marginally independent; [...] they are not independent conditional on $Y$" – tomka Feb 24 '20 at 17:55
  • Note that$$Q(\lambda, \phi|\lambda',\phi') = E_{X|Y,\lambda'}[l(\lambda|X)] +E_{S|Y,\phi'}[l(\phi|S)]$$only depends on the marginals. – Xi'an Feb 24 '20 at 18:03
  • @Xi'an You are right. But I do not manage to derive $p(X|Y,\lambda'),p(S|Y,\phi')$. I only manage to derive the full conditionals $p(X|Y,S,\lambda'),p(S|Y,X,\phi')$ which are truncated distributions in my case. – tomka Feb 24 '20 at 18:16
  • @Xi'an I have updated my question acknowledging your result and also suggested some method that might work, but which I am not certain about. – tomka Feb 24 '20 at 19:43

0 Answers