Suppose we have a latent r.v. $Z$ (not observed) and an observed r.v. $X$, where $X$ depends on $Z$ via some conditional distribution $p(x|z)$. Given $x$, we will try to infer $z$.

Standard maximum likelihood inference asks: given $x$, find $z^*$ that maximizes $p(x|z^*)$.

Consider the following alternate "variational" method: we find the distribution $p^*(z)$ that maximizes $\sum_z p(x|z) p^*(z)$, then find the $z^*$ that maximizes $p^*(z^*)$.

Do these two methods always yield the same result $z^*$?

D.W.

1 Answer

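Yes, provided the maximizer is unique. The objective $\sum_z p(x|z) p^*(z)$ is a weighted average of the values $p(x|z)$, so it can never exceed $\max_z p(x|z)$, and it attains that bound exactly when $p^*$ puts all of its weight on maximizers. So if $z^*$ is the unique maximizer of $p(x|z)$, the optimal $p^*$ is the one-hot distribution that assigns all of its weight to $z^*$, i.e., $p^*(z^*)=1$ and $p^*(z)=0$ for all $z \ne z^*$, and maximizing $p^*$ returns the same $z^*$. If several values of $z$ tie for the maximum, any distribution supported on those values is optimal, and the second step still returns one of the maximum likelihood solutions.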
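As a quick numerical sanity check, here is a minimal sketch in plain Python. The likelihood values and the helper names (`objective`, `one_hot_at_argmax`, `random_distribution`) are made up for illustration and are not part of the question. It compares the one-hot distribution at the maximum likelihood $z^*$ against many randomly drawn distributions over $z$ and checks that none of them achieves a larger value of $\sum_z p(x|z) p^*(z)$.

```python
import random


def objective(likelihood, q):
    """Value of sum_z p(x|z) q(z) for a distribution q over z."""
    return sum(l * w for l, w in zip(likelihood, q))


def one_hot_at_argmax(likelihood):
    """Distribution that puts all mass on the first maximizer of p(x|z)."""
    z_star = max(range(len(likelihood)), key=likelihood.__getitem__)
    return [1.0 if z == z_star else 0.0 for z in range(len(likelihood))]


def random_distribution(n, rng):
    """A random point on the probability simplex (normalized positive weights)."""
    weights = [rng.random() + 1e-12 for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(0)

# Hypothetical values of p(x|z) for five latent states z = 0..4 and a fixed x.
likelihood = [0.10, 0.45, 0.05, 0.30, 0.20]

q_star = one_hot_at_argmax(likelihood)
best = objective(likelihood, q_star)

# No randomly drawn distribution should beat the one-hot candidate.
for _ in range(10_000):
    q = random_distribution(len(likelihood), rng)
    assert objective(likelihood, q) <= best + 1e-12

# Method 1: maximize p(x|z) directly. Method 2: take the mode of q_star.
z_ml = max(range(len(likelihood)), key=likelihood.__getitem__)
z_var = max(range(len(q_star)), key=q_star.__getitem__)
print(z_ml, z_var, best)
```

Random sampling only illustrates the bound; it does not prove it. The proof is the weighted-average argument: a convex combination of the numbers $p(x|z)$ cannot exceed their maximum.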

D.W.