4

According to this lecture note, Eq. 25 gives the coordinate ascent update for latent variable $z_k$ as follows $$q^*(z_k)\propto\exp(E_{-k}[\log{p(z_k,Z_{-k},x)}])$$ and I understand the derivation for this formula. But in the following Bayesian mixtures of Gaussians example section, try to find the specific update formula for latent variable $z_i$ as the following picture: enter image description here

Eq. 36 is the same as the above Eq. 25, what confuses me is Eq. 38, why omit the terms which don't rely on $z_i$?

I think if we substitute Eq. 37 back into 36, we will have more terms than Eq. 38, like this $$q^*(z_i)\propto\exp(\log\pi_{z_i}+E[\log p(x_i|\mu_{z_i})]+E[x])$$

where $E[x]$ is a short-hand notation for those more terms, I just don't quite understand why we could omit them? Just because it could be re-written as follows, which is like a constant multiplying the $\exp$? $$q^*(z_i)\propto \exp(E[x])\cdot\exp(\log\pi_{z_i}+E[\log p(x_i|\mu_{z_i})])=\alpha\cdot \exp(\log\pi_{z_i}+E[\log p(x_i|\mu_{z_i})])$$

avocado
  • 3,045
  • 5
  • 32
  • 45

1 Answers1

2

Your last equation has just about arrived at the answer. Take that last equation:

$$q^*(z_i)\propto \alpha\cdot \exp(\log\pi_{z_i}+E[\log p(x_i|\mu_{z_i})])$$

and explicitly write the $\propto$ relation like so

$$q^*(z_i) = \frac{\alpha\cdot \exp(\log\pi_{z_i}+E[\log p(x_i|\mu_{z_i})])}{\sum_i \alpha\cdot \exp(\log\pi_{z_i}+E[\log p(x_i|\mu_{z_i})])}$$

then we realize that because it's constant with respect to $z$, $\alpha$ cancels out.

In a more general, if hand-wavy, way: When developing a gradient method, we take the gradient of the objective, and any terms that are constant with respect to the variable of interest have a derivative of zero.

Sean Easter
  • 8,359
  • 2
  • 29
  • 58