
In PRML Chapter 10, Approximate Inference, the first sentence of the chapter says

A central task in the application of probabilistic models is the evaluation of the posterior distribution $p(Z|X)$ of the latent variables $Z$ given the observed (visible) data variables $X$, and the evaluation of expectations computed with respect to this distribution.

The book then uses EM as an example: in the E step we evaluate the posterior $p(Z|X)$, and in the M step we maximize the expectation of the complete-data log likelihood taken with respect to $p(Z|X)$, which is clear.

What are some other common problems that require the evaluation of the posterior $p(Z|X)$ (of latent variables)?


Update
After a year I've realized that, at the time this question was posted, my experience with machine learning was mostly limited to solving MLE/MAP by gradient descent, which only requires computing a likelihood and its gradient. The following are some models I've worked with since then that require evaluating the posterior probability.

EM
EM itself is an optimization algorithm for MLE/MAP; nevertheless, in every iteration the E-step computes the posterior $p(Z|X)$, which can be computed efficiently in the HMM/GMM setting.
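For concreteness, here is a minimal sketch of the GMM E-step (NumPy/SciPy; the data and mixture parameters below are just placeholders), where the posterior $p(z_n=k|x_n)$ is the usual responsibility of component $k$ for point $n$:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_e_step(X, weights, means, covs):
    """E-step of EM for a Gaussian mixture: compute the posterior
    p(z_n = k | x_n), i.e. the responsibility of component k for point n."""
    N, K = X.shape[0], len(weights)
    resp = np.zeros((N, K))
    for k in range(K):
        # joint p(x_n, z_n = k) = pi_k * N(x_n | mu_k, Sigma_k)
        resp[:, k] = weights[k] * multivariate_normal.pdf(X, means[k], covs[k])
    # normalize over components to get the posterior p(z_n = k | x_n)
    resp /= resp.sum(axis=1, keepdims=True)
    return resp

# toy data and parameters (placeholders, just to show the shapes)
X = np.random.randn(100, 2)
weights = np.array([0.5, 0.5])
means = [np.zeros(2), np.ones(2)]
covs = [np.eye(2), np.eye(2)]
responsibilities = gmm_e_step(X, weights, means, covs)
```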

Recursive filtering
In the prediction step we compute the marginal $$p(z_t|x_{1:t-1})=\int p(z_t|z_{t-1})p(z_{t-1}|x_{1:t-1})dz_{t-1}$$ and in the correction step we compute the posterior $$p(z_t|x_t, x_{1:t-1})\propto p(x_t|z_t)p(z_t|x_{1:t-1})$$
The Kalman filtering algorithm assumes an underlying linear Gaussian system, so both the marginal and the posterior have closed-form solutions.
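A minimal sketch of one Kalman predict/correct cycle (NumPy only; the matrices `F`, `Q`, `H`, `R` stand in for a specific linear Gaussian system), showing that both steps are closed-form Gaussian updates:

```python
import numpy as np

def kalman_step(mu, P, x_t, F, Q, H, R):
    """One predict/correct cycle of the Kalman filter.
    mu, P: mean and covariance of p(z_{t-1} | x_{1:t-1})."""
    # prediction: p(z_t | x_{1:t-1}) = N(mu_pred, P_pred)
    mu_pred = F @ mu
    P_pred = F @ P @ F.T + Q
    # correction: p(z_t | x_{1:t}) is proportional to p(x_t | z_t) p(z_t | x_{1:t-1})
    S = H @ P_pred @ H.T + R              # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)   # Kalman gain
    mu_new = mu_pred + K @ (x_t - H @ mu_pred)
    P_new = (np.eye(len(mu)) - K @ H) @ P_pred
    return mu_new, P_new
```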

The extended Kalman filtering algorithm is designed for nonlinear Gaussian systems. In this setting the posterior no longer has a closed-form solution, so the nonlinear model is linearized to obtain a Gaussian approximation of $p(x_t|z_t)$, so that the posterior becomes Gaussian again.
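As a sketch of the linearization idea (a scalar state and observation for simplicity; `f`, `h` are hypothetical transition/observation functions and `df`, `dh` their derivatives), one EKF cycle looks like the Kalman step above with Jacobians replacing the fixed matrices:

```python
def ekf_step(mu, P, x_t, f, df, h, dh, Q, R):
    """One extended Kalman filter cycle for a scalar nonlinear system.
    f, h: transition and observation functions; df, dh: their derivatives."""
    # prediction: propagate the mean through f, linearize around mu
    mu_pred = f(mu)
    F = df(mu)                      # Jacobian of the transition
    P_pred = F * P * F + Q
    # correction: linearize the observation model around mu_pred
    H = dh(mu_pred)                 # Jacobian of the observation
    S = H * P_pred * H + R
    K = P_pred * H / S
    mu_new = mu_pred + K * (x_t - h(mu_pred))
    P_new = (1 - K * H) * P_pred
    return mu_new, P_new
```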

The particle filtering algorithm, in contrast, uses sampling methods to approximate a broader range of probability distributions (and nonlinear dynamics).
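A minimal bootstrap particle filter step (NumPy; `transition_sample` and `likelihood` are hypothetical stand-ins for the system's dynamics and observation model), where the posterior is represented by a resampled set of particles:

```python
import numpy as np

def particle_filter_step(particles, x_t, transition_sample, likelihood):
    """One step of a bootstrap particle filter.
    particles: samples approximating p(z_{t-1} | x_{1:t-1})."""
    # prediction: propagate each particle through the (possibly nonlinear) dynamics
    particles = transition_sample(particles)
    # correction: weight each particle by the observation likelihood p(x_t | z_t)
    weights = likelihood(x_t, particles)
    weights /= weights.sum()
    # resample to obtain an unweighted approximation of p(z_t | x_{1:t})
    idx = np.random.choice(len(particles), size=len(particles), p=weights)
    return particles[idx]
```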

dontloo

1 Answer


Any method involving a latent variable can be represented in the terms you mention, and the EM algorithm is often used as part of the estimation or prediction procedure for $Z$, such as

  1. Mixture modeling, such as latent class analysis
  2. Item response theory
  3. Factor analysis

Note that the use of the posterior distribution of $Z$ is not always obvious, since not all estimation approaches use Bayesian statistics. However, in each of the three examples distinct representations can be given to $X$ and $Z$, i.e.

  1. Latent class analysis: $Z$ is a discrete latent variable representing class membership, $X$ is a vector of the observed discrete indicators of latent classes.
  2. Item response theory: $Z$ is a continuous variable representing a "trait" or other true cause of $X$, which is a vector of observed discrete indicators of the latent trait.
  3. Factor analysis: $Z$ is a continuous variable representing a trait or other true cause of $X$, which is a vector of continuous indicators of the latent factor.

In all of the above examples, $Z$ may also be a vector, but there are many practical applications where unidimensionality of $Z$ is assumed.
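For instance, in latent class analysis the posterior over $Z$ is a direct application of Bayes' rule. A minimal sketch (NumPy, with made-up class priors and Bernoulli item probabilities) might look like:

```python
import numpy as np

def lca_posterior(x, class_priors, item_probs):
    """Posterior p(Z = k | X = x) for a latent class model with binary
    indicators: x is a 0/1 vector, item_probs[k, j] = p(x_j = 1 | Z = k)."""
    # likelihood of x under each class, assuming conditional independence of items
    lik = np.prod(item_probs ** x * (1 - item_probs) ** (1 - x), axis=1)
    post = class_priors * lik
    return post / post.sum()

# toy example: 2 latent classes, 3 binary indicators (numbers are placeholders)
class_priors = np.array([0.6, 0.4])
item_probs = np.array([[0.9, 0.8, 0.7],
                       [0.2, 0.3, 0.1]])
print(lca_posterior(np.array([1, 1, 0]), class_priors, item_probs))
```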

tomka