Latent variables, overparameterization and MCMC convergence in bayesian models

Question

Sometimes I have a large number of latent variables in a Bayesian hierarchical model to which, but I am only interested in estimating projected transformations of those latent variables (for example, I will parameterize a binomial parameter as an inverse logit of a set of possibly-non-identifiable covariates, even though the result I'm interested in is the binomial parameter estimate).

The projected transformations will often converge very quickly (based on convergence diagnostics such as the Gelman/Rubin or by eyeballing the posterior samples) even if the latent variables have not yet converged.

Intuitively this makes sense, the model may be an overparameterization where the latent parameters are not identifiable - the derived quantities are constrained to be in a constrained a narrow high-likelihood region of the transformed variables' parameter space which maps to a much larger largely flat likelihood (but bounded) region of the latent variable parameter space.

So is the intuition correct that I shouldn't be concerned that the overparameterized latent variables are not identifiable and aren't fully converged when I take my posterior samples? Are there some good references which discuss the use of non-identified latent variables in this way? I've heard some discussion on overparameterizing to speed up mcmc convergence, but I'm not entirely clear on how to think about this, as the approaches and attitudes towards overparameterization and non-identifiability in bayesian methods seems to be a bit different than in other areas of modeling.

Dear anonymous user, you could become a much better CV citizen by providing some semblance of a name and accepting more answers. — StasK, Sep 01 '12 at 19:04
I went back and accepted a few more answers in my history... To be honest, I can't see the checkmark shades very well and I forget if I've accepted an answer sometimes. Regarding anonymity, why does it matter what my name is if I'm making an effort to contribute content? — user4733, Sep 01 '12 at 19:23

score 6 · Accepted Answer · edited Mar 09 '17 at 17:30

So is the intuition correct that I shouldn't be concerned that the overparameterized latent variables are not identifiable and aren't fully converged when I take my posterior samples?

I think your intuition is correct: you shouldn't be concerned that the overparameterized latent variables are not identifiable and aren't fully converged. In fact, the latent variables likely can't converge; my understanding is that in this situation the full state space chain is null recurrent, even though by your account there is a transformed state space of smaller dimension in which the chain is full recurrent (and hence has a stationary distribution). For what it's worth, I have deliberately created and used such MCMC chains myself in my applied research.

Sometimes stochastic processes with these features are used to model time series data (key word: cointegration). A quick look at this plot might generate some intuition:

The upper figure shows two price time series, which one might think of as nonstationary in due to inflation even though no inflation can be seen on the time scale of the plot. Although each time series taken alone is nonstationary, there can exist a smaller dimensional manifold within the full state space (in this case, the "spread", i.e., the difference of the time series) such that the stochastic process generated by projecting the original process onto the manifold is stationary.

Are there some good references which discuss the use of non-identified latent variables in this way?

I don't know of any references that discuss the use of non-identified latent variables in this exact way, but here are a technical report and a published paper on the subject by Andrew Gelman, and here is a more recent manuscript by a different author that I think might be closer to what you're doing than the previous two references.

what if you want to do some inference for the latent variables? Can you use some trick to constrain those latent variables? For example, use a truncated proposal distribution only in one feasible domain, not the whole space? — Albert Chen, Oct 21 '21 at 01:05

Latent variables, overparameterization and MCMC convergence in bayesian models

1 Answers1

Linked