
I'm working to implement Bayesian model selection among models whose posteriors have already been sampled via MCMC. After reviewing some discussions of Bayes factors, I understand that they are sensitive to the choice of prior, that computing them via the harmonic mean of marginal likelihood samples is treacherous, and that alternative information criteria are available.

[ $y$=data, $M$=model, $\vec{\theta}$=parameters, $V_{\vec{\theta}}$=volume of parameter space, $N$=# MCMC samples ]

Question 1)

The Bayes factor is the ratio of marginal likelihoods of two models, $$\frac{\text{p}(y|M_0)}{\text{p}(y|M_1)}.$$ For a given model, is it OK to think of the marginal likelihood as an integral over the parameter posterior? I.e. $$ \begin{split} \text{p}(y|M_i) & = \int \text{p}(y|\vec{\theta},M_i) \text{p}(\vec{\theta},M_i)\text{d}\vec{\theta} \\ & \propto \int \text{p}(\vec{\theta}|y,M_i)\text{d}\vec{\theta} \text{ ?} \end{split} $$

Question 2)

If my understanding in Question 1 is correct, when would it be safe to calculate $\text{p}(y|M_i)$ using an MC approximation of $\int \text{p}(\vec{\theta}|y,M_i) \text{d}\vec{\theta}$? If my MCMC procedure has already converged on the full parameter posterior, and each parameter's marginal posterior appears well-sampled, can I reliably calculate $\text{p}(y|M_i)$ using $$ \text{p}(y|M_i) \propto \int \text{p}(\vec{\theta}|y,M_i) \text{d}\vec{\theta} \approx \frac{V_{\vec{\theta}}}{N} \sum_{j=1}^{N} \text{p}(\vec{\theta}_j|y,M_i) \text{ ,} $$ where the $\vec{\theta}_j$ are the MCMC samples?

The linked thread on computation of the marginal likelihood from MCMC samples seems to give a detailed answer, but I'd first like to know whether I'm on track with my initial concept of what goes into a calculation of $\text{p}(y|M_i)$ from MCMC samples.

1 Answer


Regarding 1, your expression needs to be adjusted a little: \begin{split} \text{p}(y|M_i) & = \int \text{p}(y,\vec{\theta} | M_i)\text{d}\vec{\theta} \\ & =\int \text{p}(y|\vec{\theta},M_i) \text{p}(\vec{\theta} |M_i)\text{d}\vec{\theta} \\ &= E_{\vec{\theta} |M_i}[\text{p}(y|\vec{\theta},M_i)]. \end{split} Your $\int \text{p}(\vec{\theta}|y,M_i)\text{d}\vec{\theta}$ is actually just equal to $1$ (because it's the integral of a density), and your $\int \text{p}(y|\vec{\theta},M_i) \text{p}(\vec{\theta},M_i)\text{d}\vec{\theta}$ is equal to $\text{p}(y,M_i)$ (not $\text{p}(y|M_i)$).

This does not use draws from the posterior, though. For that you would need a different expression. This is explained in the thread you linked to.

Regarding 2, yes, that's quite a meaty thread. I'd start there. You might be trying to write $$ E_{\vec{\theta} |M_i}[\text{p}(y|\vec{\theta},M_i)] \approx N^{-1}\sum_{j=1}^N\text{p}(y|\vec{\theta}^j,M_i) \tag{1} $$ where $\vec{\theta}^j$ are samples from the prior. That's one version of a Monte-Carlo or importance sampling estimate.
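For concreteness, here is a minimal sketch of estimate (1) in Python. The model (unit-variance normal likelihood with a standard normal prior on the mean), the data, and all variable names are my own assumptions for illustration, not from the thread; I chose a conjugate model so the exact marginal likelihood is available for comparison:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model (an assumption for illustration): y_k ~ N(theta, 1), prior theta ~ N(0, 1)
y = rng.normal(0.5, 1.0, size=20)
n = len(y)

def likelihood(theta, y):
    """p(y | theta, M) evaluated at a vector of theta values."""
    sq = np.sum((y[None, :] - theta[:, None]) ** 2, axis=1)
    return np.exp(-0.5 * sq) / (2 * np.pi) ** (len(y) / 2)

# Estimate (1): average the likelihood over draws from the PRIOR (not the posterior)
N = 200_000
theta_prior = rng.normal(0.0, 1.0, size=N)
ml_estimate = likelihood(theta_prior, y).mean()

# Exact marginal likelihood for this conjugate model: y ~ N(0, I + 11^T), so
# log p(y) = -n/2 log(2*pi) - 1/2 log(1+n) - 1/2 (sum y^2 - (sum y)^2 / (1+n))
log_ml_exact = (-0.5 * n * np.log(2 * np.pi)
                - 0.5 * np.log(1 + n)
                - 0.5 * (np.sum(y ** 2) - np.sum(y) ** 2 / (1 + n)))

print(np.log(ml_estimate), log_ml_exact)  # the two should agree closely here
```

Note that the draws come from the prior, not from an MCMC chain targeting the posterior. When the prior is much more diffuse than the posterior, very few draws land where the likelihood is large and the variance of this estimator blows up, which is one reason to prefer an importance sampling estimator like (2) with a better proposal.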

Edit: as was mentioned in the comments, you might consider using another importance sampling estimator, one with a different proposal/instrumental density $\text{q}$: $$ N^{-1}\sum_{j=1}^N\frac{\text{p}(y|\vec{\theta}^j,M_i)\, \text{p}(\vec{\theta}^j |M_i)}{ \text{q}(\vec{\theta}^j)} \tag{2}, $$ where the $\vec{\theta}^j$ are now draws from $\text{q}$.
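A sketch of estimator (2) on the same kind of toy conjugate model (unit-variance normal likelihood, standard normal prior on the mean). The model and the choice of a normal proposal roughly matched to the posterior, slightly widened, are my own assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy model (an assumption for illustration): y_k ~ N(theta, 1), prior theta ~ N(0, 1)
y = rng.normal(0.5, 1.0, size=20)
n = len(y)

# Proposal q: a normal roughly matched to the posterior (known in closed form here),
# widened a bit so its tails dominate the target's
post_mean = np.sum(y) / (1 + n)
post_sd = 1 / np.sqrt(1 + n)
q_mean, q_sd = post_mean, 2 * post_sd

# Estimate (2): draw from q and average p(y|theta) p(theta) / q(theta)
N = 200_000
theta = rng.normal(q_mean, q_sd, size=N)
log_lik = (-0.5 * np.sum((y[None, :] - theta[:, None]) ** 2, axis=1)
           - 0.5 * n * np.log(2 * np.pi))
log_prior = -0.5 * theta ** 2 - 0.5 * np.log(2 * np.pi)
log_q = (-0.5 * ((theta - q_mean) / q_sd) ** 2
         - np.log(q_sd) - 0.5 * np.log(2 * np.pi))
ml_is = np.mean(np.exp(log_lik + log_prior - log_q))

# Exact value for comparison: y ~ N(0, I + 11^T)
log_ml_exact = (-0.5 * n * np.log(2 * np.pi)
                - 0.5 * np.log(1 + n)
                - 0.5 * (np.sum(y ** 2) - np.sum(y) ** 2 / (1 + n)))
```

Because $\text{q}$ concentrates where the likelihood is large, this typically needs far fewer draws than averaging over the prior; the weights stay well-behaved as long as the tails of $\text{q}$ dominate those of $\text{p}(y|\vec{\theta})\text{p}(\vec{\theta})$.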

Taylor
  • Yes, with an importance weight, another distribution works better. The only one that does not work is the posterior itself (harmonic mean estimator warning!). – Xi'an Mar 05 '19 at 19:10
  • @Taylor, each step in my MCMC chain represents a $\vec{\theta}^j$ (drawn from joint prior) for which $p(y|\vec{\theta}^j,M_i)$ was calculated, so I should be able to use those existing samples to approximate $E_{\vec{\theta}|M_i}[\text{p}(y|\vec{\theta},M_i)]$. Regarding my second question, if my MCMC procedure has already converged upon the joint parameter posterior, can I be confident in going ahead and using those converged $p(y|\vec{\theta}^j,M_i)$ samples in calculating $E_{\vec{\theta}|M_i}$? – curiousStudent Mar 05 '19 at 19:51
    @Xi'an I apologize I was thinking $\text{p}(\vec{\theta} |M_i)$ is the posterior instead of the prior(!) – Taylor Mar 05 '19 at 20:23
  • @curiousStudent samples are from the prior in estimate (1). In (2), you could set $q$ equal to the posterior, and evaluate that fraction for each sample, but that would probably give you a high-variance estimate. – Taylor Mar 05 '19 at 20:25