
Let

  • $(E,\mathcal E,\lambda),(E',\mathcal E',\lambda')$ be measure spaces
  • $k\in\mathbb N$
  • $p,q_1,\ldots,q_k:E\to(0,\infty)$ be probability densities on $(E,\mathcal E,\lambda)$
  • $w_1,\ldots,w_k:E\to[0,1]$ with $\sum_{i=1}^kw_i=1$
  • $\varphi_1,\ldots,\varphi_k:E'\to E$ be $(\mathcal E',\mathcal E)$-measurable with $(\varphi_i)_\ast\lambda'=q_i\lambda$ (the left-hand side denotes the pushforward measure and the right-hand side the measure with density) for all $i\in\{1,\ldots,k\}$

Suppose we would like to run the Metropolis-Hastings algorithm with target measure $p\lambda$, but it's very complicated to find proposal kernels on $(E,\mathcal E,\lambda)$. On the other hand, we've got easy-to-implement transformations $\varphi_1,\ldots,\varphi_k$ as above, and each $q_i$ is locally a good approximation of $p$.

Now the idea is the following: Let $\zeta$ denote the counting measure on $(\{1,\ldots,k\},2^{\{1,\:\ldots\:,\:k\}})$ and define the measure $$\mu:=g\,(\zeta\otimes\lambda')\quad\text{on }\{1,\ldots,k\}\times E'\quad\text{with density}\quad g(i,x'):=\left(w_i\frac p{q_i}\right)\!\bigl(\varphi_i(x')\bigr).$$ We could run the Metropolis-Hastings algorithm with target measure $\mu$ instead! The choice of the $w_i$ and the additional index component $i$ should guarantee that the algorithm prefers to move to states where $q_i$ is a good approximation of $p$.
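(For the record, $\mu$ really is a probability measure whose image under $(i,x')\mapsto\varphi_i(x')$ is the original target: for measurable $f:E\to[0,\infty)$, the assumption $(\varphi_i)_\ast\lambda'=q_i\lambda$ gives $$\int f(\varphi_i(x'))\left(w_i\frac p{q_i}\right)\!\bigl(\varphi_i(x')\bigr)\,(\zeta\otimes\lambda')(\mathrm d(i,x'))=\sum_{i=1}^k\int_E f\,w_i\,\frac p{q_i}\,q_i\,\mathrm d\lambda=\int_E f\Bigl(\sum_{i=1}^kw_i\Bigr)p\,\mathrm d\lambda=\int_E fp\,\mathrm d\lambda.$$ Taking $f\equiv1$ shows $\mu$ has total mass $1$, and mapping the augmented chain through $(i,x')\mapsto\varphi_i(x')$ yields samples distributed according to $p\lambda$.)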

But how should we choose the weights $w_i$? Is there some "optimal" choice like the balance heuristic in multiple importance sampling?

EDIT: The setting of my actual application is too complicated to describe here. So, I'll give a toy example: Take $(E',\mathcal E',\lambda')$ to be $(0,1)$ with its Borel $\sigma$-algebra and the Lebesgue measure. Let $f_{\mu,\:\sigma^2}$ and $\Phi_{\mu,\:\sigma^2}$ denote the density and cumulative distribution function of $\mathcal N_{\mu,\:\sigma^2}$. Take $k=2$, $\mu_i\in\mathbb R$, $\sigma_i,\varsigma_i>0$, $$p:=c_1f_{\mu_1,\:\sigma_1^2}+(1-c_1)f_{\mu_2,\:\sigma_2^2}$$ for some $c_1\in(0,1)$, $q_i:=f_{\mu_i,\:\varsigma_i^2}$ and let $\varphi_i$ be the quantile function $\Phi_{\mu_i,\:\varsigma_i^2}^{-1}$, so that $(\varphi_i)_\ast\lambda'=q_i\lambda$ by inverse transform sampling.

Here's an example with $\mu_1=-2$, $\mu_2=2$, $\sigma_1=\sigma_2=1$, $\varsigma_1=2$ and $\varsigma_2=5$:

[Figure: plot of the densities $p$, $q_1$ and $q_2$.]

$p$ is the black, $q_1$ is the red and $q_2$ is the green function. Here we could, for example, take $w_1=1_{(-\infty,\:0)}$ and $w_2=1_{[0,\:\infty)}$.
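A minimal sketch of this toy example, assuming $c_1=\tfrac12$ and an independence Metropolis-Hastings kernel that proposes $(i,u)$ uniformly on $\{1,2\}\times(0,1)$ (both the value of $c_1$ and the choice of kernel are my assumptions, not fixed by the question):

```python
# Independence Metropolis-Hastings on the augmented space {1,2} x (0,1),
# where lambda' is Lebesgue measure on (0,1) and phi_i is the Gaussian
# quantile function Phi_{mu_i, varsigma_i^2}^{-1}.
import random
from statistics import NormalDist

c1 = 0.5                                               # assumed mixture weight
comp = [NormalDist(-2.0, 1.0), NormalDist(2.0, 1.0)]   # components of p
prop = [NormalDist(-2.0, 2.0), NormalDist(2.0, 5.0)]   # q_1 = N(-2, 2^2), q_2 = N(2, 5^2)

def p(x):
    return c1 * comp[0].pdf(x) + (1 - c1) * comp[1].pdf(x)

def w(i, x):
    # w_1 = 1_(-inf, 0), w_2 = 1_[0, inf); i is 0-based here
    return float(x < 0.0) if i == 0 else float(x >= 0.0)

def mu_density(i, u):
    # density of mu w.r.t. counting (x) Lebesgue measure: (w_i * p / q_i)(phi_i(u))
    x = prop[i].inv_cdf(u)                             # phi_i(u)
    return w(i, x) * p(x) / prop[i].pdf(x)

def sample(n, seed=0):
    rng = random.Random(seed)
    i, u = 0, 0.25                                     # valid start: phi_1(0.25) < 0
    g = mu_density(i, u)
    xs = []
    for _ in range(n):
        j, v = rng.randrange(2), rng.random()          # uniform proposal on {0,1} x (0,1)
        g_new = mu_density(j, v)
        if g_new > 0 and rng.random() < min(1.0, g_new / g):
            i, u, g = j, v, g_new
        xs.append(prop[i].inv_cdf(u))                  # map back to E via phi_i
    return xs

xs = sample(20000)
```

Mapping each accepted augmented state back through $\varphi_i$ should give draws from $p\lambda$; e.g. the empirical mean of `xs` should be near $0$ and about half the samples negative.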

0xbadf00d
  • "...prefers to move to states where $q_i$ is a good approximation of $p$" does not imply that the resulting random numbers have anywhere near the true target distribution. It is not at all clear to me that this is a workable approach, except perhaps in special cases... please correct me if I've misunderstood something! – jbowman Aug 02 '19 at 16:18
  • @jbowman In the application I've got in mind, samples from each $q_i\lambda$ are locally quite well distributed, but not everywhere. – 0xbadf00d Aug 02 '19 at 17:55
  • What example do you have in mind? Some extra context might be useful. – πr8 Aug 02 '19 at 20:01
  • @πr8 My real application is too complicated, but I've added a toy example. – 0xbadf00d Aug 03 '19 at 09:28
  • In section 2.2 of the paper "Adaptive MCMC for multimodal distributions" by Holmes et al., they introduce a sampler I believe is very close to what you want - it combines mode-hopping with local exploration. – Forgottenscience Aug 03 '19 at 09:43
  • @Forgottenscience I will take a look, but be aware that the distribution I'm interested in is not a multimodal distribution (I just pick one in the example), but a highly complex distribution on an infinite-dimensional space. – 0xbadf00d Aug 03 '19 at 09:44
  • Adding actual content in your answer regarding what you really want to do would probably be more helpful than generic vagueness and an example that apparently doesn't match your problem at all. – Forgottenscience Aug 03 '19 at 09:51
  • It seems like there's a couple of separate issues here. As I understand it, you're trying to choose the mixture weights $\{ w_i \}$ i) such that the variance of your Monte Carlo estimators are well-behaved, and ii) such that running MCMC on the resulting mixture distribution mixes well. The first issue is probably covered by the MIS literature, as you say. The second will depend on what sort of MCMC you're planning on applying to the new target - could you possibly expand more on that? – πr8 Aug 03 '19 at 09:51
  • @πr8 Sure, see my comment below. The question is with respect to which quantity the weights should be optimized. In the reference, they talk about the "variance of the target density" (which is the density of $\mu$ in the context of the question). What do they mean? – 0xbadf00d Aug 03 '19 at 10:00
  • @Forgottenscience The actual problem is given in section 4.2 here: https://cgg.mff.cuni.cz/~jaroslav/papers/2018-mcmc-survey/2018-sik-mcmc-survey-paper.pdf. (I referred to the wrong section before; sorry!) – 0xbadf00d Aug 03 '19 at 10:04

0 Answers