EDIT: I tested Forgottenscience's solution below and it works; note, however, that the working acceptance criterion I found is to accept the point if $\log\alpha \geq \log u$, where $u\sim\mathcal{U}(0,1)$.

I'm using adaptive MCMC (Haario et al. 2001) to tune the covariance matrix, i.e. the Metropolis-Hastings "step sizes", of a multi-dimensional proposal distribution, in order to use this proposal distribution with a Metropolis-Hastings sampler. Due to the nature of the likelihood function I need to use, the likelihood overflows to numerical infinity at the best-fit parameter set (and at parameter sets near it), which is the point I use to start my Markov chain for adaptive MCMC. I know that for a non-adaptive MH sampler I could use log-likelihood methods (e.g. this post), but when I follow that solution for adaptive MCMC, the step sizes/proposal variances that the adaptation converges to are far too small, and the MH acceptance rate is far too high (about 0.98) when I run the MH samplers with those step sizes. Has anyone encountered a similar problem before? Thanks!

EDIT: My likelihood is of the form

$\displaystyle\mathcal{L}\propto \prod_{n=1}^{N}{ \left(\frac{m_{t,n}^2\Sigma_{x,n}^2+\Sigma_{y,n}^2}{m_{t,n}^2\Sigma_{x,n}^4+\Sigma_{y,n}^4}\right)^{\displaystyle w_n/2}\times\exp\left\{{-\frac{1}{2}w_n\frac{\left[y_n-y_{t,n}-m_{t,n}(x_n-x_{t,n})\right]^2}{m_{t,n}^2\Sigma_{x,n}^2+\Sigma_{y,n}^2}}\right\}}$,

where there are $N$ datapoints $(x_n,y_n)$, weighted by $w_n$; $(\Sigma_{x,n}, \Sigma_{y,n})$ are parameters related to the combined $x$- and $y$-uncertainties of the $n^\text{th}$ datapoint and of the model I'm fitting with this likelihood; and $m_{t,n},x_{t,n},y_{t,n}$ are parameters that differ for each datapoint. I've pinpointed why this likelihood blows up: I'm working with about 700 datapoints, many of which have small values of $(\Sigma_{x,n}, \Sigma_{y,n})$, so the prefactor term $\left(\frac{m_{t,n}^2\Sigma_{x,n}^2+\Sigma_{y,n}^2}{m_{t,n}^2\Sigma_{x,n}^4+\Sigma_{y,n}^4}\right)^{w_n/2}$ overflows, especially when the weight $w_n$ is large. I can't modify these data or parameters, so the blow-up is unavoidable. The only solution I can think of is to instead use $\ln\mathcal{L}$ (a summation instead of a product) in the adaptive Metropolis algorithm, but I'm not sure how this would work with the Metropolis-Hastings acceptance ratio.
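
For concreteness, here is a minimal sketch of evaluating $\ln\mathcal{L}$ directly as a sum, so the prefactor is never exponentiated (Python/NumPy; the argument names are illustrative, not fixed by anything above):

```python
import numpy as np

def log_likelihood(x, y, w, Sigma_x, Sigma_y, m_t, x_t, y_t):
    # All arguments are length-N arrays following the notation above.
    var = m_t**2 * Sigma_x**2 + Sigma_y**2    # m_t^2 Sigma_x^2 + Sigma_y^2
    den = m_t**2 * Sigma_x**4 + Sigma_y**4    # m_t^2 Sigma_x^4 + Sigma_y^4
    resid = y - y_t - m_t * (x - x_t)
    # (w_n/2) * log(prefactor_n) - (w_n/2) * resid_n^2 / var_n, summed over n
    return np.sum(0.5 * w * (np.log(var) - np.log(den))
                  - 0.5 * w * resid**2 / var)
```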

  • I have rarely seen cases where the likelihood becomes numerically infinite. Could you tell us more precisely what your likelihood looks like? Then it may be easier to help you. – Jonas Mar 08 '20 at 09:36
  • I edited my post to answer your question; thanks! – SandwichTheorem Mar 10 '20 at 15:44
  • Does this answer your question? [Metropolis-Hastings using log of the density](https://stats.stackexchange.com/questions/137710/metropolis-hastings-using-log-of-the-density) – Xi'an Mar 25 '20 at 07:21
  • Yes, it does; thank you! – SandwichTheorem Mar 31 '20 at 15:49

1 Answer

There is no issue in converting the entire problem to log-scale. In this case, with the prior $\pi_0$, the new point $z'$, and the old point $z$:

$$\log\alpha = \min \left \{0, \log \mathcal L(z') - \log \mathcal L(z) + \log \pi_0(z') - \log \pi_0(z) + \log q(z|z') - \log q(z'|z) \right \}$$

Note that in this case you are proposing and accepting in log-space.
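
A minimal sketch of this acceptance step (Python/NumPy; the helper names `log_post`, `log_q`, and `propose` are placeholders, with `log_post` standing for $\log\mathcal{L} + \log\pi_0$):

```python
import numpy as np

rng = np.random.default_rng()

def mh_step(z, log_post, log_q, propose):
    """One Metropolis-Hastings step carried out entirely in log-space.

    log_post(z) returns log L(z) + log pi_0(z); log_q(a, b) returns
    log q(a | b), the log-density of proposing a from b.
    """
    z_new = propose(z)
    log_alpha = min(0.0,
                    log_post(z_new) - log_post(z)
                    + log_q(z, z_new) - log_q(z_new, z))
    # Accept with probability alpha: draw u ~ U(0,1) and accept iff
    # log(u) <= log(alpha), so alpha itself is never exponentiated.
    if np.log(rng.uniform()) <= log_alpha:
        return z_new, True
    return z, False
```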

Forgottenscience
  • Thank you! So in this case, I would still be accepting $z'$ with probability $\alpha$, as usual, for Haario et al.'s adaptive MCMC + Metropolis-Hastings? However, I'm confused about your notation $q$; what are you describing with $q$? – SandwichTheorem Mar 10 '20 at 16:12
  • The probability has been changed to log-space, so you should use the appropriate log-transformed uniform, $-\log u$. $q$ is just a proposal density. – Forgottenscience Mar 10 '20 at 16:16
  • I guess I'm confused about one subtlety then: for the regular, "linear-space" MH, we accept a trial $z'$ and make it the next $z$ if $\alpha \geq u$, where $\alpha=\frac{\mathcal{L}(z')\pi_0(z')}{\mathcal{L}(z)\pi_0(z)}$ and $u\sim\mathcal{U}(0,1)$; in either case, we add $z'$ to the overall sampling, regardless of whether it becomes the next step in the Markov chain. Then, for this log case, if I understand correctly, we accept $z'$ if $\log\alpha\geq-\log u$, and reject otherwise? And, as this is adaptive MH, we update the proposal $q$ only when a new point is accepted? – SandwichTheorem Mar 10 '20 at 17:21
  • Your underlying algorithm shouldn't change: you should still only update whenever you have an acceptance. While I appreciate Haario's contribution as significant for its time, today I would probably prefer Titsias and Dellaportas' adaptive MCMC (https://arxiv.org/abs/1911.01373) instead; it is very fast and uses all points (rejected ones as well). – Forgottenscience Mar 11 '20 at 10:05