
Consider a Bayesian framework where we have priors on some parameters and a likelihood based on the data. Suppose the likelihood (and its parametric form) is very sensitive to the choice of parameters, in the sense that for some poor parameter proposals the likelihood may overflow and become infinite. This is a particular problem in Metropolis-Hastings, where you cannot compute the acceptance rate when the likelihood overflows.

So I wanted to ask whether anyone has a way to deal with this situation. I'm looking for solutions for:

  • Avoiding likelihood overflow
  • Properly calculating the acceptance probability if likelihood overflow does happen
Xi'an
Sam
    Numerous posts on site deal with the closely related issue of underflow/overflow in calculation of likelihood. For example, see the answers [here](http://stats.stackexchange.com/questions/56724/computation-of-likelihood-when-n-is-very-large-so-likelihood-gets-very-small). – Glen_b Oct 12 '15 at 21:24

2 Answers


This is a very good question, although it describes a much less common situation than likelihood underflow.

If the likelihood at a given proposed value $\theta'$ is exactly $+\infty$, then the chain should move there and stay there, moving only to other values of $\theta$ with infinite likelihood. I do not know of such cases.

If the likelihood at a given proposed value $\theta'_0$ is much larger than the likelihood at the current value $\theta$, so that $$\dfrac{\pi(\theta'_0) \ell_n(\theta'_0|x)}{\pi(\theta) \ell_n(\theta|x)}\gg 1$$ to the point that it creates an overflow in computer code, the likelihood can be renormalised as $$\tilde\ell_n(\theta|x)=\exp\{\log(\ell_n(\theta|x))-\log(\ell_n(\theta'_0|x))\}$$ which means that $\tilde\ell_n(\theta'_0|x)=1$ while $$\dfrac{\pi(\theta'_0) \ell_n(\theta'_0|x)}{\pi(\theta) \ell_n(\theta|x)}=\dfrac{\pi(\theta'_0) \tilde\ell_n(\theta'_0|x)}{\pi(\theta) \tilde\ell_n(\theta|x)}$$ This renormalisation somehow turns overflow issues into underflow issues, which are easier to handle because such values of $\theta$ are not of interest for the Markov chain.
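A minimal sketch of this renormalisation in Python (the function name and reference-point handling are illustrative, not from any particular library):

```python
import math

def renormalised_likelihood(log_lik, log_lik_ref):
    """Evaluate the renormalised likelihood
    tilde-l(theta) = exp{ log l(theta) - log l(theta'_0) },
    where log_lik_ref is the log-likelihood at the reference point theta'_0.

    Subtracting the reference before exponentiating maps the reference
    point to 1, so large likelihoods cannot overflow; poor proposals
    underflow to 0 instead, which is harmless for the chain.
    """
    return math.exp(log_lik - log_lik_ref)

# math.exp(1000.0) would overflow to OverflowError, but the ratio of
# renormalised likelihoods is perfectly representable:
ratio = (renormalised_likelihood(1000.0, 1000.0)
         / renormalised_likelihood(995.0, 1000.0))   # equals exp(5)
```

In practice one would take the reference log-likelihood to be the largest value seen so far, so the renormalised values stay in $(0, 1]$.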

Note that a related problem may also happen because of the proposal density $q(\theta'|\theta)$: when the Metropolis-Hastings ratio $$\alpha(\theta,\theta')=\dfrac{\pi(\theta'|x)}{\pi(\theta|x)}\dfrac{q(\theta|\theta')}{q(\theta'|\theta)}\wedge 1$$ is computed, if $q(\theta|\theta')$ is very small while $q(\theta'|\theta)=\text{O}(1)$ and $\pi(\theta'|x)\approx\pi(\theta|x)$, the chain may get stuck forever at $\theta$. This happens for instance when $\pi(\cdot)$ is almost constant over a large compact set and $q(\theta|\theta')$ is the density of a normal $\mathcal{N}(0,\sigma^2)$ with a small $\sigma$.

Xi'an

A very common approach is to work with log-likelihood values rather than the likelihood itself. Rather than multiplying the likelihood and the prior, you add the log-likelihood to the log of the prior density.
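The accept/reject step can then be carried out entirely in log space, never exponentiating anything that could overflow. A hedged sketch (function and parameter names are illustrative):

```python
import math
import random

def mh_accept(log_post_prop, log_post_curr, log_q_rev=0.0, log_q_fwd=0.0):
    """Metropolis-Hastings accept/reject in log space.

    log_post_* are log prior + log likelihood at the proposed and current
    points; log_q_rev / log_q_fwd are the log proposal densities
    log q(theta|theta') and log q(theta'|theta).
    """
    log_alpha = (log_post_prop - log_post_curr) + (log_q_rev - log_q_fwd)
    if log_alpha >= 0.0:
        # ratio >= 1: accept without ever calling exp()
        return True
    # compare log(u) with log(alpha); 1 - random() lies in (0, 1],
    # so the logarithm is always finite
    return math.log(1.0 - random.random()) < log_alpha
```

For a symmetric (random-walk) proposal the two `log_q` terms cancel and can be left at their defaults.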

Brian Borchers