
I am working on a Bayesian project based on the Stagnant data from an OpenBUGS example, which is a changepoint problem. We assume a model of two straight lines that meet at a certain changepoint $x_k$. The basic setup is as follows: \begin{align*} Y_i \ & \sim \ N(\alpha + \beta_1 (x_i - x_k), \sigma^2), \; i = 1, \ldots, k \\ Y_i \ & \sim \ N(\alpha + \beta_2 (x_i - x_k), \sigma^2), \; i = k+1, \ldots, n \end{align*} The priors are \begin{align*} \alpha \ & \sim \ N(\mu_{\alpha}, \sigma^2_{\alpha}), \quad \sigma^2 \ \sim \ IG(a, b) \\ \beta_1, \beta_2 \ & \sim \ N(\mu_{\beta}, \sigma^2_{\beta}), \quad k \ \sim \ \mathrm{Unif}\{1, \ldots, n\} \end{align*} That is, $k$ is assumed to have a discrete uniform prior, so the changepoint is constrained to be one of the observed $x$ values.

The full conditional distributions for the parameters can be derived and Gibbs sampling can be used for MCMC simulation.

The data likelihood is \begin{align*} & p(\mathbf{x}, \mathbf{y}| k, \alpha, \beta_1, \beta_2, \sigma^2) = \prod_{i=1}^{k} p_1(y_i| k, .) \prod_{i=k+1}^{n} p_2(y_i| k, .) \\ & = (2 \pi \sigma^2)^{-n/2} \exp\left\{- \frac{1}{2 \sigma^2} \sum_{i=1}^k (y_i - \alpha - \beta_1 (x_i - x_k))^2 \right\} \\ & \quad \times \exp\left\{- \frac{1}{2 \sigma^2} \sum_{i=k+1}^n (y_i - \alpha - \beta_2 (x_i - x_k))^2 \right\} \end{align*} and the full conditional of $k$, which is a discrete distribution, is \begin{align*} p(k = \mathcal{K}|.) &= \frac{p(\mathbf{x}, \mathbf{y}| \mathcal{K}, \alpha, \beta_1, \beta_2, \sigma^2) \times \frac{1}{n}}{\sum_{k \in \{1, \ldots, n\}} \frac{1}{n} \times p(\mathbf{x}, \mathbf{y}| k, \alpha, \beta_1, \beta_2, \sigma^2)} \\ &= \frac{p(\mathbf{x}, \mathbf{y}| \mathcal{K}, \alpha, \beta_1, \beta_2, \sigma^2)}{\sum_{k \in \{1, \ldots, n\}} p(\mathbf{x}, \mathbf{y}| k, \alpha, \beta_1, \beta_2, \sigma^2)} \end{align*}
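In code, a single Gibbs update of $k$ based on this full conditional is essentially the following rough sketch (sample.k is just an illustrative name here; log.like() is the log-likelihood function defined in the example further below):

sample.k <- function(alpha, beta1, beta2, s2) {
  ## evaluate the likelihood at every candidate changepoint k = 1, ..., N
  lik <- exp(sapply(1:N, log.like, alpha, beta1, beta2, s2))
  ## normalize to get p(k = K | .) and draw a new k
  prob <- lik / sum(lik)
  sample(1:N, size = 1, prob = prob)
}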

My problem is that when I sample $k$, I have to update $p(k = \mathcal{K}|.)$ by computing the data likelihood at each $\mathcal{K} = 1, \ldots, n$. These likelihood values can be so small that they cannot be represented in floating point, and then the normalized probabilities cannot be computed either.

Below is an example to demonstrate the tiny likelihood values.

data <- list(Y = c(1.12, 1.12, 0.99, 1.03, 0.92, 0.90, 0.81, 0.83, 0.65, 0.67, 0.60, 0.59, 0.51, 0.44, 0.43,
           0.43, 0.33, 0.30, 0.25, 0.24, 0.13, -0.01, -0.13, -0.14, -0.30, -0.33, -0.46, -0.43, -0.65),
     x = c(-1.39, -1.39, -1.08, -1.08, -0.94, -0.80, -0.63, -0.63, -0.25, -0.25, -0.12, -0.12, 0.01, 0.11, 0.11, 
           0.11, 0.25, 0.25, 0.34, 0.34, 0.44, 0.59, 0.70, 0.70, 0.85, 0.85, 0.99, 0.99, 1.19),
     N = 29)

Y <- data$Y
x <- data$x
N <- data$N

## function: compute the data log-likelihood given k and the other parameters
log.like <- function(k, alpha, beta1, beta2, s2) {
  log.like1 <- 0
  log.like2 <- 0
  ## observations up to and including the changepoint (slope beta1)
  for (i in 1:k) {
    log.like1 <- log.like1 + dnorm(Y[i], mean = alpha + beta1 * (x[i] - x[k]),
                                   sd = s2, log = TRUE)
  }
  ## observations after the changepoint (slope beta2)
  if (k < N) {
    for (i in (k+1):N) {
      log.like2 <- log.like2 + dnorm(Y[i], mean = alpha + beta2 * (x[i] - x[k]),
                                     sd = s2, log = TRUE)
    }
  }
  log.like1 + log.like2
}


## Example to show the small likelihood values: these parameter values are
## arbitrary intermediate values taken during the iterations.
k <- 13
alpha <- 0.75843864
beta1 <- -0.13184158
beta2 <- -1.25138517
s2 <- 0.01523551
loglik <- sapply(1:N, log.like, alpha, beta1, beta2, s2)

The output is the following:

> loglik
 [1] -127755.2887 -127755.2887  -71642.5525  -71642.5525  -52252.7596  -36218.2543
 [7]  -21043.7006  -21043.7006   -2648.1820   -2648.1820    -835.1095    -835.1095
[13]    -883.2403   -2096.3249   -2096.3249   -2096.3249   -4769.7051   -4769.7051
[19]   -6969.4129   -6969.4129   -9569.3415  -14106.0524  -17779.1912  -17779.1912
[25]  -22298.1640  -22298.1640  -25656.7013  -25656.7013  -28620.3518
> exp(loglik)
 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

The log-likelihood values are so small that the exponentiated values are all ZERO. But we can see that the log-likelihood values for k = 11, 12, 13 are much larger than the others, which should lead to a higher probability for $k = 11, 12$, or $13$, and that is as expected! I wish to update $p(k = \mathcal{K}|.)$ based on those tiny likelihoods.

My question is: how can I deal with this issue? Am I doing something wrong here that causes it? Any suggestions are highly appreciated.

EDIT: This thread discusses exactly what I need (a quick sketch of the suggested fix is at the end of this edit):

Converting (normalizing) very small likelihood values to probability

Some other similar/related topics are:

Computation of likelihood when $n$ is very large, so likelihood gets very small?

What to do when your likelihood function has a double product with small values near zero - log transform doesn't work?
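If I understand the suggestions in those threads correctly, the trick is to subtract the maximum log-likelihood from all of the values before exponentiating (the log-sum-exp trick), so the normalizing sum is computed on a safe scale. A rough sketch applied to the loglik vector from my example above:

## shift by the maximum so the largest term becomes exp(0) = 1
loglik.shifted <- loglik - max(loglik)
## normalized probabilities p(k = K | .); terms that still underflow to zero
## correspond to k values with negligible posterior probability
prob.k <- exp(loglik.shifted) / sum(exp(loglik.shifted))
## a new k could then be drawn with sample(1:N, size = 1, prob = prob.k)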

END EDIT

SixSigma

  • Why can't you work on the log scale? The posterior is the product of the component parts, so it's the sum of their logarithms. – Sycorax May 03 '15 at 22:40
  • The basic issue here is underflow (sometimes overflow) in likelihood calculations. This is a routine problem, with fairly obvious solutions. It has been asked about here a number of times. For example, see some of the strategies here: [Computation of likelihood when $n$ is very large, so likelihood gets very small](http://stats.stackexchange.com/questions/56724/computation-of-likelihood-when-n-is-very-large-so-likelihood-gets-very-small/), such as scaling both numerator and denominator by a similar factor, then working on the log scale (don't exponentiate unless/until you have to) ... – Glen_b May 03 '15 at 22:49
  • ... if you apply both those suggestions, you shift all the log-likelihoods by a similar constant (it can be a different constant each time you do a new calculation, though - e.g. in your example, you could add 835.1095 to them all, making the largest ones 0). In many cases you can avoid exponentiation altogether, but where you can't avoid it, you leave it as late as possible in the calculation. – Glen_b May 03 '15 at 23:05
  • @user777, thanks. I CAN and did compute the likelihood value at each k on the log scale. But at the end of the day, when updating $p(k)$, I have to go back to the original scale, since the denominator is a sum of likelihood values, which cannot be converted to the log scale. – SixSigma May 03 '15 at 23:06
  • @Glen_b, my problem is that when updating $p(k)$, the denominator is a sum of likelihood values. I am not sure how I can work that on the log scale. – SixSigma May 03 '15 at 23:10
  • Did you miss the mention of scaling (shifting them all by a constant on the log scale) there? Further explanation of the relevant issues is given [here](http://stats.stackexchange.com/questions/142254/what-to-do-when-your-likelihood-function-has-a-double-product-with-small-values/142324#142324) – Glen_b May 03 '15 at 23:11
  • Thanks a lot, @Glen_b. Adding a constant is a good idea, though I am still concerned that the smallest values couldn't be exponentiated either. In my example, adding 835.1095 to the smallest value -127755.2887 won't help much; the exponentiation would fail again. – SixSigma May 03 '15 at 23:13
  • What's the relative probability there? Effectively 0. You'd never move there if you repeated trials for the age of the universe. – Glen_b May 03 '15 at 23:14
  • @Glen_b, thanks for pointing out the related threads. Those are really helpful. I will see whether I can figure that out. – SixSigma May 03 '15 at 23:14
  • Some additional information that can be useful for this kind of calculation is [here](http://stats.stackexchange.com/questions/94271/how-to-properly-handle-infs-in-a-statistical-function/94277#94277). – Glen_b May 03 '15 at 23:45

0 Answers