
I am trying to compute this posterior distribution:

$$ P(\theta\mid -)=\frac{\prod_{i=1}^{n}p_i^{y_i}(1-p_i)^{1-y_i}}{\sum_{\text{all}\,\theta,\;p_i\mid\theta}\prod_{i=1}^{n}p_i^{y_i}(1-p_i)^{1-y_i}} $$

The problem is that the numerator, which is a product of $n$ Bernoulli probabilities $p_i^{y_i}(1-p_i)^{1-y_i}$, is too small to represent: each factor is at most 1 and my $n$ is large (about 1500).

Hence the posterior values for all $\theta$ get computed as 0 (I am doing the calculations in R).

To clarify: each $y_i$ has its own $p_i$, so together the $p_i$'s form an $n$-element vector for the $n$ $y$'s, and each $\theta$ implies its own $n$-element vector of $p_i$'s.

EDIT: adding a reproducible example (for the numerator):

# dbern(y, p) is not in base R (it comes from e.g. the Rlab package);
# dbinom(y, size = 1, prob = p) is the base-R equivalent
p <- sample(seq(0, 1, by = 0.01), 1500, replace = TRUE)
y <- sample(c(0, 1), 1500, replace = TRUE)
dbinom(y, size = 1, prob = p)                 # 1500-element vector, each element <= 1
prod(dbinom(y, size = 1, prob = p))           # produces 0 (underflow)
exp(sum(log(dbinom(y, size = 1, prob = p))))  # still 0, since the sum of logs is very negative
Heisenberg
  • Did you try computing the sum of logs instead? – Ansari Apr 21 '13 at 03:33
  • There's related discussion [here](http://stats.stackexchange.com/questions/66616/converting-normalizing-very-small-likelihood-values-to-probability), with some additional discussion of the details of such calculations. – Glen_b May 03 '15 at 23:43

2 Answers


This is a common problem when computing likelihoods for all manner of models. The usual remedies are to work on the log scale and to use a common scaling factor that brings the values into a more reasonable range.

In this case, I'd suggest:

Step 1: Pick a fairly "typical" $\theta$, call it $\theta_0$. Divide both the numerator and the denominator of the general term by the numerator evaluated at $\theta = \theta_0$, so that you're working with something much less likely to underflow.

Step 2: Work on the log scale. This means the numerator is an exp of a sum of differences of logs, and the denominator is a sum of exps of sums of differences of logs.

NB: If any of your p's are 0 or 1, pull those out separately and don't take logs of those terms; they're easy to evaluate as is!

[In more general terms, this scaling-and-working-on-the-log-scale can be seen as taking a set of log-likelihoods $l_i$ and writing $\log(\sum_i e^{l_i}) = c+\log(\sum_i e^{l_i-c})$. An obvious choice for $c$ is one that makes the largest term 0, which leaves us with $\log(\sum_i e^{l_i}) = \max_i(l_i)+\log(\sum_i e^{l_i-\max_i(l_i)})$. Note that when you have a numerator and a denominator you can use the same $c$ for both, which will then cancel. In the above, that corresponds to taking $\theta_0$ to be the $\theta$ with the highest log-likelihood.]
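
As a minimal sketch of this identity in R (the grid of $\theta$ values and the constant-$p$ model below are invented purely so the example runs; in the real problem each $\theta$ supplies its own vector of $p_i$'s):

set.seed(1)
y <- sample(c(0, 1), 1500, replace = TRUE)
# Illustrative grid of candidate thetas; each theta is mapped to a constant
# p-vector only for demonstration -- substitute your own p_i | theta here
thetas <- seq(0.05, 0.95, by = 0.05)
loglik <- sapply(thetas, function(th) {
  p <- rep(th, length(y))
  sum(dbinom(y, size = 1, prob = p, log = TRUE))  # log of the numerator product
})
# log-sum-exp: factor out the max so the exponentials don't underflow
logsumexp <- function(ll) max(ll) + log(sum(exp(ll - max(ll))))
posterior <- exp(loglik - logsumexp(loglik))      # normalized, no underflow
sum(posterior)                                    # 1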

After this rescaling, the terms will tend to be more moderate in size, and so in many situations the numerator and denominator are both relatively reasonable to compute.

If the terms in the denominator span a range of sizes, add the smaller ones up before adding the larger ones.

If only a few terms dominate heavily, focus your attention on making the computation for those terms relatively accurate.
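
For instance, with some invented log-likelihood values just to show the scaling and the summation order:

ll <- c(-1050, -1002, -1000.5, -1000)      # illustrative log-likelihoods
w  <- exp(ll - max(ll))                    # rescale so the largest term is 1
log_denom <- max(ll) + log(sum(sort(w)))   # summing in ascending order limits rounding error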

Glen_b
  • But for all theta, the numerator always goes to 0. How do I divide the general term by the numerator then? (Step 1) – Heisenberg Apr 22 '13 at 01:16
  • Step 1 is *algebra*, not computer calculation. Its purpose is to give you something in Step 2 to compute that doesn't underflow. Unless you're saying it's always *algebraically* zero... in which case you're doubtless doing something you ought not. – Glen_b Apr 22 '13 at 01:18
  • Okay -- I will give it a try. The numerator is not exactly 0, only so small that R cannot compute it. Thanks! – Heisenberg Apr 22 '13 at 01:26
  • Dear God, you are correct! Thank you so, so much. Everyone keeps saying "use the log-likelihood" but only you really saw the problem. – Heisenberg Apr 22 '13 at 05:16

Try capitalizing on the properties of logarithms: work with sums of logs rather than a product of small decimal numbers, and after summing just take the anti-log to put the result back into its more natural form. I think something like this should do the trick:

$\frac{\exp\left(\sum_{i=1}^{n}\left(y_{i}\log(p_{i})+(1-y_{i})\log(1-p_{i})\right)\right)}{\sum_{g}\exp\left(\sum_{i=1}^{n}\left(y_{i}\log(p_{i})+(1-y_{i})\log(1-p_{i})\right)\right)}$
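
As a quick sketch of the inner sum in R (the p and y below are simulated purely for illustration, with p kept away from exact 0 and 1 so the logs stay finite):

set.seed(2)
p <- runif(1500, 0.01, 0.99)                       # one p_i per observation, for a single theta
y <- rbinom(1500, 1, p)
loglik <- sum(y * log(p) + (1 - y) * log(1 - p))   # the inner sum above
loglik                                             # large and negative, but no underflow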

philchalmers
  • The numerator in your suggestion still produces a 0 since the sum within exp( ) is still very negative (< -1000). Am I doing anything wrong? Thanks for your help! – Heisenberg Apr 21 '13 at 03:51
  • Well, if any value in p is actually 0 or 1, then the log of it (or of 1-p) will automatically produce -Inf. Otherwise I think the numbers are just becoming too small to be raised back to the original form. – philchalmers Apr 21 '13 at 03:59
  • Note that you can add and subtract any constant $c$ from the terms inside the $\exp()$ in the above expression without changing the result. Setting $c$ equal to the negative of the maximum value of $\log(p(\theta|-))$ provides the best numerical accuracy. – probabilityislogic Apr 22 '13 at 05:00