
I have to take draws from the discrete posterior distribution:

$ P(X = x_i \mid y) \propto P(X = x_i)\prod_{t=1}^N p(Y_t \mid X = x_i)$

where $P(X = x_i)$ is the probability mass function of a discrete uniform with boundaries $a$ and $b$, and $p(Y_t \mid X)$ is the pdf of a Student-t with $X$ degrees of freedom.

In MATLAB my problem is that for my realizations $Y_t = y_t$, when I evaluate the product of the Student-t pdfs, namely $\prod_{t=1}^N p(y_t \mid X = x_i)$, I get zeros because of numerical underflow. I therefore took logarithms and calculated the following:

$ \exp\left[\log P(X = x_i \mid y)\right] \propto \exp\left[ \log P(X = x_i) + \sum_{t=1}^N \log p(y_t \mid X = x_i)\right] $

But I have the same problem: when I compute the exponential function I still get all zeros. I then tried subtracting the maximum over $i$ of $\sum_{t=1}^N \log p(y_t \mid X = x_i)$, but again I have numerical problems with the exponent.
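For concreteness, here is a minimal sketch of what I am computing (the names `y`, `xgrid`, `a`, `b` are placeholders for my data and grid, and `tpdf` is from the Statistics Toolbox):

```matlab
% Illustrative only: y is the N-by-1 vector of realisations,
% xgrid = a:b is the grid of candidate degrees of freedom.
logpost = zeros(size(xgrid));
for i = 1:numel(xgrid)
    % log prior (discrete uniform on a:b) plus summed Student-t log densities
    logpost(i) = -log(b - a + 1) + sum(log(tpdf(y, xgrid(i))));
end
w  = exp(logpost);                 % underflows to all zeros for large N
w2 = exp(logpost - max(logpost));  % the max-subtraction attempt
```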

What can I do?

N.B.: In general, I am trying to take draws from equation (7) in

http://people.bu.edu/jacquier/papers/jpr.je2004.pdf

Giorgetto
  • Can the acceptance requirement of the sampler be modified to use the log probabilities rather than the probabilities themselves (so you don't have to compute the exp)? – Dikran Marsupial Aug 12 '21 at 16:22
  • Honestly, I don't know of any algorithm which draws a discrete random variable from its pmf knowing only the log probabilities. If you know of any, it would be more than welcome! – Giorgetto Aug 12 '21 at 16:32
  • I was thinking of something like a discrete rejection sampler where the acceptance criterion is something like $U < P(X)/\{cQ(X)\}$; then taking logs it would be something like $\log U < \log P(X) - \log\{cQ(X)\}$, but it has been a long day and I am pretty much mentally negligible at this point, so I realise that may be completely unhelpful! – Dikran Marsupial Aug 12 '21 at 16:55
  • The key to success lies in the $\propto$; you can rescale the $p(Y_t|X)$ to anything you like, e.g., numbers with a median of around 1, and the l.h.s. probability is still $\propto$ the r.h.s. number. How large is $N$? That can help you with determining a rescaling factor. – jbowman Aug 12 '21 at 17:36
  • Ok, thanks for the intuition, but how can I choose the scaling factor? I imagine it should depend on the scale of $y$ and on $N$ (potentially $N$ can be anything from 100 to 10000; for this sample it is 700) – Giorgetto Aug 12 '21 at 18:05
  • Yes, if you multiply 700 numbers significantly smaller than 1 together, you will have a problem. So any rescaling factor $\lambda$ will rescale the final result by $\lambda^N$. I'd probably calculate a lot of the probabilities and see how much smaller than 1 they are, then use a rescaling factor that brings their product close to 1. – jbowman Aug 12 '21 at 18:25
  • One doesn't compute the product and then take the log: that creates underflow. Instead, one computes the logs and *sums them.* You can sum a huge number of logarithms before underflowing! – whuber Aug 12 '21 at 19:07
  • I don't know if you are referring to my question or to the comment of @jbowman. My problem is exactly that even when I am doing $\exp(\sum \log p(\cdot))$ I have numerical problems – Giorgetto Aug 12 '21 at 19:37
  • As it is $\propto$, as @jbowman suggests you can multiply by any constant you like, which is equivalent to putting an additive constant inside the exp, i.e. $\exp\{\sum \log p(\cdot) + C\}$. In MATLAB, log(realmin) is -708.3964, so $-N$ times that would make the argument to the $\exp$ at least 0, giving an answer of at least 1, so that would be an upper bound. Probably want to use something smaller though. – Dikran Marsupial Aug 13 '21 at 09:59
  • *Of course* you have numerical problems when you try to exponentiate a very negative number. You have three possible strategies: (1) use a platform that supports arbitrary-precision arithmetic. (2) Stick with the logarithms: usually it's unnecessary to exponentiate them. (3) Subtract the logarithm of a "nice" number, such as an exact (hugely negative) power of ten, and express the exponential as a multiple of that nice number. (This elaborates on @Dikran's suggestion.) *E.g.*, see https://stats.stackexchange.com/a/13803/919 and https://stats.stackexchange.com/a/532300/919. – whuber Aug 13 '21 at 15:18

1 Answer


I may not be that much less mentally negligible than last night, but here goes...

If this is only needed for the purpose of sampling from a discrete distribution, where we only have un-normalised probabilities, $P(X=x_i) \propto Q(X=x_i)$, then we would generate a random number, $y$, in the range 0 to $U = \sum_{i}Q(X=x_i)$. Pick $x_1$ if $y < Q(X=x_1)$, else $x_2$ if $y < Q(X=x_1) + Q(X=x_2)$, and so on.

Here the un-normalised probabilities are given by $Q(X=x_i) = \exp\{\lambda_i\}$, where $\lambda_i$ is the log un-normalised probability, and the problem is that the $\lambda_i$ are so negative that the $\exp$ gives values small enough to cause numerical problems. Possible solution: find the smallest (rather than largest) $\lambda_i$, call it $\lambda^\ast$, and subtract it, so that $Q(X=x_i) = \exp\{\lambda_i - \lambda^\ast\}$. That means the smallest argument to the $\exp$ function will be zero, giving $Q(X=x^\ast) = 1$, and all the others will be greater than or equal to one. The simple roulette sampler works as before, since this just multiplies all of the un-normalised probabilities by the same amount, $\exp\{-\lambda^\ast\}$, and hence just changes the value of $U$.

ETA: sorry to spell it out, I was mostly spelling it out to myself to get my "thinking" straight - it has been a long year!
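ETA2: a minimal MATLAB sketch of the above (variable names are illustrative; `loglam` holds the $\lambda_i$, and I'm assuming the spread of the $\lambda_i$ isn't so large that the biggest rescaled value itself overflows):

```matlab
% loglam(i) = lambda_i = log of the un-normalised probability of xgrid(i)
q = exp(loglam - min(loglam));    % smallest exponent is 0, so min(q) = 1
U = sum(q);                       % total un-normalised mass
y = U * rand;                     % uniform draw on [0, U)
idx = find(cumsum(q) >= y, 1);    % roulette-wheel / inverse-CDF selection
xdraw = xgrid(idx);               % the sampled value
```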

Dikran Marsupial
  • I think it needs to be subtracting the minimum. If the lambdas are -4, -3 and 2, if you subtract the max, you end up with -6, -5 and 0 and the -6 is even smaller than before. But if you subtract the smallest, you end up with 0, 1 and 6, which have all been nudged in the right direction. – Dikran Marsupial Aug 13 '21 at 11:16
  • I'd add another +1 for the Wodehouse reference! – jbowman Aug 13 '21 at 15:27
  • ;o) well spotted - I only make a fleeting appearance in a couple of his less well known works! – Dikran Marsupial Aug 13 '21 at 15:29