
Assume a long list of log-values. The list contains very small negative numbers, very large negative numbers, as well as very large positive numbers. To avoid numerical overflow/underflow, I need to stay in log space. Now the question is: how can I randomly select an element of this list, with a probability proportional to its value? A naive way would be to switch back from log space, but that will result in NaN. Another naive way is to stay in log space and just sort and then normalize all the values, but this can still cause numerical underflow.

An example array:

[-4.92636e+08, 1.92636e+06, -1.85264e+03, 2.2636e-04]
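
For illustration, a minimal NumPy sketch (the name `log_vals` and the use of NumPy are just this example's choices) of what goes wrong with the naive approach of leaving log space:

```python
import numpy as np

# The example log-values from above.
log_vals = np.array([-4.92636e+08, 1.92636e+06, -1.85264e+03, 2.2636e-04])

# Naive approach: exponentiate and normalize.
weights = np.exp(log_vals)        # underflows to 0.0 and overflows to inf
probs = weights / weights.sum()   # inf / inf -> nan
print(weights)                    # [0.        inf 0.        1.00022639]
print(probs)                      # [ 0. nan  0.  0.]
```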

Ben
user3639557
  • You want to sample them with probability proportional to their value or to their value exponentiated? – Tim Nov 23 '16 at 10:51
  • They are in log, so if I exponentiate them I will run into underflow/overflow. But, ideally yes, I would like to sample proportional to the actual values, which would be the exponentiated version of each log value. – user3639557 Nov 23 '16 at 10:53
  • Could you, for clarity, prepare a simple and small data-example that resembles your problem? – Tim Nov 23 '16 at 10:55
  • But with such huge differences, "always take the largest value" would probably be the best approximation of sampling, wouldn't it? – Tim Nov 23 '16 at 11:17
  • @Tim, Probably you are right. But, there should be a numerically grounded way of doing this without approximation. To simplify it, let's assume we are always dealing with "negative" values. Very small and very large negative values. Is there an exact solution in this case? – user3639557 Nov 23 '16 at 11:25
  • From a theoretical point of view this is an interesting question (so my +1), but from a practical point of view, if the differences are that extreme, then the probability of sampling the smallest values would be almost zero and the probability of sampling the highest values would be orders of magnitude higher. Another question is whether you have precise estimates of the small probabilities. That would require huge amounts of data to ever see those cases appear, so I guess you don't have precise estimates and can only assume some extremely small probabilities for them. – Tim Nov 23 '16 at 11:38
  • Following Tim's comments, in your illustration, the ratio of the first to the second (ordered) probabilities is so large that no finite sample can exhibit any other value than the one associated with the largest probability. – Xi'an Nov 23 '16 at 12:22
  • @user3639557 I edited your title to be more descriptive and easier to search in the future, feel free to revert or change my edit if you find it inappropriate. – Tim Nov 23 '16 at 12:51
  • @Tim, happy with it but note that these values are not necessarily probabilities. Note the very large logarithmic value 1.92636e+06 in my example. – user3639557 Nov 23 '16 at 14:03
  • @user3639557 still you want to sample them with probabilities proportional to the values. Moreover, you said in the comment that we can focus only on the negative values for simplicity. My answer can be adapted to such a scenario, since you treat the extreme values as special cases. – Tim Nov 23 '16 at 14:10
  • @Xi'an is there a reference for the theory you are using to justify that no such finite sample exists? – user3639557 Nov 23 '16 at 23:10
  • Actually it appears that you may stay in the log space, see http://stats.stackexchange.com/questions/64081/how-do-i-sample-from-a-discrete-categorical-distribution-in-log-space/260248#260248 – Tim Feb 06 '17 at 16:05

1 Answer


Setting aside the question of whether you would ever sample the smallest values when the differences between the largest and the smallest values are that big and the smallest values have near-zero probability, the solution to your problem is not as hard as it sounds. Imagine that you have some values

$$ \varepsilon_1, \varepsilon_2,\dots,\varepsilon_n = \log\pi_1, \log\pi_2,\dots,\log\pi_n $$

where $\sum_{i=1}^n \pi_i = 1$, so they can be thought of as probabilities. So as not to clutter the notation, let's assume that the $\varepsilon_1, \varepsilon_2,\dots,\varepsilon_n$ values are sorted in increasing order. Imagine that the values $\varepsilon_1, \varepsilon_2,\dots,\varepsilon_k$ are very small, i.e. $\sum_{i=1}^k \pi_i = \xi$ is some very small value, and most of the probability mass is accumulated in the remainder $\sum_{i=k+1}^n \pi_i \approx 1$.

A simple two-step sampling scheme for such values would be as follows.

Define an arbitrary pseudo-observation $\delta$ that appears with probability $\xi$.

  1. Sample $Y$ from $\{ \delta,\varepsilon_{k+1}, \varepsilon_{k+2},\dots,\varepsilon_n \}$ with probabilities $\xi,\pi_{k+1}, \pi_{k+2},\dots,\pi_n$.
  2. If you sampled $Y =\delta$, then sample from $\{ \varepsilon_1, \varepsilon_2,\dots,\varepsilon_k \}$ with probabilities $\exp(\varepsilon_1+c), \exp(\varepsilon_2+c),\dots,\exp(\varepsilon_k+c)$, where $c$ is some constant, such that $\sum_{i=1}^k \exp(\varepsilon_i + c) = 1$, and take the sampled $\varepsilon_i$ value as $Y$.

Finding an appropriate $c$ would help you get rid of the numerical precision issues (you can simply add some large constant to those values and then normalize them). If your $\pi_i$ values do not sum to unity, but you know $\xi$, you can normalize $\pi_{k+1}, \pi_{k+2},\dots,\pi_n$ by replacing them with the $\tfrac{\pi_i}{\sum_{j=k+1}^n \pi_j + \xi}$ values. If $\xi$ is unknown to you, then you would have to make some assumption about its value, so that enough probability mass is accumulated in the small values.
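
A minimal sketch of this two-step scheme, assuming the values are already sorted and a split point $k$ has been chosen (the function name `two_step_sample` and the use of `scipy.special.logsumexp` to compute $\xi$ and the shift constant $c$ are this example's choices, not part of the answer):

```python
import numpy as np
from scipy.special import logsumexp

def two_step_sample(log_p, k, rng=None):
    """Sample an index with probability proportional to exp(log_p[i]),
    splitting the k smallest values off into a single pseudo-observation.

    Assumes log_p is sorted in increasing order, as in the text above.
    """
    rng = np.random.default_rng() if rng is None else rng
    small, large = log_p[:k], log_p[k:]

    # log(xi): total mass of the small group, computed without leaving log space.
    log_xi = logsumexp(small)

    # Step 1: choose between the pseudo-observation (mass xi) and the large values.
    # Subtracting the log-sum-exp plays the role of the shift constant c,
    # so the exponentials are normalized and do not overflow.
    step1_logits = np.concatenate(([log_xi], large))
    step1_probs = np.exp(step1_logits - logsumexp(step1_logits))
    choice = rng.choice(len(step1_probs), p=step1_probs)

    if choice > 0:
        return k + choice - 1  # index of one of the large values

    # Step 2: the pseudo-observation was drawn; sample within the small group.
    # Here c = -log(xi), so the shifted exponentials sum to one.
    step2_probs = np.exp(small - log_xi)
    return rng.choice(k, p=step2_probs)
```

For values as extreme as in the question, step 2 will essentially never be reached, which matches the observation in the comments that the largest value dominates.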

Notice that this can be easily adapted to the case when $\pi_i \not\in (0,1)$, since you can first sample from some subset of "large" values, treating the remainder as a pseudo-observation as described above. Moreover, you can extend the two-step procedure to multi-step sampling, where at each step you sample some subset treated as a pseudo-observation vs. the rest of the values, with the values at each step normalized so their probabilities sum to unity.

EDIT (2017-02-06):

It seems that you can use the Gumbel-max trick for random generation without leaving log space. This basically solves your problem, but since the above answer got up-voted, I'm leaving it.
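
For completeness, a minimal sketch of the Gumbel-max trick (the function name and the use of NumPy are this example's choices): add independent standard Gumbel noise to each log-value and return the argmax; no exponentiation is needed, so the computation stays entirely in log space.

```python
import numpy as np

def gumbel_max_sample(log_p, rng=None):
    """Sample an index with probability proportional to exp(log_p[i]),
    staying entirely in log space (Gumbel-max trick)."""
    rng = np.random.default_rng() if rng is None else rng
    gumbel_noise = rng.gumbel(size=len(log_p))  # standard Gumbel(0, 1) draws
    return int(np.argmax(log_p + gumbel_noise))
```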

Tim
  • I marked it as the answer, but a future reader should also go through the discussion in the comment section of the question, before jumping to use it. – user3639557 Nov 23 '16 at 23:45