
I am using numpy to compute the likelihood of a variable $Z$. $Z$ is a Bernoulli random variable with two outcomes $[0,1]$. I compute the log likelihood of observing $Z$ given the parameter $x=[-3146,-1821]$. These numbers are too negative: np.exp(x) returns [0, 0], and np.exp(x)/np.exp(x).sum() returns [nan, nan]. I know the reason is that both numbers are smaller than np.log(sys.float_info.min) (the value is $-708.3964185322641$ on my PC). Is there any method to accurately calculate the likelihood from the log likelihood when the numbers are this small?

I tried adding a large number to $x$: $x = x+1821$, but the first number is still too small.
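For reference, a minimal snippet reproducing the problem (the shift by the largest entry is the attempt described above):

```python
import sys
import numpy as np

x = np.array([-3146.0, -1821.0])
print(np.log(sys.float_info.min))       # about -708.396: the underflow threshold
print(np.exp(x))                        # [0., 0.]: both entries underflow
with np.errstate(invalid="ignore"):     # suppress the RuntimeWarning from 0/0
    print(np.exp(x) / np.exp(x).sum())  # [nan, nan]

# Shifting by the largest entry rescues one component but not the other:
print(np.exp(x - x.max()))              # [0., 1.]: exp(-1325) still underflows
```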

Update

The context is a state observation problem. There are unknown states $Z = \{z_1,z_2,\dots,z_m\}$ whose true values are unknown; we only know that each can take one of two possible values, $[0,1]$. $Z$ can only be observed through sensors $S = \{s_1,s_2,\dots,s_n\}$, and each sensor can provide values for multiple unknown states in $Z$. The value observed by sensor $s_i$ for state $z_j$ is $x_{ij}$. $e_i$ models the error of sensor $s_i$: a larger $e_i$ implies sensor $s_i$ is less accurate, and $e_i$ is unknown as well.

I model $Z$ as the latent variables, and $E$ as the model parameters where $E = \{e_1,e_2,\dots,e_n\}$. I am trying to use the EM algorithm to work out $Z$ and $E$ given the observations $X$. The E-step is given below:

$p(z_j|X_j,E_j) \propto p(z_j)\prod\limits_i p(x_{ij}|z_j,e_i)$, where $X_j$ is the set of sensor values for $z_j$, $E_j$ is the set of errors of the sensors that observe $z_j$, $p(z_j)$ is the prior, and $p(x_{ij}|z_j,e_i)$ is the likelihood. Then we have:

$p(z_j = k|X_j,E_j) = \frac{p(z_j = k)\prod_i p(x_{ij}|z_j = k,e_i)}{\sum_{k'=0}^1 p(z_j = k')\prod_i p(x_{ij}|z_j = k',e_i)}$

The M-step is not related to the question here, so I won't write it out.

Back to the question: I am computing the unnormalized log posterior $\log\left(p(z_j=k)\prod\limits_i p(x_{ij}|z_j=k,e_i)\right) = \log p(z_j=k) + \sum\limits_i \log p(x_{ij}|z_j=k,e_i)$. Since there are a lot of sensors, the log likelihoods are very negative: $[-3146,-1821]$. Using np.exp([-3146,-1821]) to convert them back to likelihoods raises warnings because the numbers are too small for np.exp to handle.
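To make the computation concrete, here is a small sketch of how that quantity is assembled; the sensor count, prior, and per-sensor log-likelihoods below are hypothetical stand-ins, not the actual model:

```python
import numpy as np

rng = np.random.default_rng(0)

n_sensors = 800                            # hypothetical number of sensors
log_prior = np.log(np.array([0.5, 0.5]))   # log p(z_j = k) for k in {0, 1}
# Stand-in for log p(x_ij | z_j = k, e_i): one row per sensor, one column per k
log_lik = rng.normal(loc=-2.0, scale=1.0, size=(n_sensors, 2))

# Unnormalized log posterior: log p(z_j = k) + sum_i log p(x_ij | z_j = k, e_i)
log_post = log_prior + log_lik.sum(axis=0)
print(log_post)  # two large negative numbers, of the same flavor as [-3146, -1821]
```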

Update 2

I understand that $\exp(-3146)$ is tiny compared with $\exp(-1821)$ and could be ignored, so it is tempting to approximate the result as $[0,1]$. But in terms of probability, even though $\exp(-3146)$ is indeed very, very small, there is still a tiny probability that $z_j=0$. If the approximation $\exp([-3146,-1821])\approx[0,1]$ is used, it makes $p(z_j = 0|X_j,E_j) = 0$, i.e., $z_j=0$ becomes impossible.

    if you add 1821, you should get `np.exp(x)=[0,1]`, and `np.exp(x)/np.exp(x).sum()` should also be [0,1] ... – Ben Bolker Sep 19 '18 at 04:00
  • Yes, I tried that. Adding 1821 makes the second number 0, but $(-3146+1821)$ is still too small, and `np.exp(x)/np.exp(x).sum() = [0,1]` is still not an accurate result – JYY Sep 19 '18 at 05:05
  • Are you trying to get the order of magnitude of this likelihood, or the first few digits (in scientific notation)? It's quite a small number, hence part of the reason we use log-likelihoods to begin with. – Kevin Sep 19 '18 at 05:31
  • Quite a few posts already on site deal with likelihood computation issues; you may get some benefit from a search. What are you using this likelihood to do? – Glen_b Sep 19 '18 at 05:36
  • What @Glen_b says. Are you sure you need the likelihood itself? It is usually quite enough to work on log-likelihoods. – Stephan Kolassa Sep 19 '18 at 07:44
  • The question is: accurate enough for what? If the true answer is `[1e-500, 1-1e-500]` (for example), what computation are you trying to do that will be changed in any significant way if you use `[0,1]` instead? If you really need this you will have to use an extended precision library like [mpmath](http://mpmath.org/) ... – Ben Bolker Sep 19 '18 at 12:09
  • I can post an illustration of how to do this, but I'm not sure it's going to do you any good. Again, it would be useful to know more about the context: what are you going to use these values for? – Ben Bolker Sep 19 '18 at 12:18
  • Please see the update of the question. I added the context. – JYY Sep 19 '18 at 22:39
  • @YiYang: I'm a bit confused - the MLE for a binomial RV is just $\sum x_i /n$? Why not just use that directly? – user1357015 Sep 19 '18 at 22:58
  • @user1357015 $\sum(x_i)/n$ assumes that all the sensors are equally reliable. In the problem, I use $e_j$ to model the error or reliability of sensor $j$. – JYY Sep 20 '18 at 01:00
  • @YiYang you ever heard of the "log-sum-exp" trick? https://stats.stackexchange.com/questions/304758/softmax-overflow/304774#304774 – Taylor Sep 24 '18 at 00:18
  • @Taylor yes, I tried that one, $[−3146,−1821] + 1821 = [-1325,0]$, but $-1325$ is still too small. – JYY Sep 24 '18 at 00:44
  • @YiYang right. It doesn't make everything fit; it just helps you focus on the numbers/elements that count, relatively speaking. If you don't do this, then *everything* will be zero after you exponentiate, and your normalizing constant is zero, and it doesn't make sense. But if at least one number is non-zero, then you won't get NaNs. You might have all your probability focused on a few points, but that's okay usually. – Taylor Sep 24 '18 at 00:50

2 Answers


So if I understand what you are doing, you want to calculate $$\log \frac{\exp(x_i)}{\sum_j \exp(x_j)}$$ where the $x_i$ are large negative numbers. What you need to do is write it as $$\log \frac{\exp(x_i)}{\exp(x_m) \left(1 + \sum_{j\ne m} \exp(x_j - x_m)\right)} = x_i - x_m - \log \left(1 + \sum_{j\ne m} \exp(x_j - x_m)\right) \approx x_i - x_m,$$

where $x_m$ is the largest of your negative numbers (i.e., $-1821$ in your example). Hopefully this is sufficient for your purposes; otherwise you would have to use the Taylor series $\log(1 + x) = x - \frac{x^2}{2} + \dots$
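A minimal numpy sketch of this trick, using the numbers from the question (`log_normalize` is just an illustrative helper name):

```python
import numpy as np

def log_normalize(x):
    """Stably compute log(exp(x_i) / sum_j exp(x_j)) via the log-sum-exp trick."""
    x = np.asarray(x, dtype=float)
    x_m = x.max()  # the largest (least negative) entry, -1821 here
    # log sum_j exp(x_j) = x_m + log(1 + sum_{j != m} exp(x_j - x_m))
    lse = x_m + np.log(np.exp(x - x_m).sum())
    return x - lse

x = [-3146.0, -1821.0]
print(log_normalize(x))          # [-1325., 0.]: exact in the log domain
print(np.exp(log_normalize(x)))  # [0., 1.]: exp(-1325) still underflows in float64
```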

  • It's not $\max \log\frac{\exp(x_i)}{\sum_j \exp(x_j)}$ that I want. It is the E-step in EM. The log values $x_0$ and $x_1$ are known: $x=[−3146,−1821]$. From these two values I want to calculate $\exp(x_0)$ and $\exp(x_1)$. – JYY Sep 20 '18 at 01:07

The cause of the problem is that numpy and Python's math module cannot represent $\exp(x)$ in double precision if $x$ is greater than np.log(sys.float_info.max) or smaller than np.log(sys.float_info.min). I found some packages, e.g., mpmath and sympy, that can compute the exponential to arbitrary precision, but their efficiency is significantly worse than numpy's, especially for vectorized operations.
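For example, a quick sketch with mpmath (assuming it is installed; slow, but the precision is arbitrary):

```python
import mpmath

mpmath.mp.dps = 30  # work with 30 significant digits
x = [mpmath.mpf(-3146), mpmath.mpf(-1821)]
ex = [mpmath.exp(v) for v in x]
total = mpmath.fsum(ex)
print([e / total for e in ex])
# the first probability is roughly 3.6e-576, far below what float64 can hold
```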

If you are programming in Python, the softmax function in the scipy.special module solves this problem. For example, softmax([-2000,-2005]) returns array([0.99330715, 0.00669285]).
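A short usage sketch; scipy.special also provides log_softmax (in recent SciPy versions), which sidesteps the final exponentiation entirely:

```python
import numpy as np
from scipy.special import softmax, log_softmax

print(softmax([-2000.0, -2005.0]))      # [0.99330715, 0.00669285]
print(log_softmax([-2000.0, -2005.0]))  # [-0.00671535, -5.00671535]

# With the question's numbers the smaller probability still underflows to 0.0,
# but its log-probability stays exact:
print(softmax([-3146.0, -1821.0]))      # [0., 1.]
print(log_softmax([-3146.0, -1821.0]))  # [-1325., 0.]
```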

  • all of this is true, but I remain unconvinced that the issue you raise here is actually ever going to be a practical problem in an estimation procedure. If the *relative* probability is too small to be captured in a double-precision floating point number, how can it affect the calculation to a significant extent? – Ben Bolker Sep 24 '18 at 01:32
  • @BenBolker I am concerned that assigning probability 0 to a very unlikely event would cause some problems. – JYY Sep 24 '18 at 04:22