
Suppose I have a function like this:

f <- function(x){
  exp(x) / (1 + exp(x))
}

It's supposed to work for any real value of x, but it actually returns NaN when x is 710 or larger. I'm wondering what a proper way to handle this issue is. I realize it's easy to make it just return 1, but perhaps that's not good behavior from a statistician's point of view. Does anybody have comments or suggestions?
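For concreteness, the failure is easy to reproduce in base R; 709 is roughly the largest argument for which exp stays finite in double precision:

```r
f <- function(x) {
  exp(x) / (1 + exp(x))
}

f(709)  # 1: exp(709) is still finite, just below the double-precision limit
f(710)  # NaN: exp(710) overflows to Inf, and Inf / (1 + Inf) is Inf / Inf
```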

whuber
David Z
  • I don't know if I could trust model based parameter estimates with such high influence values in the function. You can expect your standard Newton-Raphson algorithms to give you nonsensical parameter estimates with such values of $x$ as a linear predictor in logistic regression models. Odds ratios *can* be reported as infinite valued. Furthermore, I believe you can invert the score test to obtain a valid confidence interval for the odds ratio. – AdamO Apr 18 '14 at 17:27
  • It really depends on what purpose the values are being turned to. $exp(x)/(1+exp(x))$ for large $x$ goes to $1-exp(-x)$; this may be useful for some purposes and not much good for others. – Glen_b Apr 20 '14 at 09:39

2 Answers


In this case the NaN (not a number) is returned because the calculation of the exponential overflows in double precision arithmetic.

An algebraically equivalent expression, expanded in a Maclaurin series in $\exp(-x)$, is

$$\frac{\exp(x)}{1+\exp(x)} = \frac{1}{1+\exp(-x)} = 1 - \exp(-x) + \exp(-2x) - \cdots.$$

Because this is an alternating series, the error made in dropping any term is no greater than the size of the next term. Thus when $x \gt 710$, the error is no greater than $\exp(-710) \approx 10^{-308} \approx 2^{-1024}$ relative to the true value. That is far more precise than any statistical calculation needs to be, so you're fine replacing the return value by $1$ in this situation.
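A minimal sketch of the failure mode in base R: the overflow happens inside exp itself, and the indeterminate form Inf/Inf is what produces the NaN, while the error committed by returning 1 is on the order of the smallest representable doubles.

```r
x <- 710
exp(x)                 # Inf: exp overflows double precision above ~709.78
exp(x) / (1 + exp(x))  # NaN: Inf / Inf is indeterminate
exp(-x)                # the error from returning 1 instead; a subnormal
                       # number around 1e-308, far below any statistically
                       # meaningful precision
```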

Interestingly, R will not produce an NaN when the exponential underflows. Thus you could just choose the more reliable version of the calculation, depending on the sign of x, as in

f <- function(x) ifelse(x < 0, exp(x) / (1 + exp(x)), 1 / (1 + exp(-x)))

This issue shows up in almost all computing platforms (I have yet to see an exception) and they will vary in how they handle overflows and underflows. Exponentials are notorious for creating these kinds of problems, but they are not alone. Therefore it's not enough just to have a solution in R: a good statistician understands the principles of computer arithmetic and knows how to use these to detect and work around the idiosyncrasies of her computing environment.
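A quick check of the branch-safe version against the naive one (a sketch using the definitions above):

```r
f_naive  <- function(x) exp(x) / (1 + exp(x))
f_stable <- function(x) ifelse(x < 0, exp(x) / (1 + exp(x)), 1 / (1 + exp(-x)))

f_naive(710)   # NaN: overflow in both numerator and denominator
f_stable(710)  # 1: exp(-710) is negligibly small, so the quotient is exact
f_stable(-710) # essentially 0, with no NaN on the negative side
f_stable(0)    # 0.5
```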

whuber
    It may be worth pointing out that when $x \lt -36$ or so, $1 + \exp(x)$ will evaluate to $1$ (*exactly*) due to floating point rounding. Similarly, when $x \gt 36$, $1+\exp(x)$ evaluates to $\exp(x)$, whence the quotient produces an *exact* value of $1$. The precision issues when $|x|\gt 710$ are astronomically smaller! – whuber Apr 18 '14 at 22:04

Others have already discussed the computational issues, so I'll leave that to them. Since I assume you're working with R, I thought I'd point out that the boot package comes with its own inverse-logit function, which is quite numerically stable:

require(boot)
inv.logit(710)

seems to evaluate to 1 as desired.

Samuel Benidt
  • Or if you wish to avoid introducing a package dependency, `plogis(710)` achieves the same result. (Indeed `inv.logit` is just an alias for `plogis`.) – orizon Apr 21 '14 at 23:58
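For reference, plogis ships with base R's stats package, so no extra dependency is needed; a quick sketch of its behavior at extreme arguments:

```r
plogis(710)   # 1: numerically stable at large positive arguments
plogis(-710)  # no NaN on the negative side either
plogis(0)     # 0.5
```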