
This might be a little unusual, but please bear with me. I'm working on a theoretical exercise in chromatography (chemistry), a method used to separate different molecules. It is assumed that after performing the chromatography procedure, two different molecules separate from each other along a one-dimensional "tube" in zones with Gaussian (normally distributed) shapes, where the y-axis represents the concentration of each molecule along that one-dimensional space. The maxima of the peaks that represent the two molecules are separated from each other by a large number of standard deviations (10 in the following example):

[Figure: molecules separated from each other in zones with Gaussian shapes]

I want to calculate the theoretical amount (cumulative probability) of molecule A in the space that corresponds to the maximum of molecule B's peak. Thus, I am trying to calculate the upper-tail probability of a standard normal distribution beyond an extremely high (>10) z-score. I have tried using R (and Matlab) to calculate this:

1-pnorm(x)

where x is the large z-score. At x = 8 this returns a tail probability of 6.661338e-16, but anything above x = 8 returns 0. I know this is probably due to the limits of floating-point precision and rounding. My question: is there a way (a configuration of R or Matlab, specialized software, or an existing table) to calculate the upper-tail probability of a standard normal distribution for z-scores approaching 40, without the answer being rounded to 0?
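For example, the behaviour in an R session looks like this (the value at x = 8 is itself already affected by rounding; the exact tail probability at z = 8 is about 6.2e-16):

    1 - pnorm(8)   # 6.661338e-16 -- already limited by double-precision rounding in the subtraction
    1 - pnorm(9)   # 0 -- the true tail (~1e-19) is below the spacing of doubles near 1 (~1.1e-16)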

    Instead of `1-pnorm(x)`, use `pnorm(-x)`. – MAB Jul 05 '17 at 19:00
  • @Patty No, it will not. Try `pnorm(-30)` and `1-pnorm(30)`. – MAB Jul 05 '17 at 21:20
  • Well I stand corrected! I guess that makes sense, it's the operation of subtraction that causes the rounding. Why not answer his question? Might help future people with his problem. – Patty Jul 05 '17 at 21:21
  • Don't bother to compute the value: **compute its logarithm.** The values will underflow double-precision floats once $z$ exceeds 37 or so, but the logs will be fine. [Mills' ratio](https://stats.stackexchange.com/a/7206/919) will give a decent approximation: it says (for $z\gg 0$) the log is approximately $$\log(1-\Phi(z)) \approx -\frac{1}{2}\left(\log(2\pi) + z^2 + \log(z^2)\right).$$ It's good to $1\%$ at $z=10$, better than $0.1\%$ for $z\approx 32$, and gets better as $z$ grows. – whuber Jul 05 '17 at 21:36
  • Patty's right to say to use pnorm(-x) rather than 1-pnorm(x) for large x; the reason why you get 0 for 1-pnorm(30) is due to subtracting two things that are essentially 1 when you're really interested in the difference. pnorm(-30) is a good way to get it. However the "R way" to get upper tail areas would be `pnorm(30,lower.tail=FALSE)` (and I'd add `,log=TRUE`). However, for extreme values out beyond that you might as well use the approximation $\phi(x)/x$, (whuber's suggestion) -- and as he says, work on the log-scale. You should get at least 3 figure accuracy out that far. – Glen_b Jul 06 '17 at 01:41 (see the worked sketch after these comments)
  • Note that `pnorm(x,lower.tail=FALSE,log=TRUE)` seems to work correctly out a very long way – Glen_b Jul 06 '17 at 01:48
  • Fantastic! Thank you very much for your comments, I have learned a lot! – Mirzo Kanoatov Jul 06 '17 at 13:08
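Pulling the suggestions in these comments together, here is a minimal R sketch (the helper names `log_upper_tail`, `log_upper_tail_mills`, and `format_tail` are illustrative, not from the thread). It computes the log of the upper-tail probability both with `pnorm(..., lower.tail = FALSE, log.p = TRUE)` and with the Mills'-ratio approximation quoted above, then converts the log back to a readable mantissa-and-exponent form, since the probability itself underflows a double once z exceeds roughly 37:

    ## Log of the upper-tail probability P(Z > z), computed entirely on the
    ## log scale so it does not underflow even for very large z.
    log_upper_tail <- function(z) pnorm(z, lower.tail = FALSE, log.p = TRUE)

    ## Mills'-ratio approximation from the comments:
    ## log(1 - Phi(z)) ~ -(log(2*pi) + z^2 + log(z^2)) / 2  for large z
    log_upper_tail_mills <- function(z) -(log(2 * pi) + z^2 + log(z^2)) / 2

    ## Illustrative helper: print a natural-log probability as "m x 10^e",
    ## which stays readable even when exp(log_p) would underflow to 0.
    format_tail <- function(log_p) {
      log10_p <- log_p / log(10)
      e <- floor(log10_p)
      sprintf("%.3f x 10^%d", 10^(log10_p - e), as.integer(e))
    }

    z <- c(10, 20, 40)
    cbind(z, log_R = log_upper_tail(z), log_mills = log_upper_tail_mills(z))

    ## At z = 40 the log is about -804.6, i.e. a probability of roughly
    ## 3.7e-350 -- far below the smallest positive double (~4.9e-324).
    format_tail(log_upper_tail(40))

The two log columns should agree closely over this range: within about a percent on the probability scale at z = 10, and much better further out, consistent with the accuracy described in the comments.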

0 Answers