How pvalue is calculated internally here ? And what is the data point for which it is calculating pvalue

Question

I got a into a question while going through poisson distribution.

Here it is:--- A nuclear pump failed 5 times out of 94.32 days. Give a 95% confidence interval (CI) for the failure rate per day

So i ran:--

poisson.test(x = 5, T = 94.32, r = 1, alternative = "two.sided", conf.level = 0.95)

Which gave output of:-

Exact Poisson test

data:  5 time base: 94.32
number of events = 5, time base = 94.32, p-value < 2.2e-16
alternative hypothesis: true event rate is not equal to 1
95 percent confidence interval:
 0.01721254 0.12371005
sample estimates:
event rate 
0.05301103

But i did not understand on what value pvalue is calculated and how it is calculated here ?

Welcome to `stackexchange`. You can have a look at the source code by typing `poisson.test` (without parenthesis) in your `R` console. Does it help you ? — linog, Apr 03 '20 at 16:26
One ought to suspect the answer will depend on whether the period of 94.32 days was determined independently of the failures or was based on the failures: after all, if 94.32 days was exactly the moment of the fifth failure, one should suspect the failure rate is overestimated. — whuber, Apr 03 '20 at 17:48

RyanFrost · Accepted Answer · 2020-04-04T17:20:57.283

As @linog mentioned in his comment, typing poisson.test into R will print the source code for the function.

The relevant bit is:

⋮
m <- r * T
PVAL <- switch(alternative, less = ppois(x, m), greater = ppois(x - 
            1, m, lower.tail = FALSE), two.sided = {
            if (m == 0) (x == 0) else {
                relErr <- 1 + 1e-07
                d <- dpois(x, r * T)
                if (x == m) 1 else if (x < m) {
                  N <- ceiling(2 * m - x)
                  while (dpois(N, m) > d) N <- 2 * N
                  i <- seq.int(from = ceiling(m), to = N)
                  y <- sum(dpois(i, m) <= d * relErr)
                  ppois(x, m) + ppois(N - y, m, lower.tail = FALSE)
                } else {
                  i <- seq.int(from = 0, to = floor(m))
                  y <- sum(dpois(i, m) <= d * relErr)
                  ppois(y - 1, m) + ppois(x - 1, m, lower.tail = FALSE)
                }
            }
        })
⋮

And particularly in this case:

N <- ceiling(2 * m - x)
while (dpois(N, m) > d) N <- 2 * N
i <- seq.int(from = ceiling(m), to = N)
y <- sum(dpois(i, m) <= d * relErr)
ppois(x, m) + ppois(N - y, m, lower.tail = FALSE)

with the final line being the p-value that's calculated for the given x, r, and T values.

If $X \sim \textrm{Poisson}(rT)$, the p-value for the two sided test is given by: $$ \sum_{x=0}^{\infty} \mathbb{P}(X = x) * \mathbb{I}_{\mathbb{P}(X=x) \le \mathbb{P}(X = 5)}$$

Where $\mathbb{I}$ is the indicator function.

This is just the sum of all possible observed values that are as probable or less probable than the true observed value, given the null hypothesis ($\lambda = rT$). This is the basic idea of a two-sided test - we want to know the probability of observing any value as extreme or more extreme than the one we did (given the null).

The reason for the extra work in the source code is because we can't take the sum over an infinite number of $x$ values - we have to choose a cutoff after which the probabilities become small enough that they don't matter. Where this cutoff is depends on x, r, and T.

Addressing your comment -

$\lambda$ (the mean & variance of the Poisson) is equal to the rate ($r$) times the exposure ($T$). So here $\lambda=94.32$.

To find the two-sided probability, we add all the probabilities that are less than the probability of the observed value. This includes both the left and right tails of the Poisson. The left tail probability can be calculated with ppois(x = 5, lambda = 94.32, lower.tail = TRUE).

The right tail is a bit more tricky, as we need to find the point on the right tail where the values become as or more extreme (less probable to observe) than the observed value.

The following works for these values, though you'll need a more rigorous method such as that used by poisson.test to work with different parameters.

i <- 0:1000

# First value on the right tail that's more extreme than 5
right_tail <- min(which(dpois(i, 94.32) <= dpois(5, 94.32) & i > 5))
p_left <- ppois(5, 94.32)
# Subtract one from right_tail first to go from 0-index to 1-index of i,
# then subtract one more to include the probability of the first value (right_tail)
p_right <-  ppois(right_tail - 1 - 1, 94.32, lower.tail = FALSE)
p_val <- p_left + p_right

what i feel is this "ppois(5,5/94.32,lower.tail = FALSE)" should also give the same pvalue but it is not giving the same result. Is my lambda here 5/94.32 is wrong ? What should be the ppois parameters here for calculation of pvalue ? — Abhishek soni, Apr 04 '20 at 04:09
Hi , Thanks a lot. I understood the approach and my mistakes but when i followed your code it gave me result p_val= 1.629968e-33. And with `poisson.test(x = 5, T = 94.32, r = 1, alternative = "two.sided", conf.level = 0.95)` p_val comes out to be 2.2e-16. Why is the difference ? — Abhishek soni, Apr 08 '20 at 04:30
The call to `poisson.test` gives p-value of < 2.2e-16, not 2.2e-16. To get the value to full precision, try `poisson.test(x=5, T = 94.32, r = 1, alternative = "two.sided", conf.level = 0.95)$p.val` — RyanFrost, Apr 08 '20 at 14:03
Oh , thanks :) I got it but then on what basis p-value of 2.2e-16 is calculated then ? — Abhishek soni, Apr 08 '20 at 14:36
It's only saying that the p value is less than 2.2e-16 ("p-value < 2.2e-16"). That's what it will report for any p-value smaller than 2.2e-16. See: https://stats.stackexchange.com/questions/78839/how-should-tiny-p-values-be-reported-and-why-does-r-put-a-minimum-on-2-22e-1?noredirect=1&lq=1 for why that is. — RyanFrost, Apr 08 '20 at 14:41
Also, please consider accepting the answer (clicking the check mark) if it has resolved your question. — RyanFrost, Apr 08 '20 at 14:53
Yeah thanks for your support. I went through the `https://stats.stackexchange.com/questions/78839/how-should-tiny-p-values-be-reported-and-why-does-r-put-a-minimum-on-2-22e-1?noredirect=1&lq=1` , all it is talking about is precision and how small value can go but can you explain the mathematics behind the calculation of figure "2.2e-16" — Abhishek soni, Apr 08 '20 at 15:02
The point is that any value below 2.2e-16 will be reported as "p-value < 2.2e-16". It calculates the precise value (1.629968e-33), but since it's below 2.2e-16, it only reports "p-value < 2.2e-16". That link explains why this is done. — RyanFrost, Apr 08 '20 at 15:37

How pvalue is calculated internally here ? And what is the data point for which it is calculating pvalue

1 Answers1