Bernoulli - confidence interval estimate for small observed $\hat{p}$ when sample is large?

Question

I have a population which can be thought of as infinitely large. I take a large sample (with replacement) of $n\approx10^{11}$, and observe a small probability of success ($\hat{p}\approx10^{-10}$).

I see that this answer states that to use the normal distribution approximation, the conditions should be:

$n\hat{p}>5$ and $n(1-\hat{p})>5$

The answer also lists the following alternative methods:

Wilson score
Clopper-Pearson
Agresti-Coull

Given $n\hat{p}=10$ and $n(1-\hat{p})\gg 5$, is it okay for me to use the normal distribution approximation, or should I use another method?

You accept that the normal approximation is suitable when $n\hat{p}>5$ and $n(1-\hat{p})>5$. You have $n\hat{p}=10>5$ and $n(1-\hat{p})\gg 5$. What leads you to believe that the normal approximation is not suitable? — Sycorax, Sep 08 '20 at 14:39
@Sycorax the distribution is slightly but visibly skewed, so a more accurate Binomial CI might be preferable. For illustration in `R`: `plot(0:24, dbinom(0:24, 1e11, 1e-10), type="h")`. It may help to overplot the Normal reference curve, `curve(dnorm(x, 1e1, sqrt(1e1*(1-1e-10))), add=TRUE)`. The Poisson approximation will, of course, be excellent. — whuber, Sep 08 '20 at 14:50
@Sycorax I've read that the normal approximation doesn't approximate a Bernoulli distribution well if $\hat{p}\approx 0$ or $\hat{p}\approx 1$. On the other hand, the Stack Exchange question I've linked states that it can be used if $n\hat{p}=10$ and $n(1-\hat{p})\gg 5$, so I guess I just wanted some clarification around the two statements. FYI, I think I read the former on Wikipedia. — Ashlea Sexton, Sep 08 '20 at 15:15
@whuber This is a valuable observation. I wonder if skew coincides with the intent of OP's question, or if OP had a different concern in mind. — Sycorax, Sep 08 '20 at 15:15
@whuber Thanks whuber. I didn't think of using a Poisson approximation, but that does seem to fit my use case, since $n$ is large and $\hat{p}$ is very small. — Ashlea Sexton, Sep 08 '20 at 15:23
@Sycorax I didn't actually have skew in mind, but I probably should have — Ashlea Sexton, Sep 08 '20 at 15:26
What you might have read about "doesn't well approximate" is flat-out wrong, as you can check (if you wish) by computing and plotting Binomial distributions with such extreme values of $\hat p$ but large values of $n\hat p$ and $n(1-\hat p).$ Since you are aware of the Poisson approximation, you may with less work just plot its probability function for parameters greater than $10$ or so, or compare its cumulant generating function with the Normal cgf. — whuber, Sep 08 '20 at 15:26
@whuber Thanks very much. I think you've answered my question quite well. And thanks to you as well Sycorax. Whuber, if you would like to add a formal answer then I'd be happy to accept. Otherwise I think I'm good! — Ashlea Sexton, Sep 08 '20 at 15:35

user295357 · Answer 1 · 2020-09-08T15:15:07.610


Whether the normal approximation is adequate depends on how much accuracy you need. For example, the Wilson score interval, which is itself built on a normal approximation, gives a lower limit that differs from the exact CI's by more than 10%, while its upper limit is quite close to the exact one.
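To make the comparison concrete, here is a minimal sketch in Python (standard library only) that computes three 95% intervals for $x=10$ successes in $n=10^{11}$ trials, the values implied by the question. The function names (`wald_ci`, `wilson_ci`, `exact_poisson_ci`) are hypothetical helpers written for this illustration. The "exact" interval is an assumption on my part: it is the exact Poisson interval for the expected count, used as a stand-in for Clopper-Pearson, which should be reasonable here because the Poisson approximation to the Binomial is excellent at this $n$ and $\hat{p}$.

```python
import math
from statistics import NormalDist


def wald_ci(x, n, alpha=0.05):
    """Plain normal-approximation (Wald) interval for a proportion."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p = x / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half


def wilson_ci(x, n, alpha=0.05):
    """Wilson score interval for a proportion."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p = x / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), by direct summation (small k only)."""
    if k < 0:
        return 0.0
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total


def bisect(f, lo, hi, tol=1e-12):
    """Root of a monotone function f on [lo, hi] by bisection."""
    f_lo = f(lo)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def exact_poisson_ci(x, n, alpha=0.05):
    """Exact Poisson interval for the expected count, rescaled to a proportion.

    Assumption: used here in place of Clopper-Pearson because n is huge
    and p is tiny, so the count is very nearly Poisson.
    """
    # Lower limit solves P(X >= x | lam) = alpha / 2.
    if x == 0:
        lower = 0.0
    else:
        lower = bisect(lambda lam: 1 - poisson_cdf(x - 1, lam) - alpha / 2, 0.0, x)
    # Upper limit solves P(X <= x | lam) = alpha / 2.
    hi_start = x + 10 * math.sqrt(x + 1) + 10
    upper = bisect(lambda lam: poisson_cdf(x, lam) - alpha / 2, x, hi_start)
    return lower / n, upper / n


n, x = 10**11, 10
for name, ci in [("Wald", wald_ci), ("Wilson", wilson_ci), ("Exact (Poisson)", exact_poisson_ci)]:
    lo, hi = ci(x, n)
    # Report limits as expected counts (n * p) so they are easy to read.
    print(f"{name:16s} {lo * n:7.3f} .. {hi * n:7.3f}")
```

Expressed as expected counts, the intervals should come out at roughly 3.8 to 16.2 (Wald), 5.4 to 18.4 (Wilson) and 4.8 to 18.4 (exact Poisson). The Wilson and exact upper limits nearly coincide, while the lower limits differ by more than 10%.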

edited Sep 08 '20 at 15:15

answered Sep 08 '20 at 14:58


Thanks for your answer – Ashlea Sexton Sep 08 '20 at 15:38
