I have a population which can be thought of as infinitely large. I take a large sample (with replacement) of $n\approx10^{11}$, and observe a small probability of success ($\hat{p}\approx10^{-10}$).

I see that this answer states that to use the normal distribution approximation, the conditions should be:

$n\hat{p}>5$ and $n(1-\hat{p})>5$

The answer also lists the following alternative methods:

  • Wilson score
  • Clopper-Pearson
  • Agresti-Coull

Given $n\hat{p}=10$, and $n(1-\hat{p})\gg 5$, is it okay for me to use the normal distribution approximation or should I use another method?
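For concreteness, here is a minimal sketch of the rule-of-thumb check and the plain normal-approximation (Wald) interval for these numbers. The 95% confidence level and all variable names are assumptions made for illustration; they are not part of the original question.

```python
import math

# Values from the question; the 95% level is an assumption for illustration.
n = 10**11
p_hat = 1e-10
z = 1.959963984540054  # two-sided 95% standard normal quantile

# Rule-of-thumb conditions for the normal approximation.
expected_successes = n * p_hat
expected_failures = n * (1 - p_hat)
rule_of_thumb_ok = expected_successes > 5 and expected_failures > 5

# Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n).
half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
wald_lower = p_hat - half_width
wald_upper = p_hat + half_width

print("rule of thumb satisfied:", rule_of_thumb_ok)
print("Wald interval for p:", wald_lower, wald_upper)
print("Wald interval in expected counts:", wald_lower * n, wald_upper * n)
```

The interval is symmetric around $\hat{p}$ by construction, so it cannot reflect any skew in the sampling distribution.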

Sycorax
  • You accept that the normal approximation is suitable when $n\hat{p}>5$ and $n(1-\hat{p})>5$. You have $n\hat{p}=10>5$ and $n(1-\hat{p})\gg 5$. What leads you to believe that the normal approximation is not suitable? – Sycorax Sep 08 '20 at 14:39
  • @Sycorax the distribution is slightly but visibly skewed, so a more accurate Binomial CI might be preferable. For illustration in `R`: `plot(0:24, dbinom(0:24, 1e11, 1e-10), type="h")`. It may help to overplot the Normal reference curve, `curve(dnorm(x, 1e1, sqrt(1e1*(1-1e-10))), add=TRUE)`. The Poisson approximation will, of course, be excellent. – whuber Sep 08 '20 at 14:50
  • @Sycorax I've read that the normal approximation doesn't approximate a Bernoulli distribution well if $\hat{p}\approx 0$ or $\hat{p}\approx 1$. On the other hand, the Stack Exchange question I've linked states that it can be used if $n\hat{p}=10$ and $n(1-\hat{p})\gg 5$, so I guess I just wanted some clarification around the two statements. FYI, I think I read the former on Wikipedia. – Ashlea Sexton Sep 08 '20 at 15:15
  • @whuber This is a valuable observation. I wonder if skew coincides with the intent of OP's question, or if OP had a different concern in mind. – Sycorax Sep 08 '20 at 15:15
  • @whuber Thanks whuber. I didn't think of using a Poisson approximation, but that does seem to fit my use-case, since $n$ is large and $\hat{p}$ is very small. – Ashlea Sexton Sep 08 '20 at 15:23
  • @Sycorax I didn't actually have skew in mind, but I probably should have – Ashlea Sexton Sep 08 '20 at 15:26
  • What you might have read about "doesn't well approximate" is flat-out wrong, as you can check (if you wish) by computing and plotting Bernoulli distributions with such extreme values of $\hat p$ but large values of $n\hat p$ and $n(1-\hat p).$ Since you are aware of the Poisson approximation, you may with less work just plot its probability function for parameters greater than $10$ or so or compare its cumulant generating function with the Normal cgf. – whuber Sep 08 '20 at 15:26
  • @whuber Thanks very much. I think you've answered my question quite well. And thanks to you as well Sycorax. Whuber, if you would like to add a formal answer then I'd be happy to accept. Otherwise I think I'm good! – Ashlea Sexton Sep 08 '20 at 15:35
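The comparison suggested in the comments can also be sketched in Python using only the standard library. This is an illustrative translation, not whuber's code: the helper names `binom_pmf`, `poisson_pmf` and `normal_pdf` are hypothetical, and the Binomial probabilities are computed on the log scale because $n$ is too large for direct evaluation of the binomial coefficient.

```python
import math

N = 10**11   # sample size from the question
P = 1e-10    # success probability from the question

def binom_pmf(k, n, p):
    """Binomial(n, p) probability of k successes, computed on the log scale."""
    # log of n*(n-1)*...*(n-k+1), summed term by term to avoid cancellation
    log_falling = sum(math.log(n - i) for i in range(k))
    log_pmf = (log_falling - math.lgamma(k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)

def poisson_pmf(k, lam):
    """Poisson(lam) probability of the count k."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def normal_pdf(x, mean, sd):
    """Normal density, used as the reference curve."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

mean = N * P
sd = math.sqrt(N * P * (1 - P))
skewness = (1 - 2 * P) / sd  # Binomial skewness: (1 - 2p) / sqrt(n p (1 - p))

print(f"mean={mean:.6g}, sd={sd:.6g}, skewness={skewness:.4f}")
for k in range(0, 25, 4):
    print(f"{k:2d}  binom={binom_pmf(k, N, P):.6f}  "
          f"poisson={poisson_pmf(k, mean):.6f}  "
          f"normal={normal_pdf(k, mean, sd):.6f}")
```

The Binomial and Poisson columns should agree to many decimal places, while the Normal column differs noticeably in the tails. The skewness is roughly $1/\sqrt{10}\approx 0.32$, which is the "slight but visible" asymmetry mentioned above.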

1 Answer

Whether the normal approximation is adequate depends on how much accuracy you need. In particular, if the Wilson score interval is used as the normal-approximation method, then compared with the accurate CI it yields a looser lower limit, with an error of more than 10%, while its upper limit is quite close to that of the accurate CI.
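A rough way to check this comparison is sketched below. It assumes that the "accurate CI" means the Clopper-Pearson interval, that the confidence level is 95%, and that $0 < x < n$; none of these is stated in the answer. The function names are hypothetical, and the Clopper-Pearson limits are found by plain bisection on the Binomial tail probabilities rather than through a statistics library.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% standard normal quantile

def wilson(x, n, z=Z95):
    """Wilson score interval for x successes out of n trials."""
    p_hat = x / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); summed directly, so meant for small k."""
    total = 0.0
    log_falling = 0.0  # log of n*(n-1)*...*(n-j+1)
    for j in range(k + 1):
        log_pmf = (log_falling - math.lgamma(j + 1)
                   + j * math.log(p) + (n - j) * math.log1p(-p))
        total += math.exp(log_pmf)
        log_falling += math.log(n - j)
    return total

def bisect_root(f, lo, hi, iters=200):
    """Root of a monotone function f that changes sign on [lo, hi]."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def clopper_pearson(x, n, alpha=0.05):
    """Clopper-Pearson interval by bisection; this sketch assumes 0 < x < n."""
    p_hat = x / n
    # Lower limit solves P(X >= x | p) = alpha / 2.
    lower = bisect_root(lambda p: 1 - binom_cdf(x - 1, n, p) - alpha / 2,
                        p_hat * 1e-6, p_hat)
    # Upper limit solves P(X <= x | p) = alpha / 2.
    upper = bisect_root(lambda p: binom_cdf(x, n, p) - alpha / 2,
                        p_hat, min(1 - 1e-12, 10 * p_hat + 10 / n))
    return lower, upper

x, n = 10, 10**11
w_lo, w_hi = wilson(x, n)
cp_lo, cp_hi = clopper_pearson(x, n)
print("Wilson (expected counts):         ", w_lo * n, w_hi * n)
print("Clopper-Pearson (expected counts):", cp_lo * n, cp_hi * n)
print("relative difference, lower limit: ", (w_lo - cp_lo) / cp_lo)
print("relative difference, upper limit: ", (w_hi - cp_hi) / cp_hi)
```

Under these assumptions, the two lower limits differ by a bit more than 10% in relative terms, while the upper limits nearly coincide.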

user295357