
For a variable $B\sim \textrm{Bin}(n,p)$, I observe $m$ successes. I know I can estimate $p$ with

$$\hat{p} = \frac{m}{n}$$

and I can approximate a confidence interval (CI) using the normal approximation given by the CLT, where I assume

$$p\sim\mathcal{N}\left(\hat{p}, \frac{\hat{p}(1-\hat{p})}{n}\right)$$

However, if $m = 0$ I am in trouble (sort of): the estimated variance is zero, so the interval collapses to a point, and for small nonzero $m$ the lower bound goes negative. I remember there was a transformation I can use, involving a matrix, that eliminates this problem.
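As a quick illustration (a Python sketch, not from the original post): with $m = 0$ the estimated standard error is zero, so the Wald interval degenerates to a point, and with small nonzero $m$ its lower bound dips below zero.

```python
import math

def wald_ci(m, n, z=1.96):
    """Normal-approximation (Wald) 95% CI for a binomial proportion."""
    p_hat = m / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

print(wald_ci(0, 20))  # (0.0, 0.0): degenerate when m = 0
print(wald_ci(1, 20))  # lower bound is negative
```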

What is the transformation in question?

COOLSerdash
xiaodai
    Re the wording of your question: under a frequentist interpretation of binomial sampling, you can't say that the population parameter $p$ has that distribution, or indeed any other distribution; it's a fixed (but unknown) value, not a probabilistic outcome. Instead, one thing you can sensibly attempt to do is [construct a quantity](http://en.wikipedia.org/wiki/Pivotal_quantity), $q=Q(p,\hat p)$ whose distribution doesn't depend on $p$; that allows you to make probability statements about $q$, and hence to obtain an interval for $p$ with the usual characteristics of a confidence interval. – Glen_b Jul 10 '13 at 03:46
    ctd ... In this case - as with many others - you can't get a $Q$ that has exactly those characteristics (is an exactly pivotal quantity), but you can get some that in large samples is approximately so (e.g. $\sqrt{n}(p-\hat p)\,$), and a number of related quantities that perform well even at small sample sizes, including some that only take 'possible' values - see the link in @John's answer for a list of possibilities. – Glen_b Jul 10 '13 at 03:50
    This has been extensively discussed on our site: nearly all the hits on a [search for your tags](http://stats.stackexchange.com/questions/tagged/binomial+confidence-interval) are relevant. Take a look at them! – whuber Jul 10 '13 at 14:12
  • I just recall that in my survival analysis course we used a (Hessian?) matrix to transform p -> log(p/(1+p)), then computed the CI and transformed it back. This way the CI never falls below 0. I just can't remember the exact method, and after googling to no avail I asked this question. – xiaodai Jul 11 '13 at 07:55
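What the comment above seems to be recalling is the logit transform, $\log(p/(1-p))$, with a delta-method standard error (the "matrix" presumably being the inverse Hessian, i.e. the Fisher information). A minimal Python sketch under that reading; note it still breaks at $m = 0$, where the logit is undefined:

```python
import math

def logit_ci(m, n, z=1.96):
    """Delta-method 95% CI on the logit scale, mapped back to (0, 1).
    Requires 0 < m < n: logit(p_hat) is undefined at the boundaries."""
    p_hat = m / n
    eta = math.log(p_hat / (1 - p_hat))               # logit(p_hat)
    # Delta method: Var(logit(p_hat)) ~ 1 / (n * p_hat * (1 - p_hat))
    se = 1 / math.sqrt(n * p_hat * (1 - p_hat))
    expit = lambda x: 1 / (1 + math.exp(-x))          # inverse logit
    return expit(eta - z * se), expit(eta + z * se)   # always in (0, 1)

print(logit_ci(2, 20))  # lower bound is strictly positive
```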

2 Answers


The Wikipedia page on the binomial distribution describes several binomial confidence intervals. In R, they, and others, are implemented in the binom.confint function in the binom package. Each has costs and benefits; you should look into them further and select the one you like best.
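For readers without R, here is a minimal Python sketch of two of those intervals (Wald and Wilson), to show how they differ near the boundary; binom.confint computes these and several more.

```python
import math

Z = 1.959964  # two-sided 95% normal quantile

def wald(m, n):
    """Standard normal-approximation interval; can leave [0, 1]."""
    p = m / n
    h = Z * math.sqrt(p * (1 - p) / n)
    return p - h, p + h

def wilson(m, n):
    """Wilson score interval; always stays inside (0, 1)."""
    p = m / n
    z2 = Z * Z
    center = (p + z2 / (2 * n)) / (1 + z2 / n)
    half = Z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n)) / (1 + z2 / n)
    return center - half, center + half

# n = 10, m = 2: the Wald lower bound is negative, Wilson's is not.
print(wald(2, 10))    # lower bound is negative
print(wilson(2, 10))  # both bounds inside (0, 1)
```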

Now that I've given the standard advice... I tend to believe that the extensive work on binomial CIs demonstrates that trying to get an exact one is pointless. While the intervals can vary considerably in their coverage proportion, that's only because the tails of the distribution change dramatically for small changes in $p$, and the distribution of observable values is discrete (i.e. the actual p-values they report really aren't that different).

When $N$ is small you can usually pick any CI, round it to values supported by your actual distribution, and get the same result. If you have an $N$ of 10 and $\hat p = 0.2$, there is no way you will ever replicate that experiment and observe a proportion of 0.04588727 (the Agresti–Coull interval's lower bound), because that number cannot occur. It's as impossible as the -0.04791801 from the CLT-based interval that you want to avoid because it's negative. Just enter 0 for the lower bound and 0.5 for the upper. The true proportion for your experiment can't be a value the experiment cannot produce, and the 95% CI is about the results of the experiment when repeated, not about $p$ itself. If $N$ is large, the CLT works pretty well anyway. It may not be the best interval, but round away from the mean by one attainable value and you'll usually be fine, with far less effort than working out the other intervals (and being conservative is often recommended).

John

There's a neat little article published decades ago in JAMA entitled "If nothing goes wrong, is everything all right?". The authors considered a range of possible values for a binomial parameter and derived the probability of observing zero successes out of $N$ trials for varying sample sizes $N$. They first did it by hand (or by calculator, since this was 1983), but they also pointed out that the expression:

$$1 - \text{maximum risk} = 0.05^{1/N}$$ has the asymptotic expansion $$1+\ln(0.05)/N + O(1/N^2).$$

So the upper confidence limit (the only one other than $0$) is $-\ln(0.05)/N$, or very nearly $3/N$. Take a look at the fancy intervals, and in the row where an observed value of 0 is tabulated you will find that $3/N$ is a very good approximation to the exact limits.
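This "rule of three" is easy to check numerically; a short Python sketch:

```python
import math

# With m = 0 successes in N trials, the one-sided 95% upper confidence
# limit solves (1 - p)^N = 0.05, i.e. p = 1 - 0.05**(1/N),
# which is approximately -ln(0.05)/N ~ 2.9957/N ~ 3/N.
for N in (10, 30, 100, 300):
    exact = 1 - 0.05 ** (1 / N)
    rule_of_three = 3 / N
    print(N, round(exact, 4), round(rule_of_three, 4))
```

The agreement improves as $N$ grows, matching the $O(1/N^2)$ error term above.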

Searching for earlier citations of this article, I find that I have already posted such an answer.

DWin