Estimating binomial distribution from small sample

Question

I have a set of 9000 outcomes and I wish to estimate (with a 95% confidence interval) the number of 'positive' examples. After viewing 413 examples I found 13 positives (a positive rate of 13/413 = 0.0315). Using the inverse cumulative binomial distribution, I can estimate the total number of positives with a 95% confidence interval.

However, the calculated positive rate is based on a small sample, and adding a single further positive example can shift the 95% confidence interval by a large amount. How can I incorporate the size of my sample into the estimate so that it is more robust?

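My MATLAB code:

N_found = 14;    % positives found so far
N_excl = 400;    % negatives found so far (excluded)
N_tot = 9000;    % total number of outcomes
% A central 95% interval needs the 2.5% and 97.5% quantiles ([0.05 0.95] gives only 90%)
binoinv([0.025 0.975], N_tot-N_found-N_excl, N_found/(N_found+N_excl)) + N_found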

If you go from 13 positive to 14 positive cases, the interval *should* shift substantially -- the point estimate will increase by a multiple of nearly 14/13 (i.e. up by about 7.5%) — Glen_b, Feb 07 '17 at 23:32
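
The size of that shift can be checked with a few lines of arithmetic. This is only a sketch: it assumes the extra positive is also one extra viewed item (413 becomes 414), it projects the total naively as N_tot times the rate, and the variable names are made up for illustration.

```matlab
% Point estimate of the positive rate before and after one more positive.
% Assumption: the extra positive is also one extra viewed item (413 -> 414).
p13 = 13/413;
p14 = 14/414;
ratio = p14/p13;               % relative change in the estimated rate (about 1.074)

% Naive projection onto all 9000 outcomes, ignoring what was already viewed.
N_tot = 9000;
shift = N_tot*(p14 - p13);     % change in the projected total (about 21 positives)

fprintf('ratio = %.4f, shift = %.1f\n', ratio, shift);
```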

Michael R. Chernick · Answer 1 · 2017-02-07T18:00:02.317


To get a confidence interval, 95% or otherwise, for a binomial parameter p, you can use the Clopper-Pearson method, which is exact. See Hahn and Meeker, Statistical Intervals: A Guide for Practitioners, first edition, Wiley, 1991.
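
As a rough illustration with the numbers from the question (13 positives in 413 viewed), the sketch below uses `binofit` from MATLAB's Statistics and Machine Learning Toolbox, which reports Clopper-Pearson limits, and cross-checks it against the equivalent beta-quantile form. Scaling the limits by 9000 is a naive projection added here for illustration; it ignores how many items have already been viewed.

```matlab
x = 13;          % positives observed
n = 413;         % examples viewed
alpha = 0.05;    % 1 - confidence level

% Clopper-Pearson interval for p via binofit
[phat, pci] = binofit(x, n, alpha);

% Same interval written with beta quantiles (valid for 0 < x < n)
lo = betainv(alpha/2,     x,     n - x + 1);
hi = betainv(1 - alpha/2, x + 1, n - x);

fprintf('phat = %.4f, binofit CI = [%.4f, %.4f]\n', phat, pci(1), pci(2));
fprintf('beta-quantile CI     = [%.4f, %.4f]\n', lo, hi);

% Naive projection onto all 9000 outcomes (assumption for illustration only)
N_tot = 9000;
disp(N_tot * pci);
```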

edited Feb 07 '17 at 18:00

answered Feb 07 '17 at 17:47


Are you suggesting "exact" means "robust"? – whuber Feb 07 '17 at 17:57

@whuber No, "exact" means that it has exactly the confidence level specified. – Michael R. Chernick Feb 07 '17 at 18:01
Then how does this respond to the question? It asks for "robust" solutions and it makes it clear what it means by "robust." – whuber Feb 07 '17 at 18:11
Should I use the lower and upper proportion values in my estimate and run it twice? And then take the upper and lower bounds of both answers? – Héctor van den Boorn Feb 07 '17 at 18:35

For a two-sided confidence interval, the upper and lower bounds that you get initially are the appropriate ones. You do not need to do this twice. Also, you asked for a robust solution. The solution gives you exactly 95% coverage and there is no need for robustness. – Michael R. Chernick Feb 07 '17 at 18:41
This doesn't work, since the 95% interval does not take into account how many instances are left. However, by running my original algorithm twice (once with the lower and once with the upper estimate of p) I do get a relatively stable estimate, which converges as N increases. The problem with using only the interval for p is that near the end it estimates, e.g., 240 positives as the lower bound while I have already found 260; by running it twice, the lower bound of the total positives is set at 262, for example. – Héctor van den Boorn Feb 07 '17 at 19:39
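
One possible reading of this "run it twice" idea, sketched in MATLAB: take the Clopper-Pearson limits for p, then apply the original `binoinv` step to the unseen items once with each limit and add the positives already found. This is an interpretation of the comment, not a confirmed implementation, and the actual coverage of the combined interval is not guaranteed to be 95%. The variable names beyond those in the question are hypothetical.

```matlab
N_found = 13;     % positives found so far
N_excl  = 400;    % negatives found so far
N_tot   = 9000;   % total number of outcomes
alpha   = 0.05;

N_seen = N_found + N_excl;    % items already viewed
N_left = N_tot - N_seen;      % items not yet viewed

% Interval for p from the viewed sample (Clopper-Pearson via binofit)
[~, pci] = binofit(N_found, N_seen, alpha);

% Run the original binoinv step twice: lower quantile with the lower limit of p,
% upper quantile with the upper limit of p, each applied to the unseen items only.
lower_total = N_found + binoinv(alpha/2,     N_left, pci(1));
upper_total = N_found + binoinv(1 - alpha/2, N_left, pci(2));

fprintf('total positives: [%d, %d]\n', lower_total, upper_total);
```

Because the found positives are added after the fact, the lower bound can never fall below the number already found, which is the behaviour the comment describes.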

One almost never gets "exactly" the nominal coverage, Michael, because the distribution is discrete. Intervals for rare proportions are particularly problematic in this regard. I checked Hahn & Meeker (first edition, pp 103-108) and could not find any claim there that this is an exact procedure. – whuber Feb 10 '17 at 18:54
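
The point about discreteness can be checked numerically. The sketch below computes the actual coverage of the Clopper-Pearson interval for n = 413 at one assumed true rate (p = 0.0315, chosen here only for illustration) by summing the binomial probabilities of every outcome whose interval contains p. The construction guarantees coverage of at least the nominal level, so the result is typically above 95% rather than equal to it.

```matlab
n = 413;         % sample size from the question
p = 0.0315;      % assumed true rate (illustrative choice)
alpha = 0.05;

x  = (0:n)';                 % every possible number of positives
lo = zeros(size(x));         % lower limit is 0 when x = 0
hi = ones(size(x));          % upper limit is 1 when x = n

% Clopper-Pearson limits via beta quantiles
pos = x > 0;
lo(pos) = betainv(alpha/2, x(pos), n - x(pos) + 1);
lt = x < n;
hi(lt) = betainv(1 - alpha/2, x(lt) + 1, n - x(lt));

% Coverage = total probability of outcomes whose interval contains p
covered  = (lo <= p) & (p <= hi);
coverage = sum(binopdf(x(covered), n, p));

fprintf('actual coverage at p = %.4f: %.4f\n', p, coverage);
```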
