Is there a way to reduce a sample size such that I gain roughly the same amount of information out of it as I would get from a bigger one?

Question

I considered Cochran's formula for determining the ideal sample size under maximum variability, but I couldn't formulate my problem in a way that yields a relevant attribute.

Nonetheless, suppose I want 99.9% accuracy for a given task, i.e. at most 1 failure in 1000 experiments. However, performing 1000 experiments is quite expensive, so I want to reduce that number. Is there a way, for example, to perform 100 experiments and use that data to talk "as if" I had done 1000?

Any hints would be appreciated!

score 2 · Answer 1 · answered Jul 14 '21 at 13:05

I'm assuming you want to confirm your 99.9% accuracy based on experiments. But running 1000 experiments won't actually confirm a 99.9% success rate. You may be unlucky and have 2 failures, even if the "true" success rate is 99.9%. There are various ways to arrive at a confidence interval for the success rate based on data, and there are also ways to figure out how many experiments you need to get a confidence interval that is small enough for your purposes. As you can imagine, if you want to be sure the "true" success rate is between 99.85% and 99.95%, you're going to need to run many more than 1000 experiments.
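To put a number on the "unlucky" case, here is a minimal R sketch. It assumes the experiments are independent and that the true success rate is exactly 99.9%; the variable names are only illustrative.

```r
# Assumption: 1000 independent experiments, each failing with probability 0.001.
n <- 1000
p_fail <- 0.001

# Probability of observing two or more failures despite the 99.9% true rate.
# pbinom(1, ..., lower.tail = FALSE) is P(X > 1), i.e. P(X >= 2).
p_two_or_more <- pbinom(1, size = n, prob = p_fail, lower.tail = FALSE)

# Probability of observing no failure at all.
p_zero <- dbinom(0, size = n, prob = p_fail)

print(c(two_or_more = p_two_or_more, zero = p_zero))
```

Under these assumptions, two or more failures occur in roughly 26% of such series and zero failures in roughly 37%, so the observed count alone says little about whether the target is met.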

Gijs

Suppose I have no experiments done whatsoever. But I want to start doing them, and suppose I get up to 100 experiments with no failures. What does this data tell me about what I should expect from performing 1000 experiments? Moreover, suppose I use Cochran's formula with p=.5, Z=1.96, and 5% error. I end up with 385. Is it correct to state that, by performing 385 experiments, I end up with a 95% confidence interval within 5% error? – Kael Spicula Jul 14 '21 at 13:19
I am not familiar with Cochran's formula, but it looks like a normal approximation. Such approximations are rather poor when the true proportion is near 0 or 1. I recommend you take a look at [Confidence Interval for Proportion that is 100%](https://stats.stackexchange.com/q/520717/1352) and work backward from the CIs given there. – Stephan Kolassa Jul 14 '21 at 13:31
@KaelSpicula You might be interested in [Laplace's rule of succession](https://en.wikipedia.org/wiki/Rule_of_succession). – Dave Jul 14 '21 at 13:40
For such low proportions, you should use the Clopper-Pearson interval. In the case of *zero* hits, it can be computed by a closed formula, but in other cases it must be computed numerically. See section 3.1 of https://lionel.kr.hs-niederrhein.de/~dalitz/data/publications/fb03-tb-2017-01-en.pdf for the underlying idea and the formulas. – cdalitz Jul 14 '21 at 13:51
Y'all, thanks for the tips! @cdalitz, the resource you shared is pretty neat, thanks. Please post it as a separate answer so that I can accept it and close the question! I will go for Clopper-Pearson as it seems to be a reasonable direction. – Kael Spicula Jul 15 '21 at 07:50
@kael-spicula Done. Glad you find the report useful, because that is exactly what I wrote it for: I did not find any source where this information was comprehensively summarized. – cdalitz Jul 15 '21 at 08:35

score 2 · Accepted Answer · answered Jul 15 '21 at 08:33

If you have very low proportions, approximations based on the normal distribution like the Wilson Interval do not have a good coverage probability.

You should thus use the Clopper-Pearson interval, which is computed, e.g., by the R function `binom.test`:

> ci <- binom.test(0, 100, conf.level=0.95)$conf.int
> ci[1:2]
[1] 0.00000000 0.03621669

For zero hits, the interval can even be computed in closed form as $[0,1-\sqrt[n]{\alpha/2}]$, where $\alpha$ is the error probability (i.e. $1-\alpha$ is the confidence level). For other values, however, it must be computed numerically.
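As a rough illustration of the closed form above, the following R sketch evaluates the upper bound for zero hits and inverts it to find how many hit-free experiments are needed for a given bound. The function names `cp_upper_zero` and `n_required_zero` are hypothetical helpers introduced here, not part of any package, and "hit" is read as "failure" in the sense of the question.

```r
# Closed-form upper bound of the two-sided Clopper-Pearson interval
# when zero hits are observed in n experiments.
cp_upper_zero <- function(n, alpha = 0.05) {
  1 - (alpha / 2)^(1 / n)
}

# Smallest n for which zero hits push that upper bound to p_max or below,
# obtained by solving 1 - (alpha/2)^(1/n) <= p_max for n.
n_required_zero <- function(p_max, alpha = 0.05) {
  ceiling(log(alpha / 2) / log(1 - p_max))
}

# Should agree with binom.test(0, 100, conf.level = 0.95)$conf.int[2].
cp_upper_zero(100)

# Hit-free experiments needed to bound the hit rate by 0.1% at the 95% level.
n_required_zero(0.001)
```

Read this way, 100 failure-free experiments only bound the failure rate at about 3.6%, and bounding it at 0.1% with this interval would take several thousand failure-free experiments rather than 1000.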

See section 3.1 of [this Technical Report](https://lionel.kr.hs-niederrhein.de/~dalitz/data/publications/fb03-tb-2017-01-en.pdf) for the underlying ideas and formulas. It also shows the problems of other confidence intervals for small (or large) $p$.

(+1) If the "Clopper-Pearson interval" is understood as the *equal-tailed* exact interval, it's not what `binom.test` computes. See e.g. https://stats.stackexchange.com/a/173149/17230 – Scortchi - Reinstate Monica Sep 18 '21 at 16:44
