
When discussing task achievement rates, is there a way to show that 0 out of 20 attempts is "worse" than 0 out of 10 attempts?

vinne
  • You may try to use https://en.wikipedia.org/wiki/Additive_smoothing but it will be rather hand-waving than a solid proof – abukaj Feb 01 '17 at 19:52
  • How do you know it is worse? E.g. if only 10 attempts were possible, then you *don't* know what would be the score with more attempts. – Tim Feb 01 '17 at 21:20
  • Perhaps a confidence interval for the estimated proportion? – mdewey Feb 01 '17 at 21:46
  • This seems like a reasonable question to me. It is based on a perfectly normal intuition that can be discussed, & there are statistical ways (eg, Bayesian) to address the issue. I'm voting to leave open. – gung - Reinstate Monica Feb 02 '17 at 00:44
  • I agree with @gung. This is a good question. – Alexis Feb 03 '17 at 01:09
  • Partly related question: http://stats.stackexchange.com/q/9330/3277 – ttnphns Feb 05 '17 at 11:50

3 Answers


Suppose we knew the probability of success in a single attempt. In that case we could compute the probability of 0 successes out of 10 and of 0 out of 20 attempts.

Here, however, we go the other way around: we don't know the probability; we have the data and we try to estimate the probability from it.

The more cases we have, the more certain we can be about the results. If I flip a coin once and it comes up heads, you won't be very certain that it is double-headed. If I flip it 1,000 times and it comes up heads every time, it is unlikely that the coin is fair.

There are methods designed to take the number of trials into account when producing an estimate. One of them is the additive smoothing that @abukaj commented about above. In additive smoothing we bring extra pseudo-samples into consideration. In our case, in addition to the trials we have seen, we add two more - one successful and one failed.

  • In the first case the smoothed probability will be $\frac{1+0}{10 +1 +1}$ = $\frac{1}{12}$ ~ 8.3%
  • In the second case we will get $\frac{1+0}{20 +1 +1}$ = $\frac{1}{22}$ ~ 4.5%

Note that additive smoothing is only one method of estimation. You will get different results with different methods. Even with additive smoothing itself, you would get different results if you added 4 pseudo samples instead.
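
For concreteness, here is a minimal sketch in Python of the two smoothed estimates above (the helper name `smoothed_rate` is just illustrative):

```python
# Additive (Laplace) smoothing: add `pseudo` pseudo-successes and
# `pseudo` pseudo-failures to the observed counts.
def smoothed_rate(successes, attempts, pseudo=1):
    return (successes + pseudo) / (attempts + 2 * pseudo)

print(smoothed_rate(0, 10))  # 1/12 ≈ 0.083
print(smoothed_rate(0, 20))  # 1/22 ≈ 0.045
```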

Another method is using a confidence interval, as @mdewey suggested. The more samples we have, the shorter the confidence interval will be. The width of the confidence interval is inversely proportional to the square root of the number of samples - it scales as $\frac{1}{\sqrt{n}}$. Therefore, doubling the number of samples leads to a confidence interval $\sqrt{2}$ times shorter.

The mean in both cases is 0. If we take a confidence level of 90% ($z = 1.645$), then (see the short numerical sketch after this list):

  • In the first case we will get 0 + $\frac{1.645}{\sqrt{10}}$ ~ 52%
  • In the second case we will get 0 + $\frac{1.645}{\sqrt{20}}$ ~ 37%
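
As a minimal sketch of the arithmetic above in Python (the optional `statsmodels` call at the end is an assumption on my part, not something this answer uses):

```python
import math

# The rough bound used above: z / sqrt(n) with z = 1.645 (90% confidence).
z = 1.645
for n in (10, 20):
    print(n, z / math.sqrt(n))  # ≈ 0.52 for n = 10, ≈ 0.37 for n = 20

# An alternative (assuming statsmodels is available): an exact interval for
# 0 successes out of n via the Clopper-Pearson ("beta") method.
# from statsmodels.stats.proportion import proportion_confint
# proportion_confint(count=0, nobs=20, alpha=0.10, method="beta")
```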

When data is limited, there is uncertainty. The assumptions you make and any external data you use will change what you get.

DaL
  • Thank you very much Dan Levin. Your answer was clear enough for a non-mathematician to follow, and yet robust enough for me to intuitively accept your explanation. Thank you all commenters for your input. – vinne Feb 06 '17 at 13:35

Extending the idea of confidence intervals, there is the concept of an exact binomial interval.

The binomial distribution is that of the total number of successes in $n$ independent trials, each of which ends in either 0 (failure) or 1 (success). The probability of obtaining a 1 (success) is traditionally denoted $p$, and its complement is $q = 1 - p$. The standard probability result is then that the probability of exactly $k$ successes in $n$ trials is

$$ p_{n,k} = {n \choose k} p^k q^{n-k} = \frac{n!}{k!(n-k)!} p^k q^{n-k} $$

The concept of a confidence interval is to bound a set of possible values of the model parameter (here, the probability of success $p$) so that we can make probabilistic (well, frequentist) statements about whether the true parameter value is inside this interval: namely, if we repeat the probabilistic experiment of making 10 or 20 trials and construct the confidence interval in a specified way, the true value of the parameter will be inside the interval 95% of the time.

In this case, we can set $k=0$ and solve for $p$ in that formula: $$ p_{n,0}=(1-p)^n $$

So if we wanted a 95% one-sided interval, we would set $p_{n,0}=5\%$ to solve for the probability of the observed zero count being at most 5%. For $n=20$, the answer is $[0\%,13.9\%]$ (i.e., at the extreme, if the probability of a success in each trial is 13.9%, then the probability of observing zero successes is 5%). For $n=10$, the answer is $[0\%,25.9\%]$. So from a sample of $n=20$, we learned more than from the sample of $n=10$, in the sense that we can "exclude" the range $[13.9\%,25.9\%]$ that the sample of $n=10$ still leaves as plausible.
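
A minimal sketch of this computation in Python (the helper name `exact_upper_bound` is just illustrative): solving $(1-p)^n = \alpha$ for $p$ gives $p = 1 - \alpha^{1/n}$.

```python
# One-sided exact upper bound for p when zero successes are observed:
# solve (1 - p)^n = alpha for p.
def exact_upper_bound(n, alpha=0.05):
    return 1 - alpha ** (1 / n)

print(exact_upper_bound(10))  # ≈ 0.259
print(exact_upper_bound(20))  # ≈ 0.139
```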

StasK

A Bayesian Approach

The likelihood function is Bernoulli and the Beta distribution is a conjugate prior for the Bernoulli distribution, hence the posterior follows the Beta distribution. Furthermore, the posterior is parameterized by:

$$ \hat{\alpha} = \alpha + \sum_{i=1}^n X_i \quad \quad \hat{\beta} = \beta + n - \sum_{i=1}^n X_i$$

Consequently:

\begin{align*} \mathrm{E}[p \mid X_1, \ldots, X_n] &= \frac{\hat{\alpha}}{\hat{\alpha} + \hat{\beta}}\\ &= \frac{\alpha + \sum_{i=1}^n X_i }{\alpha + \beta + n} \end{align*}

Thus if you see 10 failures, your expectation of $p$ is $\frac{\alpha}{\alpha + \beta + 10}$, and if you see 20 failures, your expectation of $p$ is $\frac{\alpha}{\alpha + \beta + 20}$. The more failures you see, the lower your expectation of $p$.

Is this a reasonable argument? It depends on how you feel about Bayesian statistics, that is, whether you're willing to model uncertainty about a parameter $p$ using the mechanics of probability. And it depends on how reasonable your choice of prior is.
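
For concreteness, a minimal sketch of the posterior mean in Python; the uniform $\mathrm{Beta}(1, 1)$ prior is an illustrative assumption, not something this answer prescribes:

```python
# Posterior mean of p under a Beta(alpha, beta) prior after observing
# `successes` successes in `n` trials.
def posterior_mean(successes, n, alpha=1.0, beta=1.0):
    return (alpha + successes) / (alpha + beta + n)

print(posterior_mean(0, 10))  # 1/12 ≈ 0.083
print(posterior_mean(0, 20))  # 1/22 ≈ 0.045
```

With this uniform prior the numbers coincide with the additive-smoothing estimates from the first answer.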

Matthew Gunn