Confidence Score for a Single Proportion

Question

I was asked to calculate a confidence score for a single proportion representing a conversion rate. I have seen many documents that calculate confidence intervals for proportions based on the binomial distribution; however, I did not find any material for a single conversion rate (or proportion). If $X \sim Bin(n,p)$ and the sample size is large enough for the normal approximation to the binomial to be valid, then

$$ \hat{p} \leadsto \mathcal{N}\bigg( p, \sqrt{\frac{p(1-p)}{n}} \bigg) $$

Using this approximation, I saw that statistical significance relative to a second proportion can be calculated (through $z$-scores), and so on. However, can any error metric be calculated for a single proportion? Ideally this metric would be a number between 0% and 100%.

What do you mean by single proportion? Isn't your proportion based on many observations? — soakley, Jun 27 '13 at 15:32
An example: 100 kids take a test, 30 pass it. The proportion I have is 0.3. There isn't another one to compare against; I need to report some kind of error rate for this single proportion. — BBSysDyn, Jun 27 '13 at 15:34
In your example, $n$ would be 100 and $\hat p$ is 0.3. Then you could use the textbook formula to get a confidence interval around the 0.3. — soakley, Jun 27 '13 at 15:38
... and in R, `binom::binom.confint` provides a whole bunch of these. — cbeleites unhappy with SX, Jun 27 '13 at 18:06

COOLSerdash · Accepted Answer · 2013-06-27T21:02:34.227

I think what you want is a confidence interval for a proportion. How to calculate a confidence interval for a proportion is an astonishingly controversial matter and there are many different approaches. The Wikipedia page lists several of those approaches.

In their paper, Agresti and Coull show that what they call the score confidence interval performs well, even when the sample size is small. Brown et al. (2001), in "Interval estimation for a binomial proportion", call it the Wilson interval because it was first introduced by Edwin Bidwell Wilson in a 1927 paper. In addition, this interval has the convenient property that its endpoints always lie between 0 and 100%, which is not the case for the much-used Wald confidence interval.

In summary (thanks to @NickCox): the normal approximation (Wald interval) is often inappropriate and other approaches should be used, such as the Wilson interval. For other valid alternatives consult the paper of Brown et al. (2001).
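To make the boundary problem concrete, here is a minimal sketch comparing the two intervals for a small count. The values `n <- 20` and `x <- 1` are hypothetical and chosen only for this illustration; they are not from the question.

```r
# Hypothetical data for illustration: 1 success in 20 trials
n <- 20
x <- 1
p.hat <- x / n
z <- qnorm(0.975)  # quantile for a two-sided 95% interval

# Wald (normal approximation) interval: p.hat +/- z * SE
se <- sqrt(p.hat * (1 - p.hat) / n)
wald <- c(lower = p.hat - z * se, upper = p.hat + z * se)

# Wilson score interval, written as centre +/- half-width
centre <- (p.hat + z^2 / (2 * n)) / (1 + z^2 / n)
half.width <- z * sqrt(p.hat * (1 - p.hat) / n + z^2 / (4 * n^2)) / (1 + z^2 / n)
wilson <- c(lower = centre - half.width, upper = centre + half.width)

wald    # the lower limit falls below 0
wilson  # both limits stay inside [0, 1]
```

With these numbers the Wald lower limit is negative, which is not a possible value for a proportion, while the Wilson limits remain between 0 and 1.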

The $100(1-\alpha)\%$ Wilson confidence interval can be calculated as follows:

$$ \begin{align} \mathrm{UL} &= \frac{\hat{p}+\frac{z_{1-\alpha/2}^{2}}{2n}+z_{1-\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}+\frac{z_{1-\alpha/2}^{2}}{4n^{2}}}}{1+\frac{z_{1-\alpha/2}^{2}}{n}} \\ \mathrm{LL} &= \frac{\hat{p}+\frac{z_{\alpha/2}^{2}}{2n}+z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}+\frac{z_{\alpha/2}^{2}}{4n^{2}}}}{1+\frac{z_{\alpha/2}^{2}}{n}} \\ \end{align} $$

Here $\mathrm{UL}$ denotes the upper limit and $\mathrm{LL}$ the lower limit of the confidence interval; $\hat{p}$ is the proportion of your sample (0.3 in your case); $n$ the sample size (100 in your case); $z_{1-\alpha/2}$ is the $(1-\alpha/2)$-quantile of the standard normal distribution. For a 95%-CI, this is about $1.96$ ($z_{\alpha/2}\approx -1.96$ because of the symmetry of the normal distribution).

In your case, the 95% Wilson confidence interval for a proportion of 0.3 with a sample size of 100 is $[0.219, 0.396]$.
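As a rough check of that result by hand, with $\hat{p}=0.3$, $n=100$ and $z_{1-\alpha/2}$ rounded to $1.96$ (the rounding is an assumption of this sketch, so the last digit may differ slightly from software output):

$$ \begin{align} \frac{z^{2}}{2n} &= \frac{3.8416}{200} = 0.019208, \qquad \frac{z^{2}}{4n^{2}} = \frac{3.8416}{40000} \approx 0.000096 \\ z\sqrt{\frac{\hat{p}(1-\hat{p})}{n}+\frac{z^{2}}{4n^{2}}} &\approx 1.96\sqrt{0.0021+0.000096} \approx 0.09185 \\ \mathrm{UL} &\approx \frac{0.3+0.019208+0.09185}{1.038416} \approx 0.3958 \\ \mathrm{LL} &\approx \frac{0.3+0.019208-0.09185}{1.038416} \approx 0.2189 \end{align} $$

The minus sign in $\mathrm{LL}$ comes from $z_{\alpha/2}\approx -1.96$; the squared terms are unchanged because $z_{\alpha/2}^{2}=z_{1-\alpha/2}^{2}$.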

Here is an R script that calculates the (modified) Wilson confidence interval for any given $n$ and $\hat{p}$ (`x` is the number of successes out of `n`). The modification only changes a limit when `x` is within 3 of 0 or of `n`, so it has no effect for your data. Even more convenient is the R package `binom`, which provides the function `binom.confint`; it calculates the confidence interval using 11 different methods (the Wilson interval is included).

```r
n <- 100   # sample size
x <- 30    # number of successes

alpha <- 0.05

p.hat <- x/n

upper.lim <- (p.hat + (qnorm(1-(alpha/2))^2/(2*n)) + qnorm(1-(alpha/2)) *
                sqrt(((p.hat*(1-p.hat))/n) + (qnorm(1-(alpha/2))^2/(4*n^2))))/
  (1 + (qnorm(1-(alpha/2))^2/(n)))

lower.lim <- (p.hat + (qnorm(alpha/2)^2/(2*n)) + qnorm(alpha/2) *
                sqrt(((p.hat*(1-p.hat))/n) + (qnorm(alpha/2)^2/(4*n^2))))/
  (1 + (qnorm(alpha/2)^2/(n)))

#==============================================================================
# Modification for probabilities close to boundaries
#==============================================================================

if ((n <= 50 & x %in% c(1, 2)) | (n >= 51 & n <= 100 & x %in% c(1:3))) {
  lower.lim <- 0.5 * qchisq(alpha, 2 * x)/n
}

if ((n <= 50 & x %in% c(n - 1, n - 2)) | (n >= 51 & n <= 100 & x %in% c(n - (1:3)))) {
  upper.lim <- 1 - 0.5 * qchisq(alpha, 2 * (n - x))/n
}

lower.lim
# [1] 0.2189489

upper.lim
# [1] 0.3958485

#==============================================================================
# Using the package "binom"
#==============================================================================

library(binom)

binom.confint(x=30, n=100, conf.level=0.95)

#           method  x   n      mean     lower     upper
# 1  agresti-coull 30 100 0.3000000 0.2186514 0.3961460
# 2     asymptotic 30 100 0.3000000 0.2101832 0.3898168
# 3          bayes 30 100 0.3019802 0.2168414 0.3945465
# 4        cloglog 30 100 0.3000000 0.2135522 0.3910559
# 5          exact 30 100 0.3000000 0.2124064 0.3998147
# 6          logit 30 100 0.3000000 0.2184030 0.3966128
# 7         probit 30 100 0.3000000 0.2168949 0.3950896
# 8        profile 30 100 0.3000000 0.2160309 0.3940967
# 9            lrt 30 100 0.3000000 0.2159984 0.3941141
# 10     prop.test 30 100 0.3000000 0.2145426 0.4010604
# 11        wilson 30 100 0.3000000 0.2189489 0.3958485
```
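If installing an extra package is not an option, base R may be enough. As a minimal sketch: `prop.test` from the `stats` package inverts the score test, so with the continuity correction switched off it should reproduce the Wilson limits for the same data. The variable name `ci` is just for this example.

```r
# Same data as above: 30 successes out of 100 trials
x <- 30
n <- 100

# correct = FALSE switches off the continuity correction,
# which leaves the plain Wilson score interval
ci <- prop.test(x, n, conf.level = 0.95, correct = FALSE)$conf.int

# Drop the conf.level attribute and round for display
round(as.numeric(ci), 4)
```

The `prop.test` row in the `binom.confint` output is wider than the `wilson` row because it appears to keep the default continuity correction (`correct = TRUE`).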

This is, or should be, standard in any decent software (read that as a definition of "decent"). For example, see `ci` in Stata. — Nick Cox, Jun 27 '13 at 18:25
An important bottom line here is that this longstanding problem was more or less tidied up around 2001, but not all texts and papers have yet caught up with that fact. The normal approximation often works poorly and there are better ones. — Nick Cox, Jun 27 '13 at 18:28
How can I turn a confidence interval to a single, numeric score that is between 0 and 100? That's what I need for a custom calculation. Thanks for all the information presented above however. — BBSysDyn, Jun 27 '13 at 18:30
I hate to close threads when they have good answers like this one, but our site has *extensive* material on [binomial confidence intervals](http://stats.stackexchange.com/search?tab=relevance&q=binomial%20proportion%20confidence). We ought to put some effort into consolidating some of this stuff, rather than spreading it around into more and more threads. Any thoughts? Maybe turn your answer into a blog post? Maybe post a question about binomial/Bernoulli CIs and expand this answer (and related answers) into a comprehensive reply? — whuber, Jun 27 '13 at 18:33
@user423805 I don't know what you mean by "score". Do you mean the standard error? That would be $\widehat{\mathrm{SE}}(\hat{p})=\sqrt{\hat{p}(1-\hat{p})/n}$. — COOLSerdash, Jun 27 '13 at 18:36
I don't think anyone knows what you mean by score so repeating the request makes it no clearer. Even the idea of a single SE is problematic here as confidence intervals need not be symmetric for this situation. — Nick Cox, Jun 27 '13 at 18:39
The standard error sqrt(0.3*0.7/100) would give me 0.0458, right, I guess this could be used to judge the quality of the estimation. — BBSysDyn, Jun 27 '13 at 18:41
Go easy, people: not everyone is a big-shot stats expert like you. A stupid question indexed by Google can give other non-experts with stupid questions an easy answer, who knows. This is not wasted effort. — BBSysDyn, Jun 27 '13 at 18:45
I don't think we would try to answer your question if we thought it wasted effort. No one is trying to squash you, just to give you correct answers. The simple fact, repeated in this thread, is that several people could not make sense of what you wanted with a score. If a rough SE serves your purpose, that's fine, but be advised that +/- SE is a problematic method for this situation. — Nick Cox, Jun 27 '13 at 18:50
What I am saying is that the fact that you could not make sense of the question, my trying to rephrase it, etc. is a valuable record and can help someone. That "several people" jumped to conclusions and started talking about confidence intervals when the question was about a simple score can by itself be useful. — BBSysDyn, Jun 28 '13 at 05:33
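For readers following the comments, here is a minimal sketch of the two quantities under discussion: the estimated standard error for the 30-out-of-100 example, and how far each Wilson limit lies from $\hat{p}$. It uses base R's `prop.test` with `correct = FALSE` to obtain the Wilson interval; the names `se`, `wilson` and `distance` are only for this illustration.

```r
x <- 30
n <- 100
p.hat <- x / n

# Estimated standard error of the sample proportion
se <- sqrt(p.hat * (1 - p.hat) / n)
round(se, 4)

# Wilson interval (score interval without continuity correction)
wilson <- as.numeric(prop.test(x, n, correct = FALSE)$conf.int)

# Distance from p.hat to each limit; the two differ, so the interval
# is not simply p.hat plus or minus a multiple of the standard error
distance <- c(below = p.hat - wilson[1], above = wilson[2] - p.hat)
round(distance, 4)
```

The upper limit sits further from $\hat{p}$ than the lower one, which is the asymmetry that makes a single $\pm$ SE summary problematic here.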
