
1,000 people attempted a task and 30 succeeded. Given this information, we can say that the success rate in the sample was 3%.

But what if I want to estimate, from this sample, a confidence interval for the success rate in the population? For example, given the data above, a statement such as: the success rate is between, say, 1% and 5%, with 95% confidence.

How can I calculate that interval at a 95% confidence level? Is there any online tool?

Thank you, Luca

rolando2
LucaP
  • Look for a binomial distribution calculator. – Alexey Burnakov Mar 27 '18 at 12:13
  • Or the formula for the standard error of a proportion. E.g., the first equation at https://stats.stackexchange.com/questions/266442/standard-error-for-proportion-with-small-sample-size – rolando2 Mar 27 '18 at 20:17
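To make the standard-error suggestion concrete, here is a minimal sketch of a normal-approximation (Wald) interval for the figures in the question. It assumes the standard error meant is the usual $\sqrt{\hat p(1-\hat p)/n}$; the function name `wald_interval` is made up for illustration, and other intervals (for example Wilson or exact binomial) may behave better when the proportion is close to 0.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(successes, trials, level=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion.

    Hypothetical helper for illustration: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n).
    """
    p_hat = successes / trials
    # Standard error of the sample proportion.
    se = sqrt(p_hat * (1 - p_hat) / trials)
    # Two-sided critical value, about 1.96 for a 95% level.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return p_hat - z * se, p_hat + z * se


low, high = wald_interval(30, 1000)
print(f"{low:.4f} to {high:.4f}")
```

For 30 successes out of 1,000 this gives roughly 1.9% to 4.1%.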

1 Answer


Suppose you have a binomial distribution with $n=1000$ and $p=0.03$, and let $k$ be the outcome of one draw from it. You're looking for the interval from $k-\Delta$ to $k+\Delta$ that contains 95% of the probability of this distribution. Defining $F(x) = P(X \leq x)$, you need to solve the following equation for $\Delta$:

$$F(k+\Delta) - F(k-\Delta) = 0.95$$

Given that the expression for $F$ is awkward, I imagine this has to be done numerically.
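A rough sketch of that numerical search, under this answer's reading that $p=0.03$ is known, with $k=30$ taken from the question. Because the distribution is discrete, the equation generally has no exact solution, so the sketch returns the smallest integer $\Delta$ for which $F(k+\Delta) - F(k-\Delta)$ reaches 0.95. The helper names `binom_cdf` and `smallest_delta` are made up for illustration.

```python
from math import comb


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p), computed as a finite sum of PMF terms."""
    if x < 0:
        return 0.0
    x = min(x, n)
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(x + 1))


def smallest_delta(k, n, p, level=0.95):
    """Smallest integer delta with F(k + delta) - F(k - delta) >= level.

    Hypothetical helper: assumes p is known, as in the answer above.
    """
    for delta in range(n + 1):
        coverage = binom_cdf(k + delta, n, p) - binom_cdf(k - delta, n, p)
        if coverage >= level:
            return delta, coverage
    raise ValueError("no delta reaches the requested level")


delta, coverage = smallest_delta(k=30, n=1000, p=0.03)
print(delta, round(coverage, 4))
```

Note that $F(k+\Delta) - F(k-\Delta)$ is $P(k-\Delta < X \leq k+\Delta)$, so the lower endpoint itself is excluded; use `binom_cdf(k - delta - 1, n, p)` instead if it should be included.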

oneloop
  • The problem isn't awkwardness: it's that you don't even know $F$! – whuber Mar 27 '18 at 20:23
  • What do you mean? If the PMF is binomial, the CDF is just a finite sum. Awkward, but totally known. – oneloop Mar 27 '18 at 20:40
  • On the contrary, the entire point to a CI is that you do *not* know the underlying distribution. If you know $F$ then you know $p$ and there's nothing to estimate. – whuber Mar 27 '18 at 21:34
  • There are at least two ways to interpret the original question. One is "1000 people attempted a task and 30 succeeded, what's the underlying distribution?". That's not how I interpreted it. I interpreted it as: "we know we have a binomial with p=0.03, what's the interval of likely outcomes?" If you look at LucaP's original question, he says "But what if I want to know what the success rate actually is?", which is a lot more ambiguous than rolando2's edited version. – oneloop Mar 28 '18 at 07:53
  • In other words, if you take rolando2's edited version as ground truth, then I agree with you that this is an inference problem, the distribution isn't known, and therefore my response doesn't apply. If you take LucaP's original version as ground truth, I would say it's ambiguous, but I can also see how you can read it as being a question about inference. – oneloop Mar 28 '18 at 10:42