Specifically, say I take a random sample of 20 products from a manufacturing batch of 1000 and they all test good. What assumptions and conclusions can I make about the whole batch? Is it possible to say "There is an x% chance that the entire batch is good"? How could I calculate x in this case? Note that this assumes I have no prior knowledge of what the defect rate should be.
[This](https://stats.stackexchange.com/questions/134380/how-to-tell-the-probability-of-failure-if-there-were-no-failures) seems relevant. (Not sure if there is a similar Q with an accepted answer? Else this would be a duplicate) – GeoMatt22 Apr 27 '17 at 03:16
https://stats.stackexchange.com/questions/275049/what-should-the-estimated-proportion-be-for-the-population-when-the-sample-propo#comment527359_275049 is also relevant. – whuber Apr 27 '17 at 13:29
2 Answers
Twenty of the 1000 are known to be good. Among the remaining 980, the number of good ones can be anywhere from 0 to 980. Calculate the probability that a random sample of 20 products from the batch of 1000 all test good when the number of good ones in the batch is 20, 21, ..., 1000 (981 probabilities in total). Add them together to get the denominator, and take the last one (1000 good among 1000) as the numerator. This ratio is your x.
Let $Y$ be the number of good ones among the 1000 products. Because there is no prior information, we assume $\Pr(Y=k) = 1/1001$ for $k = 0, 1, \ldots, 1000$. This uniform distribution is very often used as a noninformative prior.
Let $B$ be the event that all 20 products in the random sample of 20 from the 1000 products are good.
So the quantity asked for is
$\Pr(Y=1000|B)$
So $\Pr(Y=1000|B) = \frac{\Pr(B|Y=1000)\Pr(Y=1000)}{\sum_{y=0}^{1000}\Pr(B|Y=y)\Pr(Y=y)}$
$=\frac{1/1001}{1/1001}\frac{\Pr(B|Y=1000)}{\sum_{y=0}^{1000}\Pr(B|Y=y)}$
$=\frac{\Pr(B|Y=1000)}{\sum_{y=0}^{1000}\Pr(B|Y=y)}$
$=1/47.666667 = 0.02097902$, approximately 2.1%.
Under the assumption of simple random sampling without replacement, the number of good ones among the 20 sampled units follows a hypergeometric distribution. So
$\Pr(B|Y=y)=\frac{\binom{y}{20}\binom{1000-y}{0}}{\binom{1000}{20}} = \frac{\binom{y}{20}}{\binom{1000}{20}}$. Note that $\binom{y}{20}=0$ for $y<20$.
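The denominator $\sum_{y=0}^{1000}\Pr(B|Y=y)$ also has a closed form. The following is only a sketch using the hockey-stick identity $\sum_{y=n}^{N}\binom{y}{n}=\binom{N+1}{n+1}$, which is an added step and not part of the derivation above:
$\sum_{y=0}^{1000}\Pr(B|Y=y) = \frac{\sum_{y=20}^{1000}\binom{y}{20}}{\binom{1000}{20}} = \frac{\binom{1001}{21}}{\binom{1000}{20}} = \frac{1001}{21} \approx 47.666667$
Since $\Pr(B|Y=1000)=1$, this gives $\Pr(Y=1000|B) = 21/1001 \approx 0.02097902$, the same value as above.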
In fact, the calculation can be performed by any software that provides the probability mass function of the hypergeometric distribution.
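As a rough check of the 2.1% figure, here is a minimal Python sketch of the ratio described above. The function name `prob_all_good` is hypothetical, and instead of calling a hypergeometric routine from a statistics package it uses exact integer arithmetic with `math.comb`, assuming the uniform prior on $Y$ and sampling without replacement.

```python
from fractions import Fraction
from math import comb


def prob_all_good(batch_size: int, sample_size: int) -> Fraction:
    """Pr(Y = batch_size | all sampled items good) under a uniform prior on Y.

    Hypothetical helper: Pr(B | Y = y) = C(y, n) / C(N, n), and the common
    factor 1 / C(N, n) cancels between numerator and denominator.
    """
    # math.comb(y, n) is 0 when y < n, matching the convention in the text.
    denominator = sum(comb(y, sample_size) for y in range(batch_size + 1))
    numerator = comb(batch_size, sample_size)
    return Fraction(numerator, denominator)


x = prob_all_good(1000, 20)
print(x, float(x))
```

The result is an exact fraction, so it can be compared directly with the $1/47.666667$ quoted above without worrying about floating-point rounding.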

This recipe does not seem to correspond to any classical confidence limit or Bayesian estimate. Do you have a reference? – whuber Apr 27 '17 at 13:31
What prior are you presupposing? Evidently *some* explanation of your recipe is needed. And please note that the original question prominently states "I have no prior knowledge...," indicating that any prior you might use will require justification. – whuber Apr 27 '17 at 15:38
Thank you -- that is helpful. Since the data are given us, what numerical answer do you actually get? – whuber Apr 27 '17 at 16:11
I think Vic can finish the programming. It would take me two hours and I do not have that much time. – user158565 Apr 27 '17 at 16:20
I am suggesting *you* try it, because the effort will reveal the errors in your formulas. – whuber Apr 27 '17 at 17:45
That's a shame: don't you think the number should be much closer to 5%? After all, with a uniform prior and a simple random sample (with replacement), you would expect the posterior mean to be close to the usual Bayes estimate of $1/(20+2)=4.5\%$, and this situation clearly is well approximated by simple random sampling. That makes a result of $2.1\%$ implausible. – whuber Apr 27 '17 at 20:28
If you want to prove I am wrong, you can point to the error in my derivation, or show it by simulation. Otherwise, why should I believe your 4.5%? – user158565 Apr 27 '17 at 20:45
The burden of supporting any answer you post here is on you: we do not accept answers only because they have not been completely debunked. Sufficient doubt as to the correctness, in conjunction with a reasonable suggestion concerning the right answer, should be more than sufficient reason to revisit your calculation. Note that I'm not saying your final result is wrong, but I am only saying you haven't provided sufficient explanation for people to believe you are right. – whuber Apr 27 '17 at 21:16
(Incidentally, with a uniform prior the posterior mean is $490/11000 =0.04454545\ldots.$ It differs from the approximation $1/22$ by less than $0.001$.) – whuber Apr 27 '17 at 21:46
I owe you an apology for having confused the question: you have been addressing (correctly) the chance that the entire population is good, whereas I shifted the discussion to estimating the mean of the posterior distribution. The results we are getting are consistent. For instance, returning to the Beta$(1,22)$ approximation, I would estimate the chance of less than $1/1000$ of the population being bad is $0.02177$, quite consistent with your answer. Along with the apology goes an upvote. – whuber Apr 27 '17 at 21:57
Nice! But you should probably explain why $\frac{\Pr(B|Y=1000)}{\sum_{y=0}^{1000}\Pr(B|Y=y)}=1/47.666667$. The numerator is easy (of course the probability of drawing 20 good products from a population where all products are good is 1), but computing the denominator $\sum_{y=0}^{1000}\Pr(B|Y=y)=\sum_{y=20}^{1000}\Pr(B|Y=y)$ is not as simple and requires a model for sampling w/o replacement. You should at least sketch how it's done. – DeltaIV Apr 27 '17 at 22:46
Yes, please explain the denominator calculation and then I will accept the answer. – Vic Apr 29 '17 at 02:43
great! Now it's an exemplary answer. Cannot +1 because I already did :) – DeltaIV Apr 30 '17 at 22:54
This is the assumed prior distribution of Y (the number of good products among your 1000 products). We know that Y can be 0, 1, 2, ..., 1000 (1001 possible values). *Assume* we do not have any knowledge about Y before the analysis, so that all of these values have equal probability; then we have Pr(Y=y) = 1/1001. – user158565 May 14 '17 at 17:59
You can construct a confidence interval for the proportion p that are not defective. The interval will be one-sided, [p$_c$, 1], at, say, the 95% level. Its width, 1 - p$_c$, gives you an idea of how confident you can be that the proportion is close to 100%. This would be based on the hypergeometric distribution.
Regarding Bill Huber's comment about the Bayesian approach: this shows that a lot can be said without prior knowledge. This computation is based strictly on frequentist methods.
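One way to make the one-sided interval concrete is to invert the hypergeometric probability of seeing no defects in the sample. This is only a sketch under stated assumptions: the function name `lower_bound_good` and the default `alpha = 0.05` are illustrative choices, and the bound is taken to be the smallest number of good items in the batch for which an all-good sample still has probability at least `alpha`.

```python
from math import comb


def lower_bound_good(batch_size: int, sample_size: int, alpha: float = 0.05) -> int:
    """Smallest number of good items G not rejected at level alpha.

    Hypothetical helper: with G good items in the batch, the chance that a
    simple random sample of `sample_size` items is all good is
    C(G, n) / C(N, n), which increases with G.
    """
    total = comb(batch_size, sample_size)
    for good in range(sample_size, batch_size + 1):
        if comb(good, sample_size) / total >= alpha:
            return good
    return batch_size  # only reached if alpha > 1


good_lower = lower_bound_good(1000, 20)
p_c = good_lower / 1000
print(f"one-sided 95% interval for the proportion good: [{p_c:.3f}, 1]")
```

With 20 good items out of 20 sampled, the lower limit p$_c$ printed here is the frequentist counterpart to the Bayesian figure in the other answer: it bounds the proportion of good items rather than giving a probability that the whole batch is good.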

"Based on the hypergeometric distribution" is correct but may be misleading, because the calculation is dead simple and does not require any knowledge of the hypergeometric distribution. See https://en.wikipedia.org/wiki/Rule_of_three_(statistics). – whuber Apr 27 '17 at 13:32
@whuber I think the important thing to say was that a confidence interval might be the right way to think of the problem. I was the first to answer this question. I mentioned the hypergeometric distribution because that is the distribution that applies in this case and the OP might not be aware of this. – Michael R. Chernick Apr 27 '17 at 13:37
Those are excellent points. I think your answer would be stronger if you were to emphasize them, especially because the question has been phrased in a Bayesian way. – whuber Apr 27 '17 at 13:45