
I have no solid background in statistics, so please bear with me. In general, I am trying to figure out how to determine the sample size required to establish a certain level of confidence about a population (e.g. out of 1000 computers, how many do I need to sample to be sure, with a certain level of confidence, that the rest have the same properties?).

There are online calculators for "sample sizes", often using Cochran's formula (below). But I am unsure whether it is valid for this purpose, as basically no quality control book uses it; they quote ANSI sampling tables instead.

$$\text{Sample Size} = \frac{n}{1 + n/\text{population}}$$ in which $n$ is equal to $Z^2 \, P(1-P)/D^2$ (using 95% confidence, i.e. $Z = 1.96$, a $5\%$ margin of error $D$, and $P = 0.5$), which gives me a sample size of $323$.

This formula tells me that for a population of 2000, I need to sample 323 (95% confidence level and 5% margin of error). Is it valid for any sampling where I expect a random distribution? My related question (Determine required sample size with unknown standard deviation) got answers and comments that were quite complicated for me, so I suspect this is not going to work; I just do not know why.
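
For reference, here is a minimal Python sketch of the calculation as I understand it. The function name `cochran_sample_size` and its parameter names are my own (hypothetical), and I am assuming the result is rounded up to a whole unit.

```python
import math


def cochran_sample_size(population, z=1.96, p=0.5, d=0.05):
    """Cochran's sample size with a finite population correction.

    z: normal quantile for the confidence level (1.96 for 95%)
    p: assumed proportion (0.5 is the most pessimistic choice)
    d: margin of error (half-width of the confidence interval)
    """
    # Sample size for an effectively infinite population.
    n0 = z * z * p * (1 - p) / (d * d)
    # Finite population correction, rounded up to a whole unit.
    return math.ceil(n0 / (1 + n0 / population))


print(cochran_sample_size(2000))
```

With a population of 2000 this reproduces the 323 above; without the correction the figure would be about 384.16, i.e. 385 units.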

Pietross
  • You're seeing the analogy between the sampling schemes clearly enough. If a quality engineer were interested in calculating a confidence interval of a given length for the proportion of defectives in a batch he might use Cochran's formula to work out the required sample size, given the conditions were met that make it a decent approximation. But he'd more often be interested in specifying rules for accepting or rejecting the batch that guarantee batches with more than a given number of defects are rejected with no less than a given probability. What are you interested in? ... – Scortchi - Reinstate Monica Jan 05 '16 at 16:35
  • ... "How many do I need to sample to be sure that the rest has the same properties with a certain level of confidence" isn't at all clear. BTW You should define Cochran's rule in your question: there are variants & not everyone will know it by name. It seems you've based the sample size calculation on the normal approximation to the binomial assuming a proportion of one-half - giving a pessimistic estimate of the variance - & then carried out a finite population correction. – Scortchi - Reinstate Monica Jan 05 '16 at 16:36
  • @Scortchi Thanks. What I mean by that sentence is: What sample size is enough for me to expect that it represents the whole population. If those 323 devices out of 2000 work, can I say that there is 95% confidence that all work? – Pietross Jan 05 '16 at 16:40
  • No. The 95% confidence interval for the number of defective computers in the whole batch of 2000 would be $\{0, 1, \ldots, 16\}$ - using the hypergeometric distribution, as explained in [the post](http://stats.stackexchange.com/q/139171/17230) I linked to from your previous question. (Suppose there were just one defective computer in the batch. The probability of getting no defectives in your sample of 323 would still be 84%. Only when you get up to 17 defective computers in the whole batch would the probability of getting no defectives in your sample fall below 5%.) – Scortchi - Reinstate Monica Jan 05 '16 at 16:45
  • @Scortchi Sorry I do not follow. So what does this 323 (result of the formula) mean? I mean, it is the sample size with 95% confidence but what does it actually mean? I thought the whole point of this sample size is to estimate the whole population. – Pietross Jan 05 '16 at 16:55
  • It's the answer to a different question: "If I want the 95% confidence interval for the proportion of defective computers in the batch to be no more than 10 percentage points wide, what sample size do I need?". See [Derivation for the confidence interval for a population proportion](http://stats.stackexchange.com/q/26395/17230), [Standard error of proportion that takes into account population size](http://stats.stackexchange.com/q/80162/17230), & [What, precisely, is a confidence interval?](http://stats.stackexchange.com/q/6652/17230) for some background. – Scortchi - Reinstate Monica Jan 05 '16 at 17:00
  • @Scortchi So it says that if I test 323 computers and none fails, I can be 95% confident that fewer than 10% of the computers are defective. Or does it not matter whether those in the sample fail? (e.g. what if 300 out of those 323 are defective) – Pietross Jan 05 '16 at 17:33
  • In fact the 95% confidence interval is much narrower when none fail: $\frac{16-0}{2000} \approx 0.8\%$. Your formula for required sample size assumes the largest standard error possible (when the proportion of defective computers in the batch is one-half) to be on the safe side. There are variants of it where you make an educated guess of the proportion defective in the batch to avoid taking larger samples than you need. – Scortchi - Reinstate Monica Jan 05 '16 at 17:41
  • @Scortchi So it says you need 323 samples. But what if 300 fail? What does that imply for the population? According to your comment, these 323 are required to be confident that no more than 10% are defective, but I fail to see the relation to the actual findings from the sampling. – Pietross Jan 05 '16 at 18:13
  • Let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/33906/discussion-between-scortchi-and-pietross). – Scortchi - Reinstate Monica Jan 06 '16 at 10:45
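
The hypergeometric figures quoted in the comments (84% with one defective, and the 5% threshold crossed at 17 defectives) can be checked with a short Python sketch. This is only an illustration under the thread's numbers (batch of 2000, sample of 323, no defectives found); the function names are hypothetical, and `math.comb` needs Python 3.8 or later.

```python
from math import comb


def prob_zero_defectives(population, sample, defectives):
    """Hypergeometric probability that a random sample drawn without
    replacement contains none of the batch's defective units."""
    return comb(population - defectives, sample) / comb(population, sample)


def largest_plausible_defectives(population, sample, alpha=0.05):
    """Largest number of defectives in the batch for which a clean
    sample still has probability of at least alpha."""
    d = 0
    while prob_zero_defectives(population, sample, d + 1) >= alpha:
        d += 1
    return d


print(prob_zero_defectives(2000, 323, 1))
print(prob_zero_defectives(2000, 323, 17))
print(largest_plausible_defectives(2000, 323))
```

So a clean sample of 323 supports "at most 16 defectives out of 2000" at this level, not "all 2000 work".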

0 Answers