
If I have a population of 5000, statistics suggests that I need to sample about 350+ to get a 95% confidence level with a 5% margin of error.
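
For reference, the figure such tables and calculators report can be reproduced with the usual formula for estimating a proportion, plus a finite population correction. This is only a sketch: the function name and the worst-case assumption `p = 0.5` are mine, not taken from any particular calculator.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(population, margin=0.05, confidence=0.95, p=0.5):
    """Sample size needed to estimate a proportion within +/- margin.

    Assumes simple random sampling and the normal approximation.
    p = 0.5 is the most conservative (largest-variance) guess.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    n0 = z**2 * p * (1 - p) / margin**2    # size for an infinite population
    n = n0 / (1 + (n0 - 1) / population)   # finite population correction
    return math.ceil(n)


print(sample_size_for_proportion(5000))  # 357
```

Note that the population size only enters through the correction term; the margin of error and the assumed proportion do most of the work.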

So why do I see that sometimes we can get away with as low as 30?! That seems so much lower than what all these tables and calculators suggest.

Remy F
  • Depends on your effect size, standard error, etc. Read about sample size calculation, power analysis, and study design. – Ellis Valentiner Jan 30 '14 at 17:57
  • Under what circumstances does it make sense to use a sample size of 30? For a population of 5000 at the 95% confidence level, the margin of error is like 18%! – Remy F Jan 30 '14 at 18:01
  • If the effect size is very large, it is quite easy for a model to discriminate. In such a scenario a small sample size might be appropriate – bdeonovic Jan 30 '14 at 18:05
  • What does that mean? – Remy F Jan 30 '14 at 18:06
  • It means that you cannot make a sample size prediction only based on "the size of your population" (not exactly sure what this is or how you are using this to calculate CI). You need to know other factors such as the effect size, power, type I error, etc. If you are confused you need to do more reading on power analysis as someone suggested – bdeonovic Jan 30 '14 at 18:14
  • Maybe I will simplify: If *all* I know is that the population size is 5000, how much would I need to sample? – Remy F Jan 30 '14 at 18:15
  • If that's all you know, you need to find out more. – so12311 Jan 30 '14 at 18:18
  • That's all I have to go from, here. I can't even ballpark it? – Remy F Jan 30 '14 at 18:19
  • @UnbanRonMaimon Also, if that is not sufficient then how come there are so many calculators that seem to be able to give you sample size if all you know is the population size and how accurate you want the results to be? – Remy F Jan 30 '14 at 18:21
  • @RemyF like what? – so12311 Jan 30 '14 at 18:21
  • @RemyF, ok I think I see what you're confused about now. When you give those calculators a desired "Confidence Interval" you are making an implicit assumption about the size of the effect you are interested in. If you're interested in a small effect, you need a tighter confidence interval. – so12311 Jan 30 '14 at 18:26
  • What do you mean by small and large effect? What if I don't know the "effect"? Can I assume? – Remy F Jan 30 '14 at 18:40
  • @RemyF Modern statistics can deal with sample sizes much smaller than 30. Fisher's tea tasting experiment technically only had n=8. To really be a statistical thinker, you should set up conditions for your experiment in which, when certain assumptions fail, there are better methods for data analysis. For instance, with binomial data, a normal approximation may not be appropriate. Use the exact binomial distribution to calculate confidence intervals, then! Better still is to set methods in place which are assumption-free and correct in small sample sizes (resampling statistics). – AdamO Jan 30 '14 at 19:02
  • @RemyF What sample size calculators are you using that only require you to input 'population size'? – bdeonovic Jan 30 '14 at 20:01
  • Who, exactly, says *what*, exactly, about n=30? [If someone claimed that n=30 was enough to "get a confidence interval of 95% with margin of error 5%", then you'd have an argument that there was a problem!] – Glen_b Jan 31 '14 at 00:48
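
Two points from the comments above can be checked numerically: the margin of error you actually get from n=30 out of 5000, and the exact binomial (Clopper-Pearson) interval suggested as an alternative to the normal approximation. This is a rough sketch; the helper names are hypothetical, `p = 0.5` is an assumed worst case, and the exact interval is found by plain bisection rather than a library routine.

```python
import math
from statistics import NormalDist


def margin_of_error(n, population, confidence=0.95, p=0.5):
    """Normal-approximation margin of error with finite population correction."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    fpc = math.sqrt((population - n) / (population - 1))
    return z * math.sqrt(p * (1 - p) / n) * fpc


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _root_of_decreasing(f):
    """Bisection on [0, 1] for a function that decreases from positive to negative."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_binomial_ci(successes, n, confidence=0.95):
    """Clopper-Pearson interval for a binomial proportion."""
    alpha = 1 - confidence
    k = successes
    # Lower limit solves P(X >= k) = alpha/2; it is 0 when k == 0.
    lower = 0.0 if k == 0 else _root_of_decreasing(
        lambda p: binom_cdf(k - 1, n, p) - (1 - alpha / 2))
    # Upper limit solves P(X <= k) = alpha/2; it is 1 when k == n.
    upper = 1.0 if k == n else _root_of_decreasing(
        lambda p: binom_cdf(k, n, p) - alpha / 2)
    return lower, upper


print(f"margin of error at n=30: {margin_of_error(30, 5000):.3f}")
print("exact 95% CI for 15 successes out of 30:", exact_binomial_ci(15, 30))
```

With these assumptions the margin of error at n=30 comes out near 18 percentage points, so a sample of 30 is not a substitute for the roughly 350 needed for a 5% margin.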

1 Answer


The sample size of 30 is typically a rule of thumb for how large a sample you need for the sample average to be approximately normally distributed. This is necessary if you are doing some kind of inference on a population parameter.

As another user pointed out, you might need a larger sample size if you want your hypothesis test to have a particular power.
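
To illustrate how power and effect size drive the required sample size, here is a minimal sketch for a two-sided, two-sample comparison of means. It uses the normal-approximation formula n = 2((z_{1-α/2} + z_{power}) / d)² per group, where d is the standardized effect size; the function name and the example effect sizes are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, power=0.80, alpha=0.05):
    """Approximate per-group sample size for a two-sided two-sample test.

    effect_size is the standardized mean difference (difference / SD).
    Uses the normal approximation, which is slightly optimistic for
    very small n; a t-based calculation would ask for a little more.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


for d in (0.2, 0.5, 1.5):
    print(f"effect size {d}: about {n_per_group(d)} per group")
```

Under these assumptions a small effect (0.2) needs several hundred observations per group, while a very large effect (1.5) needs fewer than ten, which is why a small sample can be adequate in some studies and hopeless in others.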

bdeonovic
  • The statement in the first sentence is accurate (that it's a typical rule of thumb), but the 'rule of thumb' itself is deeply flawed, almost to the point of being dangerous. As such, I disagree with the second sentence. Can you justify n=30 on the basis that you need it for doing inference on a population parameter? – Glen_b Jan 31 '14 at 00:51
  • I meant that it was necessary to know the distribution of your test statistic if you want to do inference on a population parameter. The CLT provides this. – bdeonovic Jan 31 '14 at 02:51
  • But the CLT only tells you about behavior as $n\rightarrow\infty$. 30 is very often not sufficiently close to infinity. By way of example, see the discussion [here](http://stats.stackexchange.com/questions/81074/how-useful-is-the-clt-in-applications/81087#81087). – Glen_b Jan 31 '14 at 03:10
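
The disagreement above can be explored with a small simulation. The sketch below draws samples of size 30 from a heavily skewed distribution and counts how often the usual 95% t-interval covers the true mean. The choice of a lognormal distribution with `sigma = 1.5`, the number of repetitions, and the seed are all assumptions made for illustration; they are not taken from the linked discussion.

```python
import math
import random
import statistics

N = 30          # the rule-of-thumb sample size under discussion
T_CRIT = 2.045  # approximate 97.5th percentile of t with N - 1 = 29 df


def t_interval_coverage(sigma=1.5, reps=2000, seed=0):
    """Fraction of nominal 95% t-intervals that contain the true mean
    when the data are lognormal(0, sigma) and the sample size is N."""
    rng = random.Random(seed)
    true_mean = math.exp(sigma**2 / 2)  # mean of a lognormal(0, sigma)
    hits = 0
    for _ in range(reps):
        sample = [rng.lognormvariate(0, sigma) for _ in range(N)]
        mean = statistics.fmean(sample)
        half_width = T_CRIT * statistics.stdev(sample) / math.sqrt(N)
        if mean - half_width <= true_mean <= mean + half_width:
            hits += 1
    return hits / reps


print(f"estimated coverage at n={N}: {t_interval_coverage():.3f}")
```

For data this skewed, the estimated coverage typically falls well short of the nominal 95%, which is the sense in which n=30 may not be "close enough to infinity". For mildly skewed or symmetric data the same experiment gives coverage much closer to 95%.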