What is the upper bound for the sampling fraction for the Central limit theorem to hold when sampling without replacement?

Context
The context for my question is the common argument (see e.g. here) that even when your population is not normally distributed, you can still use a t-test if your sample size is big enough, because the sample means will be approximately normal. The reason given for this is the CLT.

But when the sampling fraction is very large, this argument obviously breaks down: if you sample a large part of a finite population without replacement, the sample mean will not be normally distributed. So there must be an upper bound on the sampling fraction below which the CLT still holds; hence the question.

vonjd
  • One reason I love this site is that I see questions like this about topics I never considered, then have to do some digging just to get context for the question. I found this paper that might be relevant: http://www.jstor.org/discover/10.2307/2685700?uid=3739696&uid=2460338175&uid=2460337855&uid=2&uid=4&uid=83&uid=63&uid=3739256&sid=21104569736131 – shadowtalker Nov 17 '14 at 15:50
  • @ssdecontrol: Thank you, this is already interesting and it shows me that there is more to this than I originally thought (as is often the case ;-) I gave some additional context. – vonjd Nov 17 '14 at 16:31
  • Some clarification of how you think the CLT would be applied would help us understand what this question is really asking. Since the CLT is an assertion about sequences of ever-larger samples--which are obviously impossible when sampling without replacement from a finite population--then precisely *what* sequences do you have in mind? – whuber Nov 17 '14 at 16:51
  • @whuber: I gave some context for the question. What part of that is unclear? Basically, the practical question is how large the sampling fraction can be while a t-test is still justified. – vonjd Nov 17 '14 at 16:56
  • Nothing is unclear about the context. But even so, the question as stated still needs *your* clarification. Why do you even suppose that such an upper bound ought to exist? If one is to derive such a thing mathematically, it is imperative that it be done with a specific asymptotic model in mind. – whuber Nov 17 '14 at 17:00
  • @whuber: Well, an upper bound must exist, because if you e.g. use the full population as your sample, the means will not be normally distributed: they will be the same every time. You then have a population that is not normally distributed and means that are not normally distributed. Arguably that case is not so interesting, because you won't want to use a t-test when everything is known anyway. But I guess there might be cases where the sampling fraction is large enough that the CLT does not hold either, so that even with a big sample size you cannot use a t-test. – vonjd Nov 17 '14 at 17:07
  • This makes no sense, because "the full population" is finite and fixed. There are no asymptotics and therefore one cannot even state a CLT-like theorem! In order to make any progress towards an asymptotic statement you *must* posit some sense in which your population size grows. But how can it do that and retain the same statistical behavior? – whuber Nov 17 '14 at 17:34
  • @whuber: Ok, thank you - I will think about that... – vonjd Nov 17 '14 at 17:48
  • @whuber: I think you are right. The question doesn't make sense as it stands... my mistake was that I thought the CLT for the mean of the samples would not hold if you had e.g. a sample size of 4050 and a population size of 4100. But I did some simulations in R and saw that I was wrong. What should I do? Delete the question or vote to close? Thank you! – vonjd Nov 17 '14 at 18:01
  • Well, your thinking was pretty good: the distribution of the mean of samples of size 4050 is going to be linearly related to the distribution of the mean of samples of size 4100-4050 = 50, which could look highly non-normal in certain populations. The basic underlying problem, though, is that the kinds of appeals to the CLT you mention are really not appropriate; they work only because of other unstated assumptions or vaguely-adduced "rules of thumb" (all of which have exceptions). There's interesting stuff here. You could try rephrasing your question, but you can always delete it if you wish. – whuber Nov 17 '14 at 18:09
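
The linear relation in the last comment can be checked numerically. Writing N for the population size and n for the sample size, the sample total plus the total of the N - n units left out equals the population total, so the mean of a size-n sample is a fixed linear function of the mean of the left-out units. The Python sketch below is only an illustration, not the R simulation mentioned above: the exponential population, the function name `complement_demo`, the number of draws, and the seed are all assumptions chosen for the example.

```python
import random
import statistics


def complement_demo(pop_size=4100, sample_size=4050, draws=300, seed=0):
    """Compare means of large samples with means of the units left out."""
    rng = random.Random(seed)
    # Assumption: a skewed (exponential) population, chosen only for illustration.
    population = [rng.expovariate(1.0) for _ in range(pop_size)]
    total = sum(population)
    left_out_size = pop_size - sample_size

    big_means, small_means, max_gap = [], [], 0.0
    for _ in range(draws):
        # Draw the left-out units without replacement; the rest is the sample.
        left_out = set(rng.sample(range(pop_size), left_out_size))
        big = [x for i, x in enumerate(population) if i not in left_out]
        small = [population[i] for i in left_out]
        big_mean = sum(big) / sample_size
        small_mean = sum(small) / left_out_size
        # Identity: n * big_mean + (N - n) * small_mean = population total.
        implied = (total - left_out_size * small_mean) / sample_size
        max_gap = max(max_gap, abs(big_mean - implied))
        big_means.append(big_mean)
        small_means.append(small_mean)
    return max_gap, statistics.pstdev(big_means), statistics.pstdev(small_means)


max_gap, sd_big, sd_small = complement_demo()
print("largest deviation from the identity:", max_gap)
print("sd of size-4050 means:", sd_big)
print("sd of size-50 means scaled by 50/4050:", sd_small * 50 / 4050)
```

Because the relation is exactly linear, the standardized distribution of the size-4050 mean is a mirror image of the standardized distribution of the size-50 mean, so any skewness in the latter carries over (with its sign flipped) to the former.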

0 Answers