My current textbook says that, for the chi-square approximation to the exact distribution of $X^2$ or $G^2$ to be usable, the expected frequencies have to be $\ge 5$.

The book says that the approximation could otherwise be biased, but I wonder how and why.

mreq
  • No, not really. Check the question and answers. – mreq May 26 '13 at 21:28
  • I think those answers answer your question: 5+ expected frequencies are not needed. – Momo May 26 '13 at 21:34
  • I agree w/ @Momo that the linked question seems to cover the required territory. I respect that you believe it doesn't, but in that case, you should edit your Q to clarify what you want to know more specifically & how it is distinct from the other Q; otherwise, this question will end up being closed. – gung - Reinstate Monica May 27 '13 at 01:38
  • I know the condition is "old" and there are less strict ones. The point of the question was, why are even the less strict conditions needed. @Glen_b thanks, that's pretty close to what I wanted to know. However, I still don't know **why** the approximation wouldn't be reasonable for lower numbers. – mreq May 27 '13 at 09:26
  • I have taken my responses to an answer. I think if you edit your question to reflect your clarification it's more likely to stay open. – Glen_b May 27 '13 at 13:27

1 Answer

The counts are discrete. The chi-square approximation for $X^2$ relies on the counts being approximately normally distributed. When all the expected counts are greater than 5, the approximation tends to be reasonable, but the cut-off is fairly arbitrary. For $G^2$ the approximation relies on an asymptotic argument; I think it generally comes in more slowly than for $X^2$. The condition that all expected counts exceed 5 is very old; many more recent papers suggest that somewhat less stringent requirements are fine for the Pearson statistic.
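For reference, writing $O_i$ and $E_i$ for the observed and expected counts in cell $i$ (subscripted notation assumed here for illustration), the two statistics are usually defined as:

$$X^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}, \qquad G^2 = 2 \sum_i O_i \ln\frac{O_i}{E_i},$$

where a cell with $O_i = 0$ contributes zero to $G^2$ by convention.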

Basically, when a discrete r.v. has nearly all its probability concentrated on a few values, even the best continuous approximation is not going to be much good. Consider a chi-square goodness-of-fit test for a Bernoulli(0.25), where we have 4 observations. The chi-square statistic is the sum, over the counts of $0$'s and $1$'s, of the usual $(O-E)^2/E$. The two expected counts are 3 and 1. The actual distribution of the chi-square statistic takes exactly four values. The chi-square(1) distribution has a 5% critical value of 3.84, but the 95th percentile of the actual distribution is 5.33.
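The numbers in this example can be checked by enumerating every possible outcome. The following is a minimal sketch using only the Python standard library; the names `pearson_x2`, `chi2_1_cdf`, `exact_cdf` and `q95` are made up for illustration, and the chi-square(1) cdf is computed from the identity $P(Z^2 \le x) = \operatorname{erf}(\sqrt{x/2})$ rather than from a statistics package.

```python
from math import comb, erf, sqrt

n, p = 4, 0.25                      # 4 observations from a Bernoulli(0.25)
expected = (n * (1 - p), n * p)     # expected counts of 0's and 1's: 3 and 1

def pearson_x2(k):
    """Pearson X^2 when k of the n observations are 1's."""
    observed = (n - k, k)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Exact pmf of X^2: sum the binomial probabilities of all k that give the same value.
pmf = {}
for k in range(n + 1):
    x2 = round(pearson_x2(k), 10)
    pmf[x2] = pmf.get(x2, 0.0) + comb(n, k) * p**k * (1 - p) ** (n - k)

def chi2_1_cdf(x):
    """cdf of the chi-square distribution with 1 df: P(Z^2 <= x) = erf(sqrt(x/2))."""
    return erf(sqrt(x / 2))

# Exact cdf at each attainable value, next to the continuous approximation.
exact_cdf = {}
cum = 0.0
for x2 in sorted(pmf):
    cum += pmf[x2]
    exact_cdf[x2] = cum
    print(f"X^2 = {x2:7.4f}  P = {pmf[x2]:.4f}  "
          f"exact cdf = {cum:.4f}  chi2(1) cdf = {chi2_1_cdf(x2):.4f}")

# 95th percentile of the exact distribution: smallest value whose cdf reaches 0.95.
q95 = min(x for x, c in exact_cdf.items() if c >= 0.95)
print("exact 95th percentile:", round(q95, 2))
```

The attainable values are 0, 4/3, 16/3 and 12. The exact cdf at 4/3 falls just short of 0.95, which is why the exact 95th percentile jumps to 16/3 ≈ 5.33 rather than landing anywhere near 3.84.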

[Figure: pdf of the discrete $X^2$ statistic]

[Figure: cdf of the discrete $X^2$ statistic, with the chi-square approximation]

Glen_b