
I just started learning about Pearson's Chi Squared test and found some parts confusing:

In the Chi Squared test, I don't understand why we can just assume that the test statistic (the sum, over cells, of the squared differences between observed and expected values, each divided by the expected value) follows a Chi Squared distribution. Can it follow another distribution? If so, how can one tell?

On a related note, the Chi Squared test does not care what distribution the observed or expected random variables actually follow. Is there some intuition for why Chi Squared would work "globally" for observed and expected r.v.'s with any distribution?

foobar
  • [Pearson's Chi Squared test](https://en.wikipedia.org/wiki/Pearson%27s_chi-squared_test) can be run against any distribution but is easiest to understand against a uniform distribution. Basically, the differences between observed and expected values in each cell or histogram category can be combined, and the combination has the form of a Chi-squared statistic. Remember, it is these differences that are being combined, not the original numbers, so it should make sense that they obey a different distribution law than the numbers that were subtracted do. – Carl Sep 13 '16 at 02:53
  • You're not assuming a Chi Squared distribution outright. Instead you are using an approximation that holds, roughly speaking, when the expected counts satisfy $E_i \geq 5$ for $i=1,\dots,n$, where $n$ is your number of bins. It is this theoretical result that justifies treating the test statistic as chi squared distributed under the null. – user60887 Sep 13 '16 at 02:56
  • See also [Why does independence test use the chi-squared distribution?](http://stats.stackexchange.com/q/82260/17230) & [Why does chi-square testing use the expected count as the standard deviation?](http://stats.stackexchange.com/questions/14797/why-does-chi-square-testing-use-the-expected-count-as-the-variance) – Scortchi - Reinstate Monica Sep 13 '16 at 11:14
  • At http://stats.stackexchange.com/questions/16921/how-to-understand-degrees-of-freedom/17148#17148 I posted an explicit example of how this assumption can be incorrect: a seemingly routine computation of the chi-squared statistic proved not to have a chi-squared distribution at all. The way to tell how this can happen is to check that *all* the technical assumptions needed for the test apply. If they do not, then you have grounds to *suspect* there might be a problem. – whuber Sep 13 '16 at 15:42
  • What I was taught is that if the counts in each cell are Poisson distributed, as when testing nuclear decay counting equipment, then the problem is Chi-squared. – Carl Sep 13 '16 at 17:21
  • @Scortchi Not the SD: the variance of a Poisson count is the count rate. The SD is its square root. – Carl Sep 13 '16 at 22:50
  • @user60887 For $n$ below ~5, counting of nuclear decays is no longer Poisson; I think it is then binomial. – Carl Sep 13 '16 at 23:10
  • @Scortchi Doesn't Poisson tend to Normal for $n>40$ or thereabouts? That may explain Chi-squared using $\sqrt n$ as the standard deviation of a Gaussian. – Carl Sep 14 '16 at 14:50
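
The points raised in the comments can be checked empirically with a small simulation. The sketch below is only an illustration, not anything posted in the thread: it assumes the cell counts come from a multinomial sample, and the helper names (`pearson_statistic`, `simulate`), the cell probabilities, and the sample sizes are all made up for the example. It draws many samples under the null, computes $\sum_i (O_i - E_i)^2 / E_i$ for each, and compares the average statistic and the rejection rate with what a chi-squared distribution with $k-1 = 3$ degrees of freedom would give. The critical value 7.815 is the standard table value for the upper 5% point at 3 degrees of freedom.

```python
import random

def pearson_statistic(observed, expected):
    """Sum over cells of (O_i - E_i)^2 / E_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def simulate(probs, n, trials=5000, seed=0):
    """Draw `trials` multinomial samples of size n under the null hypothesis
    and return the Pearson statistic computed from each sample."""
    rng = random.Random(seed)
    k = len(probs)
    expected = [n * p for p in probs]
    stats = []
    for _ in range(trials):
        counts = [0] * k
        # Assign each of the n observations to a cell with the null probabilities.
        for cell in rng.choices(range(k), weights=probs, k=n):
            counts[cell] += 1
        stats.append(pearson_statistic(counts, expected))
    return stats

# Upper 5% point of chi-squared with 3 degrees of freedom (table value).
CRIT_DF3 = 7.815

# Hypothetical scenarios: the null distribution differs, the reference
# distribution of the statistic does not. The last case has small expected counts.
scenarios = [
    ("uniform cells, n=200", [0.25, 0.25, 0.25, 0.25], 200),
    ("skewed cells, n=200", [0.1, 0.2, 0.3, 0.4], 200),
    ("skewed cells, n=8", [0.1, 0.2, 0.3, 0.4], 8),
]

for label, probs, n in scenarios:
    stats = simulate(probs, n)
    mean = sum(stats) / len(stats)
    reject = sum(s > CRIT_DF3 for s in stats) / len(stats)
    print(f"{label}: mean statistic = {mean:.2f} (chi-squared mean is 3), "
          f"rejection rate = {reject:.3f} (nominal 0.05)")
```

With large expected counts, the rejection rate should sit close to the nominal 5% whatever the cell probabilities are, which is the sense in which the test does not depend on the distribution being tested. With $n=8$ several expected counts fall below 5, so the rejection rate need not match the nominal level; that is the situation the $E_i \geq 5$ rule of thumb is meant to flag.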

0 Answers