
So the usual logic for a chi-squared problem is as follows (a worked sketch in code follows the list):

  1. Formulate the null and alternative hypotheses.
  2. Calculate the Pearson residuals.
  3. Observe that, under the null hypothesis, the sum of the squared residuals follows (approximately) a chi-squared distribution.
  4. Use the chi-squared distribution to get the probability of data at least as extreme as what was observed.
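
For concreteness, here is a minimal sketch of that workflow in Python. This is my own illustration, not part of the original question: the observed counts are made up and the null hypothesis is a fair six-sided die, so the expected counts are uniform.

```python
import numpy as np
from scipy.stats import chi2

# Made-up example: 60 die rolls, null hypothesis = the die is fair.
observed = np.array([8, 12, 9, 11, 6, 14])
expected = np.full(6, 60 / 6)          # 10 expected rolls per face under H0

# Step 2: Pearson residuals, (O - E) / sqrt(E)
residuals = (observed - expected) / np.sqrt(expected)

# Step 3: the sum of squared residuals is the chi-squared statistic
stat = np.sum(residuals ** 2)

# Step 4: p-value from the chi-squared distribution with k - 1 degrees of freedom
df = len(observed) - 1
p_value = chi2.sf(stat, df)
print(stat, p_value)
```

(scipy.stats.chisquare would do steps 2-4 in one call; the manual version just mirrors the list above.)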

I've read and watched many explanations of how the chi-squared distribution is built. That part is more or less clear to me.

But I can't figure out the transition between steps 1 and 2: WHY do we calculate (observed − expected)/sqrt(expected) in the first place? Why don't we use some other (arbitrary) function of "observed" and "expected"? Why not (observed − expected)^3/log(observed)? Then it wouldn't fit the chi-squared distribution, but maybe it would fit some other type of test...
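
To make the contrast concrete, here is a small simulation of my own (not from the question): under the null hypothesis, the Pearson statistic's distribution closely tracks a chi-squared distribution with a known number of degrees of freedom, which is exactly what makes it usable as a test statistic; an arbitrary function of observed and expected would have a null distribution you would have to derive from scratch.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
n, k = 60, 6
expected = np.full(k, n / k)
df = k - 1

# Simulate many datasets under H0 (uniform multinomial) and compute
# the Pearson statistic sum((O - E)^2 / E) for each.
pearson_stats = []
for _ in range(10_000):
    observed = rng.multinomial(n, np.full(k, 1 / k))
    pearson_stats.append(np.sum((observed - expected) ** 2 / expected))
pearson_stats = np.array(pearson_stats)

# Empirical quantiles of the simulated statistic vs. chi2(df) quantiles:
# they agree closely, so the null distribution is known in advance.
for q in (0.5, 0.9, 0.99):
    print(q, np.quantile(pearson_stats, q), chi2.ppf(q, df))
```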

AntonZi
One justification comes from analysis of the likelihood. The conditions for this to be valid are described in my answer at https://stats.stackexchange.com/a/17148/919 (but the analysis is not provided). A great deal of your question is addressed by the basic theory of hypothesis tests, which implies (among other things) that the test statistic must have a definite distribution when the null hypothesis holds. Please see our thread at https://stats.stackexchange.com/questions/31. – whuber Dec 14 '21 at 00:23
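
(As a sketch of the likelihood justification the comment alludes to, added here rather than taken from the comment: for a multinomial model, the log-likelihood-ratio statistic expands, to leading order in the deviations $O_i - E_i$, into Pearson's statistic,
$$G^2 = 2\sum_i O_i \log\frac{O_i}{E_i} \;\approx\; \sum_i \frac{(O_i - E_i)^2}{E_i} = \chi^2,$$
which is one reason this particular function of observed and expected, rather than an arbitrary one, is the natural choice.)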

0 Answers