Logical transition from real problems to chi-squared test

Question

So usually the logic for all chi-squared problems is as follows:

1. Formulate the null and alternative hypotheses.
2. Calculate the Pearson residuals.
3. Observe that the sum of the squared residuals fits the chi-squared distribution well.
4. Use the chi-squared distribution to get the probability of the observed data.
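
To make the steps above concrete, here is a minimal sketch in plain Python. The counts and the equal-probability null hypothesis are made up for illustration, and the residual is taken in its standard form, (observed - expected)/sqrt(expected). Three categories are used so that the chi-squared distribution has 2 degrees of freedom, where the upper-tail probability has the simple closed form exp(-x/2).

```python
import math

# Hypothetical data: 60 observations falling into three categories.
observed = [30, 14, 16]
# Step 1: H0 says all three categories are equally likely.
null_probs = [1 / 3, 1 / 3, 1 / 3]

n = sum(observed)
expected = [n * p for p in null_probs]

# Step 2: Pearson residuals, (observed - expected) / sqrt(expected).
residuals = [(o - e) / math.sqrt(e) for o, e in zip(observed, expected)]

# The test statistic is the sum of the squared residuals.
statistic = sum(r * r for r in residuals)

# Steps 3-4: under H0 the statistic is approximately chi-squared with
# (number of categories - 1) = 2 degrees of freedom. For 2 degrees of
# freedom the upper-tail probability is exactly exp(-x / 2).
p_value = math.exp(-statistic / 2)

print(residuals)
print(statistic, p_value)
```

With these made-up counts the statistic comes out at about 7.6 and the tail probability at about 0.022. Note that this is the probability of a statistic at least this large under H0, not the probability of the exact observed table.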

I've read and watched many explanations of how the chi-squared distribution is built. That part is more or less clear.

But I can't figure out the transition between steps 1 and 2: WHY do we calculate (observed-expected)/sqrt(expected) in the first place? Why don't we use some other (arbitrary) function of "observed" and "expected"? Why not (observed-expected)^3/log(observed)? Then it wouldn't fit the chi-squared distribution, but maybe it would fit some other type of test...

One justification comes from analysis of the likelihood. The conditions for this to be valid are described in my answer at https://stats.stackexchange.com/a/17148/919 (but the analysis is not provided). A great deal of your question is addressed by the basic theory of hypothesis tests, which implies (among other things) that the test statistic must have a definite distribution when the null hypothesis holds. Please see our thread at https://stats.stackexchange.com/questions/31. — whuber, Dec 14 '21 at 00:23
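
The point in the comment above about the test statistic needing a definite distribution under the null can be checked by simulation. The sketch below is only an illustration under assumed settings (three categories, 120 observations per sample, two made-up null probability vectors, helper names of my own choosing): it draws samples for which H0 is true and counts how often the Pearson statistic exceeds the 95% point of the chi-squared distribution with 2 degrees of freedom. If the statistic really has that distribution regardless of the particular null probabilities, both rates should be close to 0.05.

```python
import math
import random

def pearson_statistic(observed, expected):
    """Sum of squared Pearson residuals, i.e. sum((O - E)^2 / E)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def rejection_rate(null_probs, n, trials, rng):
    """Share of samples drawn under H0 whose statistic exceeds the 95% point."""
    k = len(null_probs)
    assert k == 3, "the critical value below is only valid for 2 degrees of freedom"
    expected = [n * p for p in null_probs]
    # For 2 degrees of freedom the upper tail is exp(-x / 2), so the
    # 95% point solves exp(-x / 2) = 0.05.
    critical = -2 * math.log(0.05)
    rejections = 0
    for _ in range(trials):
        # Draw n observations from the null distribution and tabulate them.
        draws = rng.choices(range(k), weights=null_probs, k=n)
        observed = [0] * k
        for d in draws:
            observed[d] += 1
        if pearson_statistic(observed, expected) > critical:
            rejections += 1
    return rejections / trials

rng = random.Random(0)  # fixed seed so the run is repeatable
rate_uniform = rejection_rate([1 / 3, 1 / 3, 1 / 3], n=120, trials=3000, rng=rng)
rate_skewed = rejection_rate([0.5, 0.3, 0.2], n=120, trials=3000, rng=rng)
print(rate_uniform, rate_skewed)
```

An arbitrary function of "observed" and "expected" also has some distribution under H0, but in general that distribution is not a known one and may change with the null probabilities and the sample size, so there is no ready-made table to compare it against.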
