Where does the square come from in the chi-square test?

Question

The chi-squared distribution with $k$ degrees of freedom is defined as the distribution of the sum of the squares of $k$ independent standard normal random variables.
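As a quick illustration of this definition, here is a minimal simulation sketch using only Python's standard library. The helper name `simulate_sum_of_squares`, the choice $k = 3$, the number of draws, and the seed are all assumptions made for the example. It draws many sums of $k$ squared standard normals and compares the sample mean and variance with the known chi-squared values, $k$ and $2k$.

```python
import random
import statistics


def simulate_sum_of_squares(k, n_draws, seed=0):
    """Draw n_draws values of Z_1^2 + ... + Z_k^2 with Z_i iid standard normal.

    Hypothetical helper written for this illustration only.
    """
    rng = random.Random(seed)
    return [
        sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))
        for _ in range(n_draws)
    ]


k = 3  # degrees of freedom (arbitrary choice for the example)
draws = simulate_sum_of_squares(k, n_draws=50_000)

sample_mean = statistics.fmean(draws)
sample_var = statistics.variance(draws)

# A chi-squared variable with k degrees of freedom has mean k and variance 2k.
print(f"sample mean     = {sample_mean:.3f} (theory: {k})")
print(f"sample variance = {sample_var:.3f} (theory: {2 * k})")
```

The simulated values are all nonnegative and their mean and variance should land close to $k$ and $2k$, which is what the definition predicts.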

Why the sum of the "squares" of the normal random variables and not, say, just their sum? Where does the need for squaring come from?

Thanks.

The squaring originates in many ways: your question has it backwards. Given that one needs to analyze the sum of squares of iid Normal variates, the chi-squared distribution emerges. In circumstances where sums of Normal variables are involved, the analysis shows that such sums are themselves Normal. You can find a very great many threads on this site that concern circumstances where sums of squared Normal variates arise: search for "ANOVA," for instance, or even multiple regression. — whuber, Oct 04 '19 at 22:39
@whuber The common example given to illustrate the chi-squared test is the test for fairness of a die. In that case the squared differences between the expected and observed frequencies (each divided by the expected frequency) are used. I don't see why the sum of squares is used here instead of, for example, just the sum of the differences as a test statistic. — Sanyo Mn, Oct 04 '19 at 22:55
The sum of differences is always zero--that's a useless statistic. Squaring therefore gives the *lowest order approximation* to something that often is very complicated. — whuber, Oct 05 '19 at 16:51
@whuber I mean the sum of the absolute differences which is not zero. — Sanyo Mn, Oct 06 '19 at 19:45
@whuber the link explains the use of squaring in standard deviation, I don't see how this explains the use of squaring in chi-square test statistic. — Sanyo Mn, Oct 06 '19 at 20:19
The answers in that thread are also answers to your broad question. — whuber, Oct 07 '19 at 13:50
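The die example from the comments can be made concrete with a small sketch. The observed counts below are made up for illustration (60 hypothetical rolls), and the variable names are assumptions. It computes three candidate statistics: the plain sum of differences, the sum of absolute differences, and Pearson's statistic $\sum_i (O_i - E_i)^2 / E_i$.

```python
# Hypothetical observed counts for faces 1..6 from 60 rolls (made-up data).
observed = [8, 12, 9, 11, 6, 14]

n_rolls = sum(observed)
# Under the hypothesis of a fair die, every face has the same expected count.
expected = [n_rolls / len(observed)] * len(observed)

differences = [o - e for o, e in zip(observed, expected)]

# Observed and expected counts have the same total, so the differences cancel.
sum_of_differences = sum(differences)

# The absolute differences do not cancel.
sum_of_abs_differences = sum(abs(d) for d in differences)

# Pearson's chi-squared statistic: squared differences scaled by expected counts.
pearson_statistic = sum(d ** 2 / e for d, e in zip(differences, expected))

print("sum of differences         :", sum_of_differences)
print("sum of absolute differences:", sum_of_abs_differences)
print("Pearson statistic          :", round(pearson_statistic, 3))
```

The plain sum is zero for any data, because observed and expected counts share the same total. The Pearson statistic is the one whose large-sample distribution under fairness is approximately chi-squared with 5 degrees of freedom; for reference, the 5% critical value of that distribution is about 11.07.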

score 0 · Answer 1 · answered Oct 04 '19 at 23:56

Why the sum of the "squares" of the normal random variables but not, say, just their sums?

If you take their sum, you get a normally distributed random variable again; see https://en.wikipedia.org/wiki/Sum_of_normally_distributed_random_variables#Independent_random_variables. Therefore no new name is needed for the distribution of a sum of independent normal random variables: it is already a normal distribution.
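To see this numerically, here is a minimal sketch using Python's standard library. The helper name `simulate_plain_sum`, the value $k = 3$, and the seed are assumptions for the example. The sum of $k$ independent standard normals is normal with mean 0 and variance $k$, so the sample mean should be near 0 and the sample variance near $k$.

```python
import random
import statistics


def simulate_plain_sum(k, n_draws, seed=1):
    """Draw n_draws values of Z_1 + ... + Z_k with Z_i iid standard normal.

    Hypothetical helper written for this illustration only.
    """
    rng = random.Random(seed)
    return [
        sum(rng.gauss(0.0, 1.0) for _ in range(k))
        for _ in range(n_draws)
    ]


k = 3  # number of summands (arbitrary choice for the example)
sums = simulate_plain_sum(k, n_draws=50_000)

sum_mean = statistics.fmean(sums)
sum_var = statistics.variance(sums)

# Theory: the sum is N(0, k), so the mean is 0 and the variance is k.
print(f"sample mean     = {sum_mean:.3f} (theory: 0)")
print(f"sample variance = {sum_var:.3f} (theory: {k})")

# Unlike a sum of squares, a plain sum takes negative values about half the time.
share_negative = sum(s < 0 for s in sums) / len(sums)
print(f"share of negative sums = {share_negative:.3f}")
```

Roughly half of the simulated sums are negative, which reflects that positive and negative deviations cancel in a plain sum, whereas a sum of squares can only grow.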

Where does the need for squaring come from?

A sum of squared standard normal random variables is not normally distributed (for one thing, it is nonnegative and skewed). A new name is therefore needed for its distribution: the "chi-squared distribution". As noted in the comments, chi-squared distributions have many applications.

What is the problem with getting a normal distribution when we take the sums? Wouldn't it be easier to use as a test statistic? (To reiterate, my question is what the purpose of squaring the absolute differences is.) — Sanyo Mn, Oct 07 '19 at 08:27
