Mathematical Principle behind ANOVA?

Question

I am writing this question to ask about the mathematical principle behind ANOVA.

My knowledge of ANOVA comes from a business statistics textbook which, as one can imagine, does not go deep into the underlying principles.

I understand the mechanics of computing the F statistic, but I am having a hard time understanding why its sampling distribution should be the F distribution.

I understand that the F distribution is the distribution of the quotient of two chi-square variables, each divided by its degrees of freedom, but I do not understand why MST and MSE (the numerator and the denominator of F) follow chi-square distributions with those degrees of freedom.

I could have just missed some very basic things.

Also, in [this](http://math.stackexchange.com/questions/1317671/2-way-anova-degrees-of-freedom-proof/1329090#1329090) post, I explained from a pure math perspective how the degrees of freedom are computed. It might be of interest to you. — Zhanxiong, Dec 27 '16 at 05:16
Maybe the following links are useful: http://math.stackexchange.com/questions/2010313/what-is-the-meaning-of-ems-and-bms-in-an-anova/2012234#2012234 — L.V.Rao, Dec 27 '16 at 13:07
http://stats.stackexchange.com/questions/240795/proof-that-the-sampling-distribution-of-the-sample-variance-from-n0-1-sim-ch/240814#240814 — L.V.Rao, Dec 27 '16 at 13:08
Thank you Zhanxiong, I didn't know about Cochran's theorem. After reading the link below, the mystery is solved; that seems to be the key missing ingredient. The rest is really just arguing about the rank of the matrices involved in the ANOVA splitting of the sum of squares. http://www.stat.columbia.edu/~yangfeng/W4315/lectures/lecture-cochran's-theorem/cochran's-theorem.pdf — Andrew Au, Dec 27 '16 at 21:10

score 4 · Answer 1 · edited Apr 13 '17 at 12:44

I would encourage OP to conceptually separate the mathematical and statistical principles of ANOVA.

Mathematical Principles of ANOVA

Consider variables $Y_k, \; k = 1, \ldots, N,$ with sample variance $s^2 = \sum_{k = 1}^N (Y_k - \bar{Y}_{\centerdot})^2/(N - 1).$ Now consider a grouping index $i = 1, \ldots, I$ with no particular meaning that divides $1, \ldots, N$ into equal (for convenience) groups of size $n$, so that $N = In$. We can then rewrite the variance as $$s^2 = \sum_{i = 1}^I \sum_{j = 1}^n (Y_{ij} - \bar{Y}_{\centerdot \centerdot})^2/(N - 1).$$ This is the exact same quantity with a different indexing scheme. The next two steps, subtracting and adding the group means and then expanding the square, are entirely algebraic; the cross-product term vanishes because $\sum_{j = 1}^n (Y_{ij} - \bar{Y}_{i \centerdot}) = 0$ within every group:

\begin{align*}
(N - 1)s^2 &= \sum_{i = 1}^I \sum_{j = 1}^n (Y_{ij} - \bar{Y}_{i \centerdot} + \bar{Y}_{i \centerdot} - \bar{Y}_{\centerdot \centerdot})^2 \\
&= n\sum_{i = 1}^I (\bar{Y}_{i \centerdot} - \bar{Y}_{\centerdot \centerdot})^2 + \sum_{i = 1}^I \sum_{j = 1}^n (Y_{ij} - \bar{Y}_{i \centerdot})^2.
\end{align*}

The math doesn't care about the interpretation of these terms, and the decomposition always works (at least for the one-way layout).
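To see that the decomposition really is pure algebra, here is a minimal Python sketch that checks it numerically on deliberately non-normal data. The function name `one_way_decomposition` and the exponential test data are my own choices for illustration, not part of any library, and the sketch assumes equal group sizes as in the derivation above.

```python
import random

def one_way_decomposition(groups):
    """Return (total_ss, between_ss, within_ss, cross_term) for equal-sized groups.

    Hypothetical helper for illustration; assumes every group has the same size n.
    """
    n = len(groups[0])
    all_values = [y for g in groups for y in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / n for g in groups]

    # Left-hand side: (N - 1) s^2, the total sum of squares about the grand mean.
    total_ss = sum((y - grand_mean) ** 2 for y in all_values)
    # First term on the right: n * sum_i (Ybar_i. - Ybar_..)^2.
    between_ss = n * sum((m - grand_mean) ** 2 for m in group_means)
    # Second term on the right: sum_i sum_j (Y_ij - Ybar_i.)^2.
    within_ss = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    # The cross-product term that should vanish when the square is expanded.
    cross_term = 2 * sum(
        (y - m) * (m - grand_mean) for g, m in zip(groups, group_means) for y in g
    )
    return total_ss, between_ss, within_ss, cross_term

random.seed(0)
# Exponential data: no normality is needed for the identity to hold.
groups = [[random.expovariate(1.0) for _ in range(5)] for _ in range(4)]
total_ss, between_ss, within_ss, cross_term = one_way_decomposition(groups)
print("identity holds:", abs(total_ss - (between_ss + within_ss)) < 1e-9)
print("cross term vanishes:", abs(cross_term) < 1e-9)
```

Any other data set should pass the same two checks, up to floating-point rounding, since no distributional assumption enters the computation.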

Statistical Principles of ANOVA

So far in this example, not a single distributional statement was made about $Y_k$ or the re-indexed $Y_{ij}$, and that's because the mathematical decomposition didn't need any. A statistical device, the null hypothesis that there are no group effects, along with the assumption of normality, leads to $Y_{ij} \sim N(\mu, \sigma^2)$ for all $i, j$.

I won't go through every step along the way to the F-statistic, but note that the sample variance of the group means $\bar{Y}_{i \centerdot}$ is $\sum_{i = 1}^I (\bar{Y}_{i \centerdot} - \bar{Y}_{\centerdot \centerdot})^2 / (I - 1)$ and that, when multiplied by the constant $(I - 1)n/\sigma^2$, it has a $\chi_{I-1}^2$ distribution under the null hypothesis. You can probably "see" this quantity in the decomposition above, as well as hints of the F-statistic if you divide the first term by the second (each first divided by its degrees of freedom, $I - 1$ and $N - I$).
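The distributional claims can be checked informally by simulation. The following is a rough sketch using only the Python standard library. The helper names `anova_pieces` and `simulate_null`, and the parameter values (4 groups of size 10, $\sigma = 2$, 5000 replications), are arbitrary choices of mine. It draws data under the null hypothesis and compares sample moments with the known moments of $\chi^2_{I-1}$ (mean $I - 1$, variance $2(I - 1)$) and of $F_{I-1,\,N-I}$ (mean $(N - I)/(N - I - 2)$). Matching moments is only a sanity check, not a proof.

```python
import random
import statistics

def anova_pieces(groups):
    """Return (between_ss, within_ss, f_stat) for equal-sized groups (illustrative helper)."""
    num_groups = len(groups)
    n = len(groups[0])
    total_n = num_groups * n
    group_means = [sum(g) / n for g in groups]
    # With equal group sizes the grand mean is the mean of the group means.
    grand_mean = sum(group_means) / num_groups
    between_ss = n * sum((m - grand_mean) ** 2 for m in group_means)
    within_ss = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    mst = between_ss / (num_groups - 1)    # mean square for treatments
    mse = within_ss / (total_n - num_groups)  # mean square error
    return between_ss, within_ss, mst / mse

def simulate_null(num_groups=4, n=10, mu=0.0, sigma=2.0, reps=5000, seed=1):
    """Simulate Y_ij ~ N(mu, sigma^2) for all i, j, i.e. no group effects."""
    rng = random.Random(seed)
    scaled_between, f_values = [], []
    for _ in range(reps):
        groups = [[rng.gauss(mu, sigma) for _ in range(n)] for _ in range(num_groups)]
        between_ss, _, f_stat = anova_pieces(groups)
        scaled_between.append(between_ss / sigma ** 2)  # should behave like chi^2_{I-1}
        f_values.append(f_stat)                         # should behave like F_{I-1, N-I}
    return scaled_between, f_values

scaled_between, f_values = simulate_null()
d1, d2 = 4 - 1, 4 * 10 - 4  # I - 1 and N - I for the default settings
print("mean of scaled between SS:", statistics.mean(scaled_between), "theory:", d1)
print("variance of scaled between SS:", statistics.variance(scaled_between), "theory:", 2 * d1)
print("mean of F:", statistics.mean(f_values), "theory:", d2 / (d2 - 2))
```

Note that $\sigma$ cancels in the ratio MST/MSE, which is why the F-statistic can be computed without knowing the variance.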

In Casella and Berger's text Statistical Inference, both $t$ and $F$ distributions are introduced under a section "The Derived Distributions." As far as I know, the F-distribution was derived ad hoc (from a scaled ratio of $\chi^2$ random variables) for the purposes of testing ANOVA null hypotheses.
