Why is the OLS/ANOVA assumption about the normality of residuals rather than the normality of the error distribution (of the mean)?

Question

One of the assumptions of ANOVA is normally distributed residuals. However, as noted here some authors have stated that this assumption is barely important at all. Simulation results have shown that, provided sufficient sample size, deviations from normality can be tolerated without a large adverse effect. In practice I often hear people recite that 'the test is robust to violations of the assumption of normality'.

My question is this: when making inferences at the group level (not point estimates for individual observations), why is the assumption not about the normality of the error distribution (i.e. the hypothetical distribution that we quantify with terms like the Standard Error of the Mean), as opposed to the specifically observed residuals?

Isn't the most proximal cause of Type I inflation or deflation, non-symmetrical errors in estimating the mean, etc, really a reflection of the non-normality of the error distribution?

I grant that, clearly, there is an equivalence here that may lead to this being a bit of semantics. Non-normally distributed residuals are almost inevitably going to lead to a somewhat non-normal error distribution of the mean estimate. However, I think the simulation results I referred to above (and our general knowledge of the Central Limit Theorem) support the idea that if the means being described are based on a sufficient number of samples, then the error distribution can become reasonably normal and thus nothing bad happens.

Moreover, if the assumption were about the distribution of errors, it would neatly explain why violations of this assumption can frequently be papered over with larger sample sizes. It would also highlight characteristics of datasets where 'larger' is so huge as to be impractical (e.g. zero-inflated datasets).

Is there something I am missing that makes it critical that we describe the assumption as 'normality of residuals'?

Possible duplicate: http://stats.stackexchange.com/questions/60410/normality-of-dependent-variable-normality-of-residuals — HStamper, Jan 04 '17 at 16:17
The assumption is that $y \sim \mathcal{N} \left( X \beta, \sigma^2 I \right) = X \beta + \mathcal{N} \left( 0, \sigma^2 I \right)$, which is on the error distribution. Under this assumption, it follows that the regression's residuals $(I - H) y \sim \mathcal{N} \left( 0, \sigma^2 (I-H) \right)$, where $H = X (X^T X)^{-1} X^T$. They're both normal. However, note that the second distribution does not imply the first. — user795305, Jan 04 '17 at 16:27
@Benjamin Clearly. :) That's the part where 'there is an equivalence here that may lead to this being a bit of semantics' comes in. Thanks for defining that equivalence so concisely. However, I don't think I've found a source that states that the assumption is on the error distribution rather than on the regression residuals. Have you? If you can answer with a source, then the check mark is yours. — russellpierce, Jan 04 '17 at 16:33
That's what my last sentence comments on. The assumption has to be on the distribution of $y$. If we only assumed the distribution of $(I-H) y$, which is strictly weaker than assuming the distribution of $y$, then, for instance, ANOVA tables wouldn't exist: what would the distribution of $Hy$ be? — user795305, Jan 04 '17 at 16:38
Residuals are observed, so there is no need to make assumptions about them. Meanwhile, errors are unobserved, and assumptions such as normality are common on them. — Richard Hardy, Jan 05 '17 at 13:26
@RichardHardy That was my knee-jerk reaction too, but just because the residuals are observed doesn't mean there isn't some underlying distribution. (See, for instance, my first comment above. $y - \hat{y}$ does have a distribution under an assumed distribution on $y$.) — user795305, Jan 05 '17 at 17:10
@Benjamin, you are right. But the assumption we are making in OLS estimation is still for the true errors rather than residuals, isn't it? — Richard Hardy, Jan 06 '17 at 03:04
@RichardHardy Yes, definitely the assumption is that the true error $y - X \beta \sim \mathcal{N} \left( 0, \sigma^2 I \right)$. — user795305, Jan 06 '17 at 20:38
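
The step from the error assumption to the residual distribution quoted in the comments can be written out as follows. This is a sketch that assumes $X$ has full column rank with $p$ columns; the symbols $\varepsilon$ (true error) and $e$ (residual vector) are notation introduced here, not taken from the thread.

$$
\begin{aligned}
y &= X\beta + \varepsilon, \qquad \varepsilon \sim \mathcal{N}\left(0, \sigma^2 I\right), \qquad H = X (X^T X)^{-1} X^T, \\
(I - H) X &= X - X (X^T X)^{-1} X^T X = 0, \\
e &= (I - H) y = (I - H)(X\beta + \varepsilon) = (I - H)\varepsilon, \\
\operatorname{Cov}(e) &= (I - H)\, \sigma^2 I \,(I - H)^T = \sigma^2 (I - H), \\
\hat{y} &= H y \sim \mathcal{N}\left(X\beta, \sigma^2 H\right), \qquad \operatorname{Cov}(\hat{y}, e) = \sigma^2 H (I - H) = 0 .
\end{aligned}
$$

The fourth line uses the fact that $I - H$ is symmetric and idempotent. Because $I - H$ has rank $n - p$ and is therefore singular, the distribution of $e$ alone does not determine the distribution of $y$, which is one way to read the remark that the second distribution does not imply the first.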

user795305 · Accepted Answer · 2017-01-04T19:37:59.547

3

The assumption is on the error distribution, so that $y \sim \mathcal{N} \left( X \beta, \sigma^2 I \right)$. See, for instance, page 9 of https://www.stat.osu.edu/~pfc/teaching/7410/notes/A1_linear_models.pdf. For a textbook that mentions this, see the first page of chapter 3 of Agresti's Foundations of Linear and Generalized Linear Models; I think the textbook is amazingly clear.

The simulation you linked to comments on the robustness of the $F$ test to departures from normality. If, however, we generated data with mean $X\beta$ and some highly asymmetric error, there'd be issues. You mention perhaps invoking the CLT to straighten everything out, but there are no sums here to invoke the CLT on. We're just accumulating more and more samples from whatever the residual distribution is.
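
One way to probe the claim about asymmetric errors is a small Monte Carlo check. The sketch below is not the simulation referred to in the question. It assumes the simplest possible linear model (intercept only, so the test reduces to a one-sample $t$ test of a true null), uses a shifted exponential as a stand-in for "highly asymmetric" errors, and hardcodes the critical value 2.262 from a standard $t$ table for 9 degrees of freedom. All function and variable names are hypothetical.

```python
import math
import random
import statistics


def type1_rate(draw_error, n, t_crit, reps=5000, seed=0):
    """Fraction of simulated samples in which a two-sided one-sample
    t test rejects the (true) null hypothesis that the mean is zero."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        sample = [draw_error(rng) for _ in range(n)]
        mean = statistics.fmean(sample)
        se = statistics.stdev(sample) / math.sqrt(n)
        if abs(mean / se) > t_crit:
            rejections += 1
    return rejections / reps


def normal_error(rng):
    # Errors that satisfy the model assumption: N(0, 1).
    return rng.gauss(0.0, 1.0)


def skewed_error(rng):
    # Assumed asymmetric alternative: exponential with mean 1, shifted to mean 0.
    return rng.expovariate(1.0) - 1.0


T_CRIT_DF9 = 2.262  # 97.5th percentile of the t distribution with 9 df

normal_rate = type1_rate(normal_error, n=10, t_crit=T_CRIT_DF9)
skewed_rate = type1_rate(skewed_error, n=10, t_crit=T_CRIT_DF9)
print(f"normal errors: rejection rate {normal_rate:.3f} (nominal 0.05)")
print(f"skewed errors: rejection rate {skewed_rate:.3f} (nominal 0.05)")
```

Rerunning with a larger `n` (and the matching critical value, for example 1.984 for 99 degrees of freedom) shows how the rejection rate under skewed errors behaves as the sample grows.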

edited Jan 04 '17 at 19:37

answered Jan 04 '17 at 16:43

user795305


Would you please clarify in lay-terms, when you say 'on the error distribution' do you mean the hypothetical distribution that we quantify with terms like the Standard Error of the Mean, the specific residuals observed, or the hypothetical distribution of residuals one might observe? – russellpierce Jan 04 '17 at 22:35
There are really only two distributions at play here. The standard error of the mean is just a transformation of the sample standard deviation of a random variable. The "specific residuals observed" are the regression's residuals $(I-H)y = y - \hat{y}$, and the "hypothetical residuals" are the residuals we would have if we knew the true regression line, which are $y - X \beta$. Under the assumptions of a linear model (that $y \sim \mathcal{N} \left( X \beta, \sigma^2 I \right)$), the mean is $X \beta$ and the "hypothetical residuals" have distribution $\mathcal{N} \left( 0, \sigma^2 I \right)$. – user795305 Jan 05 '17 at 12:48

Ah. Thank you. Sorry it took me a while to catch on. Flu + statistics is a bad mix. :) – russellpierce Jan 05 '17 at 20:44
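
The distinction drawn in the comments above between the true errors $y - X\beta$ and the fitted residuals $y - \hat{y}$ can be made concrete with a small simulation. This is a minimal sketch assuming a simple regression with made-up true parameters. In real data the errors are never observed; they are available here only because the data are simulated. All names are hypothetical.

```python
import random

rng = random.Random(1)

# Hypothetical true parameters of y = beta0 + beta1 * x + error.
beta0, beta1, sigma = 2.0, 3.0, 1.0
x = [i / 10 for i in range(20)]

# True errors y - X beta: independent N(0, sigma^2) draws.
errors = [rng.gauss(0.0, sigma) for _ in x]
y = [beta0 + beta1 * xi + ei for xi, ei in zip(x, errors)]

# Closed-form OLS estimates for simple regression.
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Fitted residuals y - y_hat, i.e. (I - H) y.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# The residuals obey the normal equations: they sum to zero and are
# orthogonal to x (up to rounding). The true errors obey no such constraint.
print("sum of errors:     ", sum(errors))
print("sum of residuals:  ", sum(residuals))
print("x . residuals:     ", sum(xi * ri for xi, ri in zip(x, residuals)))
```

The two constraints on the residuals are the concrete face of the singular covariance $\sigma^2 (I - H)$: with two fitted coefficients, only $n - 2$ of the $n$ residuals are free to vary.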
