
So far I've found in several books only a hand-waving explanation of why, in a linear-model context, one can simply neglect the residuals' mutual correlation and plot a QQ-graph for a qualitative assessment of the residuals' distributional properties.

In a comment to another question of mine, there is a statement that the average mutual correlation of the residuals is $-1/(n-1)$, and I am not sure why.
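As a quick numerical check of that comment (a NumPy sketch with a made-up design matrix, not anything taken from the linked question): for an intercept-only model the off-diagonal residual correlations are exactly $-1/(n-1)$, and with a predictor added they still average out to roughly that value.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50

# Intercept-only model: H = (1/n) * ones, so cov(e) ∝ I - H has
# diagonal 1 - 1/n and off-diagonal -1/n, giving correlations of
# (-1/n) / (1 - 1/n) = -1/(n-1) exactly.
H0 = np.full((n, n), 1.0 / n)
C0 = np.eye(n) - H0
R0 = C0 / np.sqrt(np.outer(np.diag(C0), np.diag(C0)))
print(R0[0, 1], -1.0 / (n - 1))        # identical

# With a slope term the result holds only on average.
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # made-up design
H = X @ np.linalg.solve(X.T @ X, X.T)
C = np.eye(n) - H                      # residual covariance is sigma^2 * (I - H)
R = C / np.sqrt(np.outer(np.diag(C), np.diag(C)))
off_diag_mean = R[~np.eye(n, dtype=bool)].mean()
print(off_diag_mean)                   # close to -1/(n-1)
```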

In this other question, StasK's answer gives the following statement: "The reason I am saying that the off-diagonal values are small is because $\sum_{j\neq i}h^2_{ij}+h^2_{ii}=h_{ii}$, and in fact either the diagonal or off-diagonal entries are roughly of order $O(1/n)$, although this is not a very strict statement that is easily thrown off by the high leverage points."
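StasK's identity follows from symmetry and idempotence of the hat matrix $H = X(X^\top X)^{-1}X^\top$: since $H^2 = H$ and $H^\top = H$, row $i$ of $H^2$ gives $\sum_j h_{ij}h_{ji} = \sum_j h^2_{ij} = h_{ii}$. A NumPy sketch (the design matrix is made up) to verify both the identity and the $O(1/n)$ size of the diagonal:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # made-up design

# Hat matrix H = X (X'X)^{-1} X'
H = X @ np.linalg.solve(X.T @ X, X.T)

# Symmetry + idempotence: sum_j h_ij^2 = h_ii for every row i
assert np.allclose((H ** 2).sum(axis=1), np.diag(H))

# trace(H) = p, so the diagonal entries average p/n = O(1/n)
print(np.diag(H).mean())               # p/n = 0.02 here
```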

I'm looking for well-laid-out explanations of the two statements above, in simple terms, for a plain, simple man. And please do include equations.

kjetil b halvorsen
An old man in the sea.
    A similar question with no useful answer (at least to me) https://stats.stackexchange.com/questions/96144/why-doesnt-correlation-of-residuals-matter-when-testing-for-normality – An old man in the sea. Apr 17 '21 at 15:26
    Another similar question with no answer https://stats.stackexchange.com/questions/279105/why-check-normality-of-raw-residuals-if-raw-residuals-do-not-have-the-same-norma?rq=1 – An old man in the sea. Apr 17 '21 at 15:27

1 Answer


I would take a different approach from your other answers.

There are really two different sets of residuals: the "true" residuals (the $\epsilon_i$'s) from the true model, $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, and the observed residuals (the $e_i$'s) from the fitted model, $y_i = \hat\beta_0 + \hat\beta_1 x_i + e_i$.

The "true" residuals are the distance between the y-values and the "true", population, or theoretical model, what we would see if we had the entire population, an infinite sample size, or perfect knowledge of the true relationship. These are parameters and in general are not ever observed, only estimated (like the betas).

The assumptions about independence, normality, and equal variances (and any others) are about the "true" residuals, but since we do not have those values we use the observed residuals, which approximate them. Tools like standardizing and studentizing the residuals are attempts to find better approximations of the "true" residuals.
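For instance, internally studentized residuals divide each $e_i$ by its estimated standard deviation $\hat\sigma\sqrt{1-h_{ii}}$, correcting for the unequal variances of the observed residuals. A sketch with simulated data (the model and numbers are made up):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 30, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # made-up design
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)      # made-up model

H = X @ np.linalg.solve(X.T @ X, X.T)
e = y - H @ y                          # observed residuals e = (I - H) y
h = np.diag(H)                         # leverages
s2 = (e @ e) / (n - p)                 # estimate of sigma^2

# Internally studentized residuals: Var(e_i) = sigma^2 (1 - h_ii),
# so dividing by sqrt(s2 * (1 - h_ii)) puts them on a common scale
r = e / np.sqrt(s2 * (1 - h))
```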

Since least-squares regression is fairly robust to minor deviations from the assumptions, we really only care about major deviations (in the "true" residuals), and those will also show up in the observed residuals, so the observed residuals work well for diagnostic purposes. We do not care about minor deviations because the assumptions are about the "true" residuals, not the observed ones. Knowledge of the science behind the data is also needed to suggest the ways in which the assumptions may or may not hold.
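A small simulation can illustrate the point (a NumPy sketch; the model and seed are arbitrary): since $e = (I-H)\epsilon$, the observed residuals track the unobservable true errors closely, so any gross violation in the $\epsilon_i$'s will be visible in the $e_i$'s.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x = rng.normal(size=n)
eps = rng.normal(size=n)               # "true" residuals, iid N(0, 1)
y = 1.0 + 2.0 * x + eps                # made-up true model

X = np.column_stack([np.ones(n), x])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ beta_hat                   # observed residuals

# The observed residuals closely track the unobservable true ones
print(np.corrcoef(eps, e)[0, 1])       # near 1
```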

Greg Snow
  • This seems to just kick the answer down the line: *how* does one tell when the deviations will be "minor"? – whuber Apr 17 '21 at 18:16