
I frequently encounter people saying that normality is assumed for the "errors" in linear regression. It seems what they mean is "residuals" rather than errors. For example, with MLE we arrive at the same result as OLS if we assume the residuals are normally distributed with zero mean.
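For concreteness, here is a sketch of the equivalence I have in mind, assuming i.i.d. $N(0, \sigma^2)$ errors (the notation is mine): the log-likelihood is

$$\ell(\beta, \sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^n \left(y_i - x_i^\top \beta\right)^2,$$

so for any fixed $\sigma^2$, maximizing over $\beta$ amounts to minimizing $\sum_{i=1}^n (y_i - x_i^\top \beta)^2$, which is exactly the OLS criterion.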

I understand that to arrive at the OLS closed-form solution, no assumptions about the distribution of the residuals are needed. However, we sometimes make assumptions about the probability distribution of the residuals for statistical inference, e.g., when building confidence intervals for our estimators.

Now my question is, when do we typically assume normality in the unobserved errors, and what insight can we gain from this assumption?

user5965026
  • errors are always unobserved, and for inference they don't need to be normal in OLS. normality is sometimes assumed on errors, not residuals. inference can be done without distributional assumptions on errors because it just so happens that the CLT can be applied – Aksakal Jul 07 '20 at 02:52
  • @Aksakal Hmm, looks like there's something wrong with my understanding. To arrive at the same result as OLS using MLE, you do assume normality of residuals in that case right? – user5965026 Jul 07 '20 at 02:54
  • yes, in MLE you have the distribution assumption, and it is on errors, not residuals. only when you assume normal errors do you get the same result as OLS. however, for inference within OLS itself you don't need normal errors – Aksakal Jul 07 '20 at 02:56

1 Answer


Inference about a coefficient value in simple linear regression is based on the ratio of the coefficient estimate $\hat\beta$ to the standard error of that estimate, $s_{\hat\beta}$. See for example this Wikipedia page on simple linear regression with one predictor variable. That ratio is the test statistic that is compared against a reference distribution.
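For concreteness, here is a minimal sketch (my own, with simulated data and hypothetical names) of computing that ratio by hand for a simple regression:

```python
# Compute the slope estimate, its standard error, and their ratio
# for a simple linear regression on simulated data.
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.uniform(0, 10, n)
y = 2.0 + 0.5 * x + rng.normal(0, 1, n)      # true slope 0.5, normal errors

x_bar, y_bar = x.mean(), y.mean()
Sxx = np.sum((x - x_bar) ** 2)
beta_hat = np.sum((x - x_bar) * (y - y_bar)) / Sxx
alpha_hat = y_bar - beta_hat * x_bar

residuals = y - (alpha_hat + beta_hat * x)
s2 = np.sum(residuals ** 2) / (n - 2)        # error-variance estimate
s_beta = np.sqrt(s2 / Sxx)                   # standard error of the slope

t_stat = beta_hat / s_beta                   # the ratio described above
print(beta_hat, s_beta, t_stat)
```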

The classic normality assumption for inference in linear regression is about the errors, not the residuals. As the Wikipedia page explains, that assumption about the errors* leads to well-defined distributions for both the numerator and the denominator of that ratio.

Under the assumption of a normal error distribution, $\hat\beta$ has a normal distribution with variance related to the underlying error variance $\sigma^2$. Because the residuals inherit normality from the errors, the sum of squared residuals used to determine $s_{\hat\beta}$ is "distributed proportionally to $\chi^2$ with $n - 2$ degrees of freedom" for a simple linear regression that estimates an intercept and one slope.
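A quick simulation sketch of that claim (again my own, with made-up parameter values): under normal errors, the scaled residual sum of squares should behave like $\chi^2$ with $n - 2$ degrees of freedom.

```python
# Simulate repeated regressions with normal errors and check that
# RSS / sigma^2 behaves like chi-squared with n - 2 degrees of freedom.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, sigma, reps = 20, 1.5, 20000
x = rng.uniform(0, 10, n)                 # fixed design across replications
Sxx = np.sum((x - x.mean()) ** 2)

scaled_rss = np.empty(reps)
for i in range(reps):
    y = 1.0 + 0.5 * x + rng.normal(0, sigma, n)
    beta = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
    alpha = y.mean() - beta * x.mean()
    scaled_rss[i] = np.sum((y - alpha - beta * x) ** 2) / sigma ** 2

print(scaled_rss.mean(), n - 2)           # chi2(n-2) has mean n-2
print(np.quantile(scaled_rss, 0.95),
      stats.chi2.ppf(0.95, n - 2))        # upper tails should match too
```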

The numerator and the denominator are then independently distributed, and their ratio follows a $t$ distribution with $n - 2$ degrees of freedom. So the classic normality assumption is used not directly in the test procedure for linear regression but in the derivation of the $t$-test.
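The same kind of simulation (a sketch under the same assumed setup) shows the pivot $(\hat\beta - \beta)/s_{\hat\beta}$ tracking a $t$ distribution with $n - 2$ degrees of freedom when the errors are normal:

```python
# Check that (beta_hat - beta_true) / s_beta follows t(n - 2) under
# normal errors, by comparing empirical and theoretical quantiles.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, beta_true, reps = 15, 0.5, 20000
x = rng.uniform(0, 10, n)
Sxx = np.sum((x - x.mean()) ** 2)

pivots = np.empty(reps)
for i in range(reps):
    y = 1.0 + beta_true * x + rng.normal(0, 2.0, n)
    beta = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
    alpha = y.mean() - beta * x.mean()
    s2 = np.sum((y - alpha - beta * x) ** 2) / (n - 2)
    pivots[i] = (beta - beta_true) / np.sqrt(s2 / Sxx)

# Empirical 97.5% quantile vs. the t(n-2) quantile used for confidence intervals
print(np.quantile(pivots, 0.975), stats.t.ppf(0.975, n - 2))
```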

If the number of observations is large, the law of large numbers and the central limit theorem apply. Then the ratio $\hat\beta/s_{\hat\beta}$ is, to a good approximation, normally distributed. But this is a statement about the distribution of the test statistic, not about the underlying errors or residuals (beyond requiring that the errors have finite mean and variance).
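To illustrate that asymptotic point (again my own sketch, with deliberately skewed errors), at large $n$ the same pivot is close to standard normal even though the errors are far from normal:

```python
# With skewed (centered exponential) errors and a large sample, the pivot
# (beta_hat - beta_true) / s_beta is still approximately standard normal.
import numpy as np

rng = np.random.default_rng(3)
n, beta_true, reps = 500, 0.5, 10000
x = rng.uniform(0, 10, n)
Sxx = np.sum((x - x.mean()) ** 2)

pivots = np.empty(reps)
for i in range(reps):
    errors = rng.exponential(1.0, n) - 1.0   # mean zero, finite variance, skewed
    y = 1.0 + beta_true * x + errors
    beta = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
    alpha = y.mean() - beta * x.mean()
    s2 = np.sum((y - alpha - beta * x) ** 2) / (n - 2)
    pivots[i] = (beta - beta_true) / np.sqrt(s2 / Sxx)

# Empirical 2.5% / 97.5% quantiles should be near the normal values -1.96 / 1.96
print(np.quantile(pivots, [0.025, 0.975]))
```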

This latter case is one example of an asymptotic test, one that holds in the limit of a large number of observations. I don't know if much can be said about what "insight can we gain from this [normality] assumption" about "unobserved errors," but very often when you see a statistic tested against a normal distribution you are seeing the application of an asymptotic test.


*People do check the distribution of the residuals to see whether the assumption of normally distributed errors is reasonable, but as this discussion points out such checking can be of limited usefulness.
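A minimal sketch (simulated data; the check shown is just one common choice) of what such residual checking might look like:

```python
# A common (if limited) check: test the fitted residuals for normality.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n = 100
x = rng.uniform(0, 10, n)
y = 2.0 + 0.5 * x + rng.normal(0, 1, n)

beta = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha = y.mean() - beta * x.mean()
residuals = y - (alpha + beta * x)

print(stats.shapiro(residuals))   # Shapiro-Wilk test on the residuals
```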

EdM
  • It looks like I've been thinking about this the wrong way. I had thought the assumption was on the residuals, but when I think about it, that actually makes no sense. The residuals are KNOWN, so no need for assumptions. – user5965026 Jul 07 '20 at 03:50
  • I don't quite understand the claim that "residuals are drawn from the normal distribution of errors." Doesn't the residual equal the sum of (1) unobserved error / random noise, (2) bias error, and (3) variance error? If so, don't the bias and variance change the distribution of the residuals? – user5965026 Jul 07 '20 at 03:52
  • @user5965026 My initial terminology "drawn from" was perhaps imprecise. The residuals represent the result of error around the true regression line (often assumed to be normally distributed errors) plus the uncertainty in the estimated regression line itself (with normal distributions of coefficient values around their true values). I've edited with that in mind. – EdM Jul 07 '20 at 12:25
  • Under classic assumptions (less stringent than the assumption about normality of errors), ordinary least squares regression provides the [best linear unbiased estimate](https://en.wikipedia.org/wiki/Gauss–Markov_theorem) (BLUE) of the model. Without bias in the expectation over repeated sampling and modeling, you are left with variance. Residuals from any one model, as errors around the estimated regression line, sum to 0 so they are linearly dependent, unlike the errors, which are assumed uncorrelated in the proof of the BLUE characterization; see the sketch after this comment thread. – EdM Jul 07 '20 at 12:36
  • @user5965026: Think of the residuals as the estimates of the errors (I called the error the noise term because "error" confuses me), except, as EdM said, they are linearly dependent. Note though that, without the assumption of normality of the noise term, you can't do the usual inference, and you also can't just do MLE: maximizing the likelihood in the non-normal case won't necessarily minimize the two-norm distance between $Y$ and $X \beta$. The latter is why you have all the GLM machinery. – mlofton Jul 07 '20 at 12:48
  • @EdM Ah yes, I never thought about residuals as being linearly dependent because they sum to zero and contrasting that with uncorrelatedness between errors. Thanks for noting that. – user5965026 Jul 07 '20 at 17:20
  • @mlofton GLM is generalized linear model and not gaussian linear model, right? – user5965026 Jul 07 '20 at 17:20
  • @user5965026 yes, the intended use of "GLM" in that comment was almost certainly "generalized linear model." Different types of GLM make different assumptions about how the linear predictor is mapped to the outcome variable and about the associated error distributions. – EdM Jul 07 '20 at 20:35
  • @EdM In https://stats.stackexchange.com/questions/280189/linear-regression-and-assumptions-about-response-variable Haitou Du claims OLS makes assumptions on the residuals. Based on our discussion and your answer here, isn't his answer wrong, since OLS makes assumptions on the errors and not the residuals? – user5965026 Jul 08 '20 at 03:42
  • @user5965026 people often use the word "residuals" when they technically mean "errors." That's particularly the case when a volunteer is trying to write a helpful answer in a limited amount of time. After all, normality tests are performed on _residuals_ to gauge whether the assumption of normally distributed _errors_ is reasonable; normality of errors will lead to normality of residuals. I wrote this answer in part to make sure that I got the technical distinction clear for myself; trying to write answers is a great way to learn. To me, linear dependence among residuals is a key distinction. – EdM Jul 08 '20 at 12:24
  • @user5965026 EdM answered the question that you asked me (which is fine) regarding what I meant by GLM, and he was correct. I think if you can understand the concept behind GLMs, then you will also understand the standard OLS model, because it too belongs to the class of GLMs, with special values for the family and the link. I don't know what the "bible" is for GLMs these days, but John Fox's text and his companion to the text (CAR) do a pretty good job with GLMs. – mlofton Jul 09 '20 at 01:39
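As a postscript to the comment thread, here is a minimal numerical sketch (simulated data, hypothetical names) of the linear-dependence point: OLS residuals satisfy exact linear constraints that the underlying errors do not.

```python
# OLS residuals sum to zero and are orthogonal to x by construction,
# so they are linearly dependent; the underlying errors are not.
import numpy as np

rng = np.random.default_rng(4)
n = 30
x = rng.uniform(0, 10, n)
errors = rng.normal(0, 1, n)
y = 2.0 + 0.5 * x + errors

beta = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha = y.mean() - beta * x.mean()
residuals = y - (alpha + beta * x)

print(errors.sum())           # generally nonzero
print(residuals.sum())        # ~0: first linear constraint on the residuals
print(np.sum(residuals * x))  # ~0: second linear constraint
```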