I often hear that when the residuals depart from normality, the central limit theorem can be used to fix things. I do not quite understand how this works, since the central limit theorem is a statement about scaled sums of random variables. How exactly is the CLT used to make the data normal?
You appear to have a common misconception about the central limit theorem: https://stats.stackexchange.com/questions/473455/debunking-wrong-clt-statement. – Dave Jun 01 '21 at 15:31
1 Answer
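The CLT does not make the data normal. For OLS, the CLT is a result about the estimated regression coefficients, not about the residuals. The estimator $\hat{\beta} = (X^\top X)^{-1} X^\top Y$ is a weighted sum of the random responses, so under mild conditions its sampling distribution is approximately normal in large samples, even when the errors are not normal.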
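A small simulation can make the distinction concrete. The sketch below is only an illustration under assumptions of my own choosing (a single uniform regressor, true intercept 1 and slope 2, centered exponential errors, and the helper names `simulate` and `skewness`); none of these come from the question. It refits OLS on many simulated data sets and compares the skewness of the pooled residuals with the skewness of the slope estimates.

```python
import numpy as np


def skewness(a):
    """Sample skewness: third central moment over variance**1.5."""
    a = np.asarray(a, dtype=float).ravel()
    c = a - a.mean()
    return (c ** 3).mean() / (c ** 2).mean() ** 1.5


def simulate(n=200, reps=2000, seed=0):
    """Fit OLS on `reps` simulated data sets of size `n`.

    Assumed setup (hypothetical, for illustration only):
    y = 1 + 2*x + eps, with eps ~ Exponential(1) - 1 (skewed, mean zero).
    Returns the slope estimates and the residuals from every fit.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=n)          # fixed design across replications
    X = np.column_stack([np.ones(n), x])       # intercept + one regressor
    eps = rng.exponential(1.0, size=(n, reps)) - 1.0
    Y = 1.0 + 2.0 * x[:, None] + eps           # one column per replication
    beta = np.linalg.lstsq(X, Y, rcond=None)[0]  # shape (2, reps)
    resid = Y - X @ beta                       # residuals, shape (n, reps)
    return beta[1], resid


if __name__ == "__main__":
    slopes, resid = simulate()
    print("skewness of residuals:      ", skewness(resid))
    print("skewness of slope estimates:", skewness(slopes))
```

With this setup the residuals should stay about as skewed as the errors that generated them, however large `n` is, while the slope estimates should look close to symmetric. That is the sense in which the CLT helps inference about the coefficients without making the data or the residuals normal.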

AdamO
I read something about how the residuals can be thought of as a sum of independent errors. Does that make sense? – user321627 Jun 01 '21 at 07:16
-
Not really. The residuals are $Y-\hat{Y}$, so you can think of them as a *sum*, but they don't tend to a normal distribution as $n \rightarrow \infty$. – AdamO Jun 01 '21 at 12:59
-
I read somewhere that Gauss (*the* Gauss) had an interesting (but incorrect) argument as to why regression residuals should be normal. Even the giants make mistakes. – BigBendRegion Jun 01 '21 at 13:07
-
See here, bottom of p. 64. http://pzs.dstu.dp.ua/DataMining/mls/bibl/Gauss2Kalman.pdf – BigBendRegion Jun 01 '21 at 14:11
-
@BigBendRegion Thank you! Gauss wasn't wrong here. This touches on a different issue. Gauss's argument for the normality of errors was an assumption in his astronomical predictions paper, in that it was the result of possibly hundreds of *unobserved* variables contributing to a (seemingly) random error. Gauss didn't believe in intrinsic randomness (nor did Einstein, hence the schism between mechanical and quantum physics). OP's question seems to be an eggcorn of this, where non-normal (observed) residuals are somehow remedied by big data. – AdamO Jun 01 '21 at 15:05
-
Here is another excellent answer from the past: https://stats.stackexchange.com/questions/29731/regression-when-the-ols-residuals-are-not-normally-distributed – Ariel Jun 01 '21 at 15:50
-
@AdamO I agree that Gauss was not wrong as the OP suggests, in that a larger sample size of the existing data set somehow justifies normality. On the other hand, and maybe this is too narrow a point, but if Gauss really argued that the density was precisely, mathematically normal, then he indeed was wrong, even given the astronomical context. Yes, there were caveats about extremes, but I did not see any such caveats about the rest of the distribution. Maybe it is in the original paper. All that I would need would be the word "approximation" as regards the central portion. – BigBendRegion Jun 01 '21 at 17:22