I often hear that when the residuals depart from normality, the central limit theorem can be used to fix things. I do not quite understand how this works, since the central limit theorem is a statement about scaled sums of random variables. How exactly is the CLT used to make the data normal?

Richard Hardy
user321627
    You appear to have a common misconception about the central limit theorem: https://stats.stackexchange.com/questions/473455/debunking-wrong-clt-statement. – Dave Jun 01 '21 at 15:31

1 Answer

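The CLT does not make the data normal. For OLS, the CLT is a result about the estimated regression coefficients: each estimate can be written as a weighted sum of random variables, so its sampling distribution is approximately normal in large samples even when the errors are not.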
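A small simulation can make this distinction concrete. The sketch below is illustrative only and is not part of the original answer: the model $Y = 1 + 2x + \varepsilon$, the centered exponential errors, the sample sizes, and the helper names `skewness` and `simulate` are all assumptions chosen for the demonstration. It refits the same OLS model on many simulated data sets and compares the skewness of the slope estimates with the skewness of the residuals from a single fit.

```python
import numpy as np


def skewness(x):
    """Sample skewness (third standardized moment); 0 for a symmetric distribution."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 3))


def simulate(n=1000, reps=2000, seed=0):
    """Fit OLS on `reps` simulated data sets of size `n` with skewed errors.

    Returns the slope estimate from every replication and the residuals
    from the first replication.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=n)            # design held fixed across replications
    X = np.column_stack([np.ones(n), x])         # intercept and slope columns
    # Assumed error distribution: exponential shifted to mean 0 (skewness 2).
    eps = rng.exponential(1.0, size=(reps, n)) - 1.0
    Y = 1.0 + 2.0 * x + eps                      # shape (reps, n)
    # Solve all least-squares problems at once; beta has shape (2, reps).
    beta, *_ = np.linalg.lstsq(X, Y.T, rcond=None)
    slopes = beta[1]
    residuals = Y[0] - X @ beta[:, 0]            # residuals of one fitted model
    return slopes, residuals


if __name__ == "__main__":
    slopes, residuals = simulate()
    print(f"skewness of slope estimates: {skewness(slopes):.3f}")
    print(f"skewness of residuals:       {skewness(residuals):.3f}")
```

Under these assumptions, the slope estimates should look roughly symmetric and bell-shaped, while the residuals should stay about as skewed as the errors that generated them, no matter how large `n` is.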

AdamO
  • I read something about the residuals being able to be thought of as the sum of independent errors, does that make sense? – user321627 Jun 01 '21 at 07:16
  • Not really, the residuals are $Y-\hat{Y}$, so you can think of them as a *sum*, but they don't tend to normal as $n \rightarrow \infty$. – AdamO Jun 01 '21 at 12:59
  • I read somewhere that Gauss (*the* Gauss) had an interesting (but incorrect) argument as to why regression residuals should be normal. Even the giants make mistakes. – BigBendRegion Jun 01 '21 at 13:07
  • @BigBendRegion better provide a source! – AdamO Jun 01 '21 at 13:33
  • See here, bottom of p. 64. http://pzs.dstu.dp.ua/DataMining/mls/bibl/Gauss2Kalman.pdf – BigBendRegion Jun 01 '21 at 14:11
  • @BigBendRegion Thank you! Gauss wasn't wrong here. This touches on a different issue. Gauss's argument for the normality of errors was an assumption in his astronomical predictions paper, in that it was the result of possibly hundreds of *unobserved* variables contributing to a (seemingly) random error. Gauss didn't believe in intrinsic randomness (nor did Einstein, hence the schism between mechanical and quantum physics). OP's question seems to be an eggcorn of this; where non-normal (observed) residuals are somehow remedied by big data. – AdamO Jun 01 '21 at 15:05
  • Here is another excellent answer from the past: https://stats.stackexchange.com/questions/29731/regression-when-the-ols-residuals-are-not-normally-distributed – Ariel Jun 01 '21 at 15:50
  • @AdamO I agree that Gauss was not wrong as the OP suggests, in that a larger sample size of the existing data set somehow justifies normality. On the other hand, and maybe this is too narrow a point, but if Gauss really argued that the density was precisely, mathematically normal, then he indeed was wrong, even given the astronomical context. Yes, there were caveats about extremes, but I did not see any such caveats about the rest of the distribution. Maybe it is in the original paper. All that I would need would be the word "approximation" as regards the central portion. – BigBendRegion Jun 01 '21 at 17:22
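
The contrast drawn in the answer and comments between the coefficient estimates and the residuals can be written out as a short derivation. This is a sketch under assumptions not stated in the thread: the linear model $Y = X\beta + \varepsilon$ with independent errors of mean zero and variance $\sigma^2$, a fixed number of regressors, and the hat matrix $H = X(X^\top X)^{-1}X^\top$ with entries $h_{ij}$.

$$
\hat{\beta} - \beta = (X^\top X)^{-1} \sum_{i=1}^{n} x_i \varepsilon_i ,
\qquad
e = Y - \hat{Y} = (I - H)\,\varepsilon ,
\qquad
e_i = \varepsilon_i - \sum_{j=1}^{n} h_{ij}\,\varepsilon_j .
$$

The estimation error $\hat{\beta} - \beta$ is a weighted sum of all $n$ errors, which is the kind of quantity the CLT addresses. Each residual $e_i$ is also a sum, but it is dominated by the single term $\varepsilon_i$: because $H$ is symmetric and idempotent, the correction term has variance $\sigma^2 \sum_j h_{ij}^2 = \sigma^2 h_{ii}$, and when the leverages $h_{ii}$ shrink toward zero as $n \rightarrow \infty$ the residual behaves like $\varepsilon_i$ itself. Under these assumptions the residuals approach the error distribution, whatever its shape, rather than a normal distribution.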