4

As the title explains, I was wondering whether the additional OLS assumption of a normally distributed error term isn't redundant if the sample is large enough. I understand that we want conditional normality of the error term so that our estimator is normally distributed and, further, so that one is able to conduct standard inference. However, as we replace the expectations by sample averages (in the estimation process), shouldn't a central limit theorem ensure the normality of the estimator regardless of how the error terms are distributed, as long as the sample is large enough?
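A minimal simulation sketch of what I mean (added for illustration; the exponential errors, the sample size, and the coefficients are arbitrary assumptions, not part of the question): even with clearly skewed, non-normal errors, the sampling distribution of the OLS slope looks normal for large $n$.

```python
# Simulation sketch: OLS slope with skewed (exponential) errors.
# n, reps, and the coefficients are arbitrary illustration choices.
import numpy as np

rng = np.random.default_rng(0)
n, reps, beta1 = 500, 2000, 2.0
slopes = np.empty(reps)

for r in range(reps):
    x = rng.uniform(0, 1, n)
    eps = rng.exponential(1.0, n) - 1.0      # skewed errors, mean zero
    y = 1.0 + beta1 * x + eps
    X = np.column_stack([np.ones(n), x])
    slopes[r] = np.linalg.lstsq(X, y, rcond=None)[0][1]

# If the CLT argument holds, standardized slopes behave like N(0, 1):
z = (slopes - slopes.mean()) / slopes.std()
print(np.mean(np.abs(z) < 1.96))             # should be close to 0.95
```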

Thanks in advance

J3lackkyy

2 Answers

1

Normality is actually quite important. Not in the sense that it must be true, because it never is true, but in the sense that with gross non-normality you should not use OLS, despite asymptotically correct inferences. For example, with grossly outlier-prone processes (substitute "rare, extreme value" for "outlier" to disentangle it from "incorrectly entered data value"), OLS is grossly inefficient compared to likelihood-based methods that model the conditional distributions more accurately. For another example, with highly discrete distributions, models such as Poisson regression, multinomial logit, ordinal regression, and logistic regression are more appropriate.
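To make the efficiency point concrete, here is a hedged sketch (my illustration, not the answerer's code; Laplace errors and least absolute deviations stand in for "outlier-prone process" and "likelihood-based method"): under heavy-tailed Laplace errors the MLE of the coefficients is the least-absolute-deviations (LAD) fit, and its sampling variance is visibly smaller than that of OLS, even though both are consistent.

```python
# Efficiency sketch: OLS vs. the Laplace MLE (least absolute deviations)
# under heavy-tailed Laplace errors. All numbers are illustration choices.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, reps = 200, 500
ols_est = np.empty(reps)
lad_est = np.empty(reps)

for r in range(reps):
    x = rng.normal(0, 1, n)
    y = 1.0 + 2.0 * x + rng.laplace(0, 1, n)   # heavy-tailed errors
    X = np.column_stack([np.ones(n), x])
    b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    ols_est[r] = b_ols[1]
    # LAD minimizes sum |y - Xb|; it is the MLE under Laplace errors and
    # is NOT a linear function of y (cf. the comment thread below).
    res = minimize(lambda b: np.abs(y - X @ b).sum(), x0=b_ols,
                   method="Nelder-Mead")
    lad_est[r] = res.x[1]

# For Laplace(0, 1) errors the asymptotic variance ratio is about 2:
print(ols_est.var() / lad_est.var())           # roughly 2 in large samples
```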

When viewing regression as a model for the conditional distribution of $Y$ given $X=x$, which is essential for predicting individual $Y$ values, for scientific integrity of the model, as well as for efficiency of estimates, it is clearly important to try to model that distribution reasonably well. Normality provides a reasonable approximation in many cases, but one should always consider alternatives.

BigBendRegion
  • I understand your concern with respect to efficiency; however, this is not the point of my question. My point is that even though the true distribution might be fat-tailed (or anything else), OLS still provides consistent, unbiased estimates, and standard inference can be used if the sample is large enough. Further, with respect to the efficiency concern, isn't the Gauss-Markov theorem stating that OLS is BLUE (and therefore efficient) if the error term is serially uncorrelated with expected value 0 and homoscedastic? It is not stated there that the error term has to be normally distributed. – J3lackkyy Feb 19 '21 at 12:19
  • @J3lackkyy Probably, in the case of a generic distribution of the residuals, the maximum likelihood estimators will be non-linear. – Thomas Feb 19 '21 at 14:51
  • How do you figure? Of course, if you still use maximum likelihood assuming that the errors are normal when they aren't, it's a problem. But if you specify them correctly? Why shouldn't the estimators be linear if it is still a linear context? – J3lackkyy Feb 19 '21 at 14:57
  • If I have some time I can try to make the calculation explicitly. One can fix a distribution of the residuals, e.g. lognormal (in this case the residuals would always be positive, but OK), and perform the calculation. The non-linearity, I think, would come from the fact that the log of the distribution is no longer quadratic, so the set of equations for the maximum likelihood is no longer a linear system. – Thomas Feb 19 '21 at 15:00
  • Could you specify what exactly you mean by "linear" estimator? – J3lackkyy Feb 19 '21 at 15:06
  • Gauss-Markov is a complete swindle. It is nearly useless except as a technical comment. Who cares whether estimates are linear functions of the data? Back when they had to do hand-calculations, maybe, but nowadays? – BigBendRegion Feb 19 '21 at 15:07
  • "Linear" means the estimator has the form $a_1Y_1 + \dots + a_n Y_n$ where the $a$'s are constants. – BigBendRegion Feb 19 '21 at 15:08
  • @J3lackkyy What BigBendRegion wrote is also what I intended as a linear estimator; that is also the meaning in the Gauss-Markov theorem, if I am not mistaken. – Thomas Feb 19 '21 at 15:09
  • @BigBendRegion Where do you see the problem with the efficiency then? – J3lackkyy Feb 19 '21 at 15:14
  • Simply that OLS is inefficient when the conditional distributions are grossly non-normal. This does not contradict Gauss-Markov, but Gauss-Markov applies only to an extremely limited pool of estimators, namely those that are unbiased and linear functions of the data. This pool effectively rules out all the interesting and better estimators, such as maximum likelihood. One way to characterize the Gauss-Markov theorem is that OLS is a big fish in a tiny puddle. – BigBendRegion Feb 20 '21 at 12:19
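To make the comments' "linear estimator" definition concrete, here is a small sketch (my addition, not from the thread): the OLS coefficient vector is a fixed weight matrix applied to $Y$, with weights depending on $X$ only, which is exactly what "linear" means in the Gauss-Markov theorem.

```python
# Sketch of the "linear estimator" definition from the comments: the OLS
# coefficient vector is b = (X'X)^{-1} X' y, i.e. each coefficient is a
# fixed weighted sum a_1*Y_1 + ... + a_n*Y_n whose weights depend on X
# only. The data below are arbitrary illustration choices.
import numpy as np

rng = np.random.default_rng(2)
n = 50
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(size=n)

A = np.linalg.inv(X.T @ X) @ X.T    # weight matrix; row 1 holds the a_i's
b_via_weights = A @ y               # linear in y by construction
b_via_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose(b_via_weights, b_via_lstsq))   # True
```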
0

You speak about linear regression, I suppose. Linear regression can be justified under different sets of assumptions, more or less general.

... shouldn't a central limit theorem ensure the normality regardless of how the error terms are distributed as long as the sample is large enough?

Asymptotically, yes, although some moment conditions are needed, and independence of the observations is needed as well.

Sometimes people talk about Normal linear regression, underscoring the fact that the data are assumed normally distributed. Consider that small-sample properties can be useful sometimes; in those cases the normality assumption is needed. Moreover, sometimes the framework used, like maximum likelihood (ML), demands precise distributional assumptions.

However, the idea of not imposing normality and relying on the CLT is quite common (asymptotic theory).
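As a concrete sketch of this asymptotic route (my illustration; the design, sample size, and skewed errors are arbitrary assumptions): the usual normal-approximation confidence interval for a slope attains roughly nominal coverage without any normality assumption.

```python
# Coverage sketch: normal-approximation 95% interval for the slope with
# skewed (exponential) errors; sizes and coefficients are arbitrary.
import numpy as np

rng = np.random.default_rng(3)
n, reps, beta1 = 1000, 2000, 2.0
covered = 0

for r in range(reps):
    x = rng.uniform(0, 1, n)
    y = 1.0 + beta1 * x + (rng.exponential(1.0, n) - 1.0)
    X = np.column_stack([np.ones(n), x])
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    s2 = resid @ resid / (n - 2)                # error-variance estimate
    se = np.sqrt(s2 * XtX_inv[1, 1])            # slope standard error
    covered += abs(b[1] - beta1) <= 1.96 * se   # CLT-based interval
print(covered / reps)                           # close to 0.95
```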

markowitz