
I stumbled on this really nice blog.

http://www.statisticssolutions.com/assumptions-of-linear-regression/

It mentions: "the linear regression analysis requires all variables to be multivariate normal".

I think the assumption of normal distribution is for the residuals. I understand that skewed data can distort significance tests and it is desirable to have normally distributed data. But can we say "normal distribution of variables" is one of the assumptions of linear regression?

I am confused by this. Could somebody please shed light on it?

gruangly
  • If ordinary multiple linear regression is being discussed, the assertion is simply not true. The IVs are treated as fixed, not random. You'd have to ask the person saying it why they'd make such a claim. One need only do the derivations (which are not difficult) to understand what's going on. The error term is only assumed normal when doing the usual normal theory hypothesis tests and intervals. – Glen_b May 01 '15 at 15:15
  • You are much more nearly right than the source you cite. At most, it's helpful if error terms are normally distributed. The rest is unsound. Here's one simple refutation: it would not be sound to use binary predictors if that were so, but every non-trivial account of regression explains how that's a useful technique. – Nick Cox May 01 '15 at 15:16
  • Your second quotation "the normal distribution or any distribution" seems mangled: please check it and edit. – Nick Cox May 01 '15 at 15:18
  • `the assumption of normal distribution is for the residuals` More strictly speaking, "is for errors" (there is a [subtle difference](http://en.wikipedia.org/wiki/Errors_and_residuals_in_statistics)) or "fully conditioned Y is normal". – ttnphns May 01 '15 at 16:05
  • @ttnphns, you don't need normality of residuals either. It's good to have it for finite samples, of course. – Aksakal May 01 '15 at 16:36
  • See also [here](http://stats.stackexchange.com/questions/94337), [here](http://stats.stackexchange.com/questions/86835), & [here](http://stats.stackexchange.com/questions/50206). – Scortchi - Reinstate Monica May 01 '15 at 17:11

1 Answer


As a general assertion this is just plain wrong; I agree completely with @Glen_b. For a review of the classical linear model assumptions, see this.
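
For concreteness, here is a minimal sketch of those assumptions as they are usually written (my paraphrase, not a quote from the linked review); note that no distributional assumption is placed on the regressors themselves:

$$
\begin{aligned}
&\text{(1) Linearity: } y = X\beta + \varepsilon \\
&\text{(2) Strict exogeneity: } E[\varepsilon \mid X] = 0 \\
&\text{(3) No perfect collinearity: } \operatorname{rank}(X) = k \\
&\text{(4) Spherical errors: } \operatorname{Var}(\varepsilon \mid X) = \sigma^2 I \\
&\text{(5) (only needed for exact finite-sample inference) } \varepsilon \mid X \sim N(0, \sigma^2 I)
\end{aligned}
$$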

But essentially, normality of the error term ensures that the sampling distribution of the $\hat{\beta}_k$ is exactly normal, rather than only approximately so in large samples. Note the comment below about the (exact $t$) distribution when the standard error is estimated from the data.
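
A short sketch of why, under assumptions (1)–(5) above: with $\hat{\beta} = (X'X)^{-1}X'y$,

$$
\hat{\beta} = \beta + (X'X)^{-1}X'\varepsilon
\quad\Longrightarrow\quad
\hat{\beta} \mid X \sim N\!\big(\beta,\ \sigma^2 (X'X)^{-1}\big),
$$

because a linear function of a normal vector is itself normal. Drop assumption (5) and the same algebra still gives the mean and covariance of $\hat{\beta}$, but exact normality (and the exact $t$ result in the comment below) is replaced by an asymptotic approximation.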

I have no idea why the webpage would say what it says. To see why normality is not needed, you can simply derive the estimator yourself (or look the derivation up online). OLS is most typically derived by least squares or the method of moments, and normality enters nowhere in those equations.
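
As a sketch of that derivation: OLS solves the sample analogue of the moment condition $E[x_i(y_i - x_i'\beta)] = 0$,

$$
\frac{1}{n}\sum_{i=1}^{n} x_i\big(y_i - x_i'\hat{\beta}\big) = 0
\quad\Longrightarrow\quad
\hat{\beta} = \Big(\sum_i x_i x_i'\Big)^{-1} \sum_i x_i y_i = (X'X)^{-1}X'y,
$$

and no distributional assumption on $x_i$ or on the errors is invoked at any step.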

Perhaps it is a misunderstanding of the maximum likelihood approach: the MLE that coincides with OLS does assume normal errors. However, that assumption is not needed, since the estimates remain consistent when it fails (quasi-MLE).
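
To illustrate the robustness point, here is a small simulation sketch (the coefficient values, sample size, and error distribution are arbitrary choices for illustration): OLS recovers the true coefficients even with a binary regressor and heavily skewed errors.

```python
import numpy as np

rng = np.random.default_rng(0)

n, beta0, beta1 = 10_000, 1.0, 2.0               # arbitrary illustrative values
x = rng.binomial(1, 0.3, size=n)                 # binary predictor -- clearly not normal
eps = rng.exponential(scale=1.0, size=n) - 1.0   # skewed, mean-zero errors

y = beta0 + beta1 * x + eps

# OLS via the normal equations: beta_hat = (X'X)^{-1} X'y
X = np.column_stack([np.ones(n), x])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

print(beta_hat)  # close to [1.0, 2.0] despite the non-normal regressor and errors
```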

Repmat
  • (+1) Note that if the error variance is estimated from the data (the typical case), the distribution of the standardized coefficient estimates follows a t-distribution (with degrees of freedom equal to the sample size minus the no. of predictor terms) **exactly** when the error term is normal. See [here](http://stats.stackexchange.com/questions/117406). – Scortchi - Reinstate Monica May 01 '15 at 17:24