
I am currently writing my thesis using multiple regression. However, my data do not meet the regression assumption of a normal distribution.

I want to explain that, because of the non-normal distribution, the interpretation of the data is limited. I don't want to transform the data; I just want to state what impact the non-normal distribution has on my regression results (N = 110).

I have been googling a lot, but I have only found the assumptions, never the actual consequences of running the tests anyway.

Does anyone know what the consequences of a non-normal distribution are for my regression results? And, also very important, which paper could be cited?

I would be very grateful for any hints!

Glen_b
Jay
  • Does anybody actually have normal data? How non-normal are we talking? With 110 observations mild non-normality may be of little consequence. If you're constructing prediction intervals it might matter a lot more. What kinds of inference were you performing? – Glen_b May 14 '14 at 06:11

1 Answer


Regression does not assume that your data are normally distributed.

Regression assumes that the errors about the predicted values of $y$ are normally distributed; the residuals $y_{i} - \hat{y}_{i}$ are the sample estimates of those errors.
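To make the distinction concrete, here is a minimal sketch on simulated data (nothing here comes from the asker's thesis). It assumes a single, strongly skewed predictor and normal errors by construction; the helper names `skewness` and `ols_residuals` are made up for this illustration. The outcome $y$ inherits the skew of the predictor, yet the residuals, which are what the assumption concerns, need not be skewed.

```python
import numpy as np

def skewness(a):
    """Sample skewness: third standardized moment (0 for a symmetric distribution)."""
    a = np.asarray(a, dtype=float)
    centered = a - a.mean()
    return float(np.mean(centered**3) / np.mean(centered**2) ** 1.5)

def ols_residuals(X, y):
    """Fit y = b0 + X b by least squares and return (coefficients, residuals)."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta, y - design @ beta

rng = np.random.default_rng(0)
n = 110  # sample size mentioned in the question

# Assumed data-generating process, chosen only for illustration.
x = rng.exponential(scale=2.0, size=n)       # strongly skewed predictor
y = 1.0 + 3.0 * x + rng.normal(0.0, 1.0, n)  # errors are normal by construction

beta, resid = ols_residuals(x, y)
print("skewness of y:        ", round(skewness(y), 2))
print("skewness of residuals:", round(skewness(resid), 2))
```

In practice you would run the same check on the residuals of your own fitted model, for example with a histogram or a normal Q-Q plot, rather than on the raw variables.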

You're welcome.

Alexis
  • Good concise statement. Much more is easily accessible on this site, e.g. http://stats.stackexchange.com/questions/16381/what-is-a-complete-list-of-the-usual-assumptions-for-linear-regression/16460#16460 – Nick Cox May 13 '14 at 18:09
  • Thanks Alexis - that's what I meant. So, concluding from Nick's link, when my errors are not normally distributed, my beta values are wrong? Is that right? And why is this so? – Jay May 13 '14 at 19:24
  • Yes. See for example, Anscombe, F. J. (1973). Graphs in statistical analysis. *The American Statistician*, 27(1):17–21 for an example of what statistical outliers can do to bias regression estimates. However, the normality assumption may not be the worst assumption to violate, particularly for large data sets: Lumley, T. and Emerson, S. (2002). The Importance of the Normality Assumption in Large Public Health Data Sets. *Annual Review of Public Health*, 23:151–69. – Alexis May 13 '14 at 19:33
  • From Lumley and Emerson: "_In small samples most statistical methods do require distributional assumptions_". Yes, it's different when you can rely on asymptotic normality, but... is Jay interested in small or large samples? Is N=110 a large sample? – Sergio May 13 '14 at 20:52
  • Hence, "particularly for large data sets." – Alexis May 13 '14 at 20:58
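
Whether N = 110 counts as "large" depends on how non-normal the errors are, and one rough way to get a feel for it is a small simulation. The sketch below is only an illustration under assumed conditions: one normally distributed predictor, independent errors drawn from a centered exponential distribution (clearly skewed, mean zero), a hypothetical true slope of 0.5, and a critical value of 1.98 used as an approximation to the 97.5% quantile of a t distribution with 108 degrees of freedom. It reports the average slope estimate and how often the nominal 95% confidence interval covers the true slope.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_sims = 110, 2000
true_slope = 0.5  # hypothetical value, chosen for illustration
t_crit = 1.98     # approximate 97.5% quantile of t with n - 2 = 108 df

slopes = np.empty(n_sims)
covered = np.empty(n_sims, dtype=bool)

for s in range(n_sims):
    x = rng.normal(size=n)
    errors = rng.exponential(scale=1.0, size=n) - 1.0  # skewed, mean zero
    y = 2.0 + true_slope * x + errors

    # Ordinary least squares with an intercept.
    design = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta

    # Classical standard error of the slope.
    sigma2 = resid @ resid / (n - 2)
    se_slope = np.sqrt(sigma2 * np.linalg.inv(design.T @ design)[1, 1])

    slopes[s] = beta[1]
    covered[s] = abs(beta[1] - true_slope) <= t_crit * se_slope

mean_slope = slopes.mean()
coverage = covered.mean()
print(f"mean estimated slope: {mean_slope:.3f} (true value {true_slope})")
print(f"95% CI coverage:      {coverage:.3f}")
```

Swapping in a heavier-tailed or more skewed error distribution, a smaller `n`, or prediction intervals instead of a confidence interval for the slope shows how quickly the picture can change, which is the caveat raised in the comments above.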