
I once was a research assistant for a professor who wanted me to do some regressions, but before that he wanted me to test all the sample data for the variables to ensure they were normally distributed.

Today I was talking with a colleague (Mr. X) who considers himself an accomplished researcher, and he told me that he always tests sample data for normality (and insists that authors do that when he reviews papers).

Standing with us was an econometrics professor who really is good at statistics, and she insisted that only the population needs to be normally distributed; the sample needn't be. I concurred.

Mr. X didn't say anything then, but later when he was alone with me, he told me he didn't agree with her. I challenged him on it, and first he gave me a word jumble with "type 1" and "type 2" "errors" thrown in. Further challenged, he told me that if the sample data were not normally distributed, it wouldn't be possible to generalize the results. (Is he correct?)

My feeling is that checking sample data for normality is really a proxy for some other test (of cleanliness of data or something). And these people have forgotten or lost the reason why they were taught to test the sample data for normality.

What do you think? Why do some people believe that sample data need to be tested for normality?

thanks_in_advance
  • 1
    The answer to this question may well be covered here already: http://stats.stackexchange.com/q/2492/16974 – James Stanley Jan 31 '14 at 02:25
  • @JamesStanley : Thank you for your comment. I went there, but didn't find it helpful. The discussion was far too technical for me, and was about whether normality testing is essentially useless... which is certainly related to my question, but is a superset of my question. – thanks_in_advance Jan 31 '14 at 02:44
  • 8
    This issue is addressed in **many** posts here. *Neither* the distribution of the independent variables (x's), nor the unconditional distribution of the dependent variable (Y) is assumed normal for the usual normal-theory regression inference. It is only the *conditional* distribution of $Y$ that is assumed to be normal (equivalently the error distribution). If the model is correct, the marginal distribution of $Y$ (that is, considered by itself) depends on the pattern of the x's and might be almost anything. The only way to assess the assumption at all is *after* the model is fitted (ctd) – Glen_b Jan 31 '14 at 03:15
  • 1
    (ctd)... and even then, consensus here seems to be mostly (but not quite universally) opposed to formal testing of the hypothesis (again, plenty of posts here, though usually not the same ones, discuss this issue); in reasonably moderate-to-large samples, actual normality isn't even that important for inference (other assumptions have a bigger impact), while when it does matter (in small samples), you have little power to detect the violation anyway. – Glen_b Jan 31 '14 at 03:17
  • 1
    As a result, testing often only tells you it's not normal when it doesn't matter. Further, you can't check normality if the assumption about linearity of the relationships or homoskedasticity fails (since that will make normal errors look non-normal in aggregate, when it's the other assumptions that failed). – Glen_b Jan 31 '14 at 03:24
  • @Glen_b : Wonderful! Thank you so much. Just wonderful. – thanks_in_advance Jan 31 '14 at 04:02
  • Here are six links that may help throw some light on various of the issues. Some include additional links: [link_1](http://stats.stackexchange.com/questions/32600/in-what-order-should-you-do-linear-regression-diagnostics) $\,$ [link_2](http://stats.stackexchange.com/questions/49870/non-normality-and-heterogeneity-in-ancova) $\,$ [link_3](http://stats.stackexchange.com/questions/76163/why-are-diagnostics-based-on-residuals) $\,$ [link_4](http://stats.stackexchange.com/questions/55113/where-do-the-assumptions-for-linear-regression-come-from) ...(ctd) – Glen_b Jan 31 '14 at 06:07
  • (ctd)... [link_5](http://stats.stackexchange.com/questions/2492/is-normality-testing-essentially-useless) $\quad$ [link_6](http://stats.stackexchange.com/questions/58791/what-to-do-when-kolmogorov-smirnov-test-is-significant-for-residuals-of-parametr) – Glen_b Jan 31 '14 at 06:08
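
To make the point in the comments above concrete, here is a minimal simulation sketch of the distinction between the marginal distribution of $Y$ and the error distribution. Everything in it is an illustrative assumption rather than anything from the question: the exponential predictor, the coefficients 2 and 3, the sample size, and the helper names `skewness` and `fit_line` are all made up for the example. The errors are generated as exactly normal, yet $Y$ taken by itself comes out strongly skewed because it inherits the shape of $x$; only the residuals from the fitted model look symmetric.

```python
import random
import statistics


def skewness(values):
    """Sample skewness: third central moment divided by the cubed standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - mean) ** 3 for v in values) / (len(values) * sd ** 3)


def fit_line(x, y):
    """Ordinary least squares with a single predictor; returns (intercept, slope)."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


rng = random.Random(0)  # fixed seed so the run is reproducible
n = 5000

# Hypothetical data: a right-skewed predictor and exactly normal errors.
x = [rng.expovariate(0.5) for _ in range(n)]
y = [2.0 + 3.0 * xi + rng.gauss(0.0, 1.0) for xi in x]

intercept, slope = fit_line(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

skew_y = skewness(y)
skew_resid = skewness(residuals)

print(f"fitted intercept and slope: {intercept:.2f}, {slope:.2f}")
print(f"skewness of Y by itself:    {skew_y:.2f}")
print(f"skewness of the residuals:  {skew_resid:.2f}")
```

A normality check applied to the raw $Y$ column here would flag a problem even though the normal-errors assumption holds by construction, which is why any such check only makes sense on residuals after the model is fitted. Skewness is used only as a simple summary of shape; it is not a substitute for looking at a residual plot.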
