
I'm an amateur statistician; I do it mostly for fun. I'm finishing my first-semester statistics class, which covers material up to hypothesis testing. I plan to start learning simple regression on my own so that I'm prepared for the next-level class when I have a free slot in my schedule again.

Anyway, I have some data that I would like to test for normality, or at least check whether I can assume normality with reasonable certainty. I have access to plenty of samples, but I cannot observe the whole population. I understand that, by the central limit theorem, the sampling distribution of the mean is approximately normal for a large enough sample from almost any population (one with finite variance), but I'm actually trying to gain insight into the population itself.
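A small simulation can make the distinction between the population and the sampling distribution of the mean concrete. This is a sketch under arbitrary assumptions (an exponential "population", sample size n = 50, a fixed seed): the individual observations stay strongly skewed, while the sample means are close to symmetric. For reference, a normal distribution has skewness 0, the exponential has skewness 2, and the mean of n independent draws has skewness 2/√n.

```python
import numpy as np
from scipy import stats

# Assumed example population: exponential, which is strongly right-skewed.
rng = np.random.default_rng(0)
population_draws = rng.exponential(scale=1.0, size=100_000)

# Draw 10,000 samples of size n and keep only each sample's mean.
n = 50
sample_means = rng.exponential(scale=1.0, size=(10_000, n)).mean(axis=1)

# Individual observations keep a skewness near 2; the sample means have
# skewness of roughly 2 / sqrt(50), i.e. they look much closer to normal.
print("skewness of individual observations:", stats.skew(population_draws))
print("skewness of sample means (n=50):    ", stats.skew(sample_means))
```

The CLT therefore says nothing about whether the population itself is normal, which is exactly the question being asked here.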

Reading around here, I've realized that I need to do some hypothesis testing. From what I understand, I need to set up the null hypothesis to be that the population is not normal and then try to disprove it.
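For the common normality tests, including the Shapiro-Wilk test mentioned in the comments, the null hypothesis is actually that the data come from a normal distribution. A small p-value is evidence against normality; a large p-value does not establish normality, it only means the test found no evidence against it. Below is a minimal sketch using SciPy's `scipy.stats.shapiro`; the two simulated samples, the seed, the helper name `shapiro_decision`, and the 0.05 threshold are illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
normal_sample = rng.normal(loc=10.0, scale=2.0, size=200)   # assumed normal data
skewed_sample = rng.exponential(scale=2.0, size=200)         # assumed non-normal data

def shapiro_decision(x, alpha=0.05):
    """Hypothetical helper: Shapiro-Wilk test with H0 = 'x was drawn from a normal distribution'."""
    statistic, p_value = stats.shapiro(x)
    if p_value < alpha:
        verdict = "reject H0: evidence against normality"
    else:
        verdict = "fail to reject H0: no evidence against normality (not proof of it)"
    return statistic, p_value, verdict

for name, x in [("normal", normal_sample), ("exponential", skewed_sample)]:
    w, p, verdict = shapiro_decision(x)
    print(f"{name:12s} W={w:.3f} p={p:.3g} -> {verdict}")
```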

However, I don't know where to go from there. Can anyone suggest some reading I should do before I undertake this, or steps I should take?

kjetil b halvorsen
user35261
  • You might be interested in looking at this: http://stats.stackexchange.com/questions/16611/why-would-all-the-tests-for-normality-reject-the-null-hypothesis At the very least, you would learn the names of some of the tests used for normality: the Anderson-Darling test, the Cramér-von Mises test, the Kolmogorov-Smirnov test, Pearson's chi-square test and the Shapiro-Francia test. – TenaliRaman Nov 24 '13 at 21:41
  • There is also the [Shapiro-Wilk](http://en.wikipedia.org/wiki/Shapiro%E2%80%93Wilk_test) test. – Andre Silva Nov 24 '13 at 21:47
  • The rap that goodness-of-fit tests have on the street is that when you don't have much data, they have very poor power to reject anything (as non-normal, for instance). However, when you have lots of data, they reject almost every hypothesis, because of course your data's distribution won't be _exactly_ normal. – Ben Ogorek Nov 25 '13 at 00:20
  • @baogorek could you elaborate a little more on that? – user35261 Nov 25 '13 at 03:36
  • I was parroting things I heard in grad school, and perhaps it's not really fair to these goodness-of-fit tests for normality. You could say the same of regression coefficients: of course they're not _exactly_ zero, and with enough data the null hypotheses will be rejected. If you're looking to gain insight, then you should certainly explore these techniques; just don't be surprised if the hypothesis of normality gets rejected. It looks like Shapiro-Wilk is easy to run in R: http://stat.ethz.ch/R-manual/R-patched/library/stats/html/shapiro.test.html – Ben Ogorek Nov 25 '13 at 06:04
  • The sample mean asymptotically has a normal sampling distribution; the distribution of the observations themselves does not become normal. So you can't just assume that the data are normally distributed. But why do you think that you "need to do some hypothesis testing"? Whether some test rejects normality of the **residuals** (which is what matters, never the marginal distribution) is not a good reason to decide, e.g., which regression model to fit. For small samples, normality tests have little power even against huge deviations; for large samples, they will reject even when the deviation is irrelevant for fitting regression models. – Björn Aug 22 '18 at 07:52
  • What does it mean that the data is "normal"? Do you mean that it is distributed according to a normal distribution? – HelloGoodbye Aug 22 '18 at 08:07
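The comments above make two points about power: with small samples a normality test rarely rejects even when the population is not normal, and with large samples it rejects even modest departures. A rough simulation sketch of this, assuming SciPy, a Student t distribution with 5 degrees of freedom as a stand-in for "symmetric but somewhat heavier-tailed than normal", 200 replications per sample size, and the Shapiro-Wilk test at the 5% level (`rejection_rate` is a hypothetical helper name):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def rejection_rate(n, reps=200, alpha=0.05, df=5):
    """Fraction of simulated t(df) samples of size n for which
    Shapiro-Wilk rejects normality at level alpha."""
    rejections = 0
    for _ in range(reps):
        x = stats.t.rvs(df, size=n, random_state=rng)
        if stats.shapiro(x).pvalue < alpha:
            rejections += 1
    return rejections / reps

small_n_rate = rejection_rate(20)
large_n_rate = rejection_rate(2000)
print(f"n=20:   rejected in {small_n_rate:.0%} of samples")
print(f"n=2000: rejected in {large_n_rate:.0%} of samples")
```

The population is the same in both cases; only the sample size changes, so a rejection (or non-rejection) says as much about n as about how far the data are from normal.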

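Björn's point that the normality assumption in regression concerns the residuals rather than the marginal distribution of the response can be sketched as follows. The data are hypothetical (a binary predictor with a large effect and normal errors, arbitrary seed), and the fit uses SciPy's `scipy.stats.linregress`. Because y mixes two well-separated groups it is bimodal, so a normality test on y itself rejects, even though the errors are normal by construction.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n = 200
# Hypothetical design: binary predictor (0/1) with a large effect on y.
x = rng.integers(0, 2, size=n).astype(float)
y = 1.0 + 10.0 * x + rng.normal(scale=1.0, size=n)   # errors are exactly normal

# Simple linear regression of y on x, then residuals = observed - fitted.
fit = stats.linregress(x, y)
residuals = y - (fit.intercept + fit.slope * x)

p_marginal = stats.shapiro(y).pvalue          # tests the (bimodal) marginal of y
p_residuals = stats.shapiro(residuals).pvalue  # tests what the model actually assumes
print(f"Shapiro-Wilk on y itself (bimodal): p={p_marginal:.3g}")
print(f"Shapiro-Wilk on residuals:          p={p_residuals:.3g}")
```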