
So I am working on a programming assignment that uses multiple algorithms to solve the Flood-It game. I have taken some of the data I have collected so far and ran a Shapiro-Wilk test on it:

   shapiro.test(x[1:5000])

   Shapiro-Wilk normality test

   data:  x[1:5000]
   W = 0.9806, p-value < 2.2e-16 

To my understanding, I must reject the null hypothesis, which means my data are not normal. I then plotted the data with the hist function in R: [histogram]

Does this histogram show normally distributed data? If so, does that mean I should reject my results from the Shapiro test? And if so, why would I reject the Shapiro test?

  • This histogram doesn't look normal, especially given the size of the data set. – Aksakal Apr 16 '15 at 18:31
  • Quoting another thread: "real data are likely never actually normal. The useful question is not "are my data normal" (no, they're not), but something more like "is the extent to which my data deviate from normality enough to affect my inference in ways I need to worry about?"." (http://stats.stackexchange.com/questions/126788/how-to-judge-if-5-point-likert-scale-data-are-normally-distributed) – snoram Apr 16 '15 at 18:38
  • @snoram: taking the p-value into account, if a t-test were done, I could not make a strong argument about any results gained. – user2079902 Apr 16 '15 at 18:54
  • Would it be best to use a Mann-Whitney U test if I would like to make some inferences? Also, my real data set has 10,000 observations; how does one go about statistically analyzing data that large? – user2079902 Apr 16 '15 at 18:56
  • Perhaps the most constructive response one might make does not directly answer your question: *why do you care whether the data look exactly Normal*? Many, many people come to this site supposing that this matters, only to discover that it is nearly or totally irrelevant to their actual objectives. Perhaps you could tell us more about your problem? – whuber Apr 16 '15 at 21:48
  • Well, it's an assignment where I have 5 different algorithms that try to solve a Flood-It board game (http://unixpapa.com/floodit/). They must be able to solve the board for different sizes. Now I must prove that one of my algorithms gives the most optimal solution. I don't really care if the data are exactly normal. I would like to do a t-test, but I cannot, since my data are nowhere near normal. I just want to have enough evidence as to why I didn't use a t-test. – user2079902 Apr 16 '15 at 23:20
  • Testing normality is almost totally useless in this situation (it answers entirely the wrong question -- your data are counts, so obviously they're not normal ... but why does that matter?). Are the values plotted in that histogram the values for a single algorithm, or are they for all algorithms lumped together? If those are all runs for one single algorithm, your data look easily close enough to normal for the sample size you have, but the *terrible* choice of histogram bin-width makes it look much worse than it is. ... (ctd) – Glen_b Apr 17 '15 at 01:56
  • (ctd)... These are discrete counts, so plot spikes, not blocks. If you must show a histogram, make the bin width exactly 1 (centered on the integers). If it's all algorithms combined you can't conclude anything whatever. One suggestion is to consider looking toward analyses geared more to count data (especially since you'd expect the variance to be somehow related to the mean), but the variation in mean isn't so large that there should be all that much problem with a straight ANOVA; a Welch-Satterthwaite adjustment should be more than sufficient (see the sketch after these comments). – Glen_b Apr 17 '15 at 01:56
  • More briefly, the most important questions: (1) how many times did you run each algorithm? (2) Are you interested in/expecting a difference in means (e.g. "4 more moves on average") or something more like a ratio of means (e.g. "10% more moves on average", where the better algorithm will tend to win by more moves on the games that take more moves)? (I'd expect the second, but you know more about your problem than me) – Glen_b Apr 17 '15 at 02:00
  • The data shown are for a single algorithm. My hypothesis is that my "greedy to all" algorithm gives the most optimal solution, the optimal being the one with the least number of moves. This is the only thing I am trying to show. It has been a long time since I did statistical testing, so I am a bit rusty at the moment. I ran the algorithm 10,000 times for small board sizes and 500 times for large board sizes. – user2079902 Apr 17 '15 at 19:39
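
The following is a minimal R sketch of the plotting and testing suggestions in the comments above, using simulated Poisson counts as stand-ins for the real move counts (the vectors `moves_greedy` and `moves_other` and the Poisson means are hypothetical, not the OP's data):

    ## Simulated move counts standing in for two algorithms (hypothetical data).
    set.seed(1)
    moves_greedy <- rpois(10000, lambda = 20)
    moves_other  <- rpois(10000, lambda = 22)

    ## Discrete counts: use a bin width of exactly 1, centered on the integers.
    breaks <- seq(min(moves_greedy) - 0.5, max(moves_greedy) + 0.5, by = 1)
    hist(moves_greedy, breaks = breaks,
         main = "Moves per game (one algorithm)", xlab = "Number of moves")

    ## Welch's t-test (unequal variances) comparing mean moves of two algorithms.
    t.test(moves_greedy, moves_other, var.equal = FALSE)

    ## Welch-type one-way ANOVA if all algorithms are compared at once.
    moves <- c(moves_greedy, moves_other)
    algo  <- factor(rep(c("greedy", "other"), each = 10000))
    oneway.test(moves ~ algo, var.equal = FALSE)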

2 Answers


Since the sample size is large, statistical hypothesis tests have high power (1 minus the probability of a type II error), and hence even a small difference between your distribution and the null distribution (the Normal distribution) is detected and leads to rejection of the null hypothesis.
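
As a rough illustration (a sketch using a t distribution with 20 degrees of freedom purely as a convenient example of a mildly non-Normal distribution), the Shapiro-Wilk verdict tends to flip as the sample size grows even though the distribution never changes:

    ## Mildly non-Normal data: t with 20 df is only slightly heavier-tailed
    ## than the Normal distribution.
    set.seed(42)
    small_sample <- rt(100,  df = 20)
    large_sample <- rt(5000, df = 20)
    shapiro.test(small_sample)  # typically fails to reject at n = 100
    shapiro.test(large_sample)  # typically rejects at n = 5000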

Your data look (approximately*) Normally distributed but, given the large sample size, you can trust the Shapiro-Wilk test: your data are not Normally distributed.

*Your histogram has only 7 bins, so your data look approximately Normally distributed; if you increase the number of bins you may see a larger departure from the Normal distribution. Moreover, you could draw a QQ-plot (your data vs. the theoretical Normal quantiles) to highlight the departures of your data from the Normal distribution.
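
For example, a minimal sketch assuming the move counts are stored in the vector `x` used in the question:

    ## QQ-plot of the data against the theoretical Normal quantiles.
    qqnorm(x[1:5000], main = "Normal Q-Q plot of move counts")
    qqline(x[1:5000], col = "red")

    ## A finer histogram than the 7-bin default, for comparison.
    hist(x[1:5000], breaks = 50, main = "Histogram with 50 bins")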

stochazesthai
  • This is incorrect, please see gung's answer [here](http://stats.stackexchange.com/questions/2492/is-normality-testing-essentially-useless). Shapiro-Wilk will almost always reject on large sample sizes. – Chris C Apr 16 '15 at 18:51
  • Yes, and that's because real data are NEVER perfectly Normally distributed. – stochazesthai Apr 16 '15 at 18:52
  • The fact that on large samples you almost always reject the null hypothesis does not mean that the test is incorrect. – stochazesthai Apr 16 '15 at 18:52
  • I think you should understand what you read before giving (-1) randomly. – stochazesthai Apr 16 '15 at 18:56
  • You are correct in this, the test is doing exactly what it is expected to do. However, does the amount of deviation from normality matter at $n=5000$? I would say no. I apologize for my strong wording in my previous comment. You are correct in saying that the data is not perfectly normal, but it doesn't mean that it can't satisfy the normality assumption for tests. – Chris C Apr 16 '15 at 18:56
  • I apologize for my fast reaction; if you make a trivial edit, I will reverse my downvote. – Chris C Apr 16 '15 at 18:59
  • @stochazesthai, I would like to rephrase and say that your answer is not technically incorrect, but rather misleading. To a quick glance, such as I gave it, your answer implies that because the Shapiro-Wilk rejected, the data is not useful. I agree that it is non-normal, however, just because it was rejected by Shapiro-Wilk does not mean it is unusable for tests which demand normality. I did not mean to insult you; I wanted to link to a great answer that would educate OP in the danger of thinking this way. I will gladly reverse the downvote if you make a small edit; my vote is locked in. – Chris C Apr 16 '15 at 19:22

When your sample size is big enough (i.e., > 30) you can assume normality according to the Central Limit Theorem (CLT). Andy Field, author of Discovering Statistics Using R, has an easy video on this question: https://www.youtube.com/watch?v=ermii2fQWOo. I know this answer may be 2 years too late, but hopefully it helps.

A helpful document on the Central Limit Theorem is found here: http://www2.psychology.uiowa.edu/faculty/mordkoff/GradStats/part%201/I.07%20normal.pdf

Bottom line: if your sample size is greater than or equal to 30, you can assume normality.

ID4
  • I don't think the question really has to do with the CLT. The sample size may be large but the question is really asking about the Shapiro-Wilk test which rejects normality and the histogram doesn't look like a normal distribution to me either. – Michael R. Chernick Oct 19 '17 at 05:32
  • In OP's case a two-sample t-test is probably fine even at n1=n2=2, so the OP is probably safe. However, the general advice in the last line here doesn't relate to the CLT (which has *nothing whatever* to do with n=30); and is not a good rule for deciding when to use the t -- it's often too strict (as it probably is here) and is at other times much too optimistic, especially with the one-sample test. Finally, it's not simply the distribution under the null that must be considered when advising people to use the t-test -- it's also the impact on *power*. It's unclear why that's ignored here. – Glen_b Oct 19 '17 at 05:51
  • A number of posts on site discuss the claim in relation to the n=30 issue in detail. It might be worth reviewing them. If that's Field's advice it would be interesting to hear his justification for it. – Glen_b Oct 19 '17 at 05:53
  • You cannot make the blanket assumption that 30 data points or larger will always be normally distributed. The CLT tells you that the MEAN of a sample from a distribution will be normally distributed even if that distribution itself is not normal. That doesn't imply that the sample itself is normally distributed. – Steven M. Mortimer May 09 '18 at 15:40
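
A brief R sketch of that distinction, using an exponential distribution purely as an example of a non-Normal parent distribution: the raw sample stays skewed however large it is, while the means of repeated samples of size 30 look roughly Normal.

    ## The sample itself remains skewed no matter how large it is.
    set.seed(7)
    raw_sample   <- rexp(10000, rate = 1)
    ## Means of many samples of size 30 from the same distribution.
    sample_means <- replicate(10000, mean(rexp(30, rate = 1)))

    par(mfrow = c(1, 2))
    hist(raw_sample,   breaks = 50, main = "Raw exponential sample (skewed)")
    hist(sample_means, breaks = 50, main = "Means of samples of size 30")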