
The ends of these graphs confuse me. I know most of the values fall on or near the line, but I am unsure whether the data are indeed approximately normal. Here are the two graphs.

Plot 1: [normal Q-Q plot]

Plot 2: [normal Q-Q plot]

asked by Chris, edited by Glen_b

2 Answers


It's hard to say too much one way or the other from those plots. They certainly don't seem to deviate too wildly from the expected normal distribution shape, although of course they don't match perfectly either. You may be OK with assuming normality; many tests are fairly robust to violations of the normality assumption anyway.

On the other hand, you are really best off using methods that don't require these assumptions in the first place, rather than checking the assumption and then choosing a test afterwards. (For more on that, it may help to read this excellent CV thread: How to choose between t-test or non-parametric test e.g. Wilcoxon in small samples.)
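
One informal way to calibrate your eye here is to draw Q-Q plots of samples that genuinely are normal, with the same sample size, and see how much the tails typically wander. A minimal R sketch (the data from the question aren't available, so `rnorm(n)` stands in for them):

```r
set.seed(1)
n <- 20                              # roughly the number of points in plot 1
op <- par(mfrow = c(3, 3))
for (i in 1:9) {
  x <- rnorm(n)                      # a sample that really is normal
  qqnorm(x, main = paste("Simulated normal sample", i))
  qqline(x)
}
par(op)
```

If your two plots would not look out of place among these, the tail behaviour alone is not strong evidence against normality.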

gung - Reinstate Monica
  • Actually, one can obtain fairly precise p-values from these graphs by looking for the greatest horizontal deviations from the line. *E.g.*, in plot 1 it occurs for the point near $(0.9, 4.4)$, which ought to be near $(1.9, 4.4)$. The values $0.9$ and $1.9$ are the 82nd and 97th percentiles of the standard Normal distribution, a difference of $0.15$. For $n=20$ (the number of points in that plot), Lilliefors' original 1967 JASA article indicates the p-value would be slightly greater than $0.20$. Although this is not quite valid for residuals (which are correlated), it's a good approximation. – whuber Jul 30 '15 at 13:49
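
For readers who want to reproduce the percentile arithmetic in the comment above, a small R check (the values 0.9 and 1.9 are read off plot 1 as described):

```r
pnorm(0.9)               # ~0.816, i.e. the 82nd percentile of the standard normal
pnorm(1.9)               # ~0.971, i.e. the 97th percentile
pnorm(1.9) - pnorm(0.9)  # ~0.155, the horizontal-deviation (Lilliefors-type) statistic
```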

Use a Shapiro-Wilk test in R to test for normality. The null hypothesis is that the data are normal.
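
For example, a minimal sketch (here `x` is a placeholder generated with `rnorm`, since the data from the question aren't shown; substitute your own residuals or data vector):

```r
## Shapiro-Wilk test; the null hypothesis is that the data come from a normal distribution
x <- rnorm(20)    # placeholder -- replace with your own residuals or data
shapiro.test(x)
## A small p-value suggests a detectable departure from normality;
## a large p-value only means no departure was detected at this sample size.
```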

Hidden Markov Model
  • There are many tests for normality (see: [here](http://stats.stackexchange.com/a/62320/) & [here](http://stats.stackexchange.com/a/1723/)). You might also find [this](http://stats.stackexchange.com/q/2492/) interesting to read. – gung - Reinstate Monica Jul 29 '15 at 23:03
  • Rejection only tells you the non-normality is detectable with the test you use, not how much it matters, and failure to reject doesn't tell you that everything is fine (especially in small samples). In neither case does using the test really solve the problem at hand -- "are my data near enough to normal" for whatever purpose you're assessing normality for. – Glen_b Jul 29 '15 at 23:53
  • Good comment; it got me thinking. You didn't indicate the importance of normality in your original post. Also, the question "Is normality important?" is quite different from "Is my data normally distributed?". – Hidden Markov Model Jul 31 '15 at 18:24