2

I have data on the weight of a group of people before and after a diet. I want to see whether the weight loss is normally distributed.

> weightloss <- dietA$Peso.inicial - dietA$Peso.final
> weightloss
 [1] 7.48 3.71 4.30 5.47 3.80 6.31 7.76 4.07 3.70 4.11 4.96 4.63 5.18 5.68 4.76 1.87
[17] 7.80 3.29 7.23 6.67 3.96 0.72 4.36 0.10 2.30 7.15 5.61 7.20 5.27 7.86 4.81 6.08
[33] 5.90 5.16 1.60 5.50 6.16 5.99 6.36 0.91

I ran the Shapiro-Wilk test in R:

> shapiro.test(weightloss)

    Shapiro-Wilk normality test

data:  weightloss
W = 0.95123, p-value = 0.08357

Now, if I set the significance level at 0.05, then the p-value is larger than alpha (0.08357 > 0.05) and I cannot reject the null hypothesis of normality — so can we accept the null hypothesis? I know that divergent views exist on this (see Interpretation of Shapiro-Wilk test, What is Hypothesis Testing?, and When to use Fisher and Neyman-Pearson framework?). Moreover, the margin is small: with a significance level of 0.1 we can reject the null hypothesis.

I tried to look at this another way, to get additional evidence, using a QQ plot:

> qqnorm(weightloss)
> qqline(weightloss)

[normal QQ plot of weightloss with reference line]

As you can see, the first points lie to the right of the line, and at the end there is one point to the right as well, so maybe I can conclude that the data are not normal (see How to interpret a QQ plot). To look at it from another angle, I created a histogram:

[histogram of weightloss]

So with all this, I would almost say the sample is not normally distributed — or perhaps I can accept normality with some uncertainty. I am not sure about this.

gung - Reinstate Monica
Cyberguille
  • I don't see meaningfully "divergent views" in the pages you link. The stattrek page (What is Hypothesis Testing?) is poorly written, IMO, w/ a weirdly ambiguous message about accepting vs failing to reject the null hypothesis, but it doesn't actually advocate for the idea that you can accept the null. They only say that "Some researchers say... [you can, but] Many statisticians... take issue with [that idea]". FWIW, that is literally true: there are researchers who believe (mistakenly) that you can accept the null, but statisticians (correctly) point out that is wrong. – gung - Reinstate Monica Sep 12 '18 at 15:10
  • It may help to read my answer here: [Why do statisticians say a non-significant result means "you can't reject the null" as opposed to accepting the null hypothesis?](https://stats.stackexchange.com/a/85914/7290) – gung - Reinstate Monica Sep 12 '18 at 15:11

1 Answer

6

Failure to reject doesn't imply you have normality. In fact you can be pretty certain you don't.

But even if your data could have been drawn from a normal distribution, there's no way to be sure that they were, because there are non-normal alternatives sufficiently close to normal that you cannot distinguish them from normal at a given sample size.

Failure to reject will be due to the fact that the sample size was too small to detect whatever non-normality you have (outside a few special situations).
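To make this concrete, here is a small simulation of my own (an illustrative sketch, not part of the original answer or data): samples from a t distribution with 5 degrees of freedom — a genuinely non-normal population — routinely pass the Shapiro-Wilk test at n = 40, while at n = 1000 the same population is rejected far more often, purely because power grows with sample size.

```r
# Illustrative sketch: Shapiro-Wilk power depends heavily on sample size.
# The population (t with 5 df) is genuinely non-normal in both cases.
set.seed(42)
p_small <- replicate(500, shapiro.test(rt(40,   df = 5))$p.value)
p_large <- replicate(500, shapiro.test(rt(1000, df = 5))$p.value)

# Fraction of simulations where the test fails to reject at alpha = 0.05:
mean(p_small > 0.05)  # typically large: non-normality usually missed at n = 40
mean(p_large > 0.05)  # typically small: usually detected at n = 1000
```

So a non-significant Shapiro-Wilk result at n = 40 says more about the sample size than about the population.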

[What would you need to do a formal test of normality for? It's rarely an answer to a useful question, at least for the purposes that it's typically used.]

Glen_b
  • I understand your point, but if you generate a normal sample (of length 1000) and run shapiro.test, the p-value I get is around 0.6, as I expected in this case. Can you say that this sample is normal or not? – Cyberguille May 24 '16 at 20:20
  • @cyb Failure to reject doesn't tell you the null is actually true. You can only say the sample is drawn from a normal distribution there because you know that's how you generated it. Imagine instead someone handed us a sample of size 1000, and the Shapiro-Wilk test gave p=0.6. The p-value being 0.6 doesn't tell us the sample was drawn from a normal distribution -- I can generate samples of much larger sizes than that from a non-normal distribution and still get high p-values (though the population may be pretty close to normally distributed as measured by the criterion of the test in question) – Glen_b May 24 '16 at 22:50
  • @Cyberguille Again, why are you testing normality? What problem does that address? – Glen_b May 24 '16 at 22:52
  • Really, I needed to decide between a Student's t-test and a non-parametric test for two samples. I decided on a non-parametric test (Mann-Whitney-Wilcoxon). But I see that when H0 is rejected we can accept H1 with some uncertainty, so why can't we accept H0 with some level of uncertainty? Many real-life problems require a binary answer; you can't just say "I can't say anything". In fact I used other tools like qqnorm and a histogram, but I didn't like them much, and I was just looking for other ways to make this decision (about normality, in this particular case) – Cyberguille May 27 '16 at 20:13
  • A hypothesis test of normality is not a good way to choose between a t-test and a Wilcoxon--Mann-Whitney test. It answers the wrong question (you want to know how much it will matter if you use a t-test with non-normal data), and the SW test will tend to reject more when it matters less. If you're not confident - before you see data - that your distribution will be reasonably close to normal, then you should consider something that doesn't rely on normality. That *might* be W-MW or it might be something else (e.g. a permutation test of means for example, or any number of other possibilities). – Glen_b May 28 '16 at 01:12
  • But when am I ever pretty sure about the normality of my data? I think almost never; normality is really only plausible in theoretical scenarios. Thanks a lot — I accept your answer now, because this conversation has been enlightening and productive for me. – Cyberguille May 30 '16 at 16:16
  • If your original variables consist of sums or averages of many independent (or not strongly dependent) components, they're often close to normally distributed -- and many (but not all) normal-theory procedures are fairly tolerant to moderate non-normality (more so if you're mostly concerned about level robustness rather than power-robustness) – Glen_b May 30 '16 at 16:22
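For reference, here is a minimal sketch of the permutation test of means that the answer mentions as an alternative to the t-test or Wilcoxon-Mann-Whitney. The samples x and y below are made up purely for illustration; substitute your own two groups.

```r
# Two-sample permutation test of the difference in means (illustrative data).
set.seed(1)
x <- rnorm(20, mean = 5.0)
y <- rnorm(20, mean = 5.8)

obs    <- mean(x) - mean(y)   # observed test statistic
pooled <- c(x, y)

# Repeatedly reassign group labels at random and recompute the statistic.
perm <- replicate(10000, {
  idx <- sample(length(pooled), length(x))
  mean(pooled[idx]) - mean(pooled[-idx])
})

# Two-sided p-value: how often a random relabelling is at least as extreme.
p_value <- mean(abs(perm) >= abs(obs))
p_value
```

Unlike the t-test, this only assumes the observations are exchangeable between groups under the null hypothesis — it does not require normality.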