
I am trying to do a one-way ANOVA and am having some difficulty proceeding. Among all the assumptions, I am stuck on these two: normality and equal variance. My questions are:

  1. My independent variable has 4 categories. The normality assumption is satisfied for two of the 4 categories. I read that ANOVA is quite robust, so a small violation of normality is not a big deal. How can I decide whether the violation is acceptable?

  2. If homogeneity of variances is violated, it is suggested to do Welch's F test. I assume I can only do that if the normality assumption is satisfied. Is that correct?

Looking forward to any suggestions!

Glen_b
curiousmind
    Since ANOVA is regression, some of these questions have answers at http://stats.stackexchange.com/questions/32600. It's unclear what a "significant outlier assumption" might be--one hopes there are *no* outliers and that check is a big part of normality testing in the first place. Might I suggest you therefore focus on the third and fourth questions alone? – whuber Mar 24 '17 at 21:04
    You should probably break this into separate questions / threads. You should also probably search the site, a lot of this may be available already. Lastly, very few people will know what SPSS does or why. That question (2) you should ask the SPSS tech support, not us. – gung - Reinstate Monica Mar 24 '17 at 21:05
  • @gung I searched. Specifically, I would like to know how far a violation of normality is acceptable. Normality tests like Kolmogorov-Smirnov or Shapiro-Wilk indicate non-normality (based on the p-value) for some categories, but a rough look at the histogram seems okay. – curiousmind Mar 24 '17 at 21:15
  • If your histogram looks ok then it probably is. If your sample size is large then statistical tests will *always* produce a significant value. If it looks normal to you then I wouldn't worry – Conor Neilson Mar 25 '17 at 03:50
  • @ConorNeilson thanks. One thing just crossed my mind: if my sample size is over 30 in each group, can I assume normality? – curiousmind Mar 27 '17 at 14:45
  • If the non-normality is due to skew then your test will be conservative and you don't have to worry. Tests of normality either tell you what you already know (the population isn't exactly normally distributed) or tell you nothing. – David Lane Mar 27 '17 at 21:34
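The point made in these comments — that with a large enough sample, normality tests flag even trivial deviations — is easy to demonstrate with a short simulation. A sketch using SciPy (the t distribution with 10 degrees of freedom is only mildly heavier-tailed than a normal; since Shapiro–Wilk's p-values are unreliable above roughly 5,000 observations, the D'Agostino–Pearson omnibus test is used instead):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# t(10) is visually almost indistinguishable from a normal distribution
small = rng.standard_t(df=10, size=100)
large = rng.standard_t(df=10, size=1_000_000)

# D'Agostino-Pearson omnibus test (handles large n, unlike Shapiro-Wilk)
_, p_small = stats.normaltest(small)
_, p_large = stats.normaltest(large)

print(f"n = 100:       p = {p_small:.3f}")
print(f"n = 1,000,000: p = {p_large:.2e}")  # effectively zero: test rejects
```

The population is the same in both runs; only the sample size changes, yet at n = 1,000,000 the test rejects decisively — which is exactly why "the histogram looks fine" can be more informative than a p-value at large n.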

2 Answers

  1. I would suggest that you run a test for normality in each category. Shapiro–Wilk and Kolmogorov–Smirnov are the two main ones, and a good rule of thumb is: if you have fewer than 50 observations, use Shapiro–Wilk; otherwise, use Kolmogorov–Smirnov. Kolmogorov–Smirnov is more conservative: it does not reject the normality hypothesis as easily as Shapiro–Wilk.

  2. If the normality assumption holds, run Welch's F test, and if everything is fine you can proceed with the ANOVA. If the normality assumption is violated, you'll have to use a non-parametric test that makes no assumptions about the underlying distribution of the data (e.g. Kruskal–Wallis).
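The two-step workflow above can be sketched with SciPy on hypothetical data for four groups. Two caveats: the plain KS test with the mean and SD estimated from the same sample is only approximate (strictly it calls for the Lilliefors correction), and SciPy has no dedicated Welch's ANOVA function, so the standard formula is coded directly here:

```python
import numpy as np
from scipy import stats

def welch_anova(*groups):
    """Welch's F test for k independent groups (robust to unequal variances)."""
    k = len(groups)
    n = np.array([len(g) for g in groups])
    m = np.array([np.mean(g) for g in groups])
    v = np.array([np.var(g, ddof=1) for g in groups])
    w = n / v                                   # precision weights
    mw = np.sum(w * m) / np.sum(w)              # variance-weighted grand mean
    lam = np.sum((1 - w / np.sum(w)) ** 2 / (n - 1))
    f = (np.sum(w * (m - mw) ** 2) / (k - 1)) / (1 + 2 * (k - 2) * lam / (k**2 - 1))
    df2 = (k**2 - 1) / (3 * lam)                # Welch's adjusted denominator df
    return f, stats.f.sf(f, k - 1, df2)

rng = np.random.default_rng(42)
groups = {  # hypothetical data: four categories, unequal n and spread
    "A": rng.normal(10, 2, 45),
    "B": rng.normal(11, 2, 48),
    "C": rng.normal(10, 3, 120),
    "D": rng.normal(12, 4, 150),
}

# Step 1: per-group normality check, following the rule of thumb above
normality = {}
for name, x in groups.items():
    if len(x) < 50:
        _, p = stats.shapiro(x)
    else:
        # KS with mean/SD estimated from the sample -> p only approximate
        _, p = stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1)))
    normality[name] = p
    print(f"{name}: normality p = {p:.3f}")

# Step 2: Welch's F if normality looks plausible, Kruskal-Wallis otherwise
fw, pw = welch_anova(*groups.values())
hk, pk = stats.kruskal(*groups.values())
print(f"Welch's F:      F = {fw:.2f}, p = {pw:.4f}")
print(f"Kruskal-Wallis: H = {hk:.2f}, p = {pk:.4f}")
```

A convenient sanity check on the hand-coded formula: with exactly two groups, Welch's F equals the square of Welch's t statistic from `stats.ttest_ind(..., equal_var=False)`, with the same p-value. If adding a dependency is acceptable, pingouin's `welch_anova` is a ready-made alternative.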

Vasilis Vasileiou
  • Thanks. As I mentioned, the normality assumption is not satisfied for all of the categories: for 2 out of the 4 categories the assumption is satisfied. Do I still go for the non-parametric test? Or can I assume normality if my sample size is over 30 in each group? – curiousmind Mar 27 '17 at 14:43
  • Nice question. I personally am not a big fan of this oversimplified "30 per group" rule of thumb that people tend to use because there are other factors that need to be checked before specifying this minimum size. The safe way is to do the non-normal test as no one can argue with your choice as soon as the assumption of normality is violated. That being said, you can definitely try an ANOVA afterwards, if you have at least 30 people in each group, and comment on the similarities of the results between these two approaches. My guess is that they will be similar for a small enough signif. level – Vasilis Vasileiou Mar 27 '17 at 15:07
  • A little bit off-topic, but it addresses the same issue. I have a paired sample as well. The sample size is 300 and the histogram of the differences looks quite bell-shaped. However, when I run the Kolmogorov-Smirnov test, it is significant, indicating the normality assumption is rejected. I am unsure whether to do a non-parametric test in this case or stick to the paired t-test because of my large sample size and histogram. Any thoughts? – curiousmind Mar 27 '17 at 19:50
  • What you can do is check for outliers that may not be "visible" in the histogram but cause the KS test to reject the normal-distribution hypothesis. Furthermore, I would suggest you go ahead and do the paired t-test. Remember that H0 is that the mean difference is zero and H1 is that it is not, and having "enough" observations makes the sample mean approximately normally distributed because of the central limit theorem. In a sense, with "enough" data there is no need to look at the underlying distribution, because the central limit theorem states that the mean will be normally distributed anyway. – Vasilis Vasileiou Mar 27 '17 at 21:51
  • Thank you. I was thinking the same. In addition, I can do a sensitivity analysis by running the non-parametric test as well. I found they both produce the same conclusion. – curiousmind Mar 28 '17 at 19:26
  • Could you please tell me whether adopting this approach makes sense? In my one-way ANOVA, I have 4 groups: 3 with sample sizes of more than 100, and 1 with 28. Can I assume normality via the CLT for the 3 large groups, and test for normality only for the one with 28 observations? If normality is satisfied for the smallest group, along with homogeneity of variances for all groups, can I run an ANOVA? I can mention that I have also done a sensitivity analysis by running a Kruskal-Wallis test (I have done that, and both produce similar results). I would prefer to stick to ANOVA for easy interpretability if possible. – curiousmind Mar 28 '17 at 19:56
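The paired-sample sensitivity check discussed in these comments — run the paired t-test and confirm the conclusion with its non-parametric counterpart — might look like this. A sketch with simulated data (the group sizes, means, and skewed-difference setup are hypothetical; Wilcoxon signed-rank is the standard non-parametric analogue of the paired t-test):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# hypothetical paired measurements, n = 300, with skewed (non-normal) differences
before = rng.normal(50, 10, 300)
after = before + rng.gamma(shape=2, scale=1.5, size=300)  # skewed improvement

t, p_t = stats.ttest_rel(after, before)   # parametric: relies on CLT at n = 300
w, p_w = stats.wilcoxon(after - before)   # non-parametric counterpart

print(f"paired t-test: p = {p_t:.2e}")
print(f"Wilcoxon:      p = {p_w:.2e}")
```

When both tests point the same way, as they do here, reporting the t-test with the Wilcoxon result as a robustness check is a defensible way to keep the easier interpretation.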

As far as I know, the Kruskal–Wallis test is not applicable if the distributions have different shapes (e.g., one is positively skewed and one is negatively skewed).