
I have data with two variables: one is quantitative and the other is nominal (4 categories). I ran normality tests, and it turned out that the quantitative variable is not normally distributed. When I checked Levene's test, it turned out that the variances are not equal either. Therefore, instead of ANOVA I thought about the Kruskal-Wallis test, but that test still assumes that the variance is equal across the groups. Can I simply run ANOVA with the Welch statistic as an additional option, and would that be fine? (in SPSS).
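For reference, here is a minimal standard-library sketch of the two statistics the question pairs up: the classic mean-centred Levene statistic and Welch's F. The function names are illustrative, not SPSS or library APIs. The p-value for Welch's F is the upper tail of an F distribution with df1 = k - 1 and a usually non-integer df2 (for example `scipy.stats.f.sf(F, df1, df2)` if SciPy is available); SPSS reports it in the "Robust Tests of Equality of Means" table when the Welch option is ticked.

```python
from statistics import fmean, variance


def levene_w(groups):
    """Classic (mean-centred) Levene statistic W with its degrees of freedom.

    Z_ij = |y_ij - mean of group i|; W is the one-way ANOVA F statistic computed
    on the Z values and is referred to F(k - 1, N - k).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    z = []
    for g in groups:
        centre = fmean(g)  # using the median instead gives the Brown-Forsythe variant
        z.append([abs(y - centre) for y in g])
    z_means = [fmean(zg) for zg in z]
    z_grand = sum(sum(zg) for zg in z) / n_total
    between = sum(len(zg) * (zm - z_grand) ** 2 for zg, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zg, zm in zip(z, z_means) for v in zg)
    return (n_total - k) / (k - 1) * between / within, k - 1, n_total - k


def welch_anova(groups):
    """Welch's F statistic with df1 = k - 1 and a (usually non-integer) df2."""
    k = len(groups)
    n = [len(g) for g in groups]
    means = [fmean(g) for g in groups]
    w = [ni / variance(g) for ni, g in zip(n, groups)]  # weights n_i / s_i^2
    w_sum = sum(w)
    grand = sum(wi * mi for wi, mi in zip(w, means)) / w_sum  # weighted grand mean
    lam = sum((1 - wi / w_sum) ** 2 / (ni - 1) for wi, ni in zip(w, n))
    numerator = sum(wi * (mi - grand) ** 2 for wi, mi in zip(w, means)) / (k - 1)
    denominator = 1 + 2 * (k - 2) / (k ** 2 - 1) * lam
    return numerator / denominator, k - 1, (k ** 2 - 1) / (3 * lam)


# Hypothetical data: four groups with clearly different spreads.
groups = [[4.1, 5.0, 5.3, 4.8, 5.6], [5.9, 7.2, 4.4, 8.1, 6.0],
          [3.0, 9.5, 6.2, 11.0, 1.8], [5.2, 5.4, 5.1, 5.5, 5.3]]
print("Levene W, df1, df2:", levene_w(groups))
print("Welch F, df1, df2:", welch_anova(groups))
```

With two groups, Welch's F reduces to the square of the Welch t statistic and df2 to the Welch-Satterthwaite degrees of freedom, which is a quick way to sanity-check the formulas.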

What's more, I have nearly 3000 observations, and on the charts the data look roughly normally distributed, but right-skewed and with higher kurtosis (that's probably why the tests showed a non-normal distribution).
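The point about ~3000 observations can be made concrete. The sketch below swaps in the Jarque-Bera test (not one of the normality tests SPSS's Explore offers) because its asymptotic p-value has a closed form for 2 degrees of freedom. The skewness of 0.3 and excess kurtosis of 0.5 are hypothetical values, not Mary's data: the same mild departure goes from unremarkable at n = 50 (p around 0.5) to overwhelmingly "significant" at n = 3000.

```python
import math
from statistics import fmean


def sample_shape(xs):
    """Moment-based skewness g1 and excess kurtosis g2 (no small-sample correction)."""
    n = len(xs)
    m = fmean(xs)
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def jarque_bera_p(n, skew, ex_kurt):
    """Jarque-Bera statistic and its asymptotic p-value (chi-square with 2 df)."""
    jb = n / 6.0 * (skew ** 2 + ex_kurt ** 2 / 4.0)
    # For 2 degrees of freedom the chi-square upper tail is exactly exp(-x / 2).
    return jb, math.exp(-jb / 2.0)


# Hypothetical shape: skewness 0.3, excess kurtosis 0.5, at three sample sizes.
for n in (50, 300, 3000):
    jb, p = jarque_bera_p(n, 0.3, 0.5)
    print(f"n={n:5d}  JB={jb:7.2f}  p={p:.2g}")
```

The moment estimators above carry no small-sample correction, so they may differ slightly from the skewness and kurtosis values SPSS prints.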

Thanks for any suggestions! Mary

  • Note that it suffices if the residuals are normally distributed. Furthermore, with that many observations, the tests have extreme power and can detect even minor deviations from normality that are not necessarily a problem for the ANOVA. – hplieninger May 11 '18 at 13:18
  • @hplieninger Hi! Thanks for your reply. That's true; I also read that it's possible to run ANOVA even if the quantitative variable is not normally distributed. But what about the equality of variances? If Levene's test shows p < 0.001, how can we perform ANOVA (analysis of variance) or a non-parametric test like Kruskal-Wallis? Is it OK then to run ANOVA with this additional Welch option? – Mary May 11 '18 at 13:30
  • You will find a lot of information about that on Cross Validated: you may transform your DV, try robust methods, or ignore it if the differences in variances are small despite being significant. – hplieninger May 11 '18 at 14:00
  • Have a look at [this question](https://stats.stackexchange.com/questions/56971/alternative-to-one-way-anova-unequal-variance) wrt the heterogeneity of variance problem. – Joel May 11 '18 at 15:15
  • I was once told that if I want to apply parametric statistics, then the population I sampled from has to be close to a normal distribution. Not my sample, but the population. Residuals are used to estimate this closeness: sample residuals are estimates of population errors [Wiki](https://en.wikipedia.org/wiki/Errors_and_residuals). There was software, RundomBC, that allowed one to transform a sample with the Box-Cox algorithm while controlling heterogeneity of variance at the same time. There are probably R-based tools available now. – abc May 12 '18 at 08:55
  • @Joel Thanks a lot for this link. It's certainly worth reading. – Mary May 14 '18 at 09:47
  • Just to emphasise what @hplieninger has already stated: your measured quantitative data need not be normal; the residuals of your model need to be normal. In fact, most data won't be normal if your conditions actually have some effect. For example, height is a textbook example of a normally distributed variable, but measured across a whole population it might look bimodal (two peaks) because men and women differ. After accounting for gender, the residuals will be closer to normal. – Mensen Jun 07 '18 at 15:20
  • For its common null hypothesis, Kruskal-Wallis does not assume equal variances. – Sal Mangiafico Jul 18 '18 at 15:01
  • Why do you need a hypothesis test in the first place? Personally, I'd probably be fine with an ANOVA if the result is clearly in line with what you see when visualising the data (e.g., the groups have visibly different means and you get $p = 10^{-6}$ or something). It may be a hard sell, but trying to squeeze something more sophisticated out of the data may ultimately not help much more. – Christian Hennig Jun 10 '21 at 20:45
  • Major thing to understand about model assumptions: they are *never* true. What matters is not whether they are violated at all (they always are), but whether they are violated in a way that leads to misleading results. In your situation, this can probably be diagnosed by graphical means. – Christian Hennig Jun 10 '21 at 20:46

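On the Kruskal-Wallis side, the H statistic uses only ranks, and under its usual null hypothesis (all groups come from the same distribution) no separate equal-variance assumption is needed. A standard-library sketch with the usual tie correction follows; `chi2_sf_df3` is a hypothetical helper using the closed-form chi-square tail for exactly 3 degrees of freedom, which matches the four-category case in the question. If the groups differ in spread, a significant H says the distributions differ, not specifically that their medians do.

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H and its degrees of freedom (k - 1)."""
    pooled = [y for g in groups for y in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n) over tied blocks of size t.
    correction = 1 - sum(t ** 3 - t for t in Counter(pooled).values()) / (n ** 3 - n)
    return h / correction, len(groups) - 1


def chi2_sf_df3(x):
    """Upper-tail chi-square probability for exactly 3 df (i.e. four groups)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Hypothetical data for four groups.
h, df = kruskal_wallis_h([[2.1, 3.4, 1.9], [4.0, 5.2, 4.4],
                          [3.3, 2.8, 3.9], [6.1, 5.5, 7.0]])
print(f"H={h:.3f}, df={df}, p={chi2_sf_df3(h):.4f}")
```

For other numbers of groups, the upper tail of a chi-square with k - 1 df is needed instead (e.g. `scipy.stats.chi2.sf(h, df)` if SciPy is available).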
0 Answers