What are the consequences of using ANOVA instead of Kruskal-Wallis on non-normally distributed data?

Question

I am doing a high-school statistics assignment and am having trouble finding an answer to this question. ANOVA and the Kruskal-Wallis test are very confusing, and I am lost trying to find an explanation.

We welcome questions like this, but we treat them differently. Please add the `[self-study]` tag & read its [wiki](http://stats.stackexchange.com/tags/self-study/info). Then edit your Q to state what you understand thus far & where you're stuck. We'll provide hints to help you get unstuck. — gung - Reinstate Monica, Oct 29 '14 at 02:06
These links may be helpful: http://stats.stackexchange.com/questions/2492/is-normality-testing-essentially-useless and http://stats.stackexchange.com/questions/121330/best-way-to-determine-normality-of-data — rnso, Oct 29 '14 at 11:46

Glen_b · Answer 1 · 2022-01-24T22:33:03.893

That's a pretty sophisticated question for a high school student to have to answer. (To respond properly would require dozens of pages of investigation of the properties of ANOVA relative to Kruskal-Wallis under different situations.)

The first thing to keep in mind is that when you use a hypothesis test, there are 4 possible situations:

|              | H0 true                 | H0 false               |
|--------------|-------------------------|------------------------|
| reject H0    | Type I error, $\alpha$  | power, $1-\beta$       |
| don't reject | $1-\alpha$              | Type II error, $\beta$ |

The main issues to consider, then, are the degree to which the Type I and Type II error rates are affected. Of course, the Type II error rate depends on the particular situation you're in under the alternative. These two considerations are not entirely separate, since moving $\alpha$ up or down also changes $1-\beta$.
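To make the link between $\alpha$ and $1-\beta$ concrete, here is a small sketch using a two-sided one-sample z-test, chosen only because its power has a closed form; it is a stand-in, not ANOVA or Kruskal-Wallis. The function name `z_test_power` and the standardized shift `delta` (true effect divided by its standard error) are illustrative assumptions. When `delta` is 0, H0 is true and the "power" reduces to $\alpha$ itself.

```python
from scipy.stats import norm


def z_test_power(delta, alpha):
    """Power of a two-sided z-test when the true standardized shift is delta.

    delta = (true effect) / (standard error); delta = 0 means H0 is true.
    """
    z_crit = norm.ppf(1 - alpha / 2)  # two-sided critical value
    # Reject when |Z| > z_crit, where Z ~ N(delta, 1) under the alternative.
    return norm.cdf(delta - z_crit) + norm.cdf(-delta - z_crit)


# A larger alpha (more Type I errors allowed) buys more power (fewer Type II errors).
for alpha in (0.01, 0.05, 0.10):
    print(f"alpha={alpha:.2f}  power={z_test_power(2.0, alpha):.3f}")
```

The same trade-off holds for ANOVA and Kruskal-Wallis, but their power under non-normal data generally has to be estimated by simulation rather than by a formula like this.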

Exactly what happens to them will depend on the particular distribution, the sample size, and the alternative under consideration.

To begin with, it makes sense to consider the simplest case of two groups (which is effectively a t-test).
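A quick way to see the two-group equivalence is to run both procedures on the same data. This sketch assumes numpy and scipy; the group sizes and means are arbitrary. With two groups, one-way ANOVA's $F$ equals the square of the pooled-variance two-sample $t$ statistic, and the p-values agree. On the rank side, Kruskal-Wallis with two groups matches the large-sample Wilcoxon-Mann-Whitney test (without continuity correction; the simulated data have no ties).

```python
import numpy as np
from scipy import stats

# Two illustrative groups; sizes and means are arbitrary choices.
rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, size=15)
b = rng.normal(0.5, 1.0, size=15)

# Two-group one-way ANOVA is the pooled-variance two-sample t-test: F = t^2.
f_res = stats.f_oneway(a, b)
t_res = stats.ttest_ind(a, b, equal_var=True)
print(f"F = {f_res.statistic:.4f}, t^2 = {t_res.statistic ** 2:.4f}")
print(f"p (ANOVA) = {f_res.pvalue:.4f}, p (t-test) = {t_res.pvalue:.4f}")

# Two-group Kruskal-Wallis matches the normal-approximation Mann-Whitney test
# when no continuity correction is applied.
kw_res = stats.kruskal(a, b)
mw_res = stats.mannwhitneyu(a, b, use_continuity=False, method="asymptotic")
print(f"p (Kruskal-Wallis) = {kw_res.pvalue:.4f}, p (Mann-Whitney) = {mw_res.pvalue:.4f}")
```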

So you might first consider what happens to the significance level with continuous, symmetric non-normal distributions. One tool to investigate the effect on significance level is simulation.

For example, if you look at the rejection rate when H0 is true for a uniform (light-tailed) or $t_3$ (fairly heavy-tailed) distribution, there's only a very modest effect on the significance level (the actual significance level is slightly higher than the nominal level for the uniform and slightly lower for the $t_3$). We might consider sample sizes of, say, n = 10, 30 and 100.
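A minimal sketch of such a simulation, assuming numpy and scipy: every group is drawn from the same distribution (so H0 is true), and we count how often each test rejects at $\alpha = 0.05$. The helper name `rejection_rates`, the replicate count, and the seed are illustrative choices. Results are Monte Carlo estimates with a standard error of roughly 0.007 at 1000 replicates, so no exact output is claimed; raise `reps` for sharper estimates.

```python
import numpy as np
from scipy import stats


def rejection_rates(sampler, n, k=2, reps=1000, alpha=0.05, seed=0):
    """Estimate how often ANOVA and Kruskal-Wallis reject H0 when all k groups
    of size n come from the same distribution (so H0 is true)."""
    rng = np.random.default_rng(seed)
    rej_anova = rej_kw = 0
    for _ in range(reps):
        groups = [sampler(rng, n) for _ in range(k)]
        if stats.f_oneway(*groups).pvalue < alpha:
            rej_anova += 1
        if stats.kruskal(*groups).pvalue < alpha:
            rej_kw += 1
    return rej_anova / reps, rej_kw / reps


# Symmetric non-normal distributions: light-tailed and fairly heavy-tailed.
samplers = {
    "uniform": lambda rng, n: rng.uniform(0.0, 1.0, size=n),
    "t3": lambda rng, n: rng.standard_t(3, size=n),
}

for name, sampler in samplers.items():
    for n in (10, 30, 100):
        anova, kw = rejection_rates(sampler, n)
        print(f"{name:8s} n={n:3d}  ANOVA={anova:.3f}  Kruskal-Wallis={kw:.3f}")
```

Passing `k=3` or more to `rejection_rates` runs the same check with more groups; with `k=2`, ANOVA is the two-sample t-test.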

If we look at power, the t-test tends to have better power than the Kruskal-Wallis test for the uniform, and worse power than the Kruskal-Wallis test for the $t_3$.
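Power can be estimated the same way by making H0 false. In this sketch (numpy and scipy assumed), the last group is shifted in location by an amount chosen arbitrarily as roughly half a standard deviation of each distribution: uniform(0, 1) has sd of about 0.29, and $t_3$ has sd $\sqrt{3} \approx 1.73$. The helper name `power` and the shift sizes are illustrative assumptions; comparing the two columns for each distribution shows which test rejects more often.

```python
import numpy as np
from scipy import stats


def power(sampler, n, shift, k=2, reps=1000, alpha=0.05, seed=0):
    """Estimate power of ANOVA and Kruskal-Wallis when the last of k groups
    is shifted in location by `shift` (H0 false whenever shift != 0)."""
    rng = np.random.default_rng(seed)
    rej_anova = rej_kw = 0
    for _ in range(reps):
        groups = [sampler(rng, n) for _ in range(k)]
        groups[-1] = groups[-1] + shift  # location-shift alternative
        if stats.f_oneway(*groups).pvalue < alpha:
            rej_anova += 1
        if stats.kruskal(*groups).pvalue < alpha:
            rej_kw += 1
    return rej_anova / reps, rej_kw / reps


# (sampler, shift) pairs; each shift is about half a standard deviation.
cases = {
    "uniform": (lambda rng, n: rng.uniform(0.0, 1.0, size=n), 0.15),
    "t3": (lambda rng, n: rng.standard_t(3, size=n), 0.8),
}

for name, (sampler, shift) in cases.items():
    for n in (10, 30, 100):
        p_anova, p_kw = power(sampler, n, shift)
        print(f"{name:8s} n={n:3d}  ANOVA={p_anova:.3f}  Kruskal-Wallis={p_kw:.3f}")
```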

Similar results occur with a variety of lighter-tailed-than-normal and heavier-tailed-than-normal symmetric distributions. Similar results can be found with more groups, but ANOVA tends to perform relatively better.

So if distributions are symmetric, the main issue is that if tails are heavy you will lose some power, but broadly speaking, ANOVA does quite well.

Similar investigations can be done with skewed distributions; significance levels for ANOVA tend to be more affected by skewness, and in many cases Kruskal-Wallis might be preferred. However, if the distributions are both heavily discrete (taking only a few different values) and skewed, the differences are usually less clear.
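The same H0-true simulation extends directly to skewed data. This sketch assumes numpy and scipy and picks two example distributions for illustration: a continuous right-skewed lognormal (sigma = 1, an arbitrary choice) and a Poisson with mean 1, which is skewed and heavily discrete (mostly the values 0, 1 and 2, so there are many ties, which `scipy.stats.kruskal` corrects for). The helper name, `k = 3` groups, replicate count and seed are all assumptions; the printed rates are estimates, so none are claimed here.

```python
import numpy as np
from scipy import stats


def rejection_rates(sampler, n, k=3, reps=1000, alpha=0.05, seed=0):
    """Estimate how often ANOVA and Kruskal-Wallis reject H0 when all k groups
    of size n come from the same distribution (so H0 is true)."""
    rng = np.random.default_rng(seed)
    rej_anova = rej_kw = 0
    for _ in range(reps):
        groups = [sampler(rng, n) for _ in range(k)]
        if stats.f_oneway(*groups).pvalue < alpha:
            rej_anova += 1
        if stats.kruskal(*groups).pvalue < alpha:
            rej_kw += 1
    return rej_anova / reps, rej_kw / reps


samplers = {
    # Continuous and right-skewed.
    "lognormal": lambda rng, n: rng.lognormal(mean=0.0, sigma=1.0, size=n),
    # Skewed and heavily discrete (few distinct values, many ties).
    "poisson(1)": lambda rng, n: rng.poisson(1.0, size=n).astype(float),
}

for name, sampler in samplers.items():
    for n in (10, 30, 100):
        anova, kw = rejection_rates(sampler, n)
        print(f"{name:10s} n={n:3d}  ANOVA={anova:.3f}  Kruskal-Wallis={kw:.3f}")
```

Unequal group sizes or different shapes across groups can be explored by changing how `groups` is built.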

As you can see, there's not a short, clear-cut answer. However, I'd be surprised if this is the sort of answer they're looking for -- I expect whoever wrote the question expects a short, clear-cut answer. If that's the case, it's likely to be an oversimplification of the actual situation.
