Do I need to check for normality in a one-way test (in R)?

Question

http://www.sthda.com/english/wiki/one-way-anova-test-in-r

It says there that when Levene's test gives a small p-value, we can use the alternative

oneway.test(..., var.equal = FALSE)

but it doesn't say how to check the normality of the residuals, or whether I need to check it at all.

Follow-up question: regardless of the result of Levene's test (e.g. if it gives a large p-value and I use ordinary ANOVA via the lm function), can I use the CLT to assume normality of the residuals if n >> 30?

A quick google search says that Welch's one way ANOVA is robust to normality violation, so I'd say you'll be fine. — LAP, Jan 25 '18 at 14:31
Possible duplicate of [Is normality testing 'essentially useless'?](https://stats.stackexchange.com/questions/2492/is-normality-testing-essentially-useless) — Kodiologist, Jan 25 '18 at 20:44
Not necessarily definitive, but the first line in the [documentation for oneway.test](https://www.rdocumentation.org/packages/stats/versions/3.4.3/topics/oneway.test) is "Test whether two or more samples from normal distributions have the same means." — Sal Mangiafico, Jan 26 '18 at 15:16
As to your followup question, see the answer by Frank Harrell [here](https://stats.stackexchange.com/questions/195452/role-of-central-limit-theorem-in-one-way-anova). — Sal Mangiafico, Jan 26 '18 at 15:23
@LAP, I think with such a broad statement, you'll have to bring some quality citations. I'm rather suspicious of the conclusion. — Sal Mangiafico, Jan 26 '18 at 16:18
@LAP, the issue may be less about the sources per se than the overly-broad statement. I suspect Welch's anova is somewhat robust to deviations from normality, but the question is "How much?" Unfortunately, most sources aren't particularly helpful on this matter, especially for students and young analysts. — Sal Mangiafico, Jan 27 '18 at 13:51

kjetil b halvorsen · Answer 1 · 2020-09-05T16:09:18.020

There is a very similar question about choosing between a t-test and a nonparametric test, with very good answers, most of which can be applied to ANOVA. See also this very relevant post: Role of central limit theorem in ANOVA (with an answer by Frank Harrell).

But the practice you are alluding to, choosing which test to apply after seeing the results of some preliminary test of normality, is strongly advised against. If you are not reasonably sure about the normality assumption, choose a test that does not depend on it, at the outset! In R that could be kruskal.test.
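A minimal sketch of choosing the test up front, using simulated data. The data frame `d`, the group labels, and the exponential distributions are assumptions made for illustration and do not come from the question; `kruskal.test` and `oneway.test` are both in base R's stats package.

```r
# Simulated, right-skewed data in three groups (illustrative only).
set.seed(1)
d <- data.frame(
  y = c(rexp(40, rate = 1), rexp(40, rate = 1), rexp(40, rate = 0.5)),
  g = factor(rep(c("a", "b", "c"), each = 40))
)

# Rank-based test, chosen in advance because normality is doubtful.
kw <- kruskal.test(y ~ g, data = d)

# Welch's one-way test, for comparison: it allows unequal variances
# but is still derived under a normality assumption.
welch <- oneway.test(y ~ g, data = d, var.equal = FALSE)

kw$p.value
welch$p.value
```

Note that the two tests do not address exactly the same hypothesis: the Welch test compares means, while Kruskal-Wallis is based on ranks, so the choice should also reflect which question you want answered.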

That practice is called a multi-step procedure, and in general it does not have the usual properties of the individual tests, so you can no longer trust that the computed p-values are correct. You could of course still run normality tests or look at QQ plots, but use them to learn for the future (you will probably see similar problems again).
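One way to see what a multi-step procedure does is to simulate it under a true null hypothesis and estimate how often it rejects. The sketch below is only an illustration under assumed settings: the helper `two_step_p` is hypothetical, and the group size of 15, the exponential data, and the Shapiro-Wilk pretest at the 0.05 level are arbitrary choices, not something taken from the question.

```r
set.seed(3)

# One run of the two-step procedure under a true null hypothesis:
# all three groups come from the same skewed (exponential) distribution.
two_step_p <- function(n = 15) {
  y <- rexp(3 * n)
  g <- factor(rep(c("a", "b", "c"), each = n))
  res <- residuals(lm(y ~ g))
  if (shapiro.test(res)$p.value > 0.05) {
    # Pretest did not reject normality: use classical ANOVA.
    anova(lm(y ~ g))[["Pr(>F)"]][1]
  } else {
    # Pretest rejected normality: switch to Kruskal-Wallis.
    kruskal.test(y ~ g)$p.value
  }
}

# Repeat many times and estimate the type I error rate
# of the combined procedure at the nominal 0.05 level.
p <- replicate(2000, two_step_p())
mean(p < 0.05)
```

Comparing the estimated rate with the nominal 0.05, and repeating this for other sample sizes and distributions, shows how the combined procedure behaves in a setting close to your own.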

You could look at the R package WRS, which has modern robust and nonparametric methods (the related package WRS2 is on CRAN). See this expository article by Rand Wilcox: New statistical methods would let researchers deal with data in better, more robust ways.

Your follow-up question: If the sample size is large in all the groups, then you can use the central limit theorem to justify normal-based inference, but not to justify normality of the residuals. The CLT is about means, not about individual random variables. The analysis will then rest only on large-sample approximations, which can only guarantee approximately correct significance levels. Alternative analyses could well give much more powerful tests, so if you are not reasonably sure (from theory or experience) about the normality assumption, it is better to plan to use tests that do not depend on it.
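The distinction between means and individual observations can be checked with a small simulation. This is a sketch under assumed settings: the helper `skew` is a hypothetical moment-based skewness function written here for illustration, and the exponential distribution and n = 200 are arbitrary choices.

```r
set.seed(4)

# Simple moment-based sample skewness (0 for a symmetric distribution).
skew <- function(x) mean((x - mean(x))^3) / sd(x)^3

n <- 200

# Individual observations: a single large sample stays right-skewed,
# no matter how large n is (the exponential has theoretical skewness 2).
y <- rexp(n)
raw_skew <- skew(y)

# Sample means: the distribution of the mean of n observations is
# much closer to symmetric (theoretical skewness 2 / sqrt(n)).
means <- replicate(5000, mean(rexp(n)))
mean_skew <- skew(means)

raw_skew
mean_skew
```

A larger n makes the means look more normal, but leaves the skewness of the observations, and therefore of the residuals, unchanged.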
