
If you intend to run some analysis like an ANOVA or linear regression that assumes normality, how do you determine if a given method for checking normality is appropriate? What kinds of issues should it check for? What tolerance should the normality test have?

I often have cases with >10k values, and I know that Shapiro-Wilk rarely works for such large datasets. There are other alternatives, but what criteria should be used to evaluate the other options? What I don't want to do is try a bunch and pick the one that best supports my hypothesis (that the data is normal).
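To illustrate the Shapiro-Wilk issue, here is a rough sketch in Python (assuming SciPy; the t-distribution with 10 degrees of freedom is just a stand-in for a distribution that is "nearly normal" for practical purposes):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# A stand-in for "nearly normal" data: t with 10 df (symmetric, slightly heavy tails).
x_small = rng.standard_t(df=10, size=100)
x_large = rng.standard_t(df=10, size=10_000)

# With n = 100 the Shapiro-Wilk test typically does not reject;
# with n = 10,000 it usually rejects the same shape of distribution,
# and SciPy also warns that the p-value may be inaccurate for N > 5000.
stat_small, p_small = stats.shapiro(x_small)
stat_large, p_large = stats.shapiro(x_large)
print(p_small, p_large)
```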

From my understanding, the main reason for normality testing is that deviating too far from a normal distribution can bias some analyses, inflating or deflating the effective alpha (type I error rate). Is that correct?

If so, could a bootstrapped approach be a general (albeit slow) alternative to normality testing?

  1. Scale the residuals from each condition to mean=0, sd=1.
  2. Sample ~100 values (with replacement) from those scaled residuals and run a t-test on the sampled values (e.g. a one-sample t-test against a mean of 0, so the null hypothesis is true by construction).
  3. Repeat the previous step ~10k times
  4. Calculate the proportion of repetitions with p-value < alpha.
  5. The difference between that proportion and alpha indicates how much the residuals' distribution distorts the false positive rate. So if alpha is 0.05 and the proportion of p-values < alpha is 0.0506, you would conclude that the distribution is unlikely to meaningfully affect the false positive rate. (A rough code sketch of this procedure follows.)
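Here is a minimal sketch of that procedure in Python (assuming SciPy; the exponential residuals at the end are just an example of a skewed distribution, and the one-sample t-test against 0 is my reading of step 2):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def simulated_false_positive_rate(residuals, n_per_sample=100, n_reps=10_000, alpha=0.05):
    """Estimate the false positive rate of a t-test when resampling from the
    observed residual distribution (null hypothesis true by construction)."""
    residuals = np.asarray(residuals, dtype=float)
    # Step 1: scale the residuals to mean 0, sd 1
    scaled = (residuals - residuals.mean()) / residuals.std(ddof=1)

    rejections = 0
    for _ in range(n_reps):
        # Steps 2-3: resample ~100 values and run a one-sample t-test against 0
        sample = rng.choice(scaled, size=n_per_sample, replace=True)
        p = stats.ttest_1samp(sample, popmean=0).pvalue
        rejections += p < alpha

    # Steps 4-5: proportion of rejections, to be compared with the nominal alpha
    return rejections / n_reps

# Example with strongly skewed residuals (centred exponential)
resid = rng.exponential(scale=1.0, size=10_000) - 1.0
print(simulated_false_positive_rate(resid))
```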

As much as I would like someone to just say "use [method X]", I'd rather get a general sense of the criteria so I can make the decision myself.

Edit: This post, suggested by @nick-cox below, generally answers the question.

sharoz
  • Don't test the assumption. Instead, visualize the residuals of the model and look at QQ plots. You can determine if the residuals are sufficiently normal from looking at the plot. It's more of an art than an exact science. – Demetri Pananos Mar 16 '21 at 14:09
  • I'm trying to avoid that. What is "normal enough"? Is what looks normal enough to me the same as what looks normal enough to you? What happens if the two people involved are author and reviewer? Who does the editor conclude has the proper interpretation of a QQ plot? In the end, you often get a binary decision, like "Use an ANOVA or go non-parametric". So this is a case where I'd rather have a binary outcome. – sharoz Mar 16 '21 at 14:12
  • But more generally, I'd like an interpretation of normality relative to the outcome of an analysis. I don't see how to translate the degree of deviation in a QQ plot into the appropriateness of an analysis that assumes normality. – sharoz Mar 16 '21 at 14:14
  • "I often have cases with >10k values, and I know that Shapiro-Wilk rarely works for such large datasets." You've answered your own question there. You're aware that tests often answer the wrong question, because necessarily they can't answer the question "Is deviation from normality important enough for my purposes to imply doing something different?". (Presumably "rarely works" here means it is over-sensitive for large enough samples; if you mean something else, please spell it out.) – Nick Cox Mar 16 '21 at 14:16
  • To measure deviation from normality, use skewness and kurtosis, or (better) L-moment ratios. – Nick Cox Mar 16 '21 at 14:18 (a small code sketch of these measures follows the comments)
  • @NickCox Yes!!! Thank you. Here's the bit I found most helpful: "a test for normality is directed against a class of alternatives if it is sensitive to alternatives from that class, but not sensitive to alternatives from other classes. Typical examples are tests that are directed towards skew or kurtotic alternatives. The simplest examples use the sample skewness and kurtosis as test statistics. Directed tests of normality are arguably often preferable to omnibus tests (such as the Shapiro-Wilk and Jarque-Bera tests) since it is common that only some types of non-normality are of concern" – sharoz Mar 16 '21 at 14:28
  • @sharoz The fact that you face a decision between ANOVA and a nonparametric method is why Harrell suggests going straight to the nonparametric method: https://stats.stackexchange.com/a/122629/247274, https://twitter.com/f2harrell/status/1033884919884328960. – Dave Mar 16 '21 at 14:41
  • @Dave Here is a recent counterargument I read to that approach: https://www.biorxiv.org/content/10.1101/498931v2.full – sharoz Mar 16 '21 at 14:45
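Following up on the skewness / kurtosis / L-moment suggestion: a small self-contained sketch of those directed measures in Python (the probability-weighted-moment formulas for the L-moment ratios are standard, but this particular implementation is mine, not from the thread):

```python
import numpy as np
from scipy import stats

def l_moment_ratios(x):
    """Sample L-skewness and L-kurtosis via probability-weighted moments."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    i = np.arange(1, n + 1)
    b0 = x.mean()
    b1 = np.sum((i - 1) / (n - 1) * x) / n
    b2 = np.sum((i - 1) * (i - 2) / ((n - 1) * (n - 2)) * x) / n
    b3 = np.sum((i - 1) * (i - 2) * (i - 3) / ((n - 1) * (n - 2) * (n - 3)) * x) / n
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l3 / l2, l4 / l2  # L-skewness ~0 and L-kurtosis ~0.123 for a normal

rng = np.random.default_rng(2)
resid = rng.exponential(scale=1.0, size=10_000) - 1.0

print("skewness:", stats.skew(resid))             # 0 for a normal distribution
print("excess kurtosis:", stats.kurtosis(resid))  # 0 for a normal distribution
print("L-skewness, L-kurtosis:", l_moment_ratios(resid))
```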

0 Answers