If you intend to run an analysis that assumes normality, such as an ANOVA or a linear regression, how do you determine whether a given method for checking normality is appropriate? What kinds of issues should it check for? What tolerance should the normality test have?
I often have cases with >10k values, and I know Shapiro-Wilk is of little use at that scale: with so many observations it flags even trivial departures from normality (and some implementations cap the sample size). There are other alternatives, but what criteria should be used to evaluate them? What I don't want to do is try a bunch of tests and pick the one that best supports my hypothesis (that the data are normal).
From my understanding, the main reason for normality testing is that deviating too far from a normal distribution can bias some analyses, inflating or deflating the actual Type I error rate relative to the nominal alpha. Is that correct?
If so, could a bootstrapped approach be a general (albeit slow) alternative to normality testing?
- Scale the residuals from each condition to mean=0, sd=1.
- Sample ~100 values (with replacement) from those scaled residuals, and run a one-sample t-test of H0: mean = 0 on the sampled values
- Repeat the previous step ~10k times
- Calculate the proportion of repetitions with p-value < alpha.
- The difference between that proportion and alpha indicates how much the residuals' distribution distorts the false positive rate. So if alpha is 0.05 and the proportion of p-values < alpha is 0.0506, you conclude that the distribution is unlikely to affect the false positive rate. (A code sketch of this procedure follows.)
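To make the idea concrete, here is a minimal Python sketch of that procedure. The details are my assumptions: resampling is with replacement, the t-test is a one-sample test against a mean of zero, and the function name and parameters are just placeholders.

```python
import numpy as np
from scipy import stats

def simulated_type1_rate(residuals, n_per_sample=100, n_reps=10_000,
                         alpha=0.05, rng=None):
    """Estimate the realized Type I error rate of a one-sample t-test
    when sampling from the standardized residual distribution."""
    rng = np.random.default_rng(rng)

    # Scale residuals to mean 0, sd 1 so the t-test's null (mean = 0) is true.
    z = (residuals - residuals.mean()) / residuals.std(ddof=1)

    rejections = 0
    for _ in range(n_reps):
        # Resample with replacement and test H0: mean = 0.
        sample = rng.choice(z, size=n_per_sample, replace=True)
        _, p = stats.ttest_1samp(sample, popmean=0.0)
        rejections += (p < alpha)

    # Proportion of repetitions that rejected at the nominal alpha.
    return rejections / n_reps

# Hypothetical example with skewed residuals:
# rng = np.random.default_rng(1)
# resid = rng.exponential(scale=1.0, size=20_000)
# print(simulated_type1_rate(resid))  # compare to alpha = 0.05
```

The returned proportion would then be compared to the nominal alpha, as in the last step above.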
As much as I would like someone to just say "use [method X]", I'd rather get a general sense of the criteria so I can make the decision myself.
Edit: The post suggested by @nick-cox below generally answers the question.