I have a mixed design that includes both repeated-measures (condition) and between-subjects (sex and genotype) factors. I would like to assess whether my data meet the normality assumptions for 1) general linear models (repeated measures) and 2) linear mixed models using SPSS. What is the best method for doing so? Assuming that I find violations of the normality assumptions, how do I select the optimal method(s) for transformation?
2 Answers
I wouldn't do a test of normality (on the residuals). Make a quantile-quantile plot of the residuals (`qqnorm()` in R). Now, put that aside and make some more, but this time of just random normal data with the same N as your residuals. How does your plot look compared to the simulations? If it's in the range of what you might expect to see, you can treat the residuals as normal. If not, then you may have some concerns.
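The comparison against simulated normal samples can be sketched in code. This is only an illustration in Python using the standard library, not SPSS or R syntax: the function names are hypothetical, and the default of 19 simulations is an arbitrary choice. It sorts the standardized residuals, builds a pointwise min-max envelope from simulated normal samples of the same N, and reports what fraction of the observed quantiles fall outside it.

```python
import random
from statistics import NormalDist, mean, stdev


def standardize(values):
    """Center and scale a sample so different samples are comparable."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def theoretical_quantiles(n):
    """Normal quantiles at plotting positions (i - 0.5) / n, the x-axis of a QQ plot."""
    nd = NormalDist()
    return [nd.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


def simulation_envelope(n, n_sims=19, seed=0):
    """Pointwise min and max of sorted, standardized normal samples of size n.

    n_sims and seed are illustrative defaults (assumptions, not prescribed values).
    """
    rng = random.Random(seed)
    sims = [
        sorted(standardize([rng.gauss(0.0, 1.0) for _ in range(n)]))
        for _ in range(n_sims)
    ]
    lower = [min(col) for col in zip(*sims)]  # smallest simulated value per rank
    upper = [max(col) for col in zip(*sims)]  # largest simulated value per rank
    return lower, upper


def fraction_outside(residuals, n_sims=19, seed=0):
    """Share of observed order statistics lying outside the simulated envelope."""
    obs = sorted(standardize(residuals))
    lower, upper = simulation_envelope(len(obs), n_sims, seed)
    outside = sum(1 for o, lo, hi in zip(obs, lower, upper) if o < lo or o > hi)
    return outside / len(obs)
```

To draw the picture, plot `theoretical_quantiles(n)` against the sorted standardized residuals and overlay `lower` and `upper`. A few points outside the band are expected even for normal data; long runs outside it, especially in the tails, are what would raise concerns.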

- +1, I would note that in SPSS (which the OP mentions they are using) one can get a QQ plot of an observed distribution vs. a theoretical one with the `PPLOT` command. Also, from some quick googling, it appears there are techniques for [95% confidence intervals for QQ plots](http://exploringdatablog.blogspot.com/2011/03/many-uses-of-q-q-plots.html). It may be easier, though, to code up 19 simulations and plot the min-max at each quantile, as you suggest. – Andy W Jul 25 '12 at 02:00
- Interesting suggestion – John Jul 25 '12 at 02:53
Check the model residuals for normality. The Shapiro-Wilk test is what I use, but you need to be aware of the shortcomings of goodness-of-fit tests. This has been discussed many times on this site. Non-normality may not be a problem if it is not too severe. You need to check for constant variance of the residuals as well. If normality fails badly, a transformation such as the Box-Cox power transformation is an option, but there are other alternatives. Nonparametric rank tests that correspond to fixed-effects ANOVA designs can be used. I am not sure what to do with the random effects in the mixed models or the repeated-measures aspect, but there is a bootstrap approach that should work for any of these designs.
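For readers unfamiliar with the Box-Cox transformation, here is a minimal Python sketch using only the standard library. It makes simplifying assumptions: the response is strictly positive, and lambda is chosen by a grid search on the profile log-likelihood of a plain i.i.d. sample. In a real mixed design, lambda would be chosen against the full model's residuals. The function names and the grid are hypothetical.

```python
import math
import random


def boxcox_transform(y, lam):
    """Box-Cox power transform: (y**lam - 1) / lam, or log(y) when lam is 0."""
    if any(v <= 0 for v in y):
        raise ValueError("Box-Cox requires strictly positive data")
    if abs(lam) < 1e-12:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]


def profile_loglik(y, lam):
    """Profile log-likelihood of lam for an i.i.d. sample (no covariates).

    Assumption: a mean-only model; this is a simplification of the
    model-based version, which would use the fitted model's residuals.
    """
    z = boxcox_transform(y, lam)
    n = len(z)
    m = sum(z) / n
    var = sum((v - m) ** 2 for v in z) / n  # maximum-likelihood variance
    jacobian = (lam - 1.0) * sum(math.log(v) for v in y)
    return -0.5 * n * math.log(var) + jacobian


def best_lambda(y, grid=None):
    """Grid search for the lam that maximizes the profile log-likelihood."""
    if grid is None:
        grid = [i / 10 for i in range(-20, 21)]  # -2.0 to 2.0 in steps of 0.1
    return max(grid, key=lambda lam: profile_loglik(y, lam))
```

A lambda near 1 suggests no transformation is needed, near 0.5 a square root, and near 0 a log. In practice it is common to round the estimate to one of these interpretable values rather than use the exact maximum.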

- If I have _n_ [**dependent** groups](http://stats.stackexchange.com/q/11887/5003), do I have to check normality for _n_ batches of residuals or just once for all residuals? – abc Dec 17 '12 at 12:23