Normal distribution necessary for linear-mixed effects? (R)

Question

This is my first post on this site. I'm a linguistics graduate student who is struggling to grasp the basics of statistics.

I've run a questionnaire in which participants had to rate sentences from 1 (totally unacceptable) to 7 (fully acceptable). I had two different factors with two levels each (a 2x2 design).

Following previous papers whose authors used the same design, I have log-transformed the ratings and then I have calculated z-scores by subject:

```r
dat$rating.log <- log(dat$rating)
dat$z.score.rating2 <- ave(dat$rating.log, dat$subject, FUN = scale)
```

After that, I treated ratings more than 2.5 standard deviations above or below the mean as outliers and removed them (also following previous studies).
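For readers who want to reproduce this step, here is a minimal sketch of the filtering. It assumes the cutoff is applied to the by-subject z-scores computed above; the toy data frame and the names `keep` and `dat.clean` are hypothetical and are not from my actual script.

```r
# Hypothetical toy data: 4 subjects x 10 ratings on a 1-7 scale
set.seed(1)
dat <- data.frame(
  subject = factor(rep(1:4, each = 10)),
  rating  = sample(1:7, 40, replace = TRUE)
)

# Log-transform, then z-score within each subject.
# as.numeric() drops the one-column matrix that scale() returns.
dat$rating.log <- log(dat$rating)
dat$z.score.rating2 <- ave(dat$rating.log, dat$subject,
                           FUN = function(x) as.numeric(scale(x)))

# Keep observations within 2.5 SD of the subject's mean (assumed cutoff rule)
keep <- abs(dat$z.score.rating2) <= 2.5
dat.clean <- dat[keep, ]

# Proportion of observations removed
mean(!keep)
```

Running the analysis on both `dat` and `dat.clean` is an easy way to check whether the removal changes any conclusions.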

I report here the histogram for the cleaned data:

And these are the histograms per condition:

As you can see, the data are far from normal. My question is the following: does this matter if I want to fit a linear mixed-effects model? If it does, how can I normalize the data?

Thank you very much!

1.) It's not important how your DV is distributed. Important is the distribution of the residuals. 2.) There are [better ways](http://www.ats.ucla.edu/stat/r/dae/mlogit.htm) of dealing with a categorical DV. 3.) I'm very skeptical of your outlier removal. — Roland, Jun 07 '16 at 15:19
Why do you consider these "outliers"? Why do you remove them? Often, "outliers" is a synonym for "interesting values". Sometimes it's better to focus on these values and not to discard them from the analysis. — Roland, Jun 07 '16 at 15:50
@Roland I'm aware of that but I'm just following standard procedure in my field. I've only removed 1.25% of the data and I've analyzed the data with and without outliers with no difference in the results. Thanks for your feedback! — serlosan, Jun 07 '16 at 15:59
If everyone propagates bad standards in their field, the standards will never improve. — Roland, Jun 07 '16 at 16:06

score 10 · Accepted Answer · edited Apr 13 '17 at 12:44

As per the comment by @Roland, there is no requirement for the response variable itself to be normally distributed in a linear mixed model (LMM). It is the distribution of the response, conditional on the random effects, that is assumed to be normal. In practice, this means that the residuals should be approximately normally distributed. You can therefore proceed with fitting an LMM and then check whether the residuals look normal. Treating Likert item responses as continuous data is a contentious topic; for example, see here:

Parametric tests and Likert Scales (Ordinal data) - Two different views

This simulation study plays down the concerns. Clearly, the fewer levels the Likert scale has, the more of a problem this is going to be. This presentation from one of the authors of the lme4 package for R seems to suggest that 10 or more levels is OK.

So with a 7-point scale, there is a good chance that the residuals will not be normally distributed, in which case you can look at fitting a generalised linear mixed model for ordinal data. Two R packages that fit such models are `ordinal` and `MCMCglmm`.
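As a rough sketch of that workflow: fit the LMM with `lme4::lmer()`, look at the residuals, and if they look clearly non-normal, fit a cumulative link mixed model with `ordinal::clmm()`. The simulated data, the column names (`factorA`, `factorB`, `trial`), and the random-intercept-only structure are assumptions for illustration; your own data and design may call for a different random-effects structure.

```r
library(lme4)     # lmer() for the linear mixed model
library(ordinal)  # clmm() for the cumulative link mixed model

# Simulated stand-in data (hypothetical): 30 subjects x 16 trials, 2x2 design
set.seed(42)
n_subj <- 30
dat <- expand.grid(subject = factor(1:n_subj), trial = 1:16)
dat$factorA <- factor(ifelse(dat$trial %% 2 == 0, "a1", "a2"))
dat$factorB <- factor(ifelse(dat$trial %% 4 < 2, "b1", "b2"))

# Latent score with a subject-specific intercept, rounded onto a 1-7 scale
subj_int <- rnorm(n_subj, sd = 0.8)
latent <- 4 + 1.0 * (dat$factorA == "a2") - 0.8 * (dat$factorB == "b2") +
  subj_int[as.integer(dat$subject)] + rnorm(nrow(dat))
dat$rating <- pmin(7, pmax(1, round(latent)))

# Step 1: fit the LMM, treating the rating as continuous
m_lmm <- lmer(rating ~ factorA * factorB + (1 | subject), data = dat)

# Step 2: inspect the residuals, not the raw response
res <- resid(m_lmm)
qqnorm(res); qqline(res)       # points should lie close to the line
plot(fitted(m_lmm), res)       # look for structure or uneven spread

# Step 3 (if the residuals look poor): treat the rating as ordinal
dat$rating.ord <- ordered(dat$rating)
m_clmm <- clmm(rating.ord ~ factorA * factorB + (1 | subject), data = dat)
summary(m_clmm)
```

With discrete ratings the residual plots will show banding, so judge them for gross departures from normality rather than expecting a perfect straight line.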

also the relatively new `brms` package for ordinal data ... and `mixor` ... — Ben Bolker, Jun 08 '16 at 00:16

score -5 · Answer 2 · answered Jun 07 '16 at 15:20

If you use something like a generalized linear mixed model (GLMM), then the response variable doesn't have to be Gaussian. This is the key difference between a GLMM and an LMM.

Tophat

There is no requirement for the DV to be normally distributed for LMMs. – Roland, Jun 07 '16 at 15:51
What kind of GLMM? (i.e. what "family" or assumed conditional distribution of the response?) – Ben Bolker, Jun 08 '16 at 00:17
