
I am currently using the R package lme4.

I am fitting linear mixed effects models with random effects:

library(lme4)
mod1 <- lmer(r1 ~ (1 | site), data = sample_set)           # random effects only
mod2 <- lmer(r1 ~ p1 + (1 | site), data = sample_set)      # one fixed effect + random effects
mod3 <- lmer(r1 ~ p1 + p2 + (1 | site), data = sample_set) # two fixed effects + random effects

To compare models, I am using the anova function and looking at differences in AIC relative to the lowest AIC model:

anova(mod1, mod2, mod3)
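
(Note that current versions of lme4 refit the models with maximum likelihood before anova() runs the likelihood ratio tests, but AIC() alone does not refit. If you extract AICs yourself, a minimal sketch, assuming mod1-mod3 from above, would refit with ML first, since REML criteria are not comparable across models with different fixed effects:)

mod1_ml <- update(mod1, REML = FALSE)  # refit with ML rather than REML
mod2_ml <- update(mod2, REML = FALSE)
mod3_ml <- update(mod3, REML = FALSE)
aics <- AIC(mod1_ml, mod2_ml, mod3_ml) # data frame with df and AIC columns
aics$dAIC <- aics$AIC - min(aics$AIC)  # difference from the lowest-AIC model
aics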

The above is fine for comparing models.

However, I also need some easy-to-interpret goodness-of-fit measures for each model. Does anyone have experience with such measures? I have done some research, and there are journal papers on R squared for the fixed effects of mixed effects models:

  • Cheng, J., Edwards, L. J., Maldonado-Molina, M. M., Komro, K. A., & Muller, K. E. (2010). Real longitudinal data analysis for real people: Building a good enough mixed model. Statistics in Medicine, 29(4), 504-520. doi: 10.1002/sim.3775
  • Edwards, L. J., Muller, K. E., Wolfinger, R. D., Qaqish, B. F., & Schabenberger, O. (2008). An R2 statistic for fixed effects in the linear mixed model. Statistics in Medicine, 27(29), 6137-6157. doi: 10.1002/sim.3429

It seems, however, that there is some criticism surrounding the use of measures such as those proposed in the above papers.

Could someone please suggest a few easy-to-interpret goodness-of-fit measures that could apply to my models?

  • I really like the question, but using likelihood ratio tests to determine whether or not fixed effects are needed is **not** the recommended strategy; see the [faq](http://glmm.wikidot.com/faq). So the above is **not** fine for comparing models. – Henrik Sep 25 '12 at 16:39
  • Thanks Henrik. The FAQ you listed is very helpful. It sounds like Markov chain Monte Carlo sampling could be a good strategy to compare my models. – mjburns Sep 26 '12 at 00:07
  • The problem with MCMC is that you can only have simple random effects (as in your example). I would go with the Kenward-Roger approximation to the degrees of freedom, as it also applies to more complicated models. Have a look at the function `mixed()` in my [afex](http://cran.r-project.org/web/packages/afex/index.html) package ([the development version also has parametric bootstrap](https://r-forge.r-project.org/R/?group_id=1450)). See [here for some references](http://stats.stackexchange.com/q/26855/442). – Henrik Sep 26 '12 at 10:21
  • OK Henrik. I managed to get your mixed() function working from the afex package. Could you please advise on how I could use afex to compare models? What measure(s) could I use to decide if one model is more plausible than another? Thanks. – mjburns Sep 26 '12 at 11:34
  • This is not easily answered; perhaps you should ask a separate question giving more details. But just briefly, afex tries to help you assess whether certain effects (or better, models including those effects) are significant. To this end it uses `KRmodcomp` from the package `pbkrtest`. You can also use `KRmodcomp` directly to compare models. – Henrik Sep 27 '12 at 13:48
  • Thanks Henrik. To you and others, I came across this potentially useful article: Liu, H., Zheng, Y., & Shen, J. (2008). Goodness-of-fit measures of R2 for repeated measures mixed effect models. Journal of Applied Statistics, 35(10), 1081-1092. doi: 10.1080/02664760802124422. What are everyone's views on potentially using these goodness of fit measures? – mjburns Oct 03 '12 at 04:20
  • Do you think my answer [here](https://stats.stackexchange.com/a/439842/136579) is useful? If so, I could provide an answer similar to that. – statmerkur Dec 22 '19 at 14:55

1 Answer


There is no such thing as an easy-to-interpret goodness-of-fit measure for linear mixed models :)

The random effect fit (mod1) can be measured by the ICC and ICC2 (the ICC is the ratio of the variance accounted for by the random effects to the total variance). The psychometric R package includes functions to extract them from an lme object.
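
For lme4 fits like mod1 above, a minimal sketch of computing the ICC by hand from the variance components (rather than via psychometric) might look like this:

vc <- as.data.frame(VarCorr(mod1))       # variance components of mod1
tau2 <- vc$vcov[vc$grp == "site"]        # between-site (random intercept) variance
sigma2 <- vc$vcov[vc$grp == "Residual"]  # residual variance
tau2 / (tau2 + sigma2)                   # ICC: share of variance due to site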

It is possible to use R2 to assess the fixed effects (mod2, mod3), but this can be tricky: when two models show a similar R2, one may in fact be more "accurate", with the difference masked because its fixed factor "subtracts" a greater variance component from the random effect. On the other hand, a greater R2 for the highest-order model (e.g. mod3) is easy to interpret. There is a nice discussion of this in Baayen's chapter on mixed models (Baayen, R. H. (2008). Analyzing Linguistic Data: A Practical Introduction to Statistics Using R. Cambridge: Cambridge University Press; the chapter is available at http://www.sfs.uni-tuebingen.de/~hbaayen/publications/baayenCUPstats.pdf), and his tutorial is also very clear.
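
As a rough illustration only (this is not one of the published R2 statistics from the papers cited in the question), one simple R2-style summary is the squared correlation between the observed response and the fitted values, assuming mod2, mod3, and sample_set from the question:

# Crude R2: squared correlation of observed vs. fitted values. Note that
# the fitted values mix fixed- and random-effect contributions.
r2 <- function(model, y) cor(y, fitted(model))^2
c(mod2 = r2(mod2, sample_set$r1), mod3 = r2(mod3, sample_set$r1))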

A possible solution is to consider each variance component separately and then use them to compare the models, as sketched below.
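
A minimal sketch of that idea, assuming mod1-mod3 from the question: tabulate the site and residual variances per model and watch how they change as fixed effects are added.

vcs <- sapply(list(mod1 = mod1, mod2 = mod2, mod3 = mod3), function(m) {
  vc <- as.data.frame(VarCorr(m))
  c(site = vc$vcov[vc$grp == "site"],          # random intercept variance
    residual = vc$vcov[vc$grp == "Residual"])  # residual variance
})
vcs  # columns are models, rows are variance components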

  • Can you tell us what reference you are referring to when you say Baayen's chapter? – KH Kim Feb 16 '16 at 18:04
  • yeah, the reference is broken! – Tomas Jul 11 '19 at 05:20
  • I have found this citation, not sure if that's it, but cannot get the PDF anywhere: *Baayen, R. H., Analyzing Linguistic Data: A Practical Introduction to Statistics Using R. Cambridge: Cambridge University Press, 2008. Pp. 368. ISBN-13: 978-0-521-70918-7.* – Tomas Jul 11 '19 at 05:26
  • Come on guys. Where is your google-foo? Do a search on "baayenCUPstats.pdf"; first hit: http://www.sfs.uni-tuebingen.de/~hbaayen/publications/baayenCUPstats.pdf – DWin Nov 12 '19 at 21:00