
I built a generalized linear mixed-effects model (GLMM) using the `glmer` function from the `lme4` package in R to model species richness around aquaculture sites based on significant explanatory variables, following Zuur et al. (2009), *Mixed Effects Models and Extensions in Ecology with R*. The data set used to build the model has ~1000 samples, and my best model is:

    Mod1 <- glmer(Richness ~ Distance + Depth + Substrate + 
                   Beggiatoa + Distance*Beggiatoa + 
                   (1|Site/transect), family = poisson, 
                   data = mydata)

Now I have a full data set collected at different sites, and I want to assess how this model performs on the new data.

I have never assessed model performance on a new data set before; my first approach would be to compare $R^2$ values between data sets and the $p$-values of each explanatory variable. However, I'm sure there are much better options. What technique would you use to assess model performance on a new data set?

Mud Warrior
  • Aside from $R^2$, you could consider [RMSE](https://en.wikipedia.org/wiki/Root-mean-square_deviation) and [MAD](https://en.wikipedia.org/wiki/Average_absolute_deviation) as somewhat standard metrics to report. You should also look at the relevant residual plots. You mention the use of `glmer`, but you don't mention what kind of response variable `Richness` is. Can you please elaborate on this? I think it will make your question easier to understand. – usεr11852 Aug 13 '16 at 00:20
  • @user11852 Thank you for your reply, and sorry that I forgot to mention the family of the response variable. I've updated the model in my question; I'm using a Poisson distribution for the response `Richness`. I started to look into cross-validation using the `caret` package in `R`; is this something you would advise pursuing in addition to $R^2$, RMSE and MAD? – Mud Warrior Aug 13 '16 at 00:45
  • Sorry for the delay; I never saw your comment because my username was misspelt. Thanks for the clarifications; please see my answer below. – usεr11852 Aug 13 '16 at 23:10

1 Answer


Using RMSE or even the standard $R^2$ is somewhat unnatural for a count response variable. The median absolute error/deviation (MAD) would definitely be more natural for integer values, but it does not directly reflect a "variance explained" quantity.

Given that you are particularly interested in GLMMs (rather than GLMs), I think it is appropriate to look at something specific to GLMMs, such as the `r.squaredGLMM` function implemented in R's `MuMIn` package. This is essentially the $R^2$ for GLMMs as described by Nakagawa & Schielzeth in their paper *A general and simple method for obtaining R² from generalized linear mixed-effects models*. Because you have a mixed model with fixed ($X$) and random ($Z$) covariates, it makes sense to report a conditional $R^2$ as well as a marginal $R^2$. You could also report Nagelkerke's $R^2$ (e.g. using `fmsb::NagelkerkeR2`); it is somewhat standard for GL(M)Ms too.
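
For illustration, here is a minimal sketch of computing these quantities, assuming `Mod1` is the fitted `glmer` model from the question:

    ## Marginal and conditional R2 of Nakagawa & Schielzeth via MuMIn
    library(MuMIn)
    r.squaredGLMM(Mod1)
    ## R2m: marginal R2    (variance explained by the fixed effects only)
    ## R2c: conditional R2 (variance explained by fixed + random effects)

    ## Nagelkerke's R2 is available as fmsb::NagelkerkeR2(), but note that it
    ## expects a glm-style fit and may not accept merMod (S4) objects directly.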

Having mentioned the above $R^2$ quantities, please note that it is debatable whether $R^2$ measures are really relevant for Poisson regression (or GLMMs in general). Pseudo- and generalised $R^2$ measures come in many variants; see, for example, the list here and the excellent discussion in this CV thread.

My advice would be to use MAD as well as a specialised $R^2$, but do not focus too much on the $R^2$. Reporting $p$-values with regard to a model's performance is rather pointless.

Regarding your latest comment: using the `caret` package, and cross-validation in general, is an excellent idea; you should do it. Note, though, that cross-validation is best suited to model selection, not to directly estimating a model's performance on unseen data. Furthermore, irrespective of the $k$ used for your $k$-fold cross-validation scheme, run at least 100 rounds/repeats of your $k$-fold procedure to ensure your results are stable. To estimate out-of-sample performance, use hold-out data: a chunk of your data that you never touched during training. Report MAD and the (specialised) $R^2$ for the model's performance on that data. See Zack's answer in the (awesome) thread here for more details about this.
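
To make this more concrete, here is a rough sketch of a hold-out split combined with repeated $k$-fold cross-validation, written with a plain loop rather than `caret`; the object and column names (`mydata`, `Richness`, `Site`, ...) are assumed from the question, and the fold/repeat counts are only examples:

    ## A rough sketch: hold-out split plus repeated 5-fold cross-validation,
    ## using MAD as the error metric. Adapt names and counts to your data.
    library(lme4)
    set.seed(1)

    form <- Richness ~ Distance + Depth + Substrate +
            Beggiatoa + Distance:Beggiatoa + (1 | Site/transect)

    ## Keep ~20% of the data completely untouched as a hold-out set
    test_idx <- sample(nrow(mydata), size = round(0.2 * nrow(mydata)))
    train    <- mydata[-test_idx, ]
    test     <- mydata[test_idx, ]

    ## 100 repeats of 5-fold CV on the training data
    cv_mad <- replicate(100, {
      folds <- sample(rep(1:5, length.out = nrow(train)))
      sapply(1:5, function(k) {
        fit  <- glmer(form, family = poisson, data = train[folds != k, ])
        pred <- predict(fit, newdata = train[folds == k, ],
                        type = "response", allow.new.levels = TRUE)
        median(abs(train$Richness[folds == k] - pred))
      })
    })
    mean(cv_mad)  # cross-validated MAD estimate

    ## Out-of-sample MAD on the untouched hold-out data
    final <- glmer(form, family = poisson, data = train)
    pred  <- predict(final, newdata = test, type = "response", allow.new.levels = TRUE)
    median(abs(test$Richness - pred))

With roughly 1000 samples, 100 repeats of 5-fold CV means fitting the GLMM 500 times, so this can take a while. Also note that `allow.new.levels = TRUE` lets `predict` handle transects or sites that were not seen during fitting, which is exactly the situation with your new data.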

usεr11852
  • Wow, thank you very much @usεr11852, this is exactly what I was looking for! I'll need a couple of days to work through all your suggestions, but I can't wait to see the results on the new data set. – Mud Warrior Aug 14 '16 at 00:36
  • Cool, I am glad I could help! Good luck with the rest of your analysis! – usεr11852 Aug 14 '16 at 07:34
  • I tried `NagelkerkeR2` from `fmsb` and `mad` from the `stats` package, but I get an error message saying that they don't work for S4 objects. Is it possible that they are not implemented for GLMMs yet? – Mud Warrior Aug 16 '16 at 19:05
  • I am not sure. I am very busy this week; I will try to look it up during the weekend, but methodologically this is in general what you want to check. Try to ensure that all the packages in your R installation are up to date. Also check that you don't have `NA`s in your model; maybe these are problematic. If the issue remains, you should raise it on StackOverflow under the tag `R`; it will be a software rather than a stats issue (if you open this question here it will most likely be closed). – usεr11852 Aug 16 '16 at 23:07
  • Thank you for your reply and your help. All packages are up to date, and I verified that there are no `NA`s in the model. I'll continue to search for a solution and update here if I find the answer. – Mud Warrior Aug 17 '16 at 15:27
  • I am sorry to read that. If SO doesn't help, try the R-sig-ME mailing list. – usεr11852 Aug 17 '16 at 17:57
  • I asked how to calculate `MAD` for my GLMM on SO and received a great answer; see the link at the end. I still have to figure out whether I should compare the `MAD` from the model on the **original data set** (_within-sample error_) to the `MAD` on the **new data set** (_out-of-sample error_), but I'm getting closer to the solution. http://stackoverflow.com/questions/39001178/how-can-i-compute-the-median-absolute-deviation-mad-for-generalized-linear-mix – Mud Warrior Aug 19 '16 at 14:22
  • @MudWarrior: Cool! Prof. Bolker's posts are always very instructive! (He is one of the authors of the current `lme4` package.) – usεr11852 Aug 20 '16 at 06:30
  • Yes, I was very happy to receive the excellent answer from Prof. Bolker! By any chance, would you have any reference(s) to suggest that used `mad` in a similar context (i.e. assessing model performance on a new data set)? – Mud Warrior Aug 22 '16 at 14:13
  • Hmmm... Davies & Gather's Chapt. 25, "*Robust statistics*", in the *Handbook of Computational Statistics* by Gentle et al.; Hulland et al.'s Chapt. 14, "*Modeling Customer Satisfaction: A Comparative Performance Evaluation of Covariance Structure Analysis Versus Partial Least Squares*", in the *Handbook of Partial Least Squares* by Vinzi et al.; and Berk's *Statistical Learning from a Regression Perspective* (Chapt. 5 on Random Forests). In general, MAD is not uncommon for Poisson-family regressions (granted, less common than RMSE). – usεr11852 Aug 22 '16 at 21:58
  • Thank you for the references, these will be very useful! – Mud Warrior Aug 23 '16 at 17:10