
I would like to use cross-validation to test how predictive my mixed-effect logistic regression model is (model run with glmer). Is there an easy way to do this using a package in R? I've only seen cross validation functions in R for use with linear models.

user1566200
  • This question appears to be off-topic because it is about asking for R packages / code. – gung - Reinstate Monica Mar 01 '14 at 17:39
  • Welcome to the site, @user1566200. If you are only asking about how to do this in R, this would be off-topic for CV (see our [help center](http://stats.stackexchange.com/help/on-topic)). R-based programming questions can be on-topic on [Stack Overflow](http://stackoverflow.com/), but this isn't a programming question, & it lacks a [reproducible example](http://stackoverflow.com/q/5963269/), so it would be off-topic on SO as well. The r-help listserv might be a viable option. If you have a question about the substantive statistical issues here, please edit to clarify, else this may be closed. – gung - Reinstate Monica Mar 01 '14 at 17:43
  • Sorry! Didn't realize R package questions aren't allowed here. Where can I repost? – user1566200 Mar 01 '14 at 17:56
  • @user1566200: no need to repost. If enough users agree that the question is more on-topic at SX, it will be migrated. – cbeleites unhappy with SX Mar 01 '14 at 17:57
  • No need to apologize, it's an easy mistake. CV is a Q&A site for statistics (ML, data-viz., etc.) questions, not for how to use software. I would guess the best option would be the r-help listserv, but it's not clear you will need to re-post. The answer for R is already given. – gung - Reinstate Monica Mar 01 '14 at 17:58
  • I'm no expert on mixed models, but for testing predictive performance you need to make sure that the splitting is done on the uppermost level of your data hierarchy. Otherwise you have a "leak" between training and test data. – cbeleites unhappy with SX Mar 01 '14 at 18:10
  • Also: http://stats.stackexchange.com/questions/18971/cross-validation-for-mixed-models?rq=1 may be relevant. – cbeleites unhappy with SX Mar 01 '14 at 18:13

1 Answer


Check out the caret package. It provides utilities that simplify building and comparing models based on essentially any algorithm. The particular function you are looking for is `train`.

This page gives a demo of how to fit a model using the `train` function with 10-fold cross-validation.
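As a minimal sketch (the data, outcome, and predictor names here are hypothetical), a 10-fold cross-validated logistic fit with `train` might look like the following; note that a glmer mixed-effects model is not a built-in `method`, so it would need a custom model definition:

```r
library(caret)

# Hypothetical data: binary outcome y, two predictors
set.seed(42)
d <- data.frame(
  y  = factor(sample(c("yes", "no"), 200, replace = TRUE)),
  x1 = rnorm(200),
  x2 = rnorm(200)
)

# Ask train() for 10-fold cross-validation
ctrl <- trainControl(method = "cv", number = 10)

# Plain logistic regression via glm; swap in a custom model
# definition to cross-validate a glmer fit
fit <- train(y ~ x1 + x2, data = d,
             method = "glm", family = binomial,
             trControl = ctrl)

fit$results  # cross-validated accuracy and kappa
```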

David Marx
  • Thanks! Unfortunately I don't see glmer/Mixed Effects models as model options for train. – user1566200 Mar 01 '14 at 17:55
  • If you look at the documentation for the `train` function, you'll see it directs you to the following instructions on using `train` with user defined (or otherwise unsupported) models: http://caret.r-forge.r-project.org/custom_models.html – David Marx Mar 01 '14 at 18:05
  • Is it possible to tell `caret` that there is a grouping/clustering in the data that must be taken into account when splitting the data - but that otherwise the splitting should be random? – cbeleites unhappy with SX Mar 01 '14 at 18:08
  • Check out their page on data splitting. http://caret.r-forge.r-project.org/splitting.html – David Marx Mar 01 '14 at 18:45
  • I know that page, and as far as I understand it there is no indication of splitting by a given grouping. However, I thought you may have further insight (e.g. newly added functionality which is not yet explained on the web page). – cbeleites unhappy with SX Mar 01 '14 at 20:45
  • If I understand what you're asking: set up your grouping as a class variable, and then pass that classification to `createDataPartition` as described on that page. Maybe I don't understand what you're asking, but I'm fairly certain that page holds your answer. – David Marx Mar 01 '14 at 21:06
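A sketch of the two splitting options discussed above (the `subject` grouping variable is hypothetical). Passing the grouping factor to `createDataPartition` stratifies the split so every group appears in both partitions; to instead hold out entire groups, avoiding the train/test "leak" mentioned in the comments, later versions of caret provide `groupKFold`:

```r
library(caret)

# Hypothetical clustered data: 10 subjects, 20 observations each
set.seed(42)
d <- data.frame(
  subject = factor(rep(1:10, each = 20)),
  x = rnorm(200)
)

# Stratified split on the grouping factor: each subject is
# represented in both the training and the test partition
idx <- createDataPartition(d$subject, p = 0.8, list = FALSE)
train_set <- d[idx, ]
test_set  <- d[-idx, ]

# Group-wise folds: each fold's held-out set contains whole
# subjects, so no subject is split across training and test
# (available in later caret versions)
folds <- groupKFold(d$subject, k = 5)
```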