Suppose I want to estimate the out-of-sample prediction error of a boosted regression model that has random intercepts and slopes. There are $G$ groups and $N$ observations. If I want to estimate the out-of-sample prediction error using $k$-fold cross-validation, how do I set up the data partitioning? Is it more complicated than ordinary $k$-fold cross-validation? Note: my use case here is predicting data from a new group.
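For concreteness, a minimal sketch of group-level partitioning, using scikit-learn's GroupKFold so that no group is ever split across folds. The plain GradientBoostingRegressor and the simulated data are stand-ins for the actual mixed-effects boosted model, which scikit-learn does not provide:

```python
# Sketch: group-level k-fold CV, so every test observation comes from a
# group the model never saw during training (the "new group" setting).
# The estimator and simulated data below are placeholders.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)
G, N = 20, 400                             # number of groups, observations
groups = np.repeat(np.arange(G), N // G)   # group label per observation
X = rng.normal(size=(N, 5))
y = X[:, 0] + rng.normal(size=N)           # toy response

cv = GroupKFold(n_splits=5)                # folds never split a group
fold_mse = []
for train_idx, test_idx in cv.split(X, y, groups):
    model = GradientBoostingRegressor().fit(X[train_idx], y[train_idx])
    pred = model.predict(X[test_idx])
    fold_mse.append(mean_squared_error(y[test_idx], pred))
print(f"estimated new-group MSE: {np.mean(fold_mse):.3f}")
```

Because the held-out groups contribute no training data, the averaged error targets prediction for a new group rather than for a new observation from an already-seen group.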

kjetil b halvorsen
Brash Equilibrium
  • I think I have seen people doing some sort of CV within CV, where they first take k of the G groups out, and with the remaining data take k2 of the N observations out from each of the k groups (see the sketch after this comment thread). But in my projects I simply do CV at the group level, that's it. So if G is not very large, I do a G-fold CV; otherwise, a k-fold CV with k < G. – qoheleth Sep 12 '14 at 04:57
  • Was totally going to ask if nested folds were the way to go. I mean, that's what it seems like to me, but only if you are guaranteed to have enough to make k_indiv folds for each individual. – Brash Equilibrium Sep 12 '14 at 08:39
  • I think k-folds derived from the group level alone are problematic, because you are not cross-validating the within-individual predictive accuracy. – Brash Equilibrium Sep 12 '14 at 08:40
  • Yeah, but I guess we have a bias-variance trade-off here. – qoheleth Sep 16 '14 at 04:07
  • Not if we can come up with the proper partitioning scheme that reflects the sampling model. – Brash Equilibrium Sep 16 '14 at 21:24
  • This paper (Roberts et al. 2017), which discusses CV strategies for data with dependence structures (including group-based dependence), is worth reading: https://onlinelibrary.wiley.com/doi/10.1111/ecog.02881 Good overview + discussion. – adibender Jun 02 '19 at 15:38
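A sketch of the nested scheme qoheleth describes, under the same toy setup as above: the outer loop holds out whole groups (new-group error) and the inner loop holds out observations from each retained group (within-group error). The fold counts are arbitrary placeholders; the split utilities are real scikit-learn classes:

```python
# Sketch of CV-within-CV: outer folds drop whole groups, inner folds drop
# observations from each remaining group (stratifying on the group label
# so every retained group contributes to every inner fold).
import numpy as np
from sklearn.model_selection import GroupKFold, StratifiedKFold

rng = np.random.default_rng(1)
G, N = 20, 400
groups = np.repeat(np.arange(G), N // G)   # N // G observations per group
X = rng.normal(size=(N, 5))
y = X[:, 0] + rng.normal(size=N)           # toy response

outer = GroupKFold(n_splits=5)             # k of the G groups held out
for outer_train, outer_test in outer.split(X, y, groups):
    # stratify the inner split on the group label: each inner fold takes
    # roughly 1/4 of the observations from every retained group
    inner = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
    for inner_train, inner_val in inner.split(X[outer_train],
                                              groups[outer_train]):
        pass  # fit here; check within-group accuracy on inner_val
    # refit on all of outer_train; score new-group error on outer_test
```

This makes the trade-off in the comments concrete: the outer loop alone estimates new-group error, while the inner loop is only needed if you also want to validate within-group predictions.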

0 Answers