I want to run gradient boosting regression on a dataset whose rows are not independent. Specifically, the rows are clustered, and the clustering variable can be treated as a random effect.
- What is the effect of ignoring the random effect, i.e. simply fitting the regressor on the target and the other features?
- What open source packages are available that can account for clustered data for gradient boosting?
- Are there any caveats to using the packages from the previous question?
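To make the setup concrete, here is a minimal sketch of the kind of data I mean. It is only an illustration: the function names, the random-intercept form, and all parameter values (slope of 3, 50 clusters of 20 rows, the standard deviations) are assumptions made up for this example, not my real data. Each cluster gets its own random intercept, so rows within a cluster are correlated, and the intraclass correlation (ICC) gives a rough measure of how strong that dependence is.

```python
import random
import statistics


def simulate_clustered(n_clusters=50, rows_per_cluster=20,
                       sd_cluster=2.0, sd_noise=1.0, seed=0):
    """Simulate y = 3*x + b_cluster + noise (hypothetical parameters)."""
    rng = random.Random(seed)
    rows = []
    for cluster_id in range(n_clusters):
        # Random intercept shared by every row in this cluster.
        b = rng.gauss(0.0, sd_cluster)
        for _ in range(rows_per_cluster):
            x = rng.uniform(-1.0, 1.0)
            y = 3.0 * x + b + rng.gauss(0.0, sd_noise)
            rows.append((cluster_id, x, y))
    return rows


def intraclass_correlation(rows):
    """One-way ANOVA estimate of the ICC of y; assumes balanced clusters."""
    by_cluster = {}
    for cluster_id, _x, y in rows:
        by_cluster.setdefault(cluster_id, []).append(y)
    groups = list(by_cluster.values())
    k = len(groups[0])  # rows per cluster (balanced design assumed)
    # Between-cluster and within-cluster mean squares.
    msb = k * statistics.variance([statistics.fmean(g) for g in groups])
    msw = statistics.fmean([statistics.variance(g) for g in groups])
    return (msb - msw) / (msb + (k - 1) * msw)


rows = simulate_clustered()
print(f"rows: {len(rows)}, estimated ICC of y: {intraclass_correlation(rows):.2f}")
```

An ICC well above zero means a sizeable share of the variance in the target is attributable to cluster membership, which is exactly the structure a GBM fitted only on the target and the other features would not model explicitly.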
Edit: I saw "How can I include random effects into a randomForest", so I will now restrict my question to GBMs.