
I often find myself training several different predictive models using caret in R. I'll train them all on the same cross-validation folds, using caret::createFolds, then choose the best model based on cross-validated error.

However, the median prediction from several models often outperforms the best single model on an independent test set. I'm thinking of writing some functions for stacking/ensembling caret models that were trained with the same cross-validation folds, for example by taking median predictions from each model on each fold, or by training a "meta-model."

Of course, this might require an outer cross-validation loop. Does anyone know of any existing packages/open source code for ensembling caret models (and possibly cross-validating those ensembles)?
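To make the setup concrete, here is a minimal sketch of training two caret models on identical folds and combining their predictions by a row-wise median. This assumes the caret package is installed; the model methods ("lm", "rpart") and the use of iris are purely illustrative:

```r
library(caret)

# Illustrative regression task: predict Sepal.Length from the
# other numeric iris columns
x <- iris[, 2:4]
y <- iris$Sepal.Length

# Build one set of folds and pass it to both models via
# trainControl(index = ...), so their out-of-fold predictions
# are directly comparable
set.seed(42)
folds <- createFolds(y, k = 5, returnTrain = TRUE)
ctrl  <- trainControl(method = "cv", index = folds,
                      savePredictions = "final")

m1 <- train(x, y, method = "lm",    trControl = ctrl)
m2 <- train(x, y, method = "rpart", trControl = ctrl)

# Simple ensemble: row-wise median of the two models' predictions
preds    <- cbind(predict(m1, x), predict(m2, x))
ensemble <- apply(preds, 1, median)
```

Note that for an honest estimate of the ensemble's error, the combining step itself would need to sit inside an outer cross-validation loop, as the question suggests.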

Zach

3 Answers


It looks like Max Kuhn actually started working on a package for ensembling caret models, but hasn't had time to finish it yet. This is exactly what I was looking for. I hope the project gets finished one day!

edit: I wrote my own package to do this: caretEnsemble
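For reference, a minimal sketch of the caretEnsemble workflow looks roughly like the following. This assumes the caretEnsemble package is installed; the method names are illustrative choices, not recommendations:

```r
library(caret)
library(caretEnsemble)

# savePredictions is needed so the out-of-fold predictions are
# available for the ensembling step
ctrl <- trainControl(method = "cv", number = 5,
                     savePredictions = "final")

# Train a list of models on shared resampling folds
models <- caretList(Sepal.Length ~ . - Species, data = iris,
                    trControl = ctrl,
                    methodList = c("lm", "rpart"))

# Blend the models linearly, or stack them with a meta-model
blend <- caretEnsemble(models)
stack <- caretStack(models, method = "glm")
```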

Zach

What you are looking for is called "model ensembling". A simple introductory tutorial with R code can be found here: http://viksalgorithms.blogspot.jp/2012/01/intro-to-ensemble-learning-in-r.html

thiakx
    Not to be nitpicky, but "ensembling" is right in the title of my post. I'm very specifically looking for an R package for ensembling arbitrary models, which doesn't seem to exist. Thanks for posting the code, though. Maybe I'll write my own package! – Zach Oct 15 '12 at 19:09

I'm not quite sure what you are looking for but this might help: http://www.jstatsoft.org/v28/i05/paper

It shows how to use multiple models in caret; the part you might be interested in is Section 5 on p. 13.

screechOwl
  • What I'm looking for is a package that would take as an input a list of caret objects, and would then output the median, mean, or weighted mean of their predictions. More advanced functionality might include optimizing the weights through nested cross-validation. – Zach Oct 15 '12 at 19:11
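The combining step described in that comment is straightforward in base R. The following is a hypothetical helper (the name combine_predictions is an invention for illustration) that takes a list of prediction vectors, one per model, and combines them by median, mean, or weighted mean:

```r
# Combine a list of per-model prediction vectors into a single
# ensemble prediction. pred_list: list of numeric vectors of
# equal length; weights: one non-negative weight per model,
# required only for type = "wmean".
combine_predictions <- function(pred_list,
                                type = c("median", "mean", "wmean"),
                                weights = NULL) {
  type <- match.arg(type)
  m <- do.call(cbind, pred_list)  # n observations x k models
  switch(type,
         median = apply(m, 1, median),
         mean   = rowMeans(m),
         wmean  = {
           stopifnot(length(weights) == ncol(m))
           # Normalize weights so they sum to 1
           as.vector(m %*% (weights / sum(weights)))
         })
}

preds <- list(c(1, 2), c(3, 4))
combine_predictions(preds, "mean")                      # c(2, 3)
combine_predictions(preds, "wmean", weights = c(1, 3))  # c(2.5, 3.5)
```

Choosing the weights themselves would still require nested cross-validation to avoid optimistic error estimates, as the comment notes.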