23

I am using the library caret in R to test various modelling procedures.

The trainControl object allows one to specify a re-sampling method. The methods are described in the documentation section 2.3 and include: boot, boot632, cv, LOOCV, LGOCV, repeatedcv and oob. Although some of these are easy to infer, not all of these methods are clearly defined.

What are the procedures corresponding to these resampling methods?

gung - Reinstate Monica
Ram Ahluwalia
  • documentation link is broken. Use [this](https://cran.r-project.org/web/packages/caret/vignettes/caret.pdf) instead. – vikas Dec 04 '16 at 02:46

2 Answers

21

Ok, here is my try:

  • boot -- bootstrap
  • boot632 -- 0.632 bootstrap
  • cv -- cross-validation; this probably refers to K-fold cross-validation.
  • LOOCV -- leave-one-out cross-validation, also known as the jackknife.
  • LGOCV -- leave-group-out cross-validation, a variant of LOOCV for hierarchical data.
  • repeatedcv -- probably repeated random sub-sampling validation, i.e. the division into training and test data is done at random.
  • oob -- refers to out-of-bag estimation proposed by Breiman, which in turn is related to bootstrap aggregating. (The file in the link is not a ps file but a ps.Z file; rename it and then try opening it.)
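For reference, here is roughly how each of these methods would be requested. This is only a sketch, assuming the caret package is installed; the `number`, `repeats` and `p` values are arbitrary choices for illustration, not recommendations.

```r
library(caret)

# All numeric settings below are illustrative assumptions.
# `number` is the number of folds or resampling iterations.
ctrl_boot    <- trainControl(method = "boot", number = 25)
ctrl_boot632 <- trainControl(method = "boot632", number = 25)
ctrl_cv      <- trainControl(method = "cv", number = 10)
ctrl_loocv   <- trainControl(method = "LOOCV")

# `p` is documented in ?trainControl as the training percentage for LGOCV.
ctrl_lgocv   <- trainControl(method = "LGOCV", p = 0.75, number = 25)

# `repeats` only applies to repeated k-fold cross-validation.
ctrl_repcv   <- trainControl(method = "repeatedcv", number = 10, repeats = 5)

# "oob" is only usable with models that produce out-of-bag estimates
# (e.g. random forests or bagged trees).
ctrl_oob     <- trainControl(method = "oob")
```

Any of these objects can then be passed to `train()` through its `trControl` argument.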
VaTa
mpiktas
  • I believe that LGOCV is random splitting between a training set and validation set, repeated n times. So, instead of the ordinary case of splitting data between train and hold-out (build model on train and validate on hold out) once, this process is repeated many times. – B_Miner Nov 11 '11 at 19:30
  • I also believe that repeatedCV is k-fold cross validation, done multiple times. – B_Miner Nov 11 '11 at 19:31
  • Hard to believe this isn't documented somewhere. – andrew Oct 16 '16 at 16:49
5

According to Max Kuhn's presentation, repeatedcv is definitely repeated 10-fold cross-validation. The default resampling scheme is the bootstrap.
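To make the difference between the two schemes concrete, here is a minimal base-R sketch of how the held-out rows could be generated in each case. It is not caret's internal code; the row count `n`, the seed, and the variable names are hypothetical values picked for illustration.

```r
set.seed(1)   # arbitrary seed, only so the sketch is reproducible
n <- 20       # hypothetical number of rows in the data set
k <- 10       # folds per repeat
repeats <- 3  # number of times the k-fold split is redone

# Repeated k-fold: reshuffle the fold labels on every repeat, so each
# repeat partitions the rows into k different held-out sets.
repeated_folds <- lapply(seq_len(repeats), function(r) {
  fold_id <- sample(rep(seq_len(k), length.out = n))
  split(seq_len(n), fold_id)  # held-out row indices, one element per fold
})

# Bootstrap: draw n rows with replacement for fitting; the rows that were
# never drawn form the held-out set for that resample.
boot_in  <- sample(n, replace = TRUE)
boot_out <- setdiff(seq_len(n), boot_in)
```

Within one repeat every row is held out exactly once, whereas a bootstrap resample holds out a different, random number of rows each time.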

A good reference on resampling methods is Predictive Modeling with R and the caret Package (pdf), which Max presented at useR! 2013.

gung - Reinstate Monica
tigergopro