
I have a basic question about using cross-validation for model parameter tuning (model training) and model evaluation (testing), similar to this question: Model Tuning and Model Evaluation in Machine Learning.

I understand that it is suggested to use only the training set to tune the model parameter (the test set remains 'unseen'); I am tuning 'mtry' for a Random Forest (RF). That is, the training set is split further into training and validation sets for k-fold cross-validation to obtain the optimum parameter value.
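
For concreteness, this is roughly what I mean by tuning on the training set only (a minimal sketch in Python/scikit-learn, where max_features plays the role of 'mtry'; the data, grid, and settings are just placeholders):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic placeholder data.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Hold out a test set that the tuning never sees.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# k-fold CV on the training set only to choose 'mtry' (max_features here).
tuner = GridSearchCV(
    RandomForestClassifier(n_estimators=200, random_state=0),
    param_grid={"max_features": [2, 4, 6, 8]},
    cv=5)
tuner.fit(X_train, y_train)

print("tuned 'mtry':", tuner.best_params_)
print("accuracy on the held-out test set:", tuner.score(X_test, y_test))
```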

However, I am confused about what to do if I then wish to use k-fold cross-validation to evaluate the model accuracy (i.e. to test the trained model on different test sets sampled from the whole dataset). Is the right model evaluation procedure to:

(1) Simply rerun RF, with 'mtry' tuned by CV on the training set only, on different training-test partitions, even though only one realization/partition of the training set was used to tune 'mtry' at the beginning? Or should I instead tune 'mtry' using different training set realizations to begin with?

(2) Run RF with the tuned 'mtry' on different bootstrap samples drawn from the one realization of the test set (from the beginning) that was not used to tune 'mtry'?

Thank you and sorry if my writing is a bit confusing.

HadiITC

1 Answer


The simple rule is that data used for evaluating the performance of a model should not have been used to optimize the model in any way. If you split all of the available data into k disjoint subsets and use them to tune the hyper-parameters of a model (e.g. the kernel and regularization parameters of an SVM), then you cannot perform unbiased performance estimation, as all of the data has influenced the selection of the hyper-parameters. This means that both (1) and (2) are likely to be optimistically biased.
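
To make the leak concrete, here is roughly what that biased pattern looks like (a minimal sketch in Python/scikit-learn, with max_features standing in for 'mtry' and synthetic placeholder data): the hyper-parameter is chosen using all of the data, so the subsequent cross-validation scores data that has already influenced the model.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Step 1: 'mtry' (max_features) is tuned by CV on ALL of the data ...
tuner = GridSearchCV(
    RandomForestClassifier(n_estimators=200, random_state=0),
    param_grid={"max_features": [2, 4, 6, 8]},
    cv=5)
tuner.fit(X, y)

# Step 2: ... so cross-validating with the tuned value re-uses data that
# already influenced the hyper-parameter choice: the resulting estimate
# is optimistically biased.
biased_scores = cross_val_score(
    RandomForestClassifier(n_estimators=200, random_state=0,
                           **tuner.best_params_),
    X, y, cv=5)
print("biased accuracy estimate:", biased_scores.mean())
```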

The solution is to use nested cross-validation, where the outer cross-validation is used for performance evaluation. The key point is that we want to estimate the performance of the whole procedure for fitting the model, which includes tuning the hyper-parameters. So you need to include in each fold of the outer cross-validation all of the steps used to tune the model, which in this case means using cross-validation to tune the hyper-parameters independently within each outer fold. I wrote a paper on this topic, which you can find here; section 5.3 gives an example of why performing cross-validation for both model selection and performance evaluation is a bad idea.
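
A minimal sketch of nested cross-validation along these lines (again Python/scikit-learn, with max_features standing in for 'mtry'; the data and grid are placeholders): the inner CV tunes the hyper-parameter within each outer training fold, and the outer CV scores the whole procedure on data that took no part in that fold's tuning.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Inner CV: tunes 'mtry' (max_features) within each outer training fold.
inner = GridSearchCV(
    RandomForestClassifier(n_estimators=200, random_state=0),
    param_grid={"max_features": [2, 4, 6, 8]},
    cv=5)

# Outer CV: scores the whole fitting procedure, including the tuning,
# on data that played no part in tuning that fold's model.
outer_scores = cross_val_score(inner, X, y, cv=5)
print("nested-CV accuracy: %.3f +/- %.3f"
      % (outer_scores.mean(), outer_scores.std()))
```

Passing the tuning object itself to the outer cross-validation is what makes the hyper-parameter search happen independently inside each outer fold.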

Dikran Marsupial
    *nested cross validation* is sometimes also called *double cross validation*. – cbeleites unhappy with SX Jan 12 '15 at 21:51
  • Thank you, Dikran and @cbeleites, for pointing me in the right direction (nested/double CV). At first I was further confused, since the inner k-fold CV will give k possibly different optimum models (possibly different optimum 'mtry') trained from different training set realizations in each fold. But http://stats.stackexchange.com/questions/65128/nested-cross-validation-for-model-selection shows that, in fact, the stability/variation of the optimum parameter and of the prediction accuracy (from the outer CV) are useful for assessing the performance of the method (RF) on my particular dataset. – HadiITC Jan 13 '15 at 10:56
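
To illustrate the point in the last comment: with scikit-learn one can keep the fitted inner search from each outer fold and inspect which 'mtry' (max_features) it selected, to see how stable the selection is across folds (a minimal sketch with placeholder data and grid):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_validate

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

inner = GridSearchCV(
    RandomForestClassifier(n_estimators=200, random_state=0),
    param_grid={"max_features": [2, 4, 6, 8]},
    cv=5)

# return_estimator=True keeps the fitted GridSearchCV from each outer fold,
# so we can see which 'mtry' was selected per fold and how much it varies.
results = cross_validate(inner, X, y, cv=5, return_estimator=True)
for fold, (est, score) in enumerate(zip(results["estimator"],
                                        results["test_score"])):
    print("fold %d: selected %s, accuracy %.3f"
          % (fold, est.best_params_, score))
```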