
I have a very basic question about parameter tuning with grid search. Many machine learning methods have hyperparameters that are typically tuned using grid search. For example, take the following standard formulation of the SVM:

$$ \min_w \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{N} \max\left(0,\ 1 - y_i\, w^\top x_i\right) $$

We have to tune the regularization parameter $C$.
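For concreteness, here is a small NumPy sketch of that objective (the toy data and weight vector are made up just to illustrate; labels are assumed to be in $\{-1, +1\}$ and the hinge loss is $\max(0, 1 - y_i w^\top x_i)$):

```python
import numpy as np

def svm_objective(w, X, y, C):
    """Regularized hinge-loss objective for a linear SVM (labels in {-1, +1})."""
    margins = y * (X @ w)                   # y_i * w^T x_i for each sample
    hinge = np.maximum(0.0, 1.0 - margins)  # per-sample hinge loss
    return 0.5 * np.dot(w, w) + C * hinge.sum()

# Toy example: two separable points with margin 2, so the hinge term is zero
# and only the regularizer 0.5 * ||w||^2 = 2.0 remains.
X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0])
w = np.array([2.0, 0.0])
print(svm_objective(w, X, y, C=1.0))  # 2.0
```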

If I am given the training and test sets, I would first split the training set into validation_train and validation_test, tune the parameters using those two subsets, then retrain on the complete training set with the best parameters, and finally evaluate on the test set. So my question is: which of the following is better on average?

  1. Would it be better to use the final model from validation, i.e., the one trained on validation_train, for final testing, because the parameters were optimized on that training set?

  2. Or would it be better to retrain the model on the entire training set with the best parameters from grid search? Although the parameters were not optimized for this set, we have more training data for the final model in this case.
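For concreteness, here is roughly what I mean in scikit-learn (the synthetic dataset and the grid of $C$ values are just placeholders):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for "given training and test sets".
X, y = make_classification(n_samples=400, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Grid search: internally splits X_train into validation_train / validation_test folds.
grid = GridSearchCV(SVC(kernel="linear"), {"C": [0.01, 0.1, 1, 10]}, cv=5)
grid.fit(X_train, y_train)

# Option 1 would keep a model fit only on the inner training folds.
# Option 2: retrain on the full training set with the best parameters,
# then test once on the held-out test set.
final_model = SVC(kernel="linear", **grid.best_params_).fit(X_train, y_train)
print(final_model.score(X_test, y_test))
```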

whuber
user678613
  • possible duplicate of [How to report a SVM model to a 3rd party after cross-validation?](http://stats.stackexchange.com/questions/88535/how-to-report-a-svm-model-to-a-3rd-party-after-cross-validation) – Marc Claesen Mar 29 '14 at 23:38
  • @MarcClaesen it does not seem like the same/similar question. – user678613 Mar 31 '14 at 22:31

1 Answer


Given that you trust your validation setup, option 2 is the way to go. You performed the CV to identify the most general parameter setup (or model selection, or whatever you're trying to optimize). Those findings should then be applied to the entire training set, and the resulting model tested (once) on the test set. The picture below illustrates a setup I think works well when evaluating and testing the performance of machine learning algorithms.

[Illustration of a rigorous CV setup]
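As a rough sketch of that setup in scikit-learn (the dataset and the grid of $C$ values are placeholders): nested CV estimates the generalization error of the whole tuning procedure, and the final model is refit on all of the training data with the selected parameters, exactly as in option 2.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)

# Inner loop: grid search selects C; outer loop: estimates generalization error.
inner = GridSearchCV(SVC(kernel="linear"), {"C": [0.1, 1, 10]}, cv=3)
outer_scores = cross_val_score(inner, X, y, cv=5)
print(outer_scores.mean())

# Final model: rerun the search on the entire training set and keep the
# estimator refit on all of it with the best parameters (option 2).
inner.fit(X, y)
final_model = inner.best_estimator_
```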

Dr. Mike