I have a basic question about hyperparameter tuning with grid search. Many machine learning methods have hyperparameters that need to be tuned, typically via grid search. For example, consider the following standard formulation of the SVM:
$$ \min_w\frac{1}{2}\|w\|^2 + C\sum_{i=1}^{N}{\rm{hinge\ loss}}(x_i,y_i,w) $$
Here we have to tune the regularization parameter $C$.
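For concreteness, the hinge-loss term in the sum can be sketched in Python (a minimal version with the bias term omitted; the function name is mine):

```python
def hinge_loss(x, y, w):
    # max(0, 1 - y * <w, x>): zero when the example is classified
    # correctly with margin at least 1, linear in the violation otherwise.
    margin = y * sum(wj * xj for wj, xj in zip(w, x))
    return max(0.0, 1.0 - margin)

hinge_loss([2.0, 0.0], 1, [1.0, 0.0])   # margin 2 >= 1, so loss 0.0
```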
If I am given a training set and a test set, I would first split the training set into `validation_train` and `validation_test`, tune the parameters using that split, then retrain on the complete training set with the best parameters, and finally evaluate on the test set. So my question is: which of the following is better on average?
1. Use the final model from validation, which was trained only on `validation_train`, for the final testing, because the parameters were optimized for that training set.
2. Retrain the model on the entire training set with the best parameters from grid search. Although the parameters were not optimized for this set, we have more training data in this case.