I have read a lot of discussions and articles and I am still a bit confused about how to use an SVM correctly with cross-validation.
Consider a dataset of 50 samples described by 10 features.
First, I split the dataset into two parts: a training set (70%) and a "validation" set (30%).
Then I have to select the best combination of hyperparameters (C, gamma) for my RBF SVM, so I run 5-fold cross-validation on the training set and use a performance metric (AUC, for example) to pick the best pair.
Finally, I refit the SVM with the best hyperparameters and measure the performance metric on the "validation" set (a code sketch of the whole procedure is at the end of the post). My questions are:
1. Is the 70/30 ratio appropriate for splitting the dataset?
2. Is it useful to also run cross-validation on the "validation" set?
3. Is it better to loop over this whole procedure so that the training and validation sets get randomly different compositions (see the second sketch below)?
4. If 3 is better, how many loops should I run, and which statistics should I report on the performance metric?
5. Do we agree that running cross-validation on the full dataset, tuning and evaluating on the same data, is the worst thing to do?
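For concreteness, here is a minimal sketch of the procedure described above, using scikit-learn; the toy dataset and the C/gamma grid values are placeholders, not my real setup:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score

# Toy stand-in for the 50-sample, 10-feature dataset
X, y = make_classification(n_samples=50, n_features=10, random_state=0)

# Step 1: 70/30 split, stratified so both parts keep the class ratio
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Step 2: 5-fold CV on the training set only, with AUC as the selection metric
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, scoring="roc_auc", cv=5)
search.fit(X_train, y_train)
print("best (C, gamma):", search.best_params_)

# Step 3: a single evaluation of the refit best model on the held-out 30%
auc = roc_auc_score(y_test, search.decision_function(X_test))
print("held-out AUC:", auc)
```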
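And this is the kind of loop I have in mind for question 3, reusing X, y, and param_grid from the sketch above (the 30 repeats are an arbitrary choice on my part):

```python
# Repeat the whole split / tune / test procedure over random partitions
# and summarise the distribution of held-out AUCs instead of a single number
aucs = []
for seed in range(30):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    search = GridSearchCV(SVC(kernel="rbf"), param_grid, scoring="roc_auc", cv=5)
    search.fit(X_tr, y_tr)
    aucs.append(roc_auc_score(y_te, search.decision_function(X_te)))

print(f"AUC: mean={np.mean(aucs):.3f}, std={np.std(aucs):.3f}")
```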