
I would like someone to tell me whether what I am doing is correct. I have labelled data on which I want to train different machine learning models to predict the outcome. Here are the steps I went through (a sketch of the workflow in code follows the list):

  1. I divided the data into two sets: 80% for training and 20% for testing.
  2. I cross-validated the training set (and only the training set) with 10 folds using different models (kNN, ANN, SVM, etc.).
  3. I kept tuning the parameters of each model until I got a satisfactory root mean squared error (RMSE).
  4. I used the parameters that produced the lowest RMSE to build each model on the training set (80% of the data).
  5. I fed the testing set (the remaining 20%) into each model and got a prediction from each.
  6. I evaluated the testing-set prediction error of each model using MSE, RMSE, MAPE and MAE.
  7. I compared the models and recommended the one that produced the lowest error.
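
For concreteness, here is roughly what steps 1–6 could look like for a single model (kNN) in scikit-learn; the synthetic dataset and the parameter grid are just placeholders, not my actual data:

```python
# Sketch of the workflow for one model (kNN); synthetic data stands in
# for the real labelled dataset.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import (mean_squared_error, mean_absolute_error,
                             mean_absolute_percentage_error)

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

# Step 1: 80/20 train/test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Steps 2-4: 10-fold CV on the training set only, tuning k to minimise RMSE;
# GridSearchCV then refits the best parameters on the whole training set.
search = GridSearchCV(
    KNeighborsRegressor(),
    param_grid={"n_neighbors": [3, 5, 7, 9, 11]},
    cv=10,
    scoring="neg_root_mean_squared_error",
)
search.fit(X_train, y_train)

# Steps 5-6: predict on the held-out 20% and compute the error metrics
# (note: MAPE is unstable when the target is close to zero).
y_pred = search.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print("MSE :", mse)
print("RMSE:", np.sqrt(mse))
print("MAE :", mean_absolute_error(y_test, y_pred))
print("MAPE:", mean_absolute_percentage_error(y_test, y_pred))
```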

My questions:

  1. Is using 10-fold cross-validation on the training set alone similar to dividing the data into 70% training, 10% validation and 20% testing? It would be really helpful if you could point me to research papers that adopt such a technique.

  2. Does this procedure make sense, or am I doing something wrong?

    There are [999 posts](https://stats.stackexchange.com/search?q=[cross-validation]%20how%20to%20use) dealing with "how to use cross validation" on this site. Did you have a look at them? For example [this one](https://stats.stackexchange.com/q/187881/163572) or [this one](https://stats.stackexchange.com/q/250282/163572)? – Jan Kukacka Mar 16 '18 at 15:49

1 Answer


Though I cannot recall having seen a strategy such as this used in the literature, my initial impression is that it is doing a cross-validation on a cross-validated conclusion. A “second-order” cross-validation, if you will.

For example, the 80/20 split can itself be seen as a form of cross-validation: a single, unbalanced hold-out fold. Doing a 10-fold cross-validation on the 80% subset is therefore essentially just producing a different model that will then be tested on the 20% testing subset. Concretely, each of those 10 folds trains on 90% of the 80% subset, i.e. 72% of the full data, and validates on the remaining 8%, so the scheme resembles a repeated 72/8/20 split rather than 70/10/20.

My suggestion is to just do the 10-fold CV. Additionally, it may help to think of validation and testing as synonymous in this context.
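
For illustration, comparing several models with 10-fold CV alone could look like this in scikit-learn (the models, their settings, and the synthetic data are placeholders, not a prescription):

```python
# Minimal sketch: compare models by 10-fold cross-validated RMSE,
# with no separate hold-out split. Synthetic data stands in for real data.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

models = {
    "kNN": make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5)),
    "SVM": make_pipeline(StandardScaler(), SVR(C=1.0)),
    "ANN": make_pipeline(StandardScaler(),
                         MLPRegressor(hidden_layer_sizes=(50,), max_iter=2000,
                                      random_state=0)),
}

cv = KFold(n_splits=10, shuffle=True, random_state=0)
for name, model in models.items():
    # cross_val_score returns negative MSE; flip the sign and take the root
    scores = cross_val_score(model, X, y, cv=cv,
                             scoring="neg_mean_squared_error")
    rmse = np.sqrt(-scores)
    print(f"{name}: RMSE = {rmse.mean():.3f} +/- {rmse.std():.3f}")
```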

Hope this helps.

Gregg H