
I split my data into an 80% training set and a 20% test set. Using the 80% split and 10-fold cross-validation, I build the model and obtain the training accuracy. Then I evaluate the model on the 20% split and obtain the testing accuracy. A sketch of this workflow is shown below.
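For concreteness, here is a minimal sketch of that workflow in Python with scikit-learn. The synthetic dataset and the choice of logistic regression are placeholders, not part of the original question:

```python
# Sketch of the workflow described above: 80/20 split, 10-fold CV on the
# training portion, then a single evaluation on the held-out 20%.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LogisticRegression

# Placeholder data; in practice this is your own X, y.
X, y = make_classification(n_samples=1000, random_state=0)

# 80% training / 20% testing split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000)

# 10-fold cross-validation on the 80% training split
cv_scores = cross_val_score(model, X_train, y_train, cv=10)
print("Mean CV accuracy:", cv_scores.mean())

# Fit on the full training split, then evaluate once on the held-out 20%
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))
```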

The question is: which is more important, the training accuracy or the testing accuracy?
If I applied 10 different machine learning algorithms to the same split, which accuracy should guide me to the best algorithm, training or testing?


I searched on Cross Validated, but the following similar questions are unanswered:

Which model is better based on test and training accuracy

Should I use training or testing AUC for selecting best classifier?

Should using training datasets or testing datasets for evaluating the performance of the models

forever

1 Answer


The testing data in your cross-validation mimics the situation of "true" testing data. So if the performance of your model on new, previously unseen data is what matters, then you should go by its performance on the CV testing folds. (I have a hard time picturing a situation where training data performance is more important.)
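As an illustration of this advice, here is a minimal sketch of selecting among several candidate algorithms by their cross-validation (test-fold) accuracy. The three candidate classifiers are arbitrary examples, and `X_train`, `y_train` are assumed to come from the 80/20 split in the question:

```python
# Rank candidate algorithms by mean accuracy on the held-out CV folds,
# not by their accuracy on the data they were fitted on.
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

candidates = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(random_state=0),
}

cv_results = {
    name: cross_val_score(clf, X_train, y_train, cv=10).mean()
    for name, clf in candidates.items()
}

best_name = max(cv_results, key=cv_results.get)
print(cv_results)
print("Selected by CV accuracy:", best_name)
```

The 20% test set is then used only once, to report the performance of the selected model, so that it still reflects truly unseen data.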

Stephan Kolassa