I have seen this question asked in one flavor or another, but I'm looking for clarity on a more specific piece. I have two text classification models:

Model A: train score=88%, test score=76%

Model B: train score=76%, test score=75%

The data set contains 1.5 million observations. Model A has the higher test score, but model B is only 1 percentage point lower and its train score is much closer to its test score. Which model is preferable? I'm concerned that model A is overfitting the data, but since it still achieves the higher test score, would it be good practice to use it?

Ryan Boch

1 Answer

I think the answer linked below addresses my question exactly. There is probably no hard and fast rule but, as I suspected, I would want to maximize the test score. This is exactly what GridSearchCV does in sklearn: it selects the parameters with the best mean cross-validated score on the held-out folds and does not look at the training scores when determining the best model parameters.

https://stats.stackexchange.com/a/263119/208879
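To illustrate the GridSearchCV behaviour described above, here is a minimal sketch. The synthetic data set, the decision tree estimator, and the `max_depth` grid are all assumptions made for the example, not the models from the question. It asks for training scores with `return_train_score=True` so the train/validation gap is visible, then checks that the selected candidate is simply the one with the highest mean held-out score.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: not the 1.5M-row text data set).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Hypothetical grid: deeper trees tend to fit the training folds more closely.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [2, 5, None]},
    cv=5,
    return_train_score=True,  # training scores are recorded only on request
)
search.fit(X, y)

mean_train = search.cv_results_["mean_train_score"]
mean_test = search.cv_results_["mean_test_score"]  # mean score on held-out folds

# Show the train/validation gap for each candidate.
for params, tr, te in zip(search.cv_results_["params"], mean_train, mean_test):
    print(params, f"train={tr:.3f}", f"cv={te:.3f}", f"gap={tr - te:.3f}")

# The candidate with the highest mean held-out score, regardless of its gap.
best_by_cv = int(np.argmax(mean_test))
print("selected:", search.best_params_)
```

The training scores appear in `cv_results_` for inspection only; the size of the gap plays no part in which candidate is selected.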

Ryan Boch