I have seen this question asked in one flavor or another, but I'm looking for clarity on a more specific piece. I have two text classification models:
Model A: train score=88%, test score=76%
Model B: train score=76%, test score=75%
The data set contains 1.5 million observations. Model A has the higher test score, but Model B is only 1 percentage point lower and its train score is much closer to its test score. Which model is preferable? I'm concerned that Model A is overfitting the data, but since it still achieves the higher test score, would it be good practice to use it?
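
To make the comparison concrete, here is a rough sketch of the two quantities involved: the train/test gap of each model, and how large the test-score difference is relative to sampling noise. The test-set size is not stated above, so `n_test` is a hypothetical value (a 20% hold-out is assumed). The standard error treats the two test scores as independent binomial proportions, which is only an approximation because both models would be scored on the same test set.

```python
import math

# Scores from the question, as fractions: (train, test).
models = {"A": (0.88, 0.76), "B": (0.76, 0.75)}

# ASSUMPTION: the split is not given; a 20% hold-out of 1.5 million is hypothetical.
n_test = 300_000


def generalization_gap(train_acc, test_acc):
    """Train score minus test score; a large gap suggests overfitting."""
    return train_acc - test_acc


def accuracy_std_error(acc, n):
    """Binomial standard error of an accuracy estimated on n examples."""
    return math.sqrt(acc * (1.0 - acc) / n)


gaps = {name: generalization_gap(tr, te) for name, (tr, te) in models.items()}
errors = {name: accuracy_std_error(te, n_test) for name, (_, te) in models.items()}

# Difference in test score and its approximate standard error
# (independence approximation, see the note above).
diff = models["A"][1] - models["B"][1]
diff_se = math.sqrt(errors["A"] ** 2 + errors["B"] ** 2)

for name in models:
    print(f"Model {name}: gap={gaps[name]:.2f}, test SE={errors[name]:.5f}")
print(f"Test difference A-B: {diff:.3f} (approx. SE {diff_se:.5f})")
```

Under these assumptions, the gap only describes how much each model fits the training data beyond what generalizes, while the test-score difference and its standard error indicate whether the 1-point advantage is distinguishable from noise. With a much smaller test set the same difference could fall within the noise.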