I've been selecting features for a regression problem and have obtained a list of the best-performing feature sets. (Note: my actual list is several thousand lines long.)
188.493 186.989 [379.45, 0.68, 99.51, 102.71, 109.91, 2.07] 50,12,48
188.352 187.391 [465.3, 0.63, 116.43, 134.18, 104.84, 2.3] 42,36,27
188.007 187.506 [443.08, 0.67, 93.73, 116.96, 110.67, 2.26] 50,42,27
185.867 192.012 [398.89, 0.81, 81.6, 99.44, 124.01, 2.41] 72,53,48
The first number is the MSE from 10-fold CV on the training set while optimizing hyperparameters. The second number is the MSE on the test set. The third and fourth items are the hyperparameters and the feature set (not important).
My question is: would the best model be the one that performed best on the test set, or should I also be concerned with how it performed on the training set? For example, my fourth line performed well on the training set but much worse on the test set, while the first line performed better on the test set than on the training set.
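To put numbers on that gap, here is a quick sketch in Python that computes test MSE minus CV MSE for the four rows shown above. The names `rows` and `gaps` are made up for illustration and are not part of my actual pipeline.

```python
# The four rows shown above: (CV MSE on training set, test MSE, feature set).
rows = [
    (188.493, 186.989, "50,12,48"),
    (188.352, 187.391, "42,36,27"),
    (188.007, 187.506, "50,42,27"),
    (185.867, 192.012, "72,53,48"),
]


def gaps(rows):
    """Return (feature_set, test_mse - cv_mse) per row, rounded to 3 decimals."""
    return [(features, round(test - cv, 3)) for cv, test, features in rows]


for features, gap in gaps(rows):
    # A positive gap means the test MSE is worse (higher) than the CV MSE.
    print(f"{features}: test - CV = {gap:+.3f}")
```

A negative value means the feature set scored better on the test set than in CV. The fourth row is the only one with a positive gap (about +6.1), while the first three are all between roughly -1.5 and -0.5.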
Should I be looking for feature sets that perform similarly on both the training-data CV and the test set, or just take the best model on the test set?
Or would it be best to use a combination of models? Any help is greatly appreciated. Thanks.