I'm trying to optimize the hyperparameters of my model using a Bayesian approach with the hyperopt library. I need to code a loss function to evaluate each iteration of the optimization, and a classic metric is usually chosen, like
loss = 1 - accuracy
Now, since I want both a model that performs well on test data and a model that is not overfitted, I came to define the loss like this:
train_loss = 1 - train_f1_score
test_loss = 1 - test_f1_score
loss = test_loss * 10^(test_loss - train_loss)
where the test F1 score is the mean F1 over a 3-fold cross-validation. The idea is that the loss becomes higher for an overfitted model, even if its test score is good.
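For context, here is a minimal sketch of how I'm plugging this loss into a hyperopt objective. The estimator (`RandomForestClassifier`), the synthetic data, and the search space are just placeholders for illustration, not my actual setup:

```python
import numpy as np
from hyperopt import fmin, tpe, hp, STATUS_OK, Trials
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

# Placeholder data; in practice these are my real X and y.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

def objective(params):
    # Placeholder model; the hyperparameters come from the search space below.
    model = RandomForestClassifier(
        n_estimators=int(params["n_estimators"]),
        max_depth=int(params["max_depth"]),
        random_state=0,
    )

    # 3-fold CV, keeping both train and validation F1 to measure overfitting.
    cv = cross_validate(model, X, y, cv=3, scoring="f1", return_train_score=True)
    train_f1 = cv["train_score"].mean()
    test_f1 = cv["test_score"].mean()

    train_loss = 1 - train_f1
    test_loss = 1 - test_f1

    # Penalize the gap between train and test performance.
    loss = test_loss * 10 ** (test_loss - train_loss)
    return {"loss": loss, "status": STATUS_OK}

space = {
    "n_estimators": hp.quniform("n_estimators", 50, 500, 50),
    "max_depth": hp.quniform("max_depth", 2, 20, 1),
}

best = fmin(fn=objective, space=space, algo=tpe.suggest,
            max_evals=50, trials=Trials())
```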
My doubt is: am I missing some particular property that a good evaluation metric needs to have?