
I'm working with unbalanced data (2% in the "yes" class and 98% in the "no" class).

Regardless of the evaluation metric chosen during training, I obtain low sensitivity and high specificity.

For this reason I am varying the classification threshold to increase sensitivity without losing much specificity. There are many threshold values (and nearby values) that improve the model, but not all of them perform well when applied to the test data.

Is it considered cheating to use the test data to determine only the threshold of the final model?

If the answer to the question above is yes, would it be a good approach to divide the test data into two parts, one for setting the threshold and the other for the final test?

I don't know whether it is possible to include the threshold selection in the training itself; if it is, would that be a solution?

1 Answer


A three-way split (train, validation, and test) is quite common:

https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets

You tune hyper-parameters on the validation set, and once you're happy with the result you evaluate the final performance on the held-out test set. In your case that would mean tuning the threshold on a validation set and using the test set to estimate your performance on unseen real-world data.

In general, if you optimize your hyper-parameters on the test set, you will still end up with the best-performing model, but your estimate of the performance on unseen data may be overly optimistic.
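As a rough illustration, here is a minimal sketch of tuning the threshold on a validation set and reporting performance on the untouched test set. The 60/20/20 split, the logistic regression model, the synthetic data, and the balanced accuracy criterion are all just assumptions for the example; substitute your own model, split ratios, and metric.

```python
# Minimal sketch: tune a decision threshold on a validation set,
# then evaluate once on a held-out test set.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score

# Synthetic imbalanced data (~2% positives) just to make the example runnable.
X, y = make_classification(n_samples=20000, weights=[0.98, 0.02], random_state=0)

# Three-way split: 60% train / 20% validation / 20% test (illustrative ratios).
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Sweep candidate thresholds on the validation set only.
val_probs = model.predict_proba(X_val)[:, 1]
thresholds = np.linspace(0.01, 0.99, 99)
scores = [balanced_accuracy_score(y_val, val_probs >= t) for t in thresholds]
best_threshold = thresholds[int(np.argmax(scores))]

# Report final performance once, on the untouched test set.
test_probs = model.predict_proba(X_test)[:, 1]
print("chosen threshold:", best_threshold)
print("test balanced accuracy:",
      balanced_accuracy_score(y_test, test_probs >= best_threshold))
```

The key point is that the test set is only touched once, after the threshold has been fixed, so the final number remains an honest estimate of performance on unseen data.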

Max S.
  • I'm splitting 75% for training and 25% for testing. Within that 75% of training data I can't determine the best threshold. So the ideal would be something like 75% (training) + 10% (setting the threshold) + 15% (final test)? – Marcelo Rodrigues Nov 10 '20 at 02:22
  • Yes, that's the idea; the exact percentages are somewhat arbitrary. – Max S. Nov 10 '20 at 03:28