I'm working with imbalanced data (2% "yes", 98% "no").
Regardless of the evaluation metric chosen during training, I obtain low sensitivity and high specificity.
For this reason I am varying the decision threshold to increase sensitivity without losing much specificity. Many threshold values (including values close to each other) improve the model, but not all of them yield good performance when applied to the test data.
Is it considered cheating to use the test data to choose only the threshold of the final model?
If the answer to the question above is yes, would it be a good idea to split the test data into two parts: one for setting the threshold and the other for the final test?
I don't know whether it is possible to include the threshold tuning in the training process; if it is, would that be a solution?
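For what it's worth, here is a minimal sketch of the split-for-tuning idea described above: a three-way train/validation/test split where the threshold is chosen on the validation part only, and the untouched test part is used once for the final report. The dataset, model, and the choice of Youden's J as the tuning criterion are all illustrative assumptions, not anything from the original question.

```python
# Sketch: tune the decision threshold on a held-out validation split,
# never on the final test set. Dataset, model, and metric are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

# Synthetic imbalanced data: roughly 2% positives, as in the question.
X, y = make_classification(n_samples=20000, weights=[0.98, 0.02],
                           random_state=0)

# Three-way split: train / validation (threshold tuning) / test (final report).
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

def sens_spec(y_true, y_pred):
    """Return (sensitivity, specificity) from a binary confusion matrix."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return tp / (tp + fn), tn / (tn + fp)

# Choose the threshold maximizing Youden's J = sensitivity + specificity - 1,
# using the VALIDATION set only (an illustrative criterion; any metric works).
probs_val = model.predict_proba(X_val)[:, 1]
thresholds = np.linspace(0.01, 0.99, 99)
j_scores = [sum(sens_spec(y_val, probs_val >= t)) - 1 for t in thresholds]
best_threshold = thresholds[int(np.argmax(j_scores))]

# Report final performance ONCE, on the untouched test set.
probs_test = model.predict_proba(X_test)[:, 1]
sens_test, spec_test = sens_spec(y_test, probs_test >= best_threshold)
print(f"threshold={best_threshold:.2f} "
      f"sensitivity={sens_test:.3f} specificity={spec_test:.3f}")
```

The same idea can be folded into training via nested cross-validation: treat the threshold as a hyperparameter, tune it on inner folds, and let the outer folds estimate generalization, so the original test set is never consulted.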