
Assume I have a dataset split into train/val/test and I want to compute the optimal threshold value for the F1 score. This threshold value is in [0, 0.5], as described in "What is F1 Optimal Threshold? How to calculate it?".

For a classifier that outputs a probability, I would select the optimal F1 threshold on the validation set by sweeping thresholds and picking the one that yields the best F1. This seems reasonable, as selecting the threshold is similar to selecting the best model, which would also be done on the validation set.
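For concreteness, a minimal sketch of that sweep, assuming a scikit-learn-style setup where proba_val holds the validation set's positive-class probabilities (the function name and the grid over [0, 0.5] are illustrative choices, not anything canonical):

    import numpy as np
    from sklearn.metrics import f1_score

    def best_f1_threshold(y_val, proba_val, grid=np.linspace(0.0, 0.5, 101)):
        # Sweep candidate thresholds on the validation set and return
        # the one with the highest F1, together with that score.
        scores = [f1_score(y_val, (proba_val >= t).astype(int)) for t in grid]
        best = int(np.argmax(scores))
        return grid[best], scores[best]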

However, assume I have a classifier that does not output probabilities (e.g., an SVM). How would you then optimize F1 on the validation set?

pir

2 Answers


In cases in which your classifier simply outputs, say, a binary value, I think your best bet is to do hyper-parameter optimization on the validation set and pick the set of parameters that maximizes your F1 score. You could choose the parameters via cross-validation; you can be as sophisticated as you want there.
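For example, with scikit-learn this could be a grid search scored by F1 (the parameter values here are arbitrary, and X_train, y_train are assumed to exist):

    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    # Tune the SVM's own hyper-parameters for cross-validated F1
    # instead of tuning a probability threshold.
    param_grid = {"C": [0.1, 1, 10, 100], "gamma": ["scale", 0.01, 0.1]}
    search = GridSearchCV(SVC(), param_grid, scoring="f1", cv=5)
    search.fit(X_train, y_train)  # X_train, y_train assumed to exist
    print(search.best_params_, search.best_score_)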

In the general case, in which a classifier outputs at least a score, you can turn that score into a probability by calibrating it. By the way, since you cited it: SVMs use Platt scaling (article, wiki page, related question) to return a probability.
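A sketch of that with scikit-learn's CalibratedClassifierCV, whose method="sigmoid" option is Platt scaling (the training and validation arrays are assumed to exist):

    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.svm import LinearSVC

    # Wrap a margin-only classifier so it exposes predict_proba;
    # method="sigmoid" is Platt scaling, fitted on held-out folds.
    calibrated = CalibratedClassifierCV(LinearSVC(), method="sigmoid", cv=5)
    calibrated.fit(X_train, y_train)  # assumed training data
    proba_val = calibrated.predict_proba(X_val)[:, 1]  # X_val assumed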

Damodar8
  • In this case, the optimal C values for the positive and negative examples will probably be different, so the search would be over both values. If you want probabilities, don't use Platt scaling, use kernel logistic regression, which is intended for predicting probabilities. – Dikran Marsupial Nov 14 '21 at 21:21
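In scikit-learn terms, a search over per-class C values can be expressed through class_weight, which scales C separately for each class; a sketch under the same assumed data (the weight values are arbitrary):

    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    # class_weight scales C per class, so searching over it is a
    # search over both effective C values at once.
    param_grid = {
        "C": [0.1, 1, 10],
        "class_weight": [{0: 1, 1: w} for w in (0.5, 1, 2, 5)],
    }
    search = GridSearchCV(SVC(), param_grid, scoring="f1", cv=5)
    search.fit(X_train, y_train)  # assumed training data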

I think you should use the approach that some frameworks (such as H2O) use: you do not optimize F1 during training, but instead run an a-posteriori search for the optimal threshold used to determine the predicted class (for binary classification, the x such that if prediction_proba > x then class 1, else class 0).

In this case, you can find the optimum that minimises mispredictions simply by calculating an F1 (or F_beta) score with inputs (see the sketch below):

    - Y_true (0/1, for example)
    - ( Y_prediction_probabilities > threshold ) (0 or 1, will change when changing the threshold)
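A sketch of that a-posteriori search using scikit-learn's precision_recall_curve, assuming y_true and y_proba are the validation labels and predicted probabilities:

    import numpy as np
    from sklearn.metrics import precision_recall_curve

    # Evaluate F1 at every threshold induced by the scores, then
    # take the a-posteriori maximum.
    prec, rec, thresholds = precision_recall_curve(y_true, y_proba)
    f1 = 2 * prec * rec / np.maximum(prec + rec, 1e-12)  # avoid 0/0
    best_threshold = thresholds[np.argmax(f1[:-1])]  # last point has no threshold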
btbbass
  • I don't think that would be a good idea, as the SVM concentrates on the decision boundary for best accuracy, and it doesn't necessarily give a good model of class membership away from that boundary. – Dikran Marsupial Nov 14 '21 at 21:20