2

I'm working with datasets that have unbalanced classes, and I'm wondering what the best practice is for finding the best threshold cutoff value using a ROC curve.

I want the best cutoff to maximize my F1 score.

I already have a ROC curve; I'm wondering what the best practice is for deriving the best threshold cutoff from it.

Alvaro Joao
Slippery slope of a question... Please read [this entry](http://stats.stackexchange.com/a/67442/67822). If you are using R and want to calculate the values suggested [here](http://stats.stackexchange.com/a/29727/67822), I have it summarized for different packages in my notes [here](http://rinterested.github.io/statistics/roc.html). – Antoni Parellada Sep 09 '16 at 15:21

1 Answer

3

For any input (threshold) you get one output (F1 score), so you can do a grid search: try every threshold from 0 to 1 on a grid (say, seq(0, 1, by = 0.01)) and see which value maximizes the F1 score.
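As a rough illustration of the grid search, here is a minimal sketch in R. The simulated labels `y`, the scores `score`, and the helper `f1_at` are all hypothetical names made up for this example; in practice you would plug in your own labels and predicted probabilities.

```r
# Hypothetical toy data (an assumption for illustration): about 10% positives,
# with scores in (0, 1) that tend to be higher for the positive class.
set.seed(1)
y <- rbinom(1000, 1, 0.1)
score <- plogis(rnorm(1000, mean = ifelse(y == 1, 1, -2)))

# F1 score obtained when predicting "positive" for score >= threshold.
f1_at <- function(threshold, y, score) {
  pred <- as.integer(score >= threshold)
  tp <- sum(pred == 1 & y == 1)
  fp <- sum(pred == 1 & y == 0)
  fn <- sum(pred == 0 & y == 1)
  if (tp == 0) return(0)  # avoid 0/0 when there are no true positives
  2 * tp / (2 * tp + fp + fn)
}

# Evaluate F1 at every threshold on the grid and keep the best one.
grid <- seq(0, 1, by = 0.01)
f1 <- sapply(grid, f1_at, y = y, score = score)
best_threshold <- grid[which.max(f1)]
best_f1 <- max(f1)
```

If several thresholds tie for the best F1, `which.max` returns the first (smallest) one on the grid.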

In addition, finding the best threshold can also be viewed as a one-dimensional optimization problem (without using gradients). You can try optimize in R. Details can be found here. The difference between grid search and optimize is that optimize searches in a "smarter way", e.g., if we see worse results we do not continue in that direction.
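A minimal sketch of the optimize approach, under the same kind of assumptions: the toy data `y` and `score` and the helper `f1_at` are hypothetical and stand in for your own labels and predicted probabilities. Note that F1 is a step function of the threshold and need not have a single peak, so optimize may settle on a local maximum; comparing its result against a grid search is a sensible check.

```r
# Hypothetical toy data (an assumption for illustration): about 10% positives,
# with scores in (0, 1) that tend to be higher for the positive class.
set.seed(1)
y <- rbinom(1000, 1, 0.1)
score <- plogis(rnorm(1000, mean = ifelse(y == 1, 1, -2)))

# F1 score obtained when predicting "positive" for score >= threshold.
f1_at <- function(threshold, y, score) {
  pred <- as.integer(score >= threshold)
  tp <- sum(pred == 1 & y == 1)
  fp <- sum(pred == 1 & y == 0)
  fn <- sum(pred == 0 & y == 1)
  if (tp == 0) return(0)  # avoid 0/0 when there are no true positives
  2 * tp / (2 * tp + fp + fn)
}

# One-dimensional, derivative-free search over [0, 1].
# The extra arguments y and score are passed through to f1_at.
res <- optimize(f1_at, interval = c(0, 1), y = y, score = score,
                maximum = TRUE)
res$maximum    # threshold found by the search
res$objective  # F1 score at that threshold
```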

Haitao Du