How do you use a ROC curve to optimize a random forest classifier?

Question

I am using a random forest from sklearn for a binary classification problem. The sklearn implementation outputs both a predicted class and a probability for each class. Whenever the probabilities of an event belonging to class A and to class B are both 0.5, it assigns the event to class B. I believe it is not legitimate to override this decision by changing the 0.5 threshold, but I am not sure of this. I am using the ROC curve to make a better decision. I know that the curve plots the True Positive Rate as a function of the False Positive Rate, but how does one obtain the threshold that belongs to each point of the curve? Each point is given by its x, y coordinates, not by a threshold. And, more importantly, is this threshold the one applied to the random forest's predicted probabilities for classes A and B, or what does it refer to?

Answer 1 · answered Jun 20 '16 at 17:30

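The threshold is the parameter of the curve. Each point (FPR, TPR) comes from one threshold applied to the classifier's scores. For a random forest, those scores are the predicted probabilities of the positive class, and every sample whose score is at or above the threshold is labeled positive. The threshold is not visible on the plot itself, but you can find the interval it lies in from the scores of the two points immediately adjacent to it. You can also look into the $F_1$ score to help you pick an optimal threshold.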
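A minimal sketch with scikit-learn, assuming a synthetic dataset from `make_classification` in place of the asker's data and treating label 1 as the positive class. In practice the threshold does not have to be read off the plot. `roc_curve` returns a `thresholds` array aligned element by element with `fpr` and `tpr`, and each threshold is applied to the positive-class column of `predict_proba`. Picking the point closest to (0, 1) is only one illustrative rule. The F1 sweep at the end illustrates the $F_1$ suggestion above. Both choices are assumptions, not the answer's prescribed method.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced binary data standing in for the real dataset (assumption).
X, y = make_classification(n_samples=2000, weights=[0.8, 0.2], random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Score for each sample = predicted probability of the positive class (label 1).
scores = clf.predict_proba(X_val)[:, 1]

# roc_curve returns one threshold per (fpr, tpr) point: point i is what you get
# by predicting "positive" whenever score >= thresholds[i].
fpr, tpr, thresholds = roc_curve(y_val, scores)

# Illustrative rule (assumption): take the ROC point closest to the corner (0, 1).
i = np.argmin(np.hypot(fpr, 1 - tpr))
t_roc = thresholds[i]
y_pred = (scores >= t_roc).astype(int)  # custom decision instead of clf.predict
print(f"ROC point {i}: FPR={fpr[i]:.3f}, TPR={tpr[i]:.3f}, threshold={t_roc:.3f}")

# F1-based choice: evaluate F1 at every ROC threshold except the first one,
# which is a sentinel above all scores (np.inf or max score + 1, by version).
candidates = thresholds[1:]
f1s = np.array(
    [f1_score(y_val, (scores >= t).astype(int), zero_division=0) for t in candidates]
)
t_f1 = candidates[np.argmax(f1s)]
print(f"Best F1={f1s.max():.3f} at threshold={t_f1:.3f}")
```

Because the decision is just `scores >= t`, moving the threshold away from 0.5 is an ordinary modelling choice. It is best tuned on a validation split, as here, rather than on the data used for the final evaluation.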

Whispers
