I am using a random forest from sklearn for a binary classification problem. The sklearn implementation outputs both a predicted class and a probability for each class. When the probabilities of belonging to class A and class B are both 0.5, it assigns the event to class B. I believe it is not legitimate to override this decision by changing the 0.5 threshold, but I am not sure about this.

I am using the ROC curve to make a better decision. I know that the curve represents the True Positive Rate as a function of the False Positive Rate, but how does one obtain the threshold that corresponds to each point of the curve? Each point is given by its (x, y) coordinates, not by a threshold. And, more importantly, is this threshold the one applied to the random forest's predicted class probabilities for classes A and B, or does it refer to something else?
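
To make the question concrete, here is a minimal sketch of the setup, assuming the curve is built with `sklearn.metrics.roc_curve` (the question does not say how the curve was produced). The synthetic data, the variable names, and the use of Youden's J to pick a point are illustrative assumptions, not part of the original problem. In this sketch, `roc_curve` returns one threshold per (FPR, TPR) point, and those thresholds are cut-offs on whatever score is passed in, here the predicted probability of the positive class.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

# Synthetic binary data (assumption: stands in for the real classes A=0, B=1).
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Score used for the ROC curve: predicted probability of the positive class
# (the column of predict_proba that matches clf.classes_[1]).
scores = clf.predict_proba(X_test)[:, 1]

# fpr[i], tpr[i] are obtained by predicting positive when score >= thresholds[i].
fpr, tpr, thresholds = roc_curve(y_test, scores)

# One possible selection rule (an assumption): maximise Youden's J = TPR - FPR.
best = int(np.argmax(tpr - fpr))
chosen_threshold = thresholds[best]

# Apply the chosen cut-off to the probabilities instead of the default rule.
y_pred = (scores >= chosen_threshold).astype(int)

# Recompute the TPR by hand to confirm it matches the curve at that point.
manual_tpr = np.sum((y_pred == 1) & (y_test == 1)) / np.sum(y_test == 1)
print(chosen_threshold, tpr[best], manual_tpr)
```

Under these assumptions, the threshold attached to each point of the curve is a cut-off on the class-B probability from `predict_proba`, and the last lines check that applying it by hand reproduces the TPR that the curve reports.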