
I am working on a binary classification problem on imbalanced data, where the majority class ('no') makes up about 90% and the minority class ('yes') about 10% of the total data.

Iteration 1: I randomly split my data (about 1200 rows) into 70% training and 30% test data and trained a random forest classifier to get the probability of each of the two classes. I wanted to see how the model's performance on 'yes' versus 'no' changes with the cutoff probability used to decide between the majority and minority class in the test data. So I started with a cutoff probability of 0.01 and recorded the 'yes' or 'no' predictions and the corresponding AUC, sensitivity and specificity. I then incremented the cutoff probability to 0.02 and noted the AUC, sensitivity and specificity again. I repeated this, each time incrementing the probability by 0.01, until I reached a cutoff probability of 1.00. This completed the first iteration.
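For concreteness, one iteration looks roughly like this (a minimal R sketch using the randomForest and pROC packages; `df`, `y` and `run_one_split` are only illustrative names, not my actual code):

```r
library(randomForest)
library(pROC)

# One iteration: 70/30 split, fit a random forest, then sweep the cutoff.
run_one_split <- function(df, seed) {
  set.seed(seed)
  idx   <- sample(nrow(df), size = floor(0.7 * nrow(df)))   # 70% training rows
  train <- df[idx, ]
  test  <- df[-idx, ]

  fit   <- randomForest(y ~ ., data = train)
  p_yes <- predict(fit, newdata = test, type = "prob")[, "yes"]  # P(yes) per test row

  cutoffs <- seq(0.01, 1.00, by = 0.01)
  t(sapply(cutoffs, function(k) {
    pred <- ifelse(p_yes >= k, "yes", "no")                  # hard call at this cutoff
    sens <- mean(pred[test$y == "yes"] == "yes")             # sensitivity for 'yes'
    spec <- mean(pred[test$y == "no"]  == "no")              # specificity
    a    <- as.numeric(auc(test$y, as.numeric(pred == "yes"),
                           levels = c("no", "yes"), direction = "<"))  # AUC of the hard predictions
    c(cutoff = k, auc = a, sensitivity = sens, specificity = spec)
  }))
}
```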

Iterations 2 to 1000: I repeated the above experiment 1000 times, taking a new random 70% training / 30% test split each time.

Finally, for each cutoff probability between 0.01 and 1.00, I calculated the average values of the AUC, sensitivity and specificity over the 1000 iterations and plotted them on the graph below.

[Figure: average AUC, sensitivity (orange) and specificity plotted against the cutoff probability]
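The repeat-and-average step is roughly the following (again a sketch, reusing `run_one_split` from above):

```r
n_iter <- 1000
runs   <- lapply(seq_len(n_iter), function(i) run_one_split(df, seed = i))
avg    <- Reduce(`+`, runs) / n_iter    # per-cutoff means over the 1000 splits

matplot(avg[, "cutoff"], avg[, c("auc", "sensitivity", "specificity")],
        type = "l", lty = 1, col = c("black", "orange", "blue"),
        xlab = "Cutoff probability", ylab = "Average value")
legend("bottomleft", c("AUC", "Sensitivity", "Specificity"),
       lty = 1, col = c("black", "orange", "blue"))
```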

Since the data is imbalanced, the sensitivity is high, lying between 0.95 and 0.99, as shown by the orange line. The AUC is highest when the cutoff probability is about 0.11, whereas the specificity is highest when the cutoff probability is about 0.8. I observed a similar pattern with other classifiers such as XGBoost, GBM, AdaBoost, etc. Each of these algorithms has its own distinct characteristic curve, but the overall pattern remains the same.

Question 1: The AUC is highest when the cutoff probability is about 0.11, whereas the specificity is highest when the cutoff probability is about 0.8. Why does the optimal cutoff for AUC differ from the optimal cutoff for specificity?

Question 2: Given the above information, if we are to choose a cutoff manually instead of letting the algorithm decide automatically, what cutoff should we use for selecting the best model? Assume that no further feature engineering or re-modeling would be done.

Stats IT
  • The AUC I am familiar with, which is AUROC, is calculated for a model over all thresholds, not per threshold. So you seem to be doing something different when you calculate AUC *per threshold*. Can you elaborate? – Stephan Kolassa May 01 '21 at 06:57
  • Your optimal threshold should not be set based on sensitivity, specificity or accuracy ([Why is accuracy not the best measure for assessing classification models?](https://stats.stackexchange.com/q/312780/1352)), but on the costs of wrong decisions or actions. See [Classification probability threshold](https://stats.stackexchange.com/q/312119/1352). – Stephan Kolassa May 01 '21 at 06:59
  • @StephanKolassa I am calculating AUC using the pROC package in R: auc(actual, predicted) – Stats IT May 01 '21 at 06:59
  • @StephanKolassa I am aware of the cost-based threshold. However, I want to know why exactly the optimal value of AUC occurs at a different threshold from the optimal value of specificity. – Stats IT May 01 '21 at 07:02
  • Attempting to find a cutoff using an ROC curve is completely divorced from good decision making, and the result pertains to group and not individual decisions. – Frank Harrell May 01 '21 at 11:25

0 Answers