I have a binary classification model. The target variable is my_test_data$target_variable
and takes the values 'y' or 'n'; my_test_data$target_variable_numeric is the same variable
converted to numeric, with 'y' coded as 1 and 'n' as 0.
I plotted the ROC curve and computed the AUC in two ways.
Case 1: I predicted with type = "prob" and compared the resulting class probabilities
against the true 0/1 values in my_test_data$target_variable_numeric. I got an AUC of 91
(reported as a percentage because of percent = TRUE).
library(pROC)

pred = predict(my_model, my_test_data, type = "prob")   # per-class probabilities
roc_obj = plot.roc(my_test_data$target_variable_numeric, pred$y,
                   main = "ROC curve",
                   percent = TRUE,
                   ci = TRUE,
                   print.auc = TRUE)
Case 2: With the same model (no re-training), I let the model predict the categorical
target ('y' or 'n'), converted those predictions to 1/0, and compared them with the true
0/1 values in my_test_data$target_variable_numeric. This time the AUC was only 75.
pred = predict(my_model, my_test_data)                  # hard class predictions ('y'/'n')
y = as.data.frame(pred)
colnames(y) = 'my_prediction_categorical'
my_test_data = cbind(my_test_data, y)
my_test_data$my_prediction_categorical =
  ifelse(my_test_data$my_prediction_categorical == 'y', 1, 0)
roc_obj = plot.roc(my_test_data$target_variable_numeric,
                   my_test_data$my_prediction_categorical,
                   main = "ROC curve",
                   percent = TRUE,
                   ci = TRUE,
                   print.auc = TRUE)
Why is there such a large difference in AUC between the two approaches, even though I used
the same model (without re-training) and the same test data? And is it possible to get an
AUC close to that of Case 1 after the final categorical prediction?
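To show the kind of gap I mean outside my own model, here is a minimal toy example (in Python rather than R, with made-up data and a hand-rolled AUC helper; the numbers are illustrative only, not from my model). It computes AUC once from continuous scores and once from the same scores thresholded to hard 0/1 labels, using the rank definition of AUC (probability that a random positive outranks a random negative, ties counting one half):

```python
def auc(y_true, scores):
    """AUC via the Mann-Whitney rank definition: fraction of
    (positive, negative) pairs where the positive scores higher;
    ties count as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data: true labels and predicted probabilities of the positive class.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
probs  = [0.10, 0.30, 0.45, 0.60, 0.55, 0.70, 0.80, 0.90]

auc_prob = auc(y_true, probs)                     # full ranking of probabilities
hard = [1 if p >= 0.5 else 0 for p in probs]      # collapse to 0/1 at a 0.5 cutoff
auc_hard = auc(y_true, hard)                      # only two distinct score values left

print(auc_prob, auc_hard)  # 0.9375 0.875
```

Thresholding collapses the whole ranking to two values, so the AUC from hard predictions is lower here even though both are computed from the same underlying scores.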