AUC comparison with a set of common cases

Question

I am trying to show that the difference between two classifier methods is statistically significant.

My proposed method only changes the output for certain cases; the rest are the same as the baseline, but the AUC improves. I tried the DeLong test and bootstrapping, but the p-value I get is very high (my guess is that this is because the classifier output is the same for most of the cases).

Is there any method that can take into account that only a few cases are modified by the new classifier?

Any help is appreciated. Thanks!

score 1 · Answer 1 · answered Jul 30 '18 at 05:04

ROC AUC is a statistic about ranks; see "What does AUC stand for and what is it?".

Even if your new method only changes the predicted values for a small number of samples, this will still change the rankings. Here's an example: your data are tuples of $(\text{score}, \text{label})$. The baseline is $(0.49, 1), (0.51, 0)$. This has a ROC AUC of 0, because the only positive sample is scored below the only negative sample. But if you change the first score to $0.52$, the positive is now ranked above the negative and the ROC AUC is 1.
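To make the rank interpretation concrete, here is a minimal sketch that computes ROC AUC directly as the fraction of (positive, negative) pairs in which the positive gets the higher score, counting ties as 1/2. The function name `rank_auc` is hypothetical and chosen for illustration; it reproduces the two-sample example above.

```python
from itertools import product


def rank_auc(scores, labels):
    """ROC AUC as the fraction of (positive, negative) pairs in which
    the positive case has the higher score; ties count as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


baseline = rank_auc([0.49, 0.51], [1, 0])  # positive ranked below negative
modified = rank_auc([0.52, 0.51], [1, 0])  # only the first score changed
print(baseline, modified)  # 0.0 1.0
```

Changing one score flips the only pair's ordering, which is all the AUC depends on.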

The results of your statistical test are telling you that, even though your new method changes these rankings, the resulting change in AUC is not statistically significant.
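One way to see what such a test is doing is a paired bootstrap, the kind of resampling mentioned in the question. Below is a minimal sketch, assuming case-level resampling and a simple two-sided p-value convention (twice the smaller tail fraction of the bootstrap differences); the function name `paired_bootstrap_auc_diff` and the toy data are hypothetical, and this is not necessarily how the asker ran their test. Because each resample evaluates both classifiers on the same cases, pairs of cases whose scores are unchanged contribute identically to both AUCs and cancel in the difference, so the bootstrap distribution of the difference is driven by the few modified cases.

```python
import random


def rank_auc(scores, labels):
    # Fraction of (positive, negative) pairs ranked correctly; ties count 1/2.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def paired_bootstrap_auc_diff(scores_a, scores_b, labels, n_boot=2000, seed=0):
    """Observed AUC(b) - AUC(a) and a two-sided paired-bootstrap p-value.

    Each resample draws ONE set of case indices and scores BOTH classifiers
    on it, preserving the correlation induced by shared cases."""
    rng = random.Random(seed)
    n = len(labels)
    observed = rank_auc(scores_b, labels) - rank_auc(scores_a, labels)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 not in ys or 1 not in ys:
            continue  # AUC is undefined without both classes; redraw
        a = rank_auc([scores_a[i] for i in idx], ys)
        b = rank_auc([scores_b[i] for i in idx], ys)
        diffs.append(b - a)
    # Simple convention: twice the smaller tail fraction around 0, capped at 1.
    le = sum(d <= 0 for d in diffs) / n_boot
    ge = sum(d >= 0 for d in diffs) / n_boot
    return observed, min(1.0, 2 * min(le, ge))


# Hypothetical toy data: classifier B copies A except on three cases.
rng = random.Random(1)
labels = [rng.randint(0, 1) for _ in range(60)]
scores_a = [rng.random() + 0.3 * y for y in labels]
scores_b = list(scores_a)
for i in range(3):
    scores_b[i] += 0.5 if labels[i] == 1 else -0.5
diff, p = paired_bootstrap_auc_diff(scores_a, scores_b, labels, n_boot=1000)
print(f"AUC difference {diff:.3f}, bootstrap p-value {p:.3f}")
```

If only a handful of cases move, many resamples contain few of them, so the difference is often near zero and the p-value can stay large even when the observed AUC went up.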
