
I've designed four classifiers which perform pretty decently (all of them are above 90% in accuracy).

However, they don't have similar AUCs for their respective ROC curves (obviously, they don't have to).

If I were to use these classifiers on real-time data, which one should I choose based on the following results?

Classifier A: Accuracy: 100%, AUC: 84%

Classifier B: Accuracy: 95%, AUC: 83%

Classifier C: Accuracy: 100%, AUC: 69%

Classifier D: Accuracy: 100%, AUC: 77%

garak
  • Can you provide some more information about your situation, data, and these classifiers? – gung - Reinstate Monica Jul 11 '12 at 22:10
  • I'm building gesture recognition classifiers. Each feature vector is 49 units long. Classifiers A and C are multinomial Naive Bayes classifiers and classifiers B and D are LDA classifiers. – garak Jul 11 '12 at 22:15
  • The [ROC](http://en.wikipedia.org/wiki/Receiver_operating_characteristic) curve shows the behaviour of a quantitative classifier at various cut-points. Accuracy varies with the cut-point. For what cut-points do you report your accuracies? – ttnphns Jul 11 '12 at 22:47
  • Unless I am mistaken, the answers so far seem to miss that the result does not seem possible. An accuracy of 100% for any cutoff should automatically result in an AUC of 1. So... what gives? Do you have a strange definition of accuracy? Do you really compare the same scenarios? – Erik Jul 12 '12 at 07:52
  • I strongly agree with @Erik that 100% ACC <=> AUC = 1. However, I suspect a rounding error here, combined with a heavy class skew towards the "negative" class. – mlwida Jul 12 '12 at 11:16
  • Is it the same case with multiclass classifiers? The above is an averaged ROC curve as defined by [Hand & Till (2001)](http://www.springerlink.com/content/nn141j42838n7u21/). – garak Jul 12 '12 at 12:41

2 Answers


One thing to keep in mind is that both accuracy and AUC are point estimates. Estimating confidence intervals for both makes comparisons more interpretable. However, it is more challenging to obtain confidence intervals for accuracy (depending on the resampling scheme).
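
To make this concrete, here is a minimal Python/scikit-learn sketch of percentile-bootstrap intervals obtained by resampling a single held-out test set (the simple case; the paper below addresses the harder situation where resampling is also part of model building). The data, model, and 0.5 cutoff are placeholders, and the toy problem is binary rather than the multiclass gesture task described in the comments.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Toy stand-in for the real data (the question's feature vectors are 49-dimensional).
X, y = make_classification(n_samples=600, n_features=49, weights=[0.9, 0.1],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.4, stratify=y,
                                          random_state=0)
score = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

def bootstrap_ci(y_true, y_score, metric, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for metric(y_true, y_score)."""
    rng = np.random.default_rng(seed)
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))  # resample cases with replacement
        if y_true[idx].min() == y_true[idx].max():       # AUC needs both classes present
            continue
        stats.append(metric(y_true[idx], y_score[idx]))
    return np.quantile(stats, [alpha / 2, 1 - alpha / 2])

# Accuracy is computed on hard labels (scores thresholded at 0.5); AUC on the raw scores.
print("accuracy 95% CI:", bootstrap_ci(y_te, score, lambda t, s: accuracy_score(t, s >= 0.5)))
print("AUC      95% CI:", bootstrap_ci(y_te, score, roc_auc_score))
```

If two classifiers' intervals overlap heavily, the difference in their point estimates is probably not worth acting on.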

One paper that discusses this is "Calculating confidence intervals for prediction error in microarray classification using resampling" by Jiang and colleagues.

Aliferis and colleagues (Factors Influencing the Statistical Power of Complex Data Analysis Protocols for Molecular Signature Development from Microarray Data) review the perspective that accuracy is an undesirable metric. I think Frank Harrell (coauthor on the above paper) also reviews this in his book "Regression Modeling Strategies".

julieth
  • I'll add to @julieth: I wouldn't get excited about any "accuracy" measure unless a high-resolution calibration plot accompanies it, and the plot (say, based on loess) is penalized for overfitting using the bootstrap. The R `rms` package makes this easy to do for some problems. – Frank Harrell Jul 12 '12 at 10:48

The AUC averages the performance over the whole range of classifier scores, starting from low coverage / low false positive rate and ending at high coverage / high false positive rate. This is not always the best way to compare performance, because you may place a stronger emphasis on coverage than on precision, or vice versa.
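
If only part of that range matters to you (say you can tolerate at most a 10% false-positive rate), one option is to compare a partial AUC instead of the full one. A small Python/scikit-learn sketch, assuming binary labels and using the `max_fpr` argument of `roc_auc_score`; the labels and scores below are made up purely for illustration:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
# Hypothetical held-out labels and scores from two classifiers (illustrative only).
y_true = rng.integers(0, 2, 1000)
score_1 = y_true * 0.6 + rng.normal(0, 0.4, 1000)
score_2 = y_true * 0.4 + rng.normal(0, 0.2, 1000)

for name, s in [("clf 1", score_1), ("clf 2", score_2)]:
    full = roc_auc_score(y_true, s)
    partial = roc_auc_score(y_true, s, max_fpr=0.1)  # standardized partial AUC, FPR <= 0.1
    print(f"{name}: full AUC = {full:.3f}, partial AUC (FPR <= 0.1) = {partial:.3f}")
```

A ranking based on the full AUC can differ from one based on the region you actually operate in, which is exactly the caveat above.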

Once you plot the ROC curves and/or the precision-recall curves (for the relevant R functions see, e.g., this answer), you can compare the classifiers and select the one that provides better precision for a given high recall value (if these are your needs), or vice versa. This approach will also give you the cutoff for accepting the selected classifier's predictions.
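
As a rough sketch of that workflow (the linked answer covers the R functions; the Python/scikit-learn version below uses made-up labels and scores): plot both curves, then pick the cutoff that gives the best precision among thresholds that keep recall above your target.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, precision_recall_curve

rng = np.random.default_rng(2)
# Hypothetical held-out labels and scores (placeholders for your classifier's output).
y_true = rng.integers(0, 2, 1000)
y_score = y_true * 0.5 + rng.normal(0, 0.35, 1000)

fpr, tpr, _ = roc_curve(y_true, y_score)
prec, rec, pr_thresh = precision_recall_curve(y_true, y_score)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))
ax1.plot(fpr, tpr); ax1.set_xlabel("false positive rate"); ax1.set_ylabel("recall")
ax2.plot(rec, prec); ax2.set_xlabel("recall"); ax2.set_ylabel("precision")
plt.show()

# Cutoff with the best precision among thresholds that keep recall >= 0.9.
# precision_recall_curve returns one more (precision, recall) pair than thresholds,
# so drop the last pair to align the arrays.
target_recall = 0.9
ok = rec[:-1] >= target_recall
best = np.argmax(prec[:-1] * ok)
print(f"cutoff = {pr_thresh[best]:.3f}, precision = {prec[best]:.3f}, recall = {rec[best]:.3f}")
```

Repeating this for each of the four classifiers lets you compare them at the operating point you actually care about, rather than by a single summary number.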

Itamar