
I'm doing binary classification with several models. A GLM, a random forest, and an SVM all have the same accuracy, recall, specificity, precision, and F1 score, yet each has a different AUC-PR curve.

The data are imbalanced: 70% of the observations belong to the positive class. I set a random seed before fitting each model.

Is that possible? If so, what is the explanation?

bolleke

1 Answer


AUC, loosely speaking, checks accuracy at all possible cutoff thresholds, not just $0.5$. It happens to be the case that, for your models, all have the same accuracy when you set that threshold to the default of $0.5$. Test the accuracy of your models when you set the cutoff at $0.4$ or $0.7$ (or whatever). Whatever software you’re using will have documentation that explains how to do this.
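To make this concrete, here is a minimal sketch in Python with scikit-learn (an assumption; the question doesn't say which software is being used). The toy probabilities below are invented for illustration: two models make identical hard predictions at the $0.5$ cutoff, so their accuracy there is identical, but they rank the cases differently, so their accuracy at other cutoffs, and hence their AUC-PR, differs.

```python
import numpy as np
from sklearn.metrics import accuracy_score, average_precision_score

# Toy data: two models agree on every prediction at the 0.5 cutoff,
# but model B ranks one negative case above one positive case.
y_true = np.array([1, 1, 0, 0, 1, 0])
p_model_a = np.array([0.9, 0.7, 0.3, 0.2, 0.45, 0.1])
p_model_b = np.array([0.9, 0.7, 0.3, 0.2, 0.15, 0.4])

for name, p in [("A", p_model_a), ("B", p_model_b)]:
    for cutoff in (0.4, 0.5):
        acc = accuracy_score(y_true, (p >= cutoff).astype(int))
        print(f"model {name}, cutoff {cutoff}: accuracy {acc:.3f}")
    # average_precision_score summarizes the PR curve (one common
    # estimate of the area under it).
    print(f"model {name}: AUC-PR {average_precision_score(y_true, p):.3f}")
```

At the $0.5$ cutoff both models score $5/6 \approx 0.833$, yet at $0.4$ model A is perfect while model B drops to $4/6$, and their AUC-PR values differ ($1.0$ vs. $\approx 0.833$): the AUC-PR sees the ranking, while a single-cutoff accuracy does not.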

There’s nothing special about $0.5$ when it comes to a threshold for making a decision.

Additionally, I encourage you to look around Cross Validated for discussions about proper scoring rules, particularly comments by our member Frank Harrell. Accuracy has flaws. Shamelessly, I will mention that I posted a question a few weeks ago that gives an example where accuracy may not be a good performance metric: Proper scoring rule when there is a decision to make (e.g. spam vs ham email).
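As a small illustration of why proper scoring rules matter here (a sketch with invented numbers, not from the original question): the Brier score, one simple proper scoring rule, distinguishes two models that accuracy treats as identical.

```python
import numpy as np

# Both models classify every case correctly at the 0.5 cutoff, so
# their accuracy is identical (1.0). But the second model is barely
# confident in its probabilities, and the Brier score penalizes that.
y_true = np.array([1, 1, 0, 0])
p_confident = np.array([0.95, 0.9, 0.05, 0.1])
p_hesitant = np.array([0.55, 0.6, 0.45, 0.4])

def brier(y, p):
    # Mean squared error between predicted probability and outcome;
    # lower is better, 0 is perfect.
    return np.mean((p - y) ** 2)

print(brier(y_true, p_confident))  # small
print(brier(y_true, p_hesitant))   # much larger, despite equal accuracy
```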

Dave
  • Thank you for the insights. I am using different evaluation methods for my models, mainly AUC-PR since the data are imbalanced. I'll change the cutoff to see if it makes a difference. – bolleke May 20 '20 at 10:14
  • I strongly recommend you take the advice of Dave (there's nothing special about the cutoff, look at proper scoring rules) and Frank Harrell. – LSC May 20 '20 at 10:28