
I have the following evaluation metrics on the test set, after running 6 models for a binary classification problem:

  model  accuracy  logloss   AUC
  1        19%      0.45    0.54
  2        67%      0.62    0.67
  3        66%      0.63    0.68
  4        67%      0.62    0.66
  5        63%      0.61    0.66
  6        65%      0.68    0.42

I have the following questions:

  • How can model 1 be the best in terms of log-loss (its log-loss is the closest to 0) when it performs the worst in terms of accuracy? What does that mean?
  • How come model 6 has a lower AUC score than e.g. model 5, when model 6 has better accuracy? What does that mean?
  • Is there a way to say which of these 6 models is the best?
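Not part of the original question, but a small pure-Python sketch (toy labels and probabilities, not the asker's data) can illustrate why the three metrics answer different questions: accuracy depends on a classification threshold applied to the probabilities, while log-loss and AUC are computed from the probabilities directly and ignore any threshold.

```python
import math

# Toy data: y_true are labels, p are a model's predicted probabilities.
y_true = [0, 0, 0, 1, 1, 0, 1, 1, 0, 1]
p      = [0.2, 0.4, 0.35, 0.6, 0.7, 0.55, 0.45, 0.8, 0.3, 0.65]

def accuracy(y, p, threshold):
    # Accuracy requires turning probabilities into hard 0/1 predictions.
    preds = [1 if pi >= threshold else 0 for pi in p]
    return sum(yi == pred for yi, pred in zip(y, preds)) / len(y)

def log_loss(y, p):
    # Mean negative log-likelihood of the true labels under p.
    return -sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
                for yi, pi in zip(y, p)) / len(y)

def auc(y, p):
    # Probability that a random positive is scored above a random negative.
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    pairs = [(pp > pn) + 0.5 * (pp == pn) for pp in pos for pn in neg]
    return sum(pairs) / len(pairs)

print(accuracy(y_true, p, 0.5))   # 0.8  -- depends on the threshold
print(accuracy(y_true, p, 0.42))  # 0.9  -- same model, different threshold
print(log_loss(y_true, p))        # threshold-free
print(auc(y_true, p))             # 0.96 -- threshold-free
```

The same set of predicted probabilities gives different accuracies at different thresholds, while log-loss and AUC are unchanged, which is why the rankings of the six models can disagree across the three metrics.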
asked by quant
    What is the prevalence of each class? What is the threshold used to choose a class when calculating accuracy, and is the same threshold used for all 6 models? Can we see a confusion matrix and a plot of the ROC curve for at least one of the models? How many examples are in the test set? Was the exact same test set used to evaluate all six models (as opposed to, say, cross validation)? All I can say from the information provided is that the log loss reported is very high: a constant model which always predicts p=0.5 will get a log loss of about 0.7. Model 6 has an AUC less than 0.5, which is also a red flag. – olooney Oct 29 '19 at 16:18
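The commenter's figure for the constant model is easy to verify: a model that always predicts p = 0.5 incurs -ln(0.5) = ln(2) ≈ 0.693 log loss on every example, regardless of the label. A quick sketch (toy labels assumed):

```python
import math

# A model that always predicts p = 0.5, no matter the true label,
# loses -ln(0.5) = ln(2) on every example.
y_true = [0, 1, 1, 0, 1]          # any labels work here
p = [0.5] * len(y_true)

loss = -sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
            for yi, pi in zip(y_true, p)) / len(y_true)
print(round(loss, 3))  # 0.693
```

So log losses in the 0.61–0.68 range are barely better than (or worse than) knowing nothing at all.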
  • 2
    Accuracy, log-loss and AUC provide different values because they answer different questions. The correct model is the one which produces the best trade-off for your organization. We can't tell you what that trade-off is because we don't know what kinds of costs are involved and we don't know how to compare the severity of the different kinds of errors. See also: https://stats.stackexchange.com/questions/414349/is-my-model-any-good-based-on-the-diagnostic-metric-r2-auc-accuracy-rmse – Sycorax Oct 29 '19 at 16:31
  • 1
    See also: https://stats.stackexchange.com/questions/362982/is-it-possible-for-a-model-to-have-higher-sensitivity-specificity-but-lower-accu/363321#363321 – Sycorax Oct 29 '19 at 16:32
  • 1
    First of all, your AUC results are very bad, which suggests your models generate near-random outputs. You should be aware that AUC operates on the predicted probabilities, while accuracy processes the final (thresholded) output of a model. Please check the prior distribution of the classes over the test set. You can easily get high accuracy with low AUC if your model learnt only the prior (i.e. always returns the same value). – podludek Oct 30 '19 at 11:32
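The failure mode this comment describes can be sketched in a few lines of pure Python (toy imbalanced data, not the asker's): a model that has learnt only the class prior and emits the same score for every example gets high accuracy on an imbalanced test set, but an uninformative AUC of exactly 0.5.

```python
# Imbalanced toy test set: 90% negatives, 10% positives.
y_true = [0] * 90 + [1] * 10
p = [0.1] * 100                   # constant score = prior of the positive class

# Accuracy: every score is below 0.5, so the model always predicts class 0.
preds = [1 if pi >= 0.5 else 0 for pi in p]
acc = sum(yi == pr for yi, pr in zip(y_true, preds)) / len(y_true)

# AUC: every positive/negative pair is tied, so each pair contributes 0.5.
pos = [pi for yi, pi in zip(y_true, p) if yi == 1]
neg = [pi for yi, pi in zip(y_true, p) if yi == 0]
auc = sum((pp > pn) + 0.5 * (pp == pn)
          for pp in pos for pn in neg) / (len(pos) * len(neg))

print(acc)  # 0.9
print(auc)  # 0.5
```

This is why checking the class prevalence, as the comments suggest, matters: 90% accuracy here reflects the class imbalance, not any discriminative ability.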

0 Answers