I have the following evaluation metrics on the test set, after running 6 models for a binary classification problem:
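| Model | Accuracy | Log-loss | AUC  |
|-------|----------|----------|------|
| 1     | 19%      | 0.45     | 0.54 |
| 2     | 67%      | 0.62     | 0.67 |
| 3     | 66%      | 0.63     | 0.68 |
| 4     | 67%      | 0.62     | 0.66 |
| 5     | 63%      | 0.61     | 0.66 |
| 6     | 65%      | 0.68     | 0.42 |
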
I have the following questions:
- How can model 1 be the best in terms of log-loss (its log-loss is the closest to 0) when it performs the worst in terms of accuracy? What does that mean?
- Why does model 6 have a lower AUC score than, e.g., model 5, when model 6 has better accuracy? What does that mean?
- Is there a way to say which of these 6 models is the best?
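
To make the first two questions concrete, here is a minimal sketch of how the three metrics can disagree. It is pure Python, and everything in it is an assumption for illustration: the labels and predicted probabilities are made up (they are not my models' outputs), the helper names `accuracy`, `log_loss` and `auc` are my own, and accuracy is assumed to use a 0.5 threshold. In this toy case, model A has the higher accuracy, yet model B has the lower log-loss and the higher AUC.

```python
import math

def accuracy(y_true, p, threshold=0.5):
    # Fraction of correct labels after thresholding the probabilities.
    preds = [1 if pi >= threshold else 0 for pi in p]
    return sum(int(y == yhat) for y, yhat in zip(y_true, preds)) / len(y_true)

def log_loss(y_true, p, eps=1e-15):
    # Mean negative log-likelihood; probabilities are clipped to avoid log(0).
    total = 0.0
    for y, pi in zip(y_true, p):
        pi = min(max(pi, eps), 1 - eps)
        total -= y * math.log(pi) + (1 - y) * math.log(1 - pi)
    return total / len(y_true)

def auc(y_true, p):
    # Probability that a random positive is scored above a random negative
    # (ties count as one half). Depends only on the ranking of the scores.
    pos = [pi for y, pi in zip(y_true, p) if y == 1]
    neg = [pi for y, pi in zip(y_true, p) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels and predicted probabilities of class 1.
y = [1, 1, 1, 0, 0, 0]
model_a = [0.90, 0.90, 0.01, 0.10, 0.10, 0.10]  # confident, one confident mistake
model_b = [0.60, 0.60, 0.45, 0.55, 0.40, 0.40]  # cautious, two mild mistakes

for name, p in [("A", model_a), ("B", model_b)]:
    print(name, accuracy(y, p), round(log_loss(y, p), 3), round(auc(y, p), 3))
```

Accuracy only looks at which side of the threshold each prediction falls on, log-loss penalizes confident mistakes heavily, and AUC only looks at how the positives are ranked relative to the negatives, so the three can order models differently.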