
I'm running optimization on an imbalanced dataset and need to define my optimization metric. I'm working on disease detection, so maximizing AUC might not be the best choice, since the certainty of the predictions is important.

I've defined my objective score as AUC − log_loss, and I try to maximize this.
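For concreteness, the combined objective described above could be computed with scikit-learn's metric functions. A minimal sketch (the function name `combined_objective` is made up for illustration):

```python
from sklearn.metrics import roc_auc_score, log_loss

def combined_objective(y_true, y_prob):
    """AUC minus log loss: AUC is higher-is-better, log loss is
    lower-is-better, so subtracting makes both terms reward a
    larger objective value."""
    return roc_auc_score(y_true, y_prob) - log_loss(y_true, y_prob)
```

Note the two terms live on different scales (AUC is bounded in [0, 1]; log loss is unbounded above), so without weighting, one term can dominate the optimization.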

Can someone explain whether this makes sense, and cite some sources where I can read about optimizing with AUC vs. log loss in binary classification?

Thanks,

Edit: I've also read this topic already, and although it is very informative, I'm still not clear on whether it makes sense to optimize for both statistics.

beerzy
  • Start with this: https://stats.stackexchange.com/questions/464636/proper-scoring-rule-when-there-is-a-decision-to-make-e-g-spam-vs-ham-email. Then dive down the rabbit hole (follow the links) to see the advantages of using metrics like log loss and Brier score (both of which are "strictly proper scoring rules"). The gist is that models should be compared on strictly proper scoring rules, while a metric like ROC AUC is appropriate for assessing whether a single model is any good, but not for comparing models. – Dave Dec 06 '20 at 16:17
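To illustrate the comment above: a strictly proper scoring rule is minimized, in expectation, exactly at the true event probability, which is why such rules reward well-calibrated predictions. A small simulation sketch (the constant `p_true` and the candidate grid are made up for illustration):

```python
import numpy as np
from sklearn.metrics import brier_score_loss

rng = np.random.default_rng(0)
p_true = 0.3
y = rng.binomial(1, p_true, size=100_000)  # Bernoulli(0.3) labels

# Score a range of constant probability predictions. The Brier score
# (a strictly proper scoring rule) should be smallest near p_true.
candidates = np.linspace(0.05, 0.95, 19)
brier = [brier_score_loss(y, np.full(y.shape, q)) for q in candidates]
best = candidates[int(np.argmin(brier))]
```

Here `best` lands at (or next to) 0.3: predicting the true probability beats both over- and under-confident constants, which is the sense in which log loss and Brier score favor calibrated models.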

0 Answers