Which metric to use to evaluate highly imbalanced classification model performance

Question

I have to build a classification model to predict the probability of a person getting cancer based on certain attributes. The data is highly imbalanced. Per the client's requirements, I have to report model performance using one of these four metrics:

AUC_Weighted
normalized root mean square
log loss
Accuracy

Can you suggest which one I should use?

It's very unlikely that a single metric will have a one-to-one correspondence with the classification error from your model. Why not report them all? — , Dec 13 '20 at 17:46

score 2 · Answer 1 · answered Dec 12 '20 at 19:06

You should use proper scoring rules for probabilistic predictions. Take a look at the scoring-rules tag wiki for more information.

Log loss is a proper scoring rule. I personally would use this. The mean squared error between the probabilistic predictions and the actual outcomes is the Brier score, which is also a proper scoring rule, so this would also be a possibility. As a matter of fact, both scoring rules are strictly proper. Our thread "Why is LogLoss preferred over other proper scoring rules?" compares the two.
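To make the two scores concrete, here is a minimal sketch in plain Python. The toy data and the names `base_rate` and `too_low` are made up for illustration. It computes both scores for a forecast that matches the prevalence and for one that understates it. Lower is better for both.

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood of 0/1 outcomes under predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip so that log(0) never occurs
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)

def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)

# Hypothetical toy data: 1 positive case out of 10 (10% prevalence).
y = [0] * 9 + [1]
base_rate = [0.10] * 10  # forecasts the true prevalence for everyone
too_low = [0.01] * 10    # understates the risk for everyone

for name, p in [("base rate", base_rate), ("too low", too_low)]:
    print(name, round(log_loss(y, p), 4), round(brier_score(y, p), 4))
```

On this toy data both scores come out lower for the forecast that matches the prevalence, which is the behavior one wants from a proper scoring rule.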

Taking the root of the MSE will not have an impact on its properness. With normalization, it depends on how you want to normalize.

The AUC, as a scoring rule, is problematic.
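One commonly cited reason, which this answer does not spell out, is that the AUC depends only on how the predictions rank the cases, not on the predicted probabilities themselves. The sketch below uses made-up data and a simple pairwise AUC written for illustration. It shrinks every predicted probability by a factor of ten: the AUC is unchanged, while the log loss is not.

```python
import math

def auc(y_true, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood of 0/1 outcomes under predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip so that log(0) never occurs
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)

# Hypothetical predictions for 9 people, 2 of whom have the condition.
y = [0, 0, 0, 0, 0, 0, 1, 0, 1]
probs = [0.05, 0.1, 0.1, 0.2, 0.2, 0.3, 0.4, 0.5, 0.9]
shrunk = [p / 10 for p in probs]  # same ranking, very different probabilities

print("AUC:     ", auc(y, probs), auc(y, shrunk))
print("log loss:", log_loss(y, probs), log_loss(y, shrunk))
```

Any strictly increasing transformation of the predictions would leave the AUC untouched in the same way, so the AUC alone cannot tell whether the reported probabilities can be taken at face value.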

Accuracy is an improper scoring rule. More precisely, it's not a scoring rule at all. Don't use it. See also "Why is accuracy not the best measure for assessing classification models?"
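With highly imbalanced data, the problem with accuracy is easy to demonstrate. The sketch below uses hypothetical numbers (10 cancer cases among 1000 people) and a "model" that never predicts cancer.

```python
# Hypothetical data: 10 cancer cases among 1000 people (1% prevalence).
y = [1] * 10 + [0] * 990
always_negative = [0] * len(y)  # "model" that never predicts cancer

# Share of people for whom the hard 0/1 prediction matches the outcome.
accuracy = sum(pred == actual for pred, actual in zip(always_negative, y)) / len(y)

# Number of actual cancer cases that the model flags.
detected = sum(pred == 1 and actual == 1 for pred, actual in zip(always_negative, y))

print(f"accuracy = {accuracy:.2f}, cancer cases detected = {detected} of {sum(y)}")
# prints: accuracy = 0.99, cancer cases detected = 0 of 10
```

A model that is useless for the actual task still reaches 99% accuracy here, simply because the negative class dominates.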

My personal favorite would be the log loss, with the Brier score as a close second, for reasons given in the thread linked above.

In your opinion, why did the client not also suggest the AIC and the BIC as model selection tools? — Camille Gontier, Dec 12 '20 at 20:22
@CamilleGontier: I don't know. But information criteria are *model selection* tools, as you write, while the KPIs above are *prediction quality* KPIs. The client may simply be more interested in holdout prediction performance. — Stephan Kolassa, Dec 13 '20 at 06:52
