
My team is developing a credit scoring model for a situation in which...

  • The positive class accounts for 10% of the training data
  • FNs (predicting no default for an actual default) cost us ~\$10–15K
  • FPs (predicting default for no actual default) carry an opportunity cost of ~\$2.5K

Historically we have evaluated our models with ROC AUC, but given the class imbalance we are exploring other options; at the moment we are using ROC AUC with class weights.

  1. Does anyone have advice on the best evaluation metric for our situation? Several we have considered are ROC AUC, PR AUC, and F1, but we are open to other options and want to make the determination objectively (i.e., not just pick whichever metric conveniently favors our individual models). We have also considered cost-sensitive classification, but are hesitant about its potential to hardcode our costs (a sketch of how the stated costs map to a decision threshold follows below).
  2. Someone on the team has suggested tuning the class weights. Is there a situation in which that would make sense?
swritchie
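
For concreteness, here is a minimal sketch (assuming scikit-learn; the cost figures come from the bullets above, with the FN cost taken at a ~\$12.5K midpoint, and the data below is synthetic) of how the per-error costs translate into an expected-cost metric and a decision threshold, without baking the costs into model training:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Illustrative costs from the question (FN cost at the ~$12.5K midpoint).
COST_FN = 12_500  # missed default
COST_FP = 2_500   # opportunity cost of declining a good loan

def expected_cost(y_true, y_prob, threshold):
    """Average dollar cost per applicant at a given decision threshold."""
    y_pred = (y_prob >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return (fn * COST_FN + fp * COST_FP) / len(y_true)

# For a well-calibrated model, the expected-cost-minimizing threshold is
#   t* = C_FP / (C_FP + C_FN)
t_star = COST_FP / (COST_FP + COST_FN)  # 2500 / 15000 ≈ 0.167

# Toy holdout scores (replace with real model output): ~10% positives.
rng = np.random.default_rng(0)
y_true = rng.binomial(1, 0.10, size=1_000)
y_prob = np.clip(0.3 * y_true + rng.uniform(0.0, 0.5, size=1_000), 0.0, 1.0)

print(f"t* = {t_star:.3f}")
print(f"expected cost at t*:  ${expected_cost(y_true, y_prob, t_star):,.2f}")
print(f"expected cost at 0.5: ${expected_cost(y_true, y_prob, 0.5):,.2f}")
```

One possible reading of this: the costs enter only at decision time, so the model itself can still be fit and compared on a proper scoring rule (e.g., log loss), which avoids hardcoding the costs into training.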
  • I will post the links I usually post in response to [tag:unbalanced-classes] problems. Frank Harrell’s two blog posts will be of particular interest to you. // Do $10\%$ of your cases really default? https://stats.stackexchange.com/questions/357466 https://www.fharrell.com/post/class-damage/ https://www.fharrell.com/post/classification/ https://stats.stackexchange.com/a/359936/247274 https://stats.stackexchange.com/questions/464636/ https://twitter.com/f2harrell/status/1062424969366462473?lang=en – Dave Feb 21 '22 at 21:48
  • Why not log loss, which is a proper scoring rule? Do you issue fixed-size loans? Otherwise, I would have thought accurate probability estimates matter, not just ranking: don't you want to be able to calculate expected loss, i.e., principal × P(default)? – seanv507 Feb 21 '22 at 22:46
  • @seanv507 A calibrated model seems like the goal here, certainly. I am hopeful the OP will read some of my links, especially Harrell’s blog, and realize the benefits of evaluating the probability predictions. // I’m still concerned about the percentage of cases that default. Is that $10\%$ representative of reality or something you rigged? – Dave Feb 22 '22 at 03:53
  • @Dave, thank you for the links. I am still working through them, but to your question, the default rate is correct. We are in subprime – swritchie Feb 22 '22 at 13:48
  • @seanv507, we do not issue fixed-size loans and do want to calculate expected loss. We currently use the Python library CatBoost, which uses log loss under the hood for optimization and produces probability estimates; however, we use ROC AUC for comparing models against one another. Currently, we estimate expected loss (as a percentage) by multiplying the outputs of a PD (probability of default) model and an LGD (loss given default) model. I am not sure whether that approach is correct either, since I am still pretty new to the industry, which is why I am taking this to SO to get more perspectives. – swritchie Feb 22 '22 at 14:01
  • The Frank Harrell whose blog I linked is highly opposed to using AUC for model comparisons. Why wouldn't you use log loss for comparing models? I get that it is difficult to interpret, but "number bigger than other number" has a clear interpretation. – Dave Feb 22 '22 at 14:11
  • Given that you are using the same library for everything and optimizing within each model with log loss, it can't be too bad to use ROC AUC, but it would certainly be more consistent to just use log loss everywhere. (In theory a model could output double the true probabilities and still have a high ROC AUC; this is in fact what I have experienced with naive Bayes. A short demonstration follows these comments.) – seanv507 Feb 22 '22 at 15:55
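
A minimal sketch of seanv507's last point (assuming scikit-learn; the data is synthetic, and the probabilities are halved rather than doubled so they stay in $[0,1]$): a constant rescaling leaves ROC AUC unchanged, since AUC depends only on the ranking of the scores, while log loss degrades because the probabilities are no longer calibrated.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, log_loss

rng = np.random.default_rng(42)

# Synthetic per-case default probabilities with a ~10% base rate.
p_true = np.clip(rng.beta(1, 9, size=10_000), 1e-6, 1 - 1e-6)
y = rng.binomial(1, p_true)

p_half = p_true / 2  # a "model" that systematically halves every probability

print(f"AUC, calibrated: {roc_auc_score(y, p_true):.4f}")
print(f"AUC, halved:     {roc_auc_score(y, p_half):.4f}")  # identical ranking, identical AUC
print(f"log loss, calibrated: {log_loss(y, p_true):.4f}")
print(f"log loss, halved:     {log_loss(y, p_half):.4f}")  # worse: probabilities are off
```

This matters in dollar terms too: if expected loss per loan is roughly principal × PD × LGD, a model that reports half the true probability understates expected loss by half while posting exactly the same AUC.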
