What is the best metric for a machine learning model that predicts a customer's probability to buy?

Question

I'm building a machine learning model to predict a customer's propensity to buy (the likelihood that a customer buys a product). The purpose is to rank customers by probability score for customer targeting. Performance on the binary outcome is not a priority.

I'm seeking expert opinion on which metric we should use in this context (AUROC, log loss, F1, etc.). I have seen some conflicting opinions online.

What metric should I use if my dataset is highly imbalanced (buy vs. not buy: 1:99)?

Detailed explanations are highly appreciated!

It would help to post what arguments you have seen online in favor of threshold-based metrics like $F_1$. Proper scoring rules like log loss and Brier score are good choices for this task, as they seek the “correct” probability values. — Dave, Jun 18 '21 at 02:56
Echoing @Dave: there is *a lot* of actively harmful information floating around on the internet, IMO driven by people who know more about programming than about statistics and probabilities. Tim's answer is very good. Also, [unbalanced data is not a problem](https://stats.stackexchange.com/q/357466/1352) if you use appropriate error measures. That said, +1 for asking here and not just blindly using what you found first! — Stephan Kolassa, Jun 18 '21 at 05:04

Tim · Accepted Answer · 2021-06-18T14:16:34.267


Certainly, you shouldn’t use the common classification metrics like accuracy. They say little about whether the probabilities are correct.
If you want to estimate the probabilities precisely, you need proper scoring rules (see other questions tagged as scoring-rules), like the Brier score (squared error) or log loss (aka cross-entropy loss). There was recently an interesting paper by Hui and Belkin (2020) showing that using squared error as a loss function for a classifier may give results as good as, if not better than, the “default” log loss.
On the other hand, you say that you want to use the probabilities to rank the customers; that’s a different problem. For ranking, you don’t care that much about the probabilities being correct, as long as they’re ordered correctly. There are specialised metrics like mean percentage ranking, mean reciprocal rank, top-$k$ accuracy, precision@$k$, etc. Assuming it is a ranking problem, you should probably consider using specialised ranking algorithms as well.
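To make the two scoring rules concrete, here is a minimal pure-Python sketch. The function names and the toy data are made up for illustration (one buyer among ten customers); in practice you would likely use a library implementation instead.

```python
import math

def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)

def log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)

# Hypothetical toy data: customer 0 buys, the other nine do not.
y = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

# Two sets of predictions that both rank the buyer first.
reasonable = [0.6] + [0.05] * 9   # non-buyers get low probabilities
inflated = [0.99] + [0.5] * 9     # same ordering, non-buyer probabilities far too high

print(brier_score(y, reasonable), brier_score(y, inflated))
print(log_loss(y, reasonable), log_loss(y, inflated))
```

Both prediction sets put the buyer at the top, yet the inflated one scores much worse on both rules, because proper scoring rules penalise probabilities that are far from the observed outcomes and not only a wrong ordering.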
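As a rough illustration of two of the ranking metrics mentioned above, here is a small pure-Python sketch of precision@$k$ and the reciprocal rank of the first buyer (averaging the latter over several lists gives the mean reciprocal rank). The function names and the toy scores are assumptions made up for the example.

```python
def rank_order(scores):
    """Indices of the items sorted from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

def precision_at_k(y_true, scores, k):
    """Fraction of actual buyers among the k highest-scored customers."""
    top_k = rank_order(scores)[:k]
    return sum(y_true[i] for i in top_k) / k

def reciprocal_rank(y_true, scores):
    """1 / position of the first actual buyer in the ranking (0.0 if there is none)."""
    for position, i in enumerate(rank_order(scores), start=1):
        if y_true[i] == 1:
            return 1.0 / position
    return 0.0

# Hypothetical toy data: customers 1 and 4 are the buyers.
y = [0, 1, 0, 0, 1, 0, 0, 0, 0, 0]
scores = [0.30, 0.80, 0.10, 0.60, 0.40, 0.05, 0.20, 0.15, 0.02, 0.01]

# A monotone transformation changes the values but not the ordering.
squared = [s ** 2 for s in scores]

print(precision_at_k(y, scores, 3), precision_at_k(y, squared, 3))
print(reciprocal_rank(y, scores), reciprocal_rank(y, squared))
```

Squaring the scores leaves both metrics unchanged, which is the point made above: ranking metrics depend only on the order of the scores, not on whether the scores are good probability estimates.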

The choice depends on how exactly you want to use the results and details about your data.

edited Jun 18 '21 at 14:16

answered Jun 18 '21 at 04:52


AUROC a.k.a. Gini index a.k.a. Somers Dxy is also a rank-based metric. – Scortchi - Reinstate Monica Jun 18 '21 at 14:09

@Scortchi-ReinstateMonica I find it very confusing as a metric, so I have my small crusade, where I ignore its existence altogether. – Tim Jun 18 '21 at 14:20
Thank you all for your input. I will look into the resources you provided. Using the right metric for a specific type of problem is essential in machine learning tasks. I also feel there is a lot of incorrect information floating around online, which is why I'm asking this question to clarify. I hope it's beneficial to the ML community. @ Arya McCarthy I hope you also learn something new from Tim's post. Please do not simply dismiss the question. – zesla Jun 18 '21 at 14:55
AUROC measures the classifier's ability to rank positive cases higher than negative cases. It probably does not have much to do with the ranking among positive cases or the ranking among negative cases (not sure if that's correct?). – zesla Jun 18 '21 at 15:11
