
I have a multiclass classification problem (e.g. the target variable has 4 possible outcomes: Product A, Product B, Product C and NO Product). Not all errors are equal: for example, if the true label is "Product A" and the prediction is "NO Product", it is not a big problem, while if the true label is "Product C", the impact of the error is much bigger. Basically, I have to insert this information into the loss function of the algorithm (I am currently using XGBoost, Random Forest, etc.).

I know that it is possible, for example with scikit-learn, to train algorithms using the class_weight parameter so that errors on a specific class are treated as more important.
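
For example, a minimal sketch of what that looks like in scikit-learn (the class names are the ones from above, the weights are hypothetical):

```python
from sklearn.ensemble import RandomForestClassifier

# Hypothetical per-class weights: errors on "Product C" count 5x as much
clf = RandomForestClassifier(
    class_weight={"Product A": 1, "Product B": 1, "Product C": 5, "NO Product": 1}
)
```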

Is there a possibility to say, for example, that misclassifying Product C as NO Product is more problematic than misclassifying Product C as Product B? I am basically looking for something that penalizes the model differently for different cross-class predictions.
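
For illustration, what I have in mind is a full misclassification cost matrix along these lines (all numbers are hypothetical placeholders):

```python
import numpy as np

# Rows = true class, columns = predicted class,
# order: Product A, Product B, Product C, NO Product (hypothetical costs)
cost_matrix = np.array([
    [0.0, 1.0, 1.0, 0.5],  # true A: predicting NO Product is not a big problem
    [1.0, 0.0, 1.0, 1.0],  # true B
    [3.0, 2.0, 0.0, 5.0],  # true C: every error is costly, C -> NO Product worst
    [1.0, 1.0, 1.0, 0.0],  # true NO Product
])
```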

Any idea? Thank you!

  • Somewhat related: ["Evaluating classification results when importance of correct classification varies with class"](https://stats.stackexchange.com/questions/429831/), ["Optimal classification rule given data, model and loss function"](https://stats.stackexchange.com/questions/429638/). – Richard Hardy Apr 08 '20 at 16:11
  • For an answer that is somewhat in conflict with Stephan's, see my comment under ["When is it appropriate to use an improper scoring rule?"](https://stats.stackexchange.com/questions/208529/) and the answer by Cagdas Ozgenc. – Richard Hardy Apr 13 '20 at 08:38

1 Answer


Use a method that outputs predictive probabilities, such as:

My prediction is a probability of 0.6 that this instance is Product A, 0.2 for Product B, 0.15 for Product C and 0.05 for NO Product.

Then use a proper scoring rule to evaluate these predictive probabilities. If you can tune the objective function you supply to your model fitting algorithm, you should be able to use such a proper scoring rule. (The [scoring-rules](https://stats.stackexchange.com/questions/tagged/scoring-rules) tag may be helpful.)
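
For instance, the log loss is one such proper scoring rule; here is a minimal sketch of evaluating predictive probabilities with scikit-learn (the probabilities and labels below are made up):

```python
import numpy as np
from sklearn.metrics import log_loss

labels = ["Product A", "Product B", "Product C", "NO Product"]

# Hypothetical predicted probabilities for three instances
y_prob = np.array([
    [0.60, 0.20, 0.15, 0.05],
    [0.10, 0.70, 0.10, 0.10],
    [0.25, 0.25, 0.40, 0.10],
])
y_true = ["Product A", "Product B", "Product C"]

# The log loss is negatively oriented: smaller is better
print(log_loss(y_true, y_prob, labels=labels))
```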

Of course, all the usual caveats about overfitting apply. It would be best to do this in a cross-validation setup. Also, note whether the proper scoring rule is "positively or negatively oriented", i.e., whether larger or smaller is better - different people use different conventions.
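
As a sketch of such a cross-validated evaluation in scikit-learn (with synthetic data standing in for the real problem):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic 4-class data as a stand-in for the real dataset
X, y = make_classification(n_samples=500, n_classes=4,
                           n_informative=8, random_state=0)

# "neg_log_loss" is the log loss with its sign flipped so that larger is
# better; this is the orientation caveat mentioned above
scores = cross_val_score(RandomForestClassifier(random_state=0),
                         X, y, cv=5, scoring="neg_log_loss")
print(scores.mean())
```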

You can find more information at ["Why is accuracy not the best measure for assessing classification models?"](https://stats.stackexchange.com/questions/312780/), which refers to two-class classification, but my answer there applies to multiclass cases as well.

Stephan Kolassa
  • So if the OP is currently using XGBoost and random forest, would you suggest training them to predict probabilities rather than class labels and then using the predicted probabilities for classification based on, say, minimization of expected loss? (Also, your answer stops short of telling how to actually do classification based on estimated probabilities. But that of course is simple if one just aims at minimizing the expected loss; a sketch of that step appears after this comment thread.) – Richard Hardy Apr 08 '20 at 16:03
  • Thank you for the link in your answer @Stephan. Is it strictly mandatory that the output of the classification algorithm is a probability, or could it be just a score? Obviously, the higher the score, the higher the confidence for the selected class, but, for example, the sum of all the scores could be different from 1. Do you believe that could be a problem for the application of a scoring function? – A1010 Apr 08 '20 at 21:47
  • Well, the point of proper scoring rules is precisely that they will be minimized (or maximized, depending on the formulation) if your predicted probabilities are the true ones. If you work with outputs that measure something different, then you only have a proxy, and you don't know whether optimizing your proper scoring rule might lead you astray, towards biased predictions. But it will probably be better than trying to optimize accuracy. – Stephan Kolassa Apr 09 '20 at 05:49
  • @RichardHardy: I agree that this is a very short answer. I would say it's [better to have a short answer than no answer at all.](https://stats.meta.stackexchange.com/a/5326/1352) Anyone who has a better answer can post it. – Stephan Kolassa Apr 09 '20 at 05:50
  • I do not think the answer is short, no problem there. However, I think it is incomplete. What is your advice for completing the task at hand, the task being not to evaluate predictive probabilities but to optimize a classification algorithm? This brings me back to my first comment, and I am curious to learn your opinion. – Richard Hardy Apr 09 '20 at 06:13
  • @RichardHardy: Well, the OP asked how to tune the loss function used in their model fitting, and I would propose using a proper scoring rule. Let me edit that into the post... – Stephan Kolassa Apr 09 '20 at 06:50
  • We are probably talking about different loss functions. Following your suggestion, there would be two loss functions: one for *fitting* the probabilities, another for *making decisions* using the fitted probabilities. Meanwhile, the OP was probably using a setup with a single loss function that played both roles at the same time. My comments tried to address this, as I think you neglected the second loss function without which no decisions are possible. I am still curious about your answer to the very first question of mine, even if the answer is "I don't know". – Richard Hardy Apr 09 '20 at 09:09
  • @RichardHardy: Hm. We may indeed be talking past each other. I agree that I am only addressing how to fit a probabilistic model. Making decisions based on that model (and costs of mis-decisions) is a separate step. IMO, the common conflation of the two steps is misleading, to say the least. I should probably have linked to [my perennial favorite](https://stats.stackexchange.com/a/312124/1352) for how to get from a probabilistic prediction to an action. – Stephan Kolassa Apr 09 '20 at 09:29
  • One of my favorite threads is ["When is it appropriate to use an improper scoring rule?"](https://stats.stackexchange.com/questions/208529/). Both my comment there and Cagdas Ozgenc's answer show that when probabilities are not the goal in themselves, proper scoring rules can be harmful. Now, it is usually the case that people are interested in making decisions, not in learning probabilities for their own sake. I believe the OP's case is one of them, too. – Richard Hardy Apr 13 '20 at 08:25
  • @RichardHardy: you won't be surprised to hear that [I still disagree with your position](https://chat.stackexchange.com/rooms/98895/discussion-on-answer-by-matt-krause-when-is-it-appropriate-to-use-an-improper-sc), although I do agree that the thread makes for thought-provoking reading. – Stephan Kolassa Apr 13 '20 at 12:58
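
To make the decision step discussed in this thread concrete, here is a minimal sketch of choosing the class with the lowest expected cost; the cost matrix is hypothetical and the probabilities are taken from the answer's example:

```python
import numpy as np

# Hypothetical cost matrix (rows = true class, columns = predicted class),
# order: Product A, Product B, Product C, NO Product
cost_matrix = np.array([
    [0.0, 1.0, 1.0, 0.5],
    [1.0, 0.0, 1.0, 1.0],
    [3.0, 2.0, 0.0, 5.0],
    [1.0, 1.0, 1.0, 0.0],
])

# Predictive probabilities for one instance (from the answer's example)
p = np.array([0.60, 0.20, 0.15, 0.05])

# Expected cost of predicting class j = sum over true classes i of
# p[i] * cost_matrix[i, j]; pick the class that minimizes it
expected_cost = p @ cost_matrix
print(expected_cost)             # [0.7  0.95 0.85 1.25]
print(np.argmin(expected_cost))  # 0, i.e. predict "Product A"
```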