
I am working on a deep learning problem where misclassifying fake events is not problematic, but the opposite case is very problematic. I suppose this is similar to how spam detectors work.

My question is: how do you choose a suitable penalty (loss) function for such problems? In my opinion, logistic loss or cross-entropy would not be optimal here.

– user1877600
    How about the best scoring method? Is ROC_AUC valid in that case? – user1877600 Jul 26 '16 at 13:01
  • Of possible interest: https://stats.stackexchange.com/questions/464636/proper-scoring-rule-when-there-is-a-decision-to-make-e-g-spam-vs-ham-email – Dave Dec 07 '21 at 13:06

4 Answers


AUC will not help you determine how to choose a loss function. AUC can help you choose between different classifiers irrespective of a decision threshold (e.g., IF p > 0.3 THEN "spam"), but it will not help you minimize losses, because you haven't defined the nature of those losses. My first question is this:

  • are you choosing between classifiers (models), or have you already selected a model and need to use the output of that model to make a decision?

The whole point of a loss function is that the most accurate discrimination may not be the most cost-effective. In other words, if false positives are expensive but false negatives are cheap, then you might not want the model that makes the fewest errors.
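
One common way to act on such asymmetric costs during training is to reweight the loss terms by class. Below is a minimal sketch, assuming a PyTorch binary classifier with "spam" as the positive class; the pos_weight value of 0.2 is purely illustrative, not derived from any real cost analysis:

```python
import torch
import torch.nn as nn

# Cost-sensitive binary cross-entropy: pos_weight scales the
# positive-class ("spam") term of the loss, so a value below 1 makes
# mistakes on negatives (false positives) relatively more expensive.
criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([0.2]))

logits = torch.tensor([2.0, -1.0, 0.5])  # raw model outputs (hypothetical)
labels = torch.tensor([1.0, 0.0, 0.0])   # 1 = spam, 0 = not-spam

loss = criterion(logits, labels)
print(loss.item())
```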

You say that false negatives (spam gets labelled as "not-spam") are not costly but that false positives are (not-spam gets labelled as "spam"). In this case, you probably want a relatively conservative model that only labels something as "spam" when there is strong evidence (e.g., a high predicted probability).

Ultimately, a loss function isn't something you extract from your training data. The data you use to train a classifier will probably not help you determine the relative costs of mistakes; that part needs to come from your knowledge of the use case or application.

Can you take your question a bit further and describe how costly those alternatives are, relative to one another?

– G. Vece

I don't think the loss function (log loss, squared loss, etc.) would change the rank order of your predictions, so I don't think the choice of loss function is particularly important for your problem. What matters most is how you set your classes' probability threshold. For example, is a prediction of 49% spam probability low enough to be considered not spam? What about 30%? 10%? If I've interpreted your question correctly, it sounds like you want to catch as much spam as possible, even at the expense of sometimes misclassifying benign cases. In that case you just have to determine how far you're willing to take that tradeoff, which comes down to selecting the probability threshold you use to separate the classes. As you lower the threshold below 50%, you'll see more false positives, but you'll also catch more of the spam.
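
As a small illustration of that threshold sweep, here is a sketch using scikit-learn's confusion_matrix on made-up probabilities and labels:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical predicted spam probabilities and true labels (1 = spam).
probs = np.array([0.95, 0.80, 0.60, 0.40, 0.30, 0.10])
y_true = np.array([1, 1, 0, 1, 0, 0])

# Sweep candidate thresholds and report false positives / false negatives.
for threshold in [0.3, 0.5, 0.7]:
    y_pred = (probs >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    print(f"threshold={threshold}: FP={fp}, FN={fn}")
```

Lowering the threshold trades false negatives for false positives; which point on that curve is acceptable is a business decision, not a statistical one.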

– Ryan Zotti

You just need to choose an appropriate threshold for your model by inspecting the ROC curve for an acceptable trade-off between true positive and false positive rates.
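
As a sketch of what that inspection could look like in code, assuming scikit-learn and hypothetical validation scores, one option is to take the threshold with the highest true positive rate subject to a cap on the false positive rate:

```python
import numpy as np
from sklearn.metrics import roc_curve

# Hypothetical scores and labels (1 = positive class); in practice
# these would come from a held-out validation set.
scores = np.array([0.9, 0.8, 0.7, 0.4, 0.3, 0.1])
y_true = np.array([1, 1, 0, 1, 0, 0])

fpr, tpr, thresholds = roc_curve(y_true, scores)

max_fpr = 0.2                # illustrative tolerance for false positives
ok = fpr <= max_fpr          # points on the curve we can accept
best = np.argmax(tpr[ok])    # highest TPR among acceptable points
print("threshold:", thresholds[ok][best],
      "TPR:", tpr[ok][best], "FPR:", fpr[ok][best])
```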

– Ashay Tamhane

I suppose you mean that a Type II error is more costly to you than a Type I error, though the way you formulated the problem is confusing: what is a "wrong classification of a fake event"? I assumed it is a false positive, i.e., you declared something an event when in fact it wasn't (a Type I error).

You need to prioritize reducing the Type II error rate, i.e., increasing the statistical power (sensitivity) of your classifier.
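
As a small illustration of the terminology, assuming hypothetical labels and hard predictions, the Type I error rate and the power (1 minus the Type II error rate) can be computed directly:

```python
import numpy as np

# Hypothetical true labels (1 = real event) and model predictions.
y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 1, 0, 1, 0])

# Type I error rate: fraction of actual negatives incorrectly flagged.
type1 = np.mean(y_pred[y_true == 0])
# Type II error rate: fraction of actual positives that were missed.
type2 = np.mean(1 - y_pred[y_true == 1])

print(f"Type I error rate: {type1:.2f}")
print(f"power (1 - Type II error rate): {1 - type2:.2f}")
```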

– Aksakal