Evaluating a binary classifier model: what do precision, recall, etc. tell me?

Question

I'm trying to understand whether my model performs well or not. I have a binary classifier for sentence summarization that labels each sentence as important or not (extractive approach) on a specific corpus.

The dataset is imbalanced: class 1 has 8K samples, class 0 has 32K.

As I understand it, accuracy is not a valid metric here, and neither is AUC ROC, because of the imbalance. So I use average precision (AP), F1, and F2. As a baseline I used an SVM; it shows an AUC ROC of 78% but an AP of only 20%.
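For context, the class counts above fix a few chance-level reference points. The sketch below is only arithmetic on those counts; it assumes the evaluation set has the same 8K/32K balance as the full dataset, which the question does not state. The variable names are illustrative.

```python
# Class counts as stated in the question.
n_pos = 8_000   # class 1
n_neg = 32_000  # class 0
n_total = n_pos + n_neg

# Share of positives. A random ranking has an expected average
# precision roughly equal to this value.
prevalence = n_pos / n_total

# Accuracy of a classifier that always predicts the majority class (0).
majority_accuracy = n_neg / n_total

print(f"prevalence (approx. chance-level AP): {prevalence:.0%}")
print(f"always-predict-0 accuracy: {majority_accuracy:.0%}")
```

Under that assumption, an AP of 20% sits at the chance level for this class balance, while an AP of 40% is about twice that level.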

I use a CNN with word embeddings and get the following values:

- Accuracy: 27%
- AUC ROC: 70%
- Average precision: 40%
- Precision: class 1 - 27%, class 0 - 95%
- Recall: class 1 - 85%, class 0 - 55%
- F2: 59%
- F1: 41%
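As a sanity check, F1 and F2 can be recomputed from the class 1 precision and recall reported above using the standard F-beta formula. The sketch also evaluates a trivial "always predict class 1" classifier for comparison; that part assumes a 20% positive rate (8K out of 40K) in the evaluation set, which is an assumption and not something stated in the question. The helper `f_beta` is written here for illustration.

```python
def f_beta(precision, recall, beta):
    """F-beta score: weighted harmonic mean of precision and recall."""
    b2 = beta ** 2
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0

# Class 1 precision and recall as reported for the CNN.
cnn_precision, cnn_recall = 0.27, 0.85
cnn_f1 = f_beta(cnn_precision, cnn_recall, beta=1)
cnn_f2 = f_beta(cnn_precision, cnn_recall, beta=2)

# Trivial classifier that labels every sentence as class 1:
# recall is 1 and precision equals the assumed positive rate.
assumed_positive_rate = 0.20
trivial_f1 = f_beta(assumed_positive_rate, 1.0, beta=1)
trivial_f2 = f_beta(assumed_positive_rate, 1.0, beta=2)

print(f"CNN:             F1 = {cnn_f1:.2f}, F2 = {cnn_f2:.2f}")
print(f"always-positive: F1 = {trivial_f1:.2f}, F2 = {trivial_f2:.2f}")
```

The recomputed CNN values (about 0.41 and 0.59) match the reported F1 and F2. Under the assumed class balance, the always-positive classifier reaches about 0.33 for F1 and 0.56 for F2, which gives a floor against which the CNN's scores can be read.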

In my task it is more important that the classifier finds the positive class, but I'm worried about false positives.
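The tension between finding positives and limiting false positives usually shows up as a choice of decision threshold on the model's scores. The sketch below illustrates that trade-off on made-up toy scores and labels (not data from this corpus); the function name and thresholds are hypothetical.

```python
def precision_recall_at(threshold, scores, labels):
    """Class 1 precision and recall when predicting 1 iff score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Made-up scores and labels, for illustration only.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

for threshold in (0.25, 0.50, 0.75):
    p, r = precision_recall_at(threshold, scores, labels)
    print(f"threshold={threshold:.2f}  precision={p:.2f}  recall={r:.2f}")
```

On this toy data, raising the threshold trades recall for precision: fewer false positives, but more missed positives. Which point on that curve is acceptable depends on the relative cost of the two error types in the task.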

I wonder when I can say that the model is good enough. I've read some articles whose authors disagree: some say F1 should be > 50%, others that 40-50% can be acceptable.

So my question is: does the model perform well, or should I tune it?

If you have questions, don't hesitate to ask.

I'm familiar with that topic; my question is about other metrics — Karmanoid, Jan 13 '19 at 15:18
Everything written at the other thread applies equally to metrics like the F1, F$\beta$ etc. Use proper scoring rules. — Stephan Kolassa, Jan 13 '19 at 15:28
The problem is not the list of proper scoring rules; the problem is the values of the metrics. If you know of some benchmarks for my question, I'd appreciate it — Karmanoid, Jan 13 '19 at 15:57
It makes no sense to discuss benchmarks in absolute terms. Each problem is different. 16.7% accuracy is state of the art for predicting a six-sided die, and if you can reach more, you should be earning your money in Las Vegas, but if you can't reach 50% accuracy in predicting a coin toss, you have a problem. Each situation is different. [This article of mine](https://ideas.repec.org/a/for/ijafaa/y2008i11p6-14.html) is on forecasting, but the arguments apply elsewhere. [How to know that your machine learning problem is hopeless?](https://stats.stackexchange.com/q/222179/1352) is related. — Stephan Kolassa, Jan 13 '19 at 16:03
@StephanKolassa, thank you for the links, I'll definitely read them :) — Karmanoid, Jan 13 '19 at 16:44
