What is the difference between "normal" F1 and macro-average F1 score in binary classification?

Question

Please note that I am only talking about binary classification here, not multi-class classification.

In case of unbalanced binary datasets it is a good practice to use F1 score. Here, the positive label is always the rare class.
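For reference, the "normal" (binary) F1 score is the harmonic mean of precision and recall computed for the positive class only. It uses the true positives (TP), false positives (FP) and false negatives (FN). True negatives do not enter it at all:

$$
P = \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FP}}, \qquad
R = \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}}, \qquad
F_1 = \frac{2PR}{P+R} = \frac{2\,\mathrm{TP}}{2\,\mathrm{TP}+\mathrm{FP}+\mathrm{FN}}
$$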

Now some people use something called the macro-average F1 score. It is used, for example, in:

sklearn's classification_report: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html
as the evaluation metric here: https://projects.fzai.h-da.de/iggsa/evaluation-tool/

"the official ranking of the systems will be based on the macro-average f-score only"

The macro-average F1 score is the unweighted mean of the F1 score for the positive label and the F1 score for the negative label.
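Written out in terms of the binary confusion matrix (TP, FP, FN, TN, all counted with respect to the positive class), the negative-class F1 is obtained by treating the negative label as "positive". TN then plays the role of TP, while FP and FN swap roles:

$$
F_1^{(+)} = \frac{2\,\mathrm{TP}}{2\,\mathrm{TP}+\mathrm{FP}+\mathrm{FN}}, \qquad
F_1^{(-)} = \frac{2\,\mathrm{TN}}{2\,\mathrm{TN}+\mathrm{FN}+\mathrm{FP}}, \qquad
F_1^{\text{macro}} = \frac{F_1^{(+)} + F_1^{(-)}}{2}
$$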

Example from an sklearn classification_report for binary classification of hate vs. no-hate speech:

f1-score Hate-Speech: 0.62
f1-score No-Hate-Speech: 0.76
F1 macro avg: 0.69 = (0.62 + 0.76) / 2

My question is: when is it useful to use the "normal" F1, and when is it better to use the macro-average F1 in the binary case? From my point of view and experience, only the "normal" F1 score is useful, and the macro average is only useful for multi-class classification.
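The following is a minimal pure-Python sketch that makes the difference concrete. The helper names `f1_for_label`, `binary_f1` and `macro_f1` are hypothetical and not taken from any library, and the data are made up for illustration. In scikit-learn, the corresponding calls are `f1_score(y_true, y_pred, average="binary")` and `f1_score(y_true, y_pred, average="macro")`.

```python
from typing import Sequence


def f1_for_label(y_true: Sequence[int], y_pred: Sequence[int], label: int) -> float:
    """F1 treating `label` as the positive class; 0.0 if undefined (an assumption of this sketch)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def binary_f1(y_true: Sequence[int], y_pred: Sequence[int], pos_label: int = 1) -> float:
    """'Normal' F1: only the class designated as positive counts."""
    return f1_for_label(y_true, y_pred, pos_label)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int], labels: Sequence[int] = (0, 1)) -> float:
    """Unweighted mean of the per-label F1 scores."""
    return sum(f1_for_label(y_true, y_pred, lab) for lab in labels) / len(labels)


# Hypothetical imbalanced data: 95 negatives (0), 5 rare positives (1).
y_true = [0] * 95 + [1] * 5

# Trivial classifier that always predicts the majority class.
y_majority = [0] * 100

# Hypothetical model: 2 false alarms among the negatives, finds 3 of the 5 positives.
y_model = [0] * 93 + [1] * 2 + [1] * 3 + [0] * 2

print(binary_f1(y_true, y_majority))                         # 0.0
print(round(binary_f1(y_true, y_majority, pos_label=0), 3))  # 0.974
print(round(macro_f1(y_true, y_majority), 3))                # 0.487
print(round(binary_f1(y_true, y_model), 3))                  # 0.6
print(round(macro_f1(y_true, y_model), 3))                   # 0.789
```

Two properties show up in this example. First, the binary F1 depends on which label you call positive, while the macro F1 is symmetric in the two labels. Second, the macro F1 gives the trivial majority-class predictor a clearly non-zero score (about 0.49), because the easy majority class contributes a high F1 of its own.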

"In case of unbalanced binary datasets it is a good practice to use F1 score." I dispute this. [Everything I write here applies equally to the F1 score.](https://stats.stackexchange.com/a/312787/1352) — Stephan Kolassa, Oct 29 '19 at 13:53

