
In this question I'll distinguish class-wise scores, written in lower case (prec, rec, f1, which are vectors), from the aggregate macro-averages, written capitalised (Prec, Rec, F1). My formulae below are written mainly from the perspective of R, as that's the language I use most.

It's been established that the standard macro-average F1 score for a multiclass problem is not obtained by 2*Prec*Rec/(Prec+Rec) but by mean(f1), where f1 = 2*prec*rec/(prec+rec). In other words, you compute the class-wise f1 first and then take its arithmetic mean.
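To make the two definitions concrete, here is a minimal sketch in Python (used instead of R purely for illustration, with plain lists rather than any library). It assumes a confusion matrix `cm` with true classes in rows and predicted classes in columns; the helper names and the example counts are hypothetical.

```python
def classwise_scores(cm):
    """Return per-class (prec, rec, f1) lists from a square confusion matrix.

    Assumes cm[i][j] = number of items of true class i predicted as class j,
    and that every class has tp > 0, so no 0/0 occurs.
    """
    k = len(cm)
    prec, rec, f1 = [], [], []
    for c in range(k):
        tp = cm[c][c]
        predicted = sum(cm[i][c] for i in range(k))  # column sum
        actual = sum(cm[c])                          # row sum
        p, r = tp / predicted, tp / actual
        prec.append(p)
        rec.append(r)
        f1.append(2 * p * r / (p + r))
    return prec, rec, f1


def mean(xs):
    return sum(xs) / len(xs)


cm = [[50, 10, 0],
      [5, 20, 5],
      [0, 10, 10]]

prec, rec, f1 = classwise_scores(cm)
macro_f1_of_classes = mean(f1)                         # mean(f1)
Prec, Rec = mean(prec), mean(rec)
f1_of_macro_averages = 2 * Prec * Rec / (Prec + Rec)  # 2*Prec*Rec/(Prec+Rec)

print(round(macro_f1_of_classes, 3), round(f1_of_macro_averages, 3))  # 0.671 0.679
```

The two numbers already differ on this small example; the standard macro-F1 is the first one.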

What I'm wondering is why this is best. Are there any advantages or drawbacks to using F1 = 2*Prec*Rec/(Prec+Rec) instead?

I was also thinking about using a class-weighted mean: rather than the plain unweighted mean F1 = mean(f1), we could weight each class's f1 by its class proportion when taking the mean. Any thoughts on the validity and potential benefits or drawbacks of this procedure?
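A class-weighted mean can be sketched as follows, weighting each class-wise f1 by that class's share of the true labels (its support). This corresponds to what scikit-learn calls `average="weighted"` in `f1_score`, but the code below does not use that library; the function names and counts are hypothetical. It computes f1 as 2*TP/(2*TP+FP+FN), which is algebraically equal to 2*prec*rec/(prec+rec) whenever the latter is defined.

```python
def classwise_f1(cm):
    """Per-class F1 as 2*TP / (2*TP + FP + FN).

    cm rows = true classes, columns = predicted classes. Assumes every class
    appears in the true or predicted labels, so the denominator is non-zero.
    """
    k = len(cm)
    out = []
    for c in range(k):
        tp = cm[c][c]
        fp = sum(cm[i][c] for i in range(k)) - tp
        fn = sum(cm[c]) - tp
        out.append(2 * tp / (2 * tp + fp + fn))
    return out


def weighted_f1(cm):
    """Mean of class-wise F1, weighted by each class's proportion of true labels."""
    f1 = classwise_f1(cm)
    support = [sum(row) for row in cm]
    total = sum(support)
    return sum(s * f for s, f in zip(support, f1)) / total


cm = [[50, 10, 0],
      [5, 20, 5],
      [0, 10, 10]]

unweighted = sum(classwise_f1(cm)) / len(cm)
weighted = weighted_f1(cm)
print(round(unweighted, 3), round(weighted, 3))  # 0.671 0.734
```

Here the majority class (60 of 110 items, f1 of about 0.87) pulls the weighted score up. That is the main trade-off: weighting by prevalence makes the metric less sensitive to poor performance on rare classes, which is often the very reason for choosing a macro average in the first place.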

Finally, if one of these means is the best procedure for macro-averaging, would you recommend setting f1 = 0 when prec + rec = 0, or is it better to drop that class altogether, e.g. by using na.rm=TRUE in R?
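The effect of the two policies can be seen in a small sketch. It computes f1 from prec and rec as in the question, so a class with tp = 0 produces 0/0 (NaN, as it would in R), and then either replaces that NaN with 0 or drops it, the latter being the analogue of na.rm=TRUE. The helper names and counts are hypothetical.

```python
import math


def classwise_f1_from_prec_rec(cm):
    """f1 = 2*prec*rec/(prec+rec), returning NaN wherever a 0/0 occurs.

    cm rows = true classes, columns = predicted classes.
    """
    k = len(cm)
    f1 = []
    for c in range(k):
        tp = cm[c][c]
        predicted = sum(cm[i][c] for i in range(k))
        actual = sum(cm[c])
        p = tp / predicted if predicted else math.nan
        r = tp / actual if actual else math.nan
        denom = p + r
        # denom == 0 means tp == 0 with p and r both 0: the 0/0 case.
        f1.append(2 * p * r / denom if denom != 0 else math.nan)
    return f1


def macro_f1(f1, policy):
    """Average class-wise f1, treating NaN as 0 ("zero") or dropping it ("drop")."""
    if policy == "zero":
        vals = [0.0 if math.isnan(v) else v for v in f1]
    elif policy == "drop":  # analogous to mean(f1, na.rm = TRUE) in R
        vals = [v for v in f1 if not math.isnan(v)]
    else:
        raise ValueError(f"unknown policy: {policy}")
    return sum(vals) / len(vals)


# Class 2 is never predicted correctly (tp = 0), so its prec and rec are both 0.
cm = [[40, 5, 5],
      [5, 30, 5],
      [5, 5, 0]]

f1 = classwise_f1_from_prec_rec(cm)
print(round(macro_f1(f1, "zero"), 3), round(macro_f1(f1, "drop"), 3))  # 0.517 0.775
```

Dropping the class raises the score from about 0.52 to about 0.78 by hiding a class the model never gets right, which is arguably why treating it as 0 is the more conservative choice. Note also that computing f1 directly as 2*TP/(2*TP+FP+FN) gives exactly 0 here; in that form f1 is only undefined when a class appears in neither the true nor the predicted labels.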

Mobeus Zoom

1 Answer


I just had a similar question. Though I do not yet fully understand it, this paper* analyses the difference between the two scores in depth.

They show that 2*Prec*Rec/(Prec+Rec), where Prec and Rec are averages over classes, leads to some issues:

one formula well 'rewards' classifiers which produce a skewed error type distribution

Apparently the average of class-wise F1 scores does not suffer from this issue, so it would be preferable.
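The "skewed error type" effect can be reproduced with a toy two-class confusion matrix in which the classifier labels almost everything as class B. The counts are made up for illustration; the sketch simply applies both formulas.

```python
def prec_rec(cm):
    """Per-class precision and recall; cm rows = true, columns = predicted."""
    k = len(cm)
    prec = [cm[c][c] / sum(cm[i][c] for i in range(k)) for c in range(k)]
    rec = [cm[c][c] / sum(cm[c]) for c in range(k)]
    return prec, rec


def harmonic(p, r):
    """Harmonic mean of precision and recall, i.e. the F1 formula."""
    return 2 * p * r / (p + r)


# 90 of the 100 class-A items are mislabelled as B; all 10 B items are correct.
cm = [[10, 90],
      [0, 10]]

prec, rec = prec_rec(cm)  # prec = [1.0, 0.1], rec = [0.1, 1.0]
mean_of_f1 = sum(harmonic(p, r) for p, r in zip(prec, rec)) / len(cm)
f1_of_means = harmonic(sum(prec) / len(cm), sum(rec) / len(cm))

print(round(mean_of_f1, 3), round(f1_of_means, 3))  # 0.182 0.55
```

Each class on its own scores an f1 of about 0.18, yet averaging precision and recall first lets class A's perfect precision offset its poor recall (and vice versa for B), giving 0.55. More generally, since 2*p*r/(p+r) is a concave function of (p, r), Jensen's inequality implies that 2*Prec*Rec/(Prec+Rec) is never smaller than mean(f1).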


* Opitz, J. and Burst, S., 2019. Macro F1 and Macro F1. arXiv preprint arXiv:1911.03347.