In this question I'll distinguish class-wise scores, written in lower case (prec, rec, f1, which are vectors with one entry per class), from the aggregate macro-averages, written capitalized (Prec, Rec, F1). My formulae below are written mainly from the perspective of R, as that's the language I use most.
It's been established that the standard macro-averaged F1 score for a multiclass problem is not obtained by 2*Prec*Rec/(Prec+Rec), but rather by mean(f1), where f1 = 2*prec*rec/(prec+rec); i.e., you compute the class-wise f1 first and then take its arithmetic mean.
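A minimal R sketch of the class-wise computation described above. It assumes a hypothetical 3-class confusion matrix cm with true classes in rows and predicted classes in columns; the matrix values and the names cm, tp and F1_macro are illustrative only.

```r
# Hypothetical 3-class confusion matrix (illustrative values only):
# rows = true class, columns = predicted class
cm <- matrix(c(40, 10, 0,
                0, 15, 5,
                5,  0, 5),
             nrow = 3, byrow = TRUE,
             dimnames = list(truth = c("a", "b", "c"),
                             pred  = c("a", "b", "c")))

tp   <- diag(cm)                       # correctly classified count per class
prec <- tp / colSums(cm)               # class-wise precision: TP / predicted count
rec  <- tp / rowSums(cm)               # class-wise recall:    TP / true count
f1   <- 2 * prec * rec / (prec + rec)  # class-wise F1 (one value per class)

F1_macro <- mean(f1)                   # macro F1: unweighted mean of class-wise f1
round(f1, 3)                           # approx. 0.842, 0.667, 0.500
round(F1_macro, 3)                     # approx. 0.67
```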
What I'm wondering is: why is this the best choice? Are there any advantages or drawbacks to using F1 = 2*Prec*Rec/(Prec+Rec) instead?
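To check that the two definitions really can disagree, the following sketch computes both on a hypothetical confusion matrix (again assuming rows = truth and columns = prediction; values and variable names are made up for illustration):

```r
# Hypothetical confusion matrix: rows = true class, columns = predicted class
cm <- matrix(c(40, 10, 0,
                0, 15, 5,
                5,  0, 5),
             nrow = 3, byrow = TRUE)

tp   <- diag(cm)
prec <- tp / colSums(cm)               # class-wise precision
rec  <- tp / rowSums(cm)               # class-wise recall
f1   <- 2 * prec * rec / (prec + rec)  # class-wise F1

# Definition 1: arithmetic mean of the class-wise F1 scores
F1_classwise <- mean(f1)

# Definition 2: harmonic mean of the macro-averaged precision and recall
Prec <- mean(prec)
Rec  <- mean(rec)
F1_harmonic <- 2 * Prec * Rec / (Prec + Rec)

c(classwise = F1_classwise, harmonic = F1_harmonic)  # approx. 0.6696 vs 0.6730
```

Here the harmonic mean of the macro-averaged precision and recall comes out slightly higher than mean(f1). The two coincide in special cases, for example when prec equals rec for every class, but not in general.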
I was also thinking about using a class-weighted mean: rather than the plain unweighted mean F1 = mean(f1), we could weight each class's f1 by that class's proportion when taking the mean. Any thoughts on the validity and the potential benefits or drawbacks of this procedure?
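A sketch of the class-weighted variant, weighting each class's f1 by its support (the number of true instances of that class, i.e. its row total) via base R's weighted.mean(), which normalises the weights itself. The confusion matrix is again hypothetical. This is the same idea as the "weighted" average option in scikit-learn's f1_score.

```r
# Hypothetical confusion matrix: rows = true class, columns = predicted class
cm <- matrix(c(40, 10, 0,
                0, 15, 5,
                5,  0, 5),
             nrow = 3, byrow = TRUE)

tp   <- diag(cm)
prec <- tp / colSums(cm)
rec  <- tp / rowSums(cm)
f1   <- 2 * prec * rec / (prec + rec)

support <- rowSums(cm)                    # true instances per class: 50, 20, 10

F1_macro    <- mean(f1)                   # every class counts equally
F1_weighted <- weighted.mean(f1, w = support)  # classes weighted by proportion

c(macro = F1_macro, weighted = F1_weighted)    # approx. 0.670 vs 0.755
```

Because class a holds 62.5% of the samples here, the weighted score is pulled toward its f1 of about 0.842, whereas the unweighted mean gives the small class c the same say as the large class a.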
Finally, if (one of) the means is the best procedure for macro-averaging, would you recommend setting f1 = 0 when prec + rec = 0, or is it better to discount that class altogether by using, say, na.rm=TRUE in R?
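A sketch of the two options for a class with prec + rec = 0, using a hypothetical confusion matrix in which class c is predicted a few times but never correctly. In R the class-wise formula then evaluates 0/0, which gives NaN rather than an error:

```r
# Hypothetical confusion matrix: class c (row/column 3) has zero true positives
cm <- matrix(c(40,  5, 5,
                5, 15, 0,
                5,  5, 0),
             nrow = 3, byrow = TRUE)

tp   <- diag(cm)
prec <- tp / colSums(cm)               # class c: 0 / 5  = 0
rec  <- tp / rowSums(cm)               # class c: 0 / 10 = 0
f1   <- 2 * prec * rec / (prec + rec)  # class c: 0 / 0  = NaN

# Option 1: treat the undefined score as 0 (class stays in the average)
F1_zero <- mean(ifelse(is.nan(f1), 0, f1))

# Option 2: drop the class (na.rm = TRUE also removes NaN values)
F1_drop <- mean(f1, na.rm = TRUE)

c(zero = F1_zero, drop = F1_drop)      # approx. 0.489 vs 0.733
```

In this made-up example, dropping the NaN raises the macro F1 from about 0.489 to about 0.733 even though class c is never identified correctly. Note also that writing the class-wise score as 2*tp/(2*tp + fp + fn), which equals the harmonic mean of prec and rec whenever that is defined, gives 0 rather than NaN here, since the denominator is positive.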