Why does the 'weighted' f1-score result in a score not between precision and recall?

Question

On the F1 score sklearn page there's a section that explains each of the options for the average parameter. Under the weighted option, it says: "it can result in an F-score that is not between precision and recall."

I would like to know why this happens. Thanks

score 0 · Answer 1 · answered Feb 05 '20 at 19:57

The F1 score uses a harmonic mean rather than the arithmetic mean, which accounts for the difference.

Tavi

Hi, this answer doesn’t address the problem because of the [generalized mean inequality](https://en.wikipedia.org/wiki/Generalized_mean). The harmonic mean always falls between the minimum and maximum (inclusive). – Arya McCarthy Apr 06 '21 at 18:09
You can improve this answer by considering the role of the weights. – Arya McCarthy Apr 06 '21 at 18:31

score 0 · Answer 2 · answered Aug 05 '21 at 03:11

It appears this can happen already with the macro average option. The statement needs some clarification, but I assume the precision and recall that are supposed to not bound the averaged F1 are themselves the same type of average.

Here's a simple example: $TP=TN=4$, $FP=1$, $FN=16$. Then

$$\begin{align*}
\operatorname{precision}(1)&=\frac{TP}{TP+FP}=0.8, \\
\operatorname{recall}(1)&=\frac{TP}{TP+FN}=0.2, \\
\operatorname{precision}(0)&=\frac{TN}{TN+FN}=0.2, \\
\operatorname{recall}(0)&=\frac{TN}{TN+FP}=0.8
\end{align*}$$

and so $F_1(1)=F_1(0)=0.32$, so the macro-average $F_1$ is also $0.32$. But the macro-averaged precision and recall are both $0.5$.
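The arithmetic above can be checked with a few lines of plain Python. This is only a sketch: the helper `f1` and the variable names are made up for illustration, and it recomputes the numbers by hand from the four counts rather than calling sklearn.

```python
# Counts from the example above; class 1 is the "positive" class, class 0 the "negative" one.
TP, TN, FP, FN = 4, 4, 1, 16


def f1(precision, recall):
    """Harmonic mean of precision and recall (hypothetical helper)."""
    return 2 * precision * recall / (precision + recall)


# Per-class precision and recall. For class 0 the roles of TP/TN and FP/FN swap.
precision = {1: TP / (TP + FP), 0: TN / (TN + FN)}
recall = {1: TP / (TP + FN), 0: TN / (TN + FP)}
f1_per_class = {c: f1(precision[c], recall[c]) for c in (0, 1)}

# Macro average: unweighted mean over the two classes.
macro_precision = sum(precision.values()) / 2
macro_recall = sum(recall.values()) / 2
macro_f1 = sum(f1_per_class.values()) / 2

print(macro_precision, macro_recall, macro_f1)
```

Each per-class $F_1$ does lie between that class's precision and recall, as the harmonic mean must. The averaged $F_1$ escapes the bounds because it is the mean of the per-class $F_1$ values, not the harmonic mean of the averaged precision and averaged recall.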
