The article for F-measure in Wikipedia says:

The traditional F-measure or balanced F-score (F1 score) is the harmonic mean of precision and recall: $F_1 = 2 \times \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}}$

Why is the harmonic mean used in particular, and not the arithmetic mean, the geometric mean, or some other type of average?

What exactly does it mean to calculate a harmonic mean?

kjetil b halvorsen
e9t
  • [From Wikipedia](http://en.wikipedia.org/wiki/Harmonic_mean#Harmonic_mean_of_two_numbers): `Typically, it is appropriate for situations when the average of rates is desired`. This thread might also be of interest: ["Which “mean” to use and when?"](http://stats.stackexchange.com/q/23117/10525) –  Sep 19 '12 at 15:59
  • In fact, it is more that F is defined as the harmonic mean of precision and recall, rather than that F is defined as a mean of precision and recall and for some reason we happen to use the harmonic mean. –  Sep 19 '12 at 18:54
  • [StackOverflow](https://stackoverflow.com/questions/26355942/why-is-the-f-measure-a-harmonic-mean-and-not-an-arithmetic-mean-of-the-precision) also has a nice explanation on this. – CodeBlooded Jan 10 '21 at 07:34

1 Answer

The F-measure is widely used as an evaluation metric in natural language processing. In particular, the Message Understanding Conference (MUC) used it to evaluate named entity recognition (NER) systems. Quoting directly from "A survey of named entity recognition and classification" by D. Nadeau:

The harmonic mean of two numbers is never higher than the geometrical mean. It also tends towards the least number, minimizing the impact of large outliers and maximizing the impact of small ones. The F-measure therefore tends to privilege balanced systems.
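To make the quoted property concrete, here is a minimal Python sketch that compares the three means on a few precision/recall pairs. The pairs are made-up illustrative values (not results from any real system), and the helper names `arithmetic`, `geometric`, and `f1` are hypothetical, chosen only for this example.

```python
import math


def arithmetic(p: float, r: float) -> float:
    """Arithmetic mean of precision and recall."""
    return (p + r) / 2


def geometric(p: float, r: float) -> float:
    """Geometric mean of precision and recall."""
    return math.sqrt(p * r)


def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (the F1 score)."""
    if p + r == 0:
        return 0.0  # convention assumed here: F1 is 0 when both are 0
    return 2 * p * r / (p + r)


# Illustrative (precision, recall) pairs; each has arithmetic mean 0.5.
pairs = [(0.5, 0.5), (0.8, 0.2), (0.99, 0.01)]

for p, r in pairs:
    print(
        f"P={p:.2f} R={r:.2f}  "
        f"arithmetic={arithmetic(p, r):.3f}  "
        f"geometric={geometric(p, r):.3f}  "
        f"harmonic={f1(p, r):.3f}"
    )
```

All three pairs have the same arithmetic mean, but the harmonic mean drops as the pair becomes more unbalanced: for precision 0.8 and recall 0.2 the geometric mean is 0.4 while F1 is 0.32, and for the most lopsided pair F1 falls close to the smaller value. That is the sense in which the F-measure "tends to privilege balanced systems".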

e9t