3

I have a question about the proper way to report F1 scores. Say I am comparing two algorithms, one with an F1 score of 0.71 and the other with 0.82.

Is it correct to say:

"Algorithm 1 obtained an F1 score 11 points higher than algorithm 2"

or

"Algorithm 1 obtained an F1 score 11 percentage points higher than algorithm 2"

or

"Algorithm 1 obtained an F1 score 0.11 points higher than algorithm 2".

Or none of these? Is there some other way? A second, related question: is it correct to report the scores as 0.71 and 0.82, or is it more correct to say 71% and 82%?

kjetil b halvorsen
astel

2 Answers

4

I would certainly not write about "11 points higher". The F1 score is a number between 0 and 1 and can be interpreted as a percentage, so "11 percentage points" is defensible, but it's certainly not standard to refer to 0.01 as "1 point".
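To make the distinction concrete, here is a minimal sketch of the arithmetic behind the candidate phrasings, using the two scores from the question. The function name `describe_difference` and its return layout are illustrative assumptions, not part of any library.

```python
def describe_difference(score_a, score_b):
    """Compare two F1 scores given on the 0-1 scale (hypothetical helper)."""
    abs_diff = score_a - score_b          # difference on the 0-1 scale
    pct_points = abs_diff * 100           # the same difference in percentage points
    rel_pct = abs_diff / score_b * 100    # relative change with respect to score_b
    return abs_diff, pct_points, rel_pct


abs_diff, pct_points, rel_pct = describe_difference(0.82, 0.71)
print(f"absolute difference: {abs_diff:.2f}")               # 0.11
print(f"percentage points:   {pct_points:.1f}")             # 11.0
print(f"relative increase:   {rel_pct:.1f}% over 0.71")     # 15.5%
```

The gap between 11 percentage points and a roughly 15.5% relative increase is why writing "11% higher" would be ambiguous, whereas "0.11 higher" or "11 percentage points higher" both state the absolute difference.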

I believe everyone will be quite as comfortable with "71%" as with "0.71".

And of course the F1 (or $F_\beta$) score suffers from all the same problems as accuracy does when used as an evaluation metric.
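For reference, the $F_\beta$ score is the weighted harmonic mean of precision and recall, with F1 as the special case $\beta = 1$. The sketch below is a minimal illustration of that standard definition; the function name `f_beta` and the convention of returning 0 when the denominator vanishes are assumptions made here for the example.

```python
def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (hypothetical helper)."""
    b2 = beta ** 2
    denom = b2 * precision + recall
    if denom == 0:
        # Assumed convention: report 0 when the score is undefined.
        return 0.0
    return (1 + b2) * precision * recall / denom


# With beta = 1 this reduces to F1 = 2PR / (P + R).
print(round(f_beta(0.8, 0.6), 4))
```

Whatever the value of $\beta$, the result stays between 0 and 1, so the same reporting question (0.71 versus 71%) applies.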

Stephan Kolassa
0

Yes, you can interpret the F1 score as 71% for one model and 82% for the other. Describing one score as "higher" or "lower" than the other model's is not the correct way to report them.