
Alright, so I want to calculate the F1 score for four pairs of Precision (P) and Recall (R) values. These pairs are:

Precision = (54.54, 60.0, 91.3, 95.23)
Recall = (0.02, 2.10, 0.18, 5.3)

The values above are percentages. Expressed as proportions, they are:

Precision1 = ( 0.5454 0.6000 0.9130 0.9523)
Recall1 = ( 0.0002 0.0210 0.0018 0.0530)

The following formula calculates the F1 score

F1 = 2 * (Precision * Recall) / (Precision + Recall)

In other words

Fscore = (2*Precision*Recall) / sum(Precision, Recall)

For both sets of precision and recall values, the results are:

Fscore = (0.03998534  4.05797101  0.35929165 10.04116184) #using the Precision/Recall
Fscore1 = (7.067742e-05 8.164059e-03 1.064827e-03 3.270282e-02) #Precision1/Recall1

However, I am not sure about those numbers. Which of the two am I supposed to use? Am I missing something? I thought the F-score values range from 0 to 1. Do I have to normalize any of these numbers?

mdewey
  • Is the range of Fscore from 0 to 1 or is it only when the values are normalized like that? – forgotten_novel_char Feb 22 '15 at 02:23
  • With R you might want to use the `ROCR` library, which computes any desired indicator for binary classification. I've set up [an app](https://agenis.shinyapps.io/tutorial-classifier/) where you can play with a classifier and see how these metrics change. – agenis Dec 07 '17 at 11:55

3 Answers


The F1 score ranges from 0 to 1, you're right.

Keep in mind how precision and recall are calculated:

precision = TP / (TP + FP)
recall = TP / (TP + FN)

with TP = true positives, FP = false positives and FN = false negatives. Therefore, precision is the fraction of retrieved instances that are relevant, while recall is the fraction of relevant instances that are retrieved (as Wikipedia puts it).
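To make the definitions concrete, here is a minimal R sketch with made-up confusion-matrix counts (`tp`, `fp`, `fn` are hypothetical numbers chosen only for illustration). Because the counts are non-negative, both ratios, and therefore F1, land between 0 and 1. It also shows the equivalent count-based form F1 = 2TP / (2TP + FP + FN).

```r
# Hypothetical counts, chosen only for illustration
tp <- 30  # true positives
fp <- 20  # false positives
fn <- 70  # false negatives

precision <- tp / (tp + fp)  # 30 / 50  = 0.6
recall    <- tp / (tp + fn)  # 30 / 100 = 0.3
f1        <- 2 * precision * recall / (precision + recall)

# Equivalent form written directly in terms of the counts
f1_counts <- 2 * tp / (2 * tp + fp + fn)

print(c(precision = precision, recall = recall, f1 = f1))
```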

Given that definition, precision and recall are both proportions (ranging from 0% to 100%, i.e. from 0 to 1). Therefore you should go with your Precision1/Recall1 versions, because your Precision and Recall (without the 1) do not satisfy that criterion. But please note that your recall, with values no higher than 0.053, is extremely low.
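A minimal sketch of that conversion, reusing the percentage vectors from the question: dividing by 100 turns them into proportions, and the element-wise F1 computed from them then stays inside [0, 1].

```r
# Percentage values as given in the question
Precision <- c(54.54, 60.0, 91.3, 95.23)
Recall    <- c(0.02, 2.10, 0.18, 5.3)

# Divide by 100 to get proportions in [0, 1]
Precision1 <- Precision / 100
Recall1    <- Recall / 100

# Element-wise F1 on the proportions
Fscore1 <- 2 * Precision1 * Recall1 / (Precision1 + Recall1)
stopifnot(all(Fscore1 >= 0 & Fscore1 <= 1))
print(round(Fscore1, 4))  # 0.0004 0.0406 0.0036 0.1004
```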

Boern

You need to use the numbers between 0 and 1, not the percentages. Please check the syntax, however, as I think there is a hidden error. Precision and Recall are two vectors. You are computing sum(Precision, Recall) where I think you should compute Precision + Recall. Note that these are not the same in R. The sum function adds all the values in both vectors into one large number, whilst + adds element-wise:

> a <- c(1, 1, 1, 1)
> b <- c(1, 1, 1, 1)
> sum(a,b)
[1] 8
> a+b
[1] 2 2 2 2

The more Precision/Recall pairs you have, the smaller the results of your function (using sum) will get, as they all share the same, growing denominator.
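Both effects can be checked with the question's data in a short sketch. The sum() version reproduces the question's Fscore1 values, because every pair is divided by the same scalar (about 3.0867). Appending a fifth pair (a made-up 0.5/0.5 pair, used only for illustration) makes the first four results smaller still.

```r
Precision1 <- c(0.5454, 0.6000, 0.9130, 0.9523)
Recall1    <- c(0.0002, 0.0210, 0.0018, 0.0530)

# sum() collapses both vectors into one scalar: 3.0107 + 0.0760 = 3.0867
shared_denominator <- sum(Precision1, Recall1)
wrong <- 2 * Precision1 * Recall1 / shared_denominator

# + works element-wise: one denominator per precision/recall pair
right <- 2 * Precision1 * Recall1 / (Precision1 + Recall1)

# Hypothetical fifth pair, added only to show the shrinking effect
P5 <- c(Precision1, 0.5)
R5 <- c(Recall1, 0.5)
wrong5 <- 2 * P5 * R5 / sum(P5, R5)

print(wrong)               # should reproduce Fscore1 from the question
print(wrong5[1:4] < wrong) # all TRUE: a bigger shared denominator
print(round(right, 4))
```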

To come back to your example data, that would be:

Precision1 <- c(0.5454, 0.6000, 0.9130, 0.9523)
Recall1 <- c(0.0002, 0.0210, 0.0018, 0.0530)
Fscore_rev <- 2 * Precision1 * Recall1 / (Precision1 + Recall1)

which yields

> round(Fscore_rev, 4)
[1] 0.0004 0.0406 0.0036 0.1004
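If you need this more than once, the same computation can be wrapped in a small function. The sketch below uses a hypothetical helper name, `f1_score`, which is not part of any package. It rejects inputs outside [0, 1], which would catch percentages passed by mistake, and it reports 0 when precision and recall are both 0. That last rule is a common convention rather than part of the formula, so treat it as an assumption.

```r
# Hypothetical helper (not from any package): element-wise F1 for proportions in [0, 1]
f1_score <- function(precision, recall) {
  # Catch percentages passed by mistake, the issue in the question
  stopifnot(length(precision) == length(recall),
            all(precision >= 0 & precision <= 1),
            all(recall >= 0 & recall <= 1))
  f1 <- 2 * precision * recall / (precision + recall)
  # 0/0 gives NaN when both are zero; report 0 instead (an assumed convention)
  f1[precision + recall == 0] <- 0
  f1
}

Precision1 <- c(0.5454, 0.6000, 0.9130, 0.9523)
Recall1    <- c(0.0002, 0.0210, 0.0018, 0.0530)
print(round(f1_score(Precision1, Recall1), 4))
print(f1_score(0, 0))
```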
Bernhard

By definition, precision and recall range from 0 to 1. Use the decimal representation of precision and recall:

Precision <- c(0.5454, 0.6000, 0.9130, 0.9523)
Recall <- c(0.0002, 0.0210, 0.0018, 0.0530)
numerator <- 2*Precision*Recall
print(numerator)
[1] 0.00021816 0.02520000 0.00328680 0.10094380

denominator <- (Precision + Recall)
print(denominator)
[1] 0.5456 0.6210 0.9148 1.0053

Fscore <- numerator/denominator

The answer is:

print( Fscore)
[1] 0.0003998534 0.0405797101 0.0035929165 0.1004116184
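One way to see why these scores are so small: F1 is the harmonic mean of precision and recall, so it always lies between the smaller of the two values and twice that value. With recall this low, F1 cannot get much larger than the recall. A quick check with the same vectors:

```r
Precision <- c(0.5454, 0.6000, 0.9130, 0.9523)
Recall    <- c(0.0002, 0.0210, 0.0018, 0.0530)
Fscore    <- 2 * Precision * Recall / (Precision + Recall)

# Harmonic mean of each precision/recall pair: 2 / (1/P + 1/R)
harmonic <- 2 / (1 / Precision + 1 / Recall)
print(all.equal(Fscore, harmonic))  # TRUE

# F1 is bounded by the smaller value: min(P, R) <= F1 <= 2 * min(P, R)
smaller <- pmin(Precision, Recall)
print(all(smaller <= Fscore & Fscore <= 2 * smaller))  # TRUE
```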
Sandeep S. Sandhu