This is definitely possible, and not strange at all.
Recall how accuracy and the F1 score are defined:
$$\text{Accuracy}=\frac{TP+TN}{TP+TN+FP+FN}\quad\text{and}\quad \text{F1}=\frac{2TP}{2TP+FP+FN}. $$
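For reference, the F1 formula is just the harmonic mean of precision and recall, with $\text{Precision}=TP/(TP+FP)$ and $\text{Recall}=TP/(TP+FN)$, written out in terms of the confusion matrix counts:
$$\text{F1}=\frac{2\,\text{Precision}\cdot\text{Recall}}{\text{Precision}+\text{Recall}}=\frac{2TP}{2TP+FP+FN}. $$
Note that the true negatives $TN$ enter the accuracy but not the F1 score, so there is no reason for the two metrics to move in lockstep.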
Now, probably the simplest possible way your F1 score can be greater than your accuracy is if you have just two observations, one TRUE and one FALSE. Suppose you classify both as TRUE. Then
$$ TP=1,\quad TN=0,\quad FP=1,\quad FN=0, $$
so
$$\text{Accuracy}=\frac{TP+TN}{TP+TN+FP+FN}=\frac{1+0}{1+0+1+0}=\frac{1}{2}$$
and
$$\text{F1}=\frac{2TP}{2TP+FP+FN}=\frac{2\times 1}{2\times 1+1+0}=\frac{2}{3} $$
and there you are.
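If you like, you can verify this numerically with a minimal sanity check in R, using the same counting logic as the script below (the variable names here are just for illustration):

actuals <- c(TRUE, FALSE)   # one TRUE and one FALSE observation
pred    <- c(TRUE, TRUE)    # both classified as TRUE
TP <- sum(actuals & pred);  TN <- sum(!actuals & !pred)
FP <- sum(!actuals & pred); FN <- sum(actuals & !pred)
c(Accuracy = (TP+TN)/(TP+TN+FP+FN), F1 = 2*TP/(2*TP+FP+FN))
# Accuracy 0.5, F1 2/3 (about 0.667), matching the calculation by hand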
Actually, you can create slightly bigger examples with the following R script. It considers pp TRUE and nn FALSE observations, and rolls out all the different combinations of predictions you might have. (It actually double-counts some constellations, but reducing these out would have made the script less legible.)
pp <- 2   # number of TRUE (positive) observations
nn <- 2   # number of FALSE (negative) observations
actuals <- c(rep(T,pp),rep(F,nn))

# all 2^(pp+nn) possible prediction vectors, one per row
predictions <- expand.grid(lapply(1:(pp+nn),function(xx)c(T,F)))

# compute both metrics from the confusion matrix counts
accuracy_and_F1 <- function(actuals,pred) {
    TP <- sum(actuals & pred)
    TN <- sum(!actuals & !pred)
    FP <- sum(!actuals & pred)
    FN <- sum(actuals & !pred)
    structure(c((TP+TN)/(TP+TN+FP+FN),2*TP/(2*TP+FP+FN)),.Names=c("Accuracy","F1"))
}

# evaluate every prediction vector against the fixed actuals
result <- cbind(predictions,
    t(apply(predictions,1,function(xx)accuracy_and_F1(actuals,xx))))
result$F1_greater_than_Accuracy <- with(result,F1>Accuracy)
result
In this example with pp = 2 and nn = 2, the result is
    Var1  Var2  Var3  Var4 Accuracy        F1 F1_greater_than_Accuracy
1   TRUE  TRUE  TRUE  TRUE     0.50 0.6666667                     TRUE
2  FALSE  TRUE  TRUE  TRUE     0.25 0.4000000                     TRUE
3   TRUE FALSE  TRUE  TRUE     0.25 0.4000000                     TRUE
4  FALSE FALSE  TRUE  TRUE     0.00 0.0000000                    FALSE
5   TRUE  TRUE FALSE  TRUE     0.75 0.8000000                     TRUE
6  FALSE  TRUE FALSE  TRUE     0.50 0.5000000                    FALSE
7   TRUE FALSE FALSE  TRUE     0.50 0.5000000                    FALSE
8  FALSE FALSE FALSE  TRUE     0.25 0.0000000                    FALSE
9   TRUE  TRUE  TRUE FALSE     0.75 0.8000000                     TRUE
10 FALSE  TRUE  TRUE FALSE     0.50 0.5000000                    FALSE
11  TRUE FALSE  TRUE FALSE     0.50 0.5000000                    FALSE
12 FALSE FALSE  TRUE FALSE     0.25 0.0000000                    FALSE
13  TRUE  TRUE FALSE FALSE     1.00 1.0000000                    FALSE
14 FALSE  TRUE FALSE FALSE     0.75 0.6666667                    FALSE
15  TRUE FALSE FALSE FALSE     0.75 0.6666667                    FALSE
16 FALSE FALSE FALSE FALSE     0.50 0.0000000                    FALSE
So we see that an F1 score greater than accuracy is quite a common occurrence, even in this simple example.
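If you want to quantify how common, a small follow-up to the script above is to tally the flag directly (using the result data frame it produced):

sum(result$F1_greater_than_Accuracy)      # 5 of the 16 prediction patterns above
mean(result$F1_greater_than_Accuracy)     # i.e. a proportion of 0.3125
subset(result, F1_greater_than_Accuracy)  # inspect just those rows

Keep in mind that the enumeration grows as 2^(pp+nn), so this brute-force approach is only feasible for small pp and nn.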
That said, accuracy is not a very good measure of predictive power; see Why is accuracy not the best measure for assessing classification models? And every criticism of accuracy there applies equally to the F1 score (and every other $F_\beta$ score).