This is definitely possible, and not strange at all.
Recall how accuracy and the F1 score are defined:
$$\text{Accuracy}=\frac{TP+TN}{TP+TN+FP+FN}\quad\text{and}\quad \text{F1}=\frac{2TP}{2TP+FP+FN}. $$
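For reference, the F1 formula is just the harmonic mean of precision and recall, with $\text{Precision}=TP/(TP+FP)$ and $\text{Recall}=TP/(TP+FN)$, written out in terms of the confusion matrix counts:
$$\text{F1}=\frac{2\,\text{Precision}\cdot\text{Recall}}{\text{Precision}+\text{Recall}}=\frac{2TP}{2TP+FP+FN}. $$
Note that the true negatives $TN$ enter the accuracy but not the F1 score, so there is no reason for the two metrics to move in lockstep.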
Now, probably the simplest possible way your F1 score can be greater than your accuracy is if you have just two observations, one TRUE and one FALSE. Suppose you classify both as TRUE. Then
$$ TP=1,\quad TN=0,\quad FP=1,\quad FN=0, $$
so
$$\text{Accuracy}=\frac{TP+TN}{TP+TN+FP+FN}=\frac{1+0}{1+0+1+0}=\frac{1}{2}$$
and
$$\text{F1}=\frac{2TP}{2TP+FP+FN}=\frac{2\times 1}{2\times 1+1+0}=\frac{2}{3} $$
and there you are.
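If you like, you can verify this numerically with a minimal sanity check in R, using the same counting logic as the script below (the variable names here are just for illustration):

actuals <- c(TRUE, FALSE)   # one TRUE and one FALSE observation
pred    <- c(TRUE, TRUE)    # both classified as TRUE
TP <- sum(actuals & pred);  TN <- sum(!actuals & !pred)
FP <- sum(!actuals & pred); FN <- sum(actuals & !pred)
c(Accuracy = (TP+TN)/(TP+TN+FP+FN), F1 = 2*TP/(2*TP+FP+FN))
# Accuracy 0.5, F1 2/3 (about 0.667), matching the calculation by hand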
Actually, you can create slightly bigger examples with the following R script. It considers pp TRUE and nn FALSE observations, and rolls out all the different combinations of predictions you might have. (It actually double-counts some constellations, but reducing these out would have made the script less legible.)
pp <- 2   # number of TRUE (positive) observations
nn <- 2   # number of FALSE (negative) observations
actuals <- c(rep(T,pp),rep(F,nn))

# all 2^(pp+nn) possible prediction vectors, one per row
predictions <- expand.grid(lapply(1:(pp+nn),function(xx)c(T,F)))

# compute both metrics from the confusion matrix counts
accuracy_and_F1 <- function(actuals,pred) {
    TP <- sum(actuals & pred)
    TN <- sum(!actuals & !pred)
    FP <- sum(!actuals & pred)
    FN <- sum(actuals & !pred)
    structure(c((TP+TN)/(TP+TN+FP+FN),2*TP/(2*TP+FP+FN)),.Names=c("Accuracy","F1"))
}

# evaluate every prediction vector against the fixed actuals
result <- cbind(predictions,
    t(apply(predictions,1,function(xx)accuracy_and_F1(actuals,xx))))
result$F1_greater_than_Accuracy <- with(result,F1>Accuracy)
result
In this example with pp = 2 and nn = 2, the result is
    Var1  Var2  Var3  Var4 Accuracy        F1 F1_greater_than_Accuracy
1   TRUE  TRUE  TRUE  TRUE     0.50 0.6666667                     TRUE
2  FALSE  TRUE  TRUE  TRUE     0.25 0.4000000                     TRUE
3   TRUE FALSE  TRUE  TRUE     0.25 0.4000000                     TRUE
4  FALSE FALSE  TRUE  TRUE     0.00 0.0000000                    FALSE
5   TRUE  TRUE FALSE  TRUE     0.75 0.8000000                     TRUE
6  FALSE  TRUE FALSE  TRUE     0.50 0.5000000                    FALSE
7   TRUE FALSE FALSE  TRUE     0.50 0.5000000                    FALSE
8  FALSE FALSE FALSE  TRUE     0.25 0.0000000                    FALSE
9   TRUE  TRUE  TRUE FALSE     0.75 0.8000000                     TRUE
10 FALSE  TRUE  TRUE FALSE     0.50 0.5000000                    FALSE
11  TRUE FALSE  TRUE FALSE     0.50 0.5000000                    FALSE
12 FALSE FALSE  TRUE FALSE     0.25 0.0000000                    FALSE
13  TRUE  TRUE FALSE FALSE     1.00 1.0000000                    FALSE
14 FALSE  TRUE FALSE FALSE     0.75 0.6666667                    FALSE
15  TRUE FALSE FALSE FALSE     0.75 0.6666667                    FALSE
16 FALSE FALSE FALSE FALSE     0.50 0.0000000                    FALSE
So we see that an F1 score greater than accuracy is quite a common occurrence, even in this simple example.
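If you want to quantify how common, a small follow-up to the script above is to tally the flag directly (using the result data frame it produced):

sum(result$F1_greater_than_Accuracy)      # 5 of the 16 prediction patterns above
mean(result$F1_greater_than_Accuracy)     # i.e. a proportion of 0.3125
subset(result, F1_greater_than_Accuracy)  # inspect just those rows

Keep in mind that the enumeration grows as 2^(pp+nn), so this brute-force approach is only feasible for small pp and nn.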
That said, accuracy is not a very good measure of predictive power; see Why is accuracy not the best measure for assessing classification models? And every criticism of accuracy there applies equally to the F1 score (and every other $F_\beta$ score).