Suppose I have an imbalanced dataset that consists of 90% positive points and 10% negative points, and I create a "dumb" model that always predicts every point as positive. The confusion matrix of this model will be:

                       Predicted positive    Predicted negative
  Actually positive         TP = 90               FN = 0
  Actually negative         FP = 10               TN = 0

Now the precision computed from this confusion matrix will be:

Precision = the number of true positives out of the number of points the model predicted as positive.

So, Precision = TP / (TP + FP) = 90 / (90 + 10) = 0.9

And recall will be:

Recall = the number of true positives out of the number of points that are actually positive.

So, Recall = TP / (TP + FN) = 90 / (90 + 0) = 1.0

As we can see, both precision and recall are high, and the F1-score of this model is:

F1-Score = 2 * (P * R) / (P + R) = 2 * (0.9 * 1.0) / (0.9 + 1.0) ≈ 0.947
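
These numbers are easy to check in code. Below is a minimal sketch, assuming scikit-learn's metric functions and the 100-point dataset described above (the variable names are just for illustration):

```python
# Minimal sketch (assumes scikit-learn): verify the precision, recall,
# and F1 numbers above for 100 points, 90 actually positive and 10
# actually negative, with a "dumb" model that always predicts positive.
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1] * 90 + [0] * 10  # ground truth: 90 positive, 10 negative
y_pred = [1] * 100            # dumb model: every point predicted positive

print(precision_score(y_true, y_pred))  # 0.9
print(recall_score(y_true, y_pred))     # 1.0
print(f1_score(y_true, y_pred))         # ~0.947
```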

This is a high score, so according to it the model looks very good. How can the F1 score give such a good result on an imbalanced dataset?

1 Answer


In a situation like this, it probably makes more sense to treat your minority class as the positive class. That way your precision number (and F1 score) is more meaningful.
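
For example, with the same data and dumb model as in the question, switching the positive label to the minority class drives all three scores to zero. A minimal sketch, again assuming scikit-learn (pos_label selects which class counts as positive):

```python
# Sketch (assumes scikit-learn): same data and dumb model as in the
# question, but with the minority class (label 0) treated as positive.
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1] * 90 + [0] * 10
y_pred = [1] * 100  # the dumb model never predicts the minority class

# No point is predicted as class 0, so TP = 0; zero_division=0 returns
# 0.0 instead of warning on the 0/0 cases.
print(precision_score(y_true, y_pred, pos_label=0, zero_division=0))  # 0.0
print(recall_score(y_true, y_pred, pos_label=0))                      # 0.0
print(f1_score(y_true, y_pred, pos_label=0, zero_division=0))         # 0.0
```

With the minority class as the positive class, the dumb model's F1 score is 0.0, which matches how useless the model actually is.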

Ryan Epp