
The normal F1-score computed from binarized predictions can be written like this:

$$F_1 = \frac{2 \cdot TP }{2 \cdot TP + FP + FN}$$
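
As a quick worked example of my own: with labels $y = (1, 0, 1, 0)$ and binarized predictions $\hat{y} = (1, 0, 0, 1)$ we get $TP = 1$, $FP = 1$ and $FN = 1$, so $F_1 = \frac{2 \cdot 1}{2 \cdot 1 + 1 + 1} = 0.5$.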

But in a loss function for a machine learning model you typically need to work with the class probabilities produced by the model in order to calculate gradients, and the soft F1 loss lets us do that.
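
To illustrate the gradient point, here is a minimal standalone sketch of my own (it uses the same soft counts as the loss code further down):

```python
import tensorflow as tf

# Minimal sketch (my own illustration): the soft F1 loss is differentiable
# with respect to the predicted probabilities, so it yields usable gradients,
# unlike an F1 computed on thresholded (binarized) predictions.
y_true = tf.constant([1.0, 0.0, 1.0, 0.0])
y_pred = tf.Variable([0.8, 0.3, 0.4, 0.7])  # model output probabilities

with tf.GradientTape() as tape:
    tp = tf.reduce_sum(y_pred * y_true)
    fp = tf.reduce_sum(y_pred * (1 - y_true))
    fn = tf.reduce_sum((1 - y_pred) * y_true)
    loss = 1 - 2 * tp / (2 * tp + fn + fp + 1e-16)

print(tape.gradient(loss, y_pred))  # finite, non-zero gradient vector
```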

I want to describe the soft F1 loss in a mathematical way. The Python (TensorFlow) code looks like this:

```python
import tensorflow as tf

# Cast labels and predicted probabilities to float
y = tf.cast(y_true, tf.float32)
y_hat = tf.cast(y_pred, tf.float32)

# "Soft" confusion-matrix counts, summed over the batch (per class)
tp = tf.reduce_sum(y_hat * y, axis=0)
fp = tf.reduce_sum(y_hat * (1 - y), axis=0)
fn = tf.reduce_sum((1 - y_hat) * y, axis=0)
tn = tf.reduce_sum((1 - y_hat) * (1 - y), axis=0)

# Soft F1 per class; the small epsilon avoids division by zero
soft_f1_class1 = 2 * tp / (2 * tp + fn + fp + 1e-16)
soft_f1_class0 = 2 * tn / (2 * tn + fn + fp + 1e-16)

# Convert the scores to costs and average the two class-wise costs
cost_class1 = 1 - soft_f1_class1
cost_class0 = 1 - soft_f1_class0
cost = 0.5 * (cost_class1 + cost_class0)

# Macro average over all classes
macro_cost = tf.reduce_mean(cost)
```
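
If I try to translate this code into math notation myself (this is my own attempt, so the notation may not be standard): for a single class with $N$ samples, labels $y_i \in \{0, 1\}$ and predicted probabilities $\hat{y}_i \in [0, 1]$, the code computes the soft counts

$$tp = \sum_{i=1}^{N} \hat{y}_i y_i, \quad fp = \sum_{i=1}^{N} \hat{y}_i (1 - y_i), \quad fn = \sum_{i=1}^{N} (1 - \hat{y}_i) y_i, \quad tn = \sum_{i=1}^{N} (1 - \hat{y}_i)(1 - y_i)$$

and then the cost

$$\text{cost} = \frac{1}{2} \left[ \left( 1 - \frac{2 \, tp}{2 \, tp + fn + fp + \epsilon} \right) + \left( 1 - \frac{2 \, tn}{2 \, tn + fn + fp + \epsilon} \right) \right], \qquad \epsilon = 10^{-16},$$

with `macro_cost` being the mean of this cost over all classes.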

To demonstrate how this works:

```python
import numpy as np

y_true = np.array([1, 0, 1, 0])
y_pred = np.array([0.8, 0.3, 0.4, 0.7])
```

gives

tp = 1.2, tn = 1.0, fp = 1.0, fn = 0.8
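
Plugging these counts into the formulas above (my own arithmetic):

$$\text{soft\_f1\_class1} = \frac{2 \cdot 1.2}{2 \cdot 1.2 + 0.8 + 1.0} \approx 0.571, \qquad \text{soft\_f1\_class0} = \frac{2 \cdot 1.0}{2 \cdot 1.0 + 0.8 + 1.0} \approx 0.526,$$

$$\text{cost} = \frac{1}{2} \left[ (1 - 0.571) + (1 - 0.526) \right] \approx 0.451.$$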

So how can I formulate this soft F1 loss with mathematical formulas?

  • Even if you take class probabilities into account, your F1 score will be misleading for all the same reasons as accuracy is. [Why is accuracy not the best measure for assessing classification models?](https://stats.stackexchange.com/q/312780/1352) Better to use proper scoring rules, which will immediately give you simple gradients. – Stephan Kolassa Dec 05 '21 at 20:38
  • You might have a point here @StephanKolassa, but in my case I have actually compared BCE, weighted BCE, focal loss and F1 loss, and found that the F1 loss outperformed the other loss functions. On the other hand, it is important to mention that the data set used was highly imbalanced and the classification task was to classify 30 different classes. – bjornsing Dec 05 '21 at 20:53
  • 2
    “Outperformed” in what sense? I think the way that you and Stephan mean the term are very different, so it would be good to be precise here. – Arya McCarthy Dec 05 '21 at 20:57
  • I mean outperformed in terms of the metrics used to measure the performance of the model's predictions on the validation set. – bjornsing Dec 05 '21 at 21:01
  • 2
    Many metrics used in classification are misleading. Among them accuracy, F1, precision, recall etc. As in: optimizing these metrics may feel rewarding, but may not advance the goal we actually have in classifying. – Stephan Kolassa Dec 05 '21 at 21:09

0 Answers