
Binary cross entropy for multi-label classification can be defined by the following loss function:

$$-\frac{1}{N}\sum_{i=1}^N [y_i \log(\hat{y}_i)+(1-y_i) \log(1-\hat{y}_i)]$$
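As a concrete instance of this formula (numbers chosen arbitrarily): for a single sample with labels $y = (1, 0)$ and predictions $\hat{y} = (0.9, 0.2)$, it gives

$$-\frac{1}{2}\left[\log(0.9) + \log(1 - 0.2)\right] \approx 0.164$$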

Why does the Keras binary_crossentropy loss function return different values? What formula is actually behind them? I tried to read the source code, but it is not easy to understand.

Updated

The following code gives approximately the same result as Keras:

import math
import numpy as np
import keras.backend as K

def binary_crossentropy(y_true, y_pred):
    result = []
    for i in range(len(y_pred)):
        # Clip predictions into [epsilon, 1 - epsilon], as Keras does, to avoid log(0)
        y_pred[i] = [max(min(x, 1 - K.epsilon()), K.epsilon()) for x in y_pred[i]]
        # Per-sample loss: mean binary cross entropy over the labels
        result.append(-np.mean([y_true[i][j] * math.log(y_pred[i][j])
                                + (1 - y_true[i][j]) * math.log(1 - y_pred[i][j])
                                for j in range(len(y_pred[i]))]))
    return np.mean(result)  # average the per-sample losses over the batch
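To sanity-check this against Keras itself, something like the following comparison can be used (a minimal sketch, assuming Keras 2.x with the TensorFlow backend; the arrays are arbitrary example data):

import numpy as np
import keras.backend as K
from keras.losses import binary_crossentropy as keras_bce

y_true = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
y_pred = np.array([[0.9, 0.2, 0.8], [0.3, 0.6, 0.7]])

# Keras returns one loss value per sample (the mean over the last axis),
# so average over the batch before comparing with the scalar above
per_sample = K.eval(keras_bce(K.variable(y_true), K.variable(y_pred)))
print(per_sample.mean())
print(binary_crossentropy(y_true, y_pred))  # the manual version above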
Dmitry

1 Answer


There is a mistake in your code:

$$-\frac{1}{N}\sum_{i=1}^N [\color{red}{\hat{y}_i} \log(\hat{y}_i)+(1-y_i) \log(1-\hat{y}_i)]$$

It should be

$$-\frac{1}{N}\sum_{i=1}^N [\color{blue}{y_i} \log(\hat{y}_i)+(1-y_i) \log(1-\hat{y}_i)]$$

Your code:

result.append([y_pred[i][j] * math.log(y_pred[i][j]) + (1 - y_true[i][j]) * math.log(1 - y_pred[i][j]) for j in range(len(y_pred[i]))])

should be changed to

result.append([y_true[i][j] * math.log(y_pred[i][j]) + (1 - y_true[i][j]) * math.log(1 - y_pred[i][j]) for j in range(len(y_pred[i]))])

where I have changed your first y_pred to y_true.
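A one-element check makes the difference visible (numbers are arbitrary):

import math

y_true, y_pred = 1.0, 0.9
buggy = -(y_pred * math.log(y_pred) + (1 - y_true) * math.log(1 - y_pred))  # ≈ 0.0948
fixed = -(y_true * math.log(y_pred) + (1 - y_true) * math.log(1 - y_pred))  # ≈ 0.1054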

Edit: Also, from the Keras documentation, the signature is

binary_crossentropy(y_true, y_pred)

rather than

binary_crossentropy(y_pred, y_true)
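The order matters because the loss is not symmetric in its two arguments, and the epsilon clipping is applied to the predictions (the second argument). A quick illustration, under the same Keras 2.x/TensorFlow-backend assumption as above, with hypothetical example arrays:

import numpy as np
import keras.backend as K
from keras.losses import binary_crossentropy

y_true = K.variable(np.array([[1.0, 0.0, 1.0]]))
y_pred = K.variable(np.array([[0.9, 0.2, 0.8]]))

print(K.eval(binary_crossentropy(y_true, y_pred)))  # correct order, ≈ 0.184
print(K.eval(binary_crossentropy(y_pred, y_true)))  # swapped order: the 0/1 labels get
                                                    # clipped and logged, giving a much larger value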
Siong Thye Goh