I am training a neural network for multilabel classification with a large number of classes (1000), which means more than one output can be active for each input. On average, I have two classes active per output frame. When training with a cross-entropy loss, the neural network resorts to outputting only zeros, because it gets the least loss with this output since 99.8% of my labels are zeros. Any suggestions on how I can push the network to give more weight to the positive classes?
-
What are you using as software? Python + Keras? – Tommaso Guerrini Feb 10 '17 at 14:49
-
Btw: 99.8% is just a number; an average error of 0.2% corresponds to 0.002*1000, so 2 wrong labels per training instance on average. Are you using categorical_crossentropy or binary_crossentropy with sigmoids on the last layer? – Tommaso Guerrini Feb 10 '17 at 14:52
-
@TommasoGuerrini I used Python + Keras, sigmoid and binary_crossentropy. Now testing with categorical_crossentropy; the network is outputting values closer to 1, but the loss is too high for now. Waiting to see how it trains over more epochs. – Yakku Feb 10 '17 at 15:14
-
@TommasoGuerrini I did not understand the purpose of the callback. – Yakku Feb 10 '17 at 15:27
-
My bad, it was just an example of which loss value makes sense – Tommaso Guerrini Feb 10 '17 at 15:29
-
@TommasoGuerrini just FYI, I got a loss of less than 0.01 in just 3 epochs with binary crossentropy, and it stays around 0.01 forever. – Yakku Feb 10 '17 at 15:37
-
How many training instances do you have? Batch size? – Tommaso Guerrini Feb 10 '17 at 15:42
-
200,000 instances; I tried batch sizes of 8 and 64 and can't go beyond that due to memory constraints. The network has approximately the same number of parameters as there are instances. – Yakku Feb 10 '17 at 15:51
-
*inputsize* = $2*10^5$, right? Hmm, you may want to look for someone with more expertise than me. I can only think of Dropout to speed up training with so many parameters, or creating a custom loss function where you assign weights according to the class distribution (it doesn't solve the *all-zeros* problem, but it may help) – Tommaso Guerrini Feb 10 '17 at 16:05
-
You may try sparse_categorical_crossentropy. By the way: when training, don't just look at the loss function, also look at the binary_accuracy. I have a case similar to yours, and using mean squared error as the loss function I obtained a better binary accuracy than with binary logloss :) – Tommaso Guerrini Feb 10 '17 at 16:10
-
@TommasoGuerrini I have a multilabel loss function which I calculate for every epoch. I could not convert it to the Keras format, so I can't use it for backpropagation. Though the MSE loss was 0.01, my metric was really high; that's how I figured out the network was outputting only zeros in order to reduce the MSE. – Yakku Feb 10 '17 at 16:13
-
Post the function and I'll try to convert it to the Keras format for you – Tommaso Guerrini Feb 10 '17 at 16:15
-
Ah thanks, but it needs some external data to measure the loss, so I'd need to store that in memory and so on; it would need some workaround. I posted the problem here wondering if someone else had faced similar problems and what methods worked for them. – Yakku Feb 10 '17 at 16:25
2 Answers
Tensorflow has a loss function `weighted_cross_entropy_with_logits`, which can be used to give more weight to the 1's. So it should be applicable to a sparse multi-label classification setting like yours.
From the documentation:
This is like sigmoid_cross_entropy_with_logits() except that pos_weight allows one to trade off recall and precision by up- or down-weighting the cost of a positive error relative to a negative error.
The argument `pos_weight` is used as a multiplier for the positive targets.
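For illustration, a tiny self-contained sketch of calling the op directly on logits (the numbers are made up; this uses the TF 1.x keyword `targets`, which the wrapper below also relies on):

import tensorflow as tf

# toy batch: 2 samples, 4 classes, multi-hot targets
targets = tf.constant([[1., 0., 0., 1.],
                       [0., 1., 0., 0.]])
logits = tf.constant([[2.0, -1.0, -2.0, 0.5],
                      [-1.5, 1.0, -0.5, -2.0]])

# per the docs, this computes
#   targets * -log(sigmoid(logits)) * pos_weight
#   + (1 - targets) * -log(1 - sigmoid(logits))
# so pos_weight > 1 penalizes missed positives more than false positives
loss = tf.nn.weighted_cross_entropy_with_logits(targets=targets,
                                                logits=logits,
                                                pos_weight=10.0)
per_sample_loss = tf.reduce_mean(loss, axis=-1)  # one value per sample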
If you use the tensorflow backend in Keras, you can use the loss function like this (Keras 2.1.1):
import tensorflow as tf
import keras.backend.tensorflow_backend as tfb

POS_WEIGHT = 10  # multiplier for positive targets, needs to be tuned

def weighted_binary_crossentropy(target, output):
    """
    Weighted binary crossentropy between an output tensor
    and a target tensor. POS_WEIGHT is used as a multiplier
    for the positive targets.

    Combination of the following functions:
    * keras.losses.binary_crossentropy
    * keras.backend.tensorflow_backend.binary_crossentropy
    * tf.nn.weighted_cross_entropy_with_logits
    """
    # transform back to logits
    _epsilon = tfb._to_tensor(tfb.epsilon(), output.dtype.base_dtype)
    output = tf.clip_by_value(output, _epsilon, 1 - _epsilon)
    output = tf.log(output / (1 - output))
    # compute weighted loss
    loss = tf.nn.weighted_cross_entropy_with_logits(targets=target,
                                                    logits=output,
                                                    pos_weight=POS_WEIGHT)
    return tf.reduce_mean(loss, axis=-1)
Then in your model:
model.compile(loss=weighted_binary_crossentropy, ...)
I have not found many resources yet that report values of `pos_weight` that work well in relation to the number of classes, average active classes, etc.
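As a rough starting point (this heuristic is my own suggestion, not something reported in those resources), you can derive `pos_weight` from the ratio of negative to positive entries in the training labels and then tune downwards from there:

import numpy as np

def estimate_pos_weight(y_train):
    """Ratio of negative to positive entries in a multi-hot label matrix."""
    num_pos = y_train.sum()
    num_neg = y_train.size - num_pos
    return num_neg / max(num_pos, 1)

# e.g. with ~2 of 1000 classes active per sample this gives roughly 499,
# which is usually too aggressive -- treat it as an upper bound, not a default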
-
Also, it might be a good idea to evaluate the f-measure in a callback after each epoch when tuning the hyperparameters (such as `pos_weight`). – tobigue Nov 15 '17 at 16:56
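A minimal sketch of such a callback (the validation data, the 0.5 threshold and the use of scikit-learn's `f1_score` are my assumptions):

from keras.callbacks import Callback
from sklearn.metrics import f1_score

class F1Callback(Callback):
    """Prints the micro-averaged F1 on held-out data after every epoch."""

    def __init__(self, x_val, y_val, threshold=0.5):
        super(F1Callback, self).__init__()
        self.x_val = x_val
        self.y_val = y_val
        self.threshold = threshold

    def on_epoch_end(self, epoch, logs=None):
        # binarize the sigmoid outputs before scoring
        y_pred = (self.model.predict(self.x_val) > self.threshold).astype(int)
        print(' - val_f1: %.4f' % f1_score(self.y_val, y_pred, average='micro'))

It can then be passed via `model.fit(..., callbacks=[F1Callback(x_val, y_val)])`.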
-
Is there a corresponding `weighted_binary_accuracy` metric that can be used for the model as well? – CMCDragonkai Oct 21 '19 at 08:20
-
Lifesaver, but I could also use something like `weighted_binary_accuracy` – David Cian Jun 16 '20 at 17:26
-
You can just use [binary accuracy](https://stackoverflow.com/questions/57331013/custom-keras-binary-crossentropy-loss-function-not-working) actually, unless you really want to weigh the accuracy as well – David Cian Jun 16 '20 at 17:50
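There is no built-in `weighted_binary_accuracy` in Keras as far as I know; a hypothetical sketch of one (the weighting scheme, which reuses the same `POS_WEIGHT` idea as the loss, is my own assumption):

import keras.backend as K

def weighted_binary_accuracy(y_true, y_pred, pos_weight=10.0, threshold=0.5):
    # each positive label counts pos_weight times as much as a negative one
    y_pred_bin = K.cast(K.greater(y_pred, threshold), K.floatx())
    weights = y_true * (pos_weight - 1.0) + 1.0
    correct = K.cast(K.equal(y_true, y_pred_bin), K.floatx())
    return K.sum(weights * correct) / K.sum(weights)

It would be passed as `model.compile(..., metrics=[weighted_binary_accuracy])`.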
-
About the proper values for `pos_weight`: the documentation suggests that any value above 1 increases recall, while any value less than 1 increases precision. – Naveen Reddy Marthala Oct 27 '21 at 12:08
-
I am using tf.keras. I have Dense as my final layer, with the number of units equal to the number of unique labels. Should I use no activation or sigmoid activation in my final layer while using this loss? I shouldn't, correct? – Naveen Reddy Marthala Nov 09 '21 at 07:46
Update for tensorflow 2.6.0:
I was going to write a comment, but there are many things that need to be changed for @tobigue's answer to work, and I am not entirely sure everything in my answer is correct. To make things work:
- You need to replace `import keras.backend.tensorflow_backend as tfb` with `import keras.backend as tfb`.
- The `targets` keyword argument of `tf.nn.weighted_cross_entropy_with_logits` needs to be changed to `labels`.
- `tf.log` needs to be called like this: `tf.math.log`.
- To make this custom loss function work with Keras, you need to import `get_custom_objects` and register the custom loss function: `from keras.utils.generic_utils import get_custom_objects`, and then before you compile the model call `get_custom_objects().update({"weighted_binary_crossentropy": weighted_binary_crossentropy})`.
- I also encountered this error, but it may not be the same for everyone: `TypeError: Input 'y' of 'Mul' Op has type float32 that does not match type int32 of argument 'x'.` To fix it, I converted `target` to `float32` like this: `target = tf.cast(target, tf.float32)`.
So, the final code that I am using is this:
import tensorflow as tf
import keras.backend as tfb
from keras.utils.generic_utils import get_custom_objects

POS_WEIGHT = 10  # multiplier for positive targets, needs to be tuned

def weighted_binary_crossentropy(target, output):
    """
    Weighted binary crossentropy between an output tensor
    and a target tensor. POS_WEIGHT is used as a multiplier
    for the positive targets.

    Combination of the following functions:
    * keras.losses.binary_crossentropy
    * keras.backend.tensorflow_backend.binary_crossentropy
    * tf.nn.weighted_cross_entropy_with_logits
    """
    # transform back to logits
    _epsilon = tfb._to_tensor(tfb.epsilon(), output.dtype.base_dtype)
    output = tf.clip_by_value(output, _epsilon, 1 - _epsilon)
    output = tf.math.log(output / (1 - output))
    # compute weighted loss
    target = tf.cast(target, tf.float32)
    loss = tf.nn.weighted_cross_entropy_with_logits(labels=target,
                                                    logits=output,
                                                    pos_weight=POS_WEIGHT)
    return tf.reduce_mean(loss, axis=-1)
Then in your model:
get_custom_objects().update({"weighted_binary_crossentropy": weighted_binary_crossentropy})
model.compile(loss='weighted_binary_crossentropy', ...)
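For completeness, a usage sketch of how this might be wired into a model (the architecture, input size and dummy data below are placeholders, not from the question). Note that the final layer uses a sigmoid, since the loss expects probabilities and converts them back to logits internally:

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

num_classes = 1000

model = Sequential([
    Dense(512, activation='relu', input_shape=(4096,)),
    Dense(num_classes, activation='sigmoid'),  # probabilities, as the loss expects
])
model.compile(optimizer='adam',
              loss='weighted_binary_crossentropy',  # resolved via get_custom_objects above
              metrics=['binary_accuracy'])

# dummy multi-hot labels with roughly 2 of 1000 classes active, just to show shapes
x = np.random.rand(64, 4096).astype('float32')
y = (np.random.rand(64, num_classes) < 0.002).astype('float32')
model.fit(x, y, epochs=1, batch_size=8)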

-
I am using tf.keras. I have Dense as my final layer, with the number of units equal to the number of unique labels. Should I use no activation or sigmoid activation in my final layer while using this loss? I shouldn't, correct? – Naveen Reddy Marthala Nov 09 '21 at 07:48