
I am trying to train a fully convolutional neural network for 3D medical image segmentation. I started from the architecture of this paper, with two differences: my images vary in size, so I train the network one image at a time (no batching), and I use ReLUs instead of PReLUs as the non-linearities.

The problem I am having is that the outputs of the model before the softmax/sigmoid are far too large (each logit is around 1e32), so when the cross-entropy loss is calculated it blows up and returns infinity or NaN.

At first I thought this might be due to exploding gradients, so I tried gradient clipping, but the problem remained. After that I simply divided the outputs by a large number (1e32), and I started to get finite values for the loss function.

My question is: what is the correct (and certainly more elegant) way of achieving reasonable values for the logits? Perhaps some sort of local normalisation at the end of each convolution layer?
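For concreteness, here is a minimal sketch (TensorFlow 1.x; the shapes and names are illustrative, not my actual code) contrasting a hand-rolled cross entropy, which goes to inf/NaN once the softmax saturates, with TensorFlow's fused op, which stays finite for any finite logits:

```python
import tensorflow as tf

num_classes = 2  # e.g. background / foreground; illustrative only

# Flattened per-voxel logits and one-hot labels for a single 3D volume.
logits = tf.placeholder(tf.float32, [None, num_classes])
onehot = tf.placeholder(tf.float32, [None, num_classes])

# Manual form: for huge logits the softmax saturates to exact 0/1 and
# log(0) = -inf, so the loss comes out as inf or NaN.
manual_loss = -tf.reduce_mean(
    tf.reduce_sum(onehot * tf.log(tf.nn.softmax(logits)), axis=1))

# Fused op: applies the log-sum-exp shift internally, so the result
# stays finite for any finite logits (although it will still be huge
# when the logits are around 1e32).
fused_loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=onehot, logits=logits))
```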

Miguel
  • What package are you using to construct the model? Also, how are you initializing your weights? – Alex R. Jul 12 '17 at 20:15
  • I am using tensorflow 1.2 and initialising the weights with [truncated_normal](https://www.tensorflow.org/api_docs/python/tf/truncated_normal). I want to try the Xavier initialiser but am still trying to figure out how it translates to 3D convolutions (tensorflow does have it for [2D convolutions](https://www.tensorflow.org/api_docs/python/tf/contrib/layers/xavier_initializer)); see the sketch after these comments. Also, all my biases start at the value of 1.0. – Miguel Jul 13 '17 at 08:30
  • P.S I have made a post regarding [Xavier initialisation for 3D convolutions](https://stats.stackexchange.com/questions/291321/xavier-initialisation-of-weights-for-3d-convolutions). – Miguel Jul 13 '17 at 09:03
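Following up on the Xavier-for-3D-convolutions comment above, here is a minimal sketch (TensorFlow 1.x; the kernel shape, channel counts and helper name are made up for illustration) of computing a Glorot-normal standard deviation from the fan-in and fan-out of a 5D conv3d kernel:

```python
import tensorflow as tf

def glorot_conv3d_weights(kernel_shape, name=None):
    """Glorot/Xavier-normal initialisation for a conv3d kernel.

    kernel_shape = [depth, height, width, in_channels, out_channels];
    the receptive-field size is depth * height * width.
    """
    receptive_field = kernel_shape[0] * kernel_shape[1] * kernel_shape[2]
    fan_in = receptive_field * kernel_shape[3]
    fan_out = receptive_field * kernel_shape[4]
    stddev = (2.0 / (fan_in + fan_out)) ** 0.5  # Var = 2 / (fan_in + fan_out)
    return tf.Variable(
        tf.truncated_normal(kernel_shape, mean=0.0, stddev=stddev), name=name)

# Example: a 3x3x3 kernel mapping 16 -> 32 feature channels.
weights = glorot_conv3d_weights([3, 3, 3, 16, 32], name="conv1_w")
```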

1 Answer


Try either removing some layers or reducing the learning rate. Note, though, that if the explosion happens before the first or second loss is even computed, reducing the LR won't help.

I had the same problem and I'm now stuck with LR = 0.001. Let me know if you find something better, so I can try it too.
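For reference, a minimal sketch of combining a small learning rate with the global-norm gradient clipping the question already tried (TensorFlow 1.x; the optimizer, clip norm and toy loss are illustrative stand-ins, not part of this answer):

```python
import tensorflow as tf

# Toy stand-in so the snippet runs on its own; the real `loss` would be
# the per-image cross entropy from the segmentation network.
w = tf.Variable(1.0)
loss = tf.square(w - 3.0)

optimizer = tf.train.AdamOptimizer(learning_rate=1e-3)       # the LR = 0.001 mentioned above
grads_and_vars = optimizer.compute_gradients(loss)
grads, variables = zip(*grads_and_vars)
clipped, _ = tf.clip_by_global_norm(grads, clip_norm=5.0)    # illustrative clip norm
train_op = optimizer.apply_gradients(list(zip(clipped, variables)))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(10):
        sess.run(train_op)
```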

prometeu
  • I did find a solution afterwards: initialising the weights to smaller values. Before, I sampled the initial weights from a uniform distribution with zero mean and unit variance. Now I sample them from a distribution with zero mean and variance calculated with the Xavier/Glorot method. I can use whatever learning rate I want and it doesn't explode. – Miguel Aug 05 '17 at 17:06
  • Reducing the learning rate worked for me. – liang Apr 16 '18 at 06:15