
I am trying to train a fully convolutional neural network for 3D medical image segmentation. I started from the architecture of this paper, with two differences: my images vary in size, so I train the network one image at a time (no batching), and I use ReLUs instead of PReLUs as the non-linearities.

The problem I am having is that the outputs of the model before the softmax/sigmoid are far too large (each logit is around 1e32), so when the cross-entropy loss is calculated it blows up and returns infinity or NaN.

At first I thought this might be due to exploding gradients, so I tried gradient clipping, but the problem remained. After that I simply divided the outputs by a large number (1e32), and I started to get finite values for the loss function.

My question is: what is the correct (and certainly more elegant) way of achieving reasonable values for the logits? Perhaps some sort of local normalisation at the end of each convolution layer?
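For concreteness, here is a minimal sketch (TensorFlow 1.x; the shapes and names are illustrative, not my actual code) contrasting a hand-rolled cross entropy, which goes to inf/NaN once the softmax saturates, with TensorFlow's fused op, which stays finite for any finite logits:

```python
import tensorflow as tf

num_classes = 2  # e.g. background / foreground; illustrative only

# Flattened per-voxel logits and one-hot labels for a single 3D volume.
logits = tf.placeholder(tf.float32, [None, num_classes])
onehot = tf.placeholder(tf.float32, [None, num_classes])

# Manual form: for huge logits the softmax saturates to exact 0/1 and
# log(0) = -inf, so the loss comes out as inf or NaN.
manual_loss = -tf.reduce_mean(
    tf.reduce_sum(onehot * tf.log(tf.nn.softmax(logits)), axis=1))

# Fused op: applies the log-sum-exp shift internally, so the result
# stays finite for any finite logits (although it will still be huge
# when the logits are around 1e32).
fused_loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=onehot, logits=logits))
```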

Miguel
  • What package are you using to construct the model? Also, how are you initializing your weights? – Alex R. Jul 12 '17 at 20:15
  • I am using tensorflow 1.2 and initialising the weights with [truncated_normal](https://www.tensorflow.org/api_docs/python/tf/truncated_normal). I want to try the Xavier initialiser but am still trying to figure out how it translates to 3D convolutions (tensorflow does have it for [2D convolutions](https://www.tensorflow.org/api_docs/python/tf/contrib/layers/xavier_initializer)); see the sketch after these comments. Also, all my biases start at the value of 1.0. – Miguel Jul 13 '17 at 08:30
  • P.S I have made a post regarding [Xavier initialisation for 3D convolutions](https://stats.stackexchange.com/questions/291321/xavier-initialisation-of-weights-for-3d-convolutions). – Miguel Jul 13 '17 at 09:03
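Following up on the Xavier-for-3D-convolutions comment above, here is a minimal sketch (TensorFlow 1.x; the kernel shape, channel counts and helper name are made up for illustration) of computing a Glorot-normal standard deviation from the fan-in and fan-out of a 5D conv3d kernel:

```python
import tensorflow as tf

def glorot_conv3d_weights(kernel_shape, name=None):
    """Glorot/Xavier-normal initialisation for a conv3d kernel.

    kernel_shape = [depth, height, width, in_channels, out_channels];
    the receptive-field size is depth * height * width.
    """
    receptive_field = kernel_shape[0] * kernel_shape[1] * kernel_shape[2]
    fan_in = receptive_field * kernel_shape[3]
    fan_out = receptive_field * kernel_shape[4]
    stddev = (2.0 / (fan_in + fan_out)) ** 0.5  # Var = 2 / (fan_in + fan_out)
    return tf.Variable(
        tf.truncated_normal(kernel_shape, mean=0.0, stddev=stddev), name=name)

# Example: a 3x3x3 kernel mapping 16 -> 32 feature channels.
weights = glorot_conv3d_weights([3, 3, 3, 16, 32], name="conv1_w")
```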

1 Answer


Try either removing some layers or reducing the learning rate. Note, though, that if the explosion happens before the first or second loss is even computed, reducing the LR won't help.

I had the same problem and I'm now stuck with LR = 0.001. Let me know if you find something better, so I can try it too.
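For reference, a minimal sketch of combining a small learning rate with the global-norm gradient clipping the question already tried (TensorFlow 1.x; the optimizer, clip norm and toy loss are illustrative stand-ins, not part of this answer):

```python
import tensorflow as tf

# Toy stand-in so the snippet runs on its own; the real `loss` would be
# the per-image cross entropy from the segmentation network.
w = tf.Variable(1.0)
loss = tf.square(w - 3.0)

optimizer = tf.train.AdamOptimizer(learning_rate=1e-3)       # the LR = 0.001 mentioned above
grads_and_vars = optimizer.compute_gradients(loss)
grads, variables = zip(*grads_and_vars)
clipped, _ = tf.clip_by_global_norm(grads, clip_norm=5.0)    # illustrative clip norm
train_op = optimizer.apply_gradients(list(zip(clipped, variables)))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(10):
        sess.run(train_op)
```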

prometeu
  • I did find a solution afterwards: initialising the weights to smaller values. Before, I sampled the initial weights from a uniform distribution with zero mean and unit variance. Now I sample them from a distribution with zero mean and variance calculated with the Xavier/Glorot method. I can use whatever learning rate I want and it doesn't explode. – Miguel Aug 05 '17 at 17:06
  • Reducing the learning rate worked for me. – liang Apr 16 '18 at 06:15