
I am trying to build this network, but I run into the following problems:

  1. With more than one hidden layer, the losses become NaNs.
  2. With a single hidden layer, the loss first increases and then decreases.

Code (the network starts at line 58; everything before that is data generation): https://github.com/abhigenie92/multiple_class_NN/blob/master/multiple_class_NN.py

Slides this is based on (23, 24, 25): http://www.deeplearningforcomputervision.com/uploads/9/6/6/6/96660590/lecture_7.pdf

I am using numerically stable computations. Any help would be great.

Abhishek Bhatia

1 Answer

  1. Have you tried regularizing the network by penalizing large weights (slides 30-40 in the linked PDF)? I didn't see anything like that when I glanced at your code, but it should be very straightforward to add: sum the squares of your weights, multiply the result by a small constant, and add it to your loss function (see the sketch after this list).

  2. If your loss increases and you don't have a bug somewhere, the most likely explanation is that your learning rate is too high (a toy demonstration is at the end of this answer).

This question addresses both regularization ("weight decay") and learning rates.

David J. Harris