Is there any rule of thumb between depth of a neural network and learning rate? I have been noticing that the deeper the network is, the lower the learning rate must be.
If that's correct, why is that?
Short answer: yes, there is a relation, though it isn't a trivial one. What you are seeing happens because the optimization surface becomes more complex as the number of hidden layers increases: gradients are propagated through more layers, so gradient magnitudes and curvature vary more sharply, and a large step can easily overshoot. Smaller learning rates are therefore generally better for deeper networks. Getting stuck in a local minimum is a possibility with a low learning rate, but that is still much better than a complex surface combined with a high learning rate.
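You can see the effect yourself with a quick experiment. Below is a minimal sketch (assuming PyTorch; the toy data, widths, depths, and learning rates are arbitrary choices for illustration) that trains plain MLPs of different depths with SGD at two learning rates. With the deeper network, the larger learning rate will typically diverge or stall at a much higher loss, while the shallower network tolerates it:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy regression data: y = sin(x) plus a little noise.
x = torch.linspace(-3, 3, 256).unsqueeze(1)
y = torch.sin(x) + 0.1 * torch.randn_like(x)

def make_mlp(depth, width=32):
    """Plain MLP with `depth` hidden tanh layers."""
    layers = [nn.Linear(1, width), nn.Tanh()]
    for _ in range(depth - 1):
        layers += [nn.Linear(width, width), nn.Tanh()]
    layers.append(nn.Linear(width, 1))
    return nn.Sequential(*layers)

def final_loss(depth, lr, steps=500):
    """Train with plain SGD and return the final MSE."""
    model = make_mlp(depth)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    return loss.item()

# Compare a shallow and a deep network at a high and a low learning rate.
for depth in (2, 8):
    for lr in (0.5, 0.05):
        print(f"depth={depth:2d}  lr={lr:.2f}  ->  final MSE {final_loss(depth, lr):.4f}")
```

If the high learning rate produces `nan` or a loss that never comes down for the deep network but works fine for the shallow one, that is exactly the depth/learning-rate interaction you noticed.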