Is there any rule of thumb between depth of a neural network and learning rate? I have been noticing that the deeper the network is, the lower the learning rate must be.
If that's correct, why is that?
Short answer: yes, there is a relation, though it isn't a trivial one. What you are seeing happens because the optimization surface becomes more complex as the number of hidden layers increases: gradients are propagated through more layers, so gradient magnitudes and curvature vary more sharply, and a large step can easily overshoot. Smaller learning rates are therefore generally better for deeper networks. Getting stuck in a local minimum is a possibility with a low learning rate, but that is still much better than a complex surface combined with a high learning rate.
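You can see the effect yourself with a quick experiment. Below is a minimal sketch (assuming PyTorch; the toy data, widths, depths, and learning rates are arbitrary choices for illustration) that trains plain MLPs of different depths with SGD at two learning rates. With the deeper network, the larger learning rate will typically diverge or stall at a much higher loss, while the shallower network tolerates it:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy regression data: y = sin(x) plus a little noise.
x = torch.linspace(-3, 3, 256).unsqueeze(1)
y = torch.sin(x) + 0.1 * torch.randn_like(x)

def make_mlp(depth, width=32):
    """Plain MLP with `depth` hidden tanh layers."""
    layers = [nn.Linear(1, width), nn.Tanh()]
    for _ in range(depth - 1):
        layers += [nn.Linear(width, width), nn.Tanh()]
    layers.append(nn.Linear(width, 1))
    return nn.Sequential(*layers)

def final_loss(depth, lr, steps=500):
    """Train with plain SGD and return the final MSE."""
    model = make_mlp(depth)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    return loss.item()

# Compare a shallow and a deep network at a high and a low learning rate.
for depth in (2, 8):
    for lr in (0.5, 0.05):
        print(f"depth={depth:2d}  lr={lr:.2f}  ->  final MSE {final_loss(depth, lr):.4f}")
```

If the high learning rate produces `nan` or a loss that never comes down for the deep network but works fine for the shallow one, that is exactly the depth/learning-rate interaction you noticed.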