I am trying to implement an exponential learning rate decay with the Adam optimizer for an LSTM. I do not want the staircase=True version. To me, decay_steps feels like the number of steps for which the learning rate stays constant, but I am not sure about this, and the TensorFlow documentation does not state it. Any help is much appreciated.
Just so you know, Adam already handles learning rate optimization. – ARAT Jan 10 '19 at 21:17
1 Answer
As mentioned in the source of the function, decay_steps relates to the decayed learning rate as follows:
decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps)
Hence, decay_steps is not the number of steps the learning rate stays constant: with staircase=False the rate shrinks smoothly at every single step, and decay_steps is simply the number of steps after which the rate has been multiplied by decay_rate exactly once. You should therefore set decay_steps in proportion to how far global_step will run, i.e. to the total number of training steps.
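
For concreteness, here is a minimal sketch of wiring such a schedule into AdamOptimizer with the TF 1.x API (which matches the era of this question); the 0.001 / 1000 / 0.96 values and the toy loss are just placeholder assumptions for illustration:

```python
import tensorflow as tf  # TF 1.x API

# Toy model just to have a loss tensor; the schedule itself is the point.
w = tf.Variable(5.0)
loss = tf.square(w)

global_step = tf.Variable(0, trainable=False, name="global_step")

# With staircase=False the decay is applied at every step, so the learning
# rate is never held constant; decay_steps=1000 means the rate has been
# multiplied by decay_rate once after 1000 steps (by 0.96^2 after 2000, etc.).
learning_rate = tf.train.exponential_decay(
    learning_rate=0.001,
    global_step=global_step,
    decay_steps=1000,
    decay_rate=0.96,
    staircase=False)

optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
# Passing global_step makes minimize() increment it on every update,
# which is what drives the schedule forward.
train_op = optimizer.minimize(loss, global_step=global_step)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(3):
        _, lr = sess.run([train_op, learning_rate])
        print(lr)  # shrinks slightly on every step
```

With these placeholder values the printed rate follows 0.001 * 0.96^(step/1000), so it is about 0.00096 after 1000 updates and roughly 0.000922 after 2000. In TF 2.x the rough equivalent is a tf.keras.optimizers.schedules.ExponentialDecay schedule passed as the learning_rate of tf.keras.optimizers.Adam.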

OmG