
Sashank J. Reddi et al., in their paper "On the Convergence of Adam and Beyond", say that Adam's proof of convergence as stated in the original paper is wrong. More than that, they point out that the quantity

$\Gamma_{t + 1} = \frac{\sqrt{V_{t+1}}}{\alpha_{t+1}} - \frac{\sqrt{V_t}}{\alpha_t}$, where $V_t$ is the moving average of squared gradients and $\alpha_t$ is the learning rate,

is presumed to be positive. However, this is only true for SGD and AdaGrad, while for RMSProp and Adam it can have either sign. I can't find the place where this property is assumed and used in Adam's original convergence proof; can someone point it out to me?
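To illustrate what I mean, here is a small toy sketch of my own (not the construction from the paper): for an AdaGrad-style cumulative sum, $V_t$ never decreases, so with a decreasing $\alpha_t$ the ratio $\sqrt{V_t}/\alpha_t$ is non-decreasing and $\Gamma_{t+1} \ge 0$. With an exponential moving average as in RMSProp/Adam, $V_t$ can shrink faster than $\alpha_t$ does, so $\Gamma_{t+1}$ can become negative. The gradient sequence, $\beta_2 = 0.9$, and $\alpha_t = 0.1/\sqrt{t}$ below are arbitrary choices for illustration.

```python
# Toy illustration: sign of Gamma_{t+1} = sqrt(V_{t+1})/alpha_{t+1} - sqrt(V_t)/alpha_t
# for an AdaGrad-style cumulative sum vs. an exponential moving average (RMSProp/Adam).
import math

T = 20
grads = [10.0] + [0.01] * (T - 1)    # one large gradient, then nearly flat ones (arbitrary)
alpha = lambda t: 0.1 / math.sqrt(t) # decreasing step-size schedule (arbitrary)
beta2 = 0.9                          # EMA decay (arbitrary, RMSProp-like)

V_sum, V_ema = 0.0, 0.0
prev_sum, prev_ema = None, None
for t in range(1, T + 1):
    g2 = grads[t - 1] ** 2
    V_sum += g2                               # cumulative sum: never decreases
    V_ema = beta2 * V_ema + (1 - beta2) * g2  # exponential moving average: can decrease

    cur_sum = math.sqrt(V_sum) / alpha(t)
    cur_ema = math.sqrt(V_ema) / alpha(t)
    if prev_sum is not None:
        print(f"t={t:2d}  Gamma_adagrad={cur_sum - prev_sum:+8.3f}  "
              f"Gamma_ema={cur_ema - prev_ema:+8.3f}")
    prev_sum, prev_ema = cur_sum, cur_ema
```

Running this, `Gamma_adagrad` stays non-negative throughout, while `Gamma_ema` turns negative once the moving average decays faster than $\alpha_t$ shrinks.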

