
Sashank J. Reddi et al., in their paper "On the Convergence of Adam and Beyond", say that Adam's proof of convergence as stated in the original paper is wrong. More than that, they point out that the quantity

$\Gamma_{t + 1} = \frac{\sqrt{V_{t+1}}}{\alpha_{t+1}} - \frac{\sqrt{V_t}}{\alpha_t}$, where $V_t$ is the moving average of squared gradients and $\alpha_t$ is the learning rate,

is presumed to be positive. However, this is only true for SGD and AdaGrad, while for RMSProp and Adam it can have either sign. I can't find the place where this property is assumed and used in Adam's original convergence proof; can someone point it out to me?
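To illustrate what I mean, here is a small toy sketch of my own (not the construction from the paper): for an AdaGrad-style cumulative sum, $V_t$ never decreases, so with a decreasing $\alpha_t$ the ratio $\sqrt{V_t}/\alpha_t$ is non-decreasing and $\Gamma_{t+1} \ge 0$. With an exponential moving average as in RMSProp/Adam, $V_t$ can shrink faster than $\alpha_t$ does, so $\Gamma_{t+1}$ can become negative. The gradient sequence, $\beta_2 = 0.9$, and $\alpha_t = 0.1/\sqrt{t}$ below are arbitrary choices for illustration.

```python
# Toy illustration: sign of Gamma_{t+1} = sqrt(V_{t+1})/alpha_{t+1} - sqrt(V_t)/alpha_t
# for an AdaGrad-style cumulative sum vs. an exponential moving average (RMSProp/Adam).
import math

T = 20
grads = [10.0] + [0.01] * (T - 1)    # one large gradient, then nearly flat ones (arbitrary)
alpha = lambda t: 0.1 / math.sqrt(t) # decreasing step-size schedule (arbitrary)
beta2 = 0.9                          # EMA decay (arbitrary, RMSProp-like)

V_sum, V_ema = 0.0, 0.0
prev_sum, prev_ema = None, None
for t in range(1, T + 1):
    g2 = grads[t - 1] ** 2
    V_sum += g2                               # cumulative sum: never decreases
    V_ema = beta2 * V_ema + (1 - beta2) * g2  # exponential moving average: can decrease

    cur_sum = math.sqrt(V_sum) / alpha(t)
    cur_ema = math.sqrt(V_ema) / alpha(t)
    if prev_sum is not None:
        print(f"t={t:2d}  Gamma_adagrad={cur_sum - prev_sum:+8.3f}  "
              f"Gamma_ema={cur_ema - prev_ema:+8.3f}")
    prev_sum, prev_ema = cur_sum, cur_ema
```

Running this, `Gamma_adagrad` stays non-negative throughout, while `Gamma_ema` turns negative once the moving average decays faster than $\alpha_t$ shrinks.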

