I implemented both classical momentum and Nesterov momentum in a deep learning project that uses Theano. I understand the mathematical difference between the two methods, and my conceptual understanding is that Nesterov is an improvement over momentum.
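For concreteness, here is a minimal sketch of the two update rules as I understand them. It is plain Python rather than my Theano code, and the toy objective `f(x) = 0.5 * x**2`, the function names, and the default values of `lr` and `mu` are all assumptions chosen only for illustration. The only difference is where the gradient is evaluated: at the current position for momentum, at the look-ahead position for Nesterov.

```python
def grad(x):
    # Gradient of the toy objective f(x) = 0.5 * x**2 (illustrative assumption).
    return x


def momentum_step(x, v, lr=0.1, mu=0.9):
    # Classical momentum: the gradient is evaluated at the current position x.
    v = mu * v - lr * grad(x)
    return x + v, v


def nesterov_step(x, v, lr=0.1, mu=0.9):
    # Nesterov momentum: the gradient is evaluated at the look-ahead point x + mu * v.
    v = mu * v - lr * grad(x + mu * v)
    return x + v, v


def run(step, x0=5.0, n_steps=50):
    # Apply the given update rule repeatedly, starting from rest (v = 0).
    x, v = x0, 0.0
    for _ in range(n_steps):
        x, v = step(x, v)
    return x
```

Calling `run(momentum_step)` and `run(nesterov_step)` gives a quick side-by-side comparison on this toy problem; it says nothing about how the two behave on a real network.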

My question is: are there practical situations where momentum would be preferred over Nesterov? In my experience, Nesterov has always performed better. In what situation would I choose plain momentum instead?

thc
  • related: [What's the difference between momentum based gradient descent, and Nesterov's accelerated gradient descent?](https://stats.stackexchange.com/q/179915/215801) – Oren Milman Sep 21 '18 at 15:26

0 Answers