Nesterov vs. momentum gradient descent

Asked Nov 26 '15 at 00:39

Active Nov 26 '15 at 00:39

Viewed 1,595 times

I implemented both methods in a deep learning project where I am using Theano. I understand the mathematical difference between them, and my conceptual understanding is that Nesterov is an improvement over classical momentum.

My question is: are there practical situations where classical momentum would be preferred over Nesterov? In my experience, Nesterov has always been better. In what situation would I choose plain momentum instead?
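For concreteness, here is a minimal sketch of the two update rules being compared, written in plain Python rather than Theano. The function names, the default learning rate `lr` and momentum coefficient `mu`, and the toy quadratic objective are all illustrative assumptions, not the code from the project. The only difference between the two steps is where the gradient is evaluated: at the current parameters for classical momentum, and at the look-ahead point `w + mu * v` for Nesterov.

```python
def momentum_step(w, v, grad_fn, lr=0.1, mu=0.9):
    """Classical momentum: the gradient is evaluated at the current parameters w."""
    g = grad_fn(w)
    v_new = [mu * vi - lr * gi for vi, gi in zip(v, g)]
    w_new = [wi + vi for wi, vi in zip(w, v_new)]
    return w_new, v_new


def nesterov_step(w, v, grad_fn, lr=0.1, mu=0.9):
    """Nesterov momentum: the gradient is evaluated at the look-ahead point w + mu * v."""
    lookahead = [wi + mu * vi for wi, vi in zip(w, v)]
    g = grad_fn(lookahead)
    v_new = [mu * vi - lr * gi for vi, gi in zip(v, g)]
    w_new = [wi + vi for wi, vi in zip(w, v_new)]
    return w_new, v_new


def quad_grad(w):
    """Gradient of the toy objective f(w) = 0.5 * (w[0]**2 + 10 * w[1]**2) (an assumption)."""
    return [w[0], 10.0 * w[1]]


def run(step, n_steps=200):
    """Apply `step` repeatedly from a fixed start with zero initial velocity."""
    w, v = [1.0, 1.0], [0.0, 0.0]
    for _ in range(n_steps):
        w, v = step(w, v, quad_grad)
    return w


if __name__ == "__main__":
    print("momentum:", run(momentum_step))
    print("nesterov:", run(nesterov_step))
```

With zero initial velocity the first step is identical for both methods, because the look-ahead point coincides with the current parameters; the trajectories only diverge from the second step onward.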

asked Nov 26 '15 at 00:39

thc

related: [What's the difference between momentum based gradient descent, and Nesterov's accelerated gradient descent?](https://stats.stackexchange.com/q/179915/215801) – Oren Milman Sep 21 '18 at 15:26


0 Answers
