Why does Nesterov momentum not improve the rate of convergence in the stochastic gradient case?

Asked Sep 11 '18 at 00:07

I have been reading the Deep Learning book by Ian Goodfellow et al., where the authors write in Chapter 8 (Section 8.3.3) that

> Nesterov momentum does not improve the rate of convergence in the stochastic gradient case.
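For context, here is a sketch of the update and the kind of rate statements involved. The notation is assumed rather than copied from the book: learning rate $\epsilon$, momentum parameter $\alpha$, velocity $v$, minibatch size $m$. For the rates, assume a convex objective with $L$-Lipschitz gradient, $R = \lVert \theta_0 - \theta^* \rVert$, and stochastic gradients with noise variance $\sigma^2$. The bounds are stated informally with constants omitted. They outline why acceleration stops helping and do not amount to a proof.

```latex
% Nesterov momentum, one minibatch step: gradient taken at the look-ahead point
\begin{aligned}
\tilde{\theta} &= \theta + \alpha v \\
g &= \frac{1}{m}\, \nabla_{\tilde{\theta}} \sum_{i=1}^{m} L\big(f(x^{(i)}; \tilde{\theta}),\, y^{(i)}\big) \\
v &\leftarrow \alpha v - \epsilon\, g \\
\theta &\leftarrow \theta + v
\end{aligned}

% Informal excess-error bounds after k steps (constants omitted)
\begin{aligned}
\text{exact gradients, gradient descent:} \quad & f(\theta_k) - f^* = O\!\left(\frac{L R^2}{k}\right) \\
\text{exact gradients, Nesterov:} \quad & f(\theta_k) - f^* = O\!\left(\frac{L R^2}{k^2}\right) \\
\text{stochastic gradients, accelerated:} \quad & \mathbb{E}[f(\theta_k)] - f^* = O\!\left(\frac{L R^2}{k^2} + \frac{\sigma R}{\sqrt{k}}\right) \\
\text{stochastic gradients, any first-order method:} \quad & \mathbb{E}[f(\theta_k)] - f^* = \Omega\!\left(\frac{\sigma R}{\sqrt{k}}\right) \text{ in the worst case}
\end{aligned}
```

Acceleration shrinks only the deterministic term. The $\sigma R/\sqrt{k}$ noise term cannot be beaten by any method that sees only noisy gradients. Once that term dominates, SGD with or without Nesterov momentum is stuck at the same $O(1/\sqrt{k})$ rate.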

I do not understand why Nesterov momentum fails to improve the rate of convergence in the stochastic gradient case. I am pasting the paragraph for reference:

Asked by samra irshad

Sorry, I don't have time to write a full-fledged answer, but in the stochastic case the gradient is estimated from a random sample, so its direction might not actually be correct. Nesterov first takes a big jump along the accumulated velocity and only then corrects using the gradient at the look-ahead point, which is itself noisy, so it *could* compound this type of mistake. Check out https://stats.stackexchange.com/questions/179915/whats-the-difference-between-momentum-based-gradient-descent-and-nesterovs-ac – Don Walpola Sep 11 '18 at 00:21
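The following is a toy illustration of the point in the comment. It is not a proof and is not taken from the book. It runs plain SGD, classical momentum and Nesterov momentum on a hypothetical two-dimensional, ill-conditioned quadratic, adding Gaussian noise to each gradient to mimic minibatch sampling. Several choices are assumptions made for illustration: the helper names `run`, `noisy_grad` and `loss`, the curvature values, the noise model and the constant learning rate. Because the step size is constant, the noisy runs settle at a noise floor and do not show an asymptotic rate.

```python
import random


def loss(theta, curvature):
    # Quadratic objective f(theta) = 0.5 * sum_i h_i * theta_i^2, minimized at 0.
    return 0.5 * sum(h * t * t for h, t in zip(curvature, theta))


def noisy_grad(theta, curvature, noise_std, rng):
    # Exact gradient h_i * theta_i plus zero-mean Gaussian noise, standing in
    # for minibatch sampling error (an assumption of this toy model).
    return [h * t + rng.gauss(0.0, noise_std) for h, t in zip(curvature, theta)]


def run(method, steps, lr, alpha, noise_std, seed=0):
    """Return the final loss after `steps` updates of 'sgd', 'momentum' or 'nesterov'."""
    if method not in ("sgd", "momentum", "nesterov"):
        raise ValueError(f"unknown method: {method}")
    rng = random.Random(seed)
    curvature = [1.0, 0.01]  # hypothetical ill-conditioned 2-D problem
    theta = [1.0, 1.0]
    v = [0.0, 0.0]
    for _ in range(steps):
        if method == "nesterov":
            # Nesterov: evaluate the gradient at the interim point theta + alpha * v.
            probe = [t + alpha * vi for t, vi in zip(theta, v)]
        else:
            probe = theta
        g = noisy_grad(probe, curvature, noise_std, rng)
        if method == "sgd":
            v = [-lr * gi for gi in g]  # plain step, no velocity carried over
        else:
            v = [alpha * vi - lr * gi for vi, gi in zip(v, g)]
        theta = [t + vi for t, vi in zip(theta, v)]
    return loss(theta, curvature)


if __name__ == "__main__":
    for noise_std in (0.0, 0.1):
        for method in ("sgd", "momentum", "nesterov"):
            # Average over seeds so the noisy numbers are not a single lucky draw.
            runs = [run(method, 200, 0.1, 0.9, noise_std, seed=s) for s in range(20)]
            print(f"noise={noise_std} {method:9s} mean final loss={sum(runs) / len(runs):.2e}")
```

With `noise_std=0.0`, both momentum variants should finish far below plain SGD after 200 steps. With `noise_std=0.1`, that advantage should largely disappear, because all three methods end at loss values of similar magnitude set by the learning rate, the momentum parameter and the noise level. Seeing the $O(1/\sqrt{k})$ behavior in the noisy case would require a decaying learning rate.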