
I have been reading the Deep Learning book by Ian Goodfellow et al., where the authors write in chapter 8 (section 8.3.3) that

Nesterov momentum does not improve the rate of convergence in the stochastic gradient case.

I do not understand why it gives no improvement in the stochastic gradient case. I am pasting the paragraph for reference: [image: the paragraph from the book]

samra irshad
    Sorry I don't have time to write a full-fledged answer, but the stochastic case is a random sample - your gradient's direction might not actually be correct. Nesterov takes a big jump along the gradient without correcting first, so it *could* compound this type of mistake. Check out https://stats.stackexchange.com/questions/179915/whats-the-difference-between-momentum-based-gradient-descent-and-nesterovs-ac – Don Walpola Sep 11 '18 at 00:21
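
For reference, the two update rules being compared can be sketched as follows. This is a minimal illustration, not code from the book: the toy loss f(theta) = 0.5 * theta**2, the Gaussian noise standing in for minibatch sampling error, and all function names and default values are assumptions made up for the example. The only difference between the two steps is where the gradient is evaluated: standard momentum uses the current parameters, while Nesterov momentum uses the look-ahead point theta + alpha * v.

```python
import random


def grad(theta, noise_std=0.0, rng=None):
    """Gradient of the toy loss f(theta) = 0.5 * theta**2.

    Optional Gaussian noise imitates a stochastic (minibatch) gradient;
    this noise model is an assumption for illustration only.
    """
    noise = rng.gauss(0.0, noise_std) if (rng is not None and noise_std > 0) else 0.0
    return theta + noise


def momentum_step(theta, v, lr, alpha, noise_std=0.0, rng=None):
    """Standard momentum: gradient taken at the current parameters."""
    g = grad(theta, noise_std, rng)
    v = alpha * v - lr * g
    return theta + v, v


def nesterov_step(theta, v, lr, alpha, noise_std=0.0, rng=None):
    """Nesterov momentum: gradient taken at the look-ahead point theta + alpha * v."""
    g = grad(theta + alpha * v, noise_std, rng)
    v = alpha * v - lr * g
    return theta + v, v


def run(step, n_steps=50, lr=0.1, alpha=0.9, noise_std=0.0, seed=0):
    """Apply `step` repeatedly from theta = 1, v = 0 and return the final theta."""
    rng = random.Random(seed)
    theta, v = 1.0, 0.0
    for _ in range(n_steps):
        theta, v = step(theta, v, lr, alpha, noise_std, rng)
    return theta
```

Calling `run(momentum_step, noise_std=0.5)` and `run(nesterov_step, noise_std=0.5)` with several seeds gives a quick way to compare the two under noisy gradients. A toy run like this shows the mechanics only; it says nothing rigorous about convergence rates.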
