I have been reading the Deep Learning book by Ian Goodfellow, where the authors write in chapter 8 (section 8.3.3) that
Nesterov momentum does not improve the rate of convergence in the stochastic gradient case.
I do not understand why it brings no improvement in the stochastic gradient case. I am pasting the paragraph for reference: