
This might be a silly question, but here it is anyway. I'm trying to extend the gradient descent algorithm I currently use for my neural network (plain momentum) with Nesterov's momentum. I know that applying Nesterov's momentum simply amounts to evaluating the gradient at a shifted point, W_shifted = W_current + α * ΔW_old (where W_current are the current weights, ΔW_old is the weight update from the last iteration, and α is the momentum coefficient), and then proceeding with the usual steps of gradient descent. A minimal sketch of the step I have in mind is below.
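
For concreteness, here is a minimal sketch of that update step. The names (`nesterov_step`, `grad_fn`, `lr`, `alpha`) are illustrative, not from any particular library:

```python
import numpy as np

def nesterov_step(W, dW_old, grad_fn, lr=0.01, alpha=0.9):
    """One Nesterov momentum update (illustrative names, not a library API).

    W       -- current weights, W_current
    dW_old  -- previous weight update, ΔW_old
    grad_fn -- callable returning the loss gradient at given weights
    lr      -- learning rate (assumed value)
    alpha   -- momentum coefficient, α
    """
    # Evaluate the gradient at the shifted (look-ahead) point.
    W_shifted = W + alpha * dW_old
    grad = grad_fn(W_shifted)
    # Then the usual momentum update, but using the shifted gradient.
    dW_new = alpha * dW_old - lr * grad
    return W + dW_new, dW_new
```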

The question is: when evaluating the loss function at each iteration, should I compute the network's output at W_shifted or at W_current?

  • You should use the current iteration of the weights to calculate the loss function. Momentum is just there to keep the gradient in check and decide on the next position of the parameters. So if you implemented it in an Excel sheet, it would be a separate column running in parallel. – Zhubarb Jan 02 '18 at 18:09
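
To illustrate the comment's point, here is a minimal, self-contained sketch on a toy quadratic loss (all names are illustrative): the loss is monitored at W_current, while only the gradient uses the shifted point.

```python
import numpy as np

# Toy quadratic loss f(W) = 0.5 * ||W - target||^2, illustrative only.
target = np.array([3.0, -1.0])
loss_fn = lambda W: 0.5 * np.sum((W - target) ** 2)
grad_fn = lambda W: W - target

W, dW = np.zeros(2), np.zeros(2)
lr, alpha = 0.1, 0.9
for step in range(50):
    # The loss is reported at the current weights, per the comment above.
    if step % 10 == 0:
        print(step, loss_fn(W))
    # Only the gradient is taken at the shifted (look-ahead) point.
    grad = grad_fn(W + alpha * dW)
    dW = alpha * dW - lr * grad
    W = W + dW
```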

0 Answers