
Optimization algorithms like Adagrad and Adam decay the learning rate over time. To me this sounds like a bad idea for online training, since you are constantly receiving new data, as opposed to retraining on the same data for multiple epochs in the offline setting.

Supposing I could use Adagrad or Adam for online training, would the learning rate I find by grid search for offline training also be suitable for online training? I'd imagine not.


1 Answer


If I understood correctly, Adagrad decays the effective learning rate because it accumulates the outer products of all past gradients, $G_t = \sum_{\tau=1}^{t} g_\tau g_\tau^\mathsf{T}$ (in practice only the diagonal, i.e. the per-coordinate sums of squared gradients). Since these sums only grow, the per-coordinate step size $\eta / \sqrt{G_{t,ii}}$ keeps shrinking. In Adam, the analogous quantity is estimated with an exponential moving average instead, so old gradients are gradually forgotten and the step size does not decay toward zero. That moving-average idea seems to fit the context of online learning well, where new data keeps arriving.
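
To make the contrast concrete, here is a minimal sketch of the two per-coordinate update rules (diagonal Adagrad vs. Adam) in NumPy. The function names and default hyperparameters are illustrative, not taken from any particular library:

```python
import numpy as np

def adagrad_step(w, g, G, lr=0.01, eps=1e-8):
    """Diagonal Adagrad: G accumulates squared gradients forever,
    so the effective step size lr / sqrt(G) only ever shrinks."""
    G = G + g ** 2
    w = w - lr * g / (np.sqrt(G) + eps)
    return w, G

def adam_step(w, g, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: m and v are exponential moving averages of the gradient
    and squared gradient, so old gradients are forgotten and the
    effective step size does not decay to zero."""
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g ** 2
    m_hat = m / (1 - beta1 ** t)   # bias correction for the early steps
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```

In the Adagrad update the denominator is a running sum, so the step size is monotonically decreasing; in the Adam update it is a moving average, so the step size adapts to the recent gradient scale instead of vanishing, which is what makes it more natural for a stream of new data.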

dontloo