
There are many papers on how Adagrad is used with SGD, but I have not seen any where it is applied to batch gradient descent.

I have a situation wherein batch gradient descent is faster than SGD (unique to my problem).

So far I am simply using an optimization package that does L-BFGS optimization. This works OK, but L-BFGS only does a line search for a scalar learning rate. With Adagrad I could get a learning rate per dimension of my parameter vector (see the sketch below), which seems better than a single scalar learning rate.
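For concreteness, here is a rough sketch of what I have in mind, a minimal NumPy implementation of Adagrad run on full-batch gradients; `grad_fn` is a placeholder for the gradient of my actual objective over the whole dataset:

```python
import numpy as np

def adagrad_batch_descent(grad_fn, w0, lr=0.1, eps=1e-8, n_iters=1000):
    """Full-batch gradient descent with Adagrad per-dimension step sizes.

    grad_fn(w) is assumed to return the gradient of the loss over the
    ENTIRE dataset -- the only difference from the usual SGD setting,
    where it would use a single example or mini-batch.
    """
    w = w0.copy()
    g_sq_sum = np.zeros_like(w)          # running sum of squared gradients
    for _ in range(n_iters):
        g = grad_fn(w)                   # full-batch gradient
        g_sq_sum += g ** 2
        # Per-dimension effective learning rate: lr / sqrt(accumulated g^2)
        w -= lr * g / (np.sqrt(g_sq_sum) + eps)
    return w
```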

My question is - is there any reason NOT to use Adagrad in batch gradient descent?

  • I am not completely clear on what you mean by 'batch' descent, but in principle you can apply it. Any reason you think why not? – Daniel May 20 '15 at 03:48
