There are many papers on how Adagrad is used with SGD, but I have not seen any where it is applied to full-batch gradient descent.
I have a situation wherein batch gradient descent is faster than SGD (unique to my problem).
So far I am simply using an optimization package that does LBFGS optimization. This works OK, but LBFGS only does a line search for a single scalar learning rate. With Adagrad I could get a learning rate per dimension of my parameter vector, which seems better than one scalar learning rate.
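To clarify what I mean, here is a minimal sketch of Adagrad applied to the full-batch gradient rather than stochastic minibatch gradients (the function and variable names like `loss_grad` and `theta0` are just placeholders for my actual problem):

```python
import numpy as np

def adagrad_batch(loss_grad, theta0, lr=0.1, eps=1e-8, n_iters=100):
    """Adagrad updates where each iteration uses the full-batch gradient."""
    theta = theta0.copy()
    g_sq_sum = np.zeros_like(theta)  # per-dimension sum of squared gradients
    for _ in range(n_iters):
        g = loss_grad(theta)         # full-batch gradient, not a stochastic estimate
        g_sq_sum += g ** 2
        # per-dimension effective step size lr / sqrt(sum of squared gradients)
        theta -= lr * g / (np.sqrt(g_sq_sum) + eps)
    return theta

# toy example: quadratic loss 0.5 * ||A @ theta - b||^2, gradient A.T @ (A @ theta - b)
A = np.array([[3.0, 0.0], [0.0, 0.5]])
b = np.array([1.0, 1.0])
theta_hat = adagrad_batch(lambda t: A.T @ (A @ t - b), np.zeros(2), lr=0.5, n_iters=500)
```

The only difference from the usual SGD version is that `g` is the exact gradient over the whole dataset, so the per-dimension scaling reflects the true curvature-like statistics rather than noisy estimates.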
My question is: is there any reason NOT to use Adagrad in batch gradient descent?