In recent years, people have been training huge neural networks with millions of parameters. I have seen many discussions of gradient-based training, but not much about Newton's method or quasi-Newton methods.
Is it true that Newton's method and quasi-Newton methods are not widely used in deep neural network training?
Is this because the Hessian is too large, so that even an approximation of it, such as BFGS, would not work, whereas the gradient can still be approximated cheaply in other ways (e.g., with stochastic mini-batch estimates)?
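To make the "Hessian is too large" intuition concrete, here is a back-of-the-envelope sketch I put together (my own illustration, not from any particular paper) comparing the memory needed to store a dense Hessian versus a gradient for a network with n parameters:

```python
def dense_hessian_bytes(n_params, bytes_per_float=4):
    """Memory for the full n x n Hessian (float32 by default)."""
    return n_params * n_params * bytes_per_float

def gradient_bytes(n_params, bytes_per_float=4):
    """Memory for the gradient vector (float32 by default)."""
    return n_params * bytes_per_float

# A modest network by modern standards: 10 million parameters.
n = 10_000_000
print(f"gradient: {gradient_bytes(n) / 1e9:.2f} GB")       # 0.04 GB
print(f"Hessian:  {dense_hessian_bytes(n) / 1e12:.0f} TB")  # 400 TB
```

So the gradient fits easily in memory while the dense Hessian (and even a dense BFGS approximation, which has the same n x n footprint) does not, which is presumably part of the answer. Limited-memory variants like L-BFGS avoid storing the full matrix, so the storage argument alone may not fully explain their absence.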
Are there any review papers on the optimization methods used in deep neural network training?