
I noticed some interesting behaviour in my loss history while training my model.

[Figure: training and test loss history over epochs]

Please note the sudden drop in test loss at around epoch 106; a similar drop appears again around epoch 1000.

It seems to me that the optimizer is able to escape a local minimum. Is this correct?

Can anyone explain this behaviour to me?

This is with Keras version 2.2.4, and the loss function is mean absolute error.
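
For reference, a minimal Keras 2.2.4 setup that produces this kind of loss-history plot looks roughly like the sketch below. The data, architecture, and optimizer here are placeholders rather than the ones from my actual run; only the mean absolute error loss matches the description above.

```python
import numpy as np
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.layers import Dense

# Placeholder data and architecture, not the ones from my actual run.
n_features = 10
X_train = np.random.rand(1000, n_features)
y_train = np.random.rand(1000)
X_test = np.random.rand(200, n_features)
y_test = np.random.rand(200)

model = Sequential([
    Dense(64, activation='relu', input_shape=(n_features,)),
    Dense(1),
])
model.compile(optimizer='adam', loss='mean_absolute_error')

# Keras records the per-epoch training and validation loss in the History
# object, which is what the plot above is drawn from.
history = model.fit(X_train, y_train,
                    validation_data=(X_test, y_test),
                    epochs=200, verbose=0)

plt.plot(history.history['loss'], label='train loss')
plt.plot(history.history['val_loss'], label='test loss')
plt.xlabel('epoch')
plt.ylabel('mean absolute error')
plt.legend()
plt.show()
```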

VegardKT
  • A single plot is not enough to say why the drop happened. There can be a number of reasons, some mentioned in the linked thread. In general, the optimizers in deep learning *by design* need to escape local minima, otherwise the whole method wouldn't work because the loss landscape is full of such minima. – Tim Dec 17 '18 at 13:13
  • Thanks Tim, sorry for the duplicate post. The saddle point example makes total sense, but what if it had been an actual local minimum? What effect would allow gradient descent to escape such a minimum? (See the toy sketch below the comments.) – VegardKT Dec 18 '18 at 07:44
  • This is not a duplicate post at all - the post linked answers an entirely different question. Why are people on SO so eager to mark things as a duplicate? – Dylan Kerler May 02 '21 at 09:21
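
To make the follow-up question in the comments concrete, here is a toy sketch of the effect being asked about. It is not the Keras model from the question, just plain Python on a hand-picked 1D loss: classical momentum builds up enough velocity to roll over the barrier next to a shallow local minimum, while plain gradient descent from the same starting point gets stuck in it.

```python
# Toy 1D loss with a shallow local minimum near x ~ 0.96 and a deeper
# global minimum near x ~ -1.04 (values chosen purely for illustration).
f = lambda x: x**4 - 2 * x**2 + 0.3 * x
grad = lambda x: 4 * x**3 - 4 * x + 0.3

def descend(x0, lr=0.01, momentum=0.0, steps=500):
    """Gradient descent with optional classical momentum."""
    x, v = x0, 0.0
    for _ in range(steps):
        v = momentum * v - lr * grad(x)
        x = x + v
    return x

x_plain = descend(2.0, momentum=0.0)  # settles in the shallow minimum (~0.96)
x_mom = descend(2.0, momentum=0.9)    # rolls past it into the deeper one (~-1.04)
print("plain GD:    x = %.3f, f(x) = %.3f" % (x_plain, f(x_plain)))
print("momentum GD: x = %.3f, f(x) = %.3f" % (x_mom, f(x_mom)))
```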

0 Answers