I get the following loss behavior when training a multilayer perceptron with mean squared error loss on some synthetic data, using Adam with a learning rate of 1e-1.
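In case it helps, here is a minimal sketch of the kind of setup I mean. The architecture, the synthetic data generation, and the epoch count are illustrative stand-ins rather than my exact code; only the loss (MSE), the optimizer (Adam), and the learning rate (1e-1) match my run:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Placeholder synthetic regression data (stand-in for my actual data).
X = torch.randn(1024, 10)
y = (X ** 2).sum(dim=1, keepdim=True) + 0.1 * torch.randn(1024, 1)

# Simple MLP (layer sizes are illustrative, not my exact architecture).
model = nn.Sequential(
    nn.Linear(10, 64),
    nn.ReLU(),
    nn.Linear(64, 64),
    nn.ReLU(),
    nn.Linear(64, 1),
)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-1)
loss_fn = nn.MSELoss()

# Record the full-batch training loss at every epoch so the
# loss curve over epochs can be plotted afterwards.
losses = []
for epoch in range(600):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```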
As far as I can tell from reading, for example, "Training loss increases with time", the increase can be attributed to the learning rate being too large, so that the optimizer pushes the model out of a minimum. What I do not understand, however, is that the second minimum (around the 450th epoch) is significantly lower than the first minimum (at around the 200th epoch).
Could you please point me to where I should read about such behavior and why it occurs? Thank you!