I'm really puzzled... I've learned (and observed) that training loss/error increases with training set size, as shown in the learning curves in Dr Andrew Ng's ML course.
I've recently run into what looks like an anomaly: both the training loss and the validation loss curves were decreasing as the training set size increased. Concretely, I trained 10 different DNN models on increasing amounts of data, and the code plotted the curves only after all the training runs were entirely done (roughly the procedure in the sketch below).
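A minimal sketch of that kind of experiment, assuming Keras on a generic tabular dataset; `make_model()`, the layer sizes, and the synthetic data are hypothetical placeholders, not my original code:

```python
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Placeholder data; the real experiments used a different dataset
X, y = make_classification(n_samples=20000, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

def make_model():
    # Fresh high-capacity DNN for each training-set size
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(X.shape[1],)),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model

fractions = np.linspace(0.1, 1.0, 10)   # 10 increasing fractions of the training set
train_losses, val_losses = [], []

for frac in fractions:
    n = int(frac * len(X_train))
    model = make_model()
    history = model.fit(X_train[:n], y_train[:n],
                        epochs=30, batch_size=128, verbose=0,
                        validation_data=(X_val, y_val))
    train_losses.append(history.history["loss"][-1])    # final training loss
    val_losses.append(history.history["val_loss"][-1])  # final validation loss

# Plot only after all 10 trainings are done
sizes = (fractions * len(X_train)).astype(int)
plt.plot(sizes, train_losses, label="train loss")
plt.plot(sizes, val_losses, label="validation loss")
plt.xlabel("training set size")
plt.ylabel("loss")
plt.legend()
plt.show()
```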
I suspect that Dr Ng's drawing assumes the number of data points is already fairly large; it doesn't show the training error curve for small training sets. Maybe in that regime, for a high-capacity DNN, both training loss and validation loss can decrease together as the training set size increases. Maybe, maybe not... I don't really know...
That said, I do get the behavior Dr Ng describes with small classical machine learning models, using the same code (though with different data for the two experiments); I only have to swap out the model, e.g. as in the snippet below.
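Continuing the sketch above, swapping in a classical model could look like this; `LogisticRegression` and `log_loss` are just illustrative choices, not what my actual experiments used:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

train_losses, val_losses = [], []
for frac in fractions:
    n = int(frac * len(X_train))
    clf = LogisticRegression(max_iter=1000).fit(X_train[:n], y_train[:n])
    # Log-loss on the data the model was fit on, and on the fixed validation set
    train_losses.append(log_loss(y_train[:n], clf.predict_proba(X_train[:n])))
    val_losses.append(log_loss(y_val, clf.predict_proba(X_val)))
```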