How few training examples is too few when training a neural network?

Question

I'm a beginner trying to put together my first project. I had a song classification project in mind, but since I would be manually labeling, I could only reasonably put together about 1000 songs, or 60 hours of music.

I would be classifying into several classes, so it's possible that one class would have as few as 50-100 songs in the training set, which seems like too few! Is there a general rule of thumb for how much data is needed to train a neural network to give it a shot at working?

Edit: I was thinking of using a vanilla LSTM. The input features will have dimension 39 and the output dimension 6; my first attempt for the hidden layer dimension would be 100.

This isn't really answerable because not all tasks are easy, and different network architectures and hyperparameter selections will improve/hurt different models in different ways. — Sycorax, Aug 01 '16 at 13:27
At a minimum, you need to specify your network structure & how many links there will be to train. — gung - Reinstate Monica, Aug 01 '16 at 13:29
[Minimum viable dataset](https://medium.com/appanion/the-minimum-viable-data-set-5deb45524726) might be what you wanted. — Lerner Zhang, Jan 16 '21 at 12:08
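
To put a rough number on the "how many links" comment, the sketch below counts the trainable parameters for the sizes given in the edit (input 39, hidden 100, output 6). It assumes a single-layer LSTM with one bias vector per gate, followed by a dense output layer on the hidden state; neither detail is stated in the question, and some implementations keep a second bias vector per gate, which would raise the count slightly. The function names are made up for illustration.

```python
def lstm_param_count(input_dim, hidden_dim):
    """Parameters of one LSTM layer (assumption: one bias vector per gate)."""
    # Four blocks (input, forget, output gates and the cell candidate), each with
    # input weights, recurrent weights and a bias.
    per_block = hidden_dim * (input_dim + hidden_dim) + hidden_dim
    return 4 * per_block


def dense_param_count(in_dim, out_dim):
    """Parameters of a fully connected output layer (weights plus biases)."""
    return in_dim * out_dim + out_dim


lstm = lstm_param_count(39, 100)   # sizes taken from the question
dense = dense_param_count(100, 6)  # assumed dense layer on the hidden state
total = lstm + dense
print(lstm, dense, total)  # 56000 606 56606
```

The count alone does not settle whether 1000 songs is enough, since each song contributes a whole sequence of frames, but it gives a concrete figure to set against the size of the dataset.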

Franck Dernoncourt · Accepted Answer · 2016-10-03T17:07:30.777

It really depends on your dataset and network architecture. One rule of thumb I have read (2) is that a neural network needs a few thousand samples per class to start performing very well.

In practice, people try and see. It's not rare to find studies showing decent results with a training set smaller than 1000 samples.

A good way to roughly assess how much more training samples would help is to plot the performance of the neural network against the size of the training set, e.g. as in (1).
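
As a rough illustration of that procedure (not the figure from (1)), here is a minimal sketch that trains on growing subsets and records test accuracy for each size. To stay self-contained it uses synthetic two-class Gaussian data and a nearest-centroid classifier as a stand-in for the neural network; the data, the model, and all function names are assumptions made for the example. With a real project you would swap in your own model and a held-out set.

```python
import random


def make_data(n, rng, dim=5, shift=1.0):
    """Synthetic two-class data: class 0 centred at 0, class 1 at `shift` in every dimension."""
    data = []
    for i in range(n):
        label = i % 2
        x = [rng.gauss(label * shift, 1.0) for _ in range(dim)]
        data.append((x, label))
    rng.shuffle(data)
    return data


def fit_centroids(train):
    """Stand-in 'model': the mean vector of each class seen in the training subset."""
    sums, counts = {}, {}
    for x, y in train:
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centroids, x):
    """Return the label of the closest class centroid (squared Euclidean distance)."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: sqdist(centroids[y]))


def accuracy(centroids, data):
    return sum(predict(centroids, x) == y for x, y in data) / len(data)


def learning_curve(train_sizes, n_test=2000, seed=0):
    """Train on the first n samples for each n and score on a fixed test set."""
    rng = random.Random(seed)
    pool = make_data(max(train_sizes), rng)
    test = make_data(n_test, rng)
    curve = []
    for n in train_sizes:
        model = fit_centroids(pool[:n])
        curve.append((n, accuracy(model, test)))
    return curve


if __name__ == "__main__":
    for n, acc in learning_curve([4, 16, 64, 256, 1024]):
        print(f"{n:5d} training samples -> test accuracy {acc:.3f}")
```

If the resulting curve is still climbing at the largest size you can afford, more labeled data is likely to help; if it has flattened, the extra labeling effort is probably better spent elsewhere.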

(1) Dernoncourt, Franck, Ji Young Lee, Ozlem Uzuner, and Peter Szolovits. "De-identification of Patient Notes with Recurrent Neural Networks" arXiv preprint arXiv:1606.03475 (2016).
(2) Cireşan, Dan C., Ueli Meier, and Jürgen Schmidhuber. "Transfer learning for Latin and Chinese characters with deep neural networks." In The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1-6. IEEE, 2012. https://scholar.google.com/scholar?cluster=7452424507909578812&hl=en&as_sdt=0,22 ; http://people.idsia.ch/~ciresan/data/ijcnn2012_v9.pdf

The relevant passage from (2):

> For classification tasks with a few thousand samples per class, the benefit of (unsupervised or supervised) pretraining is not easy to demonstrate.

Let me ask a simple, closely related question. I have a (full) sample of only 68 observations, with one target variable and 33 eligible predictors (trendless economic time series). Is trying to predict the target with neural networks, even in simple specifications, a waste of time, or can it make sense? More generally, I would appreciate your opinion here: https://stats.stackexchange.com/questions/491065/machine-learning-with-few-observations — markowitz, Oct 09 '20 at 08:35
I added more data to the dataset that was overfitting after 5 epochs and performance on the validation data is roughly the same with a significant bias towards one category or the other (binary image classification). The training data is close to balanced with 47% / 53% split. Would more data be helpful or is this a model problem? — jth_92, Feb 19 '22 at 16:51
