I have a convolutional neural network, and I want to know how to determine the data set size required for training it.
How do I estimate the data set size needed if I want to train a perceptron? Which mathematical tool can be used to calculate the estimate?
There's really no fixed rule that you can apply here. The number of training samples required depends on the nature of the problem, the number of features, and the complexity of your network architecture. Try "simple" architectures first, i.e., fewer layers and fewer units per layer, and experiment with different training set sizes and architectures to get a feeling for it. I know the answer may be a bit disappointing, but as far as I know, it's all empirical for now.
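As a minimal sketch of that kind of experiment (assuming scikit-learn is available; the built-in digits data set and the layer sizes below are just illustrative stand-ins for your own data and architectures):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# A small built-in data set keeps the experiment cheap.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Start simple and only add capacity if it actually pays off.
for hidden in [(16,), (64,), (64, 64)]:
    clf = MLPClassifier(hidden_layer_sizes=hidden, max_iter=500, random_state=0)
    clf.fit(X_train, y_train)
    print(hidden, clf.score(X_test, y_test))
```

Repeating the loop over increasing slices of the training data (X_train[:n]) gives you the same kind of size-vs-accuracy picture as the learning curves mentioned below.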
Also, maybe learning curves could help (although be aware that computing them is expensive; they are useful for developing a "feeling" for the dataset and model complexity, though). For example, I made one for an MNIST subset using a simple one-layer softmax classifier some time ago. I used 1,500 test samples for each of the different training set sizes, and from that figure I would conclude that more training data may help to fit a more accurate model.
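Here is a minimal sketch of how such a learning curve can be computed (assuming scikit-learn and matplotlib; the subset sizes, the 1,500-sample test split, and the use of LogisticRegression as the one-layer softmax model are my choices for illustration, not necessarily the original setup):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_openml
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# MNIST: 70,000 28x28 images, flattened to 784 features.
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X = X / 255.0  # scale pixel values to [0, 1]

# Hold out a fixed test set of 1,500 samples, as in the answer above.
X_pool, X_test, y_pool, y_test = train_test_split(
    X, y, test_size=1500, random_state=0, stratify=y
)

train_sizes = [100, 500, 1000, 5000, 10000]  # illustrative sizes
scores = []
for n in train_sizes:
    # A one-layer softmax classifier is multinomial logistic regression.
    clf = LogisticRegression(max_iter=200)
    clf.fit(X_pool[:n], y_pool[:n])
    scores.append(clf.score(X_test, y_test))

plt.plot(train_sizes, scores, marker="o")
plt.xlabel("training set size")
plt.ylabel("test accuracy")
plt.title("Learning curve: softmax on MNIST subsets")
plt.show()
```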
I'll copy my answer from the closely related question How few training examples is too few when training a neural network? (any updates will be made there):
It really depends on your dataset and network architecture. One rule of thumb I have read (e.g., in (2)) is a few thousand samples per class for the neural network to start to perform very well; for a 10-class problem such as MNIST, that works out to a few tens of thousands of samples in total.
In practice, people try and see. It's not rare to find studies showing decent results with a training set smaller than 1000 samples.
(2) Cireşan, Dan C., Ueli Meier, and Jürgen Schmidhuber. "Transfer learning for Latin and Chinese characters with deep neural networks." In The 2012 International Joint Conference on Neural Networks (IJCNN), pp. 1-6. IEEE, 2012. https://scholar.google.com/scholar?cluster=7452424507909578812&hl=en&as_sdt=0,22; http://people.idsia.ch/~ciresan/data/ijcnn2012_v9.pdf:
For classification tasks with a few thousand samples per class, the benefit of (unsupervised or supervised) pretraining is not easy to demonstrate.
The "data set size" is property of the data set, not of the NN. If you are working with MNIST data set - the full data set is 60,000 images. If you split 10% for validation, you'd have 54,000 images for training. The training data set size will be 54,000.