In basic machine learning we are taught the following "rules of thumb":
a) the size of your training set should be at least 10 times the VC dimension of your hypothesis set.
b) a neural network with N connections (weights) has a VC dimension of approximately N.
So when a deep learning network has, say, millions of units, the number of connections is typically far larger still. Does this mean we should have, say, billions (or more) of data points? Can someone shed some light on this?
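To make the arithmetic concrete, here is a quick back-of-the-envelope sketch of what rules (a) and (b) would imply. The network here is a hypothetical fully connected one, with layer sizes I made up purely for illustration:

```python
# Back-of-the-envelope check of the 10x rule, assuming (per rule (b))
# that the VC dimension is roughly the number of connections.
# Layer sizes below are hypothetical, chosen to total ~1M units.

def required_data(layer_sizes, multiplier=10):
    """Estimate data needed under rules (a) and (b) for a fully connected net."""
    units = sum(layer_sizes)
    # Each consecutive pair of layers is fully connected (weights only; biases ignored).
    connections = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    vc_dim = connections                  # rule (b): d_VC ~ number of connections
    return units, connections, multiplier * vc_dim  # rule (a): data >= 10 * d_VC

# A hypothetical net with about a million units spread over a few wide layers:
layers = [300_000, 400_000, 300_000, 10]
units, conns, data = required_data(layers)
print(f"units: {units:,}  connections: {conns:,}  data needed: {data:,}")
# ~1e6 units but ~2.4e11 connections, so the rule demands ~2.4e12 examples.
```

If I run the numbers this way, a net with about a million units demands on the order of trillions of labeled examples, which is vastly more than deep networks are actually trained on in practice. That gap is exactly what I am asking about.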