Known bounds on the Vapnik–Chervonenkis (VC) dimension of neural networks range from $O(E)$ to $O(E^2)$, with $O(E^2V^2)$ in the worst case, where $E$ is the number of edges and $V$ is the number of nodes. The number of training samples needed for a strong generalization guarantee grows linearly with the VC dimension.
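For reference, this linear dependence comes from the standard PAC sample-complexity bound; a textbook form (the symbols $\epsilon$, $\delta$, and $c$ are generic and not tied to any particular network) is

$$m \;\ge\; \frac{c}{\epsilon}\left(d_{\mathrm{VC}}\,\log\frac{1}{\epsilon} + \log\frac{1}{\delta}\right),$$

where $m$ is the number of training samples, $\epsilon$ the allowed generalization error, and $\delta$ the failure probability. For fixed $\epsilon$ and $\delta$, the required $m$ scales linearly with $d_{\mathrm{VC}}$.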
This means that for a network with billions of edges, as in successful deep learning models, the training set would need billions of samples in the best case, up to quadrillions in the worst case. The largest training sets currently have about a hundred billion samples. Since there is not enough training data, it is unlikely that deep learning models are generalizing. Instead, they are overfitting the training data. This means the models will not perform well on data that is dissimilar to the training data, which is an undesirable property for machine learning.
Given the inability of deep learning to generalize according to this VC-dimension analysis, why are deep learning results so hyped? Merely having high accuracy on some dataset does not mean much in itself. Is there something special about deep learning architectures that reduces the VC dimension significantly?
If you do not think the VC-dimension analysis is relevant, please provide evidence or an explanation that deep learning is generalizing and not overfitting. That is, does it have good recall AND precision, or just good recall? 100% recall is trivial to achieve, as is 100% precision; getting both close to 100% is very difficult. A quick illustration with a toy classifier is sketched below.
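Here is a minimal sketch of that point, using made-up labels and two hypothetical trivial classifiers (nothing here comes from any real model):

```python
# Toy illustration: recall or precision alone is easy to saturate,
# but pushing both toward 100% simultaneously is the hard part.

def precision_recall(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

y_true = [1, 0, 0, 1, 0, 0, 0, 1]                     # hypothetical ground-truth labels

say_yes_to_everything = [1] * len(y_true)             # trivially gets 100% recall
say_yes_once_when_sure = [1, 0, 0, 0, 0, 0, 0, 0]     # trivially gets 100% precision

print(precision_recall(y_true, say_yes_to_everything))   # (0.375, 1.0)
print(precision_recall(y_true, say_yes_once_when_sure))  # (1.0, 0.33...)
```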
As a counterpoint, here is evidence that deep learning is overfitting. An overfit model is easy to fool, since it has incorporated deterministic and stochastic noise. See the following image for an example of overfitting.
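In case the image does not come through, here is a minimal numpy sketch of the same phenomenon; the data, noise level, and polynomial degrees are made up purely for illustration, with the over-parameterized fit standing in for an over-parameterized network:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a simple linear trend plus noise.
x = np.linspace(0, 1, 12)
y = 2 * x + rng.normal(scale=0.2, size=x.size)

# Fit an under- and an over-parameterized polynomial to the same points.
low = np.polynomial.Polynomial.fit(x, y, deg=1)
high = np.polynomial.Polynomial.fit(x, y, deg=11)   # enough capacity to memorize the noise

# Fresh points drawn from the same process expose the difference.
x_new = np.linspace(0, 1, 100)
y_new = 2 * x_new + rng.normal(scale=0.2, size=x_new.size)

print("train error, deg 1 :", np.mean((low(x) - y) ** 2))
print("train error, deg 11:", np.mean((high(x) - y) ** 2))          # near zero: fits the noise
print("test  error, deg 1 :", np.mean((low(x_new) - y_new) ** 2))
print("test  error, deg 11:", np.mean((high(x_new) - y_new) ** 2))  # typically much larger
```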
Also, see the lower-ranked answers to this question to understand the problems with an overfit model despite good accuracy on test data.
Some have responded that regularization solves the problem of a large VC dimension. See this question for further discussion.

Comments are not for extended discussion; this conversation has been moved to chat. – D.W. – 2017-05-15T03:37:15.120

I don't think questions about why something is "hyped" are good. The answer is "because people". People take interest in things for a plethora of reasons, including marketing. – luk32 – 2017-05-16T13:49:26.787
Deep learning works in practice. It might be overfitting. It might be completely unjustified. It might be learning secrets of the universe from an eldritch deity. But the hype is coming from practitioners who are suddenly able to write 30 lines of code and teach a camera to scan signatures and match them with stored ones to validate bank transactions. Or tag unknown people in photographs. Etc. Maybe you've heard the line "it's not an insult if it's true"? Well, it's not hype if it works. There are lots of problems it doesn't work on, and excessive popular hype. But it works in real-life applications. – Stella Biderman – 2019-06-27T15:51:43.697
@StellaBiderman Ease of tooling around standard machine learning techniques is good and all. But the interest seems to have more to do with the supposed learning ability of DNNs that perhaps rivals human ability, which appears to be overhype given the VC analysis of the model. Such a high VC dimension implies the models will not generalize, and are instead memorizing the datasets, making them very brittle. All the adversarial example papers appear to demonstrate this point. – yters – 2019-06-27T15:57:45.773
@yters The fact that adversarial examples exist is somewhat to the side of generalization. I can train a neural network on 10k data points and then test it on 100k data points and it'll do amazingly. That is generalization. The fact that we can deliberately concoct adversarial examples doesn't mean that the network doesn't work, any more than the fact that we can concoct optical illusions means that your vision doesn't work. Your entire question is predicated on the idea that neural networks don't generalize. But it's an empirical fact that they do. – Stella Biderman – 2019-06-27T20:07:08.547
@StellaBiderman You are conflating good out-of-sample accuracy with generalization. Generalization implies good out-of-sample accuracy, but good out-of-sample accuracy does not imply generalization. One example I've heard about is when a neural network "learned" to distinguish criminals from non-criminals by memorizing the white background of the prison photos. While the NN had great accuracy on the out-of-sample dataset, it did not learn any general principle about criminal facial features. This is why we can concoct such absurd adversarial examples for NNs: the NN is identifying 'cat' with incidental features. – yters – 2019-06-30T20:39:08.933