Say we have an image classification problem and a neural network to train on it.
If you run too many iterations on a single image of a cat, the network will not generalize well to other images of cats. But if you run only one iteration per image of a cat, and then, carrying the same weights forward, run one iteration on another picture of a cat, it simply won't converge fast enough, since you wouldn't get the full benefit of optimizers like RMSprop, etc.
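For concreteness, here is a minimal PyTorch sketch of the regimes I mean; the toy model, random stand-in data, and hyperparameters are only placeholders, not a real setup:

```python
import torch
import torch.nn as nn

# Toy setup: random tensors standing in for cat photos (illustrative only).
torch.manual_seed(0)
images = torch.randn(32, 3 * 64 * 64)   # 32 fake 64x64 RGB images, flattened
labels = torch.randint(0, 2, (32,))     # binary labels: cat / not-cat

model = nn.Sequential(nn.Linear(3 * 64 * 64, 128), nn.ReLU(), nn.Linear(128, 2))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-3)

# Extreme 1: many updates on a single image -> the network memorizes that image.
for _ in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(images[:1]), labels[:1])
    loss.backward()
    optimizer.step()

# Extreme 2: one update per image -> very noisy gradients, slow/unstable progress.
for i in range(len(images)):
    optimizer.zero_grad()
    loss = loss_fn(model(images[i:i + 1]), labels[i:i + 1])
    loss.backward()
    optimizer.step()

# Usual compromise: mini-batches, so RMSprop's running average of squared
# gradients is estimated over several examples at once.
for start in range(0, len(images), 8):
    batch_x, batch_y = images[start:start + 8], labels[start:start + 8]
    optimizer.zero_grad()
    loss = loss_fn(model(batch_x), batch_y)
    loss.backward()
    optimizer.step()
```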
One way to prevent this is dropout regularization, but is there a proof that, even with many iterations per example, dropout makes it "difficult" for the network to overfit each individual example?
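By dropout I mean something like the sketch below (again just an assumed toy network; the 0.5 rate is a common default, not a recommendation):

```python
import torch.nn as nn

# Same kind of toy classifier, but with dropout between the layers.
model = nn.Sequential(
    nn.Linear(3 * 64 * 64, 128),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # randomly zeroes half the hidden units each forward pass
    nn.Linear(128, 2),
)

model.train()  # dropout active: each update sees a different "thinned" network
model.eval()   # dropout disabled at test time; the full network is used
```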