9

I used to train my model on my local machine, where the memory is only sufficient for 10 examples per batch. However, when I migrated the model to AWS and used a bigger GPU (Tesla K80), I could accommodate a batch size of 32. Yet the AWS models all performed very poorly, with strong signs of overfitting. Why does this happen?

The model I am currently using is inception-resnet-v2, and the problem I'm targeting is a computer vision one. One explanation I can think of is that batch normalization makes the model too adapted to the statistics of each training batch. As a mitigation, I reduced the batch_norm decay (the moving-average factor).
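
For reference, what I changed is roughly equivalent to lowering the moving-average momentum of the batch-norm layers. The sketch below is in Keras terms, which may not match my exact setup:

```python
import tensorflow as tf

# In Keras terms, the batch-norm "decay" is the momentum of the moving
# mean/variance. Lowering it makes the running statistics adapt faster to
# recent batches instead of a long history.
bn = tf.keras.layers.BatchNormalization(momentum=0.9)  # default is 0.99
```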

Also, should I use dropout together with batch_norm? Is this practice common?
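
If it helps, the arrangement I have in mind is something like the block below (a rough sketch, not my actual code), with dropout applied after batch norm and the activation:

```python
from tensorflow.keras import layers

def conv_block(x, filters, rate=0.3):
    # Common ordering: conv -> batch norm -> activation -> dropout, so that
    # dropout does not distort the statistics batch norm estimates.
    x = layers.Conv2D(filters, 3, padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    x = layers.Dropout(rate)(x)
    return x
```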

I have around 5,000 training images and trained for about 60 epochs. Is this considered a lot, or should I stop training earlier?
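
One option I am considering is early stopping on a validation set rather than a fixed 60 epochs, roughly like this (hypothetical sketch using a Keras callback; `train_ds`/`val_ds` are placeholders):

```python
import tensorflow as tf

# Stop once validation loss stops improving and keep the best weights.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True)

# model.fit(train_ds, validation_data=val_ds, epochs=60, callbacks=[early_stop])
```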

infomin101
  • I think this is a duplicate of: http://stats.stackexchange.com/questions/164876 – usεr11852 Mar 09 '17 at 07:16
  • Possible duplicate of [Tradeoff batch size vs. number of iterations to train a neural network](http://stats.stackexchange.com/questions/164876/tradeoff-batch-size-vs-number-of-iterations-to-train-a-neural-network) – Sentry Mar 09 '17 at 07:29

1 Answer

6

From Chapter 8 of Goodfellow et al.'s *Deep Learning*:

Small batches can offer a regularizing effect (Wilson and Martinez, 2003), perhaps due to the noise they add to the learning process. Generalization error is often best for a batch size of 1. Training with such a small batch size might require a small learning rate to maintain stability because of the high variance in the estimate of the gradient. The total runtime can be very high as a result of the need to make more steps, both because of the reduced learning rate and because it takes more steps to observe the entire training set.
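
As a rough illustration (this heuristic is not from the book's text, just the common linear-scaling rule of thumb), one might adjust the learning rate in proportion to the batch size when moving between machines:

```python
# Placeholder numbers: base setup used batch_size=10 with lr=0.001.
base_lr, base_batch = 1e-3, 10
new_batch = 32
new_lr = base_lr * new_batch / base_batch  # linear scaling rule
print(new_lr)  # ~0.0032
```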

Erfan