9

I used to train my model on my local machine, where the memory is only sufficient for 10 examples per batch. However, when I migrated the model to AWS and used a bigger GPU (Tesla K80), I could accommodate a batch size of 32. Yet the AWS models all performed very poorly, with strong signs of overfitting. Why does this happen?

The model I am currently using is inception-resnet-v2, and the problem I'm targeting is a computer vision one. One explanation I can think of is that batch normalization makes the model too adapted to the statistics of each training batch. As a mitigation, I reduced the batch_norm decay (the moving-average factor).
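
For reference, what I changed is roughly equivalent to lowering the moving-average momentum of the batch-norm layers. The sketch below is in Keras terms, which may not match my exact setup:

```python
import tensorflow as tf

# In Keras terms, the batch-norm "decay" is the momentum of the moving
# mean/variance. Lowering it makes the running statistics adapt faster to
# recent batches instead of a long history.
bn = tf.keras.layers.BatchNormalization(momentum=0.9)  # default is 0.99
```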

Also, should I use dropout together with batch_norm? Is this practice common?
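
If it helps, the arrangement I have in mind is something like the block below (a rough sketch, not my actual code), with dropout applied after batch norm and the activation:

```python
from tensorflow.keras import layers

def conv_block(x, filters, rate=0.3):
    # Common ordering: conv -> batch norm -> activation -> dropout, so that
    # dropout does not distort the statistics batch norm estimates.
    x = layers.Conv2D(filters, 3, padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    x = layers.Dropout(rate)(x)
    return x
```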

I have around 5,000 training images and trained for about 60 epochs. Is this considered a lot, or should I stop training earlier?
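
One option I am considering is early stopping on a validation set rather than a fixed 60 epochs, roughly like this (hypothetical sketch using a Keras callback; `train_ds`/`val_ds` are placeholders):

```python
import tensorflow as tf

# Stop once validation loss stops improving and keep the best weights.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True)

# model.fit(train_ds, validation_data=val_ds, epochs=60, callbacks=[early_stop])
```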

infomin101
  • I think this is a duplicate of: http://stats.stackexchange.com/questions/164876 – usεr11852 Mar 09 '17 at 07:16
  • Possible duplicate of [Tradeoff batch size vs. number of iterations to train a neural network](http://stats.stackexchange.com/questions/164876/tradeoff-batch-size-vs-number-of-iterations-to-train-a-neural-network) – Sentry Mar 09 '17 at 07:29

1 Answer

6

From Chapter 8 of Goodfellow et al.'s *Deep Learning*:

Small batches can offer a regularizing effect (Wilson and Martinez, 2003), perhaps due to the noise they add to the learning process. Generalization error is often best for a batch size of 1. Training with such a small batch size might require a small learning rate to maintain stability because of the high variance in the estimate of the gradient. The total runtime can be very high as a result of the need to make more steps, both because of the reduced learning rate and because it takes more steps to observe the entire training set.
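
As a rough illustration (this heuristic is not from the book's text, just the common linear-scaling rule of thumb), one might adjust the learning rate in proportion to the batch size when moving between machines:

```python
# Placeholder numbers: base setup used batch_size=10 with lr=0.001.
base_lr, base_batch = 1e-3, 10
new_batch = 32
new_lr = base_lr * new_batch / base_batch  # linear scaling rule
print(new_lr)  # ~0.0032
```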

Erfan