8

I'm experimenting with the CIFAR-10 dataset. With my model, I found that the larger the batch size, the better the model learns the dataset. From what I see on the internet, the typical batch size is 32 to 128, but my optimal size is 512-1024. Is that okay? Or are there things I should look at to improve the model? Which indicators should I use to debug it?

P.S. It seems that the gradient is too noisy, and a larger batch size reduces that noise.
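To illustrate what I mean, here is a minimal sketch (using a toy CNN and PyTorch, not my actual model) that estimates the gradient noise at different batch sizes; the variance of the mini-batch gradient across repeated draws should shrink roughly like 1/batch size:

```python
# Sketch only: a tiny illustrative CNN, not the real model from the question.
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

device = "cuda" if torch.cuda.is_available() else "cpu"

model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 10),
).to(device)
loss_fn = nn.CrossEntropyLoss()

train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=T.ToTensor())

def grad_vector(batch_size):
    """Return the flattened gradient computed on one random mini-batch."""
    loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size, shuffle=True)
    x, y = next(iter(loader))
    model.zero_grad()
    loss_fn(model(x.to(device)), y.to(device)).backward()
    return torch.cat([p.grad.flatten() for p in model.parameters()])

# Average per-coordinate variance of the gradient over repeated mini-batches:
# this is the "noise" I mean, and it should drop as the batch size grows.
for bs in (32, 128, 512, 1024):
    grads = torch.stack([grad_vector(bs) for _ in range(10)])
    print(bs, grads.var(dim=0).mean().item())
```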

Konstantin Solomatov
  • FYI [Tradeoff batch size vs. number of iterations to train a neural network](https://stats.stackexchange.com/q/164876/12359) – Franck Dernoncourt May 01 '17 at 00:27
  • @FranckDernoncourt Thanks for the link but it seems that there's a bug in my model somewhere – Konstantin Solomatov May 01 '17 at 00:45
  • This link gives a good overview of the whole point of minibatching. It'll be slower and generally less efficient, but can help with huge datasets and help jolt you out of poor local minima: http://sebastianruder.com/optimizing-gradient-descent/index.html – generic_user Jun 02 '17 at 15:09

2 Answers

7

Read the following paper; it's a great read: *On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima*, Nitish Shirish Keskar et al., ICLR 2017.

It contains many good discussions and empirical results on benchmark datasets comparing the effect of different batch sizes. The authors conclude that a large batch size tends to cause over-fitting (a larger generalization gap), which they explain by convergence to sharp minima.

The code is also available here.
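To make the point concrete, here is a minimal sketch (not the paper's code; the small CNN, optimizer, and epoch count are illustrative assumptions) that trains the same architecture with a small and a large batch size and reports the resulting train/test accuracy gap:

```python
# Sketch: compare the generalization gap for a small vs. a large batch size.
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

device = "cuda" if torch.cuda.is_available() else "cpu"
tfm = T.ToTensor()
train_set = torchvision.datasets.CIFAR10("./data", train=True, download=True, transform=tfm)
test_set = torchvision.datasets.CIFAR10("./data", train=False, download=True, transform=tfm)

def make_model():
    # Small illustrative CNN; swap in your own architecture.
    return nn.Sequential(
        nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(64, 10),
    ).to(device)

def accuracy(model, dataset):
    loader = torch.utils.data.DataLoader(dataset, batch_size=256)
    correct = 0
    with torch.no_grad():
        for x, y in loader:
            correct += (model(x.to(device)).argmax(1) == y.to(device)).sum().item()
    return correct / len(dataset)

def train(batch_size, epochs=5):
    model = make_model()
    opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size, shuffle=True)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x.to(device)), y.to(device)).backward()
            opt.step()
    return model

for bs in (128, 1024):
    model = train(bs)
    gap = accuracy(model, train_set) - accuracy(model, test_set)
    print(f"batch={bs}: generalization gap = {gap:.3f}")
```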

PickleRick
  • Link only answers are not welcome on CV. Furthermore you do not have to write in capital letters. – Ferdi Oct 24 '17 at 12:48
2

Too large a batch size can introduce numerical instability, and layer-wise adaptive learning rates can help stabilize the training.
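A minimal sketch of the layer-wise idea (in the spirit of LARS; the trust coefficient and plain-SGD update below are illustrative assumptions, not a production optimizer):

```python
# Sketch: one SGD step with a per-parameter-tensor "trust ratio", so that a
# layer whose gradient is large relative to its weights takes a smaller step.
import torch

def layerwise_sgd_step(model, base_lr=0.1, trust_coef=0.001, eps=1e-8):
    """Call after loss.backward(); applies one rescaled SGD step in place."""
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue
            w_norm = p.norm()
            g_norm = p.grad.norm()
            # Scale the step by ||w|| / ||g|| so no single layer blows up
            # when the global learning rate is raised for large batches.
            trust = trust_coef * w_norm / (g_norm + eps) if w_norm > 0 else 1.0
            p.add_(p.grad, alpha=-base_lr * float(trust))
```

In practice you would use a published LARS/LAMB implementation rather than a hand-rolled step like this, but it shows where the stabilization comes from.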

Lerner Zhang