
Batch size and the number of iterations are often considered a tradeoff.
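For example (with made-up numbers): given 50,000 training samples, a batch size of 100 gives 500 parameter updates per epoch, while a batch size of 500 gives only 100 updates per epoch, so a larger batch means fewer iterations over the same data.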

It has been observed in practice that when using a larger batch there is a significant degradation in the quality of the model, as measured by its ability to generalize.

(...)

In contrast, small-batch methods consistently converge to flat minimizers, and our experiments support a commonly held view that this is due to the inherent noise in the gradient estimation.

Source: quoted in this thread's answer by Frank Dernoncourt.

What if, just as we reduce the learning rate when approaching a local minimum, we increased the batch size over the epochs? Or increased it when the loss isn't moving significantly (Nesterov-like)?

Totem

1 Answer


It turns out that increasing the batch size during training (every epoch or every other epoch) while keeping the learning rate constant works essentially the same as keeping the batch size constant and decaying the learning rate. You can see that clearly in the image below, taken from Samuel L. Smith et al. The only difference you'd see is a decrease in the number of parameter updates.

You can get the paper, Don't Decay the Learning Rate, Increase the Batch Size (arXiv:1711.00489), here.

[Figure from Smith et al.: training curves showing that increasing the batch size at a constant learning rate tracks the same schedule as decaying the learning rate at a constant batch size]
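If you want to try this, here is a minimal PyTorch sketch of such a schedule. The synthetic data, the starting batch size of 32, the double-every-10-epochs rule, and the cap of 512 are illustrative assumptions, not values from the paper; the point is only that the learning rate stays fixed while the DataLoader is rebuilt with a larger batch size.

```python
# Sketch: keep the learning rate fixed and grow the batch size on a schedule,
# instead of decaying the learning rate.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Synthetic data stands in for a real dataset (illustrative only).
X = torch.randn(5000, 20)
y = torch.randint(0, 2, (5000,))
dataset = TensorDataset(X, y)

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
criterion = nn.CrossEntropyLoss()
# Learning rate stays constant for the whole run.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

batch_size = 32        # starting batch size (hypothetical schedule)
grow_every = 10        # double the batch size every 10 epochs
max_batch_size = 512   # cap so a batch still fits in memory

for epoch in range(50):
    if epoch > 0 and epoch % grow_every == 0:
        batch_size = min(batch_size * 2, max_batch_size)
    # Rebuild the DataLoader each epoch so the current batch size takes effect.
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

    running_loss = 0.0
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    print(f"epoch {epoch:2d}  batch_size {batch_size:4d}  "
          f"mean loss {running_loss / len(loader):.4f}")
```

Note that, as the batch size grows, each epoch takes fewer optimizer steps, which is the "decrease in the number of parameter updates" mentioned above.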

Sadaf Shafi