
I understand that stochastic gradient descent has a batch size of 1, but while reading the Inception v2 paper I found this passage in the training methodology section: "We have trained our networks with stochastic gradient utilizing the TensorFlow [1] distributed machine learning system using 50 replicas running each on a NVidia Kepler GPU with batch size 32 for 100 epochs." Can anyone help me out with this?

  • https://stats.stackexchange.com/questions/140811/how-large-should-the-batch-size-be-for-stochastic-gradient-descent – Sycorax Oct 22 '19 at 18:46
  • https://stats.stackexchange.com/search?q=stochastic+gradient+descent+batch+size – Sycorax Oct 22 '19 at 18:46

1 Answer


There are two common meanings of “stochastic gradient descent” in the literature. One holds that SGD uses a single example per iteration. The other holds that SGD covers “small” mini-batches of one or more samples, so the batch size 32 in the quoted passage still counts as SGD under this usage.

There’s no particular reason that either definition is “more correct” than the other; however, there are often large practical and experimental differences when using more than one example per iteration.
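If it helps to see the distinction concretely, here is a minimal NumPy sketch (my own illustration, not code from the paper) where the only thing separating the two usages is the `batch_size` passed to the same update loop; the data, learning rate, and objective are all made up for the example:

```python
import numpy as np

# Hypothetical least-squares problem, purely for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
true_w = rng.normal(size=10)
y = X @ true_w + 0.1 * rng.normal(size=1000)

def sgd(X, y, batch_size, lr=0.01, epochs=5):
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        perm = rng.permutation(n)          # shuffle once per epoch
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            # Gradient of 0.5 * mean squared error over the mini-batch.
            grad = Xb.T @ (Xb @ w - yb) / len(idx)
            w -= lr * grad
    return w

w_single = sgd(X, y, batch_size=1)   # "classic" SGD: one example per update
w_batch  = sgd(X, y, batch_size=32)  # mini-batch SGD, as in the quoted paper
```

Both calls run the same algorithm; they differ only in how many examples are averaged into each gradient estimate, which is exactly the ambiguity in how "SGD" gets used.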

Sycorax