I understand that stochastic gradient descent has a batch size of 1, but while reading the Inception v2 paper I found this text in the training methodology section: "We have trained our networks with stochastic gradient utilizing the TensorFlow [1] distributed machine learning system using 50 replicas running each on a NVidia Kepler GPU with batch size 32 for 100 epochs." Can anyone help me out with this?


AKSHAY KAPOOR
- https://stats.stackexchange.com/questions/140811/how-large-should-the-batch-size-be-for-stochastic-gradient-descent – Sycorax Oct 22 '19 at 18:46
- https://stats.stackexchange.com/search?q=stochastic+gradient+descent+batch+size – Sycorax Oct 22 '19 at 18:46
1 Answer
There are two common meanings of "stochastic gradient descent" in the literature. One holds that SGD uses a single example per iteration. The other holds that any "small" batch of one or more samples counts as SGD.
There's no particular reason that either definition is "more correct" than the other; however, there are often large practical and experimental differences when using more than one example per iteration.
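To make the distinction concrete, here is a minimal NumPy sketch (my own illustration, not code from the paper) contrasting the two definitions on a toy squared-error problem; the function names and hyperparameters are made up:

```python
import numpy as np

def sgd_single_example(w, X, y, lr=0.01, epochs=1):
    # "Classic" SGD under the first definition: one example per update.
    for _ in range(epochs):
        for i in np.random.permutation(len(X)):
            grad = 2 * X[i] * (X[i] @ w - y[i])      # gradient of (x.w - y)^2 for one sample
            w = w - lr * grad
    return w

def sgd_minibatch(w, X, y, lr=0.01, batch_size=32, epochs=1):
    # SGD under the second definition: average the gradient over a small random batch.
    for _ in range(epochs):
        order = np.random.permutation(len(X))
        for start in range(0, len(X), batch_size):
            b = order[start:start + batch_size]
            err = X[b] @ w - y[b]
            grad = 2 * X[b].T @ err / len(b)         # averaged gradient over the batch
            w = w - lr * grad
    return w

# Toy usage: fit a linear model y = X w on random data.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
true_w = rng.normal(size=5)
y = X @ true_w + 0.1 * rng.normal(size=1000)

w1 = sgd_single_example(np.zeros(5), X, y, epochs=5)
w2 = sgd_minibatch(np.zeros(5), X, y, batch_size=32, epochs=5)
print(w1, w2)   # both should approach true_w
```

Under the second definition, "stochastic gradient descent with batch size 32" is not a contradiction: each step still uses a random subset rather than the full dataset, and averaging over 32 samples gives a lower-variance gradient estimate per step while mapping well onto GPU parallelism, which is presumably why the paper phrases it that way.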

Sycorax