I understand that stochastic gradient descent has a batch size of 1, but while reading the Inception v2 paper I found this text in the training methodology section: "We have trained our networks with stochastic gradient utilizing the TensorFlow [1] distributed machine learning system using 50 replicas running each on a NVidia Kepler GPU with batch size 32 for 100 epochs." Can anyone help me out with this?


AKSHAY KAPOOR
- https://stats.stackexchange.com/questions/140811/how-large-should-the-batch-size-be-for-stochastic-gradient-descent – Sycorax Oct 22 '19 at 18:46
- https://stats.stackexchange.com/search?q=stochastic+gradient+descent+batch+size – Sycorax Oct 22 '19 at 18:46
1 Answer
There are two common meanings of "stochastic gradient descent" in the literature. One holds that SGD uses a single example per iteration. The other holds that any "small" batch of one or more samples counts as SGD.
There's no particular reason that either definition is "more correct" than the other; however, there are often large practical and experimental differences when using more than one example per iteration.
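To make the distinction concrete, here is a minimal NumPy sketch (my own illustration, not code from the paper) contrasting the two definitions on a toy squared-error problem; the function names and hyperparameters are made up:

```python
import numpy as np

def sgd_single_example(w, X, y, lr=0.01, epochs=1):
    # "Classic" SGD under the first definition: one example per update.
    for _ in range(epochs):
        for i in np.random.permutation(len(X)):
            grad = 2 * X[i] * (X[i] @ w - y[i])      # gradient of (x.w - y)^2 for one sample
            w = w - lr * grad
    return w

def sgd_minibatch(w, X, y, lr=0.01, batch_size=32, epochs=1):
    # SGD under the second definition: average the gradient over a small random batch.
    for _ in range(epochs):
        order = np.random.permutation(len(X))
        for start in range(0, len(X), batch_size):
            b = order[start:start + batch_size]
            err = X[b] @ w - y[b]
            grad = 2 * X[b].T @ err / len(b)         # averaged gradient over the batch
            w = w - lr * grad
    return w

# Toy usage: fit a linear model y = X w on random data.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
true_w = rng.normal(size=5)
y = X @ true_w + 0.1 * rng.normal(size=1000)

w1 = sgd_single_example(np.zeros(5), X, y, epochs=5)
w2 = sgd_minibatch(np.zeros(5), X, y, batch_size=32, epochs=5)
print(w1, w2)   # both should approach true_w
```

Under the second definition, "stochastic gradient descent with batch size 32" is not a contradiction: each step still uses a random subset rather than the full dataset, and averaging over 32 samples gives a lower-variance gradient estimate per step while mapping well onto GPU parallelism, which is presumably why the paper phrases it that way.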

Sycorax