Is there any other reason for using Stochastic Gradient Descent than reducing time until convergence? In other words, does it ever make sense to try out SGD when regular Gradient Descent runs fairly quickly?
- In the case of neural networks, see [Tradeoff batch size vs. number of iterations to train a neural network](http://stats.stackexchange.com/q/164876/12359). – Franck Dernoncourt Dec 23 '16 at 19:09
- Incremental/online training is another reason. – Sycorax Dec 23 '16 at 23:42
1 Answer
Memory (RAM) becomes a big issue if you are training with lots of data, and that is another reason why SGD is preferred: batch gradient descent has to evaluate the gradient over the entire dataset for every update, whereas SGD only needs one example (or a small mini-batch) in memory at a time.
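A minimal sketch of the idea (not from the original answer): mini-batch SGD for linear regression, where each update touches only one small batch. The dataset is synthesized here for convenience; in a real memory-constrained setting it could be an `np.memmap` or a file reader, and names like `batch_size` and `lr` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend this is a dataset too large to hold comfortably in RAM;
# here we just synthesize it so the example runs on its own.
n_samples, n_features = 100_000, 10
true_w = rng.normal(size=n_features)
X = rng.normal(size=(n_samples, n_features))
y = X @ true_w + 0.1 * rng.normal(size=n_samples)

w = np.zeros(n_features)
lr = 0.01          # step size
batch_size = 32    # only this many rows are needed per update

for epoch in range(5):
    order = rng.permutation(n_samples)
    for start in range(0, n_samples, batch_size):
        idx = order[start:start + batch_size]
        Xb, yb = X[idx], y[idx]                 # one mini-batch in memory
        grad = 2 * Xb.T @ (Xb @ w - yb) / len(idx)
        w -= lr * grad                           # noisy SGD update

print("max |w - true_w| =", np.abs(w - true_w).max())
```

Full-batch gradient descent would instead compute `X.T @ (X @ w - y)` over all rows for every step, which is exactly the part that becomes impractical when the data no longer fits in memory.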

A.D
- Overfitting can also be an issue; see the answer by @FranckDernoncourt linked in his comment above. – GeoMatt22 Dec 23 '16 at 19:21