Is there any other reason for using Stochastic Gradient Descent than reducing time until convergence? In other words, does it ever make sense to try out SGD when regular Gradient Descent runs fairly quickly?
- In the case of neural networks, see [Tradeoff batch size vs. number of iterations to train a neural network](http://stats.stackexchange.com/q/164876/12359). – Franck Dernoncourt Dec 23 '16 at 19:09
- Incremental/online training is another reason. – Sycorax Dec 23 '16 at 23:42
1 Answer
Memory (RAM) becomes a big issue if you are training with lots of data, and that is another reason why SGD is preferred: batch gradient descent has to evaluate the gradient over the entire dataset for every update, whereas SGD only needs one example (or a small mini-batch) in memory at a time.
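A minimal sketch of the idea (not from the original answer): mini-batch SGD for linear regression, where each update touches only one small batch. The dataset is synthesized here for convenience; in a real memory-constrained setting it could be an `np.memmap` or a file reader, and names like `batch_size` and `lr` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend this is a dataset too large to hold comfortably in RAM;
# here we just synthesize it so the example runs on its own.
n_samples, n_features = 100_000, 10
true_w = rng.normal(size=n_features)
X = rng.normal(size=(n_samples, n_features))
y = X @ true_w + 0.1 * rng.normal(size=n_samples)

w = np.zeros(n_features)
lr = 0.01          # step size
batch_size = 32    # only this many rows are needed per update

for epoch in range(5):
    order = rng.permutation(n_samples)
    for start in range(0, n_samples, batch_size):
        idx = order[start:start + batch_size]
        Xb, yb = X[idx], y[idx]                 # one mini-batch in memory
        grad = 2 * Xb.T @ (Xb @ w - yb) / len(idx)
        w -= lr * grad                           # noisy SGD update

print("max |w - true_w| =", np.abs(w - true_w).max())
```

Full-batch gradient descent would instead compute `X.T @ (X @ w - y)` over all rows for every step, which is exactly the part that becomes impractical when the data no longer fits in memory.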

A.D
- Overfitting can also be an issue; see the answer by @FranckDernoncourt linked in his comment above. – GeoMatt22 Dec 23 '16 at 19:21