Is there any other reason for using Stochastic Gradient Descent than reducing time until convergence? In other words, does it ever make sense to try out SGD when regular Gradient Descent runs fairly quickly?

Mariusz

1 Answer

Memory (RAM) becomes a big issue when you are training on a lot of data: batch Gradient Descent needs the whole dataset in memory (or at least a full pass over it) to compute a single update, while SGD only needs one example (or a small minibatch) at a time. That is another reason SGD is preferred even when batch Gradient Descent would run quickly.
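To make the memory point concrete, here is a minimal sketch (with a hypothetical synthetic data stream) of SGD fitting a one-dimensional linear model while only ever holding a single example in memory, as if the data were too large to load at once:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_stream(n):
    """Yield one (x, y) pair at a time, simulating data too large for RAM."""
    for _ in range(n):
        x = rng.normal()
        # Hypothetical true model: y = 3x + 0.5 plus a little noise.
        yield x, 3.0 * x + 0.5 + rng.normal(scale=0.1)

w, b = 0.0, 0.0   # model parameters
lr = 0.05         # learning rate

for x, y in sample_stream(5000):
    err = (w * x + b) - y
    # Gradient of the squared error for this single example only --
    # no pass over the full dataset is ever needed for one update.
    w -= lr * err * x
    b -= lr * err

print(w, b)
```

Batch Gradient Descent on the same problem would have to materialize all 5000 points before taking its first step; the streaming loop above never stores more than one.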

A.D