What is the difference between simulated annealing and stochastic gradient descent (SGD) with restarts? Both seem to occasionally go backwards, at a rate that decreases over time. Also, what is the difference between SGD with restarts and SGD with warm restarts?