Questions tagged [stochastic-gradient-descent]
184 questions
139
votes
5 answers
Batch gradient descent versus stochastic gradient descent
Suppose we have some training set $(x_{(i)}, y_{(i)})$ for $i = 1, \dots, m$. Also suppose we run some type of supervised learning algorithm on the training set. Hypotheses are represented as $h_{\theta}(x_{(i)}) = \theta_0+\theta_{1}x_{(i)1} +…

user20616
- 1,431
- 3
- 11
- 7
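A minimal sketch of the distinction this question asks about, using a linear hypothesis and squared loss; X, y, theta, and the learning rate lr are placeholders, not taken from the question itself:

import numpy as np

def batch_gradient_step(theta, X, y, lr):
    """One batch GD step: the gradient is computed over all m training examples."""
    m = len(y)
    grad = X.T @ (X @ theta - y) / m           # full-dataset gradient
    return theta - lr * grad

def stochastic_gradient_epoch(theta, X, y, lr):
    """One SGD epoch: the parameters are updated after every single example."""
    for i in np.random.permutation(len(y)):
        grad_i = X[i] * (X[i] @ theta - y[i])  # gradient of one example's loss
        theta = theta - lr * grad_i
    return theta

Batch GD takes one accurate step per pass over the data; SGD takes m noisy steps per pass.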
43
votes
2 answers
Who invented stochastic gradient descent?
I'm trying to understand the history of gradient descent and stochastic gradient descent. Gradient descent was invented by Cauchy in 1847 (Méthode générale pour la résolution des systèmes d'équations simultanées, pp. 536–538). For more information…

DaL
- 4,462
- 3
- 16
- 27
32
votes
4 answers
How does batch size affect convergence of SGD and why?
I've seen a similar conclusion in many discussions: as the minibatch size gets larger, the convergence of SGD actually gets harder/worse, for example in this paper and this answer. Also I've heard of people using tricks like small learning rates or…

dontloo
- 13,692
- 7
- 51
- 80
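For reference, a hedged sketch of how the mini-batch size enters the update: the gradient is averaged over batch_size examples, so larger batches reduce gradient noise while each epoch takes fewer (but more expensive) steps. X, y, grad_fn, and lr are placeholders:

import numpy as np

def minibatch_sgd_epoch(theta, X, y, grad_fn, lr, batch_size):
    """One epoch of mini-batch SGD; grad_fn(theta, X_b, y_b) returns the
    gradient averaged over the mini-batch."""
    idx = np.random.permutation(len(y))
    for start in range(0, len(y), batch_size):
        b = idx[start:start + batch_size]
        theta = theta - lr * grad_fn(theta, X[b], y[b])
    return theta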
30
votes
6 answers
For convex problems, does the gradient in Stochastic Gradient Descent (SGD) always point at the global extreme value?
Given a convex cost function, using SGD for optimization, we will have a gradient (vector) at a certain point during the optimization process.
My question is, given a point on the convex function, does the gradient only point in the direction in which the…

CyberPlayerOne
- 2,009
- 3
- 22
- 30
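A one-dimensional worked example (not from the question itself) illustrating why the answer is "no" for individual stochastic gradients: take the convex objective
$$f(\theta) = \tfrac{1}{2}\big[(\theta - 1)^2 + (\theta + 3)^2\big],$$
whose global minimum is at $\theta^{*} = -1$. At $\theta = 0$ the full gradient is $f'(0) = (0-1) + (0+3) = 2$, so a full gradient step moves toward the minimum, but the single-sample gradient from the first term alone is $(0-1) = -1$, so a step on that sample moves away from it. The stochastic gradient points toward the global minimum only in expectation.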
27
votes
2 answers
How could stochastic gradient descent save time compared to standard gradient descent?
Standard gradient descent would compute the gradient for the entire training dataset:
for i in range(nb_epochs):
    params_grad = evaluate_gradient(loss_function, data, params)
    params = params - learning_rate * params_grad
For a pre-defined number of…

Alina
- 915
- 2
- 10
- 21
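For contrast with the full-batch loop in the excerpt, a sketch of the per-example (stochastic) variant; evaluate_gradient, nb_epochs, data, params, loss_function, and learning_rate follow the same hypothetical pseudocode conventions as the question:

import numpy as np

for i in range(nb_epochs):
    np.random.shuffle(data)                    # visit the examples in a fresh order each epoch
    for example in data:
        # gradient of the loss on a single example instead of the whole dataset
        params_grad = evaluate_gradient(loss_function, example, params)
        params = params - learning_rate * params_grad

Each cheap single-example step is a noisy estimate of the full gradient, which is where the time saving over standard gradient descent comes from.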
22
votes
2 answers
Why are second-order SGD convergence methods unpopular for deep learning?
It seems that, especially for deep learning, very simple methods for optimizing SGD convergence, like ADAM, dominate; a nice overview: http://ruder.io/optimizing-gradient-descent/
They trace only a single direction, discarding information…

Jarek Duda
- 331
- 2
- 14
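For context, the canonical second-order update that the question contrasts with Adam-style first-order methods is the Newton step
$$\theta \leftarrow \theta - \eta\, H^{-1} \nabla_\theta L(\theta), \qquad H = \nabla^2_\theta L(\theta),$$
where the Hessian $H$ has $n^2$ entries for $n$ parameters. For a network with millions of parameters, even storing $H$, let alone inverting it, is impractical, which is the usual cost argument against exact second-order methods in deep learning.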
19
votes
1 answer
RMSProp and Adam vs SGD
I am performing experiments on the EMNIST validation set using networks with RMSProp, Adam and SGD. I am achieving 87% accuracy with SGD (learning rate 0.1) and dropout (0.1 dropout prob) as well as L2 regularisation (1e-05 penalty). When testing…

Alk
- 291
- 1
- 2
- 3
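A hedged sketch of how the three optimizers being compared might be configured in Keras. The architecture, layer sizes, and num_classes below are assumptions for illustration; only the SGD learning rate, dropout probability, and L2 penalty come from the excerpt:

from tensorflow import keras
from tensorflow.keras import layers, regularizers

num_classes = 47  # assumption: e.g. the EMNIST Balanced split; adjust for other splits

model = keras.Sequential([
    layers.Dense(256, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-05)),  # L2 penalty from the question
    layers.Dropout(0.1),                                      # dropout prob from the question
    layers.Dense(num_classes, activation="softmax"),
])

# The three optimizers being compared; only the SGD learning rate is given in the excerpt.
sgd = keras.optimizers.SGD(learning_rate=0.1)
rmsprop = keras.optimizers.RMSprop()
adam = keras.optimizers.Adam()

model.compile(optimizer=sgd, loss="sparse_categorical_crossentropy", metrics=["accuracy"])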
17
votes
4 answers
How can it be trapped in a saddle point?
I am currently a bit puzzled by how mini-batch gradient descent can be trapped in a saddle point.
The solution might be so trivial that I don't get it.
You get a new sample every epoch, and it computes a new error based on a new batch, so the…

Fixining_ranges
- 171
- 1
- 1
- 5
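A standard two-dimensional example (not from the question) of the geometry involved: $f(x, y) = x^2 - y^2$ has a saddle at the origin, where the gradient is exactly zero, so a plain gradient step does not move at all; only gradient noise or a perturbation off the $y = 0$ axis lets the iterate escape downhill.

import numpy as np

def grad_f(x, y):
    """Gradient of f(x, y) = x**2 - y**2, which has a saddle point at (0, 0)."""
    return np.array([2 * x, -2 * y])

theta = np.array([0.0, 0.0])   # exactly at the saddle
print(grad_f(*theta))          # [0. 0.]      -> the update stalls here
print(grad_f(0.0, 1e-3))       # [0. -0.002]  -> any perturbation in y escapes downhill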
15
votes
2 answers
How to set mini-batch size in SGD in keras
I am new to Keras and need your help.
I am training a neural net in Keras and my loss function is the squared difference between the net's output and the target value.
I want to optimize this using Gradient Descent. After going through some links on the net, I have…

Iceflame007
- 161
- 1
- 2
- 5
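A minimal sketch of where the mini-batch size is set in Keras: it is the batch_size argument of model.fit, not a property of the optimizer. The model, data shapes, and hyperparameter values below are placeholders:

import numpy as np
from tensorflow import keras

x_train = np.random.rand(1000, 20)   # placeholder inputs
y_train = np.random.rand(1000, 1)    # placeholder targets

model = keras.Sequential([keras.layers.Dense(1)])
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01),
              loss="mean_squared_error")   # squared-difference loss, as in the question

# batch_size controls how many samples go into each gradient estimate
model.fit(x_train, y_train, epochs=5, batch_size=32)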
13
votes
1 answer
How to choose between SGD with Nesterov momentum and Adam?
I'm currently implementing a neural network architecture in Keras. I would like to optimize the training time, and I'm considering using alternative optimizers such as SGD with Nesterov Momentum and Adam.
I've read several things about the pros and…

Clément F
- 1,717
- 4
- 12
- 13
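A hedged sketch of how the two candidates would be instantiated in Keras; the momentum and learning-rate values are illustrative defaults, not taken from the question:

from tensorflow import keras

# SGD with Nesterov momentum
sgd_nesterov = keras.optimizers.SGD(learning_rate=0.01, momentum=0.9, nesterov=True)

# Adam with its usual default hyperparameters
adam = keras.optimizers.Adam(learning_rate=0.001)

# Either object can then be passed to model.compile(optimizer=..., loss=...)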
12
votes
3 answers
Gradient descent on non-convex functions
What situations do we know of where gradient descent can be shown to converge (either to a critical point or to a local/global minimum) for non-convex functions?
For SGD on non-convex functions, one kind of proof has been reviewed here,…

gradstudent
- 271
- 2
- 9
10
votes
1 answer
What is the difference between VAE and Stochastic Backpropagation for Deep Generative Models?
What is the difference between Auto-encoding Variational Bayes and Stochastic Backpropagation for Deep Generative Models? Does inference in both methods lead to the same results? I'm not aware of any explicit comparisons between the two methods,…

Dionysis M
- 794
- 6
- 17
8
votes
1 answer
Does Keras SGD optimizer implement batch, mini-batch, or stochastic gradient descent?
I am a newbie in Deep Learning libraries and thus decided to go with Keras. While implementing an NN model, I saw the batch_size parameter in model.fit().
Now, I was wondering if I use the SGD optimizer, and then set the batch_size = 1, m and b,…

Rajdeep Dutta
- 195
- 2
- 5
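A sketch of the behaviour the question asks about: Keras's SGD optimizer performs mini-batch gradient descent with whatever batch_size is passed to model.fit, so batch_size=1 gives per-example (stochastic) updates and batch_size=len(x_train) gives full-batch gradient descent. The model and data below are placeholders:

import numpy as np
from tensorflow import keras

x_train = np.random.rand(200, 1)
y_train = 3 * x_train + 2 + 0.1 * np.random.randn(200, 1)        # noisy line: m = 3, b = 2

model = keras.Sequential([keras.layers.Dense(1)])                # learns a slope and an intercept
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01), loss="mse")

model.fit(x_train, y_train, epochs=5, batch_size=1)               # stochastic (per-example) GD
# model.fit(x_train, y_train, epochs=5, batch_size=len(x_train))  # full-batch GD
# model.fit(x_train, y_train, epochs=5, batch_size=32)            # mini-batch GD (the default)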
8
votes
2 answers
Dealing with small batch size in SGD training
I am trying to train a large model (deep net using caffe) using stochastic gradient descent (SGD).
The problem is that I am constrained by my GPU memory capacity and thus cannot process large mini-batches for each stochastic gradient estimation.
How can I…

Shai
- 258
- 2
- 9
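One common workaround (not necessarily the one given in the answers) is gradient accumulation: run several small forward/backward passes and apply a single update from their averaged gradient, which emulates a larger effective batch; Caffe's solver exposes this idea through its iter_size parameter. A framework-agnostic sketch, where compute_gradient is a hypothetical per-mini-batch gradient function and params is an array of parameters:

def accumulated_sgd_step(params, small_batches, compute_gradient, lr):
    """Average gradients over several GPU-sized mini-batches, then update once,
    emulating one SGD step with a larger effective batch."""
    accum = None
    for batch in small_batches:
        g = compute_gradient(params, batch)            # each pass fits in GPU memory
        accum = g if accum is None else accum + g
    accum = accum / len(small_batches)                 # average over the small batches
    return params - lr * accum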
8
votes
2 answers
Comparison of SGD and ALS in collaborative filtering
Matrix factorization is widely applied in collaborative filtering, and briefly speaking, it tries to learn the following parameters:
$$\min_{q_u,p_i}\sum_{\{u,i\}}(r_{ui} - q_u^Tp_i)^2$$
And we could apply SGD and ALS as the learning algorithm,…

avocado
- 3,045
- 5
- 32
- 45
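A sketch of the SGD side of that comparison for the objective above: iterate over the observed ratings and move $q_u$ and $p_i$ along the gradient of each single-rating squared error (the constant factor 2 from the derivative is absorbed into the learning rate, and no regularization term is added since the excerpt's objective has none):

import numpy as np

def sgd_epoch(ratings, Q, P, lr):
    """One SGD pass over the observed ratings for min sum (r_ui - q_u^T p_i)^2.

    ratings: iterable of (u, i, r_ui) triples; Q[u] and P[i]: latent factor vectors.
    """
    for u, i, r_ui in ratings:
        e = r_ui - Q[u] @ P[i]     # residual of this single observed rating
        Q[u] += lr * e * P[i]      # gradient step for the user factors
        P[i] += lr * e * Q[u]      # gradient step for the item factors
    return Q, P

ALS would instead fix P, solve a least-squares problem for every Q[u] in closed form, then swap the roles and repeat.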