
I am having a hard time understanding the Gradient Descent Rule for learning in a feedforward ANN. In particular, how do we determine the initial weight vector, and how is this weight vector adjusted after each epoch?

From what I've read, I know that we first define some error function depending on the weights, and I think that we choose the initial weight to be the minimizer of this error function. Is this right?

DavidSilverberg

1 Answer


Typically neural network weights are initialized at random (for example: Xavier Initialization - Formula Clarification) while the biases are initialized at 0.
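A minimal sketch of this initialization scheme in NumPy (the uniform-distribution variant of Xavier/Glorot initialization; the layer sizes are illustrative):

```python
import numpy as np

def xavier_init(n_in, n_out, rng=None):
    """Xavier/Glorot initialization: draw weights uniformly from
    [-limit, limit] with limit = sqrt(6 / (n_in + n_out)), so the
    variance of activations is roughly preserved across layers."""
    rng = rng or np.random.default_rng(0)
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_in, n_out))

W = xavier_init(4, 3)   # weight matrix for a layer with 4 inputs, 3 units
b = np.zeros(3)         # biases initialized at 0
```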

Gradient descent applies updates of the form $$x^{(k+1)} = x^{(k)} - \eta \nabla f(x^{(k)})$$ where ${}^{(k)}$ indicates that this is the $k$th iteration of the procedure and $\eta$ is the learning rate. Stochastic gradient descent only uses a fraction of the data to estimate $\nabla f(x^{(k)})$.
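The update rule above can be sketched in a few lines of NumPy; here it is applied to a simple convex function (the quadratic is my example, not from the answer), with `eta` playing the role of $\eta$:

```python
import numpy as np

def gradient_descent(grad_f, x0, eta=0.1, n_steps=100):
    """Iterate x^(k+1) = x^(k) - eta * grad_f(x^(k)) for n_steps iterations."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = x - eta * grad_f(x)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2(x - 3);
# the iterates move toward the minimizer x = 3.
x_min = gradient_descent(lambda x: 2.0 * (x - 3.0), x0=0.0)
```

Stochastic gradient descent has the same shape, except `grad_f` would return a gradient estimated from a random subset (mini-batch) of the training data rather than the full dataset.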

Gradient descent is an imperfect tool: the iterates are not guaranteed to reach a global minimum, and a poorly chosen learning rate can stall or diverge. Some of these failure modes are discussed in the links in the comments below.

Sycorax
  • So nu is the learning rate, and the weights generally move toward the minimizing weight? – DavidSilverberg May 24 '19 at 16:30
  • The learning rate is $\eta$ (eta); $\nu$ (nu) doesn't appear in that equation. We hope that the update is closer to the minimum than when we started; however, there are lots of ways that this can go wrong. One example: https://stats.stackexchange.com/questions/367397/for-convex-problems-does-gradient-in-stochastic-gradient-descent-sgd-always-p/367459#367459 Another example: https://stats.stackexchange.com/questions/364360/how-can-change-in-cost-function-be-positive – Sycorax May 24 '19 at 16:37