Backpropagation, an abbreviation for "backward propagation of errors", is a common method of training artificial neural networks, used in conjunction with an optimization method such as gradient descent.
Questions tagged [backpropagation]
442 questions
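Since almost every question under this tag comes back to the same two steps — computing the gradient of the loss by backpropagation and then taking a gradient-descent step — here is a minimal, self-contained NumPy sketch of both, on toy data with a hypothetical 2-16-1 architecture (an illustration, not a reference implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy regression data, a 2-16-1 tanh network, squared-error loss
X, y = rng.standard_normal((100, 2)), rng.standard_normal((100, 1))
W1, b1 = rng.standard_normal((2, 16)) * 0.1, np.zeros(16)
W2, b2 = rng.standard_normal((16, 1)) * 0.1, np.zeros(1)
lr = 0.1

for step in range(500):
    # Forward pass
    z1 = X @ W1 + b1
    h = np.tanh(z1)
    y_hat = h @ W2 + b2
    loss = np.mean((y_hat - y) ** 2)

    # Backward pass: propagate the error from the output back to each weight
    d_yhat = 2 * (y_hat - y) / len(X)   # dL/dy_hat
    dW2 = h.T @ d_yhat
    db2 = d_yhat.sum(axis=0)
    d_h = d_yhat @ W2.T
    d_z1 = d_h * (1 - h ** 2)           # tanh'(z) = 1 - tanh(z)^2
    dW1 = X.T @ d_z1
    db1 = d_z1.sum(axis=0)

    # Gradient-descent update
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
```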
112 votes, 6 answers
Is it possible to train a neural network without backpropagation?
Many neural network books and tutorials spend a lot of time on the backpropagation algorithm, which is essentially a tool to compute the gradient.
Let's assume we are building a model with ~10K parameters / weights. Is it possible to run the…

Haitao Du
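Several answers to this question point to gradient-free alternatives; the simplest to write down is a finite-difference (zeroth-order) estimate of the gradient, sketched below with a hypothetical stand-in loss. It also illustrates the catch for a ~10K-parameter model: roughly two forward passes per parameter per update.

```python
import numpy as np

def numerical_gradient(loss_fn, params, eps=1e-5):
    """Central-difference gradient estimate: no backprop needed,
    but 2 * params.size forward passes per update (~20K for 10K parameters)."""
    grad = np.zeros_like(params)
    for i in range(params.size):
        p_plus, p_minus = params.copy(), params.copy()
        p_plus.flat[i] += eps
        p_minus.flat[i] -= eps
        grad.flat[i] = (loss_fn(p_plus) - loss_fn(p_minus)) / (2 * eps)
    return grad

# Hypothetical quadratic "loss" standing in for a network forward pass
loss = lambda w: np.sum((w - 3.0) ** 2)
w = np.zeros(5)
for _ in range(100):
    w -= 0.1 * numerical_gradient(loss, w)
print(w)  # converges toward 3.0 without any backpropagation
```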
64 votes, 5 answers
Why is tanh almost always better than sigmoid as an activation function?
In Andrew Ng's Neural Networks and Deep Learning course on Coursera, he says that using $\tanh$ is almost always preferable to using the sigmoid.
The reason he gives is that the outputs of $\tanh$ centre around 0 rather than the sigmoid's 0.5, and this…

Tom Hale
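For reference, the two activations differ only by a shift and a rescale, which is where the zero-centring argument comes from:

$$\sigma(x)=\frac{1}{1+e^{-x}}\in(0,1), \qquad \tanh(x)=2\,\sigma(2x)-1\in(-1,1),$$

so $\tanh$ outputs are centred around 0 while sigmoid outputs are centred around 0.5.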
60 votes, 5 answers
Backpropagation with Softmax / Cross Entropy
I'm trying to understand how backpropagation works for a softmax/cross-entropy output layer.
The cross entropy error function is
$$E(t,o)=-\sum_j t_j \log o_j$$
with $t_j$ and $o_j$ the target and output at neuron $j$, respectively. The sum is over…

micha
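For completeness, taking the softmax output $o_j = e^{z_j}/\sum_k e^{z_k}$ together with the cross-entropy above and normalized targets ($\sum_j t_j = 1$, e.g. one-hot), the gradient with respect to the logits collapses to a very simple expression:

$$\frac{\partial E}{\partial z_i}=\sum_j \frac{\partial E}{\partial o_j}\,\frac{\partial o_j}{\partial z_i}
=\sum_j\left(-\frac{t_j}{o_j}\right) o_j\,(\delta_{ij}-o_i)
=o_i\sum_j t_j - t_i = o_i - t_i.$$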
54 votes, 1 answer
How large should the batch size be for stochastic gradient descent?
I understand that stochastic gradient descent may be used to optimize a neural network using backpropagation by updating each iteration with a different sample of the training dataset. How large should the batch size be?

Simon Kuang
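There is no single correct batch size, but the sketch below (hypothetical linear-regression data) shows exactly where the choice enters a mini-batch SGD loop: each update averages the gradient over `batch_size` examples.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 5))
y = X @ np.array([1.0, -2.0, 0.5, 3.0, 0.0]) + 0.1 * rng.standard_normal(1000)

w = np.zeros(5)
batch_size, lr = 32, 0.05                          # batch_size is the knob in question

for epoch in range(20):
    perm = rng.permutation(len(X))                 # reshuffle each epoch
    for start in range(0, len(X), batch_size):
        idx = perm[start:start + batch_size]
        Xb, yb = X[idx], y[idx]
        grad = 2 * Xb.T @ (Xb @ w - yb) / len(idx) # MSE gradient on this mini-batch
        w -= lr * grad
```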
49 votes, 8 answers
Danger of setting all initial weights to zero in Backpropagation
Why is it dangerous to initialize weights with zeros? Is there any simple example that demonstrates it?

user8078
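A quick way to see the danger, sketched on hypothetical data: start all weights at the same constant and check that every hidden unit receives the identical gradient at every step, so the units never differentiate (with exact zeros in this two-layer tanh network the gradients are all zero and nothing updates at all).

```python
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.standard_normal((64, 3)), rng.standard_normal((64, 1))

# Every weight starts at the same constant value
W1, W2 = 0.5 * np.ones((3, 4)), 0.5 * np.ones((4, 1))

for _ in range(100):
    h = np.tanh(X @ W1)
    y_hat = h @ W2
    d_yhat = 2 * (y_hat - y) / len(X)
    dW2 = h.T @ d_yhat
    dW1 = X.T @ ((d_yhat @ W2.T) * (1 - h ** 2))
    W1 -= 0.1 * dW1
    W2 -= 0.1 * dW2

# All hidden units received identical gradients at every step, so they are still identical:
print(np.allclose(W1, W1[:, :1]))   # True -- the symmetry is never broken
```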
46 votes, 1 answer
How is softmax_cross_entropy_with_logits different from softmax_cross_entropy_with_logits_v2?
Specifically, I suppose I wonder about this statement:
Future major versions of TensorFlow will allow gradients to flow into the labels input on backprop by default.
This message is shown when I use tf.nn.softmax_cross_entropy_with_logits. In the…

Christian Eriksson
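A minimal sketch of what that statement is about, assuming TensorFlow 2.x (where tf.nn.softmax_cross_entropy_with_logits already behaves like the former _v2 op): gradients can flow into the labels, and wrapping the labels in tf.stop_gradient restores the old behaviour.

```python
import tensorflow as tf  # assumes TensorFlow 2.x

logits = tf.Variable([[2.0, 1.0, 0.1]])
labels = tf.Variable([[0.7, 0.2, 0.1]])   # soft labels, treated as trainable here

with tf.GradientTape() as tape:
    loss = tf.nn.softmax_cross_entropy_with_logits(
        labels=tf.stop_gradient(labels),  # drop this wrapper and labels get a gradient too
        logits=logits,
    )

d_logits, d_labels = tape.gradient(loss, [logits, labels])
print(d_logits)   # softmax(logits) - labels
print(d_labels)   # None: stop_gradient blocks the path into the labels
```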
41 votes, 1 answer
Why are non zero-centered activation functions a problem in backpropagation?
I read here the following:
Sigmoid outputs are not zero-centered. This is undesirable since neurons in later layers of processing in a Neural Network (more on this soon) would be receiving data that is not zero-centered. This has implications…

Amelio Vazquez-Reina
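The implication the quote alludes to: for a neuron with pre-activation $z=\sum_i w_i x_i + b$, backpropagation gives

$$\frac{\partial L}{\partial w_i}=\frac{\partial L}{\partial z}\,x_i,$$

so if the incoming activations $x_i$ are all positive (as sigmoid outputs are), every weight gradient of that neuron shares the sign of $\partial L/\partial z$, forcing all-positive or all-negative updates and a zig-zagging optimization path.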
36 votes, 6 answers
Backpropagation vs Genetic Algorithm for Neural Network training
I've read a few papers discussing the pros and cons of each method, some arguing that GA doesn't give any improvement in finding the optimal solution, while others show that it is more effective. It seems GA is generally preferred in the literature (although…

sashkello
29 votes, 2 answers
Gradient backpropagation through ResNet skip connections
I'm curious about how gradients are back-propagated through a neural network using ResNet modules/skip connections. I've seen a couple of questions about ResNet (e.g. Neural network with skip-layer connections) but this one is asking specifically…

Simon
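The short version for a residual block $y = x + F(x)$: by the chain rule,

$$\frac{\partial \mathcal{L}}{\partial x}=\frac{\partial \mathcal{L}}{\partial y}\left(I+\frac{\partial F}{\partial x}\right)=\frac{\partial \mathcal{L}}{\partial y}+\frac{\partial \mathcal{L}}{\partial y}\,\frac{\partial F}{\partial x},$$

so the identity term carries the upstream gradient straight past the residual branch, no matter how small $\partial F/\partial x$ becomes.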
29 votes, 3 answers
Why use gradient descent with neural networks?
When training a neural network with the back-propagation algorithm, the gradient descent method is used to determine the weight updates. My question is: rather than using the gradient descent method to slowly locate the minimum point with respect to a…

Minaj
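One way to frame the usual answer: for linear least squares the stationarity condition can be solved in closed form,

$$\nabla_w \,\|Xw-y\|^2 = 0 \;\Longrightarrow\; w=(X^\top X)^{-1}X^\top y,$$

whereas $\nabla_{W_1, W_2}\,\|f(XW_1)\,W_2-y\|^2 = 0$ has no closed-form solution once $f$ is a nonlinear activation, which is why iterative gradient steps computed by backpropagation are used instead.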
27 votes, 2 answers
Why doesn't backpropagation work when you initialize the weights to the same value?
Why doesn't backpropagation work when you initialize all the weights to the same value (say 0.5), but works fine when they are given random numbers?
Shouldn't the algorithm calculate the error and work from there, despite the fact that the weights are initially…

user1724140
26 votes, 1 answer
Backpropagation on a convolutional layer
Online tutorials describe in depth the convolution of an image with a filter, etc.; however, I have not seen one that describes the backpropagation on the filter (at least visually).
First let me try to explain how I understand backpropagation on a…

Edv Beq
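For a single-channel "valid" convolution (really a cross-correlation, as in most frameworks), the filter gradient is itself a cross-correlation of the input with the upstream gradient, $\partial L/\partial W_{m,n}=\sum_{i,j}(\partial L/\partial Y_{i,j})\,X_{i+m,\,j+n}$. A small NumPy sketch with a finite-difference check (hypothetical sizes):

```python
import numpy as np

def conv2d_valid(x, w):
    """'Valid' cross-correlation: forward pass of a single-channel conv layer (no kernel flip)."""
    H, W = x.shape
    kh, kw = w.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i+kh, j:j+kw] * w)
    return out

def filter_grad(x, upstream):
    """dL/dW: cross-correlate the input with the upstream gradient dL/dY."""
    return conv2d_valid(x, upstream)

# Finite-difference check on a tiny example
rng = np.random.default_rng(0)
x, w = rng.standard_normal((5, 5)), rng.standard_normal((3, 3))
upstream = rng.standard_normal((3, 3))            # dL/dY, same shape as the output
analytic = filter_grad(x, upstream)

eps, numeric = 1e-6, np.zeros_like(w)
for m in range(3):
    for n in range(3):
        wp, wm = w.copy(), w.copy()
        wp[m, n] += eps; wm[m, n] -= eps
        numeric[m, n] = np.sum((conv2d_valid(x, wp) - conv2d_valid(x, wm)) * upstream) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-4))  # True
```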
22 votes, 2 answers
In neural nets, why use gradient methods rather than other metaheuristics?
In training deep and shallow neural networks, why are gradient methods (e.g. gradient descent, Nesterov, Newton-Raphson) commonly used, as opposed to other metaheuristics?
By metaheuristics I mean methods such as simulated annealing, ant colony…

Lior
21 votes, 3 answers
Backpropagation algorithm and error in hidden layer
I am slightly confused about the backpropagation algorithm used in the multilayer perceptron (MLP).
The error is adjusted by the cost function. In backpropagation, we are trying to adjust the weights of the hidden layers. The output error I can…

HIGGINS
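The standard bookkeeping for the hidden-layer error in an MLP: with $\delta^{(L)}$ the error at the output layer, each earlier layer's error is obtained from the next layer's error by

$$\delta^{(l)} = \left(W^{(l+1)}\right)^{\!\top} \delta^{(l+1)} \odot f'\!\left(z^{(l)}\right),
\qquad
\frac{\partial E}{\partial W^{(l)}} = \delta^{(l)} \left(a^{(l-1)}\right)^{\!\top},$$

where $z^{(l)}$ are the pre-activations, $a^{(l-1)}$ the previous layer's activations, and $\odot$ the element-wise product.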
21 votes, 1 answer
What are the practical uses of Neural ODEs?
"Neural Ordinary Differential Equations", by Tian Qi Chen, Yulia Rubanova, Jesse Bettencourt and David Duvenaud, was awarded the best-paper award in NeurIPS in 2018
There, authors propose the NeuralODE, which is a method that fuses concepts of…

Firebug