Questions tagged [backpropagation]

Backpropagation, an abbreviation for "backward propagation of errors", is a common method of training artificial neural networks used in conjunction with an optimization method such as gradient descent.

442 questions
112
votes
6 answers

Is it possible to train a neural network without backpropagation?

Many neural network books and tutorials spend a lot of time on the backpropagation algorithm, which is essentially a tool to compute the gradient. Let's assume we are building a model with ~10K parameters / weights. Is it possible to run the…
Haitao Du
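A common thread in the answers is that backpropagation is just the cheap way to obtain the gradient; the gradient itself can also be estimated numerically. A minimal NumPy sketch (my own illustration, with a made-up linear model and loss) of why that scales badly: central differences need two loss evaluations per parameter, i.e. roughly 20K evaluations per update for a 10K-parameter model.

    import numpy as np

    def loss(w, X, y):
        # Hypothetical one-layer model with a squared-error loss.
        return np.mean((X @ w - y) ** 2)

    def finite_difference_grad(w, X, y, eps=1e-6):
        # Central differences: 2 * len(w) loss evaluations per gradient,
        # versus roughly one forward plus one backward pass with backprop.
        grad = np.zeros_like(w)
        for i in range(len(w)):
            e = np.zeros_like(w)
            e[i] = eps
            grad[i] = (loss(w + e, X, y) - loss(w - e, X, y)) / (2 * eps)
        return grad

    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(32, 10)), rng.normal(size=32)
    w = rng.normal(size=10)
    print(finite_difference_grad(w, X, y))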
64
votes
5 answers

Why is tanh almost always better than sigmoid as an activation function?

In Andrew Ng's Neural Networks and Deep Learning course on Coursera he says that using $\tanh$ is almost always preferable to using the sigmoid. The reason he gives is that the outputs using $\tanh$ centre around 0 rather than the sigmoid's 0.5, and this…
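A quick numerical illustration of the zero-centring claim (my own sketch, not from the course): for pre-activations centred at zero, $\tanh$ outputs average near 0 while sigmoid outputs average near 0.5, and $\tanh$ is just a shifted, rescaled sigmoid.

    import numpy as np

    rng = np.random.default_rng(0)
    z = rng.normal(size=100_000)           # pre-activations centred at 0

    sigmoid = 1.0 / (1.0 + np.exp(-z))     # outputs in (0, 1), mean ~ 0.5
    tanh = np.tanh(z)                      # outputs in (-1, 1), mean ~ 0
    # tanh is a rescaled sigmoid: tanh(z) = 2 * sigmoid(2 z) - 1
    print(sigmoid.mean(), tanh.mean())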
60
votes
5 answers

Backpropagation with Softmax / Cross Entropy

I'm trying to understand how backpropagation works for a softmax/cross-entropy output layer. The cross entropy error function is $$E(t,o)=-\sum_j t_j \log o_j$$ with $t$ and $o$ as the target and output at neuron $j$, respectively. The sum is over…
micha
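For reference, the result the answers derive (a sketch assuming the outputs come from a softmax, $o_j = e^{z_j}/\sum_k e^{z_k}$, and that the targets sum to one): using $\partial o_i/\partial z_j = o_i(\delta_{ij} - o_j)$, the gradient of $E$ with respect to the logits collapses to

$$\frac{\partial E}{\partial z_j} = o_j - t_j,$$

which is why the softmax and cross-entropy layers are almost always backpropagated through together as a single unit.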
54
votes
1 answer

How large should the batch size be for stochastic gradient descent?

I understand that stochastic gradient descent may be used to optimize a neural network using backpropagation by updating each iteration with a different sample of the training dataset. How large should the batch size be?
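For concreteness, a minimal sketch (names and defaults made up) of where the batch size enters a training loop; the question is essentially about how large the slice taken per update should be.

    import numpy as np

    def sgd_epoch(w, X, y, grad_fn, batch_size=32, lr=0.01):
        # One pass over the data: each update uses `batch_size` samples.
        # batch_size=1 is classic SGD; batch_size=len(X) is full-batch
        # gradient descent; anything in between is minibatch SGD.
        idx = np.random.permutation(len(X))
        for start in range(0, len(X), batch_size):
            batch = idx[start:start + batch_size]
            w = w - lr * grad_fn(w, X[batch], y[batch])
        return w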
49
votes
8 answers

Danger of setting all initial weights to zero in Backpropagation

Why is it dangerous to initialize weights with zeros? Is there any simple example that demonstrates it?
user8078
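One simple demonstration the answers typically give, sketched here with a hypothetical two-layer tanh network: when all weights start equal, every hidden unit computes the same thing and receives the same gradient, so the units can never become different; with all-zero weights the gradients in this particular network are exactly zero and nothing is learned at all.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(8, 3))            # 8 samples, 3 features
    y = rng.normal(size=(8, 1))

    W1 = np.full((3, 4), 0.5)              # every weight identical
    W2 = np.full((4, 1), 0.5)

    # Forward pass: tanh hidden layer, squared-error loss.
    h = np.tanh(X @ W1)
    pred = h @ W2
    d_pred = 2 * (pred - y) / len(X)

    # Backward pass.
    dW2 = h.T @ d_pred
    dW1 = X.T @ ((d_pred @ W2.T) * (1 - h ** 2))

    # All four columns of dW1 (one per hidden unit) are identical, so the
    # hidden units remain clones of each other after every update.  With
    # W1 = W2 = 0, h would be 0 and both gradients would be exactly zero.
    print(dW1)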
46
votes
1 answer

How is softmax_cross_entropy_with_logits different from softmax_cross_entropy_with_logits_v2?

Specifically, I am wondering about this statement: Future major versions of TensorFlow will allow gradients to flow into the labels input on backprop by default. This message is shown when I use tf.nn.softmax_cross_entropy_with_logits. In the…
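A small TensorFlow 2.x sketch (my own illustration, not from the docs) of the difference that statement describes: with the current op, gradients can flow into the labels, and wrapping the labels in tf.stop_gradient reproduces the old v1 behaviour of softmax_cross_entropy_with_logits.

    import tensorflow as tf

    logits = tf.Variable([[2.0, 0.5, -1.0]])
    labels = tf.Variable([[0.7, 0.2, 0.1]])   # soft labels

    # v2 behaviour: gradients flow into both logits and labels.
    with tf.GradientTape() as tape:
        loss = tf.nn.softmax_cross_entropy_with_logits(labels=labels,
                                                       logits=logits)
    print(tape.gradient(loss, [logits, labels]))   # both are non-None

    # Old v1 behaviour: block the gradient into the labels explicitly.
    with tf.GradientTape() as tape:
        loss = tf.nn.softmax_cross_entropy_with_logits(
            labels=tf.stop_gradient(labels), logits=logits)
    print(tape.gradient(loss, [logits, labels]))   # labels gradient is None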
41
votes
1 answer

Why are non zero-centered activation functions a problem in backpropagation?

I read here the following: Sigmoid outputs are not zero-centered. This is undesirable since neurons in later layers of processing in a Neural Network (more on this soon) would be receiving data that is not zero-centered. This has implications…
Amelio Vazquez-Reina
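One standard way to make the implication concrete (a sketch of the usual argument, not a quote from the linked notes): for a neuron computing $f\left(\sum_i w_i x_i + b\right)$, the chain rule gives

$$\frac{\partial L}{\partial w_i} = \delta\, x_i, \qquad \delta = \frac{\partial L}{\partial f}\, f'\!\left(\sum_i w_i x_i + b\right),$$

so if every input $x_i$ is positive (which is the case when the previous layer uses sigmoids), all of the neuron's weight gradients share the sign of $\delta$. Each update can then only move the weight vector in an all-positive or all-negative direction, producing the zig-zag trajectories the notes allude to.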
36
votes
6 answers

Backpropagation vs Genetic Algorithm for Neural Network training

I've read a few papers discussing the pros and cons of each method, some arguing that GA doesn't give any improvement in finding the optimal solution while others show that it is more effective. It seems GA is generally preferred in the literature (although…
sashkello
29
votes
2 answers

Gradient backpropagation through ResNet skip connections

I'm curious about how gradients are back-propagated through a neural network using ResNet modules/skip connections. I've seen a couple of questions about ResNet (e.g. Neural network with skip-layer connections) but this one is asking specifically…
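The identity most answers start from (a sketch, writing $F$ for the residual branch): for a block $y = x + F(x)$,

$$\frac{\partial \mathcal{L}}{\partial x} = \frac{\partial \mathcal{L}}{\partial y}\left(I + \frac{\partial F}{\partial x}\right),$$

so the upstream gradient reaches $x$ both through the residual branch and, unchanged, through the identity path. Stacking blocks multiplies factors of the form $I + \partial F/\partial x$ rather than bare layer Jacobians, which is why the gradient is far less prone to vanishing.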
29
votes
3 answers

Why use gradient descent with neural networks?

When training a neural network using the back-propagation algorithm, the gradient descent method is used to determine the weight updates. My question is: rather than using the gradient descent method to slowly locate the minimum point with respect to a…
Minaj
27
votes
2 answers

Why doesn't backpropagation work when you initialize the weights to the same value?

Why doesn't backpropagation work when you initialize all the weights to the same value (say 0.5), but it works fine when they are given random numbers? Shouldn't the algorithm calculate the error and work from there, despite the fact that the weights are initially…
26
votes
1 answer

Backpropagation on a convolutional layer

Online tutorials describe in depth the convolution of an image with a filter, etc.; however, I have not seen one that describes backpropagation on the filter (at least visually). First let me try to explain how I understand backpropagation on a…
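For reference, the result the answers derive, sketched here in NumPy/SciPy for a single-channel "valid" layer (deep-learning "convolution" is really cross-correlation): the filter gradient is the input cross-correlated with the upstream gradient, and the input gradient is the full convolution of the upstream gradient with the filter.

    import numpy as np
    from scipy.signal import correlate2d, convolve2d

    rng = np.random.default_rng(0)
    X = rng.normal(size=(6, 6))            # input image
    W = rng.normal(size=(3, 3))            # filter
    Y = correlate2d(X, W, mode="valid")    # forward pass, 4x4 output

    dY = rng.normal(size=Y.shape)          # upstream gradient dL/dY

    dW = correlate2d(X, dY, mode="valid")  # dL/dW: X cross-correlated with dY
    dX = convolve2d(dY, W, mode="full")    # dL/dX: full convolution of dY with W

    # Check dW against a finite difference on one filter entry.
    eps = 1e-6
    W_perturbed = W.copy()
    W_perturbed[1, 1] += eps
    numeric = ((correlate2d(X, W_perturbed, mode="valid") * dY).sum()
               - (Y * dY).sum()) / eps
    print(dW[1, 1], numeric)               # the two values agree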
22
votes
2 answers

In neural nets, why use gradient methods rather than other metaheuristics?

In training deep and shallow neural networks, why are gradient methods (e.g. gradient descent, Nesterov, Newton-Raphson) commonly used, as opposed to other metaheuristics? By metaheuristics I mean methods such as simulated annealing, ant colony…
21
votes
3 answers

Backpropagation algorithm and error in hidden layer

I am slightly confused about the backpropagation algorithm used in the multilayer perceptron (MLP). The error is adjusted by the cost function. In backpropagation, we are trying to adjust the weights of the hidden layers. The output error I can…
HIGGINS
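For reference, the step that usually resolves this confusion (standard MLP notation, a sketch rather than the asker's exact setup): the hidden-layer error is obtained from the layer above by

$$\delta^{(l)} = \left(W^{(l+1)}\right)^{\!\top} \delta^{(l+1)} \odot f'\!\left(z^{(l)}\right), \qquad \frac{\partial E}{\partial W^{(l)}} = \delta^{(l)} \left(a^{(l-1)}\right)^{\!\top},$$

where $z^{(l)}$ are the pre-activations, $a^{(l-1)}$ the previous layer's activations, and $\odot$ the element-wise product; the output-layer $\delta$ that the asker already understands seeds this recursion.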
21
votes
1 answer

What are the practical uses of Neural ODEs?

"Neural Ordinary Differential Equations", by Tian Qi Chen, Yulia Rubanova, Jesse Bettencourt and David Duvenaud, was awarded the best-paper award in NeurIPS in 2018 There, authors propose the NeuralODE, which is a method that fuses concepts of…