Backpropagation, an abbreviation for "backward propagation of errors", is a common method of training artificial neural networks, used in conjunction with an optimization method such as gradient descent.
Questions tagged [backpropagation]
442 questions
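Since almost every question under this tag comes back to the same two steps — computing the gradient of the loss by backpropagation and then taking a gradient-descent step — here is a minimal, self-contained NumPy sketch of both, on toy data with a hypothetical 2-16-1 architecture (an illustration, not a reference implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy regression data, a 2-16-1 tanh network, squared-error loss
X, y = rng.standard_normal((100, 2)), rng.standard_normal((100, 1))
W1, b1 = rng.standard_normal((2, 16)) * 0.1, np.zeros(16)
W2, b2 = rng.standard_normal((16, 1)) * 0.1, np.zeros(1)
lr = 0.1

for step in range(500):
    # Forward pass
    z1 = X @ W1 + b1
    h = np.tanh(z1)
    y_hat = h @ W2 + b2
    loss = np.mean((y_hat - y) ** 2)

    # Backward pass: propagate the error from the output back to each weight
    d_yhat = 2 * (y_hat - y) / len(X)   # dL/dy_hat
    dW2 = h.T @ d_yhat
    db2 = d_yhat.sum(axis=0)
    d_h = d_yhat @ W2.T
    d_z1 = d_h * (1 - h ** 2)           # tanh'(z) = 1 - tanh(z)^2
    dW1 = X.T @ d_z1
    db1 = d_z1.sum(axis=0)

    # Gradient-descent update
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
```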
112 votes, 6 answers
Is it possible to train a neural network without backpropagation?
Many neural network books and tutorials spend a lot of time on the backpropagation algorithm, which is essentially a tool to compute the gradient.
Let's assume we are building a model with ~10K parameters / weights. Is it possible to run the…

Haitao Du
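Several answers to this question point to gradient-free alternatives; the simplest to write down is a finite-difference (zeroth-order) estimate of the gradient, sketched below with a hypothetical stand-in loss. It also illustrates the catch for a ~10K-parameter model: roughly two forward passes per parameter per update.

```python
import numpy as np

def numerical_gradient(loss_fn, params, eps=1e-5):
    """Central-difference gradient estimate: no backprop needed,
    but 2 * params.size forward passes per update (~20K for 10K parameters)."""
    grad = np.zeros_like(params)
    for i in range(params.size):
        p_plus, p_minus = params.copy(), params.copy()
        p_plus.flat[i] += eps
        p_minus.flat[i] -= eps
        grad.flat[i] = (loss_fn(p_plus) - loss_fn(p_minus)) / (2 * eps)
    return grad

# Hypothetical quadratic "loss" standing in for a network forward pass
loss = lambda w: np.sum((w - 3.0) ** 2)
w = np.zeros(5)
for _ in range(100):
    w -= 0.1 * numerical_gradient(loss, w)
print(w)  # converges toward 3.0 without any backpropagation
```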
64 votes, 5 answers
Why is tanh almost always better than sigmoid as an activation function?
In Andrew Ng's Neural Networks and Deep Learning course on Coursera, he says that using $\tanh$ is almost always preferable to using the sigmoid.
The reason he gives is that the outputs of $\tanh$ centre around 0 rather than the sigmoid's 0.5, and this…

Tom Hale
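For reference, the two activations differ only by a shift and a rescale, which is where the zero-centring argument comes from:

$$\sigma(x)=\frac{1}{1+e^{-x}}\in(0,1), \qquad \tanh(x)=2\,\sigma(2x)-1\in(-1,1),$$

so $\tanh$ outputs are centred around 0 while sigmoid outputs are centred around 0.5.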
60 votes, 5 answers
Backpropagation with Softmax / Cross Entropy
I'm trying to understand how backpropagation works for a softmax/cross-entropy output layer.
The cross entropy error function is
$$E(t,o)=-\sum_j t_j \log o_j$$
with $t_j$ and $o_j$ the target and output at neuron $j$, respectively. The sum is over…

micha
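For completeness, taking the softmax output $o_j = e^{z_j}/\sum_k e^{z_k}$ together with the cross-entropy above and normalized targets ($\sum_j t_j = 1$, e.g. one-hot), the gradient with respect to the logits collapses to a very simple expression:

$$\frac{\partial E}{\partial z_i}=\sum_j \frac{\partial E}{\partial o_j}\,\frac{\partial o_j}{\partial z_i}
=\sum_j\left(-\frac{t_j}{o_j}\right) o_j\,(\delta_{ij}-o_i)
=o_i\sum_j t_j - t_i = o_i - t_i.$$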
54 votes, 1 answer
How large should the batch size be for stochastic gradient descent?
I understand that stochastic gradient descent may be used to optimize a neural network using backpropagation by updating each iteration with a different sample of the training dataset. How large should the batch size be?

Simon Kuang
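There is no single correct batch size, but the sketch below (hypothetical linear-regression data) shows exactly where the choice enters a mini-batch SGD loop: each update averages the gradient over `batch_size` examples.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 5))
y = X @ np.array([1.0, -2.0, 0.5, 3.0, 0.0]) + 0.1 * rng.standard_normal(1000)

w = np.zeros(5)
batch_size, lr = 32, 0.05                          # batch_size is the knob in question

for epoch in range(20):
    perm = rng.permutation(len(X))                 # reshuffle each epoch
    for start in range(0, len(X), batch_size):
        idx = perm[start:start + batch_size]
        Xb, yb = X[idx], y[idx]
        grad = 2 * Xb.T @ (Xb @ w - yb) / len(idx) # MSE gradient on this mini-batch
        w -= lr * grad
```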
49 votes, 8 answers
Danger of setting all initial weights to zero in Backpropagation
Why is it dangerous to initialize weights with zeros? Is there any simple example that demonstrates it?

user8078
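A quick way to see the danger, sketched on hypothetical data: start all weights at the same constant and check that every hidden unit receives the identical gradient at every step, so the units never differentiate (with exact zeros in this two-layer tanh network the gradients are all zero and nothing updates at all).

```python
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.standard_normal((64, 3)), rng.standard_normal((64, 1))

# Every weight starts at the same constant value
W1, W2 = 0.5 * np.ones((3, 4)), 0.5 * np.ones((4, 1))

for _ in range(100):
    h = np.tanh(X @ W1)
    y_hat = h @ W2
    d_yhat = 2 * (y_hat - y) / len(X)
    dW2 = h.T @ d_yhat
    dW1 = X.T @ ((d_yhat @ W2.T) * (1 - h ** 2))
    W1 -= 0.1 * dW1
    W2 -= 0.1 * dW2

# All hidden units received identical gradients at every step, so they are still identical:
print(np.allclose(W1, W1[:, :1]))   # True -- the symmetry is never broken
```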
46 votes, 1 answer
How is softmax_cross_entropy_with_logits different from softmax_cross_entropy_with_logits_v2?
Specifically, I suppose I wonder about this statement:
Future major versions of TensorFlow will allow gradients to flow into the labels input on backprop by default.
This message is shown when I use tf.nn.softmax_cross_entropy_with_logits. In the…

Christian Eriksson
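A minimal sketch of what that statement is about, assuming TensorFlow 2.x (where tf.nn.softmax_cross_entropy_with_logits already behaves like the former _v2 op): gradients can flow into the labels, and wrapping the labels in tf.stop_gradient restores the old behaviour.

```python
import tensorflow as tf  # assumes TensorFlow 2.x

logits = tf.Variable([[2.0, 1.0, 0.1]])
labels = tf.Variable([[0.7, 0.2, 0.1]])   # soft labels, treated as trainable here

with tf.GradientTape() as tape:
    loss = tf.nn.softmax_cross_entropy_with_logits(
        labels=tf.stop_gradient(labels),  # drop this wrapper and labels get a gradient too
        logits=logits,
    )

d_logits, d_labels = tape.gradient(loss, [logits, labels])
print(d_logits)   # softmax(logits) - labels
print(d_labels)   # None: stop_gradient blocks the path into the labels
```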
41 votes, 1 answer
Why are non zero-centered activation functions a problem in backpropagation?
I read here the following:
Sigmoid outputs are not zero-centered. This is undesirable since neurons in later layers of processing in a Neural Network (more on this soon) would be receiving data that is not zero-centered. This has implications…

Amelio Vazquez-Reina
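The implication the quote alludes to: for a neuron with pre-activation $z=\sum_i w_i x_i + b$, backpropagation gives

$$\frac{\partial L}{\partial w_i}=\frac{\partial L}{\partial z}\,x_i,$$

so if the incoming activations $x_i$ are all positive (as sigmoid outputs are), every weight gradient of that neuron shares the sign of $\partial L/\partial z$, forcing all-positive or all-negative updates and a zig-zagging optimization path.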
36 votes, 6 answers
Backpropagation vs Genetic Algorithm for Neural Network training
I've read a few papers discussing the pros and cons of each method, some arguing that GA doesn't give any improvement in finding the optimal solution, while others show that it is more effective. It seems GA is generally preferred in the literature (although…

sashkello
29 votes, 2 answers
Gradient backpropagation through ResNet skip connections
I'm curious about how gradients are back-propagated through a neural network using ResNet modules/skip connections. I've seen a couple of questions about ResNet (e.g. Neural network with skip-layer connections) but this one is asking specifically…

Simon
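The short version for a residual block $y = x + F(x)$: by the chain rule,

$$\frac{\partial \mathcal{L}}{\partial x}=\frac{\partial \mathcal{L}}{\partial y}\left(I+\frac{\partial F}{\partial x}\right)=\frac{\partial \mathcal{L}}{\partial y}+\frac{\partial \mathcal{L}}{\partial y}\,\frac{\partial F}{\partial x},$$

so the identity term carries the upstream gradient straight past the residual branch, no matter how small $\partial F/\partial x$ becomes.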
29 votes, 3 answers
Why use gradient descent with neural networks?
When training a neural network with the back-propagation algorithm, the gradient descent method is used to determine the weight updates. My question is: rather than using the gradient descent method to slowly locate the minimum point with respect to a…

Minaj
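One way to frame the usual answer: for linear least squares the stationarity condition can be solved in closed form,

$$\nabla_w \,\|Xw-y\|^2 = 0 \;\Longrightarrow\; w=(X^\top X)^{-1}X^\top y,$$

whereas $\nabla_{W_1, W_2}\,\|f(XW_1)\,W_2-y\|^2 = 0$ has no closed-form solution once $f$ is a nonlinear activation, which is why iterative gradient steps computed by backpropagation are used instead.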
27 votes, 2 answers
Why doesn't backpropagation work when you initialize the weights to the same value?
Why doesn't backpropagation work when you initialize all the weights to the same value (say 0.5), but works fine when they are given random numbers?
Shouldn't the algorithm calculate the error and work from there, despite the fact that the weights are initially…

user1724140
26 votes, 1 answer
Backpropagation on a convolutional layer
Online tutorials describe in depth the convolution of an image with a filter, etc.; however, I have not seen one that describes the backpropagation on the filter (at least visually).
First let me try to explain how I understand backpropagation on a…

Edv Beq
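For a single-channel "valid" convolution (really a cross-correlation, as in most frameworks), the filter gradient is itself a cross-correlation of the input with the upstream gradient, $\partial L/\partial W_{m,n}=\sum_{i,j}(\partial L/\partial Y_{i,j})\,X_{i+m,\,j+n}$. A small NumPy sketch with a finite-difference check (hypothetical sizes):

```python
import numpy as np

def conv2d_valid(x, w):
    """'Valid' cross-correlation: forward pass of a single-channel conv layer (no kernel flip)."""
    H, W = x.shape
    kh, kw = w.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i+kh, j:j+kw] * w)
    return out

def filter_grad(x, upstream):
    """dL/dW: cross-correlate the input with the upstream gradient dL/dY."""
    return conv2d_valid(x, upstream)

# Finite-difference check on a tiny example
rng = np.random.default_rng(0)
x, w = rng.standard_normal((5, 5)), rng.standard_normal((3, 3))
upstream = rng.standard_normal((3, 3))            # dL/dY, same shape as the output
analytic = filter_grad(x, upstream)

eps, numeric = 1e-6, np.zeros_like(w)
for m in range(3):
    for n in range(3):
        wp, wm = w.copy(), w.copy()
        wp[m, n] += eps; wm[m, n] -= eps
        numeric[m, n] = np.sum((conv2d_valid(x, wp) - conv2d_valid(x, wm)) * upstream) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-4))  # True
```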
22 votes, 2 answers
In neural nets, why use gradient methods rather than other metaheuristics?
In training deep and shallow neural networks, why are gradient methods (e.g. gradient descent, Nesterov, Newton-Raphson) commonly used, as opposed to other metaheuristics?
By metaheuristics I mean methods such as simulated annealing, ant colony…

Lior
21 votes, 3 answers
Backpropagation algorithm and error in hidden layer
I am slightly confused about the backpropagation algorithm used in the multilayer perceptron (MLP).
The error is adjusted by the cost function. In backpropagation, we are trying to adjust the weights of the hidden layers. The output error I can…

HIGGINS
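The standard bookkeeping for the hidden-layer error in an MLP: with $\delta^{(L)}$ the error at the output layer, each earlier layer's error is obtained from the next layer's error by

$$\delta^{(l)} = \left(W^{(l+1)}\right)^{\!\top} \delta^{(l+1)} \odot f'\!\left(z^{(l)}\right),
\qquad
\frac{\partial E}{\partial W^{(l)}} = \delta^{(l)} \left(a^{(l-1)}\right)^{\!\top},$$

where $z^{(l)}$ are the pre-activations, $a^{(l-1)}$ the previous layer's activations, and $\odot$ the element-wise product.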
21 votes, 1 answer
What are the practical uses of Neural ODEs?
"Neural Ordinary Differential Equations", by Tian Qi Chen, Yulia Rubanova, Jesse Bettencourt and David Duvenaud, was awarded the best-paper award in NeurIPS in 2018
There, authors propose the NeuralODE, which is a method that fuses concepts of…

Firebug