
As far as I understand, the sigmoid function is used to map the outputs of a neural network to values between 0 and 1. Why does using the rectified linear unit (ReLU) as the activation function in deep neural networks work faster? Can you please explain the mathematical concept behind it?

  • "Faster" in what sense? If you ask about computation, `max` in ReLU just compares two numbers, while `exp` does a number of different computations. – Tim Jan 02 '19 at 14:28
  • I am now learning from Andrew Ng's deep learning specialization, and I heard that using ReLU instead of the sigmoid was one of the great breakthroughs for deep learning because it helps make gradient descent work much faster for large deep neural networks. My understanding of activation functions is that they map the outputs of a neural net to a range of values, so I am kinda lost with the concept of ReLU making the gradient descent process faster. – Htut Lin Aung Jan 02 '19 at 14:37
  • 1
    Long story short, the answer is "steeper gradients" -- this is described precisely in my answer here: https://stats.stackexchange.com/questions/226923/why-do-we-use-relu-in-neural-networks-and-how-do-we-use-it/226927#226927 The key detail is that the *network trains faster*, not that the gradient descent algorithm is any different. – Sycorax Jan 02 '19 at 15:11
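
The comments above point at two distinct effects: ReLU is cheaper to evaluate than the sigmoid, and its gradient does not saturate. Here is a minimal NumPy sketch (illustrative only; the function names and sample inputs are my own, not from the question or comments) comparing the two activations and their derivatives:

```python
import numpy as np

def sigmoid(x):
    # Sigmoid squashes inputs into (0, 1); requires an exponential.
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative is sigma(x) * (1 - sigma(x)); its maximum is 0.25 at x = 0
    # and it decays towards 0 for large |x| (saturation).
    s = sigmoid(x)
    return s * (1.0 - s)

def relu(x):
    # ReLU is just an elementwise max with 0 -- no exponential needed.
    return np.maximum(0.0, x)

def relu_grad(x):
    # Derivative is 1 for positive inputs and 0 otherwise,
    # so gradients of active units are not shrunk as they flow backwards.
    return (x > 0).astype(x.dtype)

x = np.array([-5.0, -1.0, 0.5, 1.0, 5.0])
print("sigmoid grad:", sigmoid_grad(x))  # tiny (~0.0066) at |x| = 5
print("relu grad:   ", relu_grad(x))     # 0 or 1, never a tiny fraction
```

Because backpropagation multiplies these per-layer derivatives together, the sigmoid's small values (at most 0.25) shrink the gradient as it passes through a deep stack, while ReLU passes a gradient of 1 through every active unit, which is why the network trains faster.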

0 Answers