
I am trying to map my basic understanding of MLPs onto CNNs. Why does a CNN sacrifice all negative inputs by using the ReLU instead of the sigmoid? Is it because:

- The sigmoid has a range of 0 to 1, which is worse for CNNs than the ReLU's range of 0 to infinity?
- The ReLU only has one horizontal asymptote, i.e. it only saturates on one side, whereas the sigmoid saturates at both ends?
- The negative inputs are meaningless? Or is it to speed up computation, as implied here on this question?
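
To make the comparison concrete, here is a minimal NumPy sketch of the two activations and their derivatives as I understand them (the function names and sample values are just my own illustration, not from any particular library):

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid: squashes inputs into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    """Derivative of the sigmoid: at most 0.25, and it vanishes for large |x|."""
    s = sigmoid(x)
    return s * (1.0 - s)

def relu(x):
    """ReLU: zero for negative inputs, identity for positive inputs."""
    return np.maximum(0.0, x)

def relu_grad(x):
    """Derivative of the ReLU: 0 for negative inputs, 1 for positive inputs."""
    return (x > 0).astype(float)

x = np.array([-5.0, -1.0, 0.5, 5.0])
print(sigmoid(x), sigmoid_grad(x))  # gradients shrink toward 0 at both tails
print(relu(x), relu_grad(x))        # gradient is exactly 1 for any positive input
```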

Any help is appreciated.

Tom Snow
  • https://www.cs.toronto.edu/~hinton/absps/reluICML.pdf – Sycorax Feb 15 '18 at 16:21
  • @Sycorax do you know how to break that up? That's pretty complex, tbh; not sure people will get it – theonlygusti Feb 28 '18 at 10:11
  • This question is closely related and should help get you started: https://stats.stackexchange.com/questions/330559/why-is-tanh-almost-always-better-than-sigmoid-as-an-activation-function/330885#comment626799_330885, and so should this one: https://stats.stackexchange.com/questions/101560/tanh-activation-function-vs-sigmoid-activation-function – Sycorax Feb 28 '18 at 16:43

0 Answers