
I am trying to map my basic understanding of MLPs onto CNNs. Why does a CNN sacrifice all negative inputs by using the ReLU instead of the sigmoid? Is it because:

- The sigmoid has a range of 0 to 1, which is worse for CNNs than the ReLU's range of 0 to infinity?
- The ReLU only has one horizontal asymptote, i.e. it only saturates on one side, whereas the sigmoid saturates at both ends?
- The negative inputs are meaningless? Or is it to speed up computation, as implied here on this question?
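
To make the comparison concrete, here is a minimal NumPy sketch of the two activations and their derivatives as I understand them (the function names and sample values are just my own illustration, not from any particular library):

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid: squashes inputs into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    """Derivative of the sigmoid: at most 0.25, and it vanishes for large |x|."""
    s = sigmoid(x)
    return s * (1.0 - s)

def relu(x):
    """ReLU: zero for negative inputs, identity for positive inputs."""
    return np.maximum(0.0, x)

def relu_grad(x):
    """Derivative of the ReLU: 0 for negative inputs, 1 for positive inputs."""
    return (x > 0).astype(float)

x = np.array([-5.0, -1.0, 0.5, 5.0])
print(sigmoid(x), sigmoid_grad(x))  # gradients shrink toward 0 at both tails
print(relu(x), relu_grad(x))        # gradient is exactly 1 for any positive input
```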

Any help is appreciated.

Tom Snow
  • https://www.cs.toronto.edu/~hinton/absps/reluICML.pdf – Sycorax Feb 15 '18 at 16:21
  • @Sycorax do you know how to break that up? That's pretty complex, tbh; not sure people will get it – theonlygusti Feb 28 '18 at 10:11
  • This question is closely related and should help get you started: https://stats.stackexchange.com/questions/330559/why-is-tanh-almost-always-better-than-sigmoid-as-an-activation-function/330885#comment626799_330885, and so should this one: https://stats.stackexchange.com/questions/101560/tanh-activation-function-vs-sigmoid-activation-function – Sycorax Feb 28 '18 at 16:43

0 Answers