For instance, the default activation function of tf.keras.layers.SimpleRNN
is tanh.
My question arises because tanh activations, like sigmoids, can also cause the vanishing gradient problem.
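For reference, here is a minimal sketch (assuming TensorFlow 2.x) showing that SimpleRNN falls back to tanh when no activation is passed, and that the activation can be overridden; the unit count is arbitrary:

```python
import tensorflow as tf

# SimpleRNN uses tanh unless an activation is passed explicitly.
default_rnn = tf.keras.layers.SimpleRNN(32)
relu_rnn = tf.keras.layers.SimpleRNN(32, activation="relu")

print(default_rnn.activation.__name__)  # tanh
print(relu_rnn.activation.__name__)     # relu
```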

- It's not directly about RNNs, but I hope the discussions here will give you some direction: https://stats.stackexchange.com/questions/330559/why-is-tanh-almost-always-better-than-sigmoid-as-an-activation-function – gunes Jul 23 '20 at 21:19
- Yes, but the vanishing gradient problem with the sigmoid is very explicit. – DanielTheRocketMan Jul 23 '20 at 21:39