For instance, the default activation function of tf.keras.layers.SimpleRNN
is tanh.
My question arises because tanh activations, like sigmoids, can also cause the vanishing gradient problem.
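For reference, here is a minimal sketch (assuming TensorFlow 2.x) showing that SimpleRNN falls back to tanh when no activation is passed, and that the activation can be overridden; the unit count is arbitrary:

```python
import tensorflow as tf

# SimpleRNN uses tanh unless an activation is passed explicitly.
default_rnn = tf.keras.layers.SimpleRNN(32)
relu_rnn = tf.keras.layers.SimpleRNN(32, activation="relu")

print(default_rnn.activation.__name__)  # tanh
print(relu_rnn.activation.__name__)     # relu
```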

- It's not directly about RNNs, but I hope the discussions here will give you some direction: https://stats.stackexchange.com/questions/330559/why-is-tanh-almost-always-better-than-sigmoid-as-an-activation-function – gunes Jul 23 '20 at 21:19
- Yes, but the vanishing gradient problem with the sigmoid is very explicit. – DanielTheRocketMan Jul 23 '20 at 21:39