
For instance, the default activation function of tf.keras.layers.SimpleRNN is tanh. My doubt arises because tanh activation functions can also cause the vanishing gradient problem, just like sigmoids.
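As a quick check (my own sketch, not part of the original question), the default can be confirmed from the layer's config; the explicit sigmoid variant below is included only for comparison:

```python
import tensorflow as tf

# SimpleRNN uses tanh unless an activation is passed explicitly;
# the sigmoid layer is shown only to contrast with the default.
rnn_default = tf.keras.layers.SimpleRNN(8)
rnn_sigmoid = tf.keras.layers.SimpleRNN(8, activation='sigmoid')

print(rnn_default.get_config()['activation'])  # 'tanh'
print(rnn_sigmoid.get_config()['activation'])  # 'sigmoid'
```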

DanielTheRocketMan
  • It's not directly about RNNs, but I hope the discussions here will give you some direction: https://stats.stackexchange.com/questions/330559/why-is-tanh-almost-always-better-than-sigmoid-as-an-activation-function – gunes Jul 23 '20 at 21:19
  • Yes. But the vanishing gradient problem with the sigmoid is very explicit (see the sketch below). – DanielTheRocketMan Jul 23 '20 at 21:39
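A minimal numerical sketch of why the sigmoid case is "very explicit" (my own illustration, not from the thread): the sigmoid derivative is bounded by 0.25, while tanh's derivative reaches 1 at zero, so the product of derivatives over many timesteps decays far faster for sigmoid:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # maximum value 0.25, at x = 0

def d_tanh(x):
    return 1.0 - np.tanh(x) ** 2  # maximum value 1.0, at x = 0

# Product of derivatives over T timesteps at pre-activations near zero,
# i.e. the most favorable case for both nonlinearities.
T = 20
print(d_sigmoid(0.0) ** T)  # ~9.1e-13 -- gradient shrinks very fast
print(d_tanh(0.0) ** T)     # 1.0      -- no shrinkage at zero
```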

0 Answers