TensorFlow allows you to create a MultiRNNCell composed sequentially of multiple simple cells (e.g., LSTM and GRU). I usually use the same type of cell when creating a MultiRNNCell, but I was wondering whether there could be some benefit in using both LSTM and GRU cells together. Does anyone have experience with this, or theoretical insights?
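For concreteness, here is a minimal sketch of the kind of mixed stack I mean, assuming the TensorFlow 1.x `tf.nn.rnn_cell` API (the unit and feature sizes are arbitrary placeholders):

```python
import tensorflow as tf

num_units = 128  # arbitrary hidden size for illustration

# A two-layer stack mixing an LSTM layer and a GRU layer.
cells = [
    tf.nn.rnn_cell.LSTMCell(num_units),
    tf.nn.rnn_cell.GRUCell(num_units),
]
stacked_cell = tf.nn.rnn_cell.MultiRNNCell(cells)

# Inputs shaped [batch, time, features]; dynamic_rnn unrolls the stack over time.
inputs = tf.placeholder(tf.float32, [None, None, 64])
outputs, final_state = tf.nn.dynamic_rnn(stacked_cell, inputs, dtype=tf.float32)
```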
- Would be pretty interesting to see. I might be testing the combination of LSTM and NARX cells. – Thomas Wagenaar May 11 '17 at 08:41
1 Answer
There are thousands of RNN cell (kernel) variants, and both the LSTM and the GRU do the same job: they process the input $x_i$ together with the previous state $s_{i-1}$ and produce an output and the current state. Even though the LSTM preceded the GRU and the GRU requires less computation, the LSTM is roughly on a par with the GRU in performance. So I think stacking LSTM and GRU cells (or any other cells) might be interesting, but it would not make a big difference in performance compared to simply stacking either LSTM cells or GRU cells.
As George E. P. Box put it, "all models are wrong, but some are useful"; you can simply give it a try and see whether it helps on your problem.
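If you do want to try it, a minimal sketch of such a comparison (again assuming the TensorFlow 1.x `tf.nn.rnn_cell` API; `build_stack` is a hypothetical helper, and the layer counts and unit sizes are placeholders) could look like this:

```python
import tensorflow as tf

def build_stack(cell_constructors, num_units=128):
    """Stack (possibly heterogeneous) RNN cells into a single MultiRNNCell."""
    return tf.nn.rnn_cell.MultiRNNCell(
        [ctor(num_units) for ctor in cell_constructors])

# Candidate stacks to train under otherwise identical settings
# (same data, optimizer, layer count, and unit size).
pure_lstm = build_stack([tf.nn.rnn_cell.LSTMCell, tf.nn.rnn_cell.LSTMCell])
pure_gru  = build_stack([tf.nn.rnn_cell.GRUCell, tf.nn.rnn_cell.GRUCell])
mixed     = build_stack([tf.nn.rnn_cell.LSTMCell, tf.nn.rnn_cell.GRUCell])
```

Comparing validation performance across the three stacks with everything else held fixed is the cleanest way to see whether the mix buys you anything.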

Lerner Zhang
- +1 for the last sentence: they really don't do things that are that different, so there's not much advantage you'd get from combining them. – Wayne Feb 08 '18 at 14:18
- When you say that the LSTM is on par with the GRU, are you making the comparison on a per-unit basis or a per-parameter basis? – Sycorax Feb 08 '18 at 14:59