What is the point of putting two LSTM cells one after another?

Question

I've often seen statements like this in the literature: "we used x LSTM cells in our implementation". I don't understand the point of stacking several LSTMs: why isn't a single cell enough, given that it already receives the cell state and the hidden state from the previous time step?

For example, see page 4 of this paper: https://arxiv.org/pdf/1612.04928.pdf

I see the advantage of running two cells in parallel, but not of stacking them.

Your question is automatically flagged as low-quality because it is so short. Can you extend your question please? — Ferdi, Sep 07 '17 at 12:07
Thank you for extending your question. Now it looks much better. If you still remember the paper where you read this sentence it would be awesome if you provide a link. — Ferdi, Sep 07 '17 at 12:20

Lerner Zhang · Accepted Answer · 2017-09-08T11:24:50.200

1

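One layer has only one cell, which is reused at every time step. For more information read this. A stacked multi-layer LSTM model is for extracting more abstract information: each layer takes the sequence of hidden states produced by the layer below as its input. I think this question and this answer have explained this issue in detail.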
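To make the stacking concrete, here is a minimal sketch in plain Python. It is not taken from the linked paper or from any library: the names `LSTMCell`, `run_layer` and `run_stacked`, the layer sizes, and the random weights are all assumptions chosen for illustration. It only shows the wiring, with each layer reusing one cell across time and the second layer reading the first layer's hidden states.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """Toy LSTM cell with random, untrained weights (hypothetical, for illustration)."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        n_in = input_size + hidden_size
        # One weight row per gate unit, ordered: input, forget, candidate, output.
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                  for _ in range(4 * hidden_size)]
        self.b = [0.0] * (4 * hidden_size)

    def step(self, x, h, c):
        z = x + h  # concatenate current input and previous hidden state
        pre = [sum(w * v for w, v in zip(row, z)) + b
               for row, b in zip(self.W, self.b)]
        H = self.hidden_size
        i = [sigmoid(p) for p in pre[0:H]]            # input gate
        f = [sigmoid(p) for p in pre[H:2 * H]]        # forget gate
        g = [math.tanh(p) for p in pre[2 * H:3 * H]]  # candidate values
        o = [sigmoid(p) for p in pre[3 * H:4 * H]]    # output gate
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new


def run_layer(cell, inputs):
    """Apply the same cell at every time step and collect the hidden states."""
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    outputs = []
    for x in inputs:
        h, c = cell.step(x, h, c)
        outputs.append(h)
    return outputs


def run_stacked(cells, inputs):
    """Feed each layer's hidden-state sequence to the next layer as its input."""
    seq = inputs
    for cell in cells:
        seq = run_layer(cell, seq)
    return seq


rng = random.Random(0)
# A sequence of 5 time steps with 3 features each (made-up data).
sequence = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(5)]
# Layer 1 maps 3 features to 4 hidden units; layer 2 maps those 4 to 2.
layers = [LSTMCell(3, 4, rng), LSTMCell(4, 2, rng)]
top = run_stacked(layers, sequence)
print(len(top), len(top[0]))  # 5 2
```

The two layers differ only in what they see as input: the first reads the raw features, while the second reads a sequence that has already been transformed by the first, which is where the "more abstract" representation can come from once the weights are trained.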

edited Sep 08 '17 at 11:24

answered Sep 08 '17 at 08:43


1

Also might want to point to Graves' [seminal paper on stacked LSTMs for speech recognition](https://arxiv.org/pdf/1303.5778.pdf): "If LSTM is used for the hidden layers we get deep bidirectional LSTM, the main architecture used in this paper. As far as we are aware this is the first time *deep* LSTM has been applied to speech recognition, and we find that it yields a dramatic improvement over single-layer LSTM." ([Graves et al., 2013](http://ieeexplore.ieee.org/abstract/document/6638947/)) – fnl Sep 08 '17 at 09:38
