
Apologies, this question is quite long.

I am trying to implement a paper on optimising a multilayer LSTM. The optimisation process works as follows:

First, I wrote sequential code for an LSTM network. However, I didn't use multiple layers, as I was not aware of the concept.

I implemented the basic optimisations, but when I reached the later steps, I found that I needed to use the concept of hidden layers.

Below is my current understanding of hidden layers, via a diagram I drew myself. I wanted to ask whether my understanding is correct, and if it is, what the values in place of the '?'s would be. I am a beginner in RNNs, so I am deeply grateful for your time.

2 Answers


Each layer is a cell. Each time step in each layer generates a state, and at the end of the sequence (the maximum length) each layer has produced a final state (or, viewed another way, two states: the hidden state h and the cell state c).

Let's consider a two-layer model. The input to the first layer at each time step is one of the characters you want to encode. The cell weights stay the same (they are updated only during backpropagation), but the output state changes according to the input at that time step. The state of the first layer is then the input to the second layer at the corresponding time step, which in turn would generate the output state for a third layer, and so on.
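To make this concrete, here is a minimal sketch (my own illustration, not code from the question) using PyTorch's nn.LSTM with num_layers=2; all the sizes are arbitrary assumptions:

```python
# A minimal two-layer LSTM sketch showing which states the model exposes.
import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 50, 8, 16
seq_len, batch_size, num_layers = 10, 1, 2

embed = nn.Embedding(vocab_size, embed_dim)
lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=num_layers)

# One sequence of character indices, shape (seq_len, batch).
chars = torch.randint(0, vocab_size, (seq_len, batch_size))
x = embed(chars)                      # (seq_len, batch, embed_dim)

output, (h, c) = lstm(x)

# `output` holds the top layer's hidden state at every time step.
print(output.shape)  # torch.Size([10, 1, 16]) -> (seq_len, batch, hidden_dim)

# `h` and `c` hold the final hidden and cell states of *each* layer,
# i.e. one (h, c) pair per layer, as described above.
print(h.shape)       # torch.Size([2, 1, 16]) -> (num_layers, batch, hidden_dim)
print(c.shape)       # torch.Size([2, 1, 16])
```

Internally, the first layer's per-step hidden states serve as the input sequence for the second layer, which is exactly the stacking described above.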

For more details, please refer to this answer: Understanding LSTM units vs. cells.

— Lerner Zhang

A single LSTM layer is typically used to turn sequences into dense, non-sequential features: the states at the end of the RNN loop. This step essentially turns sequence data into tabular data.

Sometimes one LSTM layer is not able to compress the sequential information well enough. In such cases, you let the LSTM layer return not only the last state of its loop but the states at all iterations. You then get a dense representation of your input at each time step, and this new sequential output can be fed into the next LSTM layer, as in the sketch below.
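Here is a minimal sketch of that idea in Keras (my own example; the shapes and layer sizes are arbitrary assumptions). The first LSTM returns a state per time step, so the second LSTM has a sequence to consume:

```python
# Stacked LSTMs: return_sequences=True passes per-step states downstream.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(10, 8)),               # 10 time steps, 8 features each
    layers.LSTM(16, return_sequences=True),   # states at all iterations: (10, 16)
    layers.LSTM(16),                          # only the last state: (16,)
    layers.Dense(1),                          # downstream head on "tabular" features
])
model.summary()

x = np.random.rand(4, 10, 8).astype("float32")  # batch of 4 dummy sequences
print(model.predict(x).shape)                   # (4, 1)
```

Dropping return_sequences=True on the first LSTM would break the stack, because the second LSTM would receive a single vector instead of a sequence.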

— Michael M