6

I have two questions regarding LSTMs:

1) Do an LSTM's outputs have exactly the same shape/size as its input?
2) Can we use an LSTM's intermediate outputs to make some sort of predictions?

Context:

I have an input sequence of image frames, say 10 frames long. I am feeding them to an LSTM and want to predict which of two classes each frame belongs to. The output I get is a similarly sized frame at each LSTM unit. Can I use these outputs to make predictions (e.g., by adding some dense layers on top of them)? I am asking because I got the impression that LSTMs can only be used in a different sense, for example to predict the (t+1)-th frame given frames 1 through t as input.

Thanks in advance!

Isam Abdullah

1 Answer

8

The basic recurrent neural network (RNN) cell takes as input the previous hidden state $h_{t-1}$ and the current input $x_t$, and returns the current hidden state

$$ h_t = \tanh(W_{hh}h_{t-1} + W_{xh}x_t) $$
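The equation above can be sketched in a few lines of numpy. The sizes below (input length 2, hidden length 5) are arbitrary choices for illustration; the point is that the shape of $h_t$ is set by the hidden size, not by the input size.

```python
import numpy as np

# Hypothetical sizes for illustration: input vector of length 2,
# hidden state of length 5.
input_size, hidden_size = 2, 5

rng = np.random.default_rng(0)
W_hh = rng.standard_normal((hidden_size, hidden_size))  # hidden-to-hidden weights
W_xh = rng.standard_normal((hidden_size, input_size))   # input-to-hidden weights

h_prev = np.zeros(hidden_size)         # previous hidden state h_{t-1}
x_t = rng.standard_normal(input_size)  # current input x_t

# One vanilla RNN step: h_t = tanh(W_hh h_{t-1} + W_xh x_t)
h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t)

print(h_t.shape)  # (5,) -- the hidden size, not the input size
```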

The same applies to an LSTM; it is just a little more complicated, as described in this great blog post. So, answering your second question: at each step the RNN cell returns an output that can be used to make predictions. There are two ways of using RNNs. You can either process the whole input sequence and look only at the last output state (e.g., process a whole sentence and then classify its sentiment), or use the intermediate outputs (in Keras this is the `return_sequences=True` parameter) and process them further or make some kind of prediction from them (e.g., named-entity recognition for each word of a sentence). The only difference is that in the first case you simply ignore the intermediate states. If this is too abstract, the following figure (from the blog post referred to above) may be helpful.

[Figure: an unrolled simple RNN network]

As you can see, at each step you have some output $h_t$ that is a function of current input $x_t$ and all the history, as passed through the previous hidden state $h_{t-1}$.
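This is exactly the setup your context describes: a shared dense layer (plus softmax) applied to each step's hidden state gives a per-frame class prediction. The sketch below uses random numbers in place of real LSTM outputs, with the sizes from the question (10 frames, 2 classes) and an assumed hidden size of 8:

```python
import numpy as np

# Hypothetical per-frame classifier: a shared dense layer + softmax applied
# to each step's hidden state h_t. Assumed sizes: 10 steps, hidden size 8,
# 2 classes, as in the question's setup.
rng = np.random.default_rng(2)
hidden_states = rng.standard_normal((10, 8))  # stand-in for the LSTM outputs
W_dense = rng.standard_normal((2, 8)) * 0.1   # dense weights shared across steps
b_dense = np.zeros(2)

logits = hidden_states @ W_dense.T + b_dense  # (10, 2): one score pair per frame
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # softmax

predictions = probs.argmax(axis=1)  # (10,): a class label for each frame
print(predictions.shape)
```

Because the same weights are used at every step, this is the numpy analogue of wrapping a dense layer in Keras's `TimeDistributed` on top of an LSTM with `return_sequences=True`.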

As for the shape of the hidden state: this is matrix algebra, so the shape will depend on the shapes of the inputs and the weights. If you use pre-built software like Keras, it is controlled by the parameters of the LSTM cell (the number of hidden units). If you code it by hand, it will depend on the shapes of the weights.
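To make this concrete, here is a minimal LSTM implemented from scratch (standard gate equations, biases omitted for brevity; all sizes are arbitrary illustrations). It shows that the per-step output shape is `(hidden_size,)` regardless of the input feature size, and that "last state only" versus "all states" is exactly the `return_sequences` distinction:

```python
import numpy as np

def lstm_forward(x_seq, hidden_size, rng):
    """Run a minimal LSTM over a sequence; return all per-step hidden states."""
    input_size = x_seq.shape[1]
    # One weight matrix per gate: input (i), forget (f), output (o), candidate (g).
    W = {g: rng.standard_normal((hidden_size, input_size)) * 0.1 for g in "ifog"}
    U = {g: rng.standard_normal((hidden_size, hidden_size)) * 0.1 for g in "ifog"}
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    h = np.zeros(hidden_size)  # hidden state
    c = np.zeros(hidden_size)  # cell state
    outputs = []
    for x in x_seq:
        i = sigmoid(W["i"] @ x + U["i"] @ h)  # input gate
        f = sigmoid(W["f"] @ x + U["f"] @ h)  # forget gate
        o = sigmoid(W["o"] @ x + U["o"] @ h)  # output gate
        g = np.tanh(W["g"] @ x + U["g"] @ h)  # candidate cell update
        c = f * c + i * g
        h = o * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs)

rng = np.random.default_rng(1)
seq = rng.standard_normal((10, 4))  # 10 time steps, 4 features each
all_h = lstm_forward(seq, hidden_size=8, rng=rng)

print(all_h.shape)      # (10, 8): analogous to Keras return_sequences=True
print(all_h[-1].shape)  # (8,):    last state only, return_sequences=False
```

So the outputs do not have to match the input shape: the time dimension (10) is preserved when you keep the intermediate states, but the feature dimension is whatever hidden size you choose.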

Tim
  • Thank you so much for the detailed answer. +1 for just the hard work. ;) – Isam Abdullah Apr 26 '19 at 08:42
  • @Tim: Thanks Tim for your answer. I read through it several times but I could not figure out what the answer is to the 1st question "Do LSTM outputs have exactly the same shape/size as the input?". So if `return_sequences=True` and you have as input e.g. (100, 10, 2), does the output for training then also have to have the shape (100, 10, None)? The last dimension can be different because you can map 2 (the last dimension of the input) to any number of dimensions. But what about the first 2 dimensions, 100 and 10 — do they have to be equal? – PeterBe Feb 03 '22 at 09:23