I have been studying LSTMs for a while and understand at a high level how everything works. However, when going to implement them in TensorFlow, I've noticed that `BasicLSTMCell` requires a number-of-units (i.e. `num_units`) parameter.
From this very thorough explanation of LSTMs, I've gathered that a single LSTM unit is the one shown in the following diagram, which is actually a GRU unit.
I assume the `num_units` parameter of `BasicLSTMCell` refers to how many of these we want to hook up to each other in a layer.
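For concreteness, here is a minimal sketch of the kind of setup I mean (TF 1.x API; the batch/time/feature shape and the value 128 are arbitrary values I picked for illustration):

```python
import tensorflow as tf  # TF 1.x, where BasicLSTMCell lives

num_units = 128  # the parameter in question

# A single LSTM cell with num_units "units".
cell = tf.nn.rnn_cell.BasicLSTMCell(num_units=num_units)

# Dummy input: [batch_size, max_time, input_depth] = [None, 20, 50].
inputs = tf.placeholder(tf.float32, [None, 20, 50])

# Unroll the cell over the time dimension; per the docs,
# outputs has shape [batch_size, max_time, num_units].
outputs, final_state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)
```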
That leaves the question: what is a "cell" in this context? Is a "cell" equivalent to a layer in a normal feed-forward neural network?