What is the output of a tf.nn.dynamic_rnn()?

Question

I am not sure about what I understand from the official documentation, which says:

Returns: A pair (outputs, state) where:

outputs: The RNN output Tensor.

If time_major == False (default), this will be a Tensor shaped: [batch_size, max_time, cell.output_size].

If time_major == True, this will be a Tensor shaped: [max_time, batch_size, cell.output_size].

Note, if cell.output_size is a (possibly nested) tuple of integers or TensorShape objects, then outputs will be a tuple having the same structure as cell.output_size, containing Tensors having shapes corresponding to the shape data in cell.output_size.

state: The final state. If cell.state_size is an int, this will be shaped [batch_size, cell.state_size]. If it is a TensorShape, this will be shaped [batch_size] + cell.state_size. If it is a (possibly nested) tuple of ints or TensorShape, this will be a tuple having the corresponding shapes. If cells are LSTMCells state will be a tuple containing a LSTMStateTuple for each cell.

Is output[-1] always (for all three cell types, i.e. RNN, GRU, LSTM) equal to state (the second element of the returned tuple)? I guess the literature everywhere is too liberal in its use of the term hidden state. Is the hidden state in all three cells the score coming out of the cell? (Why it is called hidden is beyond me; it would seem the cell state in an LSTM should be called the hidden state, since it is not exposed.)

score 10 · Accepted Answer · answered Feb 26 '18 at 21:12

Yes, the cell output equals the hidden state. In the case of an LSTM, it is the short-term part of the state tuple (the second element, h, of LSTMStateTuple), as can be seen in this picture:

But for tf.nn.dynamic_rnn, the returned state can differ from the last output when a sequence is shorter than max_time (see the sequence_length argument). Take a look at this example:

import numpy as np
import tensorflow as tf  # TensorFlow 1.x API

n_steps = 2
n_inputs = 3
n_neurons = 5

X = tf.placeholder(dtype=tf.float32, shape=[None, n_steps, n_inputs])
seq_length = tf.placeholder(tf.int32, [None])

basic_cell = tf.nn.rnn_cell.BasicRNNCell(num_units=n_neurons)
outputs, states = tf.nn.dynamic_rnn(basic_cell, X, sequence_length=seq_length, dtype=tf.float32)

X_batch = np.array([
  # t = 0      t = 1
  [[0, 1, 2], [9, 8, 7]], # instance 0
  [[3, 4, 5], [0, 0, 0]], # instance 1
  [[6, 7, 8], [6, 5, 4]], # instance 2
  [[9, 0, 1], [3, 2, 1]], # instance 3
])
seq_length_batch = np.array([2, 1, 2, 2])

with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  outputs_val, states_val = sess.run([outputs, states], 
                                     feed_dict={X: X_batch, seq_length: seq_length_batch})

  print(outputs_val)
  print()
  print(states_val)

Here the input batch contains 4 sequences, one of which is short and padded with zeros. Upon running it, you should see something like this:

[[[ 0.2315362  -0.37939444 -0.625332   -0.80235624  0.2288385 ]
  [ 0.9999524   0.99987394  0.33580178 -0.9981791   0.99975705]]

 [[ 0.97374666  0.8373545  -0.7455188  -0.98751736  0.9658986 ]
  [ 0.          0.          0.          0.          0.        ]]

 [[ 0.9994331   0.9929737  -0.8311569  -0.99928087  0.9990415 ]
  [ 0.9984355   0.9936006   0.3662448  -0.87244385  0.993848  ]]

 [[ 0.9962312   0.99659646  0.98880637  0.99548346  0.9997809 ]
  [ 0.9915743   0.9936939   0.4348318   0.8798458   0.95265496]]]

[[ 0.9999524   0.99987394  0.33580178 -0.9981791   0.99975705]
 [ 0.97374666  0.8373545  -0.7455188  -0.98751736  0.9658986 ]
 [ 0.9984355   0.9936006   0.3662448  -0.87244385  0.993848  ]
 [ 0.9915743   0.9936939   0.4348318   0.8798458   0.95265496]]

... which indeed shows that the state equals the output at t = 1 for the full sequences and the output at t = 0 for the short one. The output at t = 1 is a zero vector for that short sequence. The same holds for LSTM and GRU cells.
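
The same check can be written down in a few lines. This is only a sketch: it copies the printed values into plain Python lists (no TensorFlow needed), and the helper name last_valid_outputs is hypothetical. It assumes every sequence length is at least 1.

```python
# Values copied from the printed outputs_val / states_val above.
outputs_val = [
    [[0.2315362, -0.37939444, -0.625332, -0.80235624, 0.2288385],
     [0.9999524, 0.99987394, 0.33580178, -0.9981791, 0.99975705]],
    [[0.97374666, 0.8373545, -0.7455188, -0.98751736, 0.9658986],
     [0.0, 0.0, 0.0, 0.0, 0.0]],
    [[0.9994331, 0.9929737, -0.8311569, -0.99928087, 0.9990415],
     [0.9984355, 0.9936006, 0.3662448, -0.87244385, 0.993848]],
    [[0.9962312, 0.99659646, 0.98880637, 0.99548346, 0.9997809],
     [0.9915743, 0.9936939, 0.4348318, 0.8798458, 0.95265496]],
]
states_val = [
    [0.9999524, 0.99987394, 0.33580178, -0.9981791, 0.99975705],
    [0.97374666, 0.8373545, -0.7455188, -0.98751736, 0.9658986],
    [0.9984355, 0.9936006, 0.3662448, -0.87244385, 0.993848],
    [0.9915743, 0.9936939, 0.4348318, 0.8798458, 0.95265496],
]
seq_length_batch = [2, 1, 2, 2]


def last_valid_outputs(outputs, lengths):
    """Pick, for each instance, the output at time step length - 1.

    Hypothetical helper; assumes every length is >= 1.
    """
    return [seq[n - 1] for seq, n in zip(outputs, lengths)]


last = last_valid_outputs(outputs_val, seq_length_batch)
print(last == states_val)
```

Indexing with seq_length - 1 instead of -1 is what makes the short instance line up with its state.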

So the state is a convenient tensor that holds the last actual RNN state of each sequence, ignoring the padded steps. The output tensor holds the outputs at every time step, so it does not ignore the padding and contains the zero vectors. That's the reason for returning both of them.
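
To make the difference concrete, here is a rough sketch of the masking behaviour described above. It is not TensorFlow's implementation: toy_step is a stand-in for a real cell with made-up weights, and masked_rnn is a hypothetical name.

```python
import math


def toy_step(x, h):
    """Stand-in for an RNN cell: one scalar unit, fixed hypothetical weights."""
    return math.tanh(0.5 * x + 0.8 * h)


def masked_rnn(sequences, lengths):
    """Run toy_step over padded sequences, mimicking sequence_length handling."""
    all_outputs, final_states = [], []
    for seq, length in zip(sequences, lengths):
        h = 0.0  # zero initial state
        outputs = []
        for t, x in enumerate(seq):
            if t < length:
                h = toy_step(x, h)   # real step: update the state
                outputs.append(h)    # the output equals the new hidden state
            else:
                outputs.append(0.0)  # padded step: emit zero, keep the state
        all_outputs.append(outputs)
        final_states.append(h)
    return all_outputs, final_states


# Second sequence has length 1 and is padded with zeros.
outputs, states = masked_rnn([[1.0, 2.0, 3.0], [1.0, 0.0, 0.0]], [3, 1])
print(outputs)
print(states)
```

For the full sequence the state matches the last output; for the short one it matches the output at t = 0, and the later outputs are zeros.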

score 2 · Answer 2 · answered May 02 '18 at 12:52

Possible duplicate of https://stackoverflow.com/questions/36817596/get-last-output-of-dynamic-rnn-in-tensorflow/49705930#49705930

Anyway let's go ahead with the answer.

This code snippet might help you understand what dynamic_rnn really returns: a tuple of (outputs, final_output_state).

So for an input with a maximum sequence length of T time steps, outputs has the shape [batch_size, T, num_units] (given time_major=False, the default) and contains the output state at each time step: h1, h2, ..., hT.

And final_output_state holds the final cell state cT and the final output state hT of each sequence in the batch, each of shape [batch_size, num_units].
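
The (cT, hT) pair can be illustrated with a toy single-unit LSTM step. This is only a sketch under simplifying assumptions: all gates share the same made-up weights w and u, and LSTMState and toy_lstm_step are hypothetical names; only the (c, h) field order mirrors TensorFlow's LSTMStateTuple.

```python
import math
from collections import namedtuple

# Same field order as TensorFlow's LSTMStateTuple: (c, h).
LSTMState = namedtuple("LSTMState", ["c", "h"])


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def toy_lstm_step(x, state, w=0.5, u=0.5):
    """One step of a single-unit LSTM; w and u are hypothetical shared weights."""
    z = w * x + u * state.h
    i, f, o = sigmoid(z), sigmoid(z), sigmoid(z)  # input, forget, output gates
    g = math.tanh(z)                # candidate value
    c = f * state.c + i * g         # new cell state (long-term)
    h = o * math.tanh(c)            # new hidden state (short-term)
    return h, LSTMState(c=c, h=h)   # the step output is h itself


state = LSTMState(c=0.0, h=0.0)
for x in [1.0, -0.5, 2.0]:
    output, state = toy_lstm_step(x, state)

print(output == state.h)  # the output is the h part of the state
print(output == state.c)  # the cell state c is a different value
```

The per-step output is the h part of the state, while c stays inside the state tuple and never appears in outputs.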

But since dynamic_rnn is being used, my guess is that your sequence lengths vary within each batch.

    import tensorflow as tf
    import numpy as np
    from tensorflow.contrib import rnn
    tf.reset_default_graph()

    # Create input data
    X = np.random.randn(2, 10, 8)

    # The second example is of length 6 
    X[1,6:] = 0
    X_lengths = [10, 6]

    cell = tf.nn.rnn_cell.LSTMCell(num_units=64, state_is_tuple=True)

    outputs, states  = tf.nn.dynamic_rnn(cell=cell,
                                         dtype=tf.float64,
                                         sequence_length=X_lengths,
                                         inputs=X)

    result = tf.contrib.learn.run_n({"outputs": outputs, "states":states},
                                    n=1,
                                    feed_dict=None)
    assert result[0]["outputs"].shape == (2, 10, 64)
    print(result[0]["outputs"].shape)
    print(result[0]["states"].h.shape)
    # For each sequence, the output at its last valid time step must equal
    # the returned final state h.
    assert (result[0]["outputs"][0][-1] == result[0]["states"].h[0]).all()
    assert (result[0]["outputs"][-1][5] == result[0]["states"].h[-1]).all()
    # This one fails: index -1 of the 2nd sequence is a padded (all-zero) step.
    assert (result[0]["outputs"][-1][-1] == result[0]["states"].h[-1]).all()

The final assertion fails because the final state of the 2nd sequence is at the 6th time step (index 5), and the remaining outputs of that sequence, at indices 6 to 9, are all zeros.
