
A concept that comes up repeatedly when reading papers is delaying the output of an RNN until after it has seen all of the input. E.g., the Neural Turing Machine paper uses this technique, together with a simple identity function on the input sequence, to gauge how long-term the network's memory is. (To illustrate why the delay is necessary, note that even a memory-less network can implement the identity function if it simply echoes each input element right after seeing it.)

But I have not seen any details on how this technique is implemented. Do they pad the start of the target sequence with some constant marker, and pad the end of the input sequence in the same way? E.g.,

```
Real input:  [1,2,3]
Real output: [1,2,3]

Input fed to the RNN:  [1,2,3,0,0,0]
Target fed to the RNN: [0,0,0,1,2,3]
```
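In code, what I have in mind is something like this minimal sketch (assuming NumPy and `0` as the pad/delay symbol; the paper does not pin down either choice):

```
import numpy as np

def make_delayed_copy_pair(seq, pad_value=0):
    """Build an (input, target) pair for a delayed copy task.

    The input is the sequence followed by len(seq) pad symbols; the
    target is len(seq) pad symbols followed by the sequence, so the
    network only starts emitting the copy after seeing the whole input.
    """
    seq = np.asarray(seq)
    pad = np.full_like(seq, pad_value)
    model_input = np.concatenate([seq, pad])   # e.g. [1, 2, 3, 0, 0, 0]
    model_target = np.concatenate([pad, seq])  # e.g. [0, 0, 0, 1, 2, 3]
    return model_input, model_target

x, y = make_delayed_copy_pair([1, 2, 3])
print(x)  # [1 2 3 0 0 0]
print(y)  # [0 0 0 1 2 3]
```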
  • If `0` is what you're using to pad, then this is generally right—padding is just an implementation detail, really. – Arya McCarthy Nov 28 '21 at 03:04
  • @AryaMcCarthy How will a variable length output be implemented? Should we use an end of sequence marker, and keep feeding the model until it produces that marker, or should we feed the model as many input tokens as needed for the correct length of output, and no more? – HappyFace Nov 28 '21 at 08:17
  • You should use an end-of-sequence marker. – Arya McCarthy Nov 28 '21 at 14:05
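
Following the end-of-sequence suggestion in the comments, the decoding loop I now have in mind is roughly this sketch (`model_step`, `state`, and `eos_token` are hypothetical placeholders for however the RNN exposes a single step; they are not from any specific library):

```
def decode_with_eos(model_step, state, eos_token, max_len=100):
    """Keep stepping the model until it emits the end-of-sequence token
    (or max_len is reached as a safety stop), collecting the outputs."""
    outputs = []
    for _ in range(max_len):
        token, state = model_step(state)
        if token == eos_token:
            break
        outputs.append(token)
    return outputs

# Toy usage with a dummy "model" that counts down and then emits EOS.
EOS = -1

def dummy_step(state):
    return (state, state - 1) if state > 0 else (EOS, state)

print(decode_with_eos(dummy_step, state=3, eos_token=EOS))  # [3, 2, 1]
```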
