A common technique in the literature is to delay an RNN's output until after it has seen the entire input. E.g., the neural Turing machine paper uses this technique, together with a simple identity function on the input sequence, to gauge how long-term the network's memory is. (To illustrate why the delay is necessary, note that even a memory-less network can implement the identity function if it outputs each element of the output vector immediately after seeing the corresponding input element.)
But I have not seen any details on how this technique is implemented. Do they pad the start of the target sequence with some constant marker, and pad the end of the input sequence by the same amount? E.g.,
```
Real input: [1,2,3]
Real output: [1,2,3]
Input fed to the RNN: [1,2,3,0,0,0]
Target fed to the RNN: [0,0,0,1,2,3]
```
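To make the question concrete, here is a minimal sketch of the padding scheme I have in mind, assuming a constant pad value of 0 (the function name, pad value, and layout are my own guesses, not something taken from the paper):

```python
import numpy as np

def make_delayed_copy_pair(seq, pad_value=0):
    """Pad the input at the end and the target at the start, so the
    network only has to start emitting after the full input is seen.
    This is a hypothetical construction, not the NTM paper's code."""
    seq = np.asarray(seq)
    pad = np.full(len(seq), pad_value, dtype=seq.dtype)
    x = np.concatenate([seq, pad])  # input:  real sequence, then padding
    y = np.concatenate([pad, seq])  # target: padding, then real sequence
    return x, y

x, y = make_delayed_copy_pair([1, 2, 3])
# x = [1 2 3 0 0 0], y = [0 0 0 1 2 3]
```

This matches the example above, but I don't know whether this is what is actually done, or whether a distinct delimiter symbol (rather than 0) is used to mark the end of the input.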