I have a neural network I am training on some time series data. Naturally I want to mini-batch this data sequentially (keeping observations in time order within each batch) if at all possible.
However, it seems that if the data size isn't a multiple of the mini-batch size, it's recommended to sample from other batches to fill the shortfall (here). That is fine for data without time dependence, but if we sample from other batches to fill in the last "runt" batch, we potentially introduce autocorrelation into the data where it didn't exist before.
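To make that concrete, here is a rough sketch of what I mean (NumPy, toy numbers; the series length and batch size are just placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
series = rng.standard_normal(1000)   # toy time series; 1000 is not a multiple of 64
batch_size = 64

# Sequential mini-batches: the last one is a "runt" of length 1000 % 64 = 40.
batches = [series[i:i + batch_size] for i in range(0, len(series), batch_size)]
runt = batches[-1]

# The suggested fix: top up the runt with points sampled i.i.d. from the rest
# of the series. This is the step where I worry spurious dependence creeps in.
fill = rng.choice(series[:len(series) - len(runt)],
                  size=batch_size - len(runt), replace=False)
batches[-1] = np.concatenate([runt, fill])
```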
For sampling from time series there is block bootstrapping. However, my fear is of course, as mentioned above, that filling the runt batch this way still introduces spurious dependence.
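Continuing the sketch above, a moving block bootstrap fill would look roughly like this (the block length of 8 is chosen arbitrarily); it preserves the autocorrelation within each block, but I am not convinced that stitching blocks together avoids the problem:

```python
def block_bootstrap_fill(series, n_needed, block_len, rng):
    """Draw random contiguous blocks (moving block bootstrap) and concatenate
    them until at least n_needed points are collected, then truncate."""
    pieces = []
    total = 0
    while total < n_needed:
        start = rng.integers(0, len(series) - block_len + 1)
        pieces.append(series[start:start + block_len])
        total += block_len
    return np.concatenate(pieces)[:n_needed]

# Top up the runt batch with block-bootstrapped points instead of i.i.d. draws.
batches[-1] = np.concatenate(
    [runt, block_bootstrap_fill(series, batch_size - len(runt), block_len=8, rng=rng)]
)
```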
In this case, would it be better to train the model on all of the data at once and accept the performance penalty? Or should I only use batch sizes that divide the data evenly?