
I have a neural network I am training on some time series data. Naturally, I want to mini-batch this data sequentially if at all possible.

However, it seems that if the data size isn't a multiple of the mini-batch size, the recommendation is to sample from other batches to fill out the remainder (here). That works for data without time dependence, but if we sample from other batches to fill in the last "runt" batch, we potentially introduce autocorrelation into the data where it didn't exist before.
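To make the concern concrete, here is a rough sketch of what that recommendation amounts to; the function name and the numpy details are mine, not from the linked answer, and it is exactly this random filling that could disturb the temporal ordering:

```python
import numpy as np

def pad_runt_batch(batches, batch_size, rng=None):
    """If the last batch is short, top it up with samples drawn at random
    from the earlier batches (the i.i.d.-friendly recommendation)."""
    rng = np.random.default_rng() if rng is None else rng
    last = batches[-1]
    shortfall = batch_size - len(last)
    if shortfall > 0:
        pool = np.concatenate(batches[:-1])          # all other samples
        filler = pool[rng.integers(0, len(pool), size=shortfall)]
        batches[-1] = np.concatenate([last, filler])  # order is now mixed
    return batches
```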

For sampling from time series we have block bootstrapping. However, my fear is, as mentioned above, that this introduces erroneous dependency.
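For reference, a minimal numpy sketch of the moving block bootstrap; the block length and names here are illustrative assumptions, not part of the question:

```python
import numpy as np

def moving_block_bootstrap(series, block_len, rng=None):
    """Resample a 1-D series by drawing overlapping blocks of length
    `block_len` with replacement and concatenating them until the
    resample matches the original length."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(series)
    n_blocks = int(np.ceil(n / block_len))
    starts = rng.integers(0, n - block_len + 1, size=n_blocks)  # valid block starts
    blocks = [series[s:s + block_len] for s in starts]
    return np.concatenate(blocks)[:n]

# Example on a toy random-walk series
x = np.cumsum(np.random.default_rng(0).normal(size=500))
x_boot = moving_block_bootstrap(x, block_len=50)
```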

In this case would it be better to train the model using all of the data at once and accept the performance penalty? Or only train using batches that divide the data size evenly?

1 Answer


I'm going to propose a potential answer here, though I don't know enough to validate whether it's actually a viable solution.

Since we're trying to capture batches of time series data, we could consider each batch a "window" into the time series and select a batch size large enough to capture the majority of the autocorrelation within a window. Say we have weather data and the autocorrelation seems to decay after roughly the last 100 days; we could make our batch size 100 and, at each iteration, sample windows of 100 at random from the full time series. You can choose that window length by examining an ACF plot and seeing where the correlation tails off.
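A hedged sketch of that window-length choice, using a plain numpy ACF estimate and an arbitrary 0.1 cutoff (both are my assumptions, not established defaults):

```python
import numpy as np

def acf(series, max_lag):
    """Sample autocorrelation for lags 1..max_lag."""
    x = np.asarray(series, dtype=float) - np.mean(series)
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom for k in range(1, max_lag + 1)])

def pick_window(series, max_lag=365, threshold=0.1):
    """First lag at which |ACF| drops below `threshold`; use it as the window/batch size."""
    rho = acf(series, max_lag)
    below = np.where(np.abs(rho) < threshold)[0]
    return int(below[0]) + 1 if below.size else max_lag
```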

This gives every window of that length an equal probability of being drawn, and since each epoch draws from the same data, we will likely see many of these "windows" multiple times as we train.
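Something like the following sampler would realize that scheme; `model.train_step` is a placeholder for whatever training call you actually use:

```python
import numpy as np

def sample_windows(series, window, n_batches, rng=None):
    """Yield contiguous windows of length `window`, with the start index
    drawn uniformly at random, so every window is equally likely."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(series)
    for _ in range(n_batches):
        start = rng.integers(0, n - window + 1)
        yield series[start:start + window]

# Example: feed 100-step windows to a model, one per iteration
# for batch in sample_windows(x, window=100, n_batches=1000):
#     model.train_step(batch)   # hypothetical training call
```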

I feel like this works, but I don't have the tools at my disposal to prove it.

Can someone who is much better at this sanity check this idea for me?