I've done some thinking about this in a different context and came up with an approach that seemed reasonable intuitively, although I have a compsci rather than stats background.
The motivation for a smaller window size is increased sensitivity to changes in the underlying process from which you are sampling. I'll call this "predictive value".
The motivation for a larger window size is decreased noise due to small sample size. This noise is the standard error of the mean:

standard_deviation(samples_in_window) / sqrt(size(samples_in_window))

On the predictive value side, the relevant quantity is the difference between the mean of all samples and the mean of the samples within the window. Both quantities are expressed in the same units as the samples themselves, so they can be compared directly: the mean shift is the predictive value we expect to gain from the bias, and the standard error is the predictive error we expect due to the variance.
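To make this concrete, here's a minimal Python sketch of the two quantities over a trailing window. The function names and the use of the absolute mean shift are my own framing, not anything canonical:

```python
import math

def predictive_error(samples, window):
    """Noise we expect from a trailing window of this size:
    the standard error of the mean (requires window >= 2)."""
    w = samples[-window:]
    mean = sum(w) / window
    var = sum((x - mean) ** 2 for x in w) / (window - 1)  # sample variance
    return math.sqrt(var) / math.sqrt(window)

def predictive_value(samples, window):
    """Signal we expect to gain from reacting to recent change:
    how far the windowed mean has drifted from the overall mean."""
    overall = sum(samples) / len(samples)
    recent = sum(samples[-window:]) / window
    return abs(recent - overall)
```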
So our task is to select the window size that maximizes predictive accuracy, defined as the predictive value minus the predictive error.
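Done naively, that's just a scan over candidate sizes using the helpers above; min_window=2 is a guard of mine so the sample standard deviation is defined:

```python
def best_window(samples, min_window=2):
    """Pick the trailing window size that maximizes
    predictive value minus predictive error."""
    best_w, best_score = min_window, float("-inf")
    for w in range(min_window, len(samples) + 1):
        score = predictive_value(samples, w) - predictive_error(samples, w)
        if score > best_score:
            best_w, best_score = w, score
    return best_w
```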
This can be implemented quite efficiently with a little thought, in which case the window size could be re-computed every time we receive a new sample, allowing the window to adapt dynamically over time.
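For example (just one way of doing it, untested against real data): keeping prefix sums of x and x² makes the mean and variance of any trailing window an O(1) lookup, so re-selecting the window size after each new sample costs a single O(n) scan rather than the naive O(n²):

```python
import math

class AdaptiveWindow:
    """Incremental version: prefix sums of x and x^2 give the mean and
    variance of any trailing window in O(1), so re-selecting the window
    size after each new sample is one O(n) scan."""

    def __init__(self):
        self.sums = [0.0]     # prefix sums of the samples
        self.sq_sums = [0.0]  # prefix sums of the squared samples

    def add(self, x):
        self.sums.append(self.sums[-1] + x)
        self.sq_sums.append(self.sq_sums[-1] + x * x)

    def best_window(self, min_window=2):
        # assumes at least min_window samples have been added
        n = len(self.sums) - 1  # number of samples seen so far
        overall = self.sums[n] / n
        best_w, best_score = min_window, float("-inf")
        for w in range(min_window, n + 1):
            s = self.sums[n] - self.sums[n - w]
            sq = self.sq_sums[n] - self.sq_sums[n - w]
            mean = s / w
            # population variance from the prefix sums, clamped against
            # floating-point negatives; a simplification of the sample
            # variance used in the naive sketch above
            var = max(sq / w - mean * mean, 0.0)
            score = abs(mean - overall) - math.sqrt(var / w)
            if score > best_score:
                best_w, best_score = w, score
        return best_w
```

Calling best_window() after each add() gives the dynamically adapting behaviour described above. I've swapped in the population variance for the sample variance to keep the prefix-sum algebra simple; for any window size worth choosing, the difference should be negligible.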
Note that this entire approach is nonparametric, so it doesn't just substitute one parameter (window size) for another parameter or parameters.