I've done some thinking about this in a different context and came up with an approach that seemed reasonable intuitively, although I have a compsci rather than stats background.
The motivation for a smaller window size is increased sensitivity to changes in the underlying process from which you are sampling. I'll call this "predictive value".
The motivation for a larger window size is decreased noise due to small sample size. This noise is the standard error of the mean:

standard_deviation(samples_in_window) / sqrt(size(samples_in_window))

On the predictive value side, the relevant quantity is the difference between the mean of all samples and the mean of the samples within the window. Both quantities are expressed in the same units as the samples themselves, so they can be compared directly: the mean shift is the predictive value we expect to gain from the bias, and the standard error is the predictive error we expect due to the variance.
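To make this concrete, here's a minimal Python sketch of the two quantities over a trailing window. The function names and the use of the absolute mean shift are my own framing, not anything canonical:

```python
import math

def predictive_error(samples, window):
    """Noise we expect from a trailing window of this size:
    the standard error of the mean (requires window >= 2)."""
    w = samples[-window:]
    mean = sum(w) / window
    var = sum((x - mean) ** 2 for x in w) / (window - 1)  # sample variance
    return math.sqrt(var) / math.sqrt(window)

def predictive_value(samples, window):
    """Signal we expect to gain from reacting to recent change:
    how far the windowed mean has drifted from the overall mean."""
    overall = sum(samples) / len(samples)
    recent = sum(samples[-window:]) / window
    return abs(recent - overall)
```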
So our task is to select the window size that maximizes predictive accuracy, defined as the predictive value minus the predictive error.
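Done naively, that's just a scan over candidate sizes using the helpers above; min_window=2 is a guard of mine so the sample standard deviation is defined:

```python
def best_window(samples, min_window=2):
    """Pick the trailing window size that maximizes
    predictive value minus predictive error."""
    best_w, best_score = min_window, float("-inf")
    for w in range(min_window, len(samples) + 1):
        score = predictive_value(samples, w) - predictive_error(samples, w)
        if score > best_score:
            best_w, best_score = w, score
    return best_w
```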
This can be implemented quite efficiently with a little thought, in which case the window size could be re-computed every time we receive a new sample, allowing the window to adapt dynamically over time.
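For example (just one way of doing it, untested against real data): keeping prefix sums of x and x² makes the mean and variance of any trailing window an O(1) lookup, so re-selecting the window size after each new sample costs a single O(n) scan rather than the naive O(n²):

```python
import math

class AdaptiveWindow:
    """Incremental version: prefix sums of x and x^2 give the mean and
    variance of any trailing window in O(1), so re-selecting the window
    size after each new sample is one O(n) scan."""

    def __init__(self):
        self.sums = [0.0]     # prefix sums of the samples
        self.sq_sums = [0.0]  # prefix sums of the squared samples

    def add(self, x):
        self.sums.append(self.sums[-1] + x)
        self.sq_sums.append(self.sq_sums[-1] + x * x)

    def best_window(self, min_window=2):
        # assumes at least min_window samples have been added
        n = len(self.sums) - 1  # number of samples seen so far
        overall = self.sums[n] / n
        best_w, best_score = min_window, float("-inf")
        for w in range(min_window, n + 1):
            s = self.sums[n] - self.sums[n - w]
            sq = self.sq_sums[n] - self.sq_sums[n - w]
            mean = s / w
            # population variance from the prefix sums, clamped against
            # floating-point negatives; a simplification of the sample
            # variance used in the naive sketch above
            var = max(sq / w - mean * mean, 0.0)
            score = abs(mean - overall) - math.sqrt(var / w)
            if score > best_score:
                best_w, best_score = w, score
        return best_w
```

Calling best_window() after each add() gives the dynamically adapting behaviour described above. I've swapped in the population variance for the sample variance to keep the prefix-sum algebra simple; for any window size worth choosing, the difference should be negligible.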
Note that this entire approach is nonparametric, so it doesn't just substitute one parameter (window size) for another parameter or parameters.