I have a little bit of a conundrum that I am trying to work through.

I'm fitting a time series model with a rolling window of roughly 300 samples and then predicting the next sample.

I fit this on about 7 years of data, and when I'm finished, my overall R2 is around 0.85. However, when I calculate the R2 for each individual year within my data, every year is lower than that overall value.

This struck me as odd: a priori, I would expect the individual-year scores to be a mix of higher and lower values that average out to roughly the overall score. Instead, the average yearly score is 0.65.
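One thing worth writing out is that R2 is always computed against the mean of the evaluation subset itself. For a subset $S$ of predictions,

$$R^2_S = 1 - \frac{\sum_{t \in S} (y_t - \hat{y}_t)^2}{\sum_{t \in S} (y_t - \bar{y}_S)^2},$$

so shrinking $S$ to a single year or month shrinks the denominator (the total variance within that window) while the per-point prediction errors in the numerator stay about the same.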

Furthermore, this pattern seems to be fractal. My monthly scores have an average R2 that's negative, with a maximum value of 0.65 (my average yearly R2).

My R2 seems to increase with the length of the evaluation window (keep in mind, though, that I'm only ever using 300 data points for a single fit).

Is this behavior typical? Is this a common pathology of the R2 value or is it unique to my data?
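To make the pattern reproducible, here is a toy example. It is not my real data or model: it uses simulated data with a slow trend and a naive persistence forecast standing in for the rolling GBM, but it shows the same behavior, with a high overall R2, lower yearly averages, and a negative monthly average:

```python
import numpy as np

def r2(y, yhat):
    """R2 computed against the mean of the evaluation subset itself."""
    return 1 - np.sum((y - yhat) ** 2) / np.sum((y - np.mean(y)) ** 2)

rng = np.random.default_rng(0)
n_years, days_per_year = 7, 365
n = n_years * days_per_year
# Slowly trending series: the trend dominates the pooled variance.
y = 0.01 * np.arange(n) + rng.normal(0, 1.0, n)

# Naive one-step-ahead "prediction": yesterday's value.
yhat = y[:-1]
y_eval = y[1:]

overall = r2(y_eval, yhat)
yearly = [r2(a, b) for a, b in zip(np.array_split(y_eval, n_years),
                                   np.array_split(yhat, n_years))]
monthly = [r2(a, b) for a, b in zip(np.array_split(y_eval, n_years * 12),
                                    np.array_split(yhat, n_years * 12))]

print(f"overall R2:      {overall:.2f}")          # high
print(f"mean yearly R2:  {np.mean(yearly):.2f}")  # lower
print(f"mean monthly R2: {np.mean(monthly):.2f}") # negative
```

Within a single month the trend barely moves, so the within-month variance is close to the noise variance alone, while the one-step error variance is unchanged, which pushes the monthly R2 below zero.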

For clarification:

  • All of these predictions are out-of-sample. The values I'm discussing are the one-step-ahead out-of-sample predictions, for both the whole dataset and the individual years and months.
  • Model being used is a GBM
Comments:

  • See [Simpson's Paradox](https://en.wikipedia.org/wiki/Simpson%27s_paradox). It should become clear if you plot your time series. – jkpate Nov 23 '21 at 12:05
  • Do we talk about in-sample R2 or out of sample? Do you fit by OLS? With an intercept (in which case there should never be a negative R2)? – Christoph Hanck Nov 23 '21 at 14:06
  • My answer in the duplicate thread includes this very example to illustrate why this use of $R^2$ is problematic. – whuber Nov 23 '21 at 16:37
