I have a little bit of a conundrum that I am trying to work through.
I'm fitting a time series model with a rolling window of roughly 300 samples and then predicting the next sample.
I fit this on about 7 years of data, and when I'm finished, my overall R2 value is around 0.85. However, when I calculate the R2 value for each individual year within my data, every year is lower than that overall value.
This struck me as odd, since a priori, I would expect the scores for individual years to be a combination of higher and lower and average out to be about the same. But the average yearly score is 0.65.
Furthermore, this pattern seems to be fractal. My monthly scores have an average R2 value that's negative, with a maximum value of 0.65 (my average yearly R2).
My R2 value seems to increase with the length of the window I evaluate over (keep in mind, though, that I'm only ever using 300 data points for a single fit).
Is this behavior typical? Is this a common pathology of the R2 value or is it unique to my data?
For clarification:
- All of these predictions are out-of-sample. The values I'm discussing are the one-step-ahead out-of-sample predictions, both for the whole dataset and for the individual years and months.
- The model being used is a gradient-boosted machine (GBM).
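
Here is a toy sketch of one mechanism that could produce this pattern (an assumption on my part, not my actual data or model): R2's denominator uses the mean of whatever evaluation window you score over, so if the target has slow drift, a pooled window has much more total variance to "explain" than a single year or month does.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy series: a slow trend plus i.i.d. noise, 7 "years" of monthly points.
n_years, per_year = 7, 12
n = n_years * per_year
t = np.arange(n)
trend = 0.1 * t                 # slow drift dominates long-run variance
y = trend + rng.normal(scale=1.0, size=n)
y_hat = trend                   # predictor captures only the trend, not the noise

def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    # Key point: the baseline is the mean of the *evaluation window*.
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

overall = r2(y, y_hat)
yearly = [r2(y[i * per_year:(i + 1) * per_year],
             y_hat[i * per_year:(i + 1) * per_year])
          for i in range(n_years)]

print(f"overall R2:        {overall:.2f}")
print(f"mean yearly R2:    {np.mean(yearly):.2f}")
```

Over the full 7 years the trend contributes most of the variance, so the overall R2 is high; within any single year the trend moves very little, so the same predictions score far lower, and every yearly R2 sits below the pooled figure.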