I have monthly historical data from 2017 to 2021, as seen below:
* 2017-01-01: $4000
* 2017-02-01: $7000
* ...
* 2021-11-01: $5000
* 2021-12-01: $1000
I am trying to fit a SARIMA model to this dataset to make predictions into the future.
To tune it, I split my dataset into train, validation, and test sets.
I grid search over (p,d,q)(P,D,Q) on my train set, and for each combination I forecast the same number of steps as the length of my validation set.
For each forecast I calculate the RMSE against my validation set, and then I choose the parameters that give the best combination of AIC (on train) and RMSE (on validation), as sketched below.
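To make the procedure concrete, here is a simplified sketch of what my grid search looks like. I'm using statsmodels' `SARIMAX` and a seasonal period of 12 (since the data is monthly) purely for illustration; `train` and `val` are the splits described above, and "best combination of AIC and RMSE" is simplified here to sorting by RMSE with AIC as a tie-breaker.

```python
import itertools
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

# train and val are pandas Series with a monthly DatetimeIndex
# (assumption: statsmodels' SARIMAX, seasonal period 12 for monthly data)
results = []
for p, d, q in itertools.product(range(3), range(2), range(3)):
    for P, D, Q in itertools.product(range(2), range(2), range(2)):
        try:
            fit = SARIMAX(train,
                          order=(p, d, q),
                          seasonal_order=(P, D, Q, 12)).fit(disp=False)
        except Exception:
            continue  # skip combinations that fail to fit/converge
        # Forecast as many steps as the validation set and score against it
        forecast = fit.forecast(steps=len(val))
        rmse = np.sqrt(np.mean((forecast.values - val.values) ** 2))
        results.append(((p, d, q), (P, D, Q, 12), fit.aic, rmse))

# Simplification of "best combination": lowest validation RMSE, ties broken by AIC
results.sort(key=lambda r: (r[3], r[2]))
best_order, best_seasonal_order, best_aic, best_rmse = results[0]
```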
After finding these parameters, I re-train the SARIMA model on train+validation as the new training set and forecast to compare against the test set.
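Roughly, the re-fit and test comparison look like this (again assuming statsmodels and pandas; `best_order` and `best_seasonal_order` come from the search sketched above):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Re-fit on train + validation with the selected orders, then forecast the
# test horizon and compare against the held-out test set
train_val = pd.concat([train, val])
final_fit = SARIMAX(train_val,
                    order=best_order,
                    seasonal_order=best_seasonal_order).fit(disp=False)

test_forecast = final_fit.forecast(steps=len(test))
test_rmse = np.sqrt(np.mean((test_forecast.values - test.values) ** 2))
print(f"Test RMSE: {test_rmse:.2f}")
```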
Is this methodology considered wrong?
The optimal parameters I find this way do not produce the best (sometimes not even good) results on my test set.
How could this be done differently?