I have monthly historical data from 2017 to 2021, as seen below:
* 2017-01-01: $4000
* 2017-02-01: $7000
* ...
* 2021-11-01: $5000
* 2021-12-01: $1000
I am trying to fit a SARIMA model to this dataset to make predictions into the future.
To tune it, I split my dataset into train, validation, and test sets.
I grid search over (p,d,q)(P,D,Q) on my train set, and for each combination I forecast the same number of steps as the length of my validation set.
For each forecast I calculate the RMSE against my validation set, and then I choose the parameters that give the best combination of AIC (on train) and RMSE (on validation), as sketched below.
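To make the procedure concrete, here is a simplified sketch of what my grid search looks like. I'm using statsmodels' `SARIMAX` and a seasonal period of 12 (since the data is monthly) purely for illustration; `train` and `val` are the splits described above, and "best combination of AIC and RMSE" is simplified here to sorting by RMSE with AIC as a tie-breaker.

```python
import itertools
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

# train and val are pandas Series with a monthly DatetimeIndex
# (assumption: statsmodels' SARIMAX, seasonal period 12 for monthly data)
results = []
for p, d, q in itertools.product(range(3), range(2), range(3)):
    for P, D, Q in itertools.product(range(2), range(2), range(2)):
        try:
            fit = SARIMAX(train,
                          order=(p, d, q),
                          seasonal_order=(P, D, Q, 12)).fit(disp=False)
        except Exception:
            continue  # skip combinations that fail to fit/converge
        # Forecast as many steps as the validation set and score against it
        forecast = fit.forecast(steps=len(val))
        rmse = np.sqrt(np.mean((forecast.values - val.values) ** 2))
        results.append(((p, d, q), (P, D, Q, 12), fit.aic, rmse))

# Simplification of "best combination": lowest validation RMSE, ties broken by AIC
results.sort(key=lambda r: (r[3], r[2]))
best_order, best_seasonal_order, best_aic, best_rmse = results[0]
```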
After finding these parameters, I re-train the SARIMA model on train+validation as the new training set and forecast to compare against the test set.
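Roughly, the re-fit and test comparison look like this (again assuming statsmodels and pandas; `best_order` and `best_seasonal_order` come from the search sketched above):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Re-fit on train + validation with the selected orders, then forecast the
# test horizon and compare against the held-out test set
train_val = pd.concat([train, val])
final_fit = SARIMAX(train_val,
                    order=best_order,
                    seasonal_order=best_seasonal_order).fit(disp=False)

test_forecast = final_fit.forecast(steps=len(test))
test_rmse = np.sqrt(np.mean((test_forecast.values - test.values) ** 2))
print(f"Test RMSE: {test_rmse:.2f}")
```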
Is this methodology considered wrong?
The optimal parameters I find this way do not produce the best (sometimes not even good) results on my test set.
How could this be done differently?