I've been using ARIMA modelling to predict the number of orders a business receives. I have three years of data, and the time series shows a strong (though uneven) upward trend with increasing variance over time. I can't share the data due to confidentiality, but here's a mockup of what it looks like over the three years:[![An example of what my data looks like][1]][1]
So, the business is made up of offices in different cities. I'm using vector autoregressions to predict the order volumes in each city, then summing them up to get the total predicted order volume. For City 1, the endogenous variables are its own past order volumes - I've included AR(1) and AR(52) terms. The exogenous variables are the lag-1 and lag-52 order volumes in Cities 2 to 10, plus a few dummy variables for month, year, public holidays, etc. I'm not using the VAR package in R - I'm manually running linear regressions to get the predicted values for the test set in each city, so that I have more control over the outputs and the method.
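To make the setup concrete, here's a simplified sketch of the kind of per-city regression I'm fitting (the data frame and column names are placeholders, since I can't share the real data):

```r
# Illustrative only: `orders` is a weekly data frame (one row per week) with
# order volumes in columns city1 ... city10 plus month/year/holiday dummies.
lag_n <- function(x, k) c(rep(NA, k), head(x, -k))   # simple lag helper

dat <- transform(
  orders,
  city1_lag1 = lag_n(city1, 1), city1_lag52 = lag_n(city1, 52),
  city2_lag1 = lag_n(city2, 1), city2_lag52 = lag_n(city2, 52)
  # ... and the same lag-1 / lag-52 terms for cities 3 to 10
)

# Regression for City 1: its own AR(1)/AR(52) terms, the lagged volumes of the
# other cities as "exogenous" regressors, and the calendar dummies.
fit_city1 <- lm(
  city1 ~ city1_lag1 + city1_lag52 + city2_lag1 + city2_lag52 +
          month_dummy + year_dummy + holiday_dummy,
  data = dat
)

pred_city1 <- predict(fit_city1, newdata = test_dat)   # repeat per city, then sum
```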
I have logged and differenced the data to make it stationary. When I ran the model, I got an overall MAPE of 23% on my predictions, as well as odd swings in predicted values. See below:
[![Actual vs. Predicted][2]][2]
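For context, the transform and the way I get the predictions back onto the original scale look roughly like this (placeholder names again; `pred_d` stands for the predicted differences coming out of the regression):

```r
y <- orders$city1        # weekly order volumes for one city (placeholder)

log_y <- log(y)          # log to stabilise the growing variance
d_y   <- diff(log_y)     # first difference to remove the trend

# ... fit the regression on d_y and get predicted differences `pred_d`
# for the test weeks (hypothetical object) ...

# Back-transform: cumulatively add the predicted differences to the last
# observed log value, then exponentiate to return to the original scale.
pred_log <- tail(log_y, 1) + cumsum(pred_d)
pred_y   <- exp(pred_log)
```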
I then tried running the model on the actual orders (i.e. no logging or differencing). My MAPE was 12%, and there were no swings in the predictions. I got similar results for every country I ran the model on - the MAPEs almost halved when I didn't log and/or difference the data. If it helps, I ran a KPSS test on the residuals of this model, and it does not reject stationarity around a constant mean. See below:
[![Actual vs. Predicted][3]][3]
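The stationarity check on the residuals was just the standard KPSS level test from the tseries package, along these lines:

```r
library(tseries)

# Residuals from the model fitted on the raw (unlogged, undifferenced) orders.
resid_city1 <- residuals(fit_city1)

# KPSS null hypothesis: the series is stationary around a constant mean (level).
# A large p-value means we fail to reject level stationarity.
kpss.test(resid_city1, null = "Level")
```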
I have a few questions:
1) Should I implement my model without accounting for stationarity since my predictions and MAPEs are so much better?
2) Is there a better way to account for trend and seasonality, since my data isn't a great example of cyclical data with a constant trend?
Any advice would be useful!