I have four months of hourly summer cooling data, from May 2016 to August 2016. Cooling is high during standard business hours (08:00 to 21:00) on weekdays and low during non-business hours and on weekends.
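To make the pattern described above concrete, here is a minimal sketch in base R of a weekday business-hour indicator. The helper name is_business_hour and the toy hourly index are assumptions for illustration, not part of my actual data frame, and I am assuming the business window covers hours 08 through 20 (ending at 21:00).

```r
# Hypothetical helper: 1 for weekday business hours, 0 otherwise.
# Assumes the window runs from 08:00 up to (but not including) 21:00.
is_business_hour <- function(timestamp) {
  hour <- as.integer(format(timestamp, "%H"))
  wday <- as.integer(format(timestamp, "%u"))  # 1 = Monday ... 7 = Sunday
  as.integer(wday <= 5 & hour >= 8 & hour < 21)
}

# Toy hourly index: one full week starting Monday 2016-05-02 (UTC)
ts_index <- seq(as.POSIXct("2016-05-02 00:00", tz = "UTC"),
                by = "hour", length.out = 168)
business <- is_business_hour(ts_index)

# Count of business vs non-business hours in the toy week
table(business)
```

A column like this could in principle be bound into the regressor matrix alongside the day-of-week dummies, but I have not tested whether it helps.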
I also have some time-dependent predictor variables, which I passed as xreg to an auto.arima model. I used the first three months as the training set and forecast the fourth month.
However, my predictions are far from the actual values. I saw a post from Dr. Rob Hyndman suggesting that a tbats model handles multiple seasonality well, but unfortunately tbats does not let me include my xreg. Any ideas on how to tackle this problem?
So far, I have something like the following. I set the frequency to 24 and used auto.arima to find (p,d,q).
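library(forecast)

# First three months (92 days x 24 h = 2208 rows) for training, the rest for testing
train_df <- df[1:2208, ]
test_df <- df[2209:2902, ]

# Hourly series with a daily seasonal period
cooling <- ts(train_df$volume, frequency = 24)

# External regressors: day-of-week dummies, two temperatures, humidity
trainreg <- cbind(Weekday = model.matrix(~as.factor(train_df$dayofweek)),
                  temp1 = train_df$firsttemp, temp2 = train_df$secondtemp,
                  humidity = train_df$gghhumidity)
testreg <- cbind(Weekday = model.matrix(~as.factor(test_df$dayofweek)),
                 temp1 = test_df$firsttemp, temp2 = test_df$secondtemp,
                 humidity = test_df$gghhumidity)

arimafit <- auto.arima(cooling, xreg = trainreg, stepwise = FALSE,
                       approximation = FALSE, seasonal = TRUE)

# Horizon taken from the test regressors so that it matches their row count
firstcast <- forecast(arimafit, h = nrow(testreg), xreg = testreg)
firstpred <- predict(arimafit, n.ahead = nrow(testreg), newxreg = testreg)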
auto.arima selected the following model:
Series: cooling
Regression with ARIMA(2,0,2)(1,0,0)[24] errors
When I compared the forecasts with the test set's cooling values, they were far off.
The picture below shows the strong seasonality in the data.
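To put a number on "far off", here is a minimal sketch in base R of how the hold-out errors could be summarised. The helper forecast_errors and the toy vectors are hypothetical and only illustrate the calculation; they are not my real results.

```r
# Hypothetical helper: summarise forecast errors on a hold-out set.
forecast_errors <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  err <- actual - predicted
  c(ME = mean(err),              # mean error (bias)
    MAE = mean(abs(err)),        # mean absolute error
    RMSE = sqrt(mean(err^2)))    # root mean squared error
}

# Toy numbers only, not taken from the cooling data
actual <- c(10, 12, 15, 11)
predicted <- c(9, 13, 13, 11)
res <- forecast_errors(actual, predicted)
res
```

With the objects above, the equivalent call would be something like forecast_errors(test_df$volume, as.numeric(firstcast$mean)), assuming both have the same length. The forecast package's accuracy() function reports similar measures.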
Any ideas or help to improve the results will be appreciated. Thanks!