Forecasting High-Frequency, Multi-Seasonal Data with External Regressors in ARIMA

Question

I have hourly summer cooling data for four months, from May 2016 to August 2016. In my data, cooling is high during standard business hours (08:00 to 21:00) on weekdays and low during non-business hours and on weekends.

I have some predictor variables that are also time dependent, which I used as xreg in an auto.arima model. I used the first three months as my training set and produced a forecast for the fourth month.

However, my predictions are way off from the actual values. I saw a post from Dr. Rob Hyndman suggesting that a tbats model would handle multiple seasonality well; unfortunately, I cannot include my xreg in it. Any ideas on how to tackle this problem?

So far, I have something like this: I set the frequency to 24 and used auto.arima to find the (p,d,q).

library(forecast)

# First three months (2208 hourly rows) for training; the remainder for testing
train_df <- df[1:2208,]
test_df <- df[2209:2902,]

cooling <- ts(train_df$volume, frequency = 24)

trainreg <- cbind(Weekday=model.matrix(~as.factor(train_df$dayofweek)),
            temp1 = train_df$firsttemp, temp2 = train_df$secondtemp,
            humidity = train_df$gghhumidity)

testreg <- cbind(Weekday=model.matrix(~as.factor(test_df$dayofweek)),
           temp1 = test_df$firsttemp, temp2 = test_df$secondtemp,
           humidity = test_df$gghhumidity)

arimafit <- auto.arima(cooling, xreg = trainreg, stepwise = FALSE, approximation = FALSE, seasonal = TRUE)

# With xreg supplied, the forecast horizon is the number of rows in testreg
firstcast <- forecast(arimafit, h = nrow(testreg), xreg = testreg)

# predict() takes n.ahead rather than h
firstpred <- predict(arimafit, n.ahead = nrow(testreg), newxreg = testreg)

I got the following ARIMA model:

Series: cooling 
Regression with ARIMA(2,0,2)(1,0,0)[24] errors

When I compared the results, the forecasts were way off from the test set's cooling values.

The picture below shows the high-frequency seasonality in the data.

Forecasted values:

Any ideas or help to improve the results will be appreciated. Thanks!

@IrishStat Sorry, unfortunately, I cannot post the data. Quick question: is my approach correct? — i.n.n.m, Jun 26 '17 at 21:26
Then code the data to mask it. I have seen data before where the within-day structure differs between weekdays and weekends, suggesting a mixed-frequency approach. If you can't or won't post similar data, be it simulated or simply double scaled, I won't be able to illustrate/demonstrate what might be a useful solution. Your call! Just think of posting the data as the price of poker. Whether or not my solution may be useful for you is judged/evaluated based upon actual practice and not hand waving. — IrishStat, Jun 26 '17 at 21:32
@IrishStat I have a modified reusable version of my data that I can share. How can I post it here? — i.n.n.m, Jun 26 '17 at 21:55
There are ways, but I am having a senior moment and I forget precisely how to do that. I think lockbox was the tool of choice. At a minimum you can email it to me. — IrishStat, Jun 26 '17 at 22:00

IrishStat · Accepted Answer · 2017-06-30T10:29:15.010

I took your data, including three stochastic predictors, into AUTOBOX (a piece of software that is available in R which I have helped develop). My intent here was to provide some top-level guidance for you and the list, which is why I wanted some dummy (coded) data to prove the point that something practical could result. Following is a picture of the actual values and forecasts. I used the most recent 100 days, some 2400 observations, and predicted out 7 days (168 values).

The issue here is to

1. identify the hour-of-the-day effects and the day-of-the-week effects. Longer series would facilitate conditioning for holiday effects etc.
2. identify the form of the relationship between the dependent series and the three candidate input series, i.e. contemporaneous/lagged etc.

while

3. identifying and remedying unusual data points that would distort 1 and 2
4. identifying any time trends or level shifts which, left untreated, would distort 1 and 2

ARIMA modelling (univariate Box-Jenkins) is of little or no value when dealing with hourly/daily data full of holiday/weekend/economic activity. Transfer Functions, i.e. regression on steroids, is the play, delivering assignable cause to hours of the day, day-of-the-week, week-of-the-month, days-of-the-month, long weekends et al. All of the "rear-window driving mechanisms" are anachronistic (that's a pun!).
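As a rough illustration of the idea (not AUTOBOX's procedure), the sketch below fits a regression with AR(1) errors in base R, using hour-of-day and day-of-week dummies plus a contemporaneous and a one-hour-lagged input. Everything in it is an assumption made for illustration: the data are simulated, and the names `temp`, `temp_lag1`, `hour`, `dow` and the chosen coefficients are hypothetical. Note that forecasting requires future values of every input series.

```r
set.seed(42)

n_days <- 35                      # synthetic: five weeks of hourly data
n      <- n_days * 24
hour   <- rep(0:23, times = n_days)
dow    <- rep(rep(1:7, each = 24), length.out = n)  # assumed coding: 1-5 weekdays, 6-7 weekend

# Synthetic input series and its one-hour lag
temp      <- 25 + 5 * sin(2 * pi * (hour - 9) / 24) + rnorm(n)
temp_lag1 <- c(NA, temp[-n])

# Synthetic response: additive hour and day effects, contemporaneous and
# lagged input effects, plus AR(1) noise
hour_effect <- ifelse(hour >= 8 & hour < 21, 30, 5)
dow_effect  <- ifelse(dow <= 5, 10, 0)
noise       <- as.numeric(arima.sim(list(ar = 0.6), n = n))
y           <- hour_effect + dow_effect + 1.5 * temp + 0.8 * temp_lag1 + noise

# Deterministic dummies (intercept column dropped; arima() adds its own mean)
dummies <- model.matrix(~ factor(hour) + factor(dow))[, -1]
xreg    <- cbind(dummies, temp = temp, temp_lag1 = temp_lag1)

# Drop the first row (lag undefined) and hold out the final 24 hours
idx_train <- 2:(n - 24)
idx_test  <- (n - 23):n

fit <- arima(y[idx_train], order = c(1, 0, 0),
             xreg = xreg[idx_train, ], method = "ML")

# Forecasting needs the future values of all regressors
fc   <- predict(fit, n.ahead = 24, newxreg = xreg[idx_test, ])
rmse <- sqrt(mean((y[idx_test] - as.numeric(fc$pred))^2))

print(round(coef(fit)[c("ar1", "temp", "temp_lag1")], 2))
print(rmse)
```

With real data the lag structure is unknown, so the lags to include (here a single one-hour lag) would have to be identified rather than assumed.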

I am unaware of any tools that you might have access to that comprehensively solve the problem, so you may have no recourse but to write your own procedures: intervention detection is broadly unavailable with causal series, while identifying the presence of time trends is still rare. See "Decompose a time series data into deterministic trend and stochastic trend".
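For readers who do write their own procedures, a very simple starting point is a robust residual screen for pulse-type outliers. The sketch below is a deliberately crude stand-in and not the intervention detection described in this answer: it uses simulated data, a per-hour median as the baseline, and an arbitrary cutoff of 5 robust standard deviations, all of which are assumptions. It does not detect level shifts or time trends.

```r
set.seed(7)

n_days <- 20
hour   <- rep(0:23, times = n_days)
n      <- length(hour)

# Synthetic hourly series: daily profile plus noise, with two injected pulses
y        <- 20 + 10 * (hour >= 8 & hour < 21) + rnorm(n)
pulse_at <- c(100, 250)           # hypothetical positions of unusual points
y[pulse_at] <- y[pulse_at] + 15

# Robust baseline: the median for each hour of the day
baseline <- ave(y, hour, FUN = median)
r        <- y - baseline

# Robust z-score; mad() is scaled to be comparable to a standard deviation
z       <- (r - median(r)) / mad(r)
flagged <- which(abs(z) > 5)
print(flagged)

# One way to neutralise flagged points: give each its own pulse dummy,
# which can then be added as extra columns of a regressor matrix
pulse_dummies <- sapply(flagged, function(i) as.numeric(seq_len(n) == i))
print(dim(pulse_dummies))
```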

Here is the model (with some coefficients masked for confidentiality reasons), and here and here.

The plot of the model residuals seems quite correct, suggesting that we have successfully separated signal and noise.

I hope this helps you and motivates you to independently perform a similar analysis.

In closing, the future expectations for the three candidate supporting series for the next 168 periods are VERY CRITICAL in providing the expectation of Y.

EDITED AFTER INVESTIGATING ANOTHER OPTION IN AUTOBOX:

Here is the actual/fit and forecast ... much improved

with forecasts for the next 21 days here

Thank you for taking the time to look at the question. Due to the high frequency and multi-seasonality, ARIMA did not work well. I have two questions about your answer. I see you have done a D-W statistic test; is it something that can be done in R for the ARIMA model I came up with? Also, I am in the process of trying out `Ordinary Least Squares` and `Support Vector Regression`. However, you mentioned the transfer function; is that something that combines `ARIMA` and a different regression model? Thank you! — i.n.n.m, Jun 27 '17 at 18:00
The DW test is just window dressing and mostly optional. OLS is insufficient because it would be tarnished by the time trends, level shifts, pulses, etc. Identification of the form of the contemporaneous and lagged effects would be mostly ineffective. — IrishStat, Jun 27 '17 at 20:19
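On the narrower question of whether the D-W statistic can be computed in R: it can be calculated directly from a vector of residuals. The sketch below is a minimal base-R version on simulated series; the helper name `durbin_watson` is hypothetical. Values near 2 indicate little lag-1 autocorrelation, and values well below 2 indicate positive autocorrelation. The Ljung-Box test from `stats::Box.test` is shown alongside as a check over more lags.

```r
set.seed(123)

# Durbin-Watson statistic: sum of squared successive differences of the
# residuals divided by their sum of squares
durbin_watson <- function(e) {
  e <- as.numeric(e)
  sum(diff(e)^2) / sum(e^2)
}

white <- rnorm(500)                                      # uncorrelated residuals
ar1   <- as.numeric(arima.sim(list(ar = 0.7), n = 500))  # autocorrelated residuals

dw_white <- durbin_watson(white)
dw_ar1   <- durbin_watson(ar1)
print(round(c(white = dw_white, ar1 = dw_ar1), 2))

# Ljung-Box test across the first 24 lags (one day of hourly data)
lb <- Box.test(ar1, lag = 24, type = "Ljung-Box")
print(lb$p.value)
```

For a fitted model such as the `arimafit` object in the question, the same helper could be applied as `durbin_watson(residuals(arimafit))`.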
