I have some data that is gathered once every 5 minutes and I'd like to create an ARIMA(p, d, q) (P, D, Q) m
model for it in order to forecast what might come next.
I have a week's worth of data, being gathered at every 5 minutes, there are 2016 data points in the set.
The data looks like this:
The data is clearly seasonal by day.
The problem is that I have too many points in a day (288) in order to set the m
parameter for ARIMA to 288 (288 = 24*12 = 24hrs in day where each hour contains 12 periods of length = 5 minutes).
Is the value 288 correct for m
? If yes, what can be done about this high value?
The autocorrelation plot, as a whole looks like this:
Now, as I understand from the web, this plot being sinusoidal tells me that I need the nonseasonal part of the ARIMA model to look something like this: (p,d,0)
.
The problem is that when I try to create the Partial Autocorrelation plot like this (in python, using the statsmodels.graphics.tsaplots
package):
plot_pacf(data, lag=50)
pyplot.show()
It takes a very long time, presumably because the data is too frequent?
This would allow me to choose the p
and P
parameter depending of where I had a spike in the data (at the beginning for the nonseasonal part and later in the data, for the seasonal part).
Still, the PACF plot for a single day looks like this:
plot_pacf(data.iloc[0:288])
pyplot.show()
Now this suggests a p
of 2, since it is at lag 2 that I have the last peak.
But from this I cannot infer neither P
, nor D
, since the plot is just a single period.
What should I do to be able to determine the P
and the D
parameters from the complete 7 day data set?