What values of ARIMA(p,d,q)(P,D,Q)[7] should I use?

Question

I am working with data on the number of customers visiting a clinic for an X-ray scan each day, covering the last 4 years, and I am building a time series model to predict the daily number of customers. On a typical weekday there are around a hundred customers. On Saturdays there are roughly 30-50 customers, and on Sundays there are mostly no customers, or fewer than 10. I have split the data into training and test sets.

Below is a plot of the raw data.

The data clearly does not look stationary. I also used the ADF and KPSS tests to check whether it is stationary.

```
adf.test(train_data)

Augmented Dickey-Fuller Test

data: ts_beverly_train
Dickey-Fuller = -8.0101, Lag order = 10, p-value = 0.01
alternative hypothesis: stationary
```

```
kpss.test(ts_beverly_train)

KPSS Test for Level Stationarity

data: ts_beverly_train
KPSS Level = 0.28099, Truncation lag parameter = 7, p-value = 0.1
```

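Even though both tests indicate that the series is stationary (the ADF test rejects a unit root with p-value = 0.01, and the KPSS test does not reject level stationarity with p-value = 0.1), the plot does not look stationary. So I tried to make the data stationary by differencing.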
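A minimal sketch contrasting the two kinds of differencing in base R, on simulated data only: the weekday means (100 Monday to Friday, 40 on Saturday, 5 on Sunday) and the Monday start are hypothetical values loosely based on the counts described above, not the clinic data. A lag-1 difference leaves the weekly profile in place, while a lag-7 (seasonal) difference removes it.

```r
set.seed(1)
n <- 4 * 365                                    # roughly four years of daily data
dow_mean <- c(100, 100, 100, 100, 100, 40, 5)   # hypothetical Mon..Sun means
dow <- rep(1:7, length.out = n)                 # day-of-week index; series assumed to start on a Monday
y <- ts(rpois(n, lambda = dow_mean[dow]), frequency = 7)

d1 <- diff(y, lag = 1)   # ordinary first difference (d = 1)
d7 <- diff(y, lag = 7)   # seasonal difference at lag 7 (D = 1, period 7)

# Average change by position in the week: d1 keeps a strong weekly profile
# (drops into Saturday and Sunday, a large jump back on Monday)
round(tapply(as.numeric(d1), cycle(d1), mean), 1)

# After lag-7 differencing the weekly profile is gone, so the spread is much smaller
c(sd_d1 = sd(d1), sd_d7 = sd(d7))
```

In other words, a series can pass ADF/KPSS after first differencing and still carry a pronounced weekly pattern, which is what the seasonal part of the model then has to handle.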

After first differencing, the data looks stationary. I confirmed this using the ADF and KPSS tests.

```
adf.test(ts_volume_data2_diff1)

Augmented Dickey-Fuller Test

data: ts_volume_data2_diff1
Dickey-Fuller = -14.981, Lag order = 10, p-value = 0.01
alternative hypothesis: stationary
```

Next, I plotted the ACF and PACF of the first-differenced series.

The ACF shows a spike at every 7th lag (7, 14, 21, ...) because of the weekly seasonality. To capture this seasonality, I want to fit a seasonal ARIMA model.
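One hedged way to narrow down the seasonal ARIMA orders is to fit a few candidates and compare them by AIC, keeping d and D the same across candidates, since AIC values are not comparable between models with different orders of differencing. The sketch below uses base R's `stats::arima` on a simulated series (an ARMA(1,1) process integrated at lag 7; all parameters and starting values are hypothetical). The candidate orders are illustrative, not a recommendation for the clinic data.

```r
set.seed(1)
n <- 4 * 365
# Hypothetical data-generating process: ARMA(1,1) noise integrated at lag 7,
# i.e. a true ARIMA(1,0,1)(0,1,0)[7]. Unlike real counts, it is not bounded below by 0.
z <- arima.sim(model = list(ar = 0.6, ma = 0.3), n = n - 7, sd = 5)
y <- ts(diffinv(as.numeric(z), lag = 7, xi = c(100, 100, 100, 100, 100, 40, 5)),
        frequency = 7)

# Candidate (p,d,q)(P,D,Q)[7] orders; d = 0 and D = 1 in all of them so AICs are comparable
candidates <- list(
  "(1,0,1)(0,1,0)" = list(order = c(1, 0, 1), seasonal = c(0, 1, 0)),
  "(1,0,1)(0,1,1)" = list(order = c(1, 0, 1), seasonal = c(0, 1, 1)),
  "(0,0,1)(1,1,0)" = list(order = c(0, 0, 1), seasonal = c(1, 1, 0))
)

fits <- lapply(candidates, function(cand)
  arima(y, order = cand$order,
        seasonal = list(order = cand$seasonal, period = 7)))

aics <- sapply(fits, AIC)
print(round(aics, 1))

best <- fits[[which.min(aics)]]
fc <- predict(best, n.ahead = 14)$pred   # two-week forecast from the lowest-AIC candidate
```

AIC only ranks the candidates you supply; checking the residual ACF of the chosen model for leftover structure at lags 7, 14, 21 is a sensible follow-up.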

I now have two questions:
1. What values of ARIMA(p,d,q)(P,D,Q)[7] should I consider?
2. What should I use to capture the long-term yearly seasonality along with the weekly seasonality?
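For the second question, one common option (an alternative technique, not the only one) is a regression with ARMA errors: Fourier terms with period 365.25 carry the yearly cycle and day-of-week dummies carry the weekly pattern, since `stats::arima` accepts only a single seasonal period. The simulated series, the number of harmonics `K = 3`, the AR(1) error order, and the helper `fourier_terms()` are all hypothetical choices for illustration; in practice K and the ARMA orders would be chosen by comparing AIC.

```r
set.seed(1)
n <- 4 * 365
day <- seq_len(n)
dow <- factor((day - 1) %% 7 + 1)                # 1 = Monday, ..., 7 = Sunday (assumed start)
dow_mean <- c(100, 100, 100, 100, 100, 40, 5)    # hypothetical weekday means
# Hypothetical series: weekly profile + yearly cycle + AR(1) noise
y <- dow_mean[as.integer(dow)] + 10 * sin(2 * pi * day / 365.25) +
  as.numeric(arima.sim(model = list(ar = 0.5), n = n, sd = 6))

# Hypothetical helper: K sine/cosine pairs with the given period
fourier_terms <- function(day, period, K) {
  X <- do.call(cbind, lapply(seq_len(K), function(k)
    cbind(sin(2 * pi * k * day / period), cos(2 * pi * k * day / period))))
  colnames(X) <- paste0(c("sin", "cos"), rep(seq_len(K), each = 2))
  X
}

K <- 3
weekly <- model.matrix(~ dow)[, -1]              # 6 day-of-week dummies (Monday is the baseline)
xreg <- cbind(weekly, fourier_terms(day, 365.25, K))

fit <- arima(y, order = c(1, 0, 0), xreg = xreg)  # regression with AR(1) errors

# Forecast 14 days ahead: future regressors continue the same day index
h <- 14
day_new <- n + seq_len(h)
dow_new <- factor((day_new - 1) %% 7 + 1, levels = levels(dow))
xreg_new <- cbind(model.matrix(~ dow_new)[, -1], fourier_terms(day_new, 365.25, K))
fc <- predict(fit, n.ahead = h, newxreg = xreg_new)$pred
```

Using a smooth Fourier basis keeps the yearly pattern to 2K parameters instead of the hundreds a period-365 seasonal ARIMA would imply.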

Hi, you can try out `auto.arima`. It automatically selects the parameters for you based on an information criterion (AICc by default). — Kane Chua, May 31 '19 at 14:27
https://stats.stackexchange.com/search?tab=newest&q=user%3a3382%20daily%20data presents examples of how daily data should be handled, taking into account latent deterministic factors in your data. Post your data in a CSV file with all days accounted for, and give the starting date and the country. — IrishStat, May 31 '19 at 21:18
ARIMA selection needs to be done AFTER you have accounted for calendar effects. — IrishStat, May 31 '19 at 21:19
@KaneChua I have tried `auto.arima`, but I am not able to capture the yearly seasonality. — Prasad Dalvi, Jun 10 '19 at 09:16
@IrishStat I have one question for you: looking at the graph of **ts_volume_data2_diff1** above, do you think the series is stationary after first differencing? — Prasad Dalvi, Jun 10 '19 at 09:18
You say "We can see a spike after every 7th lag in ACF as there is a weekly seasonality. To capture seasonality I want to run a seasonal ARIMA." I say "adjust your data for DETERMINISTIC SEASONALITY by incorporating 6 daily dummies, then adjust for outliers, and then identify p and q from the ACF/PACF." https://stats.stackexchange.com/questions/108877/capturing-seasonality-in-multiple-regression-for-daily-data/108905#108905 and https://stats.stackexchange.com/questions/313810/simple-method-of-forecasting-number-of-guests-given-current-and-historical-data/313852#313852 will be of help to you. — IrishStat, Jun 10 '19 at 11:38
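A minimal sketch of the first and last steps of the workflow in that comment, in base R: remove a deterministic weekly pattern with 6 day-of-week dummies, then read p and q off the ACF/PACF of the residuals. Outlier adjustment is not shown. The simulated series, its weekday means, and the Monday start are hypothetical.

```r
set.seed(1)
n <- 4 * 365
dow <- factor(rep(1:7, length.out = n))          # 1 = Monday, ..., 7 = Sunday (assumed start)
dow_mean <- c(100, 100, 100, 100, 100, 40, 5)    # hypothetical weekday means
# Hypothetical series: deterministic weekly profile + AR(1) noise
y <- dow_mean[as.integer(dow)] + as.numeric(arima.sim(model = list(ar = 0.5), n = n, sd = 6))

# Step 1: deterministic seasonality via 6 daily dummies (intercept + 6 dummies = 7 weekdays)
reg <- lm(y ~ dow)
res <- residuals(reg)

# Step 2: identify p and q from the residual ACF/PACF (plot = FALSE returns the values)
r_acf  <- acf(res, lag.max = 21, plot = FALSE)
r_pacf <- pacf(res, lag.max = 21, plot = FALSE)
round(r_acf$acf[2:9], 2)    # lags 1..8: geometric decay here points to an AR term
round(r_acf$acf[8], 2)      # lag 7: no weekly spike left once the dummies are in
round(r_pacf$acf[1:8], 2)   # lags 1..8: a single large spike at lag 1 here points to p = 1

# Step 3: fit the dummies and the identified ARMA part jointly
fit <- arima(y, order = c(1, 0, 0), xreg = model.matrix(~ dow)[, -1])
```

On real data the residual ACF/PACF will rarely be this clean, and outliers or holidays left in the residuals can distort both plots.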
