
I am working with data on the number of customers visiting a clinic for an X-ray scan each day, covering the last 4 years. I am building a time series model to predict the daily number of customers. On a typical weekday there are around a hundred customers. On Saturdays there are around 30-50 customers, and on Sundays there are mostly no customers, or fewer than 10. I have split the data into training and test sets.

Below is the plot of raw data.

Plot of the number of customers visiting the clinic for X-ray scans

Clearly the data does not look stationary. I also used the ADF test and the KPSS test to check whether the data is stationary.

adf.test(ts_beverly_train)

Augmented Dickey-Fuller Test

data: ts_beverly_train
Dickey-Fuller = -8.0101, Lag order = 10, p-value = 0.01
alternative hypothesis: stationary

kpss.test(ts_beverly_train)

KPSS Test for Level Stationarity

data: ts_beverly_train
KPSS Level = 0.28099, Truncation lag parameter = 7, p-value = 0.1

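Even though both tests indicate that the data is stationary (the ADF test rejects a unit root with p-value = 0.01, and the KPSS test does not reject level stationarity with p-value = 0.1), the plot does not look stationary. So I tried to make the data stationary by differencing.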

Plot after 1st differencing

Now the data looks stationary. I confirmed it using the ADF test and the KPSS test.

adf.test(ts_volume_data2_diff1)

Augmented Dickey-Fuller Test

data: ts_volume_data2_diff1
Dickey-Fuller = -14.981, Lag order = 10, p-value = 0.01
alternative hypothesis: stationary

Next I plotted the ACF and PACF after 1st differencing.

The ACF and PACF plot after 1st differencing

We can see a spike at every 7th lag in the ACF, as there is weekly seasonality. To capture this seasonality I want to fit a seasonal ARIMA model.
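A minimal sketch of the weekly pattern and of one seasonal model with period 7, run on simulated data rather than the clinic series. The series `y`, its day-of-week levels, the slow drift, and the ARIMA(1,0,0)(0,1,1)[7] order are all assumptions made for illustration, not a recommendation for the real data. Only base R (`ts`, `acf`, `arima`, `predict`) is used.

```r
# Simulated stand-in for the daily counts (NOT the clinic data):
# about 100 on weekdays, 40 on Saturdays, 5 on Sundays, plus a slow drift and noise.
set.seed(1)
n_weeks <- 104
n <- 7 * n_weeks
weekly_level <- c(100, 100, 100, 100, 100, 40, 5)  # Mon..Sun, assumed values
drift <- cumsum(rnorm(n, sd = 1))                  # assumed slowly wandering level
y <- ts(rep(weekly_level, n_weeks) + drift + rnorm(n, sd = 5),
        frequency = 7)

# ACF of the first-differenced series. Because frequency = 7, acf() reports
# lags in weeks, so a lag of 1 corresponds to 7 days.
acf_diff <- acf(diff(y), lag.max = 21, plot = FALSE)
lag7 <- acf_diff$acf[which(abs(acf_diff$lag - 1) < 1e-8)]

# One candidate seasonal model with period 7 (order chosen only for illustration).
fit <- arima(y, order = c(1, 0, 0),
             seasonal = list(order = c(0, 1, 1), period = 7))

# Two weeks of point forecasts; the simulated series ends on a Sunday,
# so fc[1] is a Monday and fc[7] is a Sunday.
fc <- predict(fit, n.ahead = 14)$pred
```

Here `lag7` is the autocorrelation of the differenced series at a 7-day lag, which is the spike described above. Competing orders could be compared on held-out data or with `AIC(fit)`.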

Now I have two questions
1. What values of ARIMA(p,d,q)(P,D,Q)[7] should be considered?
2. What should I use to capture the long-term yearly seasonality along with the weekly one?

Richard Hardy
Prasad Dalvi
  • Hi you can try out `auto.arima`. It automatically finds the best parameters for you based on AIC. – Kane Chua May 31 '19 at 14:27
  • https://stats.stackexchange.com/search?tab=newest&q=user%3a3382%20daily%20data presents examples of how daily data should be handled, taking into account latent deterministic factors in your data. Post your data in a csv file with all days accounted for, and give the starting date and the country – IrishStat May 31 '19 at 21:18
  • ARIMA selection needs to be done AFTER you have accounted for calendar effects. – IrishStat May 31 '19 at 21:19
  • @KaneChua I have tried `auto.arima` but I am not able to capture the yearly seasonality – Prasad Dalvi Jun 10 '19 at 09:16
  • @IrishStat I have one question for you, by looking at the graph above of **ts_volume_data2_diff1**, do you think the series is stationary after 1st differencing? – Prasad Dalvi Jun 10 '19 at 09:18
  • You say "We can see a spike after every 7th lag in ACF as there is a weekly seasonality. To capture seasonality I want to run a seasonal ARIMA." I say "adjust your data for DETERMINISTIC SEASONALITY by incorporating 6 daily dummies and then adjust for outliers and then identify the p and q from the acf/pacf". https://stats.stackexchange.com/questions/108877/capturing-seasonality-in-multiple-regression-for-daily-data/108905#108905 and https://stats.stackexchange.com/questions/313810/simple-method-of-forecasting-number-of-guests-given-current-and-historical-data/313852#313852 will be of help to you – IrishStat Jun 10 '19 at 11:38
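The suggestion in the last comment (six day-of-week dummies first, then reading p and q from the ACF/PACF of what is left) could be sketched roughly as follows. This is an illustration on simulated data, not the clinic series. The start date, the day-of-week levels, and the AR(1) errors are assumptions, and the outlier-adjustment step is left out. Only base R is used.

```r
set.seed(42)
n <- 7 * 104
# Assumed start date (a Monday); the real series would supply its own dates.
dates <- seq(as.Date("2015-01-05"), by = "day", length.out = n)

# Day-of-week factor built from POSIXlt$wday (0 = Sunday), which is
# locale-independent. A 7-level factor gives 6 dummies plus the intercept.
dow <- factor(as.POSIXlt(dates)$wday, levels = c(1:6, 0),
              labels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"))

# Simulated counts: assumed day-of-week levels plus assumed AR(1) errors.
level <- c(Mon = 100, Tue = 100, Wed = 100, Thu = 100, Fri = 100,
           Sat = 40, Sun = 5)
noise <- as.numeric(arima.sim(list(ar = 0.5), n = n, sd = 5))
y <- unname(level[as.character(dow)]) + noise

# Step 1: remove the deterministic weekly pattern with the dummies.
fit_dow <- lm(y ~ dow)
res <- residuals(fit_dow)

# Step 2: inspect the ACF/PACF of the residuals to choose p and q.
acf_res  <- acf(res,  lag.max = 21, plot = FALSE)
pacf_res <- pacf(res, lag.max = 21, plot = FALSE)

# Step 3: refit jointly as a regression with ARMA errors, here AR(1)
# because the simulated residuals were generated that way.
X <- model.matrix(~ dow)[, -1]          # the 6 dummy columns
fit_arx <- arima(y, order = c(1, 0, 0), xreg = X)
```

In this simulation the residual ACF shows no spike at lag 7 once the dummies are in, which is the point of adjusting for the deterministic pattern before identifying the ARMA part.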

0 Answers