I am working on a project to forecast store sales in order to learn forecasting. So far I have successfully used the simple auto.arima function. To make the forecasts more accurate, I want to use covariates. I have defined covariates such as holidays and promotions, which affect store sales, and passed them via the xreg argument with the help of this post: How to setup xreg argument in auto.arima() in R?

But my code fails at this line:

ARIMAfit <- auto.arima(saledata, xreg=covariates)

and gives this error:

Error in model.frame.default(formula = x ~ xreg, drop.unused.levels = TRUE) : 
  variable lengths differ (found for 'xreg')
In addition: Warning message:
In !is.na(x) & !is.na(rowSums(xreg)) :
  longer object length is not a multiple of shorter object length

Here is a link to my dataset: https://drive.google.com/file/d/0B-KJYBgmb044blZGSWhHNEoxaHM/view?usp=sharing

This is my code:

data = read.csv("xdata.csv")[1:96,]
View(data)

saledata <- ts(data[1:96,4],start=1, end=96,frequency =7 )
View(saledata)

saledata[saledata == 0] <- 1
View(saledata)

covariates = cbind(DayOfWeek=model.matrix(~as.factor(data$DayOfWeek)),
             Customers=data$Customers,
             Open=data$Open,
             Promo=data$Promo,
             SchoolHoliday=data$SchoolHoliday)
View(head(covariates))


# Remove intercept
covariates <- covariates[,-1]
View(covariates)

require(forecast)
ARIMAfit <- auto.arima(saledata, xreg=covariates)  # HERE IS ERROR LINE
summary(ARIMAfit)

Also, please tell me how I can forecast the next 48 days. I know how to forecast with a simple auto.arima model using the n.ahead argument, but I don't know how to do it when the xreg argument is used.

ptim ktim
  • Make sure you consider different day-of-the-week effects, month-of-the-year effects, level shifts, local time trends, lead and lag effects around holidays/events, particular day-of-the-month effects, particular week-of-the-month effects, anomalous data points, long-weekend effects, etc. The data you are attempting to analyze is, in a statistical sense, in the "deep end of the pool", so to speak, so simple methods/solutions will probably be deficient; otherwise you may "drown". – IrishStat Dec 13 '15 at 15:03
  • @IrishStat Can you please tell me what model to use if you have as many variables as you mentioned above? I saw in your post on another topic that we can have 29 dummy variables for days of the week and hours of the day, but I am not sure how to handle this many variables. – tjt Apr 09 '20 at 05:04
  • I will look at your data when I get some free time – IrishStat Apr 10 '20 at 20:31

1 Answer

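The issue is caused by the line ts(data[1:96,4],start=1, end=96,frequency =7 ). With frequency = 7, start and end are measured in weeks, not days, so start = 1 and end = 96 describe a series running from week 1 to week 96. That is (96 - 1) * 7 + 1 = 666 observations, and R recycles your 96 values to fill it. The covariates matrix still has only 96 rows, which is why you get "variable lengths differ".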

Recall that R defines the start and end times in seasons (weeks in your case). Since you are fitting daily data and the length is already determined by the data, specifying only start = 0 or start = 1 should be sufficient.

Instead of running View(saledata), print saledata at the console to debug this yourself; the header shows that the series has the wrong end point and therefore the wrong length:

Start = c(1, 1) 
End = c(96, 1)  
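
The length mismatch can be reproduced without the real data. The sketch below uses a made-up vector named sales as a stand-in for the 96 daily values (an assumption, not the asker's dataset) and compares the original ts() call with one that leaves out end:

```r
# Made-up stand-in for the 96 daily sales values (not the real data).
sales <- seq_len(96)

# start and end are measured in periods (weeks here), so this spans
# (96 - 1) * 7 + 1 = 666 observations, filled by recycling the 96 values.
wrong <- ts(sales, start = 1, end = 96, frequency = 7)
length(wrong)  # 666

# Giving only start lets ts() take the length from the data itself.
right <- ts(sales, start = 1, frequency = 7)
length(right)  # 96
```

With the second form the series has as many observations as the covariates matrix has rows, which is what auto.arima needs when xreg is supplied.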

When you do an ARIMA forecast with xreg, you need to create a matrix newxreg for the next 48 days with the same structure (same columns, same order) as xreg, then pass it to the forecasting function. Base R's predict() for arima fits calls this argument newxreg, while forecast() from the forecast package takes it as xreg. A good habit for the xreg and newxreg matrices is to include a Day column that acts as an ordering for the data.
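
As a rough sketch of that workflow, the example below fits on simulated data and forecasts 48 days ahead. Everything in it is an assumption made for illustration: the sales values are simulated, and the Promo and SchoolHoliday columns merely mimic two of the covariates in the question. In practice the future covariate values have to be known or assumed for each forecast day.

```r
library(forecast)

set.seed(1)
n <- 96  # days of history
h <- 48  # days to forecast

# Hypothetical covariates standing in for the real columns.
xreg <- cbind(Promo = rbinom(n, 1, 0.4),
              SchoolHoliday = rbinom(n, 1, 0.2))

# Simulated daily sales that depend on the covariates plus noise.
sales <- ts(100 + 20 * xreg[, "Promo"] - 5 * xreg[, "SchoolHoliday"] +
              rnorm(n, sd = 3),
            start = 1, frequency = 7)

fit <- auto.arima(sales, xreg = xreg)

# Future covariates: same columns, same order, one row per forecast day.
newxreg <- cbind(Promo = rep(c(1, 0), length.out = h),
                 SchoolHoliday = rep(0, h))

# With xreg supplied, the horizon is taken from the number of rows of newxreg.
fc <- forecast(fit, xreg = newxreg)
length(fc$mean)
```

Calling plot(fc) afterwards shows the point forecasts with their prediction intervals.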

Matthew Lau