I am forecasting a large number of time series. Each series is for a specific product (item) in a specific region, and every region belongs to a country, so the series form a hierarchy.
- Product 1 - Country 1
- Product 1 - Country 1 - Region 1-1
- Product 1 - Country 1 - Region 1-2
- Product 1 - Country 2
- Product 1 - Country 2 - Region 2-1
- ........
- ........
On top of that there is, of course, also the "Total" across all series.
My matrix `cmat` contains all bottom-level data, i.e. the series for every product and region. Every column is a time series. A column name looks like, for example, `000214ABCab`: the leading digits are the product ID, the next three letters are the country, and the last two letters are the region. I want forecasts at this level, but also at the product-country level.
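To make the naming scheme concrete, here is a small base R sketch that splits such a column name into its three parts. The helper name `split_code` is made up for illustration, and the sketch assumes the product ID consists only of digits and is followed by exactly five letters.

```r
# Hypothetical helper: split a column name such as "000214ABCab" into
# product ID (leading digits), country (3 letters) and region (2 letters).
split_code = function(code) {
  m = regmatches(code, regexec("^([0-9]+)([A-Za-z]{3})([A-Za-z]{2})$", code))[[1]]
  if (length(m) == 0) stop("unexpected column name: ", code)
  list(product = m[2], country = m[3], region = m[4])
}

parts = split_code("000214ABCab")
parts$product   # "000214"
parts$country   # "ABC"
parts$region    # "ab"

# The product-country level is the name without the last two characters
prod_country = substr("000214ABCab", 1, nchar("000214ABCab") - 2)
prod_country    # "000214ABC"
```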
At first I struggled with performance because I have a lot of data, so the example below uses only a subset of 1000 time series.
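```r
library(hts)         # hts(), aggts(), combinef(), get_nodes()
library(forecast)    # stlf()
library(data.table)

# maxlength is defined elsewhere; presumably the length of the product ID
subs = cmat[, 1:1000]
hh = hts(subs, characters = c(maxlength + 3, 2))
ally = aggts(hh)  # all series at every level of the hierarchy

# make list of data tables with time series
subs2 = as.data.table(t(ally))
subs2$CODE = unlist(hh$labels)
datlist = subs2[, list(list(.SD)), by = CODE]$V1
setattr(datlist, 'names', subs2$CODE)  # CODE is the identifier of products/country/region

s1 = Sys.time()
fcast = NULL
for (i in 1:nrow(subs2)) {
  if (i %% 1000 == 0) {
    print(i)  # progress indicator
  }
  # 12-month STL forecast, truncated at zero; stlf() already returns a forecast
  y = ts(unlist(datlist[[i]]), frequency = 12)
  fc = data.table(pmax(stlf(y, h = 12)$mean, 0))
  names(fc) = names(datlist)[i]
  fcast = rbind(fcast, t(fc))
}
s2 = Sys.time()

fcasts = t(fcast)  # one column per series, one row per forecast horizon
y.f = combinef(fcasts, get_nodes(hh), keep = "all")  # reconcile across the hierarchy
s3 = Sys.time()
```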
I am also surprised by how fast `combinef` is. However, I only want forecasts that are positive or zero, and preferably integers. I already apply `pmax(forecast, 0)` in the loop, but after `combinef` I lose this again.
If I now apply `pmax` once more, I will also lose what `combinef` gives me (forecasts that add up across the hierarchy), right? So how can I make the forecasts non-negative? Also, I do not need a "total" forecast at all, but I don't know whether I can omit it.
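The concern about `pmax` can be illustrated with a toy example in base R that does not use `hts` at all. The numbers are made up: one parent series with two children, where the reconciled forecasts add up but one child is negative. The helper `is_coherent` is hypothetical and only checks this one parent-child relation.

```r
# Made-up reconciled forecasts for one month: parent = child1 + child2
reconciled = c(parent = 3, child1 = -2, child2 = 5)

# Hypothetical check: does the parent equal the sum of its children?
is_coherent = function(f) {
  isTRUE(all.equal(unname(f["parent"]), unname(f["child1"] + f["child2"])))
}

is_coherent(reconciled)   # TRUE

# Truncating every level independently breaks the aggregation constraint
clamped = pmax(reconciled, 0)
is_coherent(clamped)      # FALSE: 3 != 0 + 5

# One simple (not necessarily optimal) workaround: truncate and round only the
# bottom level, then rebuild the parent by summing (plain bottom-up aggregation)
bottom = round(pmax(reconciled[c("child1", "child2")], 0))
rebuilt = c(parent = sum(bottom), bottom)
is_coherent(rebuilt)      # TRUE
```

The rebuilt forecasts are non-negative, integer and additive, but the parent no longer equals its reconciled value, so this is a trade-off rather than a full solution.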