8

I got a question about modeling time series in R. my data consist of the following matrix:

1   0.03333333 0.01111111 0.9555556
2   0.03810624 0.02309469 0.9387991
3   0.00000000 0.03846154 0.9615385
4   0.03776683 0.03119869 0.9310345
5   0.06606607 0.01201201 0.9219219
6   0.03900325 0.02058505 0.9404117
7   0.03125000 0.01562500 0.9531250
8   0.00000000 0.00000000 1.0000000
9   0.04927885 0.01802885 0.9326923
10  0.06106870 0.02290076 0.9160305
11  0.03846154 0.00000000 0.9615385
12  0.00000000 0.00000000 1.0000000
13  0.06028636 0.03843256 0.9012811
14  0.09646302 0.05144695 0.8520900
15  0.04444444 0.06666667 0.8888889

these matrix has in total 200 rows.

as you can see in each situation the sum of each row is 1, that becomes because the values are the percentage of a whole. for example row 1 contains 3.33% of variable a, 1.11% of variable 2 and 95.5% of veriable 3. the first collomn indicates the year that the values are measured.

my target is to make a prediction for the next 5 years, so from year 200 to 205.

I can doe that by making three normal time series forecast. But for that forecast the total sum is never equal to 1, which is very important. Normaly is use techniques like arima and exponential smoothing.

Does somebody know a method to make a forecast for such a problem?

Stephan Kolassa
  • 95,027
  • 13
  • 197
  • 357
karmabob
  • 125
  • 5
  • Do you know if these are discrete proportions (eg, number of units in categories A, B, & C, given a total number of units) or continuous proportions? If they are discrete, do you know the total number of units at each time point? – gung - Reinstate Monica Mar 25 '15 at 16:21
  • Yes these values comes from discrete units, I calculated these ratios with use of the total numbers. – karmabob Mar 26 '15 at 08:22

1 Answers1

8

You are attempting to forecast a compositional time series. That is, you have three components that are all constrained to lie between 0 and 1 and to add up to 1.

You can address this issue using standard exponential smoothing, by using an appropriate generalized logistic transformation. There was a presentation on this by Koehler, Snyder, Ord & Beaumont at the 2010 International Symposium on Forecasting, which turned into a paper (Snyder et al., 2017, International Journal of Forecasting).

Let's walk though this with your data. Read the data into a matrix obs of time series:

obs <- structure(c(0.03333333, 0.03810624, 0, 0.03776683, 0.06606607, 
0.03900325, 0.03125, 0, 0.04927885, 0.0610687, 0.03846154, 0, 
0.06028636, 0.09646302, 0.04444444, 0.01111111, 0.02309469, 0.03846154, 
0.03119869, 0.01201201, 0.02058505, 0.015625, 0, 0.01802885, 
0.02290076, 0, 0, 0.03843256, 0.05144695, 0.06666667, 0.9555556, 
0.9387991, 0.9615385, 0.9310345, 0.9219219, 0.9404117, 0.953125, 
1, 0.9326923, 0.9160305, 0.9615385, 1, 0.9012811, 0.85209, 0.8888889
), .Dim = c(15L, 3L), .Dimnames = list(NULL, c("Series 1", "Series 2", 
"Series 3")), .Tsp = c(1, 15, 1), class = c("mts", "ts", "matrix"
))

You can check whether this worked by typing

obs

Now, you have a few zeros in there, which will be a problem once you take logarithms. A simple solution is to set everything that is less than a small $\epsilon$ to that $\epsilon$:

epsilon <- 0.0001
obs[obs<epsilon] <- epsilon

Now the modified rows do not sum to 1 any more. We can rectify that (although I think this might make the forecast worse):

obs <- obs/matrix(rowSums(obs),nrow=nrow(obs),ncol=ncol(obs),byrow=FALSE)

Now we transform the data as per page 35 of the presentation:

zz <- log(obs[,-ncol(obs)]/obs[,ncol(obs)])
colnames(zz) <- head(colnames(obs),-1)
zz

Load the forecast package and set a horizon of 5 time points:

library(forecast)
horizon <- 5

Now model and forecast the transformed data column by column. Here I am simply calling ets(), which will attempt to fit a state space exponential smoothing model. It turns out that it uses single exponential smoothing for all three series, but especially if you have more than 15 time periods, it may select trend models. Or if you have monthly data, explain to R that you have a potential seasonality, by using ts() with frequency=12 - then ets() will look at seasonal models.

baz <- apply(zz,2,function(xx)forecast(ets(xx),horizon=horizon)["mean"])
forecasts.transformed <- cbind(baz[[1]]$mean,baz[[2]]$mean)

Next we backtransform the forecasts as per page 38 of the presentation:

forecasts <- cbind(exp(forecasts.transformed),1)/(1+rowSums(exp(forecasts.transformed)))

Finally, let's plot histories and forecasts:

plot(obs[,1],ylim=c(0,1),xlim=c(1,nrow(obs)+horizon),type="n",ylab="")
for ( ii in 1:ncol(obs) ) {
    lines(obs[,ii],type="o",pch=19,col=ii)
    lines(forecasts[,ii],type="o",pch=21,col=ii,lty=2)
}
legend("left",inset=.01,lwd=1,col=1:ncol(obs),pch=19,legend=colnames(obs))

compositional forecasts

EDIT: a paper on compositional time series forecasting just appeared. I haven't read it, but it may be of interest.

Stephan Kolassa
  • 95,027
  • 13
  • 197
  • 357
  • Thanks @stephan Kolassa, that is exactly what I mean. Does anybody knows wether there is toolpack in R to make a forecasting with compositional time series. – karmabob Mar 26 '15 at 09:21
  • I don't think so, and these authors are not among the "always write a companion R package to any publication" people. But their approach is pretty simple and should not require more than five lines to preprocess your time series, after which you can use the standard tools (e.g., `ets()` in the `forecast` package for state space exponential smoothing). – Stephan Kolassa Mar 26 '15 at 10:11
  • My mathematical knowledge is not that good. can you maybe help with the pre processing the data? when the data is in the right format i can do the forecasting process. – karmabob Mar 26 '15 at 16:00
  • I was wandering if you have found some time to help me with the problem I had with the compositional time series? – karmabob Apr 01 '15 at 14:47
  • Here you go. Sorry it took a while; I actually got stuck understanding the presentation and had to clear my head. – Stephan Kolassa Apr 01 '15 at 15:33
  • There are essentially two variables to model since the three variables should sum to one. – Vladislavs Dovgalecs Apr 01 '15 at 15:54
  • @xeon: indeed. In the code above, `zz` is a matrix with two transformed columns, corresponding to the first two columns of the original three ones. The problem in "simply" forecasting these is twofold: (a) the "simple" forecast of one of these columns could be <0, and (b) the sum of the "simple" forecasts of these two could be >1 (so the third component would by <0). – Stephan Kolassa Apr 01 '15 at 15:59
  • @StephanKolassa Thanks a lot for al the good help:D – karmabob Apr 02 '15 at 13:47
  • @karmabob: I just edited the answer to link to a paper on compositional time series forecasting that just appeared - may be interesting. – Stephan Kolassa Jun 24 '15 at 01:46