Question about Bayesian structural time series model

Question

I am investigating the stability of results from a Bayesian structural time series model using the bsts package in R. The following code fits a local linear trend model (based on an example from the package) for three different numbers of MCMC draws (100, 1000, 10000). For each of the three cases, I repeat the estimation 10 times and store the R-squared in a data frame:

library(bsts)
data(AirPassengers)
y <- log(AirPassengers)
df <- data.frame(niter100 = NA, niter1000 = NA, niter10000 = NA)
dfcolumn <- 0
for (j in c(100, 1000, 10000)) {
  dfcolumn <- dfcolumn + 1
  for (i in 1:10) {
    ss <- AddLocalLinearTrend(list(), y)
    model_benchmark <- bsts(y, state.specification = ss, niter = j)
    model_summary <- summary(model_benchmark)
    df[i, dfcolumn] <- model_summary$rsquare
  }
}

The results are below:

> df
        niter100 niter1000  niter10000
    1  0.9058217 0.8959122  0.8352333
    2  0.9058217 0.6254595  0.7148984
    3  0.9058217 0.8956490  0.8840317
    4  0.9058217 0.9071929  0.8682971
    5  0.9050454 0.9076566  0.9017717
    6  0.9050454 0.8904109  0.8416038
    7  0.9050454 0.9073501  0.8674943
    8  0.9050454 0.9059262  0.8563360
    9  0.9050454 0.9070879  0.8585177
    10 0.9050454 0.6612644  0.8920700

I expected the results to become more accurate and more stable as the number of MCMC draws increases. The test above suggests the opposite: with 100 iterations the R-squared varies only marginally, while with 1000 iterations it ranges from 0.63 to 0.91. What could be the reason for this? Are there any strategies for dealing with it in a real application?

score 7 · Accepted Answer · answered May 19 '16 at 06:47

I can see several potential issues with your example:

1. You don't specify a seed, so bsts will use the system clock, and serial correlation between successive Monte Carlo runs will distort your statistics.
2. Your chosen metric, rsquare, might not be what you think it is (see the help for summary.bsts).
3. Your model is not a great fit to the data, so it might take a LOT of samples to converge.
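Point 1 can be checked directly. The following is a minimal sketch, assuming that passing the same `seed` to `bsts` makes two runs draw identical samples; `seed = 1`, `niter = 200` and `ping = 0` are illustrative choices, not values from the question. The identical values within the niter100 column of the question would be consistent with several fast runs picking up the same clock-based seed, though that is an inference rather than something verified here.

```r
library(bsts)
data(AirPassengers)
y <- log(AirPassengers)

ss <- AddLocalLinearTrend(list(), y)

# Fit the same model twice with the same (arbitrary) seed.
# ping = 0 only suppresses the progress messages.
fit_a <- bsts(y, state.specification = ss, niter = 200, seed = 1, ping = 0)
fit_b <- bsts(y, state.specification = ss, niter = 200, seed = 1, ping = 0)

# Compare the stored draws of the observation standard deviation.
same_draws <- identical(fit_a$sigma.obs, fit_b$sigma.obs)
print(same_draws)
```

If the draws match, any variation between repeated runs can be attributed to the seeds, which is why each run in a stability test needs its own explicit seed.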

Modifying your code to address 1 and 2...

library(bsts)
data(AirPassengers)
y <- log(AirPassengers)

res <- list()

for (j in c(100, 1000, 10000)) {
  res.inner <- list()
  for (i in 1:10) {
    ss <- AddLocalLinearTrend(list(), y)
    # ss <- AddSeasonal(ss, y, nseasons = 12)
    # Give every run its own seed (still derived from the clock, so values differ between sessions)
    seed <- floor(i * j * (as.numeric(Sys.time()) %% pi))
    model_benchmark <- bsts(y, state.specification = ss, niter = j, seed = seed)
    x <- summary(model_benchmark)
    res.inner[[i]] <- c(x$residual.sd, x$prediction.sd, x$rsquare, x$relative.gof)
  }
  df.inner <- Reduce(rbind, res.inner)
  colnames(df.inner) <- c("residual.sd", "prediction.sd", "rsquare", "relative.gof")
  # Index by name: res[[j]] with j = 10000 would pad the list with NULL entries
  res[[as.character(j)]] <- df.inner
  print(res[[as.character(j)]])
}

X <- Reduce(rbind, lapply(res, function(x) {if (length(x) > 0) apply(x,2,sd)}))
row.names(X) <- c("100", "1000", "10000")

X

...produces results similar to yours (each entry is the standard deviation of that metric across the 10 runs):

       residual.sd prediction.sd     rsquare relative.gof
100   0.0005273839  2.215716e-05 0.000728682 0.0005939483
1000  0.0163049235  5.001772e-04 0.026663130 0.0133087130
10000 0.0244355253  6.447072e-04 0.042745282 0.0170885684

Now, if we add in the seasonal term by uncommenting this line

# ss <- AddSeasonal(ss, y, nseasons = 12)

we get:

       residual.sd prediction.sd      rsquare relative.gof
100   0.0022619056  2.987453e-04 0.0007276338 0.0031640530
1000  0.0018138912  1.890575e-04 0.0005448601 0.0020016078
10000 0.0007862781  2.854752e-05 0.0002245090 0.0002997004

So it looks like the culprit is number 3: a local linear trend on its own is not a good fit for the highly seasonal AirPassengers data set.
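For a real application, one possible workflow is sketched below: include the seasonal component, fix the seed, and discard a burn-in period before summarising. The seed value, `niter = 500` and the 10% burn-in proportion are assumptions chosen for illustration, not recommendations from the answer above.

```r
library(bsts)
data(AirPassengers)
y <- log(AirPassengers)

# Local linear trend plus a 12-period seasonal component.
ss <- AddLocalLinearTrend(list(), y)
ss <- AddSeasonal(ss, y, nseasons = 12)

# A fixed (arbitrary) seed makes the run repeatable; ping = 0 silences progress output.
fit <- bsts(y, state.specification = ss, niter = 500, seed = 2016, ping = 0)

# Discard early draws before computing the summary statistics.
burn <- SuggestBurn(0.1, fit)
fit_summary <- summary(fit, burn = burn)

print(fit_summary$rsquare)
print(fit_summary$relative.gof)
```

Repeating this with a handful of different seeds and comparing the summaries gives a rough check on whether the chosen number of draws is large enough.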
