
I have a mixed effects model (in fact a generalized additive mixed model) that gives me predictions for a time series. To deal with the autocorrelation, I use a corCAR1 correlation structure, since I have missing data. The data is supposed to give me a total load, so I need to sum over the whole prediction interval. But I should also get an estimate of the standard error on that total load.

If all predictions were independent, this could easily be solved by:

$Var(\sum^{n}_{i=1}E[X_i]) = \sum^{n}_{i=1}Var(E[X_i])$ with $Var(E[X_i]) = SE(E[X_i])^2$
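
Under independence this is just the sum of the squared standard errors; a quick sanity check with made-up numbers:

```r
# independence: variance of the sum = sum of squared standard errors
se <- c(0.5, 1.2, 0.8)       # hypothetical standard errors of three predictions
var_total <- sum(se^2)       # 0.25 + 1.44 + 0.64 = 2.33
se_total <- sqrt(var_total)  # SE of the summed prediction
```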

Problem is, the predicted values come from a model, and the original data has autocorrelation. The whole problem leads to the following questions:

  1. Am I correct in assuming that the SE on the calculated predictions can be interpreted as the square root of the variance of the expected value of that prediction? I tend to interpret the predictions as "mean predictions", and hence I sum a whole set of means.
  2. How do I incorporate the autocorrelation in this problem, or can I safely assume that it wouldn't influence the results too much?
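
As a toy illustration of question 2 (the numbers are made up, not from the model below): once the predictions are correlated, the variance of the sum needs the full covariance matrix $\Sigma$ of the predictions, $Var(\sum_i X_i) = a^T \Sigma a$ with $a$ a vector of ones, and positive autocorrelation inflates the variance relative to the independence formula above:

```r
# Toy example: variance of a sum under AR(1)-type correlation vs independence
n <- 5
rho <- 0.6                              # hypothetical autocorrelation
se <- rep(2, n)                         # hypothetical standard errors
Sigma <- outer(se, se) * rho^abs(outer(1:n, 1:n, "-"))  # AR(1)-type covariance
a <- rep(1, n)
var_indep <- sum(diag(Sigma))           # independence formula: sum of SE^2 (= 20)
var_corr <- drop(t(a) %*% Sigma %*% a)  # full formula, accounts for correlation
```

Here `var_corr` is roughly 52.3, more than twice `var_indep`, so ignoring positive autocorrelation can badly understate the uncertainty of the total.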

This is an example in R. My real dataset has about 34,000 measurements, so scalability is a problem. That's why I model the autocorrelation within each month; otherwise the calculations become infeasible. It's not the most correct solution, but the most correct one isn't feasible.

set.seed(12)
require(mgcv)

# one observation per day for a full year
Data <- data.frame(
    dates = seq(as.Date("2011-1-1"), as.Date("2011-12-31"), by = "day")
)

Data <- within(Data, {
    X <- abs(rnorm(nrow(Data), 3))
    Y <- 2*X + X^2 + scale(dates)^2
    month <- as.POSIXlt(dates)$mon + 1  # grouping factor for the correlation
    mday <- as.POSIXlt(dates)$mday      # time covariate within each month
})

# CAR(1) autocorrelation modelled within each month to keep the fit feasible
model <- gamm(Y ~ s(X) + s(as.numeric(dates)),
              correlation = corCAR1(form = ~ mday | month), data = Data)

preds <- predict(model$gam, se = TRUE)

Total <- sum(preds$fit)

Edit:

Lesson learned: go through all the examples in all the help files before panicking. In the help file for predict.gam, I found:

#########################################################
## now get variance of sum of predictions using lpmatrix
#########################################################

## (in the help example, b is a fitted gam and newd a data frame of 31 new observations)
Xp <- predict(b, newd, type = "lpmatrix")

## Xp %*% coef(b) yields vector of predictions

a <- rep(1, 31)
Xs <- t(a) %*% Xp ## Xs %*% coef(b) gives sum of predictions
var.sum <- Xs %*% b$Vp %*% t(Xs)

Which seems to be close to what I want to do. It still doesn't tell me exactly how it works, though; I got as far as seeing that it's based on the linear predictor matrix. Any insights are still welcome.
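
Since the lpmatrix idiom from the help file maps directly onto the model in the question, here is a sketch applying it to the simulated data from above (the model is refitted so the chunk is self-contained; `drop()` is added so `Y` stays a plain vector):

```r
set.seed(12)
require(mgcv)

# simulated data as in the question
Data <- data.frame(
    dates = seq(as.Date("2011-1-1"), as.Date("2011-12-31"), by = "day")
)
Data <- within(Data, {
    X <- abs(rnorm(nrow(Data), 3))
    Y <- 2*X + X^2 + drop(scale(as.numeric(dates)))^2
    month <- as.POSIXlt(dates)$mon + 1
    mday <- as.POSIXlt(dates)$mday
})
model <- gamm(Y ~ s(X) + s(as.numeric(dates)),
              correlation = corCAR1(form = ~ mday | month), data = Data)

# linear predictor matrix: one row per prediction, one column per coefficient
Xp <- predict(model$gam, type = "lpmatrix")
a <- rep(1, nrow(Xp))
Xs <- t(a) %*% Xp                  # Xs %*% coef(model$gam) = sum of predictions
Total <- drop(Xs %*% coef(model$gam))
var.sum <- drop(Xs %*% model$gam$Vp %*% t(Xs))  # variance of the summed prediction
se.sum <- sqrt(var.sum)            # standard error on the total load
```

`Vp` is the (Bayesian posterior) covariance matrix of the coefficients, so the same row vector `Xs` that turns `coef` into the summed point estimate also turns `Vp` into the variance of that sum — which is exactly the $a^T var(E[X]) a$ formula from the comments.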

Joris Meys
  • I'm not sure what the r program is doing, but we have $$var(\sum_iE[X_i])=a^Tvar(E[X])a$$ where $a$ is a column vector of ones and $var(E[X])$ is the covariance matrix for $E[X]=(E[X_1],\dots,E[X_n])^T$. Does this help? – probabilityislogic Apr 04 '12 at 22:36
  • @probabilityislogic That's basically what the r program is doing. Thx for the math – Joris Meys Apr 27 '12 at 08:16
  • @probabilityislogic If you can wrap that into an answer, you can grab my +50 bounty. ;) – e-sushi Nov 11 '13 at 12:08
  • One problem I see, and maybe I'm just misinterpreting your notation: $E(X_{i})=\mu_{i}$, which is a constant, so $\sum_{i=1}^{n}Var(E[X_{i}])=0$. That is where I'm mostly confused. – user52220 Aug 07 '14 at 20:22
  • @user52220 That's where you're wrong. E(Xi) is the expected value and hence a random variable, whereas mu_i is the mean of the population and hence a fixed number. Var(mu) = 0, but the same is not correct for E(Xi). – Joris Meys Jan 12 '15 at 10:49

1 Answer


In matrix notation a mixed model can be represented as

y = X*beta + Z*u + epsilon

where X and Z are known design matrices relating the observations to the fixed effects and the random effects, respectively.

I would apply a simple and adequate (but not optimal) transformation to correct for autocorrelation. It costs you the first observation: replace the column vector [y1, y2, ..., yn] with the one-observation-shorter vector [y2 - rho*y1, y3 - rho*y2, ..., yn - rho*y(n-1)], where rho is your estimated serial autocorrelation.

This can be performed by multiplying y by a matrix T, forming T*y, where the 1st row of T is [-rho, 1, 0, 0, ...], the 2nd row [0, -rho, 1, 0, 0, ...], and so on. The other design matrices are transformed in the same way, to T*X and T*Z. The variance-covariance matrix of the error terms changes as well, and the transformed error terms are now independent.

Now, just compute the solution with the new design matrices.
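
The transformation above can be sketched in a few lines of R; `rho` and `y` are made-up numbers, and the matrix is called `Tm` here only to avoid masking R's built-in `T`:

```r
rho <- 0.7                       # assumed known estimate of the autocorrelation
y <- c(1, 2, 4, 7, 11)           # toy response vector
n <- length(y)
# (n-1) x n differencing matrix: row i is [0, ..., -rho, 1, 0, ...]
Tm <- cbind(-rho * diag(n - 1), 0) + cbind(0, diag(n - 1))
y.star <- drop(Tm %*% y)         # y[i+1] - rho * y[i], one observation shorter
# the same Tm would premultiply the design matrices X and Z before refitting
```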

AJKOER