
I have a mixed effects model (in fact a generalized additive mixed model) that gives me predictions for a time series. To deal with the autocorrelation, I use a corCAR1 correlation structure, since I have missing data. The data is supposed to give me a total load, so I need to sum over the whole prediction interval. But I should also get an estimate of the standard error on that total load.

If all predictions were independent, this could easily be solved by:

$Var(\sum^{n}_{i=1}E[X_i]) = \sum^{n}_{i=1}Var(E[X_i])$ with $Var(E[X_i]) = SE(E[X_i])^2$
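
Under independence this is just the sum of the squared standard errors; a quick sanity check with made-up numbers:

```r
# independence: variance of the sum = sum of squared standard errors
se <- c(0.5, 1.2, 0.8)       # hypothetical standard errors of three predictions
var_total <- sum(se^2)       # 0.25 + 1.44 + 0.64 = 2.33
se_total <- sqrt(var_total)  # SE of the summed prediction
```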

Problem is, the predicted values come from a model, and the original data has autocorrelation. The whole problem leads to the following questions:

  1. Am I correct in assuming that the SE on the calculated predictions can be interpreted as the square root of the variance of the expected value of that prediction? I tend to interpret the predictions as "mean predictions", and hence I sum a whole set of means.
  2. How do I incorporate the autocorrelation in this problem, or can I safely assume that it wouldn't influence the results too much?
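
As a toy illustration of question 2 (the numbers are made up, not from the model below): once the predictions are correlated, the variance of the sum needs the full covariance matrix $\Sigma$ of the predictions, $Var(\sum_i X_i) = a^T \Sigma a$ with $a$ a vector of ones, and positive autocorrelation inflates the variance relative to the independence formula above:

```r
# Toy example: variance of a sum under AR(1)-type correlation vs independence
n <- 5
rho <- 0.6                              # hypothetical autocorrelation
se <- rep(2, n)                         # hypothetical standard errors
Sigma <- outer(se, se) * rho^abs(outer(1:n, 1:n, "-"))  # AR(1)-type covariance
a <- rep(1, n)
var_indep <- sum(diag(Sigma))           # independence formula: sum of SE^2 (= 20)
var_corr <- drop(t(a) %*% Sigma %*% a)  # full formula, accounts for correlation
```

Here `var_corr` is roughly 52.3, more than twice `var_indep`, so ignoring positive autocorrelation can badly understate the uncertainty of the total.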

This is an example in R. My real dataset has about 34,000 measurements, so scalability is a problem. That's why I model the autocorrelation within each month; otherwise the calculations become infeasible. It's not the most correct solution, but the most correct one isn't feasible.

set.seed(12)
require(mgcv)

# one observation per day for a full year
Data <- data.frame(
    dates = seq(as.Date("2011-1-1"), as.Date("2011-12-31"), by = "day")
)

Data <- within(Data, {
    X <- abs(rnorm(nrow(Data), 3))
    Y <- 2*X + X^2 + scale(dates)^2
    month <- as.POSIXlt(dates)$mon + 1  # grouping factor for the correlation
    mday <- as.POSIXlt(dates)$mday      # time covariate within each month
})

# CAR(1) autocorrelation modelled within each month to keep the fit feasible
model <- gamm(Y ~ s(X) + s(as.numeric(dates)),
              correlation = corCAR1(form = ~ mday | month), data = Data)

preds <- predict(model$gam, se = TRUE)

Total <- sum(preds$fit)

Edit:

Lesson learned: go through all the examples in all the help files before panicking. In the help file for predict.gam, I found:

#########################################################
## now get variance of sum of predictions using lpmatrix
#########################################################

## (in the help example, b is a fitted gam and newd a data frame of 31 new observations)
Xp <- predict(b, newd, type = "lpmatrix")

## Xp %*% coef(b) yields vector of predictions

a <- rep(1, 31)
Xs <- t(a) %*% Xp ## Xs %*% coef(b) gives sum of predictions
var.sum <- Xs %*% b$Vp %*% t(Xs)

Which seems to be close to what I want to do. It still doesn't tell me exactly how it works, though; I got as far as seeing that it's based on the linear predictor matrix. Any insights are still welcome.
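
Since the lpmatrix idiom from the help file maps directly onto the model in the question, here is a sketch applying it to the simulated data from above (the model is refitted so the chunk is self-contained; `drop()` is added so `Y` stays a plain vector):

```r
set.seed(12)
require(mgcv)

# simulated data as in the question
Data <- data.frame(
    dates = seq(as.Date("2011-1-1"), as.Date("2011-12-31"), by = "day")
)
Data <- within(Data, {
    X <- abs(rnorm(nrow(Data), 3))
    Y <- 2*X + X^2 + drop(scale(as.numeric(dates)))^2
    month <- as.POSIXlt(dates)$mon + 1
    mday <- as.POSIXlt(dates)$mday
})
model <- gamm(Y ~ s(X) + s(as.numeric(dates)),
              correlation = corCAR1(form = ~ mday | month), data = Data)

# linear predictor matrix: one row per prediction, one column per coefficient
Xp <- predict(model$gam, type = "lpmatrix")
a <- rep(1, nrow(Xp))
Xs <- t(a) %*% Xp                  # Xs %*% coef(model$gam) = sum of predictions
Total <- drop(Xs %*% coef(model$gam))
var.sum <- drop(Xs %*% model$gam$Vp %*% t(Xs))  # variance of the summed prediction
se.sum <- sqrt(var.sum)            # standard error on the total load
```

`Vp` is the (Bayesian posterior) covariance matrix of the coefficients, so the same row vector `Xs` that turns `coef` into the summed point estimate also turns `Vp` into the variance of that sum — which is exactly the $a^T var(E[X]) a$ formula from the comments.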

Joris Meys
  • I'm not sure what the r program is doing, but we have $$var(\sum_iE[X_i])=a^Tvar(E[X])a$$ where $a$ is a column vector of ones and $var(E[X])$ is the covariance matrix for $E[X]=(E[X_1],\dots,E[X_n])^T$. Does this help? – probabilityislogic Apr 04 '12 at 22:36
  • @probabilityislogic That's basically what the r program is doing. Thx for the math – Joris Meys Apr 27 '12 at 08:16
  • @probabilityislogic If you can wrap that into an answer, you can grab my +50 bounty. ;) – e-sushi Nov 11 '13 at 12:08
  • One problem I see, and maybe I'm just misinterpreting your notation: $E(X_{i})=\mu_{i}$, which is a constant, so $\sum_{i=1}^{n}Var(E[X_{i}])=0$. That is where I'm mostly confused. – user52220 Aug 07 '14 at 20:22
  • @user52220 That's where you're wrong. E(Xi) is the expected value and hence a random variable, whereas mu_i is the mean of the population and hence a fixed number. Var(mu) = 0, but the same is not correct for E(Xi). – Joris Meys Jan 12 '15 at 10:49

1 Answer


In matrix notation a mixed model can be represented as

y = X*beta + Z*u + epsilon

where X and Z are known design matrices relating the observations to the fixed effects and the random effects, respectively.

I would apply a simple and adequate (but not optimal) transformation to correct for autocorrelation. It costs you the first observation: replace the column vector [y1, y2, ..., yn] with the one-observation-shorter vector [y2 - rho*y1, y3 - rho*y2, ..., yn - rho*y(n-1)], where rho is your estimated serial autocorrelation.

This can be performed by multiplying y by a matrix T, forming T*y, where the 1st row of T is [-rho, 1, 0, 0, ...], the 2nd row [0, -rho, 1, 0, 0, ...], and so on. The other design matrices are transformed in the same way, to T*X and T*Z. The variance-covariance matrix of the error terms changes as well, and the transformed error terms are now independent.

Now, just compute the solution with the new design matrices.
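
The transformation above can be sketched in a few lines of R; `rho` and `y` are made-up numbers, and the matrix is called `Tm` here only to avoid masking R's built-in `T`:

```r
rho <- 0.7                       # assumed known estimate of the autocorrelation
y <- c(1, 2, 4, 7, 11)           # toy response vector
n <- length(y)
# (n-1) x n differencing matrix: row i is [0, ..., -rho, 1, 0, ...]
Tm <- cbind(-rho * diag(n - 1), 0) + cbind(0, diag(n - 1))
y.star <- drop(Tm %*% y)         # y[i+1] - rho * y[i], one observation shorter
# the same Tm would premultiply the design matrices X and Z before refitting
```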

AJKOER