I'm running across an interesting case where the Bayesian simulation approach outlined in Simon Wood's book (Generalized Additive Models: An Introduction with R (2006), section 4.8; as coded in the examples of help(predict.gam)) gives surprisingly large uncertainty estimates for logistic regression predictions whose point predictions are very near 0.
Here's example R code for a simplified version of the problem. Note that the true probability of success is low over a large part of the domain, and this is where the uncertainty blows up for the Bayesian approach on the response scale but not on the linear predictor scale (and not for the se.fit values from predict.gam).
library(mgcv)

x <- (1:200) / 200
# note that the empirical probabilities are 0 except
# in the middle of the domain
y <- c(rep(0, 100), 3, 7, 12, 15, 12, 20, 15, 9, 5, 8, 3, rep(0, 89))
z <- cbind(y, 100 - y)  # successes and failures out of 100 trials
k <- 150
mod <- gam(z ~ s(x, k = k), family = binomial)

rmvn <- function(n, mu, sig) { ## MVN random deviates, as in help(predict.gam)
  L <- mroot(sig)
  m <- ncol(L)
  t(mu + L %*% matrix(rnorm(m * n), m, n))
}
n <- 500
# Bayesian uncertainty calculation, as per help(predict.gam)
Xp <- predict(mod, type = "lpmatrix")
draws_coef <- rmvn(n, coef(mod), mod$Vp)
draws_lp <- Xp %*% t(draws_coef)              # draws of the linear predictor (type = "link")
draws <- exp(draws_lp) / (1 + exp(draws_lp))  # draws on the response scale (type = "response")
# Bayesian standard deviation on the response scale
sds <- apply(draws, 1, sd)
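# (aside: draws <- plogis(draws_lp) is an equivalent, numerically stabler
# way to map to the response scale; exp(eta) / (1 + exp(eta)) can overflow
# to NaN when eta is large and positive, though that isn't the issue here)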
# plot to show that the Bayesian standard deviation is very large where
# the predictions are near zero, unlike the standard errors
par(mfrow = c(1, 3))
plot(x, predict(mod, type = 'response'))
plot(x, sds)
# mgcv's estimated prediction standard error
sef <- predict(mod, type = 'response', se.fit = TRUE)$se.fit
plot(x, sef)
# Bayesian standard deviation and standard error are very similar
# on the linear predictor scale
par(mfrow = c(1, 2))
sds_lp <- apply(draws_lp, 1, sd)
plot(x, sds_lp)
plot(x, predict(mod, se.fit = TRUE)$se.fit)
Comparing to the output of predict.gam with se.fit = TRUE, the Bayesian standard deviation and the predict.gam standard error are very similar for the linear predictor (and both are very large where the success probability is very near zero), but on the response scale the standard error is small exactly where the Bayesian uncertainty is large. Mathematically this seems to just be a consequence of predict.gam() applying the delta method to the type = "link" standard errors to obtain the type = "response" ones: with the logit link this multiplies the link-scale standard error by |dmu/deta| = mu * (1 - mu), which scales the response-scale values down sharply where the fitted probabilities are near zero.
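A quick numerical check of that reading (a sketch, assuming predict.gam uses the first-order delta method for the logit link, so that the response-scale standard error is mu * (1 - mu) times the link-scale one):

# compare mgcv's response-scale se.fit to the delta-method approximation
mu <- predict(mod, type = 'response')
se_link <- predict(mod, se.fit = TRUE)$se.fit
range(sef - mu * (1 - mu) * se_link)  # should be ~0 if the delta method is used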
Does anyone have any insights to help explain the large uncertainty? Presumably it reflects some sort of instability in parts of the domain where the linear predictor is very negative, but I'd like to understand it better and figure out whether there is a workaround, as I need posterior draws for downstream processing, not just pointwise standard errors.
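For what it's worth, the kind of downstream summary I have in mind looks something like this (a sketch using the draws from above); quantile-based intervals behave more sensibly than mean +/- 2 sd here, presumably because the response-scale posterior is strongly right-skewed where the fitted probability is near zero:

# pointwise 90% credible intervals from the posterior draws
ci <- apply(draws, 1, quantile, probs = c(0.05, 0.95))
par(mfrow = c(1, 1))
plot(x, predict(mod, type = 'response'), ylim = range(ci))
lines(x, ci[1, ], lty = 2)  # lower bound
lines(x, ci[2, ], lty = 2)  # upper bound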