Obviously, computing the inverse Hessian is hard when a probability distribution is fitted to high-dimensional data points. One idea to reduce computational cost would be to approximate the distribution with a diagonal Gaussian. Are there any standard techniques that allow making efficient use of the Laplace approximation in high dimensions?
2 Answers
Are there any standard techniques that allow making efficient use of the Laplace approximation in high dimensions?
Assuming that you are using a Laplace approximation for a random effects model: often, groups of outcomes are conditionally independent given the random effects. These models have a sparse Hessian, which makes it easy to work with high-dimensional problems. As an example, consider the following mixed logistic regression
$$ \begin{align} y_{ij} &\sim \text{Bin}(\text{logit}^{-1}(b_i + u_i x_{ij}), 1) & j &= 1, \dots, 5 \\ u_i &\sim N(0, \sigma^2_u) & i &= 1, \dots, k \\ b_i &\sim N(0, \sigma^2_b) & i &= 1, \dots, k \end{align} $$
We can simulate from this model with $k = 3$ and show the Hessian as follows
# simulate data
n <- 5
k <- 3
sig <- 1
set.seed(1)
dat <- do.call(rbind, lapply(1:k, function(i){
  u <- rnorm(2, sd = sig)                        # random intercept and slope for group i
  x <- runif(n, -1, 1)                           # covariate
  y <- runif(n) < 1/(1 + exp(-u[1] - u[2] * x))  # Bernoulli draw with success probability logit^-1(u[1] + u[2] * x)
  data.frame(y = y, x = x, grp = i)
}))
# approximate the hessian
library(numDeriv)
# joint log density of the outcomes and the random effects
ll <- function(u){
  i_start <- (dat$grp - 1) * 2
  p_hat <- 1/(1 + exp(-u[i_start + 1] - dat$x * u[i_start + 2]))
  sum(log(ifelse(dat$y > 0, p_hat, 1 - p_hat))) +
    sum(dnorm(u, sd = sig, log = TRUE))
}
he <- hessian(ll, numeric(2 * k))
he[abs(he) < 1e-3] <- 0 # threshold just as an example here
print(he, digits = 2)
#R> [,1] [,2] [,3] [,4] [,5] [,6]
#R> [1,] -2.25 -0.42 0.00 0.00 0.00 0.00
#R> [2,] -0.42 -1.49 0.00 0.00 0.00 0.00
#R> [3,] 0.00 0.00 -2.25 -0.23 0.00 0.00
#R> [4,] 0.00 0.00 -0.23 -1.39 0.00 0.00
#R> [5,] 0.00 0.00 0.00 0.00 -2.25 -0.23
#R> [6,] 0.00 0.00 0.00 0.00 -0.23 -1.32
For this model, the Hessian is a block diagonal matrix with $2 \times 2$ blocks. Thus, it is easy to invert and to compute the determinant of. The number of non-zero entries of the Hessian is only $2^2k$ rather than $(2k)^2$. This structure is the basis of software such as the lme4 package and the TMB package.
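As a minimal sketch (reusing he and k from the code above; the helper blk_idx and the object names are just for illustration), the log-determinant and the inverse can be obtained block by block instead of from the dense $2k \times 2k$ matrix
blk_idx <- function(i) (2 * i - 1):(2 * i)
# log-determinant of -H as a sum of 2x2 log-determinants
blk_logdet <- sum(sapply(1:k, function(i)
  determinant(-he[blk_idx(i), blk_idx(i)], logarithm = TRUE)$modulus))
# inverse of H stored as a list of 2x2 blocks
blk_inv <- lapply(1:k, function(i) solve(he[blk_idx(i), blk_idx(i)]))
# matches the dense computation (he has the small off-block entries zeroed out)
all.equal(blk_logdet, c(determinant(-he, logarithm = TRUE)$modulus))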
One idea to reduce computational cost would be to approximate the distribution with a diagonal Gaussian.
There are of course settings where the Hessian is not sparse. These include some Gaussian processes and autoencoders. Assuming a diagonal covariance matrix is akin to the Gaussian variational approximations used in many machine learning applications. However, I suspect that one might opt for some low-rank approximation of the determinant instead.
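As a rough illustration of that last point, suppose the Hessian is approximated by a diagonal-plus-low-rank matrix $D + VV^\top$. The matrix determinant lemma then gives the log-determinant from an $r \times r$ determinant at $O(pr^2)$ cost instead of $O(p^3)$. The dimension, rank, and factors below are made up for the example
set.seed(1)
p <- 1000                          # dimension (made up for illustration)
r <- 5                             # rank of the low-rank correction
D <- runif(p, 1, 2)                # diagonal part
V <- matrix(rnorm(p * r), p, r)    # low-rank factor
# log|D + V V^T| = sum(log(D)) + log|I_r + V^T D^-1 V|
logdet <- sum(log(D)) +
  c(determinant(diag(r) + crossprod(V / sqrt(D)), logarithm = TRUE)$modulus)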

You may want to look into the SR1 and BFGS (and its dual, DFP) optimization methods. These iteratively update the approximated inverse Hessian using rank-1 (SR1) and rank-2 (BFGS) updates. They are also fairly efficient and can likely handle your problem.
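For concreteness, here is a minimal sketch of the rank-2 BFGS update of an approximate inverse Hessian H, given a step s and the corresponding change in the gradient y (the function name and arguments are just for illustration)
# BFGS rank-2 update of an approximate inverse Hessian H
bfgs_inv_update <- function(H, s, y){
  rho <- 1 / sum(y * s)
  I <- diag(length(s))
  (I - rho * s %*% t(y)) %*% H %*% (I - rho * y %*% t(s)) + rho * s %*% t(s)
}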

- The OP wants to apply a Laplace approximation so he needs to compute the determinant of the Hessian regardless of the optimizer he is using. – Benjamin Christoffersen Jul 24 '20 at 07:32
- Yes, but if computing the inverse Hessian is hard, I've given the OP a few methods to make that easier -- which is the OP's ultimate goal ("idea to reduce computational cost"). – kurtosis Jul 24 '20 at 07:35
- But that will not help the OP, as the OP needs to evaluate the determinant? Using SR1 and BFGS approximations would still yield a dense Hessian approximation of which the OP needs to compute the determinant in order to apply the Laplace approximation. This is computationally hard for larger problems, which I guess is the reason that the OP asks for a diagonal approximation instead. – Benjamin Christoffersen Sep 03 '20 at 14:20