
Consider a log-likelihood function $\ell(\theta,b)$, where $b\sim F$. I want to calculate the marginal log-likelihood

$$\ell(\theta) = \int\exp\left(\ell(\theta,b) \right) dF(b).$$

However, $\exp(\ell(\theta,b))$ is typically large, which causes overflows if I try to integrate it directly. Is there a clever way to integrate this function?

Dimitris Rizopoulos
Optimus
  • For more details on computations in log-space using the $\text{logsumexp}$ function, see this related question: [Adding very small probabilities - How to compute?](https://stats.stackexchange.com/questions/379335/adding-very-small-probabilities-how-to-compute) – Ben Nov 02 '19 at 21:15

3 Answers


The marginal log-likelihood in mixed models is typically written as:

$$\ell(\theta) = \sum_{i = 1}^n \log \int p(y_i \mid b_i) \, p(b_i) \, db_i.$$

In specific settings, e.g., in the linear mixed model, where both terms in the integrand are normal densities, this integral has a closed-form solution. But in general you need to approximate it using Monte Carlo, Gaussian quadrature, or a similar technique. In this case we get the form:

$$\ell(\theta) \approx \sum_{i = 1}^n \log \sum_{q = 1}^Q \varpi_q \{p(y_i \mid b_{iq}) \, p(b_{iq})\},$$

where, in the case of Gaussian quadrature, $\varpi_q$ are the quadrature weights and $b_{iq}$ the corresponding nodes.

Now, coming specifically to your question: it is indeed better to work on the log scale when doing computations; however, you still need to exponentiate inside the sum, i.e.,

$$\ell(\theta) \approx \sum_{i = 1}^n \log \sum_{q = 1}^Q \exp \{\log(\varpi_q) + \log p(y_i \mid b_{iq}) + \log p(b_{iq})\}.$$

There are ways to calculate the logarithm of a sum of exponentials more accurately. If you happen to work in R, have a look at the logSumExp() function in the matrixStats package.
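As a quick illustration of the problem and the standard fix (this is a base-R sketch of the same computation that matrixStats::logSumExp performs more carefully):

```r
# Naive evaluation overflows: exp(1000) is Inf in double precision
x <- c(1000, 1001, 1002)
log(sum(exp(x)))    # Inf

# Stable version: subtract the maximum before exponentiating,
# then add it back outside the logarithm
logsumexp <- function(x) {
  m <- max(x)
  m + log(sum(exp(x - m)))
}
logsumexp(x)        # 1002.408
```

The shift by the maximum guarantees the largest term passed to exp() is exactly 0, so no overflow can occur.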

Dimitris Rizopoulos

The log-sum-exp trick is a way to calculate sums over finite sets, operating in the log domain to avoid overflow. I'll show how to generalize this trick to integrals, giving a way to rewrite the log of your marginal likelihood.

The log marginal likelihood is:

$$\log \ell_m(\theta) = \log \int \exp \big( \ell(\theta, b) \big) dF(b)$$

Let $\ell^*(\theta)$ be the maximum value the log joint likelihood can take, given $\theta$:

$$\ell^*(\theta) = \max_b \ \ell(\theta, b)$$

Use this to rewrite the log marginal likelihood:

$$\log \ell_m(\theta) = \log \int \exp \Big( \ell(\theta, b) - \ell^*(\theta) + \ell^*(\theta) \Big) dF(b)$$

After a little algebra, we have:

$$\log \ell_m(\theta) = \ell^*(\theta) + \log \int \exp \Big( \ell(\theta, b) - \ell^*(\theta) \Big) dF(b)$$

The above integral can be computed without overflow, since the values we need to exponentiate are smaller. Note that a precise solution for $\ell^*(\theta)$ isn't particularly necessary; all we really need is a value big enough to avoid overflow.
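A minimal sketch of this in R, assuming a toy joint model where $y \mid b \sim N(b, 1)$ and $b \sim N(0, 1)$ (my choice for illustration, since the exact marginal is then $N(0, \sqrt{2})$ and we can check the answer), with the integral approximated by Monte Carlo and $\ell^*(\theta)$ taken as the maximum over the sampled values:

```r
set.seed(1)
y <- 1.3

# Toy joint log-likelihood: log p(y | b) with y | b ~ N(b, 1)
ell <- function(b) dnorm(y, mean = b, sd = 1, log = TRUE)

# Monte Carlo draws from F = N(0, 1)
b <- rnorm(1e5)
terms <- ell(b)

# Shift by the maximum, exponentiate safely, then shift back
m <- max(terms)
log_marg <- m + log(mean(exp(terms - m)))

# Exact answer for comparison: the marginal of y is N(0, sqrt(2))
dnorm(y, 0, sqrt(2), log = TRUE)
```

With $10^5$ draws the shifted Monte Carlo estimate agrees with the exact log marginal density to a couple of decimal places.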

user20160

Accurate integration of this kind of function will require you to work in log-space, which means using the $\text{logsumexp}$ function for sums of terms. You can then apply any standard method of numerical integration (e.g., quadrature, importance sampling, etc.).

To give you a more specific idea of how to implement this technique, I will construct a simple method using an arbitrary weighting on the terms for the numerical integration, and working in log-space. For simplicity, I will assume you have a continuous distribution where you have a known quantile function $Q$ that you can compute. On this basis, we choose some odd value $n$ and split the unit interval into $n$ equal sub-intervals, and look at the midpoints of these sub-intervals. We consider the points $b_1,...,b_n$ (which are the values at the quantiles equal to those sub-interval midpoints) given by:

$$b_i \equiv Q \bigg( \frac{2i-1}{2n} \bigg).$$

Suppose we take some corresponding positive weights $w_1,...,w_n$ with mean $\bar{w} \equiv \sum w_i / n$, and we define the coefficients $\ell_{i*} \equiv \ell(\theta,b_i) + \log w_i$. The integral can be approximated (using these weights) as:

$$\begin{equation} \begin{aligned} \log \ell_m(\theta) &= \log \int \exp \left( \ell(\theta,b) \right) \ dF(b) \\[6pt] &= \log \int \limits_0^1 \exp \left( \ell(\theta,Q(p)) \right) \ dp \\[6pt] &\approx \log \Bigg( \sum_{i=1}^n \frac{w_i}{n \bar{w}} \cdot \exp \left( \ell(\theta,b_i) \right) \Bigg) \\[6pt] &= -\log(n \bar{w}) + \log \Bigg( \sum_{i=1}^n \exp \left( \ell(\theta,b_i) + \log w_i \right) \Bigg) \\[6pt] &= -\log(n \bar{w}) + \log \Bigg( \sum_{i=1}^n \exp \left( \ell_{i*} \right) \Bigg) \\[6pt] &= -\log(n \bar{w}) + \text{logsumexp} \left( \ell_{1*}, ..., \ell_{n*} \right). \\[6pt] \end{aligned} \end{equation}$$

Now, we will generally choose the weights $w_1,...,w_n$ to correspond to some appropriate quadrature rule. For example, we could implement the numerical integration using Simpson's rule by taking the weights to be $w_1,...,w_n = 1, 4, 2, 4, ..., 4, 2, 4, 1$.


We can program this method in R as follows. (Note that this is not a particularly efficient method. I am including it for purposes of exposition rather than recommending it in practice; there are much better numerical integration algorithms already programmed into R.) Here we program a function INTEGRATE that takes your function l and your quantile function Q and computes the integral using Simpson's rule with n terms. The option log.out specifies whether you want the logarithm of the integral as the output (or the actual integral).

INTEGRATE <- function(l, Q, n = 100001, log.out = TRUE) {

#Check the input n
if (length(n) != 1)       { stop('Error: n should be a single value') }
if (n < 1)                { stop('Error: n should be a positive integer') }
n <- as.integer(n);
if (n%%2 == 0)            { n <- n+1; }

#Set Simpson weights
WW <- rep(0, n);
for (i in 1:n) { WW[i] <- ifelse(i%%2 == 0, 4, 2); }
WW[1] <- 1;
WW[n] <- 1;

#Set numerical terms
BB <- rep(0, n);
TT <- rep(-Inf, n);
for (i in 1:n) { 
  PROB  <- (2*i-1)/(2*n);
  BB[i] <- Q(PROB);
  TT[i] <- l(BB[i]) + log(WW[i]); }

#Perform numerical integration
LOG_INT <- - log(sum(WW)) + matrixStats::logSumExp(TT);

#Give output
if (log.out) { LOG_INT } else { exp(LOG_INT) } }

We can test this with an example where the true value of the integral is known. Taking $\ell(\theta, b) = \theta \cdot b$ means that the integral is just the moment generating function of the random variable $B$, evaluated at $\theta$. We can compute this integral numerically for the standard normal distribution and compare the result to the known value of the MGF. Taking $\theta = 2$ and using the standard normal distribution, the log-MGF value should be $\theta^2/2 = 2$. Our computation below shows that the numerical integral comes close to this value when we take a sufficiently large value for $n$.

#Set an example function
theta <- 2;
l <- function(z) { theta*z; }

#Set the quantile function (standard normal)
Q <- function(p) { qnorm(p, 0, 1); }

#Compute the integral numerically
LOG_INT <- INTEGRATE(l, Q, n = 10^7, log.out = TRUE);

LOG_INT;
[1] 1.999567

We can see that our numerical integration in this case comes close to the true value of the integral. Note that this is not a particularly efficient numerical integration algorithm, and there are better numerical integration algorithms in R. Nevertheless, hopefully this illustrates a broad class of methods for integrating this kind of function in log-space.
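To sketch that last point, the same example can be handed to R's built-in adaptive quadrature stats::integrate, working in the original $b$-space and applying the maximum-shift idea from the earlier answer to keep the integrand in a safe range; for $\ell(\theta, b) = \theta b$ with $b \sim N(0,1)$, the log-integrand $\theta b + \log \varphi(b)$ is maximized at $b = \theta$, so the shift is available analytically here:

```r
theta <- 2

# Shift: maximum of theta*b + log phi(b), attained at b = theta
shift <- theta^2 + dnorm(theta, log = TRUE)

# Shifted integrand stays in a safe numerical range (its peak is exactly 1)
f <- function(b) exp(theta * b + dnorm(b, log = TRUE) - shift)

log_int <- shift + log(integrate(f, -Inf, Inf)$value)
log_int    # approximately 2, the known log-MGF value theta^2/2
```

This recovers the known answer essentially to machine precision, in a handful of function evaluations rather than $10^7$.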

Ben