Generalized Linear Mixed Models (GLMMs) have the following general representation:
$$\left\{
\begin{array}{l}
Y_i \mid b_i \sim \mathcal F_\psi,\\
b_i \sim \mathcal N(0, D),
\end{array}
\right.$$
where $Y_i$ is the response for the $i$-th sample unit and $b_i$ is the vector of random effects for this unit. The response $Y_i$ conditional on the random effects has a distribution $\mathcal F$ parameterized by the vector $\psi$, and the random effects are typically assumed to follow a multivariate normal distribution with mean 0 and variance-covariance matrix $D$. Some standard GLMMs assume that the distribution $\mathcal F_\psi$ is the binomial, Poisson, negative binomial, Beta or Gamma distribution.
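To make the notation concrete, here is a minimal Python sketch that simulates data from one common special case, a random-intercept Poisson GLMM; all parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical random-intercept Poisson GLMM (all values made up):
#   Y_ij | b_i ~ Poisson(mu_ij),  log(mu_ij) = beta0 + beta1 * x_ij + b_i
#   b_i ~ N(0, sigma_b^2)
n_units, n_obs = 50, 10                 # sample units, observations per unit
beta0, beta1, sigma_b = 0.5, 0.3, 1.0   # assumed "true" parameters

b = rng.normal(0.0, sigma_b, size=n_units)   # random effects, one per unit
x = rng.normal(size=(n_units, n_obs))        # a covariate
mu = np.exp(beta0 + beta1 * x + b[:, None])  # conditional means given b_i
y = rng.poisson(mu)                          # observed counts Y_ij
```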
The likelihood function of these models has the following general form $$L(\theta) = \prod_{i = 1}^n \int p(y_i \mid b_i; \psi) \, p(b_i; D) \, db_i,$$
in which the first term is the probability mass or probability density function of $\mathcal F_\psi$, and the second term is the probability density function of the multivariate normal distribution for the random effects. The full parameter vector is $\theta = (\psi, \mbox{vech}(D))$, where $\mbox{vech}(D)$ denotes the vector of unique elements of $D$.
The problem is that the integral in the definition of this likelihood function does not have a closed-form solution. Hence, to estimate the parameters in these models under maximum likelihood, you need to somehow approximate this integral. In the literature, two main types of approximation have been proposed.
- Approximation of the integrand: These methods entail approximating the product of the two terms $p(y_i \mid b_i; \psi) \times p(b_i; D)$ by a multivariate normal distribution, because for that distribution the integral can be solved analytically. The penalized quasi-likelihood (PQL) and Laplace approximation methods fall into this category.
- Approximation of the integral: These methods entail approximation of the whole integral by a (weighted) sum, i.e.,
$$\int p(y_i \mid b_i; \psi) \, p(b_i; D) \, db_i \approx \sum_k \varpi_k \, p(y_i \mid b_k; \psi) \, p(b_k; D).$$
Some methods that fall into this category are the Monte Carlo and adaptive Gaussian quadrature approximations; a numerical sketch of this idea is given after the list.
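To illustrate the approximation-of-the-integral idea, here is a minimal Python sketch for a single sample unit under a hypothetical random-intercept Poisson model (the data and parameter values are made up). It approximates the unit's integral with ordinary Gauss-Hermite quadrature and with Monte Carlo; note that drawing $b$ directly from $p(b; D)$ absorbs the density weight that appears explicitly in the sum above. The full likelihood is the product of such integrals over units.

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(1)

# One unit's data under a hypothetical random-intercept Poisson GLMM:
#   log(mu_j) = beta0 + b,  b ~ N(0, sigma_b^2)
y_i = np.array([2, 0, 3, 1, 4])
beta0, sigma_b = 0.5, 1.0

def cond_lik(b):
    """p(y_i | b): product of Poisson pmfs given the random effect(s) b."""
    mu = np.exp(beta0 + b)
    return poisson.pmf(y_i[:, None], mu).prod(axis=0)

# (a) Gauss-Hermite quadrature: int f(b) N(b; 0, s^2) db
#     = (1/sqrt(pi)) * sum_k w_k * f(sqrt(2) * s * t_k)
t, w = np.polynomial.hermite.hermgauss(15)
gh = (w * cond_lik(np.sqrt(2) * sigma_b * t)).sum() / np.sqrt(np.pi)

# (b) Monte Carlo: average f(b_m) over draws b_m ~ N(0, s^2); sampling from
#     the random-effects density replaces the explicit p(b_k; D) weight.
b_draws = rng.normal(0.0, sigma_b, size=100_000)
mc = cond_lik(b_draws).mean()

print(gh, mc)  # the two approximations should agree closely
```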
Merits & Flaws
The Approximation of the integrand methods are in general faster than the Approximation of the integral ones. However, they do not provide any control of the approximation error. For this reason, these methods work better when the product of the two terms can be well approximated by a multivariate normal distribution, that is, when the data are more "continuous": binomial data with a large number of trials and Poisson data with large expected counts.
The Approximation of the integral methods are slower, but they do provide control of the approximation error by using more terms in the summation, that is, by considering a larger Monte Carlo sample or more quadrature points. Hence, these methods work better for binary data or Poisson data with low expected counts.
Note that there are links between the two classes of methods. For example, the Laplace approximation is equivalent to the adaptive Gaussian quadrature rule with a single quadrature point.
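To see this equivalence concretely, here is a small sketch in the same hypothetical one-unit Poisson setup as above: the Laplace approximation and the adaptive Gauss-Hermite rule with one node produce the same number by construction.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import poisson, norm

# Same hypothetical one-unit Poisson setup as above.
y_i = np.array([2, 0, 3, 1, 4])
beta0, sigma_b = 0.5, 1.0

def h(b):
    """Log integrand: log p(y_i | b) + log p(b)."""
    mu = np.exp(beta0 + b)
    return poisson.logpmf(y_i, mu).sum() + norm.logpdf(b, 0.0, sigma_b)

# Mode b_hat of the integrand and its curvature (finite differences)
b_hat = minimize_scalar(lambda b: -h(b)).x
eps = 1e-4
h2 = (h(b_hat + eps) - 2 * h(b_hat) + h(b_hat - eps)) / eps**2  # h''(b_hat)
sigma_hat = 1.0 / np.sqrt(-h2)

# Laplace approximation: exp(h(b_hat)) * sqrt(2 * pi) * sigma_hat
laplace = np.exp(h(b_hat)) * np.sqrt(2 * np.pi) * sigma_hat

def agq(K):
    """Adaptive Gauss-Hermite rule with K nodes, centered/scaled at the mode."""
    t, w = np.polynomial.hermite.hermgauss(K)
    nodes = b_hat + np.sqrt(2) * sigma_hat * t
    return np.sqrt(2) * sigma_hat * np.sum(
        w * np.exp(t**2) * np.exp([h(b) for b in nodes]))

print(laplace, agq(1))  # identical: AGQ with one node *is* Laplace
print(agq(15))          # more nodes refine the approximation
```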
Finally, the REML method is more relevant in the estimation of linear mixed models, for which the integral does have a closed-form solution; the question there is how to estimate the variance components, i.e., the unique elements of the $D$ covariance matrix. The classic maximum likelihood procedure is known to produce biased estimates of these parameters, especially in small samples, because it does not account for the fact that, to estimate the variance parameters, you first need to estimate the mean parameters. The REML approach does account for that. It generalizes the familiar idea that the sample variance should divide by $n - 1$ rather than $n$ (the maximum likelihood divisor) to obtain an unbiased estimate of the population variance, with $n$ being the sample size.
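As a toy numeric analogy (not REML itself, just the $n - 1$ idea), averaging the two variance estimators over many small simulated samples shows the downward bias of the ML divisor:

```python
import numpy as np

rng = np.random.default_rng(0)
n, true_var = 10, 4.0

# Average the two estimators over many small samples: the ML version
# (divide by n) is biased downward; dividing by n - 1 is unbiased.
samples = rng.normal(0.0, np.sqrt(true_var), size=(20_000, n))
print(np.var(samples, axis=1).mean())          # ~ true_var * (n-1)/n = 3.6
print(np.var(samples, axis=1, ddof=1).mean())  # ~ true_var = 4.0
```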
EDIT: PQL in Combination with REML
The approximation performed by the PQL method results in a new response vector $Y_i^*$, which is a transformation of the original data $Y_i$ constructed to be approximately normally distributed. Hence, fitting a GLMM is equivalent to fitting a linear mixed model to $Y_i^*$, and, as mentioned above, in the linear mixed model you may choose to estimate the variance components either with maximum likelihood (ML) or restricted maximum likelihood (REML).
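For a rough idea of what this transformation looks like, here is a sketch of one PQL step's working response for a Poisson model with a log link, using the standard working variate of iteratively reweighted least squares, $Y_i^* = \hat\eta_i + (Y_i - \hat\mu_i) \, g'(\hat\mu_i)$; the `eta_hat` values below are hypothetical placeholders for the current estimate of the linear predictor (fixed effects plus predicted random effects).

```python
import numpy as np

# One PQL iteration's working response for a Poisson GLMM with log link.
# eta_hat: current linear predictor (hypothetical placeholder values).
y = np.array([2, 0, 3, 1, 4])
eta_hat = np.array([0.7, 0.3, 1.1, 0.4, 1.2])

mu_hat = np.exp(eta_hat)              # inverse log link
# Working response: z = eta + (y - mu) * d(eta)/d(mu); for the log link,
# d(eta)/d(mu) = 1/mu.
z = eta_hat + (y - mu_hat) / mu_hat

# z is treated as approximately normal, and a linear mixed model is fitted
# to it -- with variance components estimated by either ML or REML.
print(z)
```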