Generalized Linear Mixed Models (GLMMs) have the following general representation:
$$\left\{
\begin{array}{l}
Y_i \mid b_i \sim \mathcal F_\psi,\\
b_i \sim \mathcal N(0, D),
\end{array}
\right.$$
where $Y_i$ is the response for the $i$-th sample unit and $b_i$ is the vector of random effects for this unit. The response $Y_i$ conditional on the random effects has a distribution $\mathcal F$ parameterized by the vector $\psi$, and the random effects are typically assumed to follow a multivariate normal distribution with mean 0 and variance-covariance matrix $D$. Some standard GLMMs assume that the distribution $\mathcal F_\psi$ is the binomial, Poisson, negative binomial, Beta or Gamma distribution.
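To make the notation concrete, here is a minimal Python sketch that simulates data from one common special case, a random-intercept Poisson GLMM; all parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical random-intercept Poisson GLMM (all values made up):
#   Y_ij | b_i ~ Poisson(mu_ij),  log(mu_ij) = beta0 + beta1 * x_ij + b_i
#   b_i ~ N(0, sigma_b^2)
n_units, n_obs = 50, 10                 # sample units, observations per unit
beta0, beta1, sigma_b = 0.5, 0.3, 1.0   # assumed "true" parameters

b = rng.normal(0.0, sigma_b, size=n_units)   # random effects, one per unit
x = rng.normal(size=(n_units, n_obs))        # a covariate
mu = np.exp(beta0 + beta1 * x + b[:, None])  # conditional means given b_i
y = rng.poisson(mu)                          # observed counts Y_ij
```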
The likelihood function of these models has the following general form $$L(\theta) = \prod_{i = 1}^n \int p(y_i \mid b_i; \psi) \, p(b_i; D) \, db_i,$$
in which the first term is the probability mass or probability density function of $\mathcal F_\psi$, and the second term is the probability density function of the multivariate normal distribution for the random effects. The full parameter vector is $\theta = (\psi, \mbox{vech}(D))$, where $\mbox{vech}(D)$ denotes the vector of unique elements of $D$.
The problem is that the integral in the definition of this likelihood function does not have a closed-form solution. Hence, to estimate the parameters in these models under maximum likelihood, you need to somehow approximate this integral. In the literature, two main types of approximation have been proposed.
- Approximation of the integrand: These methods entail approximating the product of the two terms $p(y_i \mid b_i; \psi) \times p(b_i; D)$ by a multivariate normal distribution, because for that distribution the integral can be solved analytically. The penalized quasi-likelihood (PQL) and Laplace approximation methods fall into this category.
- Approximation of the integral: These methods entail approximation of the whole integral by a (weighted) sum, i.e.,
$$\int p(y_i \mid b_i; \psi) \, p(b_i; D) \, db_i \approx \sum_k \varpi_k \, p(y_i \mid b_k; \psi) \, p(b_k; D).$$
Some methods that fall into this category are the Monte Carlo and adaptive Gaussian quadrature approximations; a numerical sketch of this idea is given after the list.
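To illustrate the approximation-of-the-integral idea, here is a minimal Python sketch for a single sample unit under a hypothetical random-intercept Poisson model (the data and parameter values are made up). It approximates the unit's integral with ordinary Gauss-Hermite quadrature and with Monte Carlo; note that drawing $b$ directly from $p(b; D)$ absorbs the density weight that appears explicitly in the sum above. The full likelihood is the product of such integrals over units.

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(1)

# One unit's data under a hypothetical random-intercept Poisson GLMM:
#   log(mu_j) = beta0 + b,  b ~ N(0, sigma_b^2)
y_i = np.array([2, 0, 3, 1, 4])
beta0, sigma_b = 0.5, 1.0

def cond_lik(b):
    """p(y_i | b): product of Poisson pmfs given the random effect(s) b."""
    mu = np.exp(beta0 + b)
    return poisson.pmf(y_i[:, None], mu).prod(axis=0)

# (a) Gauss-Hermite quadrature: int f(b) N(b; 0, s^2) db
#     = (1/sqrt(pi)) * sum_k w_k * f(sqrt(2) * s * t_k)
t, w = np.polynomial.hermite.hermgauss(15)
gh = (w * cond_lik(np.sqrt(2) * sigma_b * t)).sum() / np.sqrt(np.pi)

# (b) Monte Carlo: average f(b_m) over draws b_m ~ N(0, s^2); sampling from
#     the random-effects density replaces the explicit p(b_k; D) weight.
b_draws = rng.normal(0.0, sigma_b, size=100_000)
mc = cond_lik(b_draws).mean()

print(gh, mc)  # the two approximations should agree closely
```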
Merits & Flaws
The Approximation of the integrand methods are in general faster than the Approximation of the integral ones. However, they do not provide any control of the approximation error. For this reason, these methods work better when the product of the two terms can be well approximated by a multivariate normal distribution, that is, when the data are more "continuous": binomial data with a large number of trials and Poisson data with large expected counts.
The Approximation of the integral methods are slower, but they do provide control of the approximation error by using more terms in the summation, that is, by considering a larger Monte Carlo sample or more quadrature points. Hence, these methods work better for binary data or Poisson data with low expected counts.
Note that there are links between the two classes of methods. For example, the Laplace approximation is equivalent to the adaptive Gaussian quadrature rule with a single quadrature point.
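To see this equivalence concretely, here is a small sketch in the same hypothetical one-unit Poisson setup as above: the Laplace approximation and the adaptive Gauss-Hermite rule with one node produce the same number by construction.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import poisson, norm

# Same hypothetical one-unit Poisson setup as above.
y_i = np.array([2, 0, 3, 1, 4])
beta0, sigma_b = 0.5, 1.0

def h(b):
    """Log integrand: log p(y_i | b) + log p(b)."""
    mu = np.exp(beta0 + b)
    return poisson.logpmf(y_i, mu).sum() + norm.logpdf(b, 0.0, sigma_b)

# Mode b_hat of the integrand and its curvature (finite differences)
b_hat = minimize_scalar(lambda b: -h(b)).x
eps = 1e-4
h2 = (h(b_hat + eps) - 2 * h(b_hat) + h(b_hat - eps)) / eps**2  # h''(b_hat)
sigma_hat = 1.0 / np.sqrt(-h2)

# Laplace approximation: exp(h(b_hat)) * sqrt(2 * pi) * sigma_hat
laplace = np.exp(h(b_hat)) * np.sqrt(2 * np.pi) * sigma_hat

def agq(K):
    """Adaptive Gauss-Hermite rule with K nodes, centered/scaled at the mode."""
    t, w = np.polynomial.hermite.hermgauss(K)
    nodes = b_hat + np.sqrt(2) * sigma_hat * t
    return np.sqrt(2) * sigma_hat * np.sum(
        w * np.exp(t**2) * np.exp([h(b) for b in nodes]))

print(laplace, agq(1))  # identical: AGQ with one node *is* Laplace
print(agq(15))          # more nodes refine the approximation
```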
Finally, the REML method is more relevant in the estimation of linear mixed models, for which the integral does have a closed-form solution; the question there is how to estimate the variance components, i.e., the unique elements of the $D$ covariance matrix. The classic maximum likelihood procedure is known to produce biased estimates of these parameters, especially in small samples, because it does not account for the fact that, to estimate the variance parameters, you first need to estimate the mean parameters. The REML approach does account for that. It generalizes the familiar idea that the sample variance should divide by $n - 1$ rather than $n$ (the maximum likelihood divisor) to obtain an unbiased estimate of the population variance, with $n$ being the sample size.
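As a toy numeric analogy (not REML itself, just the $n - 1$ idea), averaging the two variance estimators over many small simulated samples shows the downward bias of the ML divisor:

```python
import numpy as np

rng = np.random.default_rng(0)
n, true_var = 10, 4.0

# Average the two estimators over many small samples: the ML version
# (divide by n) is biased downward; dividing by n - 1 is unbiased.
samples = rng.normal(0.0, np.sqrt(true_var), size=(20_000, n))
print(np.var(samples, axis=1).mean())          # ~ true_var * (n-1)/n = 3.6
print(np.var(samples, axis=1, ddof=1).mean())  # ~ true_var = 4.0
```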
EDIT: PQL in Combination with REML
The approximation performed by the PQL method results in a new response vector $Y_i^*$, which is a transformation of the original data $Y_i$ constructed to be approximately normally distributed. Hence, fitting a GLMM is equivalent to fitting a linear mixed model to $Y_i^*$, and, as mentioned above, in the linear mixed model you may choose to estimate the variance components either with maximum likelihood (ML) or restricted maximum likelihood (REML).
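For a rough idea of what this transformation looks like, here is a sketch of one PQL step's working response for a Poisson model with a log link, using the standard working variate of iteratively reweighted least squares, $Y_i^* = \hat\eta_i + (Y_i - \hat\mu_i) \, g'(\hat\mu_i)$; the `eta_hat` values below are hypothetical placeholders for the current estimate of the linear predictor (fixed effects plus predicted random effects).

```python
import numpy as np

# One PQL iteration's working response for a Poisson GLMM with log link.
# eta_hat: current linear predictor (hypothetical placeholder values).
y = np.array([2, 0, 3, 1, 4])
eta_hat = np.array([0.7, 0.3, 1.1, 0.4, 1.2])

mu_hat = np.exp(eta_hat)              # inverse log link
# Working response: z = eta + (y - mu) * d(eta)/d(mu); for the log link,
# d(eta)/d(mu) = 1/mu.
z = eta_hat + (y - mu_hat) / mu_hat

# z is treated as approximately normal, and a linear mixed model is fitted
# to it -- with variance components estimated by either ML or REML.
print(z)
```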