
First let's simulate some data for a logistic regression with fixed and random parts:

set.seed(1)
n    <- 100
x    <- runif(n)                         # continuous covariate
z    <- sample(c(0,1), n, replace=TRUE)  # cluster indicator (two clusters)
b    <- rnorm(2)                         # random intercepts, one per cluster
beta <- c(0.4, 0.8)                      # fixed effects
X    <- model.matrix(~x)                 # fixed-effects design matrix
Z    <- cbind(z, 1-z)                    # random-effects design matrix (one column per cluster)
eta  <- X%*%beta + Z%*%b                 # linear predictor
pr   <- 1/(1+exp(-eta))                  # success probabilities
y    <- rbinom(n, 1, pr)                 # binary response

If we just wanted to fit a logistic regression with no random parts, we could use the glm() function:

glm(y~x, family="binomial")

glm(y~x, family="binomial")$coefficients
# (Intercept)           x 
#  -0.2992785   2.1429825 

Or we could construct our own function for the log-likelihood,

$$ \log\text{L}(\boldsymbol{\beta}) = \sum_{i=1}^n \left[ y_i \log \Lambda(\eta_i) + (1-y_i)\log\big(1-\Lambda(\eta_i)\big) \right] $$

where $\Lambda(\eta_i)=\frac{1}{1+\exp(-\eta_i)}$ and $\eta_i=\boldsymbol{X_i'\beta}$, and use optim() to estimate the parameters that maximize it, as in the following example code:

ll.no.random <- function(theta,X,y){
  beta <- theta[1:ncol(X)]                   # fixed-effects coefficients
  eta  <- X%*%beta                           # linear predictor
  p    <- 1/(1+exp(-eta))                    # success probabilities
  ll   <- sum( y*log(p) + (1-y)*log(1-p) )   # log-likelihood
  -ll                                        # optim() minimizes, so return the negative
}

optim(c(0,1), ll.no.random, X=X, y=y)

optim(c(0,1), ll.no.random, X=X, y=y)$par
# -0.2992456  2.1427484

which of course gives the same estimates and the same maximized log-likelihood. For mixed effects, we would want something like

library(lme4)
glmer(y~x + (1|z), family="binomial")

But how can we do the same with our own function? Since the likelihood is

$$ L = \prod_{j=1}^J \int \text{Pr}(y_{1j},...,y_{n_jj}|\boldsymbol{x}, b_j) f(b_j)db_j $$

and the integral has no closed-form expression, we need to use numerical integration such as Gaussian quadrature. We can use the statmod package to get some quadrature nodes and weights, say 10:

library(statmod)
# note: the default rule is Legendre on [-1,1]; for integrating against a normal density,
# gauss.quad.prob(10, dist="normal") (or the Hermite rule) gives nodes and weights
# with respect to the normal density
gq <- gauss.quad(10)
w  <- gq$weights
g  <- gq$nodes

UPDATE: Using these quadrature locations $g_r$ and weights $w_r$ for $r=1,...,R$ ($R=10$ here), we can approximate the integral over $b_j$ by a sum of $R$ terms, with $g_r$ substituted for $b_j$ and each term multiplied by the corresponding weight $w_r$. Thus, our likelihood function should now be

$$ L = \prod_{j=1}^J \sum_{r=1}^{R} \text{Pr}(y_{1j},...,y_{n_jj} \mid \boldsymbol{x}, b_j=g_r)\, w_r $$

Also, we need to account for the variance of the random part. I read that this can be achieved by replacing $b_j \sim N(0,\sigma_b^2)$ in our $\eta$ function with $\sigma_b \theta_j$, where $\theta_j \sim N(0,1)$, so in the likelihood function above we actually substitute the $g$'s for the $\theta$'s and not for the $b$'s.

One computational issue I don't get is how to substitute the terms, since the vectors won't be of the same length. But probably I'm missing something crucial here, or have misunderstood how this method works.
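
To make my attempt concrete, here is a rough sketch of how I picture the per-cluster quadrature sum (the function name ll.random is just a placeholder, I use gauss.quad.prob() so that the nodes and weights are with respect to the standard normal density, and I parameterize $\sigma_b$ on the log scale to keep it positive); I am not sure this is the intended way:

library(statmod)

ll.random <- function(theta, X, y, cluster, R = 10) {
  beta    <- theta[1:ncol(X)]                 # fixed effects
  sigma.b <- exp(theta[ncol(X)+1])            # random-intercept SD, kept positive via log scale
  gq <- gauss.quad.prob(R, dist="normal")     # nodes/weights w.r.t. the standard normal density
  ll <- 0
  for (j in unique(cluster)) {                # one one-dimensional integral per cluster
    idx <- cluster == j
    eta <- X[idx, , drop=FALSE] %*% beta      # fixed part for cluster j
    # conditional likelihood of cluster j evaluated at each node b_j = sigma.b * g_r
    Lj <- sapply(gq$nodes, function(g) {
      p <- 1/(1+exp(-(eta + sigma.b*g)))
      prod( p^y[idx] * (1-p)^(1-y[idx]) )
    })
    ll <- ll + log(sum(gq$weights * Lj))      # quadrature approximation of the integral
  }
  -ll
}

optim(c(0, 1, 0), ll.random, X=X, y=y, cluster=z)

If this is right, the lengths never need to match: for each cluster the conditional likelihood is evaluated at each of the $R$ scalar nodes in turn, and the $R$ results are combined with the weights $w_r$.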

Steve
  • I don't have time to take a good look now but this looks like a good use for MCMC. – shadowtalker Aug 22 '14 at 15:41
  • @ssdecontrol thanks, MCMC is a good alternative, but I would like to apply the classical approach. – Steve Aug 22 '14 at 15:57
  • What's non-classical about MCMC to evaluate an integral? – shadowtalker Aug 22 '14 at 16:00
  • @ssdecontrol well, I'm not sure... but all the books I checked, and the lme4 and ordinal R packages, do not use MCMC, so I would like to stick with that approach, at least at the beginning. – Steve Aug 22 '14 at 16:14
  • You're right that it's not in the "classical statistics" toolkit, but that's for outdated computational reasons rather than conceptual, philosophical, or statistical reasons. In any case if you want to take a further look at it there are decent slides here: http://humis.bmi.utah.edu/humis/docs/organization_1863_1330728239.pdf and a mathoverflow question here: http://mathoverflow.net/q/41855 – shadowtalker Aug 22 '14 at 16:51
  • Steve, can you make sure that you agree with Randel's edit? You can re-edit or roll-back. – Glen_b Aug 24 '14 at 22:52
  • Have you asked this in the R-sig-ME list (r-sig-mixed-models@r-project.org)? Some people might be able to help you there further. In addition, I would strongly urge you to study the paper [Efficient Laplacian and Adaptive Gaussian Quadrature Algorithms for Multilevel Generalized Linear Mixed Models](http://www.jstor.org/stable/27594165) by Pinheiro and Chao. It contains results regarding AGQ performance as well as the linear algebra behind them. If you want to code them.... well, get ready for some serious sparse matrix decompositions. :D – usεr11852 Aug 25 '14 at 00:48

1 Answer


I do not see how "the vectors won't be of the same length"; please clarify your question.

First of all, for integrals of dimension less than 4, direct numerical methods like quadrature are more efficient than MCMC. I have studied these questions for a while, and I would be happy to discuss this problem with you.

For mixed-effects logistic regression, the only explicit R code I have found is from Prof. Demidenko's book, Mixed Models: Theory and Applications; you can download the code via the "SOFTWARE AND DATA" column on the book's webpage. The function logMLEgh() can be found in \mixed_models_data.zip\MixedModels\Chapter07. He did not use the statmod package to obtain the quadrature nodes, but wrote his own function gauher(). There are some minor errors in the code, and I have discussed them with the author, but it is still very helpful to start from his code and book. I can provide the corrected version if needed.

Another issue is that, if you want to get accurate estimates, optim() may not be enough; you may need to use methods like Fisher scoring, as in glm().
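
To illustrate what I mean by Fisher scoring, below is a minimal sketch for the fixed-effects-only logistic model (the function name fisher.scoring is made up; this is essentially what glm() does internally, not the mixed-model algorithm):

fisher.scoring <- function(X, y, tol=1e-8, maxit=25) {
  beta <- rep(0, ncol(X))
  for (it in seq_len(maxit)) {
    eta   <- X %*% beta
    p     <- 1/(1+exp(-eta))
    W     <- as.vector(p*(1-p))        # Fisher information weights
    score <- t(X) %*% (y - p)          # gradient of the log-likelihood
    info  <- t(X) %*% (X * W)          # expected (Fisher) information matrix
    step  <- solve(info, score)
    beta  <- beta + step
    if (max(abs(step)) < tol) break    # stop when the update is negligible
  }
  drop(beta)
}

fisher.scoring(X, y)   # should agree with glm(y~x, family="binomial")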

Randel
  • the book seems to have rich info on what I'm working on. The code itself doesn't say much; I just ordered the book, so I have to wait for that. The thing about the vectors is that if we substitute the terms, i.e. the 2 `b`'s with the 10 nodes, how will we be able to multiply the matrices `Z` and `g`? Or do I have it completely wrong? – Steve Aug 25 '14 at 13:18
  • I know that I should go further to get accurate estimates, but I was hoping to understand the GQ part first, as a first step. – Steve Aug 25 '14 at 13:30
  • You can preview the two editions on [Google Books](https://www.google.com/search?q=Mixed+Models%3A+Theory+and+Applications&oq=Mixed+Models%3A+Theory+and+Applications&aqs=chrome..69i57.461j0j7&sourceid=chrome&es_sm=122&ie=UTF-8#newwindow=1&q=Mixed+Models:+Theory+and+Applications&safe=off&tbm=bks) first. In your code, $Z$ is an $n \times 2$ matrix, but is $b$ just a scalar? You can start from the random-intercept model `Z = rep(1,n)` first. If the dimension of the random effects is two, you then need 100 two-dimensional nodes in total, 10 nodes in each dimension (see the sketch after these comments). – Randel Aug 25 '14 at 14:22
  • sorry, but the more I think about it the more it confuses me. If I do `Z=rep(1,n)`, then I would get one random intercept for each row, meaning that each individual is a group? In my example I have two groups, thus `Z%*%b` is $n \times 2$ times $2 \times 1$, giving the $n \times 1$ vector we need. No? – Steve Aug 25 '14 at 14:41
  • Oh, I just noticed that `Z` is used to separate the random intercepts of the clusters, not as the design matrix for the random effect. Then you are right, but you should evaluate the integral and apply the quadrature separately for each cluster. You do not need `Z` anymore; just evaluate the integral for each cluster, and then combine them. The design matrix for the random intercept is just a vector of 1s. – Randel Aug 25 '14 at 17:17
  • I think I did it (hopefully), as you said. Is this what you had in mind? – Steve Aug 25 '14 at 20:47
  • Probably. Each $b_j$ is a random variable, and they are independent of each other, so you can evaluate the integral separately for each cluster. After you read the book, you will find that `logMLEgh()` works in this way, i.e., it evaluates the integral for each unique id. The number of quadrature points should be at least 5 based on my experience, and the number of clusters in your example seems too small. – Randel Aug 25 '14 at 21:01
  • Yes, I know; I gave this small example with 3 points and 2 clusters just to see if I'm on the right track with the way I'm thinking about it. I can understand that for tens of clusters it would be much more complex, hence the matrix decompositions? – Steve Aug 25 '14 at 21:07
  • Not really, you just multiply (or, on the log scale, sum) the likelihoods of the clusters together, as you did for the 2 clusters. – Randel Aug 25 '14 at 21:12
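
Regarding the comment above about two-dimensional random effects needing $10 \times 10 = 100$ nodes, here is a small sketch of how such a tensor-product grid could be built from the one-dimensional rule (the variable names grid and wts are made up, and gauss.quad.prob() is just one way to obtain the one-dimensional nodes and weights):

library(statmod)
gq   <- gauss.quad.prob(10, dist="normal")          # one-dimensional rule, 10 nodes
grid <- expand.grid(b1 = gq$nodes, b2 = gq$nodes)   # 10 x 10 = 100 two-dimensional nodes
wts  <- as.vector(outer(gq$weights, gq$weights))    # matching product weights
nrow(grid)   # 100
sum(wts)     # the product weights sum to 1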