
I have data containing 40,000 observations, coming from around 30,000 subjects. Most subjects contribute one observation, some contribute two.

The dependent variable Y is binary (1/0) and, for the sake of discussion, say I have only one independent variable X, which is also binary. I want to model the effect of X on Y and obtain an odds ratio.

I started by calculating the odds ratio from a simple 2x2 table and got a crude OR of 3.18. Since some observations are dependent (they come from the same subject), I then ran the 2x2 analysis separately for each subject's first observation and for the second observation, and got OR1 = 3.09 and OR2 = 4.36. The Mantel-Haenszel estimate is OR_MH = 3.11. However, when I run a mixed-effects logistic regression with a random intercept per subject, I get higher ORs. Using the Laplace approximation, I get an OR of about 4.5. If I use quadrature for the integration, I get an OR of over 6.
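For readers who want to reproduce this kind of comparison, here is a minimal base-R sketch of the crude, per-stratum, and Mantel-Haenszel odds ratios. The data are simulated and every name and number in it (`visit`, `odds_ratio`, the coefficients) is a hypothetical stand-in, not the dataset described above, so the printed values will not match 3.18 / 3.09 / 4.36 / 3.11.

```r
# Simulated stand-in data: all names and parameter values are hypothetical.
set.seed(1)
n <- 4000
visit <- factor(sample(c("first", "second"), n, replace = TRUE,
                       prob = c(0.75, 0.25)))
X <- rbinom(n, 1, 0.4)
Y <- rbinom(n, 1, plogis(-1.5 + 1.1 * X))

# Odds ratio of a 2x2 table laid out as table(X, Y): (a * d) / (b * c)
odds_ratio <- function(tab) (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])

crude_or <- odds_ratio(table(X, Y))

# One 2x2 table per stratum (first vs. second observation)
tab3 <- table(X, Y, visit)
stratum_or <- apply(tab3, 3, odds_ratio)

# Mantel-Haenszel common odds ratio from stats::mantelhaen.test
mh_or <- unname(mantelhaen.test(tab3)$estimate)

# The same estimator written out: sum(a*d/n) / sum(b*c/n) over strata
num <- sum(apply(tab3, 3, function(s) s[1, 1] * s[2, 2] / sum(s)))
den <- sum(apply(tab3, 3, function(s) s[1, 2] * s[2, 1] / sum(s)))
mh_by_hand <- num / den

print(c(crude = crude_or, stratum_or, MH = mh_or))
```

The Mantel-Haenszel estimate is a weighted average of the stratum-specific odds ratios, so it always falls between the smallest and the largest of them. None of these estimates conditions on a subject-level random effect.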

This makes no sense to me. Can someone please explain how this is possible? My random effect (SubjectID) has about 30,000 levels, most of them with a single observation.

My R code for the model is:

library(lme4)
glmer(Y ~ X + (1 | SubjectID), data = dataset, family = binomial, control = glmerControl(optimizer = "bobyqa"), nAGQ = 1)  # nAGQ = 1 is the Laplace approximation
kjetil b halvorsen
user3275222
    Don't know exactly what is happening, but lme4 has some niggles in there. I would try R package [glmm](https://cran.r-project.org/web/packages/glmm/index.html). – Greenparker Dec 11 '16 at 21:08
    Check out for example this thread http://stats.stackexchange.com/questions/86309/marginal-model-versus-random-effects-model-how-to-choose-between-them-an-advi. The odds ratio in your glmm applies conditional on a particular value of the random effect. If you look at the marginal model (integrating out the random effect) the odds ratio will match that computed from your 2x2 table. – Jarle Tufto Dec 11 '16 at 21:41
  • I tried it with SAS (Glimmix) and Stata too. Same results. – user3275222 Dec 11 '16 at 21:47
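
The distinction raised in Jarle Tufto's comment, between the conditional (subject-specific) odds ratio and the marginal (population-averaged) one, can be illustrated with a small simulation. This is only a sketch under assumed values: `beta0`, `beta1`, and `sigma` are hypothetical, not estimates from the data in the question. It uses base R only, and the closed-form line is the standard attenuation approximation beta_marginal ≈ beta_conditional / sqrt(1 + c^2 sigma^2) with c = 16 sqrt(3) / (15 pi).

```r
# Hypothetical parameter values, chosen only to make the effect visible.
set.seed(42)
n_subj <- 100000
beta0  <- -2        # conditional intercept
beta1  <- log(6)    # conditional (subject-specific) log odds ratio
sigma  <- 2.5       # SD of the random intercept

b <- rnorm(n_subj, mean = 0, sd = sigma)   # one random intercept per subject
X <- rbinom(n_subj, 1, 0.5)                # X is independent of b
Y <- rbinom(n_subj, 1, plogis(beta0 + beta1 * X + b))

# Conditional OR: compares X = 1 vs X = 0 for the SAME subject (same b)
conditional_or <- exp(beta1)

# Marginal OR: ordinary logistic regression ignores b, which is what a
# 2x2 table of X against Y estimates
marginal_or <- unname(exp(coef(glm(Y ~ X, family = binomial))["X"]))

# Approximate attenuation of the log odds ratio when b is integrated out
c2 <- (16 * sqrt(3) / (15 * pi))^2
approx_marginal_or <- exp(beta1 / sqrt(1 + c2 * sigma^2))

print(c(conditional = conditional_or,
        marginal    = marginal_or,
        approx      = approx_marginal_or))
```

In this setup the marginal odds ratio comes out well below the conditional one even though both describe the same simulated data, and the gap widens as `sigma` grows. That is consistent with a mixed model reporting a larger OR than a 2x2 table; it does not by itself say which of the Laplace and quadrature fits is closer to the truth.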
