
I'm trying to fit a logistic regression with a formula like this:

```r
mod <- glmer(response ~ factor1 + factor2 + numeric1 + numeric2 + numeric3 + numeric4 + (1|factor3),
             data = myDataset, family = binomial,
             control = glmerControl(optimizer = "bobyqa"))
```

factor1 and factor2 are categorical variables with 5 and 2 categories, respectively. factor3 is the subject ID. The remaining predictors are numeric variables, all of them scaled with `scale()`.

I am getting the following warnings:

```
Warning messages:
1: Some predictor variables are on very different scales: consider rescaling
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  Model failed to converge with max|grad| = 0.406353 (tol = 0.001, component 1)
3: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  Model is nearly unidentifiable: very large eigenvalue
 - Rescale variables?;Model is nearly unidentifiable: large eigenvalue ratio
 - Rescale variables?
```

When I fit the model without the factor1 and factor2 variables, it fits without complaint, so I assume the problem lies in my factor predictors. But I have no idea how to fix it. Should I recode my factor variables as dummy variables myself and then scale them? Would that help? Any idea will be very much appreciated.
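One quick diagnostic for this kind of warning is to cross-tabulate each factor against the response: a zero cell means a level is perfectly associated with one outcome (complete separation), which can make the model nearly unidentifiable. A minimal sketch, assuming the column names from the model formula above:

```r
# Cross-tabulate each factor against the binary response.
# Column names (response, factor1, factor2) are assumed from the formula above.
with(myDataset, table(factor1, response))
with(myDataset, table(factor2, response))
# A row containing a zero count means that level predicts the outcome
# perfectly, which glmer cannot estimate cleanly.
```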

makak
  • Can you confirm that R is storing your factor variables as "factors" in the dataset? e.g. `str(myDataset)`. How many observations do you have nested per person? How many individuals overall? And how much variability is there in your DV? Just curious in terms of trying to reproduce all of the conditions for this example. – Matt Barstead Jul 05 '17 at 04:35
  • You may also find this discussion helpful [click here](https://stats.stackexchange.com/questions/164457/r-glmer-warnings-model-fails-to-converge-model-is-nearly-unidentifiable?rq=1) – Matt Barstead Jul 05 '17 at 04:39
  • Matt, thank you for comments. I can confirm that R is storing factors I mentioned as factors. I have 21 individuals overall with min 10, max 103 observations, average 24 observations and standard deviation 19.95 . – makak Jul 06 '17 at 04:15
  • Is there any chance you can provide any additional summary info for all variables in the model and/or something like the first 25 rows of data? – Matt Barstead Jul 07 '17 at 15:00
  • At first instance, try using `scale(numeric1)+scale(numeric2)+scale(numeric3)+scale(numeric4)` instead of `numeric1+numeric2+numeric3+numeric4` and see if rescaling your numerical variables helps. – usεr11852 Jul 08 '17 at 10:09
  • My guess is that the warning is correct at face value. The model with factor1 and factor2 is nearly unidentifiable, presumably because at least one of factor1 and factor2 is tightly correlated with another variable in the model. I would try dropping these factors one at a time just to see if the model can be fit properly, and then digging a bit further to figure out what about this model is nearly unidentifiable. – Jacob Socolar Jul 09 '17 at 17:11
  • Can you post the data somewhere or use `dput()`? – Mark White Jul 12 '17 at 03:27
  • Thank you all for your comments. I have already solved this; two days ago I was on holiday and offline, so I couldn't share it earlier. The problem was, as everyone suspected, in the data. My factor2 has 5 classes, and one class had been associated with only one response value - 0. I figured it out by staring at the correlation matrix and seeing that that class had a correlation of exactly 0.0 with all other variables. I somehow hadn't noticed it before. So I merged that class with another class, resulting in 4 classes for factor2, and the model fits without complaint. Thank you all again for your effort. – makak Jul 12 '17 at 04:38
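Merging the problematic class into another level, as the last comment describes, can be done directly on the factor's levels; a minimal sketch, assuming hypothetical level names `"A"` and `"B"`:

```r
# Hypothetical level names: collapse level "B" of factor2 into level "A".
levels(myDataset$factor2)[levels(myDataset$factor2) == "B"] <- "A"
# factor2 now has one fewer class; refit the model afterwards.
```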

1 Answer


Not enough reputation to comment, so I'm replying as an answer.

You mention that the model fits fine when you exclude your factor variables but complains when you include them. Is there any chance of a structure/relationship between factor1/factor2 and your id variable? I have encountered this problem before with a client who wanted to explore the effect of two categorical variables and ran multiple replicates for the experimental levels of those variables, but only a single control, which sat at the zero level of both variables. So we could estimate the effect of each factor individually but not jointly, because the effect of the control level of one factor was inseparable from the control level of the other. I didn't realize this until I got convergence warnings trying to fit the model.

Without actually seeing your data we cannot be sure, but I would suggest going back and double-checking whether there is structure that prevents the model from estimating certain levels of your factors.
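One way to look for such structure is to cross-tabulate the factors against each other and against the grouping variable; empty cells indicate combinations that never occur and therefore cannot be estimated. A minimal sketch, assuming the variable names from the question:

```r
# Empty cells here mean some factor combinations never occur in the data,
# so their effects are confounded. Variable names taken from the question.
with(myDataset, table(factor1, factor2))   # factors against each other
with(myDataset, table(factor1, factor3))   # factor against subject id
xtabs(~ factor1 + factor2 + factor3, data = myDataset)  # full cross-classification
```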

user1993951
  • Not being able to comment is not a good reason for using an answer to comment. – Michael R. Chernick Jul 07 '17 at 18:44
  • But they were also correct... See @makak's latest comment on the question. – mikeck Jul 12 '17 at 20:25
  • Well, time to pay my debts. I think your answer/comment was pretty close, so the bounty is yours. And I think that the intention to help is sometimes a good reason to break the rules, so I would not be too concerned about comments with a different opinion. And if I remember correctly, I did the same thing somewhere on Stack Overflow. – makak Jul 14 '17 at 14:21