7

I have been reading various questions about how easy it is to run into singularities when fitting mixed effects models with glmer(). In general, the idea is that singularities can arise from overly complex random-effects structures; even when the random structure is simple, they can also occur when there is not enough data to estimate the variance-covariance matrix. See, for example, this page by Ben Bolker, Robert Long's answer to this post, or the help page of isSingular().

However, the model I'm trying to fit is very simple:

mod.detection_rand <- glmer(reaction ~ Pedra + (1|Channel), family="binomial", data = garotes)
boundary (singular) fit: see ?isSingular
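
A quick check with lme4's isSingular() confirms the boundary fit:

isSingular(mod.detection_rand)
# [1] TRUE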

... and apparently I have enough data for every combination of the (fixed and random) predictor variables:

library(tidyverse)
garotes %>% 
  group_by(Channel, Pedra) %>% 
  summarise(n = n())
# A tibble: 16 x 3
# Groups:   Channel [8]
   Channel Pedra     n
     <int> <fct> <int>
 1       1 No       13
 2       1 Yes      13
 3       2 No       14
 4       2 Yes      12
 5       3 No       12
 6       3 Yes      14
 7       4 No       13
 8       4 Yes      13
 9       5 No       13
10       5 Yes      13
11       6 No       14
12       6 Yes      12
13       7 No       13
14       7 Yes      13
15       8 No       14
16       8 Yes      12

What do you think?

EDIT: Here's the summary of the model, summary(mod.detection_rand)

Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
 Family: binomial  ( logit )
Formula: reaction ~ Pedra + (1 | Channel)
   Data: garotes

     AIC      BIC   logLik deviance df.resid 
   261.5    271.5   -127.7    255.5      205 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.8533 -0.9449  0.5396  0.5396  1.0583 

Random effects:
 Groups  Name        Variance Std.Dev.
 Channel (Intercept) 0        0       
Number of obs: 208, groups:  Channel, 8

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -0.1133     0.1946  -0.582     0.56    
PedraYes      1.3473     0.3066   4.394 1.11e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
         (Intr)
PedraYes -0.635
convergence code: 0
boundary (singular) fit: see ?isSingular

EDIT2: Following Billy's comment:

bobyqa : boundary (singular) fit: see ?isSingular
[OK]
Nelder_Mead : boundary (singular) fit: see ?isSingular
[OK]
nlminbwrap : boundary (singular) fit: see ?isSingular
[OK]
nmkbw : boundary (singular) fit: see ?isSingular
[OK]
optimx.L-BFGS-B : boundary (singular) fit: see ?isSingular
[OK]
nloptwrap.NLOPT_LN_NELDERMEAD : boundary (singular) fit: see ?isSingular
[OK]
nloptwrap.NLOPT_LN_BOBYQA : boundary (singular) fit: see ?isSingular
[OK]
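
For reference, output in this form is what allFit() prints as it refits the model with each available optimizer:

library(lme4)
mod.allfit <- allFit(mod.detection_rand)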

EDIT3: Following Isabella's answer:

I checked the structure of the outcome variable (reaction). Here's the table of outcomes:

library(tidyverse)
garotes %>% 
  group_by(Channel, Pedra, reaction) %>% 
  summarise(n = n()) %>% 
  print(n = Inf)
# A tibble: 32 x 4
# Groups:   Channel, Pedra [16]
   Channel Pedra reaction     n
     <int> <fct>    <int> <int>
 1       1 No           0     6
 2       1 No           1     7
 3       1 Yes          0     3
 4       1 Yes          1    10
 5       2 No           0     7
 6       2 No           1     7
 7       2 Yes          0     2
 8       2 Yes          1    10
 9       3 No           0     8
10       3 No           1     4
11       3 Yes          0     6
12       3 Yes          1     8
13       4 No           0     7
14       4 No           1     6
15       4 Yes          0     3
16       4 Yes          1    10
17       5 No           0     8
18       5 No           1     5
19       5 Yes          0     1
20       5 Yes          1    12
21       6 No           0     6
22       6 No           1     8
23       6 Yes          0     2
24       6 Yes          1    10
25       7 No           0     6
26       7 No           1     7
27       7 Yes          0     2
28       7 Yes          1    11
29       8 No           0     8
30       8 No           1     6
31       8 Yes          0     4
32       8 Yes          1     8

Apparently, there are both types of outcome for all Channels and all Pedra treatments, so it is not like the example Isabella presented. Furthermore, I tried to fit this GLMM with the GLMMadaptive package, and it did not converge either.

EDIT4: The data set I'm using, in case someone's curious.

Channel Pedra   reaction
1   No  1
2   No  0
3   No  0
4   No  0
5   No  0
6   No  1
7   No  0
8   No  0
1   No  1
2   No  1
3   No  1
4   No  1
5   No  0
6   No  0
7   No  0
8   No  0
1   No  0
2   No  1
3   No  0
4   No  0
5   No  0
6   No  0
7   No  0
8   No  1
1   No  0
2   No  1
3   Yes 0
4   Yes 1
5   Yes 1
6   Yes 1
7   Yes 1
8   Yes 0
1   Yes 1
2   Yes 1
3   Yes 0
4   Yes 0
5   No  0
6   No  1
7   Yes 1
8   Yes 1
1   Yes 0
2   Yes 1
3   Yes 1
4   Yes 1
5   Yes 1
6   Yes 0
7   No  1
8   No  1
1   Yes 1
2   Yes 1
3   Yes 1
4   Yes 1
5   Yes 1
6   Yes 1
7   Yes 1
8   Yes 1
1   Yes 1
2   Yes 1
3   Yes 1
4   Yes 1
5   Yes 0
6   Yes 1
7   Yes 1
8   Yes 1
1   Yes 1
2   Yes 1
3   Yes 0
4   Yes 1
5   Yes 1
6   Yes 1
7   Yes 0
8   Yes 0
1   Yes 1
2   Yes 1
3   Yes 0
4   Yes 0
5   Yes 1
6   Yes 1
7   Yes 1
8   Yes 0
1   Yes 1
2   Yes 1
3   Yes 0
4   Yes 1
5   Yes 1
6   Yes 1
7   Yes 0
8   Yes 0
1   Yes 1
2   Yes 0
3   Yes 1
4   Yes 0
5   Yes 1
6   Yes 1
7   Yes 1
8   Yes 1
1   Yes 1
2   Yes 1
3   Yes 0
4   Yes 1
5   Yes 1
6   Yes 0
7   Yes 1
8   Yes 1
1   Yes 1
2   Yes 1
3   Yes 1
4   Yes 1
5   Yes 1
6   Yes 1
7   Yes 1
8   Yes 1
1   Yes 0
2   Yes 0
3   Yes 1
4   Yes 1
5   Yes 1
6   Yes 1
7   Yes 1
8   Yes 1
1   Yes 1
2   No  0
3   Yes 1
4   No  1
5   Yes 1
6   No  1
7   Yes 1
8   No  1
1   No  0
2   Yes 1
3   No  0
4   Yes 1
5   No  1
6   Yes 1
7   No  1
8   Yes 1
1   Yes 0
2   No  1
3   Yes 1
4   No  0
5   Yes 1
6   No  1
7   Yes 1
8   No  0
1   No  0
2   No  1
3   No  1
4   No  0
5   No  1
6   No  0
7   No  0
8   No  0
1   No  1
5   No  0
3   No  1
4   No  1
2   No  1
6   No  0
7   No  1
8   No  0
1   No  0
5   No  0
3   No  0
4   No  0
2   No  1
6   No  0
7   No  0
8   No  0
1   No  1
5   No  1
3   No  1
4   No  0
2   No  0
6   No  1
7   No  1
8   No  0
1   No  1
5   No  0
3   No  0
4   No  1
2   No  0
6   No  1
7   No  1
8   No  1
1   No  1
5   No  1
3   No  0
4   No  1
2   No  0
6   No  1
7   No  1
8   No  1
1   No  1
5   No  1
3   No  0
4   No  0
2   No  0
6   No  1
7   No  0
8   No  0
1   No  0
5   No  0
3   No  0
4   No  1
2   No  0
6   No  0
7   No  1
8   No  1

Thank you very much for all of your responses, in any case! Learning a lot from them!

Jordi F. Pagès

4 Answers

8

Isabella made some excellent points. This can also happen when there is very little variation at the channel level: perhaps the channels are simply very similar to each other, so their variance really is close to zero and the random effect is not needed in the model. You can assess this by fitting a plain GLM and checking whether the inferences are similar, as in the sketch below.
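
A minimal sketch of that check, reusing the model and data names from the question:

library(lme4)

# Drop the random intercept and fit an ordinary logistic regression
mod.detection_glm <- glm(reaction ~ Pedra, family = "binomial", data = garotes)
summary(mod.detection_glm)

# Compare with the fixed-effect estimates from the (singular) mixed model
summary(mod.detection_rand)$coefficients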

Robert Long
  • In fact, this is what the output of `summary(mod.detection_rand)` is telling us. The model estimates that the variance in intercepts across the different channels (in the random effects block) is $0$. In other words, there are apparently no systematic differences between channels once the effect of `Pedra` has been taken into account. – Eoin Sep 20 '20 at 17:52
  • @Eoin yes, that's exactly what I meant, but Isabella also pointed out that this can happen due to data sparsity, and in my experience it can happen for unknown reasons when there is actual variation. – Robert Long Sep 20 '20 at 18:12
  • Agreed, but we can see from the fixed effects and the balanced design that sparsity isn't the problem here (unless some very exotic contrasts were used for `Pedra`). – Eoin Sep 21 '20 at 08:47
  • @Eoin and @Robert, I don't think variance at the `Channel` level is 0. This must be some kind of artifact. In fact, there are big differences across channels, that's why we decided to include them as a random effect... weird... anyway, thanks a lot for your help! – Jordi F. Pagès Sep 21 '20 at 10:29
  • @JordiPagès yes, that's what I meant in my last comment. Have you tried increasing nAGQ > 1? Also, have you tried other packages apart from GLMMAdaptive, or setting different start points in the optimizer? Can you share the data itself? – Robert Long Sep 21 '20 at 10:59
  • @RobertLong yes, thanks. I've tried with `nAGQ` > 1 using `lme4::glmer()` and using `GLMMadaptive::mixed_model()`. It never converges. I haven't tried different start points for the optimizer... I will try that. Of course I can share the data. I will attach it at the end of the original question. Cheers! – Jordi F. Pagès Sep 21 '20 at 13:55
  • I've looked again at this, including your data, and can confirm that there is no evidence of any variance between the channels. I've included this as a [new answer](https://stats.stackexchange.com/a/488585/42952). – Eoin Sep 22 '20 at 10:04
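
For reference, a sketch of the refits discussed in the comments above (the quadrature settings are illustrative):

library(lme4)
library(GLMMadaptive)

# Adaptive Gauss-Hermite quadrature instead of the default Laplace approximation;
# nAGQ > 1 works here because the model has a single scalar random effect
mod.agq <- glmer(reaction ~ Pedra + (1 | Channel),
                 family = "binomial", data = garotes, nAGQ = 11)

# The GLMMadaptive equivalent, where nAGQ sets the number of quadrature points
mod.ma <- mixed_model(fixed = reaction ~ Pedra, random = ~ 1 | Channel,
                      data = garotes, family = binomial(), nAGQ = 11)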
7

Because this is a mixed effects binary logistic regression model, it assumes that your outcome variable is binary with values coded as either 0 or 1.

What you need to investigate is whether you have enough 1's present in your response variable for a sufficient number of 'subjects'. (In your case, subject stands for channel.)

Here is a made-up example which produces the same warning as the one you got:

# 5 subjects, 3 observations each
SubjectID <- rep(1:5, each = 3)
SubjectID

# All outcomes are 0, except a single 1 for the first subject
Outcome <- rep(0, 15)
Outcome[1] <- 1

Data <- data.frame(Outcome, SubjectID)
str(Data)

Data

library(lme4)

glmer(Outcome ~ 1 + (1|SubjectID), family="binomial", data = Data)

In this example there are 5 subjects: 4 of them have only 0 outcome values, and one has outcome values that include a single 1. (Each subject has 3 outcome values in total.)

Even if you give each of the subjects in this made-up example a value of 1 for their first outcome value, you will still get the same warning when fitting the model:

# One 1 per subject (the first observation of each)
Outcome <- rep(0, 15)
Outcome[c(1, 4, 7, 10, 13)] <- 1

However, if the 4 subjects who initially had only 0 values keep those values and the first subject instead receives two values of 1, the warning disappears:

# Two 1's, both belonging to the first subject
Outcome <- rep(0, 15)
Outcome[c(1, 2)] <- 1

Once you better understand the pattern of 0 and 1 values for the outcome variable among your study subjects, you can also try fitting your model with the mixed_model() function from the GLMMadaptive package in R.

For the small example provided here, this function would be used like this:

library(GLMMadaptive)

m <- mixed_model(fixed = Outcome ~ 1, 
                 random = ~ 1 | SubjectID, 
                 data = Data,
                 family = binomial())
summary(m)
Isabella Ghement
  • Fantastic! I will definitely try that! Cheers! – Jordi F. Pagès Sep 18 '20 at 15:48
  • the only big difference between `lme4::glmer` and `GLMMadaptive::mixed_model` is the *default* number of quadrature points (1 for `glmer`, 11 for `mixed_model` in this case); you can adjust this with `nAGQ` (in either package). – Ben Bolker Sep 18 '20 at 23:33
  • Thank you very much for this complete response Isabella. I have checked the outcome variable (`reaction` in this case) and the `GLMMadaptive::mixed_model` function. See EDIT3 in my original question. Cheers! – Jordi F. Pagès Sep 21 '20 at 10:30
6

A further comment: I took a look at your data, and it's clear, again, that there is no evidence of systematic variance between the different channels. This is why the mixed model estimates the between-channel variance to be $0$, making the model singular.

You can see this in the figure below, where the standard errors for almost every channel overlap...

[Figure: proportion of positive reactions per channel, for Pedra = No vs. Yes, with SEM error bars; the per-channel lines overlap almost completely.]

...and can confirm it with an analysis of deviance for a fixed-effects GLM, which shows no significant main effect of Channel (p = .986).

m_fixed_effects = glm(cbind(n, total) ~ Pedra + factor(Channel), 
                      data=positive, family=binomial)
car::Anova(m_fixed_effects)
# Analysis of Deviance Table (Type II tests)
# 
# Response: cbind(n, total)
#                 LR Chisq Df Pr(>Chisq)  
# Pedra             4.9148  1    0.02663 *
# factor(Channel)   1.3859  7    0.98600  
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Code

library(tidyverse)
df = read.csv('/path/to/reaction.csv')
head(df)
#   Channel Pedra reaction  n
# 1       1    No        0  6
# 2       1    No        1  7
# 3       1   Yes        0  3
# 4       1   Yes        1 10
# 5       2    No        0  7
# 6       2    No        1  7

df = df %>%
  group_by(Channel, Pedra) %>%
  mutate(total = sum(n),
         prop = n / total,
         se = sqrt((prop * (1-prop)) / n)) %>%
  ungroup()
positive = filter(df, reaction==1)

ggplot(positive, aes(Pedra, prop, group=Channel, color=factor(Channel))) +
  geom_path(position = position_dodge(width=.1)) +
  geom_point(position = position_dodge(width=.1)) +
  stat_summary(fun.data=mean_se, group=1, color='black',
               position = position_nudge(x=c(-.2, .2))) +
  geom_linerange(mapping=aes(ymin=prop-se, ymax=prop+se),
                 position = position_dodge(width=.1)) +
  geom_hline(linetype='dashed', yintercept=.5) +
  coord_cartesian(ylim=c(0, 1)) +
  labs(color='Channel',  y='Proportion positive reactions', 
       caption='Error bars show SEM')

Eoin
4

Out of curiosity, does the warning arise when you use an alternative optimizer? It could be that the optimizer is for some reason getting stuck at the singularity. You could try the following: mod.alt_est <- allFit(mod.detection_rand). Alternatively, you may need a Bayesian solution to help regularize the estimation and force it away from the singularity (try the blme package if allFit() doesn't produce a fit that works). A sketch of both options follows below.
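
A minimal sketch of both suggestions (the bglmer() call relies on blme's default covariance prior):

library(lme4)
library(blme)

# Refit with every available optimizer and inspect the convergence messages
mod.alt_est <- allFit(mod.detection_rand)
summary(mod.alt_est)$msgs

# Weakly regularized fit: the default covariance prior pulls the
# variance estimate away from the zero boundary
mod.blme <- bglmer(reaction ~ Pedra + (1 | Channel),
                   family = "binomial", data = garotes)
summary(mod.blme)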

Billy