
I fitted eight lmer models using the lme4 package.

First, I compared the models via likelihood-ratio tests, which yielded this:

              npar   AIC   BIC logLik deviance   Chisq Df Pr(>Chisq)    
baselinemodel    4 63313 63338 -31652    63305                          
mod1             5 63313 63344 -31651    63303  2.0226  1    0.15497    
mod2             6 63232 63270 -31610    63220 82.5963  1  < 2.2e-16 ***
mod3             7 63185 63229 -31586    63171 49.0685  1  2.472e-12 ***
mod4             8 63186 63237 -31585    63170  0.7713  1    0.37981    
mod5             9 63188 63244 -31585    63170  0.3639  1    0.54632    
mod6            10 63186 63249 -31583    63166  3.4708  1    0.06246 .  
mod7            11 63188 63257 -31583    63166  0.3997  1    0.52722    
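
This table comes from a sequential anova() call on the nested fits, roughly:

# likelihood-ratio test of each model against the previous one
# (anova() refits REML fits with ML before comparing)
anova(baselinemodel, mod1, mod2, mod3, mod4, mod5, mod6, mod7)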

Then I bootstrapped the LRT values using this kind of code:

set.seed(123)
# bootstrap the log-likelihood of the baseline model
b1 <- bootMer(baselinemodel,
              FUN = function(x) as.numeric(logLik(x)),
              nsim = 1000)

set.seed(123)
# bootstrap the log-likelihood of mod1
b2 <- bootMer(mod1,
              FUN = function(x) as.numeric(logLik(x)),
              nsim = 1000)


# Observed Likelihood-Ratio
lrt <- as.numeric(-2 * logLik(baselinemodel) + 2 * logLik(mod1))

# 1000 bootstrap Likelihood-Ratios
lrt.b <- -2 * b1$t + 2 * b2$t

# 95% CI
qu <- quantile(lrt.b, probs = c(0.025, 0.975))
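
Roughly, the rule I applied was whether this interval excludes zero:

# interval excluding 0 taken as evidence for the larger model
lrt        # observed likelihood ratio
qu         # bootstrap 95% interval
qu[1] > 0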

So far so good; this matches what the LRT gives me, too.

But comparing the other models I get these CIs:

# mod1 vs mod2
#  2.5%     97.5% 
# -169.3204  323.8374 
# -> mod 2 not better than mod1


# mod 2 vs. mod3
#  2.5%    97.5% 
# 46.37453 54.53090 
# -> mod3 better than mod2


# mod3 vs. mod4
#   2.5%     97.5% 
# 0.7715048 5.9623860 
# -> mod4 better than mod 3

# mod4 vs mod5
#    2.5%     97.5% 
# 0.3611693 5.7582975 
# -> mod5 better than mod4

# mod 5 vs. mod6
# 2.5%    97.5% 
# 3.471502 8.316651 
# -> mod6 better than mod5

# mod 5 vs mod6
#   2.5%     97.5% 
#0.4005908 5.2964914 
# -> mod6 better than mod5

# mod 6 vs mod7
#  2.5%     97.5% 
# -238.5510  255.8117 
# -> mod7 better than mod6

So, according to the LRT: mod2 > mod1, mod3 > mod2.

But, according to the bootstrapped LRT values: mod3 > mod2, mod4 > mod3, mod5 > mod4, mod6 > mod5, mod7 > mod6.

How is this possible?

1. Is it because I got this warning with every bootstrap: "2 warning(s): Model failed to converge with max|grad| = 0.00229957 (tol = 0.002, component 1) (and others)"?
2. Or did I do something wrong?

By the way, the descriptive data and theory say that mod2 > mod1 (which I did not find with bootstrapping).

Thanks in advance!!

Edit:

I adapted ping's code:

boot01 <- numeric(100)

for(i in 1:100){
  # simulate a response from the smaller (baseline) model
  rmath <- unlist(simulate(baselinemodel))
  # refit both models to the same simulated response
  bmod <- refit(mod1, rmath)
  smod <- refit(baselinemodel, rmath)
  boot01[i] <- 2*(logLik(bmod) - logLik(smod))
}

# lr01 is the observed likelihood ratio for baselinemodel vs mod1
pvalue01 <- mean(boot01 > lr01)

# 0.011

boot12 <- numeric(1000)
for(i in 1:1000){
  rmath <- unlist(simulate(mod1))
  bmod <- refit(mod2, rmath)
  smod <- refit(mod1, rmath)
  boot12[i] <- 2*(logLik(bmod)-logLik(smod))
}

pvalue12 <- mean(boot12 > lr12)

# 0

boot23 <- numeric(100)
for(i in 1:100){
  rmath <- unlist(simulate(mod2))
  bmod <- refit(mod3, rmath)
  smod <- refit(mod2, rmath)
  boot23[i] <- 2*(logLik(bmod)-logLik(smod))
}

pvalue23 <- mean(boot23 > lr23)

# 0

According to this code, none of my models is significant. Why?

Edit 2:

I fitted the models without REML = FALSE (i.e., with the default REML) and the results corresponded to my LRT results. But this is still wrong because I am not resampling from the same distribution... or does set.seed(123) account for that?
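
For reference, refitting with ML instead of REML is just this (note that anova() on REML fits already refits them with ML before comparing):

# refit the REML fits with maximum likelihood
baseline_ML <- refitML(baselinemodel)   # or update(baselinemodel, REML = FALSE)
mod1_ML     <- refitML(mod1)
anova(baseline_ML, mod1_ML)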

1 Answer


This is a solution similar to the one used here with the pbkrtest R package.

One mistake is that you are combining bootstrapped statistics computed from different resampled datasets.

If you want to combine the bootstrapped log-likelihoods of two models, you need to estimate both models on the same resampled datasets.

Here is a snippet that performs a parametric bootstrap:

library(lme4)

data(cake)

fm1 <- lmer(angle ~ recipe + (1|recipe:replicate), cake, REML=FALSE)
fm2 <- lmer(angle ~ recipe + temperature + (1|recipe:replicate), cake, REML=FALSE)
fm3 <- lmer(angle ~ recipe * temperature + (1|recipe:replicate), cake, REML=FALSE)

# observed likelihood-ratio statistics from the LRTs
lr12 <- anova(fm1, fm2)$Chisq[2]
lr23 <- anova(fm2, fm3)$Chisq[2]

boot12 <- numeric(1000)
for(i in 1:1000){
  # simulate a response from the smaller model (fm1)
  rmath <- unlist(simulate(fm1))
  # refit both models to the same simulated response
  bmod <- refit(fm2, rmath)
  smod <- refit(fm1, rmath)
  boot12[i] <- 2*(logLik(bmod) - logLik(smod))
}

pvalue12 <- mean(boot12 > lr12)

boot23 <- numeric(1000)
for(i in 1:1000){
  rmath <- unlist(simulate(fm2))
  bmod <- refit(fm3, rmath)
  smod <- refit(fm2, rmath)
  boot23[i] <- 2*(logLik(bmod)-logLik(smod))
}

pvalue23 <- mean(boot23 > lr23)

Comparison of the results:

fm1, fm2

Data: cake
Models:
fm1: angle ~ recipe + (1 | recipe:replicate)
fm2: angle ~ recipe + temperature + (1 | recipe:replicate)
    npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)    
fm1    5 1785.7 1803.7 -887.84   1775.7                         
fm2   10 1709.6 1745.6 -844.79   1689.6 86.106  5  < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

fm2, fm3

Data: cake
Models:
fm2: angle ~ recipe + temperature + (1 | recipe:replicate)
fm3: angle ~ recipe * temperature + (1 | recipe:replicate)
    npar    AIC    BIC  logLik deviance Chisq Df Pr(>Chisq)
fm2   10 1709.6 1745.6 -844.79   1689.6                    
fm3   20 1719.0 1791.0 -839.53   1679.0 10.53 10     0.3953

and the corresponding results obtained with bootstrapping:

mean(boot12 > lr12)
[1] 0

mean(boot23 > lr23)
[1] 0.421
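
If you prefer not to write the loop yourself, pbkrtest's PBmodcomp() performs essentially the same parametric bootstrap (a sketch, assuming pbkrtest is installed):

library(pbkrtest)

# parametric bootstrap test of the larger model against the smaller one
PBmodcomp(fm2, fm1, nsim = 1000)
PBmodcomp(fm3, fm2, nsim = 1000)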
  • Thank you! I adapted my code (see edit above). Also, I fitted without REML=FALSE for my initial attempt and that yielded my results from the LRT test. – a.henrietty Jul 05 '20 at 13:51