I was looking for ways to do a likelihood ratio test in R to compare model fits. I first coded it myself, then found both the base anova() function and lrtest() in the lmtest package. When I checked, though, anova() always produces a slightly different p-value from the other two, even though the 'test' parameter is set to "LRT". Is anova() actually performing some subtly different test, or am I misunderstanding something?

Platform: R 3.2.0 running on Linux Mint 17, lmtest version 0.9-33

Sample code:

set.seed(1)  # reproducibility
n <- 1000
y <- runif(n, min = -1, max = 1)
a <- factor(sample(1:5, size = n, replace = TRUE))
b <- runif(n)

# Make y dependent on the other two variables
y <- y + b * 0.1 + ifelse(a == 1, 0.25, 0)
mydata <- data.frame(y, a, b)

# Models
base <- lm(y ~ a, data = mydata)
full <- lm(y ~ a + b, data = mydata)

# Anova
anova(base, full, test = "LRT")

# lrtest
library(lmtest)
lrtest(base, full)

# Homebrew log-likelihood test
like.diff <- logLik(full) - logLik(base)
df.diff <- base$df.residual - full$df.residual
pchisq(2 * as.numeric(like.diff), df = df.diff, lower.tail = FALSE)

When I run it, anova() gives a p-value of 0.6071, whereas the other two give 0.60599. The difference is small but consistent, and too large to be floating-point rounding error. Can someone explain why anova() gives a different answer?

Jason

2 Answers


As mentioned in the other answer, the difference comes down to a difference in scaling, i.e., different estimators for the standard deviation of the errors. The sources of the difference are (1) scaling by $n-k$ (the unbiased OLS estimator) vs. scaling by $n$ (the biased ML estimator), and (2) using the estimator under the null hypothesis vs. under the alternative.

The likelihood ratio test implemented in lrtest() uses the ML estimator for each model separately, while anova(..., test = "LRT") uses the OLS estimator under the alternative:

sd_ols <- function(object) sqrt(sum(residuals(object)^2) / df.residual(object))  # scales by n - k
sd_mle <- function(object) sqrt(mean(residuals(object)^2))                       # scales by n

Then the statistic that lrtest() computes is

# Gaussian log-likelihood of a fitted model, with the error sd supplied explicitly
ll <- function(object, sd) sum(dnorm(model.response(model.frame(object)),
  mean = fitted(object), sd = sd, log = TRUE))
-2 * (ll(base, sd_mle(base)) - ll(full, sd_mle(full)))
## [1] 0.266047
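
As a quick check, referring this statistic to a chi-squared distribution with one degree of freedom (the difference in parameters between the two models) reproduces the p-value from lrtest():

pchisq(0.266047, df = 1, lower.tail = FALSE)  # ~0.60599, as reported by lrtest()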

anova(..., test = "LRT"), on the other hand, uses

-2 * (ll(base, sd_ols(full)) - ll(full, sd_ols(full)))
## [1] 0.2644859
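
and the same chi-squared(1) tail probability recovers the anova() p-value (again just a quick check against the numbers in the question):

pchisq(0.2644859, df = 1, lower.tail = FALSE)  # ~0.6071, as reported by anova()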

Under the null hypothesis both are asymptotically equivalent, of course, but in finite samples there is a small difference.
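
For Gaussian linear models both statistics also have closed forms in terms of the residual sums of squares, which makes the scaling difference explicit (a sketch using the models from the question; rss0 and rss1 are just illustrative names):

rss0 <- sum(residuals(base)^2)  # RSS of the restricted model
rss1 <- sum(residuals(full)^2)  # RSS of the full model
n * log(rss0 / rss1)                      # the lrtest() statistic (ML scaling)
(rss0 - rss1) / rss1 * df.residual(full)  # the anova(..., test = "LRT") statistic (OLS scaling)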

Achim Zeileis

Comments:
  • Thanks for the answer. So, can we say that one variant is better than the other? Can I use the anova() test without concerns? – Funkwecker Oct 07 '16 at 09:38
  • I don't know of any theoretical results on this question, but I wouldn't be surprised if the OLS variant performed slightly better in small samples with Gaussian errors. Already in moderately large samples, though, the differences should be negligible. – Achim Zeileis Oct 07 '16 at 16:35

The test statistic is derived differently. anova.lmlist() uses the scaled difference of the residual sums of squares:

anova(base, full, test = "LRT")
#  Res.Df    RSS Df Sum of Sq Pr(>Chi)
#1    995 330.29                      
#2    994 330.20  1   0.08786   0.6071

# Reproduce the statistic by hand (df.diff as defined in the question)
vals <- (sum(residuals(base)^2) - sum(residuals(full)^2)) / sum(residuals(full)^2) * full$df.residual
pchisq(vals, df.diff, lower.tail = FALSE)
#[1] 0.6070549
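
Equivalently, this chi-squared statistic is just the usual F statistic multiplied by the numerator degrees of freedom, so it can also be read off the default F test (a quick check; "F" and "Df" are the column names of the anova table):

a <- anova(base, full)   # default F test
a[2, "F"] * a[2, "Df"]   # equals vals above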
Roland