
I'm simply trying to use dnorm() to recalculate the log-likelihood returned by the logLik() function for an lm model in R.

It works (almost perfectly) for a large number of observations (e.g. n = 1000):

> n <- 1000
> x <- 1:n
> set.seed(1)
> y <- 10 + 2*x + rnorm(n, 0, 2)
> mod <- glm(y ~ x, family = gaussian)
> logLik(mod)
'log Lik.' -2145.562 (df=3)
> sigma <- sqrt(summary(mod)$dispersion)
> sum(log(dnorm(x = y, mean = predict(mod), sd = sigma)))
[1] -2145.563
> sum(log(dnorm(x = resid(mod), mean = 0, sd = sigma)))
[1] -2145.563

but for small datasets there are clear differences:

> n <- 5
> x <- 1:n
> set.seed(1)
> y <- 10 + 2*x + rnorm(n, 0, 2)
> 
> mod <- glm(y ~ x, family = gaussian)
> logLik(mod)
'log Lik.' -8.915768 (df=3)
> sigma <- sqrt(summary(mod)$dispersion)
> sum(log(dnorm(x = y, mean = predict(mod), sd = sigma)))
[1] -9.192832
> sum(log(dnorm(x = resid(mod), mean = 0, sd = sigma)))
[1] -9.192832

Because this is a small-sample effect, I thought it might be due to differences in the residual variance estimates used by lm and glm, but lm gives the same result as glm:

> modlm <- lm(y ~ x)
> logLik(modlm)
'log Lik.' -8.915768 (df=3)
> 
> sigma <- summary(modlm)$sigma
> sum(log(dnorm(x = y, mean = predict(modlm), sd = sigma)))
[1] -9.192832
> sum(log(dnorm(x = resid(modlm), mean = 0, sd = sigma)))
[1] -9.192832

Where am I wrong?

Gilles
    With `lm()`, you are using $\sqrt{\hat\sigma}$ instead of $\hat\sigma$. – Stéphane Laurent Oct 18 '13 at 22:40
  • Thanks Stéphane for the correction, but it still doesn't seem to work. – Gilles Oct 19 '13 at 09:31
  • try looking at the source code: `stats:::logLik.glm` – assumednormal Oct 19 '13 at 09:45
  • I did, but that function just reverses the `aic` slot of the glm object to recover the log-likelihood, and I don't see anything about `aic` in the glm function... (see the sketch just below these comments) – Gilles Oct 19 '13 at 10:20
  • I suspect this has something to do with logLik and AIC (which are joined at the hip) assuming that three parameters are being estimated (the slope, intercept, and dispersion/residual standard error), whereas the dispersion/residual standard error is calculated assuming two parameters are estimated (slope and intercept). – Tom Oct 19 '13 at 12:04
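
For reference, here is roughly what `stats:::logLik.glm` does (a condensed paraphrase of the R sources; the exact code may differ between R versions). Note that the `aic` slot is not computed inside `glm()` itself but by the `aic` component of the family object, e.g. `gaussian()$aic`:

# condensed paraphrase of stats:::logLik.glm (details may vary by R version)
logLik.glm.sketch <- function(object) {
  fam <- family(object)$family
  p <- object$rank                    # number of fitted regression coefficients
  if (fam %in% c("gaussian", "Gamma", "inverse.gaussian"))
    p <- p + 1                        # one extra df for the dispersion
  val <- p - object$aic/2             # invert aic = -2*logLik + 2*p
  structure(val, df = p, class = "logLik")
}
# for gaussian(), the family's aic function is essentially
#   nobs*(log(dev/nobs * 2*pi) + 1) + 2 - sum(log(wt))
# i.e. the exact log-likelihood evaluated with the ML variance dev/nobs

So `logLik()` simply undoes the AIC penalty, and the AIC itself is built from the ML estimate of the variance (division by $n$, not $n-2$), which is exactly the discrepancy explained in the answer below.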

1 Answer


The logLik() function evaluates the log-likelihood at the maximum likelihood (ML) estimates of the parameters. Now, the ML estimates of the regression parameters (the $\beta_j$'s in $X{\boldsymbol \beta}$) coincide with the least-squares estimates, but the ML estimate of $\sigma$ is $\sqrt{\frac{\sum \hat\epsilon_i^2}{n}}$, whereas you are using $\hat\sigma = \sqrt{\frac{\sum \hat\epsilon_i^2}{n-2}}$, that is, the square root of the unbiased estimate of $\sigma^2$ ($n-2$ because two regression coefficients are estimated).

>  n <- 5
>  x <- 1:n
>  set.seed(1)
>  y <- 10 + 2*x + rnorm(n, 0, 2)
>  modlm <- lm(y ~ x)
>  sigma <- summary(modlm)$sigma
> 
>  # log-likelihood with the "classical" sigma hat
>  sum(log(dnorm(x = y, mean = predict(modlm), sd = sigma)))
[1] -9.192832
> 
>  # log-likelihood with the ML sigma hat
>  sigma.ML <- sigma*sqrt((n-dim(model.matrix(modlm))[2])/n) 
>  sum(log(dnorm(x = y, mean = predict(modlm), sd = sigma.ML)))
[1] -8.915768
>  logLik(modlm)
'log Lik.' -8.915768 (df=3)
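
Equivalently, as a quick cross-check reusing the objects above, the ML estimate is just the root mean square of the residuals, and the maximized Gaussian log-likelihood has the closed form $-\frac{n}{2}\left(\log(2\pi\hat\sigma_{ML}^2)+1\right)$, because the quadratic term in the exponent sums to $n/2$ at the ML estimate:

>  # same ML sigma hat, obtained directly from the residuals
>  sigma.ML2 <- sqrt(sum(resid(modlm)^2)/n)
>  sum(log(dnorm(x = y, mean = predict(modlm), sd = sigma.ML2)))
[1] -8.915768
>  # closed form of the maximized Gaussian log-likelihood
>  -n/2 * (log(2*pi*sigma.ML2^2) + 1)
[1] -8.915768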
Stéphane Laurent