18

I have tried calculating the AIC of a linear regression in R but without using the AIC function, like this:

lm_mtcars <- lm(mpg ~ drat, mtcars)

nrow(mtcars)*(log((sum(lm_mtcars$residuals^2)/nrow(mtcars))))+(length(lm_mtcars$coefficients)*2)
[1] 97.98786

However, AIC gives a different value:

AIC(lm_mtcars)
[1] 190.7999

Could somebody tell me what I'm doing wrong?

luciano
  • (without checking your answer yet): You're not necessarily doing anything wrong, since likelihood is actually only defined up to a multiplicative constant; two people can calculate a log-likelihood and get different numbers (but differences in log-likelihood are the same). – Glen_b Feb 20 '14 at 20:26
  • [Hong Ooi's answer](http://stackoverflow.com/a/8374231/2378656) is related to this question, I think. The formula that the function `AIC` uses is `-2*as.numeric(logLik(lm_mtcars))+2*(length(lm_mtcars$coefficients)+1)`. – COOLSerdash Feb 20 '14 at 20:27
  • luciano: The "+1" in that formula @COOLSerdash points to arises from the variance parameter. Note also that the help for `logLik` says that for `lm` models it includes 'all constants', so there'll be a `log(2*pi)` in there somewhere. – Glen_b Feb 20 '14 at 21:30
  • @Glen_b: Why say likelihood's *defined* only up to a multiplicative constant? After all, when comparing non-nested models from different families of distribution (e.g. with AIC, or with the Cox test), you need to remember that constant. – Scortchi - Reinstate Monica Feb 20 '14 at 21:43
  • @Scortchi the definition isn't mine! You'll have to take it up with R. A. Fisher. It's been that way from the start, I think (1921). That it's still defined that way, at least in the continuous case, see [here](http://en.wikipedia.org/wiki/Likelihood_function#Continuous_probability_distribution), for example, at the sentence beginning 'More precisely'. – Glen_b Feb 20 '14 at 21:52
  • @Glen_b: Wasn't crediting/blaming you - it's just that it's intermittently puzzled me for ages & you reminded me again now. – Scortchi - Reinstate Monica Feb 20 '14 at 21:56
  • @Scortchi Okay. Then I say it's defined that way because that's the way it has been defined. I wish I had Fisher's own explanation to hand (it was pretty clear), but right now I don't. If you're comparing two likelihoods you have to make sure that they don't leave out different constants, of course. – Glen_b Feb 20 '14 at 21:57
  • @Glen_b: Fair enough. There's a post [here](http://stats.stackexchange.com/questions/29682/how-to-rigorously-define-the-likelihood) on how to rigorously define likelihood. For the moment (i.e. till I miraculously become clever enough to understand it all) at least I like gui11aume's comment: "Personally I prefer to call it up [equality up to a multiplicative constant] when needed rather than hard code it in the definition." – Scortchi - Reinstate Monica Feb 20 '14 at 22:28

2 Answers

20

Note that the help for the function `logLik` in R says that for `lm` models it includes 'all constants', so there will be a `log(2*pi)` term in there somewhere, as well as another constant term from the exponent of the likelihood evaluated at the MLEs. Also, don't forget that $\sigma^2$ counts as a parameter.

$\cal L(\hat\mu,\hat\sigma)=(\frac{1}{\sqrt{2\pi s_n^2}})^n\exp({-\frac{1}{2}\sum_i (e_i^2/s_n^2)})$

$-2\log \cal{L} = n\log(2\pi)+n\log{s_n^2}+\sum_i (e_i^2/s_n^2)$

$= n[\log(2\pi)+\log{s_n^2}+1]$

$\text{AIC} = 2p -2\log \cal{L}$

Note that for a model with one independent variable, $p=3$ (the slope coefficient, the intercept, and $\sigma^2$).
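As a quick check on that parameter count, `logLik` in R records the number of parameters it counted as the `"df"` attribute of its return value:

```r
lm_mtcars <- lm(mpg ~ drat, mtcars)
# parameters counted by logLik: slope, intercept, and sigma^2
attr(logLik(lm_mtcars), "df")
#> [1] 3
```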

Which means this is how you reproduce R's answer (note the trailing `+`, so R knows the expression continues on the next line):

nrow(mtcars)*(log(2*pi)+1+log(sum(lm_mtcars$residuals^2)/nrow(mtcars))) +
  (length(lm_mtcars$coefficients)+1)*2
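Putting the pieces together (a sketch; the intermediate variable names are mine), the constants-included formula reproduces `AIC` exactly:

```r
lm_mtcars <- lm(mpg ~ drat, mtcars)
n  <- nrow(mtcars)
p  <- length(coef(lm_mtcars)) + 1        # +1 for sigma^2
s2 <- sum(residuals(lm_mtcars)^2) / n    # MLE of sigma^2: divide by n, not n - p
m2ll <- n * (log(2*pi) + log(s2) + 1)    # -2 log L, with all constants
m2ll + 2*p
#> [1] 190.7999
AIC(lm_mtcars)
#> [1] 190.7999
```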
Glen_b
  • Why in your calculation of $s^2$ are you only dividing by $n$ and not $n-p$? – Luke Thorburn Sep 28 '17 at 01:00
  • See the definition of AIC: $-2\log\mathcal{L}(\hat\theta)+2p$, where the vector of parameters $\theta$ is evaluated at the maximum (i.e. all the elements of $\hat\theta$ are MLEs); e.g. see Wikipedia [Akaike information criterion: Definition](https://en.wikipedia.org/wiki/Akaike_information_criterion#Definition). If you're not dividing by $n$ there in the computation of $\hat{\sigma}^2$, you're not calculating the MLE of $\sigma^2$, and so not really computing AIC; in effect you'd be adjusting twice for the effect of fitting parameters. (Yes, lots of people do it wrong.) – Glen_b Sep 28 '17 at 01:01
  • Is there a typo in the second equation? Should it be $-2\log \cal{L} = n\log(2\pi)+n\log{s_n}+\sum_i (e_i^2/s_n^2)$? Ok, I see: you're using $\sqrt{2\pi s^2_n}$. – rhody Sep 26 '19 at 22:08
13

The AIC function gives $2k -2 \log L$, where $L$ is the likelihood & $k$ is the number of estimated parameters (including the intercept & the variance). You're using $n \log \frac{S_{\mathrm{r}}}{n} + 2(k-1)$, where $S_{\mathrm{r}}$ is the residual sum of squares & $n$ is the sample size. These formulæ differ by an additive constant; so long as you use the same formula consistently & compare models by differences in AIC, the constant cancels & it doesn't matter.
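To see the constant wash out, here's a sketch (the second model and the helper name are my choices, not from the question) comparing the question's formula against `AIC` on two models fitted to the same data:

```r
n <- nrow(mtcars)
# the question's formula: omits n*(log(2*pi)+1) and the sigma^2 parameter,
# both of which are the same for any lm fit to these n observations
op_aic <- function(fit) {
  n * log(sum(residuals(fit)^2) / n) + 2 * length(coef(fit))
}
fit1 <- lm(mpg ~ drat, mtcars)
fit2 <- lm(mpg ~ drat + wt, mtcars)
AIC(fit2) - AIC(fit1)          # difference using R's AIC
op_aic(fit2) - op_aic(fit1)    # identical difference from the simpler formula
```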

Scortchi - Reinstate Monica