Different AIC definitions

Question

From Wikipedia there is a definition of Akaike's Information Criterion (AIC) as $ AIC = 2k -2 \log L $, where $k$ is the number of parameters and $\log L$ is the log-likelihood of the model.

However, our Econometrics notes at a well-respected university state that $ AIC = \log (\hat{\sigma}^2) + \frac{2 \cdot k}{T} $. Here $ \hat{\sigma}^2 $ is the estimated variance for the errors in an ARMA model and $ T $ is the number of observations in the time series dataset.

Is the latter definition equivalent to the first, but simply tuned for ARMA models? Or is there some kind of conflict between the two definitions?

For the record: criterion singular, criteria plural. (Edited accordingly.) — Nick Cox, Jan 20 '16 at 10:46

Glen_b · Accepted Answer · 2016-01-20T11:15:21.910

The formula you quote from your notes is not exactly AIC.

AIC is $-2\log\mathcal{L}+2k$.

Here I'll give an outline of an approximate derivation that makes clear enough what's going on.

If you have a model with independent normal errors with constant variance,

$$\mathcal{L}\propto \sigma^{-n} \: e^{-\frac{1}{2\sigma^2}\sum \varepsilon_i^2}$$

which, with estimates plugged in for the parameters, becomes

\begin{eqnarray} \hat{\mathcal{L}} & \propto &(\hat{\sigma}^2)^{-n/2} e^{-\frac12 n\hat{\sigma}^2/\hat{\sigma}^2}\\ & \propto &(\hat{\sigma}^2)^{-n/2} e^{-\frac12 n}\\ & \propto &(\hat{\sigma}^2)^{-n/2} \end{eqnarray}

(assuming the estimate of $\sigma^2$ is the ML estimate)

So $-2\log\mathcal{L} +2k = n\log{\hat{\sigma}^2} + 2k$ (up to shifting by a constant)
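As a quick numerical check of that "shifting by a constant", here is a minimal sketch in plain Python. It assumes i.i.d. normal errors and uses simulated residuals; the function name, the value of `k` and the simulated data are purely illustrative. For the exact Gaussian likelihood the constant works out to $n(\log 2\pi + 1)$, which does not depend on the model.

```python
import math
import random

def gaussian_aic_pieces(residuals, k):
    """Return (exact AIC, n*log(sigma2_hat) + 2k, n*(log(2*pi) + 1)).

    Assumes i.i.d. normal errors; k is whatever parameter count you use.
    """
    n = len(residuals)
    sigma2_hat = sum(e * e for e in residuals) / n  # ML estimate: divide by n
    # Exact Gaussian log-likelihood evaluated at sigma2_hat, summed per observation
    loglik = sum(
        -0.5 * math.log(2 * math.pi * sigma2_hat) - e * e / (2 * sigma2_hat)
        for e in residuals
    )
    exact_aic = -2 * loglik + 2 * k
    shortcut = n * math.log(sigma2_hat) + 2 * k
    constant = n * (math.log(2 * math.pi) + 1)
    return exact_aic, shortcut, constant

random.seed(0)
residuals = [random.gauss(0.0, 0.5) for _ in range(200)]  # simulated, illustrative
exact_aic, shortcut, constant = gaussian_aic_pieces(residuals, k=3)

# The gap between the two forms equals the model-independent constant.
print(exact_aic - shortcut, constant)
```

Because the gap depends only on $n$, it cancels whenever you compare models fitted to the same data.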

Now in the ARMA model, if $T$ is really large compared to $p$ and $q$, then the likelihood can be approximated by such a Gaussian framework (e.g. you can write the ARMA approximately as a longer AR and condition on enough terms to write that AR as a regression model), so with $T$ in place of $n$:

$AIC \approx T\log{\hat{\sigma}^2} + 2k$

hence

$AIC/T \approx \log{\hat{\sigma}^2} + 2k/T$

Now if you're simply comparing AICs, that division through by $T$ doesn't matter at all, since it doesn't change the ordering of AIC values.

However, if you're using AIC for some other purpose that relies on the actual value of differences in AIC (such as to do multimodel inference as described by Burnham and Anderson), then it matters.
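To make the last two points concrete, here is a small sketch with made-up AIC values for three candidate models (the numbers and $T = 500$ are hypothetical). It uses the Akaike weights of Burnham and Anderson, $w_i \propto e^{-\Delta_i/2}$ with $\Delta_i = AIC_i - \min_j AIC_j$, as one example of a quantity that depends on the actual size of AIC differences.

```python
import math

def akaike_weights(ic_values):
    """Akaike weights: w_i proportional to exp(-delta_i / 2)."""
    best = min(ic_values)
    rel = [math.exp(-0.5 * (v - best)) for v in ic_values]
    total = sum(rel)
    return [r / total for r in rel]

def ranking(values):
    """Indices of the models sorted from smallest to largest criterion value."""
    return sorted(range(len(values)), key=lambda i: values[i])

# Hypothetical AIC values for three models fitted to the same T observations.
T = 500
aic = [1012.4, 1010.1, 1015.8]
aic_per_obs = [a / T for a in aic]

same_order = ranking(aic) == ranking(aic_per_obs)
w_aic = akaike_weights(aic)
w_scaled = akaike_weights(aic_per_obs)

print(same_order)  # the ordering is unaffected by dividing by T
print(w_aic)       # weights from the unscaled AIC
print(w_scaled)    # weights wrongly computed from AIC/T
```

The ordering is identical, but the weights computed from AIC/$T$ come out nearly equal across the three models, because dividing by $T$ shrinks every difference by a factor of $T$. If you have the AIC/$T$ form and need weights, multiply back by $T$ first.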

Numerous econometrics texts seem to use this AIC/T form. Oddly, some books seem to reference Hurvich and Tsai 1989 or Findley 1985 for that form, but Hurvich & Tsai and Findley seem to be discussing the original form (though I only have an indirect indication of what Findley does right now, so perhaps there is something in Findley on it).

Such scaling might be done for a variety of reasons. For example, time series, especially high frequency time series, can be very long, and ordinary AICs might have a tendency to become unwieldy, especially if $\sigma^2$ is very small. (There are some other possible reasons, but since I really don't know the reason this was done I won't start going down a list of all possible reasons.)

You may like to look at Rob Hyndman's list, "Facts and fallacies of the AIC", particularly items 3 to 7. Some of those points might lead you to be at least a little cautious about relying too heavily on the approximation by Gaussian likelihood, but maybe there's a better justification than I offer here.

I'm not sure there's a good reason to use this approximation to the log-likelihood rather than the actual AIC, since a lot of time series packages these days calculate (and maximize) the actual log-likelihood for ARMA models. There seems little reason not to use the actual log-likelihood.

Sooner or later, every discussion about any *IC turns into "This is the criterion you should use, except that it often gives the wrong answer in such-and-such circumstances". Just being ironic, not at all critical of a typically helpful answer. This is just like real life, where some generic maxim such as "love everybody" is usually to be overridden temporarily by other advice if somebody is trying to beat you up or rip you off. — Nick Cox, Jan 20 '16 at 10:45
@Nick I'm not bothered by the texts that use AIC/$n$ rather than AIC, but what does worry me is that so many of the econometrics books I've looked at just call it "AIC" *without any comment*. To me that's just recklessly irresponsible. Whoever was first to do it but not say so has been copied again and again. — Glen_b, Feb 01 '16 at 21:42

score 2 · Answer 2 · answered Jan 20 '16 at 10:58

I believe this is based on the assumption of normal errors. In econometrics, you operate using asymptotics, especially in the time series applications using AIC. As a consequence, the normal assumption should hold asymptotically to justify this (asymptotic) model selection scheme.

Recall that the logarithm of the normal likelihood is $\ln(L) = -(T/2)\ln(2\pi) -(T/2)\ln(\sigma^2) - \frac{1}{2\sigma^2}\sum(x_i - \mu)^2$, where we use $\mathbb{E}(X) = \mu$ and $Var(X) = \sigma^2$ if your data is drawn from $X$. In what follows we neglect the first term, as the observed sample $x_1, ..., x_T$ does not affect it.

Simply use the more general (first) formula and plug in the normal likelihood for $L$, so that each term of $\ln(L)$ is multiplied by $-2$. The first term can be ignored (it is a constant regardless of regressor choice). The second term becomes $T\ln(\sigma^2)$. The third term becomes $(1/\sigma^2)(T\hat{\sigma}^2)$, where we have used $\hat{\sigma}^2 = T^{-1} \sum(x_i - \bar{x})^2$. Not using a finite sample correction is justified here because this estimator is only valid asymptotically if the errors are not normal. Since we do not know $\sigma^2$, we have to estimate the third term as $(1/\hat{\sigma}^2)(T\hat{\sigma}^2) = T$.

In summary, this means we get for the normal likelihood that $AIC = 2k + T\ln(\hat{\sigma}^2) + T$. Needless to say, the minimization is not affected by ignoring the constant $T$. The expression is now simply divided by $T$, since scaling all additive components by $1/T$ does not change the minimization problem. This lands you at the second result, because $AIC$ and $AIC/T$ are identical for the purpose of minimization.
