
I am using the AIC (Akaike information criterion) for model selection. There are two models: the first has 2 parameters and a log-likelihood of -10182.0284, and the second has 3 parameters and the same log-likelihood when fitted to a specific dataset that only requires two parameters. The weighting I get with AIC is equal for both models. The equality seems to arise because not all significant figures are taken into account: with a log-likelihood this large in magnitude, the penalty for the extra parameter looks negligible. The results:

    AICmodelSelect(-10182.0284, -10182.0284)
    AIC_min:       null model min AIC
    relprob_null:  1
    relprob_alt:   1
    weight_null:   0.5000
    weight_alt:    0.5000

AIC equally favours both models. I am also doing a likelihood ratio test because the models are nested; the test does not produce a p-value below 0.01, so it does not reject the null model (the simpler, constrained model). But how do I justify choosing the simpler model with AIC when it gives equal weighting here?
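For reference (this is the standard Akaike-weight calculation, not anything specific to AICmodelSelect): the weights are computed from the AIC differences $\Delta_i = AIC_i - AIC_{\min}$ as $w_i = \exp(-\Delta_i/2) / \sum_j \exp(-\Delta_j/2)$, so exactly equal weights of 0.5 can only arise when the two AIC values entering the comparison are themselves exactly equal.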

Vass

3 Answers


There was a fairly good commentary in the Journal of Wildlife Management concerning uninformative parameters within the AIC framework.

Arnold, T. W. 2010. Uninformative parameters and model selection using Akaike’s Information Criterion. Journal of Wildlife Management 74:1175–1178. [Link].

We usually consider models within 2 delta AIC of the best model to be competitive. However, if a model differs from its competitor only by the addition of one parameter, and that parameter is not significant, the parameter is likely spurious. Since $AIC = -2\ln(L) + 2K$, the penalty for adding one parameter is +2 AIC. If only one parameter is added and the resulting model is still within 2 delta AIC, the fit was not improved enough to overcome the penalty. Therefore, that parameter is uninformative and should not be included in the model or interpreted as having an effect.
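To spell out that penalty for two nested models with identical log-likelihood $\ln(L)$ and $K$ versus $K+1$ parameters:

$\Delta AIC = [2(K+1) - 2\ln(L)] - [2K - 2\ln(L)] = 2$

so the larger model sits exactly 2 AIC units above the smaller one; being "within 2 delta AIC" in that situation reflects nothing but the parameter penalty, not any improvement in fit.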

RioRaider

I couldn't find AICmodelSelect in any R package, searching both within R (with ??) and on Google. What package did you use? Or is it not R at all?

In any case, if the log-likelihoods are equal but the models have different numbers of parameters, then the AICs are not equal, even though the values you entered are. The formula for AIC is $AIC = 2k - 2\ln(L)$, where $k$ is the number of parameters and $\ln(L)$ is the log-likelihood.

In your case the two AICs would be $6 + 20364.0568 = 20370.06$ (three parameters) and $4 + 20364.0568 = 20368.06$ (two parameters); the second is smaller, and that is the model you should choose, based on AIC.
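A minimal R sketch of that arithmetic, including the Akaike weights it implies (the numbers are taken from the question; the variable names are illustrative and this is not the AICmodelSelect routine):

    loglik <- -10182.0284               # same log-likelihood for both models
    k      <- c(null = 2, alt = 3)      # numbers of estimated parameters

    aic <- 2 * k - 2 * loglik           # AIC = 2k - 2 ln(L)
    aic
    #     null      alt
    # 20368.06 20370.06

    delta   <- aic - min(aic)                          # AIC differences
    weights <- exp(-delta / 2) / sum(exp(-delta / 2))  # Akaike weights
    round(weights, 3)
    #  null   alt
    # 0.731 0.269

With the parameter penalty actually applied, the weights are roughly 0.73 versus 0.27 in favour of the two-parameter model, not 0.5/0.5.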

Peter Flom
  • (+1) Still, the fact that the log-likelihood of the two models is exactly the same is a bit intriguing. It might be the case that there is a mistake in the code, or that the bigger model is a reparameterisation of the smaller one, or something else ... –  Jul 27 '12 at 14:20
  • That's certainly true. Hard to imagine how they could be the same to so many decimal places. Good point. – Peter Flom Jul 27 '12 at 15:04
  • @Procrastinator, the first model is a constrained (nested) version of the other; the extra parameter is not joint with the others. But what about http://en.wikipedia.org/wiki/Akaike_information_criterion#How_to_apply_AIC_in_practice, which describes an averaging of the values for a weighted average? – Vass Jul 27 '12 at 15:20
  • @Vass I think the first, and more worrying, thing to clarify is why they have the same log-likelihood. –  Jul 27 '12 at 15:26
  • @Procrastinator, in the second model there is an extra term, added to the first model, that accounts for a higher-order effect which is not present in the dataset used; it is zero throughout the data and so makes no contribution. – Vass Jul 27 '12 at 15:48
  • @Vass Then it appears to me that both models are the same. For this reason you are obtaining the same log-likelihood. –  Jul 27 '12 at 15:53
  • @Procrastinator, great, but then AIC is indecisive in this situation. With the likelihood ratio test I can compare the statistic to the chi-squared distribution, see that the p-value does not fall below the threshold, and conclude that the simplest model is **best**. Can I not get such an answer with AIC? – Vass Jul 27 '12 at 15:56
  • @Vass Well, that is also intriguing because, as you mentioned, the log-likelihood of both models is the same (because the models are the same), so the likelihood ratio is $1$ for any data set and therefore the assumptions for using a likelihood ratio test are no longer valid. –  Jul 27 '12 at 15:59
  • @Procrastinator, no, the models are not the same: the more complex/unconstrained model collapses to the constrained, simpler model in the absence of higher-order features. With equal likelihoods the likelihood-ratio statistic $D$ is 0, which happens only when the higher-order features are absent, and the likelihood ratio test is still valid. When $D = 0$, the chi-squared test does not give a p-value small enough to reject the null hypothesis (the arithmetic is spelled out after this thread). – Vass Jul 27 '12 at 16:05
  • When both models are the same, it doesn't matter which one you choose. You can't use AIC or anything else to choose between two models that are exactly the same. However, including a term that contributes nothing at all is silly. (See my next comment, too) – Peter Flom Jul 27 '12 at 16:05
  • OK, the two models are the same in *this dataset*. To distinguish between them, get more data. – Peter Flom Jul 27 '12 at 16:07
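To make the likelihood-ratio arithmetic in the comments above concrete: with identical log-likelihoods the test statistic is $D = 2(\ln L_{alt} - \ln L_{null}) = 2(-10182.0284 - (-10182.0284)) = 0$, and with one constrained parameter $D$ is compared with a $\chi^2_1$ distribution, giving a p-value of $P(\chi^2_1 \ge 0) = 1$. The test therefore cannot reject the simpler model, which is consistent with preferring it.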

Why do people rely so strictly on a criterion (i.e. AIC) to determine the "best" model? Why not use the principle of N.I.I.D. errors and parsimony as the guide instead of a fit statistic? Sure, we can compare variances afterwards to see who had the better model, but this whole rule-based way of modelling is contrary to what I believe in.

As you may know, N.I.I.D. is the first thing we are taught in time series analysis: the errors should be Gaussian. By using the AIC or BIC criterion to build a model you lose sight of the goal of building a model and end up merely fitting. I have found that, instead of using AIC, focusing on the N.I.I.D. behaviour of the errors, the significance of the parameters, and parsimony gives a better model, using an identification scheme based on robustified ACF and PACF.

Tom Reilly