10

In parameter estimation, it's common to report a 95% CI around each parameter. Why don't I see AIC (or deltaAIC) with a CI?

If I bootstrap the fitting of two candidate models and get a deltaAIC for each iteration, would reporting a 95% CI be meaningful?
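
To make this concrete, here is a minimal sketch of the bootstrap I have in mind (a toy Gaussian/OLS setting; the DGP, the two models, and all names are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_aic(y, X):
    """AIC of an OLS fit with Gaussian errors (k = #coefficients + 1 for sigma^2)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n = len(y)
    sigma2 = resid @ resid / n  # ML estimate of the error variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    k = X.shape[1] + 1
    return -2 * loglik + 2 * k

# Toy data: a mildly quadratic DGP; model A is linear, model B adds x^2.
n = 200
x = rng.uniform(-2, 2, n)
y = 1.0 + 0.5 * x + 0.15 * x**2 + rng.normal(0, 1, n)
XA = np.column_stack([np.ones(n), x])        # model A
XB = np.column_stack([np.ones(n), x, x**2])  # model B

delta_hat = gaussian_aic(y, XA) - gaussian_aic(y, XB)

# Nonparametric bootstrap of deltaAIC: refit both models on each resample.
deltas = []
for _ in range(2000):
    idx = rng.integers(0, n, n)
    deltas.append(gaussian_aic(y[idx], XA[idx]) - gaussian_aic(y[idx], XB[idx]))

# Percentile interval; whether it has a valid CI interpretation is the question.
lo, hi = np.percentile(deltas, [2.5, 97.5])
print(f"deltaAIC = {delta_hat:.2f}, bootstrap 95% interval: [{lo:.2f}, {hi:.2f}]")
```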

Edit: I don't mean to imply that AIC is the same as parameter estimation. I'm asking why we treat goodness-of-fit estimates (AIC, BIC, etc.) differently from estimates that are reported with a CI.

sharoz
  • What would you need the confidence intervals for? – Tim Jun 05 '21 at 18:17
  • @Tim Let's say a point estimate for deltaAIC is 2.5; that might seem like a moderate difference. But if the 95% CI is [-1, 4], the result is too noisy for there to be a reliable difference between the two models. However, if the CI is [2, 3], the difference is clearer. – sharoz Jun 05 '21 at 18:21
  • @Tim, in addition to sharoz's valid response, see also my answer for an idea. – Richard Hardy Jun 06 '21 at 10:33
  • AIC is too simple a measure for model selection. See the discussion of AIC for multiple linear regression, and an alternative to it, $SIC_f$, at: https://stats.stackexchange.com/questions/524258/why-does-the-akaike-information-criterion-aic-sometimes-favor-an-overfitted-mo/524311#524311 – Match Maker EE Jun 06 '21 at 10:45
  • @MatchMakerEE, what about the proof that AIC is the efficient model selector (compared to another proof that BIC is consistent but not efficient)? Does it contain a mistake? If not, why should one use another criterion if the goal is forecast accuracy? – Richard Hardy Jun 06 '21 at 10:55
  • @sharoz: I added to my answer in response to your edit. – Christian Hennig Jun 07 '21 at 08:45

2 Answers

19

The AIC is not an estimator of a true parameter. It is a data-dependent measurement of the model fit. The model fit is what it is; there is no model fit that is any "truer" than the one you have, because it is the fit you actually have that is being measured. And without a true parameter for which the AIC would be an estimator, there can be no confidence interval (CI).

By the way, I'm not disputing the answer by Richard Hardy. The AIC, like some other quantities such as $R^2$, can be interpreted as estimating something "true but unobservable", in which case one can argue that a CI makes sense. Personally, I find the interpretation as a measurement of fit quality more intuitive and direct, and under that interpretation there is no CI, for the reasons above; but I'm not saying that there is no way for a CI to be well defined and of some use.

Edit: In response to the addition in the question ("I don't mean to imply that AIC is the same as parameter estimation. I'm asking why we treat goodness-of-fit estimates (AIC, BIC, etc.) differently from estimates that are reported with a CI."): the definition of a CI relies on a parameter being estimated. It says that, given the true parameter value, the CI catches this value with probability $(1-\alpha)$. As long as you're not interested in that true parameter value, a CI is meaningless.
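
To spell that out in symbols (the standard definition, nothing specific to AIC): a $(1-\alpha)$ CI for a parameter $\theta$ is a data-dependent set $C(Y)$ satisfying
$$P_\theta\big(\theta \in C(Y)\big) \ \ge\ 1-\alpha \quad \text{for every } \theta.$$
The coverage guarantee quantifies over a true parameter value; if there is no estimand $\theta$, the statement has nothing to refer to.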

Christian Hennig
  • +1 but it might help to explain why this is in contrast to something like MSE or $R^2$. – Dave Jun 05 '21 at 20:24
  • @Dave: You can do that if you think it makes sense; I hardly ever see CIs for $R^2$ and MSEs. From my point of view, the meaning of these is less intuitive than the fact that there's none for AIC. – Christian Hennig Jun 05 '21 at 20:37
  • MSE, for instance (and depending on the denominator you use), estimates the standard deviation of the error term, which is a "true" parameter. – Dave Jun 05 '21 at 20:40
  • Fair enough (the term has more than one use), but I'm not sure this adds to my answer. If you find it important, write your own. – Christian Hennig Jun 05 '21 at 20:45
  • What you can do to get a sense of the reliability of the fit is bootstrapping or some other simulation! – kjetil b halvorsen Jun 06 '21 at 03:38
  • @Dave, MSE and standard deviation are not even on the same scale. You probably meant that MSE estimates the variance of the error term. This makes sense if we assume the error term has zero mean. – Richard Hardy Jun 06 '21 at 10:09
  • The R-squared and MSE definitely have population analogues. For AIC, I do not think so. – Michael M Jun 06 '21 at 13:34
  • @MichaelM, for AIC the population analog is twice the negative average log-likelihood. If you find it easy to see the population analog of MSE (average squared error), I think it is not that difficult to see it for AIC; but perhaps it is a subjective impression. – Richard Hardy Jun 06 '21 at 14:02
  • @RichardHardy: interesting, I see! – Michael M Jun 06 '21 at 14:36
  • For context, I do occasionally see confidence intervals for R and find them useful. (Note: the ^2 is often dropped in these cases, so you can tell whether the endpoints have different signs.) But an R CI is often redundant with $R^2$ and a p-value. – sharoz Jun 07 '21 at 02:27
  • @Lewian I agree, it's the truest fit of the available data. But wouldn't a CI tell you something about how strongly a small number of outliers in the data are skewing the fit? If all residuals were moderate, you might get a different CI than if most residuals were small and a few were huge. – sharoz Jun 07 '21 at 07:33
  • @sharoz: There are techniques designed to identify and deal with outliers that are much better than that. – Christian Hennig Jun 07 '21 at 08:40

12

AIC estimates $-2n \ \times$ the expected log-likelihood of a new, unseen data point from the data generating process (DGP) that generated your sample.* Even though the target (the estimand) is not a parameter, it is a meaningful quantity; e.g. it may be interpreted as the expected loss of a point prediction. It is thus quite natural to wish for a confidence interval around the point estimate (the AIC): that way we could tell not only what the expected loss is but also how uncertain our estimate of it is. In summary, while I do not have a ready answer for how to obtain such a confidence interval or under what conditions your idea of bootstrapping would work, I clearly do see a point in pursuing it.
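
In symbols (a sketch, with $\hat\theta$ the maximum likelihood estimate from the sample and $\tilde Y$ an independent new observation from the same DGP), the estimand is
$$-2n \, \operatorname{E}\big[\log f(\tilde Y \mid \hat\theta)\big],$$
where the expectation runs over both the sample (through $\hat\theta$) and $\tilde Y$. The penalty in $\mathrm{AIC} = -2\log L(\hat\theta) + 2k$ is the first-order correction for the optimism of the in-sample log-likelihood as an estimator of this target.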

*See How can we select the best GARCH model by carrying out likelihood ratio test?, Can results for model selection with AIC be interpretable at the population level?, and Using AIC/BIC within cross-validation for likelihood based loss functions, among other threads where this idea is employed.

Richard Hardy
  • I'd like to emphasize this by saying that in my application field, I often have structure in my data that I cannot adequately convey to the model I train. In many cases I can get sensible predictive quality, but any error or quality (incl. goodness of fit) estimates from within that model may be way off. But I may be able to set up a bootstrap in a way that reflects this structure. – cbeleites unhappy with SX Jun 06 '21 at 11:47
  • @Richard Hardy: This is a fair point and I have amended my answer to take this into account. – Christian Hennig Jun 06 '21 at 12:56
  • @Lewian, thanks, your edit makes sense to me. I also phrased my answer carefully with yours in mind. Mine does not (or at least is not intended to) contradict yours; it is just another perspective. – Richard Hardy Jun 06 '21 at 13:56
  • @RichardHardy Thanks. Looking at your answer for "Can results...": as $n \to \infty$, AIC behaves similarly to the log-likelihood. That leads to a follow-up: why not report the log-likelihood with a CI? – sharoz Jun 07 '21 at 02:45
  • @sharoz, similarly to the case of AIC, I do not see anything conceptually wrong with reporting a confidence interval for the log-likelihood. On the other hand, perhaps only a (very) small fraction of users are typically interested in the confidence interval for the log-likelihood, so it is not reported for the sake of brevity. – Richard Hardy Jun 07 '21 at 06:22