
I would like to know whether I can use AIC (or, if the models have the same number of predictors, the log-likelihood) to compare logit vs. probit vs. cloglog models, fitted for instance with `glmer` or `glmmTMB` in R. The question could also be formulated as: do the various link functions of these models scale the likelihood similarly (irrespective of the software-dependent constant included in the likelihood)?

I feel that this question has been asked in various forms but not always answered directly (for instance here: Can I use AIC value for comparing logit and probit model where for each model the number of covariates are equal?), and I can find contrasting answers (for instance, here someone suggests you can't: https://www.researchgate.net/post/AIC_of_glmer). So I am allowing myself to post this question here. Answers would certainly be helpful to me and hopefully to others in the future. Thanks.

Jehol
  • (+1) Generalized linear models that differ merely in the link function employed don't have incommensurable likelihoods (see [Likelihood comparable across different distributions](https://stats.stackexchange.com/q/345069/17230)). But your parenthetic "fitted for instance with glmer or glmmTMB" deserves more emphasis (& translation from software-specific jargon): you're talking about mixed models & may not be comparing maximized likelihoods anyway. – Scortchi - Reinstate Monica Jan 09 '19 at 15:21
  • Not too long ago, someone on this website made the point that the choice of link function should be made primarily based on whether we care about explanation or prediction. If we care about being able to interpret model coefficients, then using a logit link, for example, would be preferred. If we care about prediction, then we could consider links like cloglog. Joseph Hilbe may cover this point in one of his books. – Isabella Ghement Jan 09 '19 at 16:20
  • A poor choice of link function may result in over-dispersion (see https://www.mun.ca/biology/dschneider/b7932/B7932Final10Dec2008.pdf), so you may counter that poor choice by allowing for over-dispersion in your modelling. – Isabella Ghement Jan 09 '19 at 16:27
  • @Scortchi Thanks for what is, for me, an answer, but being a comment I can't accept it as such. Yes indeed, I am fitting mixed models, but in a maximum-likelihood framework. – Jehol Jan 10 '19 at 09:03
  • @Isabella True, there are many discussions about this on Cross Validated and elsewhere. Here I am only focusing on the mathematical/statistical validity of the comparison. The issue of the link between link-function choice and overdispersion is less discussed, though. Thanks for bringing it in. – Jehol Jan 10 '19 at 09:04
  • That's the point though - you could spend so much time trying to optimize a link choice only to discover the optimal link renders a model that is difficult to interpret (assuming you are in an explanatory context). So you will perhaps need to qualify your pursuit and mention that it makes more sense in a predictive context. – Isabella Ghement Jan 10 '19 at 15:36
  • @Scortchi If you were to calculate the log-likelihood for a series of models (say, ones fitted by `lme4::glmer` or `glmmTMB::glmmTMB`) that were modelled with different distributions (e.g. binomial and beta-binomial), would you be able to use the log-likelihood to compare the models? I feel like this doesn't make sense, but a much more seasoned colleague of mine is doing this.... – André.B Apr 29 '19 at 00:33
  • @André.B: If they're fitting their models by maximum likelihood, then I can't see a problem with comparing likelihoods/AIC (standard terms & conditions apply). If they're fitting their models by REML (or whatever the equivalent of REML is for GLMMs), then I don't know - you could ask a q. here giving more detail & see what the mixed-model experts have to say. – Scortchi - Reinstate Monica Apr 29 '19 at 08:44

1 Answer


I would say yes; I can't see any reason why they wouldn't be comparable. We're talking about evaluating the log-likelihood for the same conditional probability distribution, with the same probability density (i.e. we don't have to account for changes in scale due to a transformation of the response). So we're comparing the likelihoods for

$$ y_i \sim \textrm{Distrib}(g_1^{-1}((\mathbf X \boldsymbol \beta)_i), \phi) $$

with

$$ y_i \sim \textrm{Distrib}(g_2^{-1}((\mathbf X \boldsymbol \beta)_i), \phi) $$

where $\textrm{Distrib}$ is the response distribution, $g_1$ and $g_2$ are the alternative link functions (the rest is as usual for GLMs: $\mathbf X$ = model matrix, $\boldsymbol \beta$ = coefficient vector, $\phi$ = scale parameter). This is just comparing two different nonlinear specifications for the location of the distribution. As long as you're OK with using AIC to compare non-nested models (which almost everyone is), this should be fine.

Ben Bolker