6

Suppose I have a data set $x_1, \ldots, x_n$ and I want to fit a normal, an exponential, and a uniform distribution to it. The fitting function spits out a bunch of goodness-of-fit statistics, e.g. the AIC, the BIC, chi-square, and Kolmogorov–Smirnov.

I am trying to convince someone that the AIC is not appropriate here, because we have different log-likelihoods, and sometimes different numbers of parameters, depending on the distribution. I would prefer to use the p-value of the Kolmogorov–Smirnov test to compare the fits.

Is my approach justified? How can I convince my coworker that the AIC is not okay here (he likes to see a cited paper or something equivalent)?

Thanks in advance!

edit: Specifically, I was shown this article: http://www.vosesoftware.com/whitepapers/Fitting%20distributions%20to%20data.pdf

I have no idea what to say to this. Page 4 lists the flaws of the chi-squared and Kolmogorov–Smirnov tests, and pages 5 and 6 praise the AIC. Is he right?
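For concreteness, here is roughly what the comparison in question looks like in code. This is a minimal sketch using scipy; the data are made up (drawn from a normal, so we know which fit "should" win), and note that scipy fits a location and scale parameter for all three families:

```python
import numpy as np
from scipy import stats

# Hypothetical data standing in for x_1, ..., x_n.
rng = np.random.default_rng(0)
x = rng.normal(loc=5, scale=2, size=200)

candidates = {"normal": stats.norm, "exponential": stats.expon, "uniform": stats.uniform}
aics, ks_pvalues = {}, {}
for name, dist in candidates.items():
    params = dist.fit(x)                      # maximum-likelihood fit
    loglik = np.sum(dist.logpdf(x, *params))  # log-likelihood at the fitted parameters
    k = len(params)                           # scipy fits loc and scale, so k = 2 for all three here
    aics[name] = 2 * k - 2 * loglik           # AIC = 2k - 2 log L
    ks_pvalues[name] = stats.kstest(x, dist.cdf, args=params).pvalue
    print(f"{name:12s} AIC = {aics[name]:8.1f}   KS p-value = {ks_pvalues[name]:.3f}")
```

One caveat on the KS side: when the parameters are estimated from the same data, the standard KS p-value is too optimistic (the Lilliefors problem), so it is not an innocent alternative to the AIC either.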

Alexander Engelhardt
  • 4
    I think it is valid to use [AIC](http://en.wikipedia.org/wiki/Akaike_information_criterion) for comparing these models. The Wikipedia entry mentions "There are, however, important distinctions. In particular, the likelihood-ratio test is valid only for nested models whereas AIC (and AICc) has no such restriction". On the other hand, it is more intriguing why you are comparing such different models **"I would fit a normal, an exponential and a uniform distribution"**? They are different in terms of shape, support, ... –  Jul 06 '12 at 12:48
  • I believe the author is a bit biased toward information criteria, which are not a panacea, by the way, because they are based on asymptotic results. I think it is better to use a couple of criteria that assess different features of the models. For example, AIC penalises the number of parameters, some goodness-of-fit tests assess the fit on the tails or the shoulders of the distribution, and some others evaluate the predictive performance of the models in question. –  Jul 06 '12 at 13:03
  • I am comparing such different models just because we want to see all possible models. Our function just fits all models that are interesting to us - I know they are quite different in character :) In part, I also wanted to deliberately choose different models to see if even there the AIC could be used (if it could at all). – Alexander Engelhardt Jul 06 '12 at 13:04
  • 1
    Why are you trying to convince them that AIC is not justified? – John Jul 06 '12 at 13:44
  • Because I learned it's wrong to use the likelihood based AIC to compare across models with a different likelihood. If that opinion is wrong, I wouldn't mind being corrected, though. – Alexander Engelhardt Jul 07 '12 at 08:21
  • You learned wrong. It would be a rather rare case where the likelihoods were the same and the AICs different. You'd have to add parameters that did absolutely nothing. – John Jul 08 '12 at 14:24
  • Oh, I didn't mean the same likelihood *value*, but the same likelihood family, i.e. distribution. The classic example where the AIC is appropriate would be in comparing nested models, i.e. regressions with one more (or one less) input variable. People at university told us that the AIC is not okay for comparing an exponential GLM with a normal LM. How much truth is in that statement? – Alexander Engelhardt Jul 09 '12 at 11:19
  • 1
    @AlexxHardt Some related questions/answers: ["Can AIC compare across different types of model?"](http://stats.stackexchange.com/q/4997/10525), ["Non-nested model selection"](http://stats.stackexchange.com/q/20441/10525), ["Testing the difference in AIC of two non-nested models"](http://stats.stackexchange.com/q/8557/10525). –  Jul 09 '12 at 14:18
  • Thanks. In two of those three threads though, the top answer says that the theory was only worked out for nested models, and it's "less clear" when comparing across different families. What should I think about this? – Alexander Engelhardt Jul 10 '12 at 08:52

2 Answers

9

You have to penalize the model for its number of parameters. Let's say you had 30 data points and a model that takes 29 parameters to define. You could fit the data perfectly. But that's not a terribly fair way to compare it to a uniform distribution with far fewer parameters.

The paper you cite mentions this. You're likely having trouble making an argument against it because there isn't a good general one. If anything, the argument would be over how much to penalize extra parameters, in which case you may want to examine different kinds of information criteria.

It's also a good idea to look at some other fit measures. There's nothing wrong with using several and making rational arguments when the AIC differences are very small.
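One way to make those rational arguments quantitative when the AIC differences are very small is via Akaike weights (see Burnham and Anderson). A small sketch, with made-up AIC values:

```python
import math

def akaike_weights(aics):
    """Akaike weights: relative support for each model, computed
    from each model's AIC difference to the best (lowest) AIC."""
    best = min(aics)
    rel = [math.exp(-(a - best) / 2.0) for a in aics]
    total = sum(rel)
    return [r / total for r in rel]

# Three fits whose AICs are close: the "best" model
# only carries about 60% of the total weight.
weights = akaike_weights([100.0, 101.2, 104.5])
print(weights)
```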

John
  • Oh, yes, I agree with the need for a penalty. What I don't agree with is that the likelihoods descend from different distributions, i.e. different density families. Aren't we comparing apples with oranges there? I thought the AIC only works for nested models, where you add one input variable to your regression and then see how much the likelihood improves. But it would still be the same likelihood _family_. – Alexander Engelhardt Jul 09 '12 at 11:21
  • 2
    @AlexxHardt This is not quite correct (as I have mentioned in my first comment). Please, have a look at this document about [AIC Myths](http://warnercnr.colostate.edu/~anderson/PDF_files/AIC%20Myths%20and%20Misunderstandings.pdf), particularly the last paragraph of pp. 2. Other references by Brian Ripley: [1](http://www.stats.ox.ac.uk/~ripley/ModelChoice/Ox08_2.pdf), [2](http://www.stats.ox.ac.uk/~ripley/Nelder80.pdf), [3](http://www.stats.ox.ac.uk/~ripley/ModelChoice.pdf). –  Jul 09 '12 at 13:40
  • Huh. Seems like I misunderstood that for the last year or so. Thanks a lot for clearing that up! I have to think of an alternative to "yo momma so stupid, she comparing non-nested models with the AIC" now. Hm. – Alexander Engelhardt Jul 10 '12 at 08:36
1

I'd go further and say it is probably the most widely accepted method for comparing distributions. But you should really use the corrected AIC, which has an extra term that adjusts for small sample sizes. See Burnham and Anderson (2002), for example.
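For reference, the corrected AIC (AICc) just adds a small-sample term to the plain AIC; a minimal sketch:

```python
def aicc(loglik, k, n):
    """Corrected AIC (AICc, see Burnham and Anderson 2002):
    AICc = AIC + 2k(k+1) / (n - k - 1).
    The extra term penalizes parameters more heavily for small n
    and vanishes as n grows, recovering the plain AIC."""
    aic = 2 * k - 2 * loglik
    return aic + 2.0 * k * (k + 1) / (n - k - 1)
```

A common rule of thumb is to use the AICc whenever n/k is small (say, below 40); since it converges to the AIC for large n, there is little cost to using it by default.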

This site will take a set of numbers and do the comparisons for you, using the corrected AIC mentioned above. http://www.easydatascience.com/

user95917