I have some data and I would like to choose a model for it. To start with, I fit an exponential distribution and a gamma distribution. Now I want to do a simple likelihood ratio test. However, I am told that to do this properly the two models have to be nested (which they are) and the parameter value of the smaller model has to be in the interior of the larger model's parameter space, not on the boundary. Unfortunately, this second condition doesn't seem to hold.

What can I do?

Glen_b
graffe

1 Answer

Fortunately, you're mistaken.

The shape parameter for a gamma ($\alpha$, say) has to be $> 0$.

http://en.wikipedia.org/wiki/Gamma_distribution

The exponential has $\alpha=1$.

http://en.wikipedia.org/wiki/Gamma_distribution#Others
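
To spell that out, here is a short sketch assuming the shape-rate parameterization (the rate symbol $\beta$ is introduced here for illustration and is not part of the answer above):

$$f(x;\alpha,\beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\beta x}, \qquad x > 0,\ \alpha > 0,\ \beta > 0.$$

Setting $\alpha = 1$ gives $\Gamma(1) = 1$ and $x^{0} = 1$, so

$$f(x;1,\beta) = \beta e^{-\beta x},$$

which is the exponential density with rate $\beta$. Since $\alpha = 1$ has admissible values on both sides of it, it is an interior point of the shape parameter's range.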

So the exponential is not at the boundary and you should be able to apply a likelihood ratio test without difficulty.
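
As a minimal sketch of how such a test might be run, assuming Python with NumPy and SciPy, a location fixed at zero, and the usual large-sample $\chi^2$ approximation with one degree of freedom (the function name `lrt_exponential_vs_gamma` and the simulated sample are made up for illustration):

```python
import numpy as np
from scipy import stats

def lrt_exponential_vs_gamma(x):
    """Likelihood ratio test of exponential (null) against gamma (alternative)."""
    x = np.asarray(x, dtype=float)
    # Exponential MLE with location fixed at 0: the scale is the sample mean.
    ll_exp = stats.expon.logpdf(x, loc=0, scale=x.mean()).sum()
    # Gamma MLE with location fixed at 0; shape and scale are estimated.
    shape, _, scale = stats.gamma.fit(x, floc=0)
    ll_gamma = stats.gamma.logpdf(x, shape, loc=0, scale=scale).sum()
    # The gamma nests the exponential, so the statistic should be non-negative;
    # clip tiny negative values caused by numerical error.
    stat = max(2.0 * (ll_gamma - ll_exp), 0.0)
    # One extra free parameter (the shape), so compare with chi-squared(1).
    p_value = stats.chi2.sf(stat, df=1)
    return stat, p_value, shape

# Illustrative data only: a simulated gamma sample with shape 2.
rng = np.random.default_rng(0)
sample = rng.gamma(shape=2.0, scale=1.5, size=500)
stat, p_value, shape_hat = lrt_exponential_vs_gamma(sample)
print(f"LR statistic = {stat:.2f}, p = {p_value:.3g}, fitted shape = {shape_hat:.2f}")
```

With your own data you would pass your observations in place of `sample`. The $\chi^2_1$ reference distribution is an asymptotic result, so treat the p-value as approximate in small samples.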

(I would say, however, that hypothesis tests are not necessarily a good approach to model selection.)

Glen_b
  • I am very pleased I am mistaken :) Thank you. Maybe "model selection" has a technical meaning that is not exactly what I am looking for. In my case I am trying to find a good model that accurately describes the data I have. Is there a better way to do this? – graffe Aug 31 '14 at 07:35
  • What do you mean by "a good model" and why would a hypothesis test be a good way to achieve it? – Glen_b Aug 31 '14 at 08:11
  • I am in no way a statistician so I may not be using the right words. I want a model for my data so I can, for example, make predictions or look for anomalies. I just mean a good model in the normal non-formal sense of fitting the data reasonably well. I was using the hypothesis test to see if one model was at least better than another. Any suggestions for better ways are gratefully received. – graffe Aug 31 '14 at 08:35
  • Yes, that's fine. "Fitting the data reasonably well" doesn't really change much with sample size, but our ability to detect even trivial deviations from a specific model (like an exponential model, against a gamma model) does change with sample size. At larger sample sizes, you become practically certain to reject an exponential model against a more general alternative like the gamma (and then reject a gamma against a generalized gamma, and on and on in turn), but this doesn't imply that an exponential model wouldn't be perfectly fine for everything you want to use it for. ...(ctd) – Glen_b Aug 31 '14 at 09:02
  • (ctd)... Your data will almost certainly be neither exponential nor gamma (George Box's epigram is relevant); the relevant question is not whether you can *detect* a difference (as sample size grows, this becomes a certainty) - it's *how much that matters* to your inference. That's not a question answered by hypothesis tests; it's better addressed by diagnostic tools like Q-Q plots and simulations. – Glen_b Aug 31 '14 at 09:06
  • Thanks for this. Do you have a link for George Box's epigram? Your point about sample sizes is very interesting but I don't fully understand it. If you use the LRT here, I think you get a $\chi^2$ distribution with one degree of freedom. Why would you expect to reject at the 5% level almost all of the time? It would seem you would reject 5% of the time if the data were really exponentially distributed, for example. – graffe Aug 31 '14 at 11:01
  • Let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/16826/discussion-between-felix-and-glen-b). – graffe Aug 31 '14 at 11:21
  • Sure, right [here](http://stats.stackexchange.com/a/730/805). The problem is that real data are essentially never *exactly* from any simple distribution. – Glen_b Aug 31 '14 at 15:48
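
As a rough illustration of the sample-size point made in the comments above, here is a small simulation sketch, assuming Python with NumPy and SciPy. The helper names `lrt_pvalue` and `rejection_rate`, the shape value 1.1, and the sample sizes are all arbitrary choices made for illustration. It estimates how often the LRT rejects the exponential at the 5% level when the data really are exponential, and when they come from a gamma whose shape is only slightly different from 1.

```python
import numpy as np
from scipy import stats

def lrt_pvalue(x):
    """Approximate p-value for exponential (null) vs gamma, location fixed at 0."""
    ll_exp = stats.expon.logpdf(x, scale=x.mean()).sum()
    shape, _, scale = stats.gamma.fit(x, floc=0)
    ll_gamma = stats.gamma.logpdf(x, shape, scale=scale).sum()
    # Clip tiny negative statistics caused by numerical error.
    return stats.chi2.sf(max(2.0 * (ll_gamma - ll_exp), 0.0), df=1)

def rejection_rate(true_shape, n, reps=200, alpha=0.05, seed=0):
    """Fraction of simulated gamma samples for which the exponential is rejected."""
    rng = np.random.default_rng(seed)
    rejections = sum(
        lrt_pvalue(rng.gamma(shape=true_shape, scale=1.0, size=n)) < alpha
        for _ in range(reps)
    )
    return rejections / reps

for n in (50, 500, 5000):
    exact = rejection_rate(true_shape=1.0, n=n)  # data truly exponential
    near = rejection_rate(true_shape=1.1, n=n)   # data only slightly non-exponential
    print(f"n={n:5d}  exponential data: {exact:.2f}   shape-1.1 data: {near:.2f}")
```

If the data are exactly exponential, the rejection rate should stay near the nominal 5% whatever the sample size. With the slightly different shape, it should climb toward 1 as the sample grows, even though a gamma with shape 1.1 may be close enough to an exponential for many practical purposes.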