
I am fitting a few time series using `fitdistr` in R. To see how well different distributions fit the data, I compare the log likelihoods returned by `fitdistr`. I am fitting both the original data and the standardized data (i.e. (x - mean)/sd).
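Concretely, the workflow looks like this; a minimal sketch with simulated data standing in for my series (the starting values for the t fit are just illustrative):

    library(MASS)

    set.seed(1)
    x <- rnorm(1000, mean = 0.007, sd = 0.055)  # placeholder for the real series
    z <- (x - mean(x)) / sd(x)                  # standardized copy

    # Fit the same candidate distribution to both versions of the data
    fit_t_x <- fitdistr(x, "t", start = list(m = mean(x), s = sd(x), df = 3))
    fit_t_z <- fitdistr(z, "t", start = list(m = 0, s = 1, df = 3))

    fit_t_x$loglik  # comes out positive on the original scale
    fit_t_z$loglik  # comes out negative on the standardized scale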

What I am confused about is that the original and the standardized data give log likelihoods of opposite signs.

For example,

original:

           loglik           m          s          df
t        1890.340 0.007371982 0.05494671 2.697321186
cauchy   1758.588 0.006721215 0.04089592 0.006721215
logistic 1787.952 0.007758433 0.04641496 0.007758433

standardized:

            loglik           m         s          df
t        -2108.163 -0.02705098 0.5469259  2.69758567
cauchy   -2239.915 -0.03361670 0.4069660 -0.03361670
logistic -2210.552 -0.02328445 0.4619152 -0.02328445

How can I interpret this? Is a larger loglik better, or a smaller one?

Thank you!

Rachel
  • Your question is really the same as http://stats.stackexchange.com/questions/4220 (asking about values of a PDF) because the likelihood you are using is a product of PDFs. When you standardize them you change the unit of measure along the base of the graph, and so the height (as given by the PDF) has to compensate with an inverse change. Have you noticed that the *differences* in the log likelihoods are nearly the same in both tables? (The differences should be identical, but considerable floating point roundoff error is involved in the calculations.) – whuber Jun 17 '14 at 19:04
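To see the point numerically: standardizing is the change of variable y = (x - a)/b with b = sd(x), which multiplies every fitted density value by b, so each model's maximized log likelihood shifts by the same constant n * log(b) while the between-model differences survive. A quick check on simulated stand-in data (not the asker's series):

    library(MASS)

    set.seed(1)
    x <- rt(1000, df = 5) * 0.05   # toy data on the original scale
    z <- (x - mean(x)) / sd(x)     # standardized copy

    ll_x <- fitdistr(x, "cauchy")$loglik
    ll_z <- fitdistr(z, "cauchy")$loglik

    # These two agree up to optimization round-off, per the comment above:
    ll_z - ll_x
    length(x) * log(sd(x))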

1 Answer


In general, you can only compare log likelihoods and penalized log likelihoods (information criteria) across models fitted to exactly the same data. By standardizing the data, you have changed their values. You can compare the t, Cauchy, and logistic fits within each set, but not across the two sets.

The value of the log likelihood is a function of the data. If the data fall in a region where the density (pdf) is greater than 1, the log-density contributions are positive and the log likelihood can be positive; otherwise it will be negative. When comparing models, you want the one with the largest log likelihood or, equivalently, the smallest negative log likelihood.
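For instance, with a normal density used purely for concreteness:

    # A density can exceed 1 when the data are tightly concentrated, and the
    # log of such a density value is positive:
    dnorm(0, mean = 0, sd = 0.05)              # about 7.98, greater than 1
    dnorm(0, mean = 0, sd = 0.05, log = TRUE)  # about 2.08, positive

    # Tightly concentrated data give a positive log likelihood, dispersed
    # data a negative one:
    set.seed(1)
    sum(dnorm(rnorm(100, sd = 0.05), sd = 0.05, log = TRUE))  # positive
    sum(dnorm(rnorm(100, sd = 2),    sd = 2,    log = TRUE))  # negative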

Avraham
  • "When comparing models, you want one with the largest likelihood, or, more commonly, smallest negative logliklihood." I may not be specifying it clearly in the question, but this is the part I want to ask about. In each case, which one is a better fit? Is a general rule to look at the larger of absolute value of log likelihood? – Rachel Jun 17 '14 at 20:01
  • No, you want the larger log likelihood; negative numbers are OK. A log likelihood of -2 is better than -4. However, all information criteria, and many optimization routines, actually work by minimizing the negative log likelihood (NLL) rather than maximizing the log likelihood (LL) (the two are equivalent, of course). In that case, you look for the smallest number, which includes being the most negative. So in your case, the Cauchy is the best both times, but I'd recommend using an information criterion and not the pure LL, since just adding parameters will lower the NLL but increase parameter uncertainty. – Avraham Jun 17 '14 at 20:05
  • Since the Cauchy is a specific t-distribution, it would be strange to select it as best. I suspect the `logLik` outputs are what they say they are: the log likelihood itself, which we wish to be as large as possible. That would select the t over the Cauchy or logistic (even when penalizing for variation in the number of parameters). (I am mystified by the appearance of a `df` parameter for the Cauchy or logistic distributions, though.) It is quite revealing that the optimal values of `m` and `s` for the standardized data aren't near their actual values of $0$ and $1$, respectively. – whuber Jun 17 '14 at 20:54
  • D@rn it, you're right; I'm so hard-wired to NLL. The best option is the t. Regardless, an information criterion should be used and not just the LL. Also, as an aside, I've stopped using `fitdistr` in R as it often crashes even on simple gamma fits. I have my own custom fitting routines for loss distributions. – Avraham Jun 17 '14 at 22:20
  • Thanks, this is helpful. @Avraham, what's the idea behind your way of fitting distributions? – Rachel Jun 17 '14 at 22:48
  • I write a function to return the negative log likelihood and use gradient-free optimizers such as `NLOPT_LN_SBPLX` in `nloptr` or `Nelder-Mead` in `optim`. Often, I'll also return the gradient of the NLL so I can use a gradient-based method such as L-BFGS. Since you're minimizing the NLL, you can take its value at convergence and plug it into AICc or the like. – Avraham Jun 17 '14 at 22:59
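A minimal sketch of that approach, for reference (a gamma fit via `Nelder-Mead` on the negative log likelihood; the log-scale parameterization, starting values, and toy data are illustrative, not Avraham's actual routine):

    # Negative log likelihood for a gamma fit; the parameters are optimized on
    # the log scale so the search is unconstrained:
    nll_gamma <- function(par, x) {
      -sum(dgamma(x, shape = exp(par[1]), rate = exp(par[2]), log = TRUE))
    }

    set.seed(1)
    x <- rgamma(500, shape = 2, rate = 3)   # toy data

    fit <- optim(par = c(0, 0), fn = nll_gamma, x = x, method = "Nelder-Mead")

    exp(fit$par)   # fitted shape and rate
    fit$value      # minimized NLL; e.g. AIC = 2 * 2 + 2 * fit$value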