
I am looking for an appropriate measure of the "explained variance" of a Poisson GLM (using a log-link function).

I have found a number of resources (both on this site and elsewhere) that discuss different pseudo-$R^2$ measures, but nearly all of them present the measures in the context of a logit link, and they don't discuss whether the measures are appropriate for other link functions, such as the log link in my Poisson GLM.

For example, here are a few of the sites I've found:

Which pseudo-$R^2$ measure is the one to report for logistic regression (Cox & Snell or Nagelkerke)?

http://thestatsgeek.com/2014/02/08/r-squared-in-logistic-regression/

http://www.ats.ucla.edu/stat/mult_pkg/faq/general/Psuedo_RSquareds.htm

My question is: Are any of the methods discussed at those links (in particular, the FAQ on the UCLA page) appropriate for a Poisson GLM with a log link? Is any particular method more appropriate, or more commonly used, than the others?

Some background:

This is for a research paper in which I am using a Poisson GLM to analyze neural data. I am using the deviances of the models (calculated assuming a Poisson distribution) to compare two nested models: model A includes 5 parameters that are left out of model B. My interest (and the focus of the paper) is to show that those 5 parameters significantly improve the model fit. However, one of the reviewers would like an indication of how well each model fits the data.
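Because model B is nested in model A, a deviance comparison like this is usually carried out as a likelihood-ratio test: the drop in deviance from B to A is referred, approximately, to a $\chi^2$ distribution with 5 degrees of freedom (one per extra parameter). The sketch below assumes the dispersion is fixed at 1 (no overdispersion), and the function names are hypothetical. `chi2_sf_odd_df` is a stdlib-only closed form that is valid only for odd degrees of freedom such as 5; if SciPy is available, `scipy.stats.chi2.sf(stat, df)` gives the same tail probability for any df.

```python
import math


def chi2_sf_odd_df(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for X ~ chi-squared(df), odd df only.

    Closed form for df = 2k + 1:
        erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_{i<k} x**i / (1*3*...*(2i+1))
    """
    if df < 1 or df % 2 == 0:
        raise ValueError("this closed form is only valid for odd degrees of freedom")
    if x <= 0:
        return 1.0
    k = (df - 1) // 2
    term, series = 1.0, 0.0
    for i in range(k):
        series += term
        term *= x / (2 * i + 3)  # next term divides by the next odd number
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * series


def deviance_lrt(dev_reduced: float, dev_full: float, extra_params: int):
    """Likelihood-ratio test for nested Poisson GLMs (dispersion assumed to be 1).

    dev_reduced: deviance of the smaller model (B)
    dev_full:    deviance of the larger model (A)
    extra_params: number of parameters A adds to B (5 in this question)
    Returns (test statistic, approximate p-value).
    """
    stat = dev_reduced - dev_full
    return stat, chi2_sf_odd_df(stat, extra_params)
```

For example, `deviance_lrt(120.0, 100.0, 5)` (hypothetical deviances) returns a statistic of 20.0 and a p-value of roughly 0.001. Note that this only tests whether A fits better than B; it says nothing about how well either model fits in absolute terms, which is what the reviewer is asking about.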

If I were using OLS to fit my data, the reviewer would effectively be asking for the $R^2$ value of both the model with and the model without the 5 parameters, to indicate how much of the variance each model explains. That seems like a reasonable request to me. Let's say that, hypothetically, model B has an $R^2$ of 0.05 and model A has an $R^2$ of 0.25: even though that may be a statistically significant improvement, neither model does a good job of explaining the data. Alternatively, if model B has an $R^2$ of 0.5 and model A has an $R^2$ of 0.7, that could be interpreted very differently. I'm looking for the most appropriate measure that can be used in a similar way for my GLM.

  • Why wouldn't a BIC work or a test of the difference in the log-likelihoods, particularly since one is a nested version of the other? – Mike Hunter Nov 06 '15 at 13:13
  • This is a bit late for my purposes (the paper was published online this past Wednesday), but for the record: I am using the difference in the log-likelihoods as the primary measure, but a reviewer wanted a measure of "explained variance", so in the interest of appeasing the reviewers, I tried to come up with something. What I ended up with was something like what nukimov suggested below. – Benjamin Kraus Nov 07 '15 at 02:19



McCullagh and Nelder (1989, p. 34) give the deviance $D$ for the Poisson distribution as:

$$ D = 2 \sum\left(y \log\left(\frac{y}{\mu} \right) - (y-\mu)\right) $$

where $y$ represents your data and $\mu$ your modelled output (with the convention that $y \log(y/\mu) = 0$ when $y = 0$). I use this function to estimate the explained deviance $ED$ of a Poisson GLM as follows:

$$ ED = 1 - \frac{D}{\text{total deviance}} $$

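where the total deviance is given by the same equation for $D$ but with the mean of $y$ (a single number, i.e., $\mathrm{mean}(y)$) in place of the array of modelled estimates $\mu$. This is the deviance of the null (intercept-only) Poisson model, whose fitted mean is $\mathrm{mean}(y)$.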

I do not know if this is 100% correct, but it seems logical to me and it behaves as you would expect an estimate of the explained deviance to behave (e.g., it gives 1 if you use $\mu = y$).
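A minimal pure-Python sketch of the deviance and explained-deviance formulas in this answer, assuming `y` holds non-negative counts and `mu` holds the fitted means from whatever GLM routine you use; the function names are hypothetical, not from any library. It treats $y\log(y/\mu)$ as 0 when $y = 0$, which is what lets $\mu = y$ give $ED = 1$ even when some counts are zero. The code uses $-(y-\mu)$, the standard sign of that term in the Poisson deviance; for a maximum-likelihood log-link fit that includes an intercept, and for the constant fit $\mu = \mathrm{mean}(y)$, $\sum(y-\mu) = 0$, so that term does not change either $D$ or the total deviance.

```python
import math
from typing import Sequence


def poisson_deviance(y: Sequence[float], mu: Sequence[float]) -> float:
    """D = 2 * sum(y * log(y / mu) - (y - mu)), with 0 * log(0 / mu) taken as 0."""
    if len(y) != len(mu):
        raise ValueError("y and mu must have the same length")
    total = 0.0
    for yi, mi in zip(y, mu):
        if yi < 0 or mi < 0:
            raise ValueError("counts and fitted means must be non-negative")
        if yi > 0:
            if mi == 0:
                raise ValueError("mu must be positive wherever y > 0")
            log_term = yi * math.log(yi / mi)
        else:
            log_term = 0.0  # convention: y * log(y / mu) = 0 when y = 0
        total += log_term - (yi - mi)
    return 2.0 * total


def explained_deviance(y: Sequence[float], mu: Sequence[float]) -> float:
    """ED = 1 - D / total deviance, where the total deviance uses mean(y) for every mu."""
    y_bar = sum(y) / len(y)
    total_dev = poisson_deviance(y, [y_bar] * len(y))
    if total_dev == 0:
        raise ValueError("total deviance is zero (all counts are equal)")
    return 1.0 - poisson_deviance(y, mu) / total_dev
```

For example, with `y = [0, 1, 3, 2, 5]`, `explained_deviance(y, y)` is 1 and `explained_deviance(y, [2.2] * 5)` is 0. For fitted means from a maximum-likelihood fit that includes an intercept, the value lies between 0 and 1, because the intercept-only model is nested in the fitted model.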

nukimov
  • I used the deviance function as the primary measure for the paper, using exactly the equation you provided above. However, a reviewer wanted a measure of "explained variance", so in the interest of appeasing the reviewers, I tried to come up with something. What I ended up with was: $$ \text{pseudo-}R^2_M = \frac{\ln(\Gamma_M) - \ln(\Gamma_{Null})}{\ln(\Gamma_{Sat}) - \ln(\Gamma_{Null})} $$ where $\ln(\Gamma_{Sat})$ is the log-likelihood of a saturated model, $\ln(\Gamma_{Null})$ is the log-likelihood of the null model, and $\ln(\Gamma_M)$ is the log-likelihood of the model in question. – Benjamin Kraus Nov 07 '15 at 02:47
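The pseudo-$R^2$ in this comment is algebraically the same quantity as the explained deviance $ED$ in the answer. With the dispersion fixed at 1 (as for the Poisson distribution), a model's deviance is twice its log-likelihood gap to the saturated model, and the total deviance is the deviance of the null (intercept-only) model, whose fitted mean is $\mathrm{mean}(y)$:

$$ D = 2\left(\ln(\Gamma_{Sat}) - \ln(\Gamma_M)\right), \qquad D_{\text{total}} = 2\left(\ln(\Gamma_{Sat}) - \ln(\Gamma_{Null})\right) $$

$$ ED = 1 - \frac{D}{D_{\text{total}}} = \frac{\left(\ln(\Gamma_{Sat}) - \ln(\Gamma_{Null})\right) - \left(\ln(\Gamma_{Sat}) - \ln(\Gamma_M)\right)}{\ln(\Gamma_{Sat}) - \ln(\Gamma_{Null})} = \frac{\ln(\Gamma_M) - \ln(\Gamma_{Null})}{\ln(\Gamma_{Sat}) - \ln(\Gamma_{Null})} $$

So the two approaches report the same number for a given model.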