
If I understand correctly, a quasi-Poisson regression assumes roughly that $$ \mbox{E}\left[y\left|x\right.\right] = \exp{\left(x^{\top}\beta\right)}, \quad \mbox{VAR}\left(y\left|x\right.\right) = \sigma^2 \exp{\left(x^{\top}\beta\right)}, $$ and estimates both $\beta$ and $\sigma^2$. (Poisson regression further assumes $\sigma^2=1$, rather than estimating it.)

I would like to test the variance assumption. That is, assuming $$ \mbox{E}\left[y\left|x\right.\right] = \exp{\left(x^{\top}\beta\right)}, \quad \mbox{VAR}\left(y\left|x\right.\right) = f\left(x\right) \exp{\left(x^{\top}\beta\right)}, $$ I would like to test the null hypothesis $$ H_0: f\left(x\right) = c,\,\,\mbox{for some}\,c. $$

The most widely used test, as given by Cameron & Trivedi (and implemented in `AER::dispersiontest`), seems to assume $$ \mbox{E}\left[y\left|x\right.\right] = \mu = \exp{\left(x^{\top}\beta\right)}, \quad \mbox{VAR}\left(y\left|x\right.\right) = \mu + \alpha g\left(\mu\right), $$ for some specified function $g\left(\cdot\right)$ (typically a linear or quadratic function), and tests the null hypothesis $$ H_0: \alpha = 0. $$ Is it possible to adapt this test for my purposes? Concretely, can I, through some creative misuse of `dispersiontest`, somehow assume $$ \mbox{VAR}\left(y\left|x\right.\right) = \alpha_1 \mu + \alpha_2 \mu^2, $$ and test $H_0: \alpha_2 = 0$ regardless of $\alpha_1$?

(I have tried to use `dispersiontest` on a `glm` object fit with the `quasipoisson` family, and got an error claiming that "only Poisson GLMs can be tested". So some extra fiddling will be required.)

steveo'america
  • Not seeing any obvious answers from [this question](https://stats.stackexchange.com/q/66586/143028). – steveo'america Jan 18 '19 at 19:54
  • Not seeing how to express this as a conditional moment test either. – steveo'america Jan 18 '19 at 22:40
  • Your question is somewhat confusing to me. "Overdispersion" is a term for the situation where the variance exceeds the mean (since for the ordinary Poisson, the two are equal). If you start with a quasi-Poisson model with the dispersion parameter free, your distribution under the null will be overdispersed any time the data are. What are you really trying to find out? – Glen_b Jan 19 '19 at 00:03
  • I take 'overdispersion' to mean "more variance than suggested by the model" (or, in fancy talk, "unobserved heterogeneity"). In a quasi-Poisson regression, one assumes that variance is proportional to the mean, rather than merely equal to it. You could arrive at such a situation by taking conditional Poisson dependent variables (_e.g._ raw counts) and changing their units (_e.g._ five dollars per count). My question is whether my data have even _more_ variance than a simple redenomination captures. – steveo'america Jan 22 '19 at 17:35
  • If it's the same scaling of variance to each point then clearly not, any amount of it would be captured by the parameter. Are you instead asking about some heterogeneity of variance not captured by the model? – Glen_b Jan 22 '19 at 21:35
  • I am not really sure what is being asked. However, here are two cases. Suppose that, conditional on $x_i$, $y_i$ is 5 times a Poisson random variable with intensity $e^{x_i^{\top}\beta}$. Then the population is Quasi Poisson. (Change the 5 to a 1, and it is Poisson.) The second case is, conditional on $x_i$, $y_i$ is log-normal with $\mu = \log(5) + \frac{1}{2}x_i^{\top}\beta$ and $\sigma^2 = x_i^{\top}\beta$. Now conditional on $x_i$, $y_i$ has the same expectation in both cases (unless I messed up), but the second case cannot be made Quasi Poisson. – steveo'america Jan 23 '19 at 00:13
  • Well, clearly a lognormal cannot be made Poisson, but ignoring that aspect, your lognormal model specifies a very different variance function than one proportional to the mean. That's not overdispersion (if you fit a quasi-Poisson, many of the true variances will be *smaller* than the quasi-Poisson predicts). The form of the variance function is misspecified, rather than its raw size. Your example is pretty much exactly the kind of thing I was talking about (if I were to give an example previously, I'd have specified the simpler case where $\sigma^2$ is constant, which would have var $\propto$ mean). – Glen_b Jan 23 '19 at 00:18
  • ah, yes, well that is correct: fitting the $\sigma^2$ means some observations will have _smaller_ than the modeled variance under my model, and so this is not 'overdispersion' _per se_, but more like 'overdispersion adjacent'. Nonetheless, I doubt there is anyone who knows how to answer the question but was so offended by the tag they passed. – steveo'america Jan 23 '19 at 19:25

1 Answer


The `testDispersion` function in the DHARMa package (disclaimer: I am the developer) allows you to test the dispersion of nearly any GLMM distribution. There are a number of other functions and tests to check for related problems, e.g. heteroscedasticity or other dispersion issues. H0 is always the fitted model, and DHARMa proceeds by simulating new data from the fitted model and comparing them to the observed data.
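As a minimal sketch of that workflow, assuming the DHARMa package is installed: the simulated data, the seed, and the variable names below are made up for illustration, and the counts are deliberately drawn from a negative binomial so that the Poisson fit is overdispersed.

```r
library(DHARMa)

set.seed(1)
n <- 500
x <- runif(n)
# Illustrative data only: negative binomial counts with mean exp(0.5 + x),
# so Var(y | x) = mu + mu^2 / 2 exceeds the Poisson variance.
y <- rnbinom(n, mu = exp(0.5 + x), size = 2)

# The fitted model plays the role of H0.
fit <- glm(y ~ x, family = poisson)

# Simulate new data from the fitted model and compute scaled residuals.
res <- simulateResiduals(fittedModel = fit, n = 250)

# Compare the dispersion of the observed data with the simulated data.
disp <- testDispersion(res, plot = FALSE)
print(disp)
```

The returned object is a standard `htest`, so `disp$p.value` can be used programmatically.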

As Glen_b noted, because the dispersion parameter itself is already fit by the quasi-model, the model can't be overdispersed by definition, but it is conceivable that you have problems such as heteroscedasticity with a quasi-Poisson. The issue with the quasi families is that they are not fully defined distributions. They fit a relationship between mean and dispersion, but it is left open what the data-generating model actually looks like. Quasi-distributions are therefore not supported by DHARMa.
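The point about quasi families not being full distributions can be seen in base R itself. The following is only an illustration with made-up data: the `poisson()` family object carries a `simulate` component, while `quasipoisson()` does not, so there is nothing to draw new data from.

```r
# A fully specified family provides a simulate() method; a quasi family does not.
has_sim_poisson <- !is.null(poisson()$simulate)
has_sim_quasi   <- !is.null(quasipoisson()$simulate)

set.seed(1)
x <- runif(100)
y <- rpois(100, exp(0.5 + x))   # illustrative data
fit_q <- glm(y ~ x, family = quasipoisson)

# Attempting to simulate from the quasi fit; the error is caught rather than thrown.
sim_q <- tryCatch(simulate(fit_q, nsim = 1), error = function(e) e)
sim_failed <- inherits(sim_q, "error")

print(c(poisson = has_sim_poisson, quasipoisson = has_sim_quasi))
print(sim_failed)
```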

How to proceed depends a bit on whether your question is motivated by a theoretical interest, or a practical need.

  1. From a practical side: just don't fit quasi distributions; fit a negative binomial, a GLMM with observation-level REs, or one of the many other options. These models can be checked with DHARMa, and if you find a problem, you can address it more easily (e.g. with glmmTMB, which allows you to express the dispersion as a function of the predictors).

  2. From a theoretical side: I guess in principle you could try to create a data-generating function with the appropriate properties for your quasi distribution, sample from it, and then check with DHARMa. Or you could think hard and try to develop a parametric test for heteroscedasticity in quasi-distributions. However, I really don't see why you would want to do this; given option 1, there is no practical need.
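For the second option, one possible starting point (a rough sketch, not an established procedure and not something `dispersiontest` or DHARMa provides) is a regression-based check in the spirit of Cameron & Trivedi. If $\mbox{VAR}\left(y\left|x\right.\right) = \alpha_1 \mu + \alpha_2 \mu^2$, then $$ \mbox{E}\left[\left.\frac{\left(y-\mu\right)^2}{\mu}\right|x\right] = \alpha_1 + \alpha_2 \mu, $$ so one can regress $(y-\hat{\mu})^2/\hat{\mu}$ on $\hat{\mu}$ and look at the slope. The function name `variance_shape_test`, the simulated data, and the choice of a heteroscedasticity-robust (HC0) standard error are all assumptions made for this illustration; the sketch also ignores the estimation error in $\hat{\mu}$.

```r
# Hypothetical helper: auxiliary regression of (y - mu_hat)^2 / mu_hat on mu_hat.
variance_shape_test <- function(y, x) {
  fit <- glm(y ~ x, family = quasipoisson)
  mu_hat <- fitted(fit)
  z <- (y - mu_hat)^2 / mu_hat

  # OLS by hand, with an HC0 sandwich covariance, because z is itself heteroscedastic.
  X <- cbind(1, mu_hat)
  XtX_inv <- solve(crossprod(X))
  b <- XtX_inv %*% crossprod(X, z)
  u <- as.vector(z - X %*% b)
  V <- XtX_inv %*% crossprod(X * u) %*% XtX_inv

  stat <- b[2] / sqrt(V[2, 2])
  c(alpha1 = b[1], alpha2 = b[2], z_stat = stat, p_value = 2 * pnorm(-abs(stat)))
}

set.seed(42)
n <- 5000
x <- rnorm(n)
mu <- exp(1 + 0.5 * x)

# Case A: five times a Poisson, so Var(y | x) = 5 * mu (alpha2 = 0).
y_a <- 5 * rpois(n, mu / 5)
# Case B: negative binomial, so Var(y | x) = mu + mu^2 / 2 (alpha2 = 0.5).
y_b <- rnbinom(n, mu = mu, size = 2)

res_a <- variance_shape_test(y_a, x)
res_b <- variance_shape_test(y_b, x)
print(round(rbind(A = res_a, B = res_b), 4))
```

Case A mirrors the "five dollars per count" rescaling from the comments, where the quasi-Poisson variance assumption holds; case B has a quadratic variance term that a single dispersion parameter cannot absorb. The finite-sample behaviour of such a check would need to be verified by simulation before relying on it.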

Florian Hartig
  • +1 for `DHARMa`, which we had been using anyway. I think I am finally accepting the consequences of 'quasi' in 'quasi Poisson'. For our problem we were not observing counts directly, rather something like `round(0.01 * counts)`, but the 0.01 was slightly variable and unknown, and the expected counts were very large. As a consequence, a rescaled Poisson would have been a good model, but our data were essentially continuous. In the end we gave up and pursued another model. – steveo'america Feb 27 '19 at 22:22