I have a count data set that I intend to use as the response in a regression. The obvious choice is Poisson regression. One of its assumptions is that the variance equals the mean. So my questions are:

  1. Do we compare the mean and variance of the observed response, or of the estimated response from our regression? Does the difference between the mean and the variance follow some theoretical distribution, so that we can formally test it?

  2. What are the standard "scores" used to measure the performance of a Poisson regression?

kjetil b halvorsen
ChuckP

2 Answers

3

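The Poisson assumption is conditional on the covariates. We can write the Poisson regression model as $$ \DeclareMathOperator{\E}{\mathbb{E}} X_i \sim \mathcal{P}(\lambda_i), \qquad \log \lambda_i = x_i^T \beta $$ where $X_1, \dotsc, X_n$ are independent random variables, and $\E X_i = \operatorname{Var} X_i = \lambda_i$. The equality of mean and variance therefore holds for each observation given its covariates $x_i$, not for the pooled sample.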

We do not assume anything about the marginal distribution of the $X_i$'s, so the assumptions cannot be tested prior to modeling. Even when the model holds exactly, the pooled responses will usually have a variance larger than their mean, simply because the $\lambda_i$ differ between observations. You must first formulate your Poisson regression model, estimate it, and then investigate the assumptions from the fitted model. See "Interpreting residual diagnostic plots for glm models?" and the links therein for some more information.
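To illustrate the point, here is a minimal simulation sketch using only the Python standard library. Everything in it is an assumption made for illustration: the coefficients (0.5 and 1.0), the single uniform covariate, the sample size, the seed, and the helper names `rpois` and `fit_poisson`. It draws counts from a correctly specified Poisson regression, compares the raw mean and variance of the response, and then computes the Pearson dispersion statistic from the fitted model.

```python
import math
import random
from statistics import mean, variance

def rpois(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k

def fit_poisson(x, y, iterations=25):
    """Fisher scoring for log(lambda_i) = b0 + b1 * x_i (one covariate)."""
    b0, b1 = math.log(mean(y)), 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the log-likelihood).
        s0 = sum(yi - mi for yi, mi in zip(y, mu))
        s1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Fisher information matrix [[i00, i01], [i01, i11]].
        i00 = sum(mu)
        i01 = sum(xi * mi for xi, mi in zip(x, mu))
        i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        # Update: beta += information^{-1} @ score.
        b0 += (i11 * s0 - i01 * s1) / det
        b1 += (i00 * s1 - i01 * s0) / det
    return b0, b1

rng = random.Random(1)
n = 2000
x = [rng.uniform(0.0, 2.0) for _ in range(n)]
y = [rpois(math.exp(0.5 + 1.0 * xi), rng) for xi in x]  # true model is Poisson

# Marginal check: misleading, because the lambda_i vary with x.
marginal_ratio = variance(y) / mean(y)

# Conditional check: Pearson chi-square divided by residual degrees of freedom.
b0_hat, b1_hat = fit_poisson(x, y)
mu_hat = [math.exp(b0_hat + b1_hat * xi) for xi in x]
pearson = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu_hat))
dispersion = pearson / (n - 2)

print("marginal variance / mean:", round(marginal_ratio, 2))
print("Pearson dispersion after fitting:", round(dispersion, 2))
```

Under these assumptions the raw variance-to-mean ratio should come out well above 1 although the data are exactly Poisson given `x`, while the Pearson dispersion should be close to 1. The second quantity is the one to examine; values far above 1 would suggest overdispersion relative to the fitted model.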

kjetil b halvorsen
0
  1. If the data can be grouped nicely (for example, several observations sharing the same covariate values), then you should compare the mean and the variance of the observed outcomes within each group.

  2. The deviance is probably the best measure for this. When computing the deviance goodness-of-fit test, you compare your model to a model that perfectly predicts every outcome. This is called the saturated model, and it is the most complex model possible for the data you have. If you propose a simpler model and the difference in deviance between the saturated model and the simpler model is small enough, you should prefer the simpler model. This is the rationale behind the deviance goodness-of-fit test.

    If you reject the null hypothesis of this test (that your model fits adequately), it means your model does not explain the data as well as the saturated model.
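The deviance test described above can be sketched as follows, using only the Python standard library. Several things here are assumptions for illustration, not part of the answer: the simulated data, the seed, the helper names, the use of an intercept-only model (chosen because its fitted mean is just the sample mean, so no iterative fitting is needed), and the Wilson-Hilferty normal approximation that stands in for an exact chi-square tail probability.

```python
import math
import random

def rpois(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k

def poisson_deviance(y, mu):
    """Twice the log-likelihood gap between the saturated and fitted model."""
    total = 0.0
    for yi, mi in zip(y, mu):
        # The y * log(y / mu) term is taken as 0 when y == 0.
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += term - (yi - mi)
    return 2.0 * total

def chi2_upper_tail(stat, df):
    """Approximate P(chi2_df > stat) with the Wilson-Hilferty transform."""
    scale = 2.0 / (9.0 * df)
    z = ((stat / df) ** (1.0 / 3.0) - (1.0 - scale)) / math.sqrt(scale)
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # upper tail of N(0, 1)

def intercept_only_gof(y):
    """Deviance goodness-of-fit test for the model lambda_i = constant."""
    mu_hat = sum(y) / len(y)  # MLE of the common mean
    dev = poisson_deviance(y, [mu_hat] * len(y))
    df = len(y) - 1           # observations minus fitted parameters
    return dev, df, chi2_upper_tail(dev, df)

rng = random.Random(7)
n = 400
# Case A: the intercept-only model is correct (constant mean of 10).
y_a = [rpois(10.0, rng) for _ in range(n)]
# Case B: the mean depends on an omitted covariate x.
x_b = [rng.uniform(0.0, 1.0) for _ in range(n)]
y_b = [rpois(math.exp(1.0 + 1.5 * xi), rng) for xi in x_b]

dev_a, df_a, p_a = intercept_only_gof(y_a)
dev_b, df_b, p_b = intercept_only_gof(y_b)
print("A: deviance", round(dev_a, 1), "on", df_a, "df, p =", p_a)
print("B: deviance", round(dev_b, 1), "on", df_b, "df, p =", p_b)
```

In case A the deviance should be roughly equal to its degrees of freedom, while in case B it should be far larger, giving a very small p-value. Note that the chi-square reference distribution for the deviance is itself only an approximation, and it tends to be unreliable when the expected counts are small.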

Demetri Pananos