Short answer
Overdispersion does not matter when estimating a vector of regression coefficients for the conditional mean in a quasi/poisson model! You will be fine if you forget about the overdispersion here, use glmnet with the poisson family and just focus on whether your cross-validated prediction error is low.
The Qualification follows below.
Poisson, Quasi-Poisson and estimating functions:
I say the above because overdispersion (OD) in a poisson or quasi-poisson model influences anything to do with the dispersion (or variance or scale or heterogeneity or spread or whatever you want to call it) and as such has an effect on the standard errors and confidence intervals but leaves the estimates for the conditional mean of $y$ (called $\mu$) untouched. This particularly applies to linear decompositions of the mean, like $x^\top\beta$.
This comes from the fact that the estimating equations for the coefficients of the conditional mean are practically the same for both poisson and quasi-poisson models. Quasi-poisson specifies the variance function in terms of the mean and an additional parameter (say $\theta$) as $Var(y)=\theta\mu$ (with for Poisson $\theta$=1), but the $\theta$ does not turn out to be relevant when optimizing the estimating equation. Thus the $\theta$ plays no role in estimating the $\beta$ when conditional mean and variance are proportional. Therefore the point estimates $\hat{\beta}$ are identical for the quasi- and poisson models!
Let me illustrate with an example (notice that one needs to scroll to see the whole code and output) :
> library(MASS)
> data(quine)
> modp <- glm(Days~Age+Sex+Eth+Lrn, data=quine, family="poisson")
> modqp <- glm(Days~Age+Sex+Eth+Lrn, data=quine, family="quasipoisson")
> summary(modp)
Call:
glm(formula = Days ~ Age + Sex + Eth + Lrn, family = "poisson",
data = quine)
Deviance Residuals:
Min 1Q Median 3Q Max
-6.808 -3.065 -1.119 1.819 9.909
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.71538 0.06468 41.980 < 2e-16 ***
AgeF1 -0.33390 0.07009 -4.764 1.90e-06 ***
AgeF2 0.25783 0.06242 4.131 3.62e-05 ***
AgeF3 0.42769 0.06769 6.319 2.64e-10 ***
SexM 0.16160 0.04253 3.799 0.000145 ***
EthN -0.53360 0.04188 -12.740 < 2e-16 ***
LrnSL 0.34894 0.05204 6.705 2.02e-11 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 2073.5 on 145 degrees of freedom
Residual deviance: 1696.7 on 139 degrees of freedom
AIC: 2299.2
Number of Fisher Scoring iterations: 5
> summary(modqp)
Call:
glm(formula = Days ~ Age + Sex + Eth + Lrn, family = "quasipoisson",
data = quine)
Deviance Residuals:
Min 1Q Median 3Q Max
-6.808 -3.065 -1.119 1.819 9.909
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.7154 0.2347 11.569 < 2e-16 ***
AgeF1 -0.3339 0.2543 -1.313 0.191413
AgeF2 0.2578 0.2265 1.138 0.256938
AgeF3 0.4277 0.2456 1.741 0.083831 .
SexM 0.1616 0.1543 1.047 0.296914
EthN -0.5336 0.1520 -3.511 0.000602 ***
LrnSL 0.3489 0.1888 1.848 0.066760 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for quasipoisson family taken to be 13.16691)
Null deviance: 2073.5 on 145 degrees of freedom
Residual deviance: 1696.7 on 139 degrees of freedom
AIC: NA
Number of Fisher Scoring iterations: 5
As you can see even though we have strong overdispersion of 12.21 in this data set (by deviance(modp)/modp$df.residual
) the regression coefficients (point estimates) do not change at all. But notice how the standard errors change.
The question of the effect of overdispersion in penalized poisson models
Penalized models are mostly used for prediction and variable selection and not (yet) for inference. So people who use these models are interested in the regression parameters for the conditional mean, just shrunk towards zero. If the penalization is the same, the estimating equations for the conditional means derived from the penalized (quasi-)likelihood does also not depend on $\theta$ and therefore overdispersion does not matter for the estimates of $\beta$ in a model of the type:
$g(\mu)=x^\top\beta + f(\beta)$
as $\beta$ is estimated the same way for any variance function of the form $\theta \mu$, so again for all models where conditional mean and variance are proportional. This is just like in unpenalized poisson/quasipoisson models.
If you don't want to take this at face value and avoid the math, you can find empirical support in the fact that in glmnet
, if you set the regularization parameter towards 0 (and thus $f(\beta)=0$) you end up pretty much where the poisson and quasipoisson models land (see the last column below where lambda is 0.005).
> library(glmnet)
> y <- quine[,5]
> x <- model.matrix(~Age+Sex+Eth+Lrn,quine)
> modl <- glmnet(y=y,x=x, lambda=c(0.05,0.02,0.01,0.005), family="poisson")
> coefficients(modl)
8 x 4 sparse Matrix of class "dgCMatrix"
s0 s1 s2 s3
(Intercept) 2.7320435 2.7221245 2.7188884 2.7172098
(Intercept) . . . .
AgeF1 -0.3325689 -0.3335226 -0.3339580 -0.3340520
AgeF2 0.2496120 0.2544253 0.2559408 0.2567880
AgeF3 0.4079635 0.4197509 0.4236024 0.4255759
SexM 0.1530040 0.1581563 0.1598595 0.1607162
EthN -0.5275619 -0.5311830 -0.5323936 -0.5329969
LrnSL 0.3336885 0.3428815 0.3459650 0.3474745
So what does OD do to penalized regression models? As you may know, there is still some debate about the proper way to calculate standard errors for penalized models (see e.g., here ) and glmnet
is not outputting any anyway, probably for that reason. It may very well be that the OD would influence the inference part of the model, just as it does in the non-penalized case but unless some consensus regarding inference in this case is reached, we won't know.
As an aside, one can leave all this messiness behind if one is willing to adopt a Bayesian view where penalized models are just standard models with a specific prior.