I would like to carry out inference on a binomial LASSO model, but take into account the fact that my data are overdispersed and use the quasibinomial family instead.
The R package selectiveInference, which does inference for LASSO models, only seems to support the binomial family, though, and not quasibinomial.
To get around this, I was wondering whether it would be correct to adjust the z scores and p values returned by fixedLassoInf (called with family="binomial") for overdispersion, by dividing the z scores by the square root of the estimated dispersion coefficient of a quasibinomial GLM fitted with the selected variables included (or perhaps with all variables included)?
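Written out, the adjustment I have in mind is the following. The notation is my own and not taken from the package: $y_i$ is the observed proportion for observation $i$, $w_i$ its number of trials, $\hat\mu_i$ the fitted probability from the quasibinomial GLM, $n$ the number of observations, $k$ the number of estimated coefficients, and $z_j$ the z score reported for variable $j$.

$$
\hat\phi = \frac{1}{n-k}\sum_{i=1}^{n} \frac{w_i\,(y_i-\hat\mu_i)^2}{\hat\mu_i\,(1-\hat\mu_i)},
\qquad
z_j^{\text{adj}} = \frac{z_j}{\sqrt{\hat\phi}}
$$

Here $\hat\phi$ is the usual Pearson-based dispersion estimate, so $\hat\phi > 1$ shrinks every z score towards zero. Whether p values recomputed from $z_j^{\text{adj}}$ remain valid after selection is exactly the part I am unsure about.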
Would this be a correct procedure? If it is, how should I then recalculate or adjust the returned confidence intervals?
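To show where the idea comes from, here is a minimal sketch on simulated data (not my real data) of how the dispersion could be estimated and applied in an ordinary, non-selective GLM. The grouped binomial responses, the beta-binomial overdispersion and the selected set `sel` are all assumptions made up for illustration. In this ordinary setting, dividing the binomial z values by the square root of the dispersion reproduces the quasibinomial t values; whether the same rescaling is legitimate for the post-selection z scores is my actual question.

```r
set.seed(1)
n <- 200; p <- 8; m <- 20              # observations, predictors, trials per observation
x <- matrix(rnorm(n * p), n, p)
eta <- 0.5 * x[, 2] - 0.5 * x[, 3]     # true linear predictor (assumed)

# Beta-binomial responses: the success probability varies around plogis(eta),
# which makes the counts overdispersed relative to a plain binomial.
prob <- rbeta(n, 5 * plogis(eta), 5 * (1 - plogis(eta)))
succ <- rbinom(n, m, prob)
y <- cbind(succ, m - succ)             # successes and failures

sel <- c(2, 3, 5)                      # hypothetical set of LASSO-selected variables

# Same mean model fitted with and without a dispersion parameter
fit_b  <- glm(y ~ x[, sel], family = binomial)
fit_qb <- glm(y ~ x[, sel], family = quasibinomial)

phi   <- summary(fit_qb)$dispersion    # Pearson chi-square / residual df
z_b   <- coef(summary(fit_b))[, "z value"]
z_adj <- z_b / sqrt(phi)               # proposed adjustment
t_qb  <- coef(summary(fit_qb))[, "t value"]

phi
all.equal(z_adj, t_qb)                 # the two agree in the ordinary GLM case
```

For an ordinary Wald interval the same factor would widen the half-width by sqrt(phi), but the intervals from fixedLassoInf are not plain Wald intervals, so I do not know whether that carries over.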
[BTW, the package hdi, which has a similar aim, also doesn't support quasibinomial, and I also couldn't readily see how that package could be interfaced with the package glmmLasso. If that were possible, overdispersion could perhaps be taken into account using an observation-level random effect; if anyone knows how to do this, please let me know too.]
The output I currently get for my data is:
fixedLassoInf(x, y, beta, lambda, family = "binomial",
              intercept = TRUE, alpha = 0.1, type = "partial")
#  Var    Coef  Z-score  P-value  LowConfPt  UpConfPt  LowTailArea  UpTailArea
#    2   2.596   10.710        0      2.194     2.995        0.048       0.050
#    3   1.224   16.400        0      1.101     1.348        0.049       0.050
#    5   2.608   17.219        0      2.356     2.857        0.049       0.050
#    7   0.776   10.588        0      0.655     0.897        0.048       0.050
#    8  -1.857   -5.103        0      1.229     2.462        0.050       0.048