I am having difficulty specifying the appropriate nested/random-effects structure for a mixed model that I am trying to fit with the Lasso shrinkage algorithm. I am using the package glmmLasso.
My data consist of disease incidence over 10 years in 11 distinct districts of a state in India. I would like to specify District as a random factor, with Year nested within each district. The fixed effects are an array of environmental predictors measured over the course of the 10-year study period. I would like to use the Lasso algorithm to identify the strongest fixed-effect predictor variables.
An example could be:
lm1 = glmmLasso(log(disease)~var1 + ... + varn, rnd=?, data=district_disease)
In package nlme, I used random = District|Year. To the best of my current knowledge, that model is specified appropriately. It is a linear mixed model, since the log of the response variable is approximately normally distributed.
My two goals are to correctly specify the random structure in this model and to reduce the number of fixed effects to a set of non-collinear, meaningful variables.