Mixed effects Lasso model setup in R, for high dimensional data

Question

My goal is to model the relationship between RETURN and SCORE from my survey dataset with the following structure:

RETURN (numeric continuous) = company share price performance
SCORE (numeric continuous) = company score collected via survey
PARTICIPATION (binary) = 1 if the company participated / 0 if the score was estimated
SIZE (numeric continuous) = company size
COUNTRY (categorical factor 40 levels) = country of company
INDUSTRY (categorical factor 20 levels) = industry of company
COMPANY_ID (categorical factor 400 levels) = company identifier
YEAR (categorical factor 10 levels) = year of survey

By design, the survey scores are biased upward for companies that participated (PARTICIPATION = 1) and for larger companies (higher SIZE).

Both RETURN and SCORE vary by COUNTRY, INDUSTRY, COMPANY_ID (companies are surveyed repeatedly across years) and YEAR (the scoring methodology is adapted each year).

Not all companies are surveyed every year, so the total number of observations is ~2500.

To model the relationship between RETURN and SCORE I therefore need to control for the effects of the other independent variables. Because of the high dimensionality, I'd like to use a regularized regression approach, e.g. LASSO. To build up the model step by step, I started with a multiple regression:

mod1 <- lm(RETURN ~ SCORE + SIZE, data = data)

Then added dummy variables for PARTICIPATION, COUNTRY, INDUSTRY and YEAR using LASSO from the glmnet package:

library(glmnet)
mod2 <- glmnet(x, y, alpha = 1)

Here x has dimensions 2500 x 70. I can then use cross-validation to obtain the value of lambda that minimizes the MSE:

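cvmod <- cv.glmnet(x, y, alpha = 1)          # 10-fold cross-validation by default
cvoptm <- cvmod$lambda.min                    # lambda with the minimum CV error
lcoef <- as.matrix(coef(mod2, s = cvoptm))    # coefficients at that lambda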
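For readers who want to reproduce the setup, here is a minimal sketch of how x and y might be built and passed through the steps above. The data frame `dat` and all of its values are simulated stand-ins, not the actual survey data; only the column names and level counts follow the description in the question.

```r
library(glmnet)

set.seed(1)
n <- 2500

# Simulated stand-in for the survey data (all values are made up)
dat <- data.frame(
  SCORE         = rnorm(n),
  SIZE          = rnorm(n),
  PARTICIPATION = rbinom(n, 1, 0.5),
  COUNTRY       = factor(sample(1:40, n, replace = TRUE), levels = 1:40),
  INDUSTRY      = factor(sample(1:20, n, replace = TRUE), levels = 1:20),
  YEAR          = factor(sample(1:10, n, replace = TRUE), levels = 1:10)
)
dat$RETURN <- 0.3 * dat$SCORE + rnorm(n)

# model.matrix() expands each factor into treatment-coded dummies;
# the intercept column is dropped because glmnet fits its own intercept
x <- model.matrix(
  RETURN ~ SCORE + SIZE + PARTICIPATION + COUNTRY + INDUSTRY + YEAR,
  data = dat
)[, -1]
y <- dat$RETURN

dim(x)  # 2500 x 70: 2 numeric + 1 binary + 39 + 19 + 9 dummies

# Cross-validated LASSO and the coefficients at the CV-optimal lambda
cvmod <- cv.glmnet(x, y, alpha = 1)
lcoef <- as.matrix(coef(cvmod, s = "lambda.min"))
```

Note that with treatment coding the reference level of each factor is absorbed into the intercept, so the penalty shrinks differences from that reference level rather than from an overall mean.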

How can I include the variable COMPANY_ID in the model? Surely it's not feasible to add it as a dummy variable? Could I include it as a random effect using the glmmLasso package? Further, shouldn't both COUNTRY and INDUSTRY also be treated as random effects in that case?
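To make the random-effect idea concrete, here is a rough sketch of what a glmmLasso call with a random intercept per company might look like. It runs on a small simulated panel (50 companies, 5 years), not the real data, and the value of `lambda` is an arbitrary placeholder that would need tuning, for example over a grid.

```r
library(glmmLasso)

set.seed(1)
n_comp <- 50
n_year <- 5

# Simulated balanced panel (all values are made up)
dat <- expand.grid(COMPANY_ID = factor(1:n_comp), YEAR = factor(1:n_year))
dat <- dat[order(dat$COMPANY_ID), ]
dat$SCORE <- rnorm(nrow(dat))
dat$SIZE  <- rnorm(nrow(dat))

# True company-level random intercepts used to generate RETURN
u <- rnorm(n_comp, sd = 0.5)
dat$RETURN <- 0.3 * dat$SCORE + u[as.integer(dat$COMPANY_ID)] + rnorm(nrow(dat))

# Fixed effects are L1-penalized; COMPANY_ID enters as a random intercept.
# lambda = 10 is a placeholder, not a tuned value.
fit <- glmmLasso(
  fix    = RETURN ~ SCORE + SIZE + as.factor(YEAR),
  rnd    = list(COMPANY_ID = ~1),
  data   = dat,
  lambda = 10,
  family = gaussian(link = "identity")
)

summary(fit)
```

COUNTRY and INDUSTRY could in principle be added to the `rnd` list in the same way, assuming the package handles several grouping factors for the data at hand; that is not checked here.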

It's definitely feasible, and standard, to use high-cardinality dummy variables with glmnet. You just have to provide them as a sparse matrix. I am not familiar with glmmLasso, but I would have thought it would also set them up as dummy variables (behind the scenes). - seanv507, Apr 04 '20 at 17:10
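A minimal sketch of the sparse-matrix route suggested in this comment, assuming the Matrix package is used to build the design matrix. The data are simulated and only a subset of the covariates is included to keep the example short.

```r
library(Matrix)
library(glmnet)

set.seed(1)
n <- 2500

# Simulated stand-in data (all values are made up); each company appears 6 or 7 times
dat <- data.frame(
  SCORE      = rnorm(n),
  SIZE       = rnorm(n),
  COMPANY_ID = factor(rep(1:400, length.out = n), levels = 1:400)
)
dat$RETURN <- 0.3 * dat$SCORE + rnorm(n)

# sparse.model.matrix() stores only the non-zero entries of the dummy columns;
# the intercept column is dropped because glmnet fits its own intercept
x_sparse <- sparse.model.matrix(RETURN ~ SCORE + SIZE + COMPANY_ID, data = dat)[, -1]
dim(x_sparse)  # 2500 x 401: 2 numeric columns + 399 company dummies

# glmnet and cv.glmnet accept sparse matrices directly
cvmod <- cv.glmnet(x_sparse, dat$RETURN, alpha = 1)
lcoef <- coef(cvmod, s = "lambda.min")
```

The other factors (COUNTRY, INDUSTRY, YEAR, PARTICIPATION) would be added to the formula in the same way.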
