I am running a lasso regression to identify, from a list of about 65 variables, the ones that most influence an individual's liquor consumption.
The independent variables are a combination of categorical and numeric variables, such as State, Education, Sex, Age, and income.
I used the glmnet package and chose lambda by cross-validation:
library(glmnet)
fit <- glmnet(x, y, alpha = 1, lambda = 0.072, thresh = 1e-12)
The lasso returned 25 variables with nonzero coefficients; the coefficients of all the others are exactly 0.
The beta values are shown below:
fit$beta
State -0.350
Education -0.254
Age 0.175
Sex .
... ....
Education is a categorical variable with 5 levels: No school, High school, Graduate, Masters, Doctorate. In linear regression I would get 4 beta estimates, one for each non-reference level, with the remaining level used as the reference. The lasso, however, gives only a single beta for Education. I am not able to interpret this beta for a categorical (factor) variable.
- How should I interpret these lasso coefficients and their signs?
- For a numeric variable like Age, is the coefficient interpreted the same way as in linear regression?
I found some clues in "Categorical variables in LASSO regression", but I am not sure how to relate them to the betas I got here.
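
To make the single-coefficient puzzle for Education concrete, here is a minimal base-R sketch. It assumes (this is not stated above) that the x matrix was built with something like data.matrix(), which turns each factor into its integer level codes, and it contrasts that with the dummy expansion from model.matrix(). The toy data frame and its values are made up purely for illustration.

```r
# Hypothetical toy data: column names mirror the question, values are invented.
edu_levels <- c("No school", "High school", "Graduate", "Masters", "Doctorate")
df <- data.frame(
  Education = factor(rep(edu_levels, times = 2), levels = edu_levels),
  Age = c(25, 31, 38, 44, 52, 29, 35, 41, 47, 60)
)

# data.matrix() replaces each factor by its integer level code (1..5),
# so Education occupies a single numeric column.
x_codes <- data.matrix(df)
print(colnames(x_codes))

# model.matrix() expands the factor into 4 indicator columns, using the first
# level ("No school") as the reference; [, -1] drops the intercept column.
x_dummy <- model.matrix(~ Education + Age, data = df)[, -1]
print(colnames(x_dummy))
```

If x looks like x_codes, glmnet sees Education as one numeric column, so it can only return one coefficient for it, and that coefficient would be a slope per one-step increase in the level code rather than a per-level effect. If x looks like x_dummy, each non-reference level gets its own coefficient, penalized separately, so the lasso may keep some levels and zero out others.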