In R, choosing `lambda.1se` over `lambda.min` to get a more parsimonious model is common. This post (and this one) also indicated that the authors of the glmnet package suggest using `lambda.1se`, and that if I don't supply an `s` to `coef` or `predict`, the default is basically `s = "lambda.1se"`. However, choosing a big $\lambda$ for ridge regression can be disadvantageous. I'd like to know why someone would still want to go with `lambda.1se` in ridge regression.
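For context, here is roughly how the two values arise with glmnet; this is a minimal sketch assuming a numeric predictor matrix `x` and response `y` (not my actual data):

```r
library(glmnet)

set.seed(1)
# alpha = 0 gives ridge regression; cv.glmnet chooses the lambda grid itself
cv.fit <- cv.glmnet(x, y, alpha = 0)

cv.fit$lambda.min  # lambda that minimizes the mean cross-validated error
cv.fit$lambda.1se  # largest lambda within one SE of that minimum

# With no s argument, coef()/predict() on a cv.glmnet object default to
# s = "lambda.1se"
coef(cv.fit)                      # same as coef(cv.fit, s = "lambda.1se")
coef(cv.fit, s = "lambda.min")
```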
I can only think of one scenario: when two variables are so highly correlated that a regression coefficient flips sign. A choice like `lambda.1se` may regularize the coefficients enough to recover the desired signs. For example, in the following instance `z_med_TNFa` and `z_med_MIP_1b` are highly correlated (`z_med_IL_1b` and `z_med_IL_7` are highly correlated too). Here, `lambda.min = 4.472952` and `lambda.1se = 28.75246`. Originally, `z_med_TNFa` was negatively associated with the outcome, and you can see how its coefficient flips sign in the presence of `z_med_MIP_1b` unless I use a high value of $\lambda$ (`lambda.1se`):
```r
> coef(ridge.mod.bestlam.6m.4.1, s = 4.472952)
12 x 1 sparse Matrix of class "dgCMatrix"
                              1
(Intercept)        16.79590822
Age                 0.02501949
Gender             -1.35264991
Years_of_education  1.26086448
GCS_Bestin24n       0.66208337
z_med_IL_1b        -0.37845407
z_med_IL_7         -0.36292357
z_med_TNFa          0.16241947
z_med_sIL_4R        1.42248787
z_med_sIL_6R       -1.52555050
z_med_MIP_1b       -0.77692434
z_med_RANTES       -0.38803093

> coef(ridge.mod.bestlam.6m.4.1, s = 28.75246)
12 x 1 sparse Matrix of class "dgCMatrix"
                             1
(Intercept)        30.7364011
Age                 0.0146922
Gender             -0.3717480
Years_of_education  0.4899730
GCS_Bestin24n       0.2759618
z_med_IL_1b        -0.2229485
z_med_IL_7         -0.2539682
z_med_TNFa         -0.1636407
z_med_sIL_4R        0.5352597
z_med_sIL_6R       -0.5709282
z_med_MIP_1b       -0.3299396
z_med_RANTES       -0.2721371
```
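To see where the sign flip happens, I can check the pairwise correlations and trace the coefficient path. A rough sketch, assuming `ridge.mod.bestlam.6m.4.1` is a `cv.glmnet` fit (for a plain `glmnet` fit the `$glmnet.fit` part would be dropped) and that `x` holds the predictors:

```r
# Pairwise correlations behind the suspected collinearity
cor(x[, c("z_med_TNFa", "z_med_MIP_1b", "z_med_IL_1b", "z_med_IL_7")])

# Coefficient paths over the lambda grid; the dashed lines mark lambda.min
# and lambda.1se from above (the x-axis is log(lambda))
plot(ridge.mod.bestlam.6m.4.1$glmnet.fit, xvar = "lambda", label = TRUE)
abline(v = log(c(4.472952, 28.75246)), lty = 2)
```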
But `lambda.1se` imposes heavy regularization, and all the coefficients are shrunk to a great extent. I'd like to make a $\beta$-weighted index using all the `z_*` variables. Do you think such an index will be usable, given that the coefficients are so heavily regularized? Is it better to leave either `z_med_TNFa` or `z_med_MIP_1b` out of the ridge regression and use `lambda.min` instead (which solves the sign-flip problem)? However, I'm not sure why the higher correlation between `z_med_IL_1b` and `z_med_IL_7` (compared with that between `z_med_TNFa` and `z_med_MIP_1b`) doesn't cause any sign-flip problem here!
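For the index itself, what I have in mind is just the linear combination of the z scores with their ridge coefficients at a chosen `s`. A sketch with a hypothetical helper (again assuming `x` is the predictor matrix containing the `z_*` columns):

```r
# Hypothetical helper: beta-weighted index from ridge coefficients at lambda = s
make_index <- function(fit, x, s) {
  b <- as.matrix(coef(fit, s = s))             # intercept + coefficients
  z_vars <- grep("^z_", rownames(b), value = TRUE)
  as.vector(x[, z_vars] %*% b[z_vars, 1])      # weighted sum of the z scores
}

index_min <- make_index(ridge.mod.bestlam.6m.4.1, x, s = 4.472952)
index_1se <- make_index(ridge.mod.bestlam.6m.4.1, x, s = 28.75246)
cor(index_min, index_1se)  # does the extra shrinkage reorder subjects or just rescale?
```

If the two indices turn out to be almost perfectly correlated, the heavier shrinkage under `lambda.1se` mostly rescales the index rather than changing the ranking, which is part of what I'm trying to judge.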
My questions:
- Should I choose `lambda.1se` over `lambda.min` in a ridge regression?
- Do you think that, in the example above, high regularization could be a problem? Biologically, some of these z scores can be highly correlated, but they may explain very different functions. So I'd like to avoid dropping any of them (or choosing one of two important variables at random) if there is an option.