In R, choosing `lambda.1se` over `lambda.min` to get a more parsimonious model is common. This post (and this one) also indicated that the authors of the glmnet package suggest using `lambda.1se`, and that if I don't supply an `s` to `coef` or `predict`, the default is basically `s = "lambda.1se"`. However, choosing a big $\lambda$ for ridge regression can be disadvantageous. I'd like to know why someone would still want to go with `lambda.1se` in ridge regression.
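For context, here is roughly how the two values arise with glmnet; this is a minimal sketch assuming a numeric predictor matrix `x` and response `y` (not my actual data):

```r
library(glmnet)

set.seed(1)
# alpha = 0 gives ridge regression; cv.glmnet chooses the lambda grid itself
cv.fit <- cv.glmnet(x, y, alpha = 0)

cv.fit$lambda.min  # lambda that minimizes the mean cross-validated error
cv.fit$lambda.1se  # largest lambda within one SE of that minimum

# With no s argument, coef()/predict() on a cv.glmnet object default to
# s = "lambda.1se"
coef(cv.fit)                      # same as coef(cv.fit, s = "lambda.1se")
coef(cv.fit, s = "lambda.min")
```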
I can only think of one scenario: when two variables are so highly correlated that a regression coefficient flips sign. A choice like `lambda.1se` may regularize the coefficients enough to recover the desired signs. For example, in the following instance `z_med_TNFa` and `z_med_MIP_1b` are highly correlated (`z_med_IL_1b` and `z_med_IL_7` are highly correlated too). Here, `lambda.min = 4.472952` and `lambda.1se = 28.75246`. Originally, `z_med_TNFa` was negatively associated with the outcome, and you can see how its coefficient flips sign in the presence of `z_med_MIP_1b` unless I use a high value of $\lambda$ (`lambda.1se`):
```r
> coef(ridge.mod.bestlam.6m.4.1, s = 4.472952)
12 x 1 sparse Matrix of class "dgCMatrix"
                              1
(Intercept)        16.79590822
Age                 0.02501949
Gender             -1.35264991
Years_of_education  1.26086448
GCS_Bestin24n       0.66208337
z_med_IL_1b        -0.37845407
z_med_IL_7         -0.36292357
z_med_TNFa          0.16241947
z_med_sIL_4R        1.42248787
z_med_sIL_6R       -1.52555050
z_med_MIP_1b       -0.77692434
z_med_RANTES       -0.38803093

> coef(ridge.mod.bestlam.6m.4.1, s = 28.75246)
12 x 1 sparse Matrix of class "dgCMatrix"
                             1
(Intercept)        30.7364011
Age                 0.0146922
Gender             -0.3717480
Years_of_education  0.4899730
GCS_Bestin24n       0.2759618
z_med_IL_1b        -0.2229485
z_med_IL_7         -0.2539682
z_med_TNFa         -0.1636407
z_med_sIL_4R        0.5352597
z_med_sIL_6R       -0.5709282
z_med_MIP_1b       -0.3299396
z_med_RANTES       -0.2721371
```
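To see where the sign flip happens, I can check the pairwise correlations and trace the coefficient path. A rough sketch, assuming `ridge.mod.bestlam.6m.4.1` is a `cv.glmnet` fit (for a plain `glmnet` fit the `$glmnet.fit` part would be dropped) and that `x` holds the predictors:

```r
# Pairwise correlations behind the suspected collinearity
cor(x[, c("z_med_TNFa", "z_med_MIP_1b", "z_med_IL_1b", "z_med_IL_7")])

# Coefficient paths over the lambda grid; the dashed lines mark lambda.min
# and lambda.1se from above (the x-axis is log(lambda))
plot(ridge.mod.bestlam.6m.4.1$glmnet.fit, xvar = "lambda", label = TRUE)
abline(v = log(c(4.472952, 28.75246)), lty = 2)
```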
But `lambda.1se` imposes heavy regularization, and all the coefficients are shrunk to a great extent. I'd like to make a $\beta$-weighted index using all the `z_*` variables. Do you think such an index will be usable, given that the coefficients are so heavily regularized? Is it better to leave either `z_med_TNFa` or `z_med_MIP_1b` out of the ridge regression and use `lambda.min` instead (which solves the sign-flip problem)? However, I'm not sure why the higher correlation between `z_med_IL_1b` and `z_med_IL_7` (compared with that between `z_med_TNFa` and `z_med_MIP_1b`) doesn't cause any sign-flip problem here!
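For the index itself, what I have in mind is just the linear combination of the z scores with their ridge coefficients at a chosen `s`. A sketch with a hypothetical helper (again assuming `x` is the predictor matrix containing the `z_*` columns):

```r
# Hypothetical helper: beta-weighted index from ridge coefficients at lambda = s
make_index <- function(fit, x, s) {
  b <- as.matrix(coef(fit, s = s))             # intercept + coefficients
  z_vars <- grep("^z_", rownames(b), value = TRUE)
  as.vector(x[, z_vars] %*% b[z_vars, 1])      # weighted sum of the z scores
}

index_min <- make_index(ridge.mod.bestlam.6m.4.1, x, s = 4.472952)
index_1se <- make_index(ridge.mod.bestlam.6m.4.1, x, s = 28.75246)
cor(index_min, index_1se)  # does the extra shrinkage reorder subjects or just rescale?
```

If the two indices turn out to be almost perfectly correlated, the heavier shrinkage under `lambda.1se` mostly rescales the index rather than changing the ranking, which is part of what I'm trying to judge.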
My questions:
- Should I choose `lambda.1se` over `lambda.min` in a ridge regression?
- Do you think that, in the example above, high regularization could be a problem? Biologically, some of these z scores can be highly correlated, but they may explain very different functions. So I'd like to avoid dropping any of them (or choosing one of two important variables at random) if there is an option.