glmnet returning lambda that gives all-zero coefficients as optimal lambda

Question

Before I start: I have already looked at the answers to these related questions:

From the answers to these questions, it seems that when glmnet returns a lambda where lambda_1se = lambda_min and all coefficients are zero, this simply indicates that X is not predictive of Y. Is that right?
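For reference, R glmnet's documentation describes the logistic elastic-net objective as $-\frac{1}{N}\ell(\beta_0,\beta) + \lambda\left[\frac{1-\alpha}{2}\|\beta\|_2^2 + \alpha\|\beta\|_1\right]$. Working through the optimality conditions at $\beta = 0$ (intercept at $\operatorname{logit}(\bar y)$, columns standardized) shows that every penalized coefficient is exactly zero whenever $\lambda \ge \lambda_{\max} = \max_j \left|\frac{1}{N} x_j^\top (y - \bar y)\right| / \alpha$. The default path starts at this value, so the first lambda is an all-zero model by construction. The sketch below computes $\lambda_{\max}$; it assumes unweighted observations and glmnet-style standardization (mean 0, $1/N$ variance 1).

```python
import math

def standardize(col):
    """Center a column and scale it to (1/N) variance 1, as glmnet does by default."""
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
    return [(v - mean) / sd for v in col]

def lambda_max(columns, y, alpha):
    """Smallest lambda at which every penalized coefficient is zero for a
    logistic elastic net with an unpenalized intercept (requires alpha > 0)."""
    if alpha <= 0:
        # The bound divides by alpha: pure ridge never sets coefficients exactly to zero.
        raise ValueError("lambda_max is infinite for pure ridge (alpha = 0)")
    n = len(y)
    ybar = sum(y) / n
    # |gradient| of the mean negative log-likelihood at beta = 0, intercept = logit(ybar)
    grads = [abs(sum(xi * (yi - ybar) for xi, yi in zip(standardize(col), y))) / n
             for col in columns]
    return max(grads) / alpha

# Toy example (hypothetical data): one predictor, four subjects.
print(lambda_max([[0, 1, 2, 3]], [0, 0, 1, 1], alpha=0.5))
```

Because $\lambda_{\max}$ scales as $1/\alpha$ and is unbounded at $\alpha = 0$, a ridge fit never produces the all-zero model. This may be related to the observation below that only $\alpha = 0$ avoids a test AUC of 0.5, though that link is a conjecture.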

However, I find this very strange in my case: all three Xs I am using have been shown in previous literature to differ significantly between the two groups (Y) that I am comparing, and one of them is a very well-established predictor of Y. Furthermore, when I enter these variables into a stepwise logistic regression in SPSS, I get very high AUCs.

Also, only the very first lambda in the sequence yields all-zero coefficients. From the second lambda onward, the number of nonzero coefficients increases, and the AUCs at those lambda values seem to remain fairly stable. So I am confused as to why glmnet would ignore those other lambda values, which give similar performance, and instead choose the one that gives all zeros, which yields an AUC of 0.5 on my test set.
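Why an all-zero model scores exactly 0.5: with every coefficient zero, the intercept-only model gives every subject the same score, and AUC (the probability that a random positive outranks a random negative, ties counted as 1/2) is then 0.5. A related property is that AUC depends only on the ranking of scores, so multiplying the linear predictor by any positive constant, however small, leaves it unchanged. The sketch below uses hypothetical scores and labels to show both points.

```python
def auc(scores, labels):
    """ROC AUC as the Mann-Whitney probability that a random positive
    outranks a random negative; tied scores count as 1/2."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

labels = [0, 0, 1, 1, 0, 1]                # hypothetical outcomes
x = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]        # hypothetical linear predictor

print(auc([0.3] * 6, labels))              # intercept only: every score tied
print(auc(x, labels))                      # ranking by x
print(auc([1e-6 * v for v in x], labels))  # same ranking, tiny coefficient
```

So cross-validated AUC by itself cannot distinguish a nearly-zero coefficient vector that points in the right direction from a well-fitted one, while an exactly-zero vector always scores 0.5.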

On a further note, the reason I am using glmnet on already well-established predictors of Y is to use their combination as a "gold standard", or reference, against which I will compare the results for my actual predictors of interest. In other words, I am running glmnet on two separate sets of predictors and comparing their AUCs.

Additional info:
A sample cvglmnet output is shown below:

My sample size is 324, with 179 in the negative class and 145 in the positive class. The number of subjects in the image below is 291 because I am running nested CV and each outer-loop training set contains 291 subjects.
$\alpha = 0.5$ here (it seems that only $\alpha = 0$ gives an AUC other than 0.5).
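The outer/inner structure of nested cross-validation can be sketched as below. This is a plain, unstratified sketch with hypothetical helper names. The fold counts are assumptions: the question does not state them, but 10 outer folds on 324 subjects gives outer training sets of 291 or 292 subjects, in line with the 291 reported above. Lambda is tuned only on the inner folds of each outer training set, and the outer test fold is used once, for the final AUC.

```python
import random

def kfold_indices(n, k, seed=0):
    """Shuffle range(n) and split it into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # The first n % k folds receive one extra element.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for s in sizes:
        folds.append(idx[start:start + s])
        start += s
    return folds

def nested_cv_splits(n, outer_k=10, inner_k=10, seed=0):
    """Yield (outer_train, outer_test, inner_folds) for each outer fold.

    inner_folds partition outer_train only, so tuning never sees outer_test."""
    for test in kfold_indices(n, outer_k, seed):
        test_set = set(test)
        train = [i for i in range(n) if i not in test_set]
        inner = [[train[j] for j in fold]
                 for fold in kfold_indices(len(train), inner_k, seed + 1)]
        yield train, test, inner

for train, test, inner in nested_cv_splits(324):
    print(len(train), len(test), [len(f) for f in inner])
```

Class stratification, which matters with 179/145 classes, is left out here for brevity.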

'object' in the top-left image is the CVerr structure returned by cvglmnet.
'object.glmnet_fit' in the top-right image contains the parameters computed at each lambda (for some reason there are only 71 lambda values, although I believe the default number is 100).
'object.glmnet_fit.beta' contains the beta values of the three predictor variables at each lambda value.
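On the 71 versus 100 lambda values: R glmnet's documentation describes the default path as `nlambda = 100` values, log-spaced from the data-derived entry value $\lambda_{\max}$ (the smallest lambda at which all coefficients are zero) down to `lambda.min.ratio` times $\lambda_{\max}$, with `lambda.min.ratio = 1e-4` when nobs > nvars. It also notes that the path can stop early once the fraction of deviance explained stops changing meaningfully (`fdev` and `devmax` in `glmnet.control`), which would produce fewer than 100 values. Whether the MATLAB port behaves identically is an assumption. The sketch below builds the default path and applies a simplified, hypothetical version of that stopping rule; it is not glmnet's exact code.

```python
import math

def lambda_path(lam_max, n_obs, n_vars, nlambda=100, lambda_min_ratio=None):
    """Decreasing, log-spaced lambdas from lam_max to lambda_min_ratio * lam_max."""
    if lambda_min_ratio is None:
        lambda_min_ratio = 1e-4 if n_obs > n_vars else 1e-2
    log_hi = math.log(lam_max)
    log_lo = math.log(lam_max * lambda_min_ratio)
    step = (log_lo - log_hi) / (nlambda - 1)
    return [math.exp(log_hi + k * step) for k in range(nlambda)]

def truncate_path(dev_ratios, fdev=1e-5, devmax=0.999):
    """Simplified early stop (an approximation): return how many lambdas are kept
    before the deviance ratio exceeds devmax or improves by < fdev (relative)."""
    kept = 1
    for k in range(1, len(dev_ratios)):
        kept = k + 1
        prev, cur = dev_ratios[k - 1], dev_ratios[k]
        if cur > devmax or (cur - prev) < fdev * cur:
            break
    return kept

path = lambda_path(1.0, n_obs=291, n_vars=3)  # lam_max = 1.0 is hypothetical
print(len(path), path[0], path[-1])
print(truncate_path([0.0, 0.1, 0.2, 0.2, 0.2]))  # hypothetical deviance ratios
```

With only three predictors, the deviance explained can plateau well before the 100th lambda, so a truncated path is not by itself a sign that anything is wrong.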

As you can see, the optimal lambda value chosen is the first lambda, which assigned all predictor variables zero coefficients.
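How lambda_min and lambda_1se are picked for a metric that is maximized (such as AUC) can be sketched as follows. The sketch mirrors the rule documented for R's cv.glmnet: lambda_min gives the best cvm (largest lambda among ties), and lambda_1se is the largest lambda whose cvm is within one cvsd of that best value. The cvm/cvsd numbers are hypothetical.

```python
def select_lambdas(lambdas, cvm, cvsd, maximize=True):
    """Return (lambda_min, lambda_1se) from cross-validation summaries."""
    sign = 1.0 if maximize else -1.0
    scores = [sign * m for m in cvm]  # higher is better after this flip
    best = max(scores)
    # Among tied optima, prefer the largest lambda (the sparsest model).
    lambda_min = max(l for l, s in zip(lambdas, scores) if s == best)
    threshold = best - cvsd[lambdas.index(lambda_min)]
    lambda_1se = max(l for l, s in zip(lambdas, scores) if s >= threshold)
    return lambda_min, lambda_1se

lambdas = [0.5, 0.3, 0.2, 0.1]
sd = [0.02] * 4
# cvm (AUC) decreasing along the path: both choices land on the first lambda.
print(select_lambdas(lambdas, [0.93, 0.92, 0.90, 0.88], sd))
# Peak in the middle: lambda_1se moves to a larger lambda than lambda_min.
print(select_lambdas(lambdas, [0.88, 0.92, 0.93, 0.90], sd))
```

Whenever the best cross-validated AUC sits at the first (largest) lambda, no larger lambda exists, so lambda_1se necessarily equals lambda_min.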

Lasso estimates are theoretically biased towards zero. You can try a nonconvex penalty such as MCP or SCAD, which is less biased (R package [ncvreg](https://cran.r-project.org/web/packages/ncvreg/index.html)) — bdeonovic, Apr 03 '17 at 12:51
Some questions: How large is your dataset? Are the classes imbalanced? Could you post the exact output of glmnet, together with the AUC values you calculated for each lambda? I assume you're using cv.glmnet directly? — AaronDefazio, Apr 04 '17 at 04:03
Hi @AaronDefazio, I have added answers to your questions at the bottom of my question. As you can see, the AUC values for each lambda range from 0.8729 to 0.9323 (as shown in object.cvm); however, when I apply the beta parameters at the optimal lambda value to my test dataset, I get AUC = 0.5 every time, which is not surprising given that the chosen beta values are all zero. — Michelle, Apr 04 '17 at 07:55
@Michelle I have a suggestion that might help. Could you try running with type.measure='auc' and with nfolds=3? — AaronDefazio, Apr 05 '17 at 01:47
@AaronDefazio I am already running cvglmnet with type='auc'. I have tried setting nfolds=3 as you suggested, but I still get the same result (AUC = 0.5). Could you tell me the reason for your suggestion? — Michelle, Apr 05 '17 at 06:08
Using fewer folds can sometimes help, but I guess it didn't here. Can I ask what the "cvm" vector looks like? Is it generally increasing or decreasing from left to right? If you could paste in the first few values that might help. — AaronDefazio, Apr 05 '17 at 06:25
@AaronDefazio The cvm vector is generally decreasing, so the cvm for the first lambda (which keeps giving zero coefficients) has the maximum AUC, which confuses me. Does that mean the intercept alone produced that maximum AUC? Another thing: glmnet sets lambda_1se equal to lambda_min, even though I see AUC values lower than the maximum in the cvm vector. — Michelle, Apr 05 '17 at 06:34
@bdeonovic I believe ridge regression is the one biased towards zero (but it gives nonzero coefficients to all features), thus stabilizing the coefficients, while lasso can give zero coefficients, which effectively results in feature selection? — Michelle, Apr 05 '17 at 06:36
@Michelle The all-zero feature model should have AUC 0.5. I'm as confused as you are about this. Regarding lambda_1se, it should be the largest lambda whose value is within 1 SE of the optimum, so it makes sense that it matches lambda_min here, since that is also the largest lambda tried. Sorry I can't be of more help. — AaronDefazio, Apr 05 '17 at 06:52
@Michelle both ridge and lasso coefficients will be biased, as they both shrink coefficients down. Lasso has the great property that it can shrink some features all the way down to zero (thereby doing variable selection as you pointed out) but nonetheless both of these methods result in biased coefficient estimates. That's why I suggested checking out some of the other penalty functions other than L1 and L2 — bdeonovic, Apr 05 '17 at 13:36
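The bias point in this exchange can be made concrete with the textbook orthonormal-design, squared-error case. This is an assumption made for illustration: the question uses a logistic loss, where there is no closed form. With $b$ the unpenalized estimate, ridge scales every coefficient by $1/(1+\lambda)$ and never reaches zero. Lasso subtracts $\lambda$ from every surviving coefficient (biasing large ones) and sets small ones to zero. MCP, one of the nonconvex penalties in ncvreg, zeroes small coefficients like lasso but leaves those with $|b| > \gamma\lambda$ unshrunk.

```python
def soft_threshold(b, lam):
    """Lasso estimate of one coefficient under an orthonormal design."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0

def ridge_shrink(b, lam):
    """Ridge estimate under an orthonormal design: proportional shrinkage."""
    return b / (1.0 + lam)

def mcp_threshold(b, lam, gamma=3.0):
    """MCP ('firm thresholding') estimate under an orthonormal design, gamma > 1."""
    if abs(b) > gamma * lam:
        return b  # large coefficients are left unpenalized
    return soft_threshold(b, lam) / (1.0 - 1.0 / gamma)

lam = 0.5
for b in (0.3, 1.0, 2.0):
    print(b, ridge_shrink(b, lam), soft_threshold(b, lam), mcp_threshold(b, lam))
```

At $b = 2$ with $\lambda = 0.5$, lasso returns 1.5 while MCP returns 2, which is the sense in which MCP and SCAD are "less biased" for strong effects.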
