3

I have run 3-fold cross-validation for elastic net using the `elasticnet` R function on ~200 observations and 80 variables (with more to come).

Both lasso and ridge tend to select over 40 variables with non-zero coefficients for the final model. I doubt I need more than 10-15; anything beyond that I consider overfitting. How can I force elastic net to simplify the output and not include the redundant "tail"?

amoeba
user2530062
    Increase the size of the LASSO penalty. – Sycorax Dec 14 '15 at 21:53
  • And if it is already set on 1? – user2530062 Dec 14 '15 at 23:04
  • It's unclear if "it" refers to $\alpha,$ which trades off between the lasso and ridge penalties, or $\lambda$, which increases the magnitude of both penalties. I'm referring to $\lambda$, which has no upper limit. – Sycorax Dec 14 '15 at 23:06
  • OK, I will select the $\lambda$ that minimizes `myModel$cvm` on a subspace of `myModel$nzero` (values <= 10), which limits the number of variables glmnet "found" at this stage of its path. I shall then interpret my result as a model that does not necessarily give me the lowest CV error in general (that would be lambda.min), but only among models I consider relevant for me, right? – user2530062 Dec 14 '15 at 23:44

1 Answer

2

It's not immediately clear what exactly you're doing when fitting a model or what your goal is. I'll answer as best I can with the information provided.

GLMNET has two tuning parameters. A sequence of $\lambda$s is generated internally; the user supplies a value of $\alpha$.
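For concreteness, the penalty glmnet minimizes (per the package documentation) combines the ridge and lasso terms, with $\alpha$ mixing between them and $\lambda$ scaling the overall strength:

$$\lambda \left[ \frac{1-\alpha}{2} \|\beta\|_2^2 + \alpha \|\beta\|_1 \right]$$

so $\alpha = 1$ gives the pure lasso and $\alpha = 0$ pure ridge; only the $\|\beta\|_1$ term drives coefficients exactly to zero.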

The stated question is how to choose a GLMNET model that has 10-15 predictors. The number of nonzero predictors at each $\lambda$ is tracked by the software, so for the supplied value of $\alpha$, just pick the solution corresponding to a $\lambda$ value that yields the desired number of predictors. On the assumption that the supplied value of $\alpha$ is "known," you're done. If you're uncertain about $\alpha$ (perhaps due to a desire to also account for collinearity), you'll have to tune over $\alpha$ and compare alternative models according to some appropriate out-of-sample metric in the usual way.
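A minimal R sketch of this idea, assuming `x` is the ~200 x 80 predictor matrix and `y` the response (names are illustrative, not from the question); it uses `cv.glmnet`'s `nzero` and `cvm` fields, as in the comment thread above:

```r
library(glmnet)

set.seed(1)
fit <- cv.glmnet(x, y, alpha = 1, nfolds = 3)

# fit$nzero holds the number of nonzero coefficients at each lambda in the
# path; keep only solutions with at most 15 predictors, then take the one
# with the smallest cross-validated error among them.
ok     <- fit$nzero <= 15
best   <- which.min(ifelse(ok, fit$cvm, Inf))
lambda <- fit$lambda[best]

coef(fit, s = lambda)  # sparse coefficient vector of the chosen model
```

This deliberately accepts a possibly higher CV error than `lambda.min` in exchange for the sparsity constraint; to also tune $\alpha$, repeat the same selection over a grid of $\alpha$ values and compare the constrained winners by `cvm`.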

Also of interest may be my answer here. It's worth noting that this answer is highly controversial among several highly-ranked CV contributors, and I'm not certain about how to correctly approach the issue.

Sycorax
  • 76,417
  • 20
  • 189
  • 313
  • +1. I don't actually think that *that* answer of yours is particularly controversial; you probably meant your other answer, here: http://stats.stackexchange.com/questions/184029. – amoeba Dec 15 '15 at 00:33
  • Hah! You're right. I'll fix the link in a moment. – Sycorax Dec 15 '15 at 00:46
  • This is perfectly understandable. In my last comment under the question I said I am looking only at the lasso solution, whereas in the question I was referring to elastic net in general. I think this is what brought the confusion. I understand that in order to maximize my evaluation metric I can check _all_ 10-predictor elastic net models, and by _all_ I mean models with different $\alpha$, to find the best elastic net solution. A very good read in the link provided, thank you. – user2530062 Dec 15 '15 at 09:36
  • Hmm, now you edited the link out of the question entirely -- it says "my answer here" but there is no link... – amoeba Dec 15 '15 at 15:52