I was working through the lab on ridge regression and the LASSO in ISLR and came across some strange behavior in the `cv.glmnet` function. When I followed the lab as written, I got the following:

set.seed(1)
train <- sample(1:nrow(x), nrow(x)/2)
test <- (-train)
y.test <- y[test]
set.seed(1)
cv.out <- cv.glmnet(x[train,], y[train], lambda=grid, alpha=0)
plot(cv.out)
bestlam <- cv.out$lambda.min
bestlam
[1] 231.013

For my own benefit I tried it with a different seed (8675309) and got back a different result. Every combination of seed settings I tried gave a different answer. I assume this is because the 10 folds are assigned differently under each seed, but `lambda.min` varies so much that I am concerned the package might not be stable. Am I missing something?
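One way to check whether the fold assignment is what drives the difference is to fix the folds yourself through the `foldid` argument of `cv.glmnet` and rerun under both seeds. The following is only a minimal sketch under stated assumptions: `x_sim` and `y_sim` are simulated stand-ins rather than the lab's data, and the definition of `grid` is assumed, since it is not shown above.

```r
library(glmnet)

# Simulated stand-in data (an assumption; the question uses the lab's x and y).
set.seed(42)
n <- 130
p <- 19
x_sim <- matrix(rnorm(n * p), n, p)
y_sim <- drop(x_sim %*% rnorm(p)) + rnorm(n, sd = 3)

# Assumed lambda grid; the question does not show how `grid` was built.
grid <- 10^seq(10, -2, length = 100)

# Draw the fold labels once so the CV split no longer depends on the seed
# that is set right before each call to cv.glmnet.
folds <- sample(rep(1:10, length.out = n))

set.seed(1)
fit_a <- cv.glmnet(x_sim, y_sim, lambda = grid, alpha = 0, foldid = folds)
set.seed(8675309)
fit_b <- cv.glmnet(x_sim, y_sim, lambda = grid, alpha = 0, foldid = folds)

# With identical folds, both runs should select the same lambda.
same_lambda <- identical(fit_a$lambda.min, fit_b$lambda.min)
print(same_lambda)
```

If the two runs agree once `foldid` is fixed, the variation comes from the random split into folds rather than from the fitting routine itself.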

Nick Stauner
Fraijo
  • Nice choice of different seed :D – Nick Stauner Jan 15 '14 at 04:18
  • Do the models indicated by the respective `lambda.min` for the different runs/seeds match, i.e. do they have similar included variables or coefficients? Have you tried the `lambda.1se` option, which is the simplest model within 1 standard error of the best (`lambda.min`)? It may be more stable, especially if the MSE is relatively flat around the "best" model. – Gavin Simpson Jan 15 '14 at 05:17
  • The `lambda.1se` is not stable, but the coefficients are similar. At least the same variables have the larger coefficients. I think I might close this question, as it might just be a product of the model. – Fraijo Jan 15 '14 at 16:12
  • I think the point is the same as [here](http://stats.stackexchange.com/questions/97777/variablity-in-cv-glmnet-results/103144#103144) – Alice Jun 12 '14 at 16:07
  • This question appears to be off-topic because it is specific to the GLMNET package in R. Those are generally off topic now, so I am voting to close my own question. – Fraijo Jul 30 '14 at 19:03
  • @Fraijo, I'd disagree. I think it's just on the topical side of the line, in that it's a question about the stuff implemented by GLMNET and not the syntax, etc. – Matt Krause Jul 30 '14 at 20:52
  • @Fraijo: it's on-topic; it's about the GLMNET implementation, not syntax. If `lambda.1se` is not stable across random seeds, then the 'optimal' coefficients returned by `coef(cv.out, [s='lambda.1se'])` will vary. This is a problem. – smci Feb 24 '17 at 00:01
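
The comparison discussed in these comments (do `lambda.min` and `lambda.1se` move with the seed, and do the resulting coefficients still agree?) can be sketched roughly as follows. This is an illustration on simulated data, not the lab's data: `x_sim`, `y_sim`, and the third seed are assumptions made for the example, and the default lambda path is used instead of the lab's `grid`.

```r
library(glmnet)

# Simulated stand-in data (an assumption; not the data from the question).
set.seed(42)
n <- 130
p <- 19
x_sim <- matrix(rnorm(n * p), n, p)
y_sim <- drop(x_sim %*% rnorm(p)) + rnorm(n, sd = 3)

# Rerun ridge CV under several seeds; only the fold assignment changes.
seeds <- c(1, 8675309, 2014)
fits <- lapply(seeds, function(s) {
  set.seed(s)
  cv.glmnet(x_sim, y_sim, alpha = 0)
})

# Selected penalties, one column per seed.
lams <- sapply(fits, function(f) c(min = f$lambda.min, `1se` = f$lambda.1se))
colnames(lams) <- seeds
print(lams)

# Coefficients at each run's lambda.1se, one column per seed.
coefs <- sapply(fits, function(f) as.matrix(coef(f, s = "lambda.1se"))[, 1])
colnames(coefs) <- seeds

# Correlation of the slope coefficients between runs (intercept row dropped).
print(round(cor(coefs[-1, ]), 3))
```

High correlations between the columns would match what Fraijo reports: the selected penalty moves around, but the same variables keep the larger coefficients.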
