I am running a LASSO regression, but I am put off by the fact that cv.glmnet returns a different value of lambda each time I run the cross-validation. Does it make sense to run the cross-validation multiple times, take the mean error associated with each lambda value, and then choose lambda.1se (i.e., the largest lambda whose error is within one standard error of the minimum) from that distribution of lambdas and mean errors?
library(glmnet)
library(plotrix)  # provides std.error()
set.seed(3)
IV1 <- data.frame(IV1 = rnorm(100))
IV2 <- data.frame(IV2 = rnorm(100))
IV3 <- data.frame(IV3 = rnorm(100))
IV4 <- data.frame(IV4 = rnorm(100))
IV5 <- data.frame(IV5 = rnorm(100))
DV <- data.frame(DV = rnorm(100))
data <- data.frame(IV1,IV2,IV3,IV4,IV5,DV)
x <- model.matrix(DV ~ . - IV5, data)[, -1]
y <- data$DV
lambdas <- NULL
r2 <- numeric(n.fits)
n.fits <- 100
for (i in 1:n.fits) {
  fit <- cv.glmnet(x, y)
  errors <- data.frame(fit$lambda, fit$cvm)
  lambdas <- rbind(lambdas, errors)
  r2[i] <- max(1 - fit$cvm / var(y))  # pseudo-R^2 implied by the CV error
}
# take the mean cvm for each lambda across all runs
lambdas <- aggregate(lambdas[, 2], list(lambdas$fit.lambda), mean)
names(lambdas) <- c("lambda", "cvm")  # aggregate names the columns Group.1 and x
# find the subset of lambdas whose mean error is within 1 se of the minimum
onese <- std.error(lambdas$cvm)
minerror <- min(lambdas$cvm)
# no mean error can fall below the minimum, so only the upper bound matters
lambdas <- subset(lambdas, cvm < minerror + onese)
# choose the largest lambda among those
bestlambda <- max(lambdas$lambda)
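For reference, here is a minimal sketch of the more conventional variant I could imagine, assuming the lambda sequence is identical across runs (it should be here, since glmnet computes it from the full data rather than from the folds): average both fit$cvm and glmnet's own standard-error estimate fit$cvsd across the repeats, then apply the usual one-standard-error rule to the averaged curves. Averaging cvsd this way is itself an assumption, since the repeats are not independent.

fits <- lapply(1:n.fits, function(i) cv.glmnet(x, y))
lambda.seq <- fits[[1]]$lambda
mean.cvm <- rowMeans(sapply(fits, `[[`, "cvm"))    # mean CV error per lambda
mean.cvsd <- rowMeans(sapply(fits, `[[`, "cvsd"))  # mean of glmnet's SE estimates
i.min <- which.min(mean.cvm)
# largest lambda whose mean error is within one (averaged) SE of the minimum
bestlambda.1se <- max(lambda.seq[mean.cvm <= mean.cvm[i.min] + mean.cvsd[i.min]])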
Edit: My question is unique because I am asking how to implement the one standard error rule after iterating the cross-validation.
Edit 2: It's possible my question isn't clear. I am being linked to a previous question, but that one is different: it is about running cross-validation once and choosing the best lambda from that single run. I would like to run cross-validation multiple times, average over the different results, and choose the best lambda based on that. I would like to know whether my code is an acceptable way of doing that.
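If it helps clarify what I'm after: I know cv.glmnet accepts a foldid argument, so one way to make a single run reproducible is to fix the fold assignments, e.g.:

set.seed(3)
foldid <- sample(rep(1:10, length.out = nrow(x)))  # fixed 10-fold assignment
fit.fixed <- cv.glmnet(x, y, foldid = foldid)
fit.fixed$lambda.1se  # identical on every run with this foldid

But that just freezes one particular fold split; my question is about averaging over many splits instead.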