I am running a LASSO regression, but I am put off by the fact that cv.glmnet returns a different value of lambda each time I run the cross-validation. Does it make sense to run the cross-validation multiple times, take the mean error associated with each lambda value, and then choose lambda.1se (i.e., the largest lambda whose error is within one standard error of the minimum) from that distribution of lambdas and mean errors?
library(glmnet)
library(plotrix)  # provides std.error()
set.seed(3)
IV1 <- data.frame(IV1 = rnorm(100))
IV2 <- data.frame(IV2 = rnorm(100))
IV3 <- data.frame(IV3 = rnorm(100))
IV4 <- data.frame(IV4 = rnorm(100))
IV5 <- data.frame(IV5 = rnorm(100))
DV <- data.frame(DV = rnorm(100))
data <- data.frame(IV1,IV2,IV3,IV4,IV5,DV)
x <- model.matrix(DV ~ . - IV5, data)[, -1]
y <- data$DV
lambdas <- NULL
r2 <- numeric(n.fits)
n.fits <- 100
for (i in 1:n.fits) {
  fit <- cv.glmnet(x, y)
  errors <- data.frame(fit$lambda, fit$cvm)
  lambdas <- rbind(lambdas, errors)
  r2[i] <- max(1 - fit$cvm / var(y))  # pseudo-R^2 implied by the CV error
}
# take the mean cvm for each lambda across all runs
lambdas <- aggregate(lambdas[, 2], list(lambdas$fit.lambda), mean)
names(lambdas) <- c("lambda", "cvm")  # aggregate names the columns Group.1 and x
# find the subset of lambdas whose mean error is within 1 se of the minimum
onese <- std.error(lambdas$cvm)
minerror <- min(lambdas$cvm)
# no mean error can fall below the minimum, so only the upper bound matters
lambdas <- subset(lambdas, cvm < minerror + onese)
# choose the largest lambda among those
bestlambda <- max(lambdas$lambda)
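For reference, here is a minimal sketch of the more conventional variant I could imagine, assuming the lambda sequence is identical across runs (it should be here, since glmnet computes it from the full data rather than from the folds): average both fit$cvm and glmnet's own standard-error estimate fit$cvsd across the repeats, then apply the usual one-standard-error rule to the averaged curves. Averaging cvsd this way is itself an assumption, since the repeats are not independent.

fits <- lapply(1:n.fits, function(i) cv.glmnet(x, y))
lambda.seq <- fits[[1]]$lambda
mean.cvm <- rowMeans(sapply(fits, `[[`, "cvm"))    # mean CV error per lambda
mean.cvsd <- rowMeans(sapply(fits, `[[`, "cvsd"))  # mean of glmnet's SE estimates
i.min <- which.min(mean.cvm)
# largest lambda whose mean error is within one (averaged) SE of the minimum
bestlambda.1se <- max(lambda.seq[mean.cvm <= mean.cvm[i.min] + mean.cvsd[i.min]])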
Edit: My question is unique because I am asking how to implement the one standard error rule after iterating the cross-validation.
Edit 2: It's possible my question isn't clear. I am being linked to a previous question, but that one is different: it is about running cross-validation once and choosing the best lambda from that single run. I would like to run cross-validation multiple times, average over the different results, and choose the best lambda based on that. I would like to know whether my code is an acceptable way of doing that.
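If it helps clarify what I'm after: I know cv.glmnet accepts a foldid argument, so one way to make a single run reproducible is to fix the fold assignments, e.g.:

set.seed(3)
foldid <- sample(rep(1:10, length.out = nrow(x)))  # fixed 10-fold assignment
fit.fixed <- cv.glmnet(x, y, foldid = foldid)
fit.fixed$lambda.1se  # identical on every run with this foldid

But that just freezes one particular fold split; my question is about averaging over many splits instead.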