I have a data set with n = 199 observations and p = 130 predictors, and I need to reduce the number of predictors for a regression. I ran a lasso regression, but the number of selected variables changes dramatically when I change the random seed (it ranges from 6 to 50 predictors in the runs shown here). How can I determine the optimal set of variables to include in the model?
These are the numbers of selected variables I got across the different seeds:
variables
[1] 24 21 29 24 21 21 24 33 24 33 24 39 21 21 24 21 29 21 29 29 24 29 29 29 24 18 29 24 21 39 39 21 21 29 33 39 24 41
[39] 24 13 29 13 29 29 21 24 24 39 24 33 24 29 33 45 21 21 21 24 29 33 29 29 18 21 21 24 21 21 33 24 33 21 21 21 21 29
[77] 41 24 39 21 21 24 21 39 29 21 39 21 41 21 33 21 33 33 21 33 21 24 21 29 21 21 21 29 24 21 21 21 29 29 24 21 24 21
[115] 21 21 21 33 18 21 39 29 29 21 42 21 13 24 33 21 39 24 29 33 21 41 29 24 42 33 41 21 21 21 13 24 21 24 24 39 24 41
[153] 50 33 21 24 24 18 29 24 39 21 21 24 33 42 21 21 24 29 24 21 24 24 24 6
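To make the spread in the printout easier to read, the counts can be tabulated. The snippet below is only a sketch: it re-enters the 176 printed values by hand under the same name, `variables`, and summarizes them. The counts fall on a small set of distinct values, which would be consistent with `lambda.min` jumping between a few points of the same lambda path from one seed to the next, though that reading is an assumption rather than something the output proves.

```r
# Values copied by hand from the printed vector above
variables <- c(
  24, 21, 29, 24, 21, 21, 24, 33, 24, 33, 24, 39, 21, 21, 24, 21, 29, 21, 29, 29, 24, 29, 29, 29, 24, 18, 29, 24, 21, 39, 39, 21, 21, 29, 33, 39, 24, 41,
  24, 13, 29, 13, 29, 29, 21, 24, 24, 39, 24, 33, 24, 29, 33, 45, 21, 21, 21, 24, 29, 33, 29, 29, 18, 21, 21, 24, 21, 21, 33, 24, 33, 21, 21, 21, 21, 29,
  41, 24, 39, 21, 21, 24, 21, 39, 29, 21, 39, 21, 41, 21, 33, 21, 33, 33, 21, 33, 21, 24, 21, 29, 21, 21, 21, 29, 24, 21, 21, 21, 29, 29, 24, 21, 24, 21,
  21, 21, 21, 33, 18, 21, 39, 29, 29, 21, 42, 21, 13, 24, 33, 21, 39, 24, 29, 33, 21, 41, 29, 24, 42, 33, 41, 21, 21, 21, 13, 24, 21, 24, 24, 39, 24, 41,
  50, 33, 21, 24, 24, 18, 29, 24, 39, 21, 21, 24, 33, 42, 21, 21, 24, 29, 24, 21, 24, 24, 24, 6
)

count.range <- range(variables)   # smallest and largest model size
count.table <- table(variables)   # how often each model size occurs
count.median <- median(variables) # a typical model size

print(count.range)
print(count.table)
print(count.median)
```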
My target is a dummy variable, and this is my code:
library(glmnet)
variables <- integer(0)  # vector with the number of selected variables for each seed
iter <- 1                # iteration counter (R vectors are 1-indexed, so start at 1)
seed_input <- 500        # first seed number
while (iter < 177) {
  seed_input <- seed_input + floor(rnorm(1, 10, 2))  # change the seed number in each iteration
  set.seed(seed_input)   # use this seed number
  modelo <- glmnet(independent, dependent, family = "binomial", alpha = 1)  # fitting lasso regression
  # family = "binomial" added here so the cross-validation matches the model fitted above
  cv.modelo <- cv.glmnet(independent, dependent, family = "binomial", alpha = 1)
  best.lambda <- cv.modelo$lambda.min  # saving best lambda
  # `a` was not defined in the original post; assumed to be the coefficients at best.lambda
  a <- coef(modelo, s = best.lambda)
  # Indices of predictors with nonzero coefficients (row 1 of `a` is the intercept)
  vars <- which(a[-1, 1] != 0)
  variables[iter] <- length(vars)  # save the number of variables for this seed
  iter <- iter + 1
}
variables
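One way to look at the instability more closely is to record which variables are selected at each seed, not only how many. The following is a minimal sketch under stated assumptions: the data are simulated stand-ins (`x`, `y`) with the same dimensions as in the question, and the names `n.rep`, `selected`, `sel.freq`, `stable.vars` and the 0.8 cutoff are all hypothetical choices made for illustration. With a fixed data set, the seed enters only through the random fold assignment in `cv.glmnet`.

```r
library(glmnet)

# Simulated stand-in for `independent` / `dependent` (assumption, not the real data)
set.seed(1)
n <- 199
p <- 130
x <- matrix(rnorm(n * p), n, p)
colnames(x) <- paste0("x", seq_len(p))
eta <- 1.5 * x[, 1] - 1.5 * x[, 2] + 1.0 * x[, 3]  # only three true signals
y <- rbinom(n, 1, plogis(eta))

n.rep <- 20  # number of seeds to try (arbitrary)
selected <- matrix(FALSE, n.rep, p, dimnames = list(NULL, colnames(x)))

for (r in seq_len(n.rep)) {
  set.seed(500 + r)                        # changes only the CV fold assignment
  cv.fit <- cv.glmnet(x, y, family = "binomial", alpha = 1)
  b <- coef(cv.fit, s = "lambda.min")      # sparse column matrix, row 1 = intercept
  selected[r, ] <- b[-1, 1] != 0           # TRUE where a predictor is kept
}

num.var <- rowSums(selected)               # model size for each seed
sel.freq <- sort(colMeans(selected), decreasing = TRUE)  # share of seeds selecting each variable
stable.vars <- names(sel.freq)[sel.freq >= 0.8]          # 0.8 is an arbitrary cutoff

print(num.var)
print(head(sel.freq, 10))
print(stable.vars)
```

Variables that are selected under nearly every seed are less sensitive to the fold split than those that appear only occasionally. This is a descriptive check, not a formal selection rule.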
I hope that somebody can help me. Thanks.