In lasso or ridge regression, one has to specify a shrinkage parameter, often denoted $\lambda$ or $\alpha$. This value is often chosen via cross validation: fitting the model with a range of candidate values on training data and selecting the one that yields the best performance (e.g. $R^2$) on held-out data. What is the range of values one should check? Is it $(0,1)$?
- Possible duplicate of [Choosing the range and grid density for regularization parameter in LASSO](http://stats.stackexchange.com/questions/174897/choosing-the-range-and-grid-density-for-regularization-parameter-in-lasso) – Alex Oct 20 '16 at 23:05
- In fact, the optimal ridge parameter can be 0 or even negative. Some discussion on stats.SE: https://stats.stackexchange.com/questions/331264/understanding-negative-ridge-regression with a paper here https://arxiv.org/abs/1805.10939 – Sycorax Jul 28 '20 at 21:11
2 Answers
You don't really need to bother. In most packages (like glmnet), if you do not specify $\lambda$, the software generates its own sequence (which is often recommended). The reason I stress this is that while running the LASSO the solver generates a sequence of $\lambda$ values anyway, so, counterintuitive as it may seem, providing a single $\lambda$ value can actually slow the solver down considerably (when you provide an exact parameter, the solver resorts to solving a semidefinite program, which can be slow even for reasonably 'simple' cases).
As for the exact value of $\lambda$, you can potentially choose whatever you want from $[0,\infty)$. Note that if your $\lambda$ value is too large the penalty will dominate, and all of the coefficients will be shrunk to zero. If the penalty is too small you will overfit the model, and the result will not be the best cross-validated solution.
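To make the effect of $\lambda$'s magnitude concrete: for ridge regression with a single predictor and no intercept, the penalized least-squares problem has the closed-form solution $\hat\beta(\lambda) = \sum_i x_i y_i \,/\, (\sum_i x_i^2 + \lambda)$. A minimal sketch (in Python rather than R, with made-up data, purely for illustration):

```python
# Closed-form ridge solution for a single predictor, no intercept:
# minimizing sum((y - b*x)^2) + lam * b^2 gives
#   b(lam) = sum(x*y) / (sum(x^2) + lam)
def ridge_coef(x, y, lam):
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + lam)

x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]   # roughly y = 2x (made-up data)

for lam in [0.0, 1.0, 100.0, 1e6]:
    print(lam, ridge_coef(x, y, lam))
# lam = 0 recovers ordinary least squares (beta ~ 1.99);
# as lam grows, the coefficient is shrunk toward 0.
```

This is why the useful search range is $[0,\infty)$ rather than $(0,1)$: nothing special happens at 1, and the right order of magnitude depends on the scale of the data.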

- Hi Sid, the OP appears aware of the fact you mention in your post. It also does not appear to answer the question. :-) – cardinal Aug 15 '14 at 19:27
For those trying to figure this out:
I have found that there is a great difference between allowing glmnet to calculate $\lambda$ itself and supplying a range (grid) for it to choose from.
Here is an example predicting Apps (number of applicants) in the College data set from ISLR:
# Load the packages and don't forget to set a seed
library(ISLR)    # for the College data set
library(glmnet)
set.seed(1)
train <- sample(1:dim(College)[1], 0.75*dim(College)[1])
# Model matrices (glmnet expects a matrix, not a formula)
xmat.train <- model.matrix(Apps ~ . - 1, data = College[train, ])
xmat.test <- model.matrix(Apps ~ . - 1, data = College[-train, ])
y <- College$Apps[train]
# Create a grid of values for the scope of lambda (optional):
grid <- 10 ^ seq(10, -2, length = 100)
# Add the grid here as lambda (optional)
ridge.fit <- glmnet(xmat.train, y, alpha = 0, lambda = grid)
cv.ridge <- cv.glmnet(xmat.train, y, alpha = 0, lambda = grid)
bestlam <- cv.ridge$lambda.min
cat("\nBestlam (with grid):", bestlam)
pred <- predict(ridge.fit, s = bestlam, newx = xmat.test)
cat("\nWith grid, test MSE:", mean((College$Apps[-train] - pred)^2))
# Again, but without the grid (letting glmnet figure lambda out)
ridge.fit <- glmnet(xmat.train, y, alpha = 0)
cv.ridge <- cv.glmnet(xmat.train, y, alpha = 0)
bestlam <- cv.ridge$lambda.min
cat("\n\nBestlam (no grid):", bestlam)
pred <- predict(ridge.fit, s = bestlam, newx = xmat.test)
cat("\nWithout grid, test MSE:", mean((College$Apps[-train] - pred)^2))
You can run this yourself and change grid accordingly; I've seen examples ranging from grid <- 10 ^ seq(10, -2, length = 100) to grid <- 10^seq(3, -2, by = -.1).
My best guess is that restricting $\lambda$ to a grid can change the value that cross validation selects, and it is up to us to figure out a sensible range.
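As a side note, grids like the ones above are log-spaced so that the candidates span many orders of magnitude. A sketch of what 10^seq(10, -2, length = 100) computes, written in Python purely for illustration:

```python
# Log-spaced grid of 100 lambda values from 1e10 down to 1e-2,
# mirroring R's 10^seq(10, -2, length = 100): the *exponent* is
# interpolated linearly, so the values are evenly spaced on a log scale.
n = 100
grid = [10 ** (10 + (-2 - 10) * i / (n - 1)) for i in range(n)]

print(grid[0])    # largest value, 1e10
print(grid[-1])   # smallest value, 0.01
```

With $\lambda = 10^{10}$ at one end the penalty effectively forces all coefficients to zero, and with $\lambda = 10^{-2}$ at the other it is nearly unpenalized, so the grid brackets both extremes discussed above.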
I have also found this guide quite helpful: https://drsimonj.svbtle.com/ridge-regression-with-glmnet
