
I asked this question on Stack Overflow, but at this point I'm not exactly sure where it belongs, because it relates to the standardization process used by glmnet and the lasso.

I am running glmnet and trying to reproduce its results by passing pre-standardized variables with standardize = FALSE.

library(glmnet)
data(QuickStartExample)
### Standardized Way
x_standardized <- scale(x, center = TRUE, scale = TRUE)
y_standardized <- scale(y, center = TRUE, scale = TRUE)
cv_standardized <- cv.glmnet(x_standardized, y_standardized,
                             intercept = FALSE,
                             standardize = FALSE, standardize.response = FALSE)
destandardized_coef <- coef(cv_standardized)[-1] * sd(y) / apply(x, 2, sd)  # back-transform the slopes
destandardized_coef
mean(y) - sum(destandardized_coef * colMeans(x))  # recover the intercept by hand

### Let glmnet standardize
cv_normal <- cv.glmnet(x, y)
as.numeric(coef(cv_normal, cv_normal$lambda.min))

Initially, I standardize the data myself and back-transform the coefficients. I would expect to see the same results, but for some reason I am getting slightly different coefficients.

My question is: why are the coefficients different, and how can I reproduce glmnet's results when standardizing by hand?

EDIT: Here is my updated code based on @user2974951's response; it does appear that glmnet standardizes differently from the scale function in R. I want to note that I am fitting the initial model without the intercept, since I calculate it by hand later. Ultimately this shouldn't matter: because I provide a standardized x and y, the intercept should be zero.
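
As a quick illustration of the two conventions (a minimal sketch; it assumes x from QuickStartExample is loaded as in the first snippet, and x[, 1] is just an arbitrary column):

n <- nrow(x)
sd_sample <- sd(x[, 1])                                    # 1/(n-1), what sd() and scale() use
sd_population <- sqrt(sum((x[, 1] - mean(x[, 1]))^2) / n)  # 1/n, the convention glmnet is said to use
all.equal(sd_sample * sqrt((n - 1) / n), sd_population)    # TRUE: the two differ by a factor of sqrt((n-1)/n)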

set.seed(123)
library(glmnet)
library(tidyverse)
data(QuickStartExample)
# standardize data
n <- length(y)
y_mean <- mean(y)
y_centered <- y - y_mean
y_sd <- sqrt(sum((y - y_mean) ^ 2) / n)  # 1/n standard deviation
ys <- y_centered / y_sd
X_centered <- apply(x, 2, function(col) col - mean(col))
Xs <- apply(X_centered, 2, function(col) col / sqrt(sum(col ^ 2) / n))  # 1/n scaling

set.seed(123)
cv_standardized <- cv.glmnet(Xs, ys,
                             intercept = FALSE,
                             standardize = FALSE,
                             standardize.response = FALSE)
destandardized_coef <- coef(cv_standardized)[-1] * sd(y) / apply(x, 2, sd)
personal_standardize <- c(mean(y) - sum(destandardized_coef * colMeans(x)),
                          destandardized_coef)

set.seed(123)
cv_normal <- cv.glmnet(x, y, intercept = TRUE)
glmnet_standardize <- coef(cv_normal, cv_normal$lambda.min)

Results:

cbind(personal_standardize, glmnet_standardize)
21 x 2 sparse Matrix of class "dgCMatrix"
            personal_standardize  glmnet_standardize
(Intercept)           0.15473991  0.145832036
V1                    1.29441444  1.340981414
V2                    .           .          
V3                    0.62890756  0.708347139
V4                    .           .          
V5                   -0.77785401 -0.848087765
V6                    0.48387954  0.554823781
V7                    .           0.038519738
V8                    0.28419264  0.347221319
V9                    .           .          
V10                   .           0.010034050
V11                   0.09130386  0.186365264
V12                   .           .          
V13                   .           .          
V14                  -1.03241106 -1.086006902
V15                   .          -0.004444318
V16                   .           .          
V17                   .           .          
V18                   .           .          
V19                   .           .          
V20                  -0.96190123 -1.069942845

Thanks in advance.

AW27
  • Not exactly the same (ridge rather than lasso) but this recent question may provide some insight https://stats.stackexchange.com/questions/519450/why-the-ridge-regression-is-not-scale-invariant/ – jcken Apr 16 '21 at 07:24
  • Two points: 1) `scale` uses n-1 when computing SD, glmnet does not, 2) your first model does not contain an intercept, while the second does. – user2974951 Apr 16 '21 at 08:07
  • @user2974951 sorry about that, you're right, it was a typo with `intercept = FALSE`. I've edited the results. I see that the standardisation methods are different but am still struggling to get the same results with the correct `glmnet` standardisation of dividing by n rather than n-1. – AW27 Apr 16 '21 at 08:14
  • The code you display doesn't seem to show any control of the seed used for the random number generation in `cv.glmnet`. Try setting the seed to the same number just before each invocation of that function, and make sure that the cases are in the exact same order both times. Then post your results again. If that fixes your problem, please write it up as an answer to your question, as a service to others facing similar issues. – EdM Apr 16 '21 at 08:45
  • @EdM I edited the code; it looks like `set.seed` doesn't really alleviate the issue (see the `foldid` sketch just after these comments). – AW27 Apr 16 '21 at 08:53
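
Following EdM's suggestion, here is a minimal sketch of how to force both cv.glmnet calls to use identical cross-validation folds via the existing foldid argument (assuming Xs, ys, x, and y from the edited code above):

set.seed(123)
foldid <- sample(rep(1:10, length.out = length(y)))  # one fixed fold assignment, reused for both fits
cv_standardized <- cv.glmnet(Xs, ys, foldid = foldid,
                             intercept = FALSE, standardize = FALSE)
cv_normal <- cv.glmnet(x, y, foldid = foldid)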

1 Answer


I had a bit of an issue with a typo but was able to figure it out. Based on @user2974951's response and the GitHub page for glmnet, it looks like glmnet uses 1/n standardization rather than 1/(n-1). I edited the code and was able to get the same results.
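
For reference, the back-transformation used in the code below follows directly from fitting on standardized data: if $s_y$ and $s_{x_j}$ denote the standard deviations used for scaling and $\hat\beta^{std}_j$ the coefficients on the standardized scale, then

$$\hat\beta_j = \hat\beta^{std}_j \, \frac{s_y}{s_{x_j}}, \qquad \hat\beta_0 = \bar y - \sum_j \hat\beta_j \bar x_j.$$

Note that the 1/n versus 1/(n-1) choice cancels in the ratio $s_y / s_{x_j}$; what matters is that the data passed to cv.glmnet are scaled with the same 1/n convention glmnet uses internally, and that both fits are evaluated at the same $\lambda$ (here lambda.min).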

set.seed(123)
library(glmnet)
library(tidyverse)
data(QuickStartExample)
# standardize data
n <- length(y)
y_centered <- scale(y, center = TRUE, scale = FALSE)
y_sd <- sqrt(sum((y - mean(y)) ^ 2) / n)  # 1/n standard deviation, matching glmnet
ys <- y_centered / y_sd
x_centered <- scale(x, center = TRUE, scale = FALSE)
x_sd <- apply(x, 2, function(col) sqrt(sum((col - mean(col)) ^ 2) / n))  # 1/n standard deviations
xs <- sweep(x_centered, 2, x_sd, "/")  # divide each column by its 1/n SD
### glmnet using standardized data
set.seed(123)
cv_standardized <- cv.glmnet(xs, ys,
                             intercept = FALSE,
                             standardize = FALSE,
                             standardize.response = FALSE)
destandardized_coef <- coef(cv_standardized, cv_standardized$lambda.min)[-1] * y_sd / x_sd
personal_standardize <- c(mean(y) - sum(destandardized_coef * colMeans(x)),
                          destandardized_coef)

### Letting glmnet standardize data
set.seed(123)
cv_normal <- cv.glmnet(x, y, intercept = TRUE)
coef(cv_normal, cv_normal$lambda.min) %>% as.numeric()
glmnet_standardize <- coef(cv_normal, cv_normal$lambda.min)

Results:

cbind(personal_standardize, glmnet_standardize)
21 x 2 sparse Matrix of class "dgCMatrix"
            personal_standardize            1
(Intercept)          0.145832036  0.145832036
V1                   1.340981414  1.340981414
V2                   .            .          
V3                   0.708347139  0.708347139
V4                   .            .          
V5                  -0.848087765 -0.848087765
V6                   0.554823781  0.554823781
V7                   0.038519738  0.038519738
V8                   0.347221319  0.347221319
V9                   .            .          
V10                  0.010034050  0.010034050
V11                  0.186365264  0.186365264
V12                  .            .          
V13                  .            .          
V14                 -1.086006902 -1.086006902
V15                 -0.004444318 -0.004444318
V16                  .            .          
V17                  .            .          
V18                  .            .          
V19                  .            .          
V20                 -1.069942845 -1.069942845

You can even recover the de-standardized lambda from the personal standardization method:

> cv_standardized$lambda.min * y_sd
[1] 0.06284188
> cv_normal$lambda.min
[1] 0.06284188
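
This works because standardizing $y$ just rescales the squared-error loss, so the whole $\lambda$ sequence computed by glmnet is divided by $s_y$; multiplying back by y_sd should therefore recover the original path, not only lambda.min. A quick check (a sketch, assuming the objects fitted above):

all.equal(cv_standardized$lambda * y_sd, cv_normal$lambda)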

Also, calling set.seed(123) before each cv.glmnet fit did help me find the solution to this problem. Thank you for the help.

AW27
  • Hello @AW27, this is a really nice example of the standardize option in glmnet. I have one follow-up question: if I scale the dataset myself (for example, using the scale() function), should I then deliberately use "standardize = FALSE" to get "correct" coefficients? My goal is to rank the coefficients by their magnitudes. – rudgus51998 Jul 28 '21 at 14:38
  • Hi @rudgus51998, yes, you should use `standardize = FALSE`; however, you will want to put the coefficients back on the original scale to make sure they're interpretable. If you don't rescale them, you'll have standardised coefficients, which have a different interpretation. – AW27 Jul 28 '21 at 15:16
  • Thank you so much for your quick reply!! @AW27 Just like you did, right? (standardize = FALSE -> destandardized coef -> personal_standardize). – rudgus51998 Jul 28 '21 at 15:19
  • Or I can just pass the unstandardized dataset (the original one) directly to the glmnet function with the standardize = TRUE option. Then the coefficients are comparable by magnitude. Am I right, @AW27? – rudgus51998 Jul 28 '21 at 16:35
  • I think both of your approaches are correct. If you let `glmnet` standardize the predictors for you, it returns the coefficients on their original scale, which is like the example above (see the short ranking sketch below). – AW27 Jul 28 '21 at 18:01
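
To follow up on the ranking discussion in these comments: one way to compare predictors by magnitude (a sketch, reusing the cv_normal fit and the x and y data from the answer above) is to convert the returned coefficients back to the standardized scale before sorting:

cf <- coef(cv_normal, cv_normal$lambda.min)
std_coef <- as.numeric(cf)[-1] * apply(x, 2, sd) / sd(y)  # the 1/n vs 1/(n-1) factors cancel in this ratio
names(std_coef) <- rownames(cf)[-1]
sort(abs(std_coef), decreasing = TRUE)                    # rank predictors by standardized magnitude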