hal9001 vs cv.glmnet different coefficients, lambda stars - how to synchronize?

Question

I am working with hal9001, which calls cv.glmnet in the backend (if it is asked to do so). However, I get slightly different results from the two approaches, and I would like the returned coefficients to be identical. I don't know much about glmnet, and I was wondering whether someone who knows it better can see why they differ. Of note, the selected lambdas (lambda_star from hal9001 and lambda.min from cv.glmnet) are not the same.

# load the package and set a seed
library(hal9001)
library(glmnet)
#> Loading required package: Rcpp
#> hal9001 v0.2.8: The Scalable Highly Adaptive Lasso
set.seed(385971)
sim.fit <- function(family) {
  # simulate data
  n <- 50
  p <- 1
  x <- matrix(rnorm(n * p), n, p)
  y <- rbinom(n, 1, prob = 1 / (1 + exp(-(x * sin(x)))))

  # build the HAL basis and design matrix by hand
  md <- 1
  b <- hal9001::enumerate_basis(x, max_degree = md)
  ex <- hal9001::make_design_matrix(x, b)
  #ex = as.matrix(ex)
  #as.matrix(d$beta)
  #d = glmnet::glmnet(ex,y,lambda=0.03445705)

  # fit 1: cv.glmnet directly on the design matrix
  cv.glm <- cv.glmnet(ex, y, standardize = FALSE, intercept = FALSE, family = family)
  # fit 2: hal9001 with the glmnet backend
  hal.fit <- fit_hal(x, y, standardize = FALSE, intercept = FALSE, family = family,
                     fit_type = "glmnet", max_degree = md, return_lasso = TRUE)

  #cv.glm$glmnet.fit$beta
  hal_coeffs <- hal.fit$coefs

  glmnet_coeffs <- coef(cv.glm, s = "lambda.min")

  # side-by-side comparison of the two coefficient vectors
  cbind(hal_coeffs, glmnet_coeffs)
}
#data.frame(name = tmp_coeffs@Dimnames[[1]][tmp_coeffs@i + 1], coefficient = tmp_coeffs@x)
sim.fit("gaussian")

Output:

51 x 2 sparse Matrix of class "dgCMatrix"
                    1         1
(Intercept) .         .        
V1          0.5062566 0.5061190
V2          .         .        
V3          0.2066873 0.2115345


> hal.fit$lambda_star
[1] 0.05141281
> cv.glm$lambda.min
[1] 0.04684544

With family = "binomial", which is my intended use case, it's even worse. E.g., the coefficient pairs from sim.fit("binomial") with n = 500 are quite far apart.

Also, the glmnet call made by hal9001 is

Call: glmnet(x = x_basis, y = Y, lambda = lambda, family = family, penalty.factor = penalty_factor, standardize = FALSE, intercept = FALSE, lambdas = ..3)

Whereas the cv.glmnet call is

cv.glmnet(x = ex, y = y, standardize = FALSE, intercept = FALSE, 
    family = "gaussian")

Update: if lambda is the same, the two fits agree exactly, so the only difference is in how lambda is selected.
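A minimal sketch of that check using glmnet alone, under the assumption that both fits see the same data and the same default lambda path. The design matrix here is a hypothetical stand-in for `ex`, and the names `folds_1`, `folds_2` and `s_common` are made up for illustration. Two cross-validation runs with different folds may pick different values of lambda.min, but evaluated at one common lambda they should return the same coefficients, because the full-data path does not depend on the folds.

```r
library(glmnet)

set.seed(385971)
n <- 50
x <- matrix(rnorm(n * 3), n, 3)  # hypothetical stand-in for the HAL design matrix
y <- x[, 1] + rnorm(n)

# Two CV runs that differ only in their fold assignment
folds_1 <- sample(rep(seq_len(10), length.out = n))
folds_2 <- sample(rep(seq_len(10), length.out = n))
cv_1 <- cv.glmnet(x, y, standardize = FALSE, intercept = FALSE, foldid = folds_1)
cv_2 <- cv.glmnet(x, y, standardize = FALSE, intercept = FALSE, foldid = folds_2)

# Evaluate both fits at one common lambda
s_common <- cv_1$lambda.min
coef_1 <- as.matrix(coef(cv_1, s = s_common))
coef_2 <- as.matrix(coef(cv_2, s = s_common))
same_coefs <- isTRUE(all.equal(coef_1, coef_2))

print(same_coefs)
print(c(cv_1$lambda.min, cv_2$lambda.min))  # the selected lambdas may differ
```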

In the example I gave, the lambda grids are slightly different. But it is not only the lambda grid: I just checked, and even when the grids are identical, the selected lambdas still differ.
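One thing that could be tried here is fixing both the lambda grid and the fold assignment. The sketch below uses only glmnet, with a hypothetical stand-in design matrix and the made-up names `foldid` and `lambda_grid`. It shows that two cv.glmnet calls given the same `lambda` and `foldid` select the same lambda.min regardless of the RNG state. Whether `fit_hal` in a given hal9001 version accepts `foldid` and `lambda` arguments and forwards them unchanged is an assumption to check against `?fit_hal`.

```r
library(glmnet)

set.seed(385971)
n <- 50
x <- matrix(rnorm(n), n, 1)
y <- rbinom(n, 1, prob = 1 / (1 + exp(-(x * sin(x)))))

# Hypothetical stand-in design matrix; in the question this would be `ex`
ex <- cbind(x, x^2, x^3)

# One shared fold assignment and one shared lambda grid
foldid <- sample(rep(seq_len(10), length.out = n))
lambda_grid <- glmnet(ex, y, standardize = FALSE, intercept = FALSE)$lambda

cv_a <- cv.glmnet(ex, y, standardize = FALSE, intercept = FALSE,
                  lambda = lambda_grid, foldid = foldid)
set.seed(1)  # a different RNG state should no longer matter
cv_b <- cv.glmnet(ex, y, standardize = FALSE, intercept = FALSE,
                  lambda = lambda_grid, foldid = foldid)

same_lambda <- identical(cv_a$lambda.min, cv_b$lambda.min)
print(same_lambda)
```

If the two calls still disagree after both inputs are shared, the remaining difference would have to come from some other argument, such as the penalty.factor that appears in the hal9001 call.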

`glmnet` is random. To obtain reproducible results, you have to initialize it with the same random number seed. — whuber, Aug 22 '21 at 14:45
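To illustrate the comment: the randomness comes from how cv.glmnet assigns observations to folds when no `foldid` is supplied. A small sketch with simulated data and a hypothetical helper `run_cv` shows that resetting the seed immediately before each call reproduces lambda.min, while different seeds may select different values. This only helps if both code paths draw the same random numbers in the same order, which is not guaranteed when one of them goes through hal9001.

```r
library(glmnet)

set.seed(385971)
n <- 50
x <- matrix(rnorm(n * 3), n, 3)
y <- x[, 1] + rnorm(n)

# Hypothetical helper: reset the seed, then cross-validate
run_cv <- function(seed) {
  set.seed(seed)  # controls the random fold assignment inside cv.glmnet
  cv.glmnet(x, y, standardize = FALSE, intercept = FALSE)$lambda.min
}

same_seed <- c(run_cv(1), run_cv(1))
many_seeds <- vapply(1:20, run_cv, numeric(1))

print(same_seed[1] == same_seed[2])  # same seed, same folds, same lambda.min
print(length(unique(many_seeds)))    # number of distinct lambda.min values across seeds
```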
