NOT A DUPLICATE

To those who marked this question as a duplicate of the post I mentioned in my original post: this is not a duplicate, because the correlation obtained with an intercept-only linear model would be `NaN` or `0`, not `0.1` as reported in my post. Furthermore, I am asking how to use the more common (sum-of-squares) formulation of R squared, and that post provides no answer to this.
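The point that a correlation-based measure is undefined for an intercept-only model can be checked with a minimal base-R sketch. The simulated data below are hypothetical: an intercept-only model predicts one constant (the training mean) for every held-out observation, and `cor()` cannot be computed against a zero-variance vector.

```r
# Hypothetical simulated data, for illustration only
set.seed(1)
train_y <- rnorm(80)
test_y  <- rnorm(20)

# Intercept-only model: the same prediction (training mean) for every test case
pred <- rep(mean(train_y), length(test_y))

# cor() warns that the standard deviation is zero and returns a missing value,
# so a squared-correlation R squared is undefined here rather than 0
r <- suppressWarnings(cor(test_y, pred))
is.na(r)
```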
Original post
I used `caret` with `glmnet` and selected `repeatedcv` (10-fold, 5 repeats) to choose the best `glmnet` parameters (`alpha` and `lambda`) in terms of `Rsquared` (i.e. better = larger `Rsquared`).
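```r
library(glmnet)
library(caret)
# X: predictor matrix, Y: numeric response (defined elsewhere)
# Tuning grid: 20 alpha values x 200 lambda values
alpha.grid <- (1:20) * 0.05
lambda.grid <- 10^seq(4, -4, length.out = 200)
EN.param.grid <- expand.grid(.alpha = alpha.grid, .lambda = lambda.grid)
# 10-fold cross-validation, repeated 5 times
train.params <- trainControl(method = "repeatedcv", number = 10, repeats = 5)
# Select alpha and lambda by the largest resampled Rsquared
EN.fit <- train(x = X, y = Y, method = "glmnet", tuneGrid = EN.param.grid,
                trControl = train.params, standardize = TRUE, metric = "Rsquared")
```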
Although the best model has no remaining predictors (i.e. it is intercept-only), the associated `Rsquared` reported by `caret` (in the `Rsquared` column of the `EN.fit$results` data frame) is not 0 as I would expect, but 0.1 in my case. This raises the following questions:
- Is the test `Rsquared` in `caret` calculated in terms of correlation instead of with the more traditional sum-of-squares method (as suggested by this post)?
- How can I correct this behavior to obtain a test `Rsquared` of 0 for an intercept-only model (as many would expect), while still being able to correctly optimize a `glmnet` parameter search on `Rsquared`?
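To make the contrast in the first question concrete, here is a minimal base-R sketch of the two definitions, assuming `obs` are held-out responses and `pred` the model's predictions for them; the helper names `r2_cor()` and `r2_ss()` and the simulated data are hypothetical. The squared correlation ignores any shift or rescaling of the predictions, while the sum-of-squares version, 1 - SS_res / SS_tot, penalizes them. Note that for an intercept-only model evaluated on held-out data, the sum-of-squares version is not exactly 0 either: with every prediction equal to the training mean, SS_res = SS_tot + n * (test mean - training mean)^2, so the value is at most 0 and equals 0 only when the two means coincide.

```r
# Squared Pearson correlation between observed and predicted values.
# Returned as NA when the predictions are constant (correlation undefined).
r2_cor <- function(obs, pred) {
  if (length(unique(pred)) < 2) return(NA_real_)
  cor(obs, pred)^2
}

# Traditional sum-of-squares R squared: 1 - SS_res / SS_tot,
# with SS_tot taken around the mean of the held-out observations.
r2_ss <- function(obs, pred) {
  ss_res <- sum((obs - pred)^2)
  ss_tot <- sum((obs - mean(obs))^2)
  1 - ss_res / ss_tot
}

# Hypothetical simulated data, for illustration only
set.seed(42)
train_y <- rnorm(80, mean = 5)
test_y  <- rnorm(20, mean = 5)

# Predictions perfectly correlated with the truth but shifted by 10
pred_shifted <- test_y + 10
r2_cor(test_y, pred_shifted)  # 1 up to rounding: correlation ignores the shift
r2_ss(test_y, pred_shifted)   # strongly negative: the shift is penalized

# Intercept-only model: every test prediction is the training mean
pred_intercept <- rep(mean(train_y), length(test_y))
r2_cor(test_y, pred_intercept)  # NA
r2_ss(test_y, pred_intercept)   # at most 0
```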