
Usually, when approaching LASSO, the best hyperparameter lambda is assumed to be lambda.1se (in the glmnet package), which is the largest lambda whose cross-validated metric (usually AUC, accuracy, or deviance) is within one standard error of the best (minimized/maximized) value, i.e. the cross-validated error at lambda.min plus one standard error.

This way the penalization is stronger and the model is less prone to overfitting.
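For concreteness, here is a minimal sketch (simulated data, binomial LASSO with deviance as the CV metric; all object and variable names are illustrative, not from the post) of how cv.glmnet() reports both values and how lambda.1se follows from the one-standard-error rule:

    library(glmnet)

    set.seed(1)
    n <- 200; p <- 50
    x <- matrix(rnorm(n * p), n, p)
    y <- rbinom(n, 1, plogis(x[, 1] - x[, 2]))

    cvfit <- cv.glmnet(x, y, family = "binomial", type.measure = "deviance")

    cvfit$lambda.min   # lambda with the minimum cross-validated deviance
    cvfit$lambda.1se   # largest lambda whose CV deviance is within 1 SE of that minimum

    # The one-standard-error rule spelled out from cv.glmnet's own output;
    # this should reproduce cvfit$lambda.1se.
    i_min <- which.min(cvfit$cvm)
    threshold <- cvfit$cvm[i_min] + cvfit$cvsd[i_min]
    max(cvfit$lambda[cvfit$cvm <= threshold])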

However, in caret I always see the assumption that the best model is the one corresponding to the minimum/maximum of the CV metric. Is that wise?

Is there a way to ask caret's train() function to select the model corresponding to lambda.1se?
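For reference, a typical setup of the kind described above might look like the sketch below (simulated data; object names are made up for the illustration). caret's trainControl() takes a selectionFunction argument that governs this choice: it defaults to "best" (pick the tuning values at the best cross-validated metric), while "oneSE" applies caret's built-in one-standard-error rule over the tuning grid, which is the closest analogue to glmnet's lambda.1se, although it is computed from caret's resampling results rather than by cv.glmnet().

    library(caret)
    library(glmnet)

    set.seed(1)
    n <- 200; p <- 50
    x <- matrix(rnorm(n * p), n, p)
    y <- factor(ifelse(x[, 1] - x[, 2] + rnorm(n) > 0, "yes", "no"))

    ctrl <- trainControl(method = "cv", number = 10,
                         classProbs = TRUE,
                         summaryFunction = twoClassSummary,
                         selectionFunction = "best")   # default: pick the best CV metric

    fit <- train(x, y,
                 method = "glmnet",
                 metric = "ROC",
                 trControl = ctrl,
                 tuneGrid = expand.grid(alpha = 1,     # alpha = 1 is the LASSO
                                        lambda = 10^seq(-4, 0, length.out = 50)))

    fit$bestTune   # the (alpha, lambda) pair at the maximum cross-validated ROC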

Thank you.

1 Answer


Usually, when approaching LASSO, the best hyperparameter lambda is assumed to be lambda.1se (in the glmnet package), which is the largest lambda whose cross-validated metric (usually AUC, accuracy, or deviance) is within one standard error of the best (minimized/maximized) value, i.e. the cross-validated error at lambda.min plus one standard error.

That isn't necessarily the case. The 1-standard-error rule (lambda.1se) can provide a more parsimonious model than the one at the minimum cross-validated error (lambda.min), but it's just a heuristic with no strong theoretical justification. In fact, in the illustration of cross-validation for LASSO in Section 6.6.2 of ISLR, cv.glmnet() is used with lambda.min as the criterion.

I use lambda.min as a default, and think of lambda.1se as an option only if you need to reduce the number of retained predictors even further.
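To make the parsimony point concrete, here is a minimal sketch (simulated data with only two truly active predictors; illustrative only, not from the answer) comparing how many predictors are retained at the two values of lambda:

    library(glmnet)

    set.seed(1)
    n <- 200; p <- 50
    x <- matrix(rnorm(n * p), n, p)
    y <- x[, 1] - x[, 2] + rnorm(n)

    cvfit <- cv.glmnet(x, y)   # Gaussian LASSO with 10-fold CV by default

    count_nonzero <- function(s) {
      cf <- as.matrix(coef(cvfit, s = s))
      sum(cf[-1, 1] != 0)      # drop the intercept row before counting
    }

    count_nonzero("lambda.min")   # usually retains more predictors
    count_nonzero("lambda.1se")   # usually retains fewer: the more parsimonious model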

EdM