Earlier I asked whether grid fineness of $\lambda$ is related to overfitting in LASSO, ridge regression, and elastic net models. The answer I got was that it is not. Now I am asking:
Question: Is grid fineness of $\alpha$ in elastic net related to overfitting?
($\alpha$ is the parameter governing the balance between the $L_1$ and $L_2$ penalties.)
The argument in the answer to the linked question goes like this:
we definitely want to optimize our penalized likelihood function over values of $\lambda$, and it doesn't matter how many values of $\lambda$ we test, because the out-of-sample performance for a fixed data set and fixed partitioning is entirely deterministic. More to the point, the out-of-sample metric is not altered at all by how many values of $\lambda$ you test.
I would guess that the same applies to $\alpha$ in place of $\lambda$, and hence a finer grid can only help, not hurt. Is that right? (A small sketch of the $\lambda$ claim as I understand it follows below.)
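To make the $\lambda$ claim concrete, here is a rough sketch (scikit-learn on made-up toy data, not my actual setup; note that scikit-learn calls my $\lambda$ `alpha` and my $\alpha$ `l1_ratio`): for a fixed data set and a fixed CV partitioning, the CV error at each $\lambda$ is deterministic, so the best score over a finer grid that contains the coarse grid can never be worse.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import KFold, cross_val_score

# Made-up toy data, purely for illustration.
X, y = make_regression(n_samples=200, n_features=50, noise=5.0, random_state=0)
cv = KFold(n_splits=5, shuffle=True, random_state=0)  # fixed partitioning

def best_cv_mse(lambda_grid, mix=0.5):
    # scikit-learn's `alpha` is my lambda; `l1_ratio` is my alpha (L1/L2 mix).
    # For a fixed split, the CV error at each lambda is deterministic.
    scores = [
        -cross_val_score(
            ElasticNet(alpha=lam, l1_ratio=mix, max_iter=10_000),
            X, y, cv=cv, scoring="neg_mean_squared_error",
        ).mean()
        for lam in lambda_grid
    ]
    return min(scores)

coarse = np.logspace(-3, 1, 5)
fine = np.union1d(coarse, np.logspace(-3, 1, 50))  # superset of the coarse grid

# The minimum over a superset cannot exceed the minimum over the subset.
print("coarse grid best CV MSE:", best_cv_mse(coarse))
print("fine grid best CV MSE:  ", best_cv_mse(fine))
```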
(Perhaps I should note that when doing cross-validation, I fix $\alpha$ first and then search over a grid of $\lambda$ values.)
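For completeness, this is the kind of procedure I mean in the note above, again as a rough scikit-learn sketch on toy data (my actual grids and software may differ): fix $\alpha$ (scikit-learn's `l1_ratio`), then cross-validate over a grid of $\lambda$ (scikit-learn's `alphas`).

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV
from sklearn.model_selection import KFold

# Made-up toy data, purely for illustration.
X, y = make_regression(n_samples=200, n_features=50, noise=5.0, random_state=0)
cv = KFold(n_splits=5, shuffle=True, random_state=0)
lambda_grid = np.logspace(-3, 1, 100)  # grid of my lambda (scikit-learn's `alphas`)

for a in [0.1, 0.5, 0.9]:  # fix alpha (the L1/L2 mix) first ...
    model = ElasticNetCV(l1_ratio=a, alphas=lambda_grid, cv=cv, max_iter=10_000)
    model.fit(X, y)        # ... then cross-validate over the lambda grid
    best_cv_mse = model.mse_path_.mean(axis=-1).min()
    print(f"alpha={a}: best lambda={model.alpha_:.4f}, best CV MSE={best_cv_mse:.2f}")
```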