
I am trying to understand the formulas for ridge regression, the lasso, and the elastic net. If I understand correctly, the only difference between these three shrinkage methods is the penalty term,

i.e. for ridge regression $\lambda\sum_{j = 1}^{p}\beta_j^2$,

for lasso $\lambda\sum_{j = 1}^{p}|\beta_j|$,

and for elastic net $\lambda\sum_{j = 1}^{p}(\alpha\beta_j^2+(1-\alpha)|\beta_j|)$.

According to Hastie 2009 - The Elements of Statistical Learning - page 63, the formula for ridge regression is

\begin{equation} \hat{\beta}^{ridge} = argmin_{\beta}\left\{\sum_{i=1}^{N}(y_i - \beta_0 - \sum_{j=1}^{p}x_{ij}\beta_j)^2 + \lambda\sum_{j=1}^{p}\beta_j^2 \right\}, \end{equation}

so we are adding the ridge penalty to a usual linear regression model.

Since I assume that the only difference between ridge regression, the lasso, and the elastic net is the penalty term, I would expect the formula for the lasso to be

\begin{equation} \hat{\beta}^{lasso} = argmin_{\beta}\left\{\sum_{i=1}^{N}(y_i - \beta_0 - \sum_{j=1}^{p}x_{ij}\beta_j)^2 + \lambda\sum_{j=1}^{p}|\beta_j| \right\} \end{equation}

and

\begin{equation} \hat{\beta}^{elnet} = argmin_{\beta}\left\{\sum_{i=1}^{N}(y_i - \beta_0 - \sum_{j=1}^{p}x_{ij}\beta_j)^2 + \lambda\sum_{j=1}^{p}(\alpha\beta_j^2+(1-\alpha)|\beta_j|) \right\} \end{equation}

for the elastic net respectively.

However, according to Hastie 2009 - The Elements of Statistical Learning - page 68, the formula for the lasso is

\begin{equation} \hat{\beta}^{lasso} = argmin_{\beta}\left\{\frac{1}{2}\sum_{i=1}^{N}(y_i - \beta_0 - \sum_{j=1}^{p}x_{ij}\beta_j)^2 + \lambda\sum_{j=1}^{p}|\beta_j| \right\} \end{equation}

with a $\frac{1}{2}$ in front of the first $\sum$. According to Wikipedia, there should instead be a $\frac{1}{N}$ in front of the first $\sum$.

Question 1: Why do I have to put a $\frac{1}{2}$ or a $\frac{1}{N}$ in front of the first $\sum$? What is the correct formula for the lasso?

Question 2: What is the correct formula for the elastic net? Do I also have to put a $\frac{1}{2}$ or a $\frac{1}{N}$ in front of the first $\sum$?
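To see whether the prefactor matters, note that multiplying the whole objective by a constant does not change its minimizer, so a constant in front of the RSS can be absorbed into $\lambda$. Here is a minimal numerical sketch of this (my own illustration with simulated data, not from the book), using a one-predictor lasso and `scipy.optimize.minimize_scalar`:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Simulated one-predictor data (hypothetical example)
rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)

lam = 50.0

def obj_plain(b, lam):
    # RSS + lambda * |b|  (no prefactor, as in the "expected" formula)
    return np.sum((y - b * x) ** 2) + lam * abs(b)

def obj_half(b, lam):
    # (1/2) * RSS + lambda * |b|  (Hastie's convention)
    return 0.5 * np.sum((y - b * x) ** 2) + lam * abs(b)

# Minimizing (1/2)*RSS + (lam/2)*|b| is the same problem as
# minimizing RSS + lam*|b|, since the objectives differ by a factor of 2.
b_plain = minimize_scalar(obj_plain, args=(lam,),
                          bounds=(-10, 10), method="bounded").x
b_half = minimize_scalar(obj_half, args=(lam / 2,),
                         bounds=(-10, 10), method="bounded").x

print(b_plain, b_half)  # the two minimizers agree up to solver tolerance
```

The same argument applies to the $\frac{1}{N}$ convention (use $\lambda/N$) and to the elastic net, since the penalty is also just scaled by the same constant.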

  • I would assume that you do not have to care: it only affects the value of $\lambda$... – Pascal Jun 27 '17 at 10:50
  • It's just a rescaling; a similar question was asked recently. See https://stats.stackexchange.com/questions/287395/1-2-on-lagrangian-equation-from-lasso/ and the reference to an older thread in there. – BloXX Jun 27 '17 at 12:26
  • Thanks a lot to both of you, that answers my question perfectly! – Joachim Schork Jun 27 '17 at 12:33
