
Very basic question here, but I would like to understand (not mathematically) how adding a "penalty" (the sum of squared coefficients times a scalar) to the residual sum of squares can shrink large coefficients? Thanks!

TmSmth
For a graphical / visual intuition, have a look at these: https://stats.stackexchange.com/questions/350046/the-graphical-intuiton-of-the-lasso-in-case-p-2/351883#351883 , https://stats.stackexchange.com/questions/351631/graphical-path-coordinate-descent-in-case-of-semi-differentiable-functions-such/351689#351689 – Xavier Bourret Sicotte Jul 23 '18 at 09:55

1 Answer


Because your "penalty" representation of the minimization problem is just the Lagrangian form of a constrained optimization problem:

Assume centered variables. In both cases, lasso and ridge, your unconstrained target function is then the usual sum of squared residuals; i.e., given $p$ regressors you minimize $$RSS(\boldsymbol{\beta}) = \sum_{i=1}^n (y_i-(x_{i,1}\beta_1 +\dots +x_{i,p}\beta_p))^2$$ over all $\boldsymbol{\beta} =(\beta_1,\dots, \beta_p)$.

Now, in the case of ridge regression you minimize $RSS(\boldsymbol{\beta})$ subject to $$\sum_{j=1}^p\beta_j^2 \leq t_{ridge},$$ for some value of $t_{ridge}\geq 0$. For small values of $t_{ridge}$ it is impossible to recover the standard least squares solution, in which you minimize $RSS(\boldsymbol{\beta})$ without any constraint -- think of $t_{ridge}=0$: the only feasible solution is $\beta_1 = \dots = \beta_p = 0$.
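
To see this numerically, here is a minimal sketch (the data, the sample size and the grid of penalty values are all made up for illustration). It solves the penalized ridge problem $RSS(\boldsymbol{\beta}) + \lambda \sum_{j=1}^p \beta_j^2$ in closed form for increasing $\lambda$; the sum of squared coefficients, which is exactly the quantity bounded by $t_{ridge}$, shrinks toward zero as $\lambda$ grows:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up centered data: 100 observations, 3 regressors (purely illustrative)
n, p = 100, 3
X = rng.normal(size=(n, p))
X -= X.mean(axis=0)
y = X @ np.array([5.0, -3.0, 2.0]) + rng.normal(size=n)
y -= y.mean()

# Penalized ridge solution in closed form: beta = (X'X + lambda * I)^{-1} X'y
for lam in [0.0, 1.0, 10.0, 100.0, 1000.0]:
    beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
    print(f"lambda = {lam:7.1f}  beta = {np.round(beta_ridge, 3)}  "
          f"sum of squared coefficients = {np.sum(beta_ridge**2):.3f}")
```

With $\lambda = 0$ you recover the ordinary least squares fit; as $\lambda$ grows, the implied bound $t_{ridge} = \sum_j \hat\beta_j^2$ along this path gets smaller and smaller.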

On the other hand, in the case of the lasso, you minimize $RSS(\boldsymbol{\beta})$ under the constraint $$\sum_{j=1}^p|\beta_j| \leq t_{lasso},$$ for some value of $t_{lasso}\geq 0$.

Both constrained optimization problems can be equivalently formulated as unconstrained optimization problems; e.g., for the lasso you can equivalently minimize

$$\sum_{i=1}^n (y_i-(x_{i,1}\beta_1 +\dots +x_{i,p}\beta_p))^2 + \lambda_{lasso}\sum_{j=1}^p|\beta_j|.$$

Larger values of $\lambda_{lasso}$ correspond to smaller values of $t_{lasso}$: the heavier the penalty, the tighter the implicit constraint, and hence the more the coefficients are shrunk toward zero. The same reasoning applies to ridge with the sum of squared coefficients.
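
As a companion sketch (again with made-up data), the same effect for the lasso can be illustrated with scikit-learn's `Lasso`; note that scikit-learn minimizes $\frac{1}{2n} RSS + \alpha \sum_j |\beta_j|$, so its `alpha` plays the role of $\lambda_{lasso}$ only up to the $1/(2n)$ factor. Unlike ridge, the $\ell_1$ penalty does not just shrink the coefficients but sets some of them exactly to zero as the penalty grows:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Made-up centered data: 100 observations, 3 regressors (purely illustrative)
n, p = 100, 3
X = rng.normal(size=(n, p))
X -= X.mean(axis=0)
y = X @ np.array([5.0, -3.0, 2.0]) + rng.normal(size=n)
y -= y.mean()

# scikit-learn's Lasso minimizes (1/(2n)) * RSS + alpha * sum(|beta_j|),
# so alpha corresponds to lambda_lasso up to the 1/(2n) scaling.
for alpha in [0.1, 1.0, 3.0, 6.0]:
    model = Lasso(alpha=alpha, fit_intercept=False).fit(X, y)
    print(f"alpha = {alpha:4.1f}  beta = {np.round(model.coef_, 3)}  "
          f"sum of |beta_j| = {np.abs(model.coef_).sum():.3f}")
```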

BloXX