
I'm a software engineer learning machine learning, particularly through Andrew Ng's machine learning courses. While studying linear regression with regularization, I've found terms that are confusing:

  • Regression with L1 regularization or L2 regularization
  • LASSO
  • Ridge regression

So my questions:

  1. Is regression with L1 regularization exactly the same as LASSO?

  2. Is regression with L2 regularization exactly the same as Ridge Regression?

  3. How is "LASSO" used in writing? Should it be "LASSO regression"? I've seen usage like "the lasso is more appropriate".

If the answer is "yes" for 1 and 2 above, then why are there different names for these two terms? Do "L1" and "L2" come from computer science / math, and "LASSO" and "Ridge" from stats?

The use of these terms is confusing when I see posts like:

"What is the difference between L1 and L2 regularization?" (quora.com)

"When should I use lasso vs ridge?" (stats.stackexchange.com)

stackoverflowuser2010
  • Though I'm replying late. This comprehensive beginner's guide for Linear, Ridge and Lasso Regression will help the beginners to understand these terms clearly. See [here](https://www.analyticsvidhya.com/blog/2017/06/a-comprehensive-guide-for-linear-ridge-and-lasso-regression/) – Learner Jun 06 '18 at 11:57

1 Answer

  1. Yes.

  2. Yes.

  3. LASSO is actually an acronym (least absolute shrinkage and selection operator), so it ought to be capitalized, but modern writing is the lexical equivalent of Mad Max. On the other hand, Amoeba writes that even the statisticians who coined the term LASSO now use the lower-case rendering (Hastie, Tibshirani and Wainwright, Statistical Learning with Sparsity). One can only speculate as to the motivation for the switch. If you're writing for an academic press, they typically have a style guide for this sort of thing. If you're writing on this forum, either is fine, and I doubt anyone really cares.
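To make answers 1 and 2 concrete, the two penalized least-squares problems can be written side by side. The notation is an assumption for illustration (response $y$, design matrix $X$, coefficient vector $\beta$, penalty weight $\lambda \ge 0$); scaling conventions, such as a factor of $\frac{1}{2n}$ on the loss, vary between texts and software.

$$ \hat\beta^{\text{lasso}} = \arg\min_{\beta}\; \|y - X\beta\|_2^2 + \lambda\|\beta\|_1, \qquad \hat\beta^{\text{ridge}} = \arg\min_{\beta}\; \|y - X\beta\|_2^2 + \lambda\|\beta\|_2^2 $$

The loss is the same in both; only the penalty differs. The lasso penalizes the sum of absolute coefficients (the L1 norm), and ridge penalizes the sum of squared coefficients (the squared L2 norm).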

The $L$ notation is a reference to Minkowski norms and $L^p$ spaces. These just generalize the notion of taxicab and Euclidean distances to $p>0$ in the following expression:

$$ \|x\|_p=\left(|x_1|^p+|x_2|^p+\cdots+|x_n|^p\right)^{\frac{1}{p}} $$

Taxicab distance is the case $p=1$ and Euclidean distance is the case $p=2$, which is where the names "L1" and "L2" come from. Importantly, only $p\ge 1$ defines a metric distance; $0<p<1$ does not satisfy the triangle inequality, so it is not a distance by most definitions.
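The triangle-inequality point can be checked numerically. This is a minimal sketch in plain Python; the helper name `p_norm` and the two test vectors are hypothetical choices for illustration, not something taken from a particular library.

```python
def p_norm(x, p):
    """Return (sum_i |x_i|^p)^(1/p) for a sequence x and p > 0."""
    if p <= 0:
        raise ValueError("p must be positive")
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)


# Two unit vectors along different axes (illustrative choice).
x = [1.0, 0.0]
y = [0.0, 1.0]
s = [a + b for a, b in zip(x, y)]  # elementwise sum x + y = [1, 1]

for p in (0.5, 1, 2):
    lhs = p_norm(s, p)                 # ||x + y||_p
    rhs = p_norm(x, p) + p_norm(y, p)  # ||x||_p + ||y||_p
    holds = lhs <= rhs + 1e-12         # small tolerance for floating point
    print(f"p={p}: ||x+y||={lhs:.4f}  ||x||+||y||={rhs:.4f}  triangle holds: {holds}")
```

For these vectors the inequality holds at $p=1$ and $p=2$ but fails at $p=0.5$, where $\|x+y\|_p = 2^{2} = 4$ exceeds $\|x\|_p+\|y\|_p = 2$.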

I'm not sure when the connection between ridge and LASSO was realized.

As for why there are multiple names, it's simply that these methods were developed in different places at different times. A common theme in statistics is that concepts often have multiple names, one for each sub-field in which they were independently discovered (kernel functions vs covariance functions, Gaussian process regression vs Kriging, AUC vs $c$-statistic). Ridge regression should probably be called Tikhonov regularization, since I believe he has the earliest claim to the method. Meanwhile, LASSO was only introduced in 1996, much later than Tikhonov's "ridge" method!

Sycorax
  • Thanks for the info. If I use L1/L2 regularization for other machine learning algorithms like logistic regression, would the terms LASSO/Ridge still be applicable? – stackoverflowuser2010 Mar 07 '16 at 19:57
  • Also worthy of mention primarily because it has a cool name is the "nonnegative garrote." – dsaxton Mar 07 '16 at 19:57
  • @stackoverflowuser2010 Yes. @dsaxton I can't say that I'm familiar with it, but I believe that it was a precursor to the development of the LASSO. – Sycorax Mar 07 '16 at 19:59
  • +1. In the very recent *Statistical Learning with Sparsity* textbook, **Hastie, Tibshirani, and Wainwright use all-lower-case "lasso" everywhere** and also write the following (footnote on page 8): "A lasso is a long rope with a noose at one end, used to catch horses and cattle. In a figurative sense, the method “lassos” the coefficients of the model. In the original lasso paper (Tibshirani 1996), the name “lasso” was also introduced as an acronym for “Least Absolute Selection and Shrinkage Operator.”" (CC to @stackoverflowuser2010.) – amoeba Mar 07 '16 at 21:42
  • And they continue: "Pronunciation: in the US “lasso” tends to be pronounced “lass-oh” (oh as in goat), while in the UK “lass-oo.” In the OED (2nd edition, 1965): “lasso is pronounced lasoo by those who use it, and by most English people too.”" :-) – amoeba Mar 07 '16 at 21:45
  • @amoeba Thanks for the reference! I think the most that can be said is that conventions vary. Considering that the Writers Guild in New York literally split into two organizations over the issue of whether or not to include an apostrophe in its name, it's safe to say that writing conventions evoke strong emotions over small distinctions. The comparison to *Mad Max* is intended to be tongue-in-cheek, not a partisan stance. – Sycorax Mar 07 '16 at 21:55
  • (+1) As acronyms proper (those abbreviations pronounced as words) gain currency their capitalization tends to go by the board. It's been a while since I've seen 'RADAR' or 'LASER'. – Scortchi - Reinstate Monica Mar 08 '16 at 10:26
  • @Scortchi SCUBA too. Meanwhile we have people writing STATA and MATLAB as if they're acronyms. – shadowtalker Apr 14 '16 at 03:18
  • @ssdecontrol: [*MATLAB* is an abbreviation for "matrix laboratory"](http://uk.mathworks.com/help/matlab/learn_matlab/matrices-and-arrays.html), we're told by the company that sell it; we may as well defer to them. (But defer to Nick Cox on "Stata".) – Scortchi - Reinstate Monica Apr 14 '16 at 14:42
  • @Scortchi really it should be MatLab then, but if they want to be wrong I won't stop them – shadowtalker Apr 14 '16 at 16:01
  • @ssdecontrol Just abbreviate it ML. What could go wrong? ;-) – Sycorax Apr 14 '16 at 16:06
  • @ssdecontrol: Should "ANOVA" be "AnOVa" then? – Scortchi - Reinstate Monica Apr 14 '16 at 16:06
  • Stackexchange is the first place I have seen the term "Taxicab distance". – Sander Heinsalu Mar 02 '21 at 14:49
  • @SanderHeinsalu it might be the first place you’ve seen it, but it’s far from the only place where it’s used. Here it is on Wikipedia https://en.m.wikipedia.org/wiki/Taxicab_geometry – Sycorax Mar 02 '21 at 15:02