Questions tagged [regularization]

Inclusion of additional constraints (typically a penalty on model complexity) in the model fitting process, used to prevent overfitting and/or improve predictive accuracy.

Regularization refers to the inclusion of additional components in the model fitting process that are used to prevent overfitting and/or stabilize parameter estimates.

Parametric approaches to regularization typically add a term that penalizes model complexity to the training-error or maximum-likelihood objective, alongside the standard data-misfit term; ridge regression and the LASSO are the canonical examples. In the framework of Bayesian MAP estimation, such a penalty can be interpreted as arising from a prior on the parameter vector (e.g. a Gaussian prior for the ridge penalty, a Laplace prior for the LASSO).
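
A minimal numerical sketch of the two canonical penalties (illustrative only: the simulated data, the penalty weight `lam`, and the hand-rolled coordinate-descent loop are assumptions made for this example, not part of the tag description):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]              # only three informative predictors
y = X @ beta_true + rng.normal(scale=0.5, size=n)

lam = 0.5                                     # illustrative penalty weight

# Ridge: minimize ||y - Xb||^2 + lam * ||b||_2^2  (closed-form solution).
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# LASSO: minimize (1/(2n)) ||y - Xb||^2 + lam * ||b||_1  (no closed form);
# a few sweeps of coordinate descent with soft-thresholding.
def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

beta_lasso = np.zeros(p)
for _ in range(200):
    for j in range(p):
        r = y - X @ beta_lasso + X[:, j] * beta_lasso[j]   # partial residual
        beta_lasso[j] = soft_threshold(X[:, j] @ r / n, lam) / (X[:, j] @ X[:, j] / n)

print(np.round(beta_ridge, 2))   # every coefficient shrunk, none exactly zero
print(np.round(beta_lasso, 2))   # uninformative coefficients set exactly to zero
```

The soft-thresholding step induced by the $L_1$ penalty is what drives some coefficients exactly to zero, which is the variable-selection behaviour asked about in several of the questions below.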

Non-parametric regularization techniques include dropout (used in deep learning) and the truncated SVD (used in linear least squares).
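
For the truncated-SVD case, a short illustrative sketch (the helper `tsvd_solve`, the rank cutoff `k`, and the simulated ill-conditioned design are assumptions made for this example):

```python
import numpy as np

def tsvd_solve(X, y, k):
    """Least-squares solution using only the k largest singular values of X.

    Discarding the small singular values stabilizes the solution when X is
    ill-conditioned, at the cost of some bias (a form of regularization).
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    return Vt_k.T @ ((U_k.T @ y) / s_k)

# Ill-conditioned design: two nearly collinear columns.
rng = np.random.default_rng(1)
x1 = rng.normal(size=50)
X = np.column_stack([x1, x1 + 1e-6 * rng.normal(size=50), rng.normal(size=50)])
y = X @ np.array([1.0, 1.0, 2.0]) + 0.1 * rng.normal(size=50)

print(np.linalg.lstsq(X, y, rcond=None)[0])  # unstable: huge, offsetting coefficients on the collinear pair
print(tsvd_solve(X, y, k=2))                 # truncated SVD: stable coefficients close to [1, 1, 2]
```

Dropping the smallest singular values plays the same stabilizing role as the ridge penalty above: both damp the directions in parameter space that the data barely constrain.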

Synonyms include: penalization, shrinkage methods, and constrained fitting.

1283 questions
141 votes · 8 answers

Why L1 norm for sparse models

I am reading books about linear regression. There are some sentences about the L1 and L2 norm. I know the formulas, but I don't understand why the L1 norm enforces sparsity in models. Can someone give a simple explanation?
Yongwei Xing
114 votes · 4 answers

Why does the Lasso provide Variable Selection?

I've been reading Elements of Statistical Learning, and I would like to know why the Lasso provides variable selection and ridge regression doesn't. Both methods minimize the residual sum of squares and have a constraint on the possible values of…
Zhi Zhao
88 votes · 3 answers

What is the lasso in regression analysis?

I'm looking for a non-technical definition of the lasso and what it is used for.
Paul Vogt
86 votes · 6 answers

Why is the L2 regularization equivalent to Gaussian prior?

I keep reading this and intuitively I can see this, but how does one go from L2 regularization to saying that this is a Gaussian prior analytically? Same goes for saying L1 is equivalent to a Laplace prior. Any further references would be great.
Anonymous
81 votes · 5 answers

What is regularization in plain English?

Unlike other articles, I found the Wikipedia entry for this subject unreadable for a non-math person (like me). I understood the basic idea, that you favor models with fewer rules. What I don't get is how you get from a set of rules to a…
Meh
74 votes · 5 answers

Unified view on shrinkage: what is the relation (if any) between Stein's paradox, ridge regression, and random effects in mixed models?

Consider the following three phenomena. Stein's paradox: given data from a multivariate normal distribution in $\mathbb R^n, \: n\ge 3$, the sample mean is not a very good estimator of the true mean. One can obtain an estimate with lower mean…
amoeba
70 votes · 6 answers

Why is multicollinearity not checked in modern statistics/machine learning

In traditional statistics, while building a model, we check for multicollinearity using methods such as estimates of the variance inflation factor (VIF), but in machine learning, we instead use regularization for feature selection and don't seem to…
69 votes · 5 answers

What problem do shrinkage methods solve?

The holiday season has given me the opportunity to curl up next to the fire with The Elements of Statistical Learning. Coming from a (frequentist) econometrics perspective, I'm having trouble grasping the uses of shrinkage methods like ridge…
Charlie
66 votes · 3 answers

Why does ridge estimate become better than OLS by adding a constant to the diagonal?

I understand that the ridge regression estimate is the $\beta$ that minimizes the residual sum of squares plus a penalty on the size of $\beta$: $$\beta_\mathrm{ridge} = (\lambda I_D + X'X)^{-1}X'y = \operatorname{argmin}\big[ \text{RSS} + \lambda…
Heisenberg
64 votes · 6 answers

Is ridge regression useless in high dimensions ($n \ll p$)? How can OLS fail to overfit?

Consider a good old regression problem with $p$ predictors and sample size $n$. The usual wisdom is that the OLS estimator will overfit and will generally be outperformed by the ridge regression estimator: $$\hat\beta = (X^\top X + \lambda I)^{-1}X^\top…
amoeba
60 votes · 7 answers

Why is the regularization term *added* to the cost function (instead of multiplied etc.)?

Whenever regularization is used, it is often added onto the cost function such as in the following cost function. $$ J(\theta)=\frac 1 2(y-\theta X^T)(y-\theta X^T)^T+\alpha\|\theta\|_2^2 $$ This makes intuitive sense to me since minimize the cost…
grenmester
58 votes · 3 answers

Why does shrinkage work?

In order to solve problems of model selection, a number of methods (LASSO, ridge regression, etc.) will shrink the coefficients of predictor variables towards zero. I am looking for an intuitive explanation of why this improves predictive ability.…
56 votes · 5 answers

How to derive the ridge regression solution?

I am having some issues with the derivation of the solution for ridge regression. I know the regression solution without the regularization term: $$\beta = (X^TX)^{-1}X^Ty.$$ But after adding the L2 term $\lambda\|\beta\|_2^2$ to the cost function,…
user34790
52 votes · 3 answers

Regularization methods for logistic regression

Regularization using methods such as Ridge, Lasso, ElasticNet is quite common for linear regression. I wanted to know the following: Are these methods applicable for logistic regression? If so, are there any differences in the way they need to be…
Tapan Khopkar
50 votes · 3 answers

Why do we only see $L_1$ and $L_2$ regularization but not other norms?

I am just curious why we usually see only $L_1$- and $L_2$-norm regularization. Are there proofs of why these are better?
user10024395