3

For solving an unconstrained least-squares regression $$y = w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4 + \epsilon,$$ I use the normal equation: $$W^*=(X^{\top}X)^{-1}X^{\top}Y.$$

If I want to introduce a constraint on all of the parameters, e.g. the ridge-regression constraint $\sum_i w_i^2 \le c$, I can formulate the Lagrangian:

$$\mathcal{L}(W,\lambda) = \lVert Y - XW\rVert^2 + \lambda \lVert W\rVert^2,$$

and also obtain a matrix-form solution, given by: $$W^*=(X^{\top}X+\lambda I)^{-1}X^{\top}Y.$$
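(As a sketch of where that comes from, writing the penalty as $+\lambda\lVert W\rVert^2$ and setting the gradient of the convex objective to zero:

$$\nabla_W \mathcal{L} = -2X^{\top}(Y - XW) + 2\lambda W = 0 \;\Longrightarrow\; (X^{\top}X + \lambda I)\,W = X^{\top}Y \;\Longrightarrow\; W^* = (X^{\top}X + \lambda I)^{-1}X^{\top}Y.)$$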

The question is: how may I formulate the above equation if I only want to impose constraints on some of the coefficients, e.g. $w_1=w_2$, where $W=[w_1, w_2, \dots, w_d]^{\top}$?

I can get to the Lagrangian, which would be: $$ \mathcal{L}(W,\lambda) = \lVert Y - XW\rVert^2 - \lambda (w_1-w_2),$$ but I can't get to the matrix solution for $W^*$.
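(The furthest I get, as a sketch, is writing the constraint as $a^{\top}W = w_1 - w_2$ with $a = (1, -1, 0, \dots, 0)^{\top}$ and setting both partial derivatives of the Lagrangian to zero:

$$\nabla_W \mathcal{L} = -2X^{\top}(Y - XW) - \lambda a = 0, \qquad \frac{\partial \mathcal{L}}{\partial \lambda} = -(w_1 - w_2) = 0,$$

which stacks into a linear system in $(W, \lambda)$ jointly:

$$\begin{pmatrix} 2X^{\top}X & -a \\ a^{\top} & 0 \end{pmatrix} \begin{pmatrix} W \\ \lambda \end{pmatrix} = \begin{pmatrix} 2X^{\top}Y \\ 0 \end{pmatrix}.)$$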

I'm searching for a manual solution (i.e. without Python or R code). Thanks in advance, and sorry for the non-rigorous notation.

MrT77
  • 1
    With $w_1=w_2$, just look at the linear predictor $w_0+w_1 x_1 + w_2 x_2 + \dotsm = w_0 + w_1 (x_1+x_2) + \dotsm $! So just remove one of the two predictors and replace the other with their sum. Here is a similar but more complicated example: https://stats.stackexchange.com/questions/248779/linear-model-with-constraints-on-coefficients-in-terms-of-ratios/248898#248898 – kjetil b halvorsen Jun 27 '18 at 09:42
  • 1
    What could I do to impose a constraint that w1 should be "around" 1000 and w3 "around" 50? – MrT77 Jan 30 '19 at 05:59
  • Maybe a Bayes solution, with a prior distribution whose mean is 1000 for $w_1$ and 50 for $w_3$, and some prior variance expressing how certain you are about those restrictions. Or, not going that route, using regularization but with offsets in addition, that is, $w_1$ is represented by `w1+offset(Id(50*w1))` in `R`. – kjetil b halvorsen Jan 30 '19 at 07:29

1 Answer

0

Ridge is the exception, not the norm, in this case. A quadratic penalty in the Lagrangian still admits a closed-form solution for the regression, which is one of the reasons ridge is interesting. Regularizing with a different function will most likely not produce a closed-form solution. As an example, L1 regularization, the LASSO, has no closed-form solution for $W^*$.

While I could not show that your regularization has no closed form, I am persuaded that it does not, because this kind of coupling of coordinates usually yields nasty results in matrix theory (see the uses of the Vandermonde determinant for Gaussian random matrices and you will see what I mean by nasty). This does not make it useless, just as the LASSO is not useless; it only means you should look for a numerical, iterative, algorithmic solution rather than a pretty closed form.
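As a minimal sketch of that numerical route (the data here is synthetic and purely illustrative, not from the question): projected gradient descent minimizes $\lVert Y - XW\rVert^2$ while enforcing $w_1 = w_2$ by averaging the two coordinates after every step.

```python
import numpy as np

# Synthetic data whose true coefficients satisfy w_1 = w_2 (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
Y = X @ np.array([2.0, 2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=100)

def project(w):
    """Project onto the subspace {w : w_1 = w_2} by averaging the two coordinates."""
    w = w.copy()
    w[0] = w[1] = 0.5 * (w[0] + w[1])
    return w

# Projected gradient descent on the least-squares objective ||Y - XW||^2
W = np.zeros(4)
step = 0.5 / np.linalg.norm(X.T @ X, 2)  # 1/L for the gradient's Lipschitz constant L
for _ in range(5000):
    grad = -2 * X.T @ (Y - X @ W)
    W = project(W - step * grad)

print(W)  # the first two entries are (numerically) equal
```

Because the constraint set is a linear subspace and the objective is convex, this converges to the constrained minimizer — the same one you would get by merging the first two predictor columns, as suggested in the comments above.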

Ricardo