
I have a (slightly simplified) model of the following form:

$Y=c_1X_1 + c_2X_2 + \epsilon$ subject to the constraint $0\leq c_1\leq c_2$.

The distribution of $\epsilon$ is actually not important to me - fitting the curve by least squares is all I care about (for a volatility surface parameterization).

What is the best approach to solving this with deterministic results (i.e. not using iterative techniques that depend on starting points, etc.)?

I am working in C++ and I have Armadillo for my linear algebra library.

I am comfortable with Python's libraries as well, and R if necessary, but the ultimate implementation will be in C++.

  • Welcome to our site. As described in the answer at https://stats.stackexchange.com/questions/29279, write your model as $$Y = c_1(X_1 + X_2) + (c_2-c_1)X_2 + \epsilon$$ subject to the constraint $(c_1,c_2-c_1)\ge(0,0)$ and search the solutions for "non-negative least squares" problems given at https://stats.stackexchange.com/search?q=non-negative+least+squares. – whuber Aug 28 '18 at 18:46
  • Thx - I knew it had to be something easy like this. I'll be back in the swing of things soon - been a while since doing this kind of stuff. I have to solve this problem a few thousand times a second, so we shall see how that goes. len(Y) is usually between 10 and 100. If speed is a problem, I can do the usual OLS, and it would be OK if the constraint sometimes fails as long as enough of them do not - usually that is the case. – FinanceGuyThatCantCode Aug 28 '18 at 18:59
  • When the OLS solution doesn't work, you know at least one constraint holds. It isn't too inefficient just to solve the resulting three possible models $Y=c_1(X_1+X_2)+\epsilon,$ $Y=c_2 X_2 + \epsilon,$ and $Y=\epsilon$ and select the one with least sum of squared residuals. You could probably do up to a million of these a second (per core) by coding the procedure in a compiled language and perhaps a billion a second using a couple of high-end coprocessors, assuming you have the bandwidth to get all the data in quickly enough! – whuber Aug 28 '18 at 19:16 (a C++ sketch of this procedure appears after the comments)
  • Good idea - when my data is good, the $R^2$ does need to be better than 0.99, so what I do is test a model such as $Y=c_1\rho X_1 + c_1 X_2$ where $|\rho|\leq 1$. I test 20 or 40 different $\rho$ from -1 to 1. (Note that this is like solving the original problem where $0\leq |c_1| \leq c_2$, which is what I am really doing.) – FinanceGuyThatCantCode Aug 28 '18 at 19:27
  • That looks like a hard way to do it. Why not just use OLS to find $c_1$ and $c_2$ directly? – whuber Aug 28 '18 at 19:28
  • I do use OLS to get the $c_1$ with the fixed $\rho$ - the trouble is the OLS with $c_1$ and $c_2$ may imply a $\rho$ that is out of bounds. – FinanceGuyThatCantCode Aug 28 '18 at 19:29
  • See https://stats.stackexchange.com/questions/136563/linear-regression-with-individual-constraints-in-r/136602#136602 for a working example. – whuber Aug 28 '18 at 19:30
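
For the two-parameter case here, whuber's procedure from the comments can be coded directly: fit unconstrained OLS first and, if the estimates violate $0\le c_1\le c_2$, fall back to the boundary models and keep the feasible fit with the smallest sum of squared residuals. Below is a minimal C++/Armadillo sketch of that idea (the function names are illustrative, and degenerate inputs such as an all-zero regressor are not handled):

```cpp
#include <algorithm>
#include <armadillo>
#include <utility>

// Sum of squared residuals for given coefficients.
static double ssr(const arma::vec& y, const arma::vec& x1,
                  const arma::vec& x2, double c1, double c2)
{
    const arma::vec r = y - c1 * x1 - c2 * x2;
    return arma::dot(r, r);
}

// Minimises ||y - c1*x1 - c2*x2||^2 subject to 0 <= c1 <= c2 by comparing
// the unconstrained OLS fit with the boundary candidates.
std::pair<double, double> fit_constrained(const arma::vec& y,
                                          const arma::vec& x1,
                                          const arma::vec& x2)
{
    std::pair<double, double> best{0.0, 0.0};      // candidate: c1 = c2 = 0
    double best_ssr = ssr(y, x1, x2, 0.0, 0.0);

    // Candidate 1: unconstrained OLS, kept only if it is feasible.
    const arma::mat X = arma::join_rows(x1, x2);
    const arma::vec b = arma::solve(X.t() * X, X.t() * y);
    if (b(0) >= 0.0 && b(0) <= b(1)) {
        const double s = ssr(y, x1, x2, b(0), b(1));
        if (s < best_ssr) { best_ssr = s; best = {b(0), b(1)}; }
    }

    // Candidate 2: boundary c1 = c2 = c >= 0, i.e. Y = c*(X1 + X2) + eps.
    const arma::vec z = x1 + x2;
    const double c_eq = std::max(0.0, arma::dot(z, y) / arma::dot(z, z));
    {
        const double s = ssr(y, x1, x2, c_eq, c_eq);
        if (s < best_ssr) { best_ssr = s; best = {c_eq, c_eq}; }
    }

    // Candidate 3: boundary c1 = 0, c2 >= 0, i.e. Y = c2*X2 + eps.
    const double c2_only = std::max(0.0, arma::dot(x2, y) / arma::dot(x2, x2));
    {
        const double s = ssr(y, x1, x2, 0.0, c2_only);
        if (s < best_ssr) { best_ssr = s; best = {0.0, c2_only}; }
    }

    return best;
}
```

Because the feasible region is a two-dimensional cone, the constrained minimiser is either the interior OLS solution or lies on one of the two boundary rays (or at the origin), so this candidate comparison returns the exact constrained least-squares fit rather than an approximation.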

1 Answer


Reformulate your model to $$ Y = c_1(X_1+X_2) + (c_2-c_1) X_2 + \epsilon $$ (and probably an intercept, which you have left out). Then the constraints are $0\le c_1$ and $0\le c_2-c_1$, and you can use an implementation of non-negative least squares. In R there is the package colf; see an example in Looking for function to fit sigmoid-like curve. See also Linear Regression with individual constraints in R.

(This approach will be much simpler than what you propose in comments.)
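
In C++ the same reformulation and the back-transform to $(c_1, c_2)$ could look roughly like the sketch below; Armadillo has no built-in non-negative least squares, so `nnls_solve` is only a placeholder for whatever NNLS routine (e.g. a Lawson-Hanson implementation) you plug in:

```cpp
#include <armadillo>
#include <utility>

// Placeholder (not a real Armadillo function): minimises ||Z*b - y||^2
// subject to b >= 0 elementwise, using any NNLS implementation you have.
arma::vec nnls_solve(const arma::mat& Z, const arma::vec& y);

std::pair<double, double> fit_via_nnls(const arma::vec& y,
                                       const arma::vec& x1,
                                       const arma::vec& x2)
{
    // Reparameterise: b1 = c1 and b2 = c2 - c1, both constrained to be >= 0.
    const arma::vec z = x1 + x2;
    const arma::mat Z = arma::join_rows(z, x2);
    const arma::vec b = nnls_solve(Z, y);

    // Map back to the original coefficients: c1 = b1, c2 = b1 + b2.
    return {b(0), b(0) + b(1)};
}
```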

kjetil b halvorsen