This constraint confines the coefficient vector to a non-linear surface in $\mathbb{R}^n$ (not a linear subspace), so the problem is not a standard linear one. I can think of a couple of possible techniques; I can guarantee neither computational efficiency nor convergence, but they could well work in practice.
Meta-minimization
If you want to constrain the coefficients in this manner for a fixed $a$, you can do so trivially by redefining the model matrix $X$:
\begin{align}
Xb &= \left[ \begin{array}{ccc} X_1 & X_2 & X_3 \end{array} \right] \left[ \begin{array}{c} c \\ a c \\ f \end{array} \right] \\[8pt]
&= X_1 c + a X_2 c + X_3 f \\[5pt]
&= (X_1 + a X_2) c + X_3 f \\[8pt]
&= \left[ \begin{array}{cc} X_1 + a X_2 & X_3 \end{array} \right] \left[ \begin{array}{c} c \\ f \end{array} \right] \\
\end{align}
where $X_1$ and $X_2$ are $N \times p_a$ matrices, $c$ has length $p_a$, $X_3$ is an $N \times p_b$ matrix, and $f$ has length $p_b$. Define the $N \times (p_a + p_b)$ matrix $X_a = \left[ \begin{array}{cc} X_1 + a X_2 & X_3 \end{array} \right]$. For a fixed $a$ the least-squares estimate is
$$
\left[ \begin{array}{c} \hat c_a \\ \hat f_a \end{array} \right] = \left( {X_a}^T X_a \right)^{-1} X_a^T Y
$$
The corresponding vector of residuals is
$$
r_a = Y - \hat Y_a = Y - X_a \left[ \begin{array}{c} \hat c_a \\ \hat f_a \end{array} \right] = \left( I - X_a \left( {X_a}^T X_a \right)^{-1} X_a^T \right) Y
$$
and the sum of squares is
$$
r_a^T r_a = Y^T \left( I - X_a \left( {X_a}^T X_a \right)^{-1} X_a^T \right) Y
$$
Out-of-the-box regression tools will calculate the residual sum of squares for you. Then you need "only" minimize this value over $a$. That won't be computationally simple, however. With a bounded range for $a$ this could be done entirely numerically. For an unbounded range, you might need to make headway in analytically minimizing the residual sum of squares, and/or in limiting the amount of computation needed for each candidate $a$ (for example, it is possible that the SVD of $X_a$ could be related to that of $X$, since the column space of $X_a$ is a subspace of the column space of $X$).
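For concreteness, here is a minimal sketch of the fixed-$a$ fit in R, mirroring the formulas above (the function name fit_fixed_a is mine, not part of any package):

# Constrained least squares for a fixed a: build X_a = [X1 + a*X2, X3],
# solve the normal equations, and return the residual sum of squares.
fit_fixed_a <- function(a, X1, X2, X3, Y) {
  Xa   <- cbind(X1 + a * X2, X3)
  beta <- solve(crossprod(Xa), crossprod(Xa, Y))  # (c_hat_a, f_hat_a)
  r    <- Y - Xa %*% beta                         # residual vector r_a
  list(coefficients = beta, rss = sum(r * r))     # r_a' r_a
}

Minimizing fit_fixed_a(a, ...)$rss over $a$ then gives the constrained estimate; an intercept can be included by appending a column of ones to Xa.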
Iteration
Given a starting value $c_0$ for the vector $c$, you could iteratively perform least-squares fits in a larger space. Start with $a_0=0$, and with $c_0$ and $f_0$ determined from a least-squares fit to
$$
Y = \left[ \begin{array}{cc} X_1 & X_3 \end{array} \right] \left[ \begin{array}{c} c_0 \\ f_0 \end{array} \right],
$$
the next step could be determined from a least squares fit to
$$
Y = \left[ \begin{array}{ccc} X_1 & X_3 & X_2 c_0 \end{array} \right] \left[ \begin{array}{c} c_1 \\ f_1 \\ a_1 \end{array} \right].
$$
In general, for the $k$-th step,
$$
Y = \left[ \begin{array}{ccc} X_1 & X_3 & X_2 c_{k-1} \end{array} \right] \left[ \begin{array}{c} c_k \\ f_k \\ a_k \end{array} \right].
$$
Of course, the crucial question here is whether this will converge. If you expect $a$ to be small in comparison to the other coefficients, it's a good bet that this will converge, but in general I have no guarantee -- you'd have to test it with artificial data similar to your problem.
Artificial example
In R, for the artificial data
N <- 10000
k <- 3
X1 <- matrix(rnorm(N*k), N, k)
X2 <- matrix(rnorm(N*k, mean=1, sd=2), N, k)
X3 <- matrix(rnorm(N*k, mean=0.5, sd=3), N, k)
c <- c(1, -2, 3)        # true c
f <- c(0.5, 0.5, -0.5)  # true f
a <- 17                 # true a
Y <- X1 %*% c + a * X2 %*% c + X3 %*% f + rnorm(N, mean=3, sd=0.5)
the iteration works pretty well even though $a$ is larger than the other parameters. The trivial implementation
num_iters <- 10
fit <- glm(Y ~ X1 + X3)                       # starting fit, equivalent to a_0 = 0
for (iter in 1:num_iters) {
  clast <- fit$coefficients[2:4]              # current estimate of c
  fit <- glm(Y ~ X1 + X3 + I(X2 %*% clast))   # coefficient on the last term estimates a
  print(paste("ITERATION", iter, ":", fit$coefficients[8]))
}
print(summary(fit))
produces the output
[1] "ITERATION 1 : 8.87355028403305"
[1] "ITERATION 2 : 14.3757107816677"
[1] "ITERATION 3 : 16.4456275628182"
[1] "ITERATION 4 : 16.5781670165545"
[1] "ITERATION 5 : 17.0033374400752"
[1] "ITERATION 6 : 17.1368037041068"
[1] "ITERATION 7 : 16.9283985018681"
[1] "ITERATION 8 : 17.1041965208343"
[1] "ITERATION 9 : 16.9957102474476"
[1] "ITERATION 10 : 17.0542110874913"
> print(summary(fit))
Call:
glm(formula = Y ~ X1 + X3 + I(X2 %*% clast))
Deviance Residuals:
Min 1Q Median 3Q Max
-1.94524 -0.36524 -0.00214 0.35935 2.18939
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.8458762 0.0057851 491.9 <2e-16 ***
X11 1.0026376 0.0053831 186.3 <2e-16 ***
X12 -1.9965402 0.0053947 -370.1 <2e-16 ***
X13 2.9933836 0.0054192 552.4 <2e-16 ***
X31 0.4997454 0.0017970 278.1 <2e-16 ***
X32 0.5000780 0.0017966 278.3 <2e-16 ***
X33 -0.5009818 0.0017785 -281.7 <2e-16 ***
I(X2 %*% clast) 17.0542111 0.0007385 23093.8 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for gaussian family taken to be 0.2917083)
Null deviance: 1.5593e+08 on 9999 degrees of freedom
Residual deviance: 2.9147e+03 on 9992 degrees of freedom
AIC: 16069
Number of Fisher Scoring iterations: 2
I haven't implemented a generic method for the meta-minimization solution, but the following code (slowly) produces a plot of the RSS versus the value of $a$:
fixedfit <- function(a) {
  fit <- glm(Y ~ I(X1 + a*X2) + X3)   # fit with the constraint imposed for this fixed a
  rss <- sum(fit$residuals^2)
  return(rss)
}
avec <- seq(-50, 50, 0.5)
rssvec <- sapply(avec, fixedfit)      # residual sum of squares at each candidate a
plot(avec, log(rssvec))

A reasonable assumption is that the only minimum is near the true value of 17, but this is entirely unproven, for this problem let alone the general case. Naive minimization techniques might be sensitive to the starting value: a small positive starting value will likely reach the true minimum, but a small negative starting value will likely get pushed off toward minus infinity.
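If you are willing to restrict $a$ to a bounded interval, a derivative-free one-dimensional minimizer avoids choosing a starting value altogether. A minimal sketch using stats::optimize on the fixedfit function above (assuming the chosen interval really does bracket the minimum and the RSS is reasonably well behaved on it):

# Numerically minimize the RSS over a bounded interval; optimize() needs
# only an interval, not a starting value. If the RSS is not unimodal on
# the interval, it may return a local rather than the global minimum.
opt <- optimize(fixedfit, interval = c(0, 50))
opt$minimum    # estimated a (should land near the true value of 17 here)
opt$objective  # RSS at that a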
Non-orthogonal artificial data
If the data X1, X2, and X3 are not orthogonal, e.g.
X <- matrix(rnorm(N*k*3),N,k*3)
X1 <- X[,1:3]
X2 <- 2*X[,1:3]-X[,4:6]
X3 <- X[,1:3]+X[,4:6]+X[,7:9]
then the iteration method can fail to converge:
[1] "ITERATION 1 : 0.391215448317999"
[1] "ITERATION 2 : 15.1536981308529"
[1] "ITERATION 3 : 0.455715181986398"
[1] "ITERATION 4 : 0.39480376574178"
[1] "ITERATION 5 : 0.440481548104305"
[1] "ITERATION 6 : 0.395283350706838"
[1] "ITERATION 7 : 0.431519796371501"
[1] "ITERATION 8 : 0.39570583204234"
[1] "ITERATION 9 : 0.425599434752256"
[1] "ITERATION 10 : 0.396082244739129"
> print(summary(fit))
Call:
glm(formula = Y ~ X1 + X3 + I(X2 %*% clast))
Deviance Residuals:
Min 1Q Median 3Q Max
-85.727 -13.926 0.191 14.172 89.634
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.983438 0.206276 14.463 < 2e-16 ***
X11 69.934809 0.283017 247.104 < 2e-16 ***
X12 2.915685 0.519870 5.608 2.1e-08 ***
X13 23.730564 0.597014 39.749 < 2e-16 ***
X31 -13.380980 0.148947 -89.837 < 2e-16 ***
X32 -0.371261 0.171066 -2.170 0.03 *
X33 -4.448586 0.180890 -24.593 < 2e-16 ***
I(X2 %*% clast) 0.396082 0.002013 196.731 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for gaussian family taken to be 425.386)
Null deviance: 214702255 on 9999 degrees of freedom
Residual deviance: 4250457 on 9992 degrees of freedom
AIC: 88919
Number of Fisher Scoring iterations: 2
On the other hand, meta-minimization could still work in such a case.
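A minimal sketch of that check, assuming Y is regenerated from the non-orthogonal X1, X2, X3 with the same true c, f, and a as before:

# Rebuild Y from the non-orthogonal design, then rescan the RSS over a.
# fixedfit (defined above) picks up the redefined X1, X2, X3 and Y from the workspace.
Y <- X1 %*% c + a * X2 %*% c + X3 %*% f + rnorm(N, mean = 3, sd = 0.5)
rssvec <- sapply(avec, fixedfit)
plot(avec, log(rssvec))   # look for a single minimum near the true a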
