Let the columns of $X$ be $X_1,X_2,\ldots, X_n$, the corresponding entries of $A$ be $a_1, a_2, \ldots, a_n$, the columns of $Y$ be $Y_1, Y_2, \ldots, Y_n$, and the error columns be $e_1, e_2, \ldots, e_n$.
Notice that
$$e_i = Y_i - a_i X_i.$$
Each parameter $a_i$ is involved in only one of these expressions. Therefore, the sum of squares of the $e_i$, equal to $\sum_{i=1}^n |e_i|^2$, can be minimized by separately and independently finding $a_i$ that minimize the squared norms $|e_i|^2 = e_i^\prime e_i$. That's a set of $n$ (univariate) regression-through-the-origin problems. With no constraints on the $a_i$, the solutions would be
$$\hat a_i = \frac{Y_i^\prime X_i}{X_i^\prime X_i}.$$
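To see why, expand the squared norm:
$$|e_i|^2 = (Y_i - a_i X_i)^\prime (Y_i - a_i X_i) = Y_i^\prime Y_i - 2\,a_i\, X_i^\prime Y_i + a_i^2\, X_i^\prime X_i,$$
a convex quadratic in $a_i$ whose derivative vanishes exactly at the ratio above.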
If any of the $\hat a_i$ lies outside the constraining interval $[0,1]$, the convexity of the objective function shows you only need to examine its values on the boundary $\partial[0,1]=\{0,1\}$. A simple approach is an exhaustive search of both points: that is, compare the values of $|Y_i - X_i|^2$ and $|Y_i|^2$, choosing $\hat a_i = 1$ when the former is smaller and $\hat a_i=0$ otherwise.
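For a single column the whole procedure takes only a few lines of R. Here is a minimal sketch (the vectors `x` and `y` and the true coefficient of $1.3$ are invented purely for illustration); the vectorized version over all columns appears in the full code below.

set.seed(1)
x <- rnorm(100)
y <- 1.3 * x + rnorm(100, 0, 1/4)   # true coefficient deliberately outside [0,1]
a.hat <- sum(x * y) / sum(x * x)    # unconstrained regression through the origin
if (a.hat < 0 || a.hat > 1) {
  # By convexity, only the endpoints need checking: compare |y - x|^2 with |y|^2.
  a.hat <- ifelse(sum((y - x)^2) <= sum(y * y), 1, 0)
}
a.hat                               # equals 1 here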
Here are two examples with $m=100$ rows and $n=5$ columns. They were generated by creating the $X$ and $A$ matrices randomly and adding random errors to the columnwise products $a_i X_i$ to obtain $Y$. Provided the entries of $A$ all lie in the range $[0,1]$, the estimates should be close to the original values (depending on how large the random errors are). The left plot in each example is a dotplot of the estimates $\hat a_i$ and the original parameter values $a_i$, enabling visual comparison of the estimates to the parameters. The right plot in each example is a scatterplot of the residuals of this fit (the $\hat e_i$) against the original errors. When the constraints do not come into play (as in the bottom row), this scatterplot should be tightly focused on the line of equality. When the constraints are binding (the top row), there will be more scatter, contributed by the columns whose estimates were pushed to the boundary.

The R code to produce this figure will let you experiment with arbitrary values of $m$ and $n$. The estimation of $A$ consists of four lines exactly paralleling the analysis: computation of the regression coefficients, of the objective at the two boundary points, and the comparisons needed to select the best value. It is fast and parallelizable; apart from the plotting step, it will run in seconds even when $n$ is in the millions ($10^6$).
m <- 100
n <- 5
par(mfrow=c(2,2))
for (i in c(23, 19)) {
  #
  # Generate data.
  #
  set.seed(i)
  x <- matrix(rnorm(m*n), m)                 # m-by-n matrix X
  alpha <- rnorm(n, 1/2, 1/2)                # true coefficients; some may fall outside [0,1]
  eps <- matrix(rnorm(m*n, 0, 1/4), m)       # errors
  y <- t(t(x) * alpha) + eps                 # Y = X %*% diag(alpha) + errors
  #
  # Compute A.
  #
  a <- colSums(x*y) / colSums(x*x)           # unconstrained per-column estimates
  a.0 <- colSums(y*y)                        # objective at a_i = 0
  a.1 <- colSums((y-x)*(y-x))                # objective at a_i = 1
  a <- ifelse(0 <= a & a <= 1, a, ifelse(a.0 <= a.1, 0, 1))
  #
  # Plot results.
  #
  e <- y - x %*% diag(a)                     # residuals of the constrained fit
  u <- rbind(Parameter=alpha, Estimate=a)
  dotchart(u, col=ifelse(abs(u-1/2)>1/2, "Red", "Blue"), cex=0.6, pch=20,
           xlab="Parameter value")           # red: values outside [0,1]
  plot(as.vector(eps), as.vector(e), asp=1, col="#00000040",
       xlab="Error", ylab="Residual")
}
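
For experimenting at larger scales, the estimation step alone can be wrapped in a function. This is just a sketch; the name `estimate.A` is illustrative and not part of the code above.

estimate.A <- function(x, y) {
  a <- colSums(x * y) / colSums(x * x)   # unconstrained per-column estimates
  a.0 <- colSums(y * y)                  # objective at a_i = 0
  a.1 <- colSums((y - x)^2)              # objective at a_i = 1
  ifelse(0 <= a & a <= 1, a, ifelse(a.0 <= a.1, 0, 1))
}

With the `x` and `y` from the last loop iteration still in the workspace, `estimate.A(x, y)` reproduces the final vector `a`.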