
I have a problem and I believe there must be a machine learning technique to solve it, but I am new to machine learning and I have no idea where to start.

So, I have multiple multivariate parameter vectors $\mathbf x$ and corresponding output vectors $\mathbf b$.

Each $\mathbf b$ was obtained by applying a matrix $\mathbf A$ to the corresponding $\mathbf x$, so that $\mathbf {A x = b}$.

The data $\mathbf x$ and $\mathbf b$ that I have are noisy; nevertheless, I would like to estimate the matrix $\mathbf A$ from them.

So the problem should be to solve

$$\operatorname*{arg\,min}_{\mathbf A} \| \mathbf{AX - B} \|$$

where $\mathbf X$ is a matrix containing multivariate parameter vectors in its columns, and the corresponding output vectors are stored in $\mathbf B$ as its column vectors.

Can anyone guide me on how to estimate the matrix $\mathbf A$?

anony
  • Can you just do linear/logistic/etc. regression, depending on the nature of your $b$ variable, without an intercept term? – TrynnaDoStat Mar 16 '15 at 17:08
  • Sorry, I'm a newbie, so I'll study regression without an intercept term and come back tomorrow! BTW, the arg min equation has another s.t. condition in the original problem. Will it make any difference? – anony Mar 16 '15 at 17:18
  • Usually one estimates $x$ in the context of $Ax=b$. Assuming your $X$ and $B$ are matrices, you seem to simply want to estimate the coefficients of a multivariate normal regression for the $d$-dimensional responses $B$ using the design matrices in $X$. Pretty much any statistical package can do that. In MATLAB you would use [mvregress](http://www.mathworks.com/help/stats/mvregress.html) or in R something like [manova](https://stat.ethz.ch/R-manual/R-patched/library/stats/html/manova.html) or [MCMCglmm](http://www.inside-r.org/packages/cran/MCMCglmm/docs/MCMCglmm) (do not use `lm`). – usεr11852 Mar 16 '15 at 18:46
  • @usεr11852 Symbolically, a solution appears to exist if $XX^T$ is non-singular: $A = BX^T(XX^T)^{-1}$. This seems to be *too* obvious, so I wonder if I have missed something important. Thoughts? – Sycorax Mar 16 '15 at 20:45
  • This is a standard [multivariate regression](http://stats.stackexchange.com/search?tab=votes&q=multivariate%20regression) problem. Your notation and terminology may confuse readers, because what you call "multiple multivariate parameter vectors $x$" would be called "data" by most people and the coefficients of $A$ would be called the "parameters." – whuber Mar 16 '15 at 20:47
  • @user777: I agree, you wrote something valid. What you describe is correct, but it is only a sub-case one might consider. Usually when one talks about multivariate regression, some explicit assumptions about heteroskedasticity in the error are made. (Kronecker products appear in the error structure, EM variants are needed for the estimation, etc.) This is why I said `do not use lm`; I wanted to steer the OP away from treating a sub-case as the general case. (In general any system $A^TAX = A^TB$ is compatible.) – usεr11852 Mar 16 '15 at 21:36
  • @anony is this a homework problem? – shadowtalker Mar 17 '15 at 03:47
  • This is not *so* standard as both $x$ and $b$ are noisy. Multivariate regression solves $b = Ax + \varepsilon$ in $A$ when $x$ is known precisely. – Elvis Mar 17 '15 at 07:07
  • @Elvis You may be correct in your interpretation of the English statement of the problem. The *mathematical* formulation stated in this question, though, is unambiguous: it's ordinary (multivariate) regression. You might like to post an answer that explains the distinction and proposes an appropriate procedure. – whuber Mar 17 '15 at 17:15
  • @whuber I am unfortunately unable to propose a procedure, but... I happen to know someone who spent quite an amount of time on this kind of problem. I'll try to check in her publications if there is something that can be useful to the OP. – Elvis Mar 18 '15 at 00:55
  • @Elvis Your interpretation makes it a *multivariate mixed model,* if that's of any help. – whuber Mar 18 '15 at 14:07
  • @whuber I think the problem is estimating $A$ from $b$ and $x$ in $b = Ax + A\epsilon_1 + \epsilon_2$. If it were estimating $A$ in $b = Ax + B\epsilon_1 + \epsilon_2$ with $B$ known, it would be a mixed model; is it still a mixed model with the unknown $A$ involved in the residual variance?! – Elvis Mar 18 '15 at 14:56
  • @Elvis I am grateful that you have made that distinction. It all comes down to what is assumed about the covariance structure of the "noises," about which we are told nothing in this question. In the formulation $x = x_0+\epsilon_1$, where the "noise" is additive error in $x$, the model will be $b = A(x_0+\epsilon_1)+\epsilon = Ax_0 + A\epsilon_1 + \epsilon$, which is a multivariate errors-in-variables model. – whuber Mar 18 '15 at 15:55
  • Thanks everyone for the answers! :) I'll check each suggestion and slowly figure it out when I have free time. – anony Apr 08 '15 at 15:54
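
Pulling the comments together: here is a minimal NumPy sketch (the dimensions and noise level are made up for illustration) of the closed-form least-squares estimate $\hat A = BX^T(XX^T)^{-1}$ that Sycorax wrote down above, which is valid whenever $XX^T$ is non-singular:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 5-dim inputs, 3-dim outputs, 200 samples.
d_in, d_out, n = 5, 3, 200
A_true = rng.standard_normal((d_out, d_in))
X = rng.standard_normal((d_in, n))                      # inputs as columns
B = A_true @ X + 0.1 * rng.standard_normal((d_out, n))  # noisy outputs

# Closed-form least squares: A = B X^T (X X^T)^{-1},
# valid when X X^T is invertible (needs n >= d_in and full-rank X).
A_hat = B @ X.T @ np.linalg.inv(X @ X.T)

print(np.linalg.norm(A_hat - A_true))  # small when the noise is small
```

In practice one would prefer `np.linalg.solve(X @ X.T, X @ B.T).T` or a least-squares routine over the explicit inverse.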

2 Answers


If $B = AX + \epsilon$, I can write

$B^{T} = X^{T}A^{T} + \epsilon^{T}$.

Let $Y = A^{T}$. The transposed equation is then

$B_{1} = X_{1}Y + \epsilon_{1}$

(where the suffix $1$ denotes the transposed terms). This is now a standard-form MANOVA, i.e., a multivariate multiple regression. What you should do, and how, is beautifully explained in Multivariate multiple regression in R.
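
A minimal NumPy sketch of this transposition trick, assuming $X$ is $d_{\text{in}} \times n$ and $B$ is $d_{\text{out}} \times n$ with the samples stored as columns:

```python
import numpy as np

def estimate_A(X, B):
    # B = AX + eps transposes to B^T = X^T A^T + eps^T, so Y = A^T
    # solves an ordinary least-squares problem, one column per response.
    Y, *_ = np.linalg.lstsq(X.T, B.T, rcond=None)
    return Y.T  # undo the transposition: A = Y^T
```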

Sid

As you wrote the optimization problem, you are trying to find the map $A$. You can see the matrix $A$ as a map, or as a collection of basis elements, onto which the projected data $x$ gives $b$. Is this what you are looking for? If so, take a look at dictionary learning (e.g. SPAMS, which comes with many useful links). Typically one needs to provide additional information, for example by enforcing sparsity. Why? Because one can come up with infinitely many maps that give the desired mapping for the data you have. You should at least enforce constraints on the elements of $A$, for example orthonormality.
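
SPAMS is the right tool for full dictionary learning; as a simpler sketch of the regularization idea only, here is an $\ell_1$-penalized (sparse) estimate of $A$ using scikit-learn's `Lasso`. The penalty weight `alpha` is a made-up value you would have to tune, and `fit_intercept=False` matches the interceptless model $AX = B$:

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_A(X, B, alpha=0.1):
    # X: (d_in, n) and B: (d_out, n) with samples as columns.
    # Fitting B^T against X^T with an l1 penalty solves
    #   min_A ||AX - B||_F^2 / (2n) + alpha * ||A||_1,
    # which drives many entries of A to exactly zero.
    model = Lasso(alpha=alpha, fit_intercept=False)
    model.fit(X.T, B.T)   # multi-target lasso
    return model.coef_    # shape (d_out, d_in), i.e., the estimate of A
```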

Vladislavs Dovgalecs