
I have a problem and I believe there must be a machine learning technique to solve it, but I am new to machine learning and I have no idea where to start.

So, I have multiple multivariate parameter vectors $\mathbf x$ and corresponding output vectors $\mathbf b$.

Each $\mathbf b$ was obtained by applying a matrix $\mathbf A$ to the corresponding $\mathbf x$, so that $\mathbf {A x = b}$.

The data $\mathbf x$ and $\mathbf b$ that I have are noisy; nevertheless, I would like to estimate the matrix $\mathbf A$ from them.

So the problem should be to solve

$$\operatorname*{arg\,min}_{\mathbf A} \| \mathbf{AX - B} \|$$

where $\mathbf X$ is a matrix containing multivariate parameter vectors in its columns, and the corresponding output vectors are stored in $\mathbf B$ as its column vectors.

Can anyone guide me on how to estimate the matrix $\mathbf A$?

anony
  • Can you just do linear/logistic/etc. regression, depending on the nature of your $b$ variable, without an intercept term? – TrynnaDoStat Mar 16 '15 at 17:08
  • Sorry, I'm a newbie, so I'll study regression without an intercept term and come back tomorrow! BTW, the arg min equation has another s.t. condition in the original problem. Will it make any difference? – anony Mar 16 '15 at 17:18
  • Usually one estimates $x$ in the context of $Ax=b$. Assuming your $X$ and $B$ are matrices, you seem to simply want to estimate the coefficients of a multivariate normal regression for the $d$-dimensional responses $B$ using the design matrices in $X$. Pretty much any statistical package can do that. In MATLAB you would use [mvregress](http://www.mathworks.com/help/stats/mvregress.html) or in R something like [manova](https://stat.ethz.ch/R-manual/R-patched/library/stats/html/manova.html) or [MCMCglmm](http://www.inside-r.org/packages/cran/MCMCglmm/docs/MCMCglmm) (do not use `lm`). – usεr11852 Mar 16 '15 at 18:46
  • @usεr11852 Symbolically, a solution appears to exist if $XX^T$ is non-singular: $A = BX^T(XX^T)^{-1}$. This seems to be *too* obvious, so I wonder if I have missed something important. Thoughts? – Sycorax Mar 16 '15 at 20:45
  • This is a standard [multivariate regression](http://stats.stackexchange.com/search?tab=votes&q=multivariate%20regression) problem. Your notation and terminology may confuse readers, because what you call "multiple multivariate parameter vectors $x$" would be called "data" by most people and the coefficients of $A$ would be called the "parameters." – whuber Mar 16 '15 at 20:47
  • @user777: I agree, you wrote something valid. What you describe is correct, but it is only a sub-case one might consider. Usually when one talks about multivariate regression, some explicit assumptions about heteroskedasticity in the error are made. (Kronecker products appear in the error structure, EM variants are needed for the estimation, etc.) This is why I said `do not use lm`; I wanted to steer the OP away from treating a sub-case as the general case. (In general any system $A^TAX = A^TB$ is compatible.) – usεr11852 Mar 16 '15 at 21:36
  • @anony is this a homework problem? – shadowtalker Mar 17 '15 at 03:47
  • This is not *so* standard as both $x$ and $b$ are noisy. Multivariate regression solves $b = Ax + \varepsilon$ in $A$ when $x$ is known precisely. – Elvis Mar 17 '15 at 07:07
  • @Elvis You may be correct in your interpretation of the English statement of the problem. The *mathematical* formulation stated in this question, though, is unambiguous: it's ordinary (multivariate) regression. You might like to post an answer that explains the distinction and proposes an appropriate procedure. – whuber Mar 17 '15 at 17:15
  • @whuber I am unfortunately unable to propose a procedure, but... I happen to know someone who spent quite an amount of time on this kind of problem. I'll try to check in her publications if there is something that can be useful to the OP. – Elvis Mar 18 '15 at 00:55
  • @Elvis Your interpretation makes it a *multivariate mixed model,* if that's of any help. – whuber Mar 18 '15 at 14:07
  • @whuber I think the problem is estimating $A$ from $b$ and $x$ in $b = Ax + A\epsilon_1 + \epsilon_2$. If it were estimating $A$ in $b = Ax + B\epsilon_1 + \epsilon_2$ with $B$ known, it would be a mixed model; is it still a mixed model with the unknown $A$ involved in the residual variance?! – Elvis Mar 18 '15 at 14:56
  • @Elvis I am grateful that you have made that distinction. It all comes down to what is assumed about the covariance structure of the "noises," about which we are told nothing in this question. In the formulation $x = x_0+\epsilon_1$, where the "noise" is additive error in $x$, the model will be $b = A(x_0+\epsilon_1)+\epsilon = Ax_0 + A\epsilon_1 + \epsilon$, which is a multivariate errors-in-variables model. – whuber Mar 18 '15 at 15:55
  • Thanks everyone for the answers! :) I'll check each suggestion and slowly figure it out when I have free time. – anony Apr 08 '15 at 15:54
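
Pulling the comments together: here is a minimal NumPy sketch (the dimensions and noise level are made up for illustration) of the closed-form least-squares estimate $\hat A = BX^T(XX^T)^{-1}$ that Sycorax wrote down above, which is valid whenever $XX^T$ is non-singular:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 5-dim inputs, 3-dim outputs, 200 samples.
d_in, d_out, n = 5, 3, 200
A_true = rng.standard_normal((d_out, d_in))
X = rng.standard_normal((d_in, n))                      # inputs as columns
B = A_true @ X + 0.1 * rng.standard_normal((d_out, n))  # noisy outputs

# Closed-form least squares: A = B X^T (X X^T)^{-1},
# valid when X X^T is invertible (needs n >= d_in and full-rank X).
A_hat = B @ X.T @ np.linalg.inv(X @ X.T)

print(np.linalg.norm(A_hat - A_true))  # small when the noise is small
```

In practice one would prefer `np.linalg.solve(X @ X.T, X @ B.T).T` or a least-squares routine over the explicit inverse.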

2 Answers


If $B = AX + \epsilon$, I can write

$B^{T} = X^{T}A^{T} + \epsilon^{T}$.

Let $Y = A^{T}$. The transposed equation is then

$B_{1} = X_{1}Y + \epsilon_{1}$

(where the suffix $1$ denotes the transposed terms). This is now a standard-form MANOVA, i.e., a multivariate multiple regression. What you should do, and how, is beautifully explained in Multivariate multiple regression in R.
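
A minimal NumPy sketch of this transposition trick, assuming $X$ is $d_{\text{in}} \times n$ and $B$ is $d_{\text{out}} \times n$ with the samples stored as columns:

```python
import numpy as np

def estimate_A(X, B):
    # B = AX + eps transposes to B^T = X^T A^T + eps^T, so Y = A^T
    # solves an ordinary least-squares problem, one column per response.
    Y, *_ = np.linalg.lstsq(X.T, B.T, rcond=None)
    return Y.T  # undo the transposition: A = Y^T
```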

Sid

As you wrote the optimization problem, you are trying to find the map $A$. You can see the matrix $A$ as a map, or as a collection of basis elements, onto which the projected data $x$ gives $b$. Is this what you are looking for? If so, take a look at dictionary learning (e.g. SPAMS, which comes with many useful links). Typically one needs to provide additional information, for example by enforcing sparsity. Why? Because one can come up with infinitely many maps that give the desired mapping for the data you have. You should at least enforce constraints on the elements of $A$, for example orthonormality.
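
SPAMS is the right tool for full dictionary learning; as a simpler sketch of the regularization idea only, here is an $\ell_1$-penalized (sparse) estimate of $A$ using scikit-learn's `Lasso`. The penalty weight `alpha` is a made-up value you would have to tune, and `fit_intercept=False` matches the interceptless model $AX = B$:

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_A(X, B, alpha=0.1):
    # X: (d_in, n) and B: (d_out, n) with samples as columns.
    # Fitting B^T against X^T with an l1 penalty solves
    #   min_A ||AX - B||_F^2 / (2n) + alpha * ||A||_1,
    # which drives many entries of A to exactly zero.
    model = Lasso(alpha=alpha, fit_intercept=False)
    model.fit(X.T, B.T)   # multi-target lasso
    return model.coef_    # shape (d_out, d_in), i.e., the estimate of A
```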

Vladislavs Dovgalecs