0

Suppose $A$ is a matrix consisting of real numbers and nan's. What are some of the robust formulations and the associated algorithms for estimating its singular value decomposition? What are the references?

One formulation could be the following.

Let $I:=\{(i,j)|A_{i,j}\in \mathbf R\}$, $\mathscr U:=\{U|U^TU=I\}$, $\mathscr D:=$ set of diagonal matrices and $$f(U,\Lambda,V):=\sum_{(i,j)\in I}\big(A_{i,j}-(U\Lambda V^T)_{i,j}\big)^2.$$ We want to solve $$\min_{U,V\in\mathscr U,\,\Lambda\in\mathscr D} f(U,\Lambda,V).$$

Hans
  • 855
  • 4
  • 14
  • 1
    Have you considered estimating the missing values using the given data and then computing the SVD? – mhdadk Oct 16 '21 at 21:23
  • @mhdadk: How would you do that, particularly when the rows and columns can be permuted? – Hans Oct 16 '21 at 22:27
  • Suppose $A$ is $m \times n$, where each column is an i.i.d sample obtained from a multivariate normal distribution $N(\mathbf{\mu},\mathbf{\Sigma})$. $\mathbf{\mu}$ is the $m \times 1$ mean vector and $\mathbf{\Sigma}$ is the $m \times m$ covariance matrix. For each column in $A$, you can use the EM algorithm to alternate between estimating the missing values in the column given the observed values in the column and estimating $\mathbf{\mu}$ and $\mathbf{\Sigma}$. For any given $A$, this procedure can be used to estimate the missing values and then compute the SVD. Of course,... – mhdadk Oct 16 '21 at 22:41
  • ...this approach assumes your data is multivariate Gaussian distributed, so that is something that can be improved if needed. Also, while this approach is invariant to permutation of the columns during estimation, it will not be invariant to permutation of the rows during estimation. However, I am not sure why that would be needed. – mhdadk Oct 16 '21 at 22:41
  • @mhdadk: Your alternating algorithm does not estimate the missing value **before** computing the SVD. That contradicts what you suggested in your first comment. – Hans Oct 17 '21 at 00:02
  • No what I meant was: perform $K$ iterations of this estimation procedure, stop iterating, then compute the SVD. – mhdadk Oct 17 '21 at 00:07
  • @mhdadk: 1) Given $\mu$ and $\Sigma$, one can fill in the missing data multiple ways --- the simplest is taking $\mu\pm \delta$. That will result in distinct SVD's. That is undesirable. 2) If we use the alternating and integrative method to estimate $(\mu,\Sigma)$, why do we not directly apply that to estimate my formulation directly? The former seems to be quite rounabout. – Hans Oct 17 '21 at 19:20
  • 1
    Similar posts: (dups?) https://stats.stackexchange.com/questions/214900/how-to-perform-svd-to-impute-missing-values-a-concrete-example, https://stats.stackexchange.com/questions/33103/svd-of-a-matrix-with-missing-values, – kjetil b halvorsen Dec 02 '21 at 19:14

0 Answers0