I have a very large, very sparse matrix $A \in \mathbb{N}^{n \times m}$ that I'd like to perform SVD on. It is not centered. When I center it to get $A'$, I can't even fit it in memory, because $A' \in \mathbb{R}^{n \times m}$ is no longer sparse. I can work with a reduced set of features (which is not ideal), in which case $A'$ fits in memory, but sparse SVD is still horribly slow. I'm using `scipy.sparse.linalg.svds`.
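For reference, one way to sidestep the memory problem is to never build $A'$ at all and instead hand `svds` a `LinearOperator` that applies the centering implicitly. The sketch below assumes column-wise centering (subtracting each feature's mean); the helper name `centered_operator` is made up for illustration, and this is not a claim about what is fastest for a matrix of this size.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, svds


def centered_operator(A):
    """Hypothetical helper: acts like A - 1 * mu^T without densifying A.

    Assumes column-wise centering, where mu holds the column means.
    """
    n, m = A.shape
    mu = np.asarray(A.mean(axis=0)).ravel()  # column means, shape (m,)

    def matvec(x):
        x = np.asarray(x).ravel()
        # (A - 1 mu^T) x = A x - (mu . x) * 1
        return A @ x - np.full(n, mu @ x)

    def rmatvec(y):
        y = np.asarray(y).ravel()
        # (A - 1 mu^T)^T y = A^T y - mu * sum(y)
        return A.T @ y - mu * y.sum()

    return LinearOperator((n, m), matvec=matvec, rmatvec=rmatvec,
                          dtype=np.float64)
```

Usage would look like `u, s, vt = svds(centered_operator(A), k=10)`. Only sparse products with $A$ and $A^T$ plus a rank-one correction are needed, so memory stays close to that of $A$ itself.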

With $A'$, I would be able to interpret the choice of the $k$ components I keep by how much of the total variation they preserve. Is there any similar logic for the non-centered $A$?
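To make the criterion concrete: the squared singular values of any matrix sum to its squared Frobenius norm, so one candidate analogue for non-centered $A$ is the fraction of $\|A\|_F^2$ captured by the top $k$ components. The sketch below computes that total directly from the sparse matrix, and also the centered total via $\|A'\|_F^2 = \|A\|_F^2 - n\|\mu\|^2$ (column means $\mu$), without densifying. The function names are hypothetical, and whether the non-centered fraction is a meaningful stand-in for explained variance is exactly the open question here.

```python
import numpy as np
import scipy.sparse as sp


def total_sum_of_squares(A, centered=False):
    """Squared Frobenius norm of A, or of column-centered A if centered=True."""
    A = sp.csr_matrix(A, dtype=np.float64)  # float copy avoids integer overflow
    n = A.shape[0]
    total = float(A.multiply(A).sum())
    if centered:
        mu = np.asarray(A.mean(axis=0)).ravel()
        # ||A - 1 mu^T||_F^2 = ||A||_F^2 - n * ||mu||^2
        total -= n * float(mu @ mu)
    return total


def captured_fraction(singular_values, total):
    """Cumulative share of `total` captured by the largest singular values."""
    s = np.sort(np.asarray(singular_values, dtype=float))[::-1]
    return np.cumsum(s ** 2) / total
```

Given singular values `s` from a truncated SVD, `captured_fraction(s, total_sum_of_squares(A))` gives the cumulative share for the components computed so far. For a non-centered count matrix the first component tends to reflect the mean, so this share is not the same thing as variance explained.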

Hicjo
  • How large is $A$? How many $k$ do you need? How fast do you need it to be? – Sycorax May 29 '20 at 01:00
  • If $A$ were centered, I'd say I need the smallest $k$ that covers 95% of total variation. For non-centered $A$, I don't know what to replace this criterion with - hence the question. – Hicjo May 29 '20 at 01:19
  • $A$ is 600k x 50k; I'd have expected it to be done in <2 hrs. – Hicjo May 29 '20 at 01:21
  • Power iteration can be surprisingly effective if you only need 1 or 2 components. On the other hand, `scipy.sparse.linalg.svds` is calling compiled code specialized for sparse matrix factorization, so you would be hard-pressed to get much faster than that. If you haven't already, you might be able to realize some gains by doing the computation in the cloud on a better computer. – Sycorax May 29 '20 at 16:09
  • See also: https://stats.stackexchange.com/questions/469187/why-is-non-centered-svd-accepted-in-lsa/469200#469200 – Sycorax May 31 '20 at 14:13
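
The power iteration mentioned in the comments can be sketched as follows for the leading singular triplet only. This is a minimal illustration, not a tuned implementation: the function name, the stopping rule, and the default iteration count are assumptions, and convergence slows down when the top two singular values are close.

```python
import numpy as np
import scipy.sparse as sp


def top_singular_triplet(A, n_iter=500, tol=1e-12, seed=0):
    """Approximate the largest singular value of A and its singular vectors."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[1])
    v /= np.linalg.norm(v)
    u = np.zeros(A.shape[0])
    sigma = 0.0
    for _ in range(n_iter):
        w = A @ v                      # candidate left vector, length ||A v||
        sigma_new = np.linalg.norm(w)
        if sigma_new == 0.0:           # v lies in the null space (e.g. A is zero)
            return u, 0.0, v
        u = w / sigma_new
        z = A.T @ u                    # pull back to get the next right vector
        v = z / np.linalg.norm(z)
        converged = abs(sigma_new - sigma) <= tol * sigma_new
        sigma = sigma_new
        if converged:
            break
    return u, sigma, v
```

Each iteration costs two sparse matrix-vector products, so memory use stays at the size of `A` plus a few dense vectors.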

0 Answers