
Say I have an $n\times d$ dataset $D$ where $n\ll d$ ($n$ observations, $d$ dimensions).

Currently, if I want $m$ samples from $D$ assuming it is multivariate Gaussian, I can do this (in Matlab):

T = cholcov(cov( D ));  % T such that T'*T == cov(D)
S = bsxfun( @plus, randn( m, size(D,2) ) * T, mean(D,1) );  % one sample per row
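For readers outside Matlab, here is a NumPy translation of the snippet above (the function name is mine). Note that `cholcov` accepts semidefinite matrices, so an eigendecomposition square root stands in for `np.linalg.cholesky`, which requires positive definiteness:

```python
import numpy as np

def sample_gaussian_like(D, m, rng):
    """Draw m samples from N(mean(D), cov(D)), mirroring the Matlab snippet.

    cov(D) is rank-deficient when n < d, so we build a square root T with
    C = T @ T.T from the (clipped) eigendecomposition instead of a Cholesky
    factor.
    """
    mu = D.mean(axis=0)
    C = np.cov(D, rowvar=False)            # d x d sample covariance
    w, V = np.linalg.eigh(C)
    T = V * np.sqrt(np.clip(w, 0, None))   # T = V @ diag(sqrt(w)), so T @ T.T == C
    return rng.standard_normal((m, D.shape[1])) @ T.T + mu
```

This still forms the full $d\times d$ covariance, so it shares the cost the question complains about; it is only a baseline for comparison.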

However, this computes a $d\times d$ covariance matrix, which is very large and obviously rank-deficient (rank at most $n-1$). I feel like this is wasting a lot of time and using the "wrong" information.

I know of a trick used with PCA in this situation, which essentially works with $D D^T$ instead of $D^T D$ (that's an abuse of notation; I said nothing about $D$ being demeaned or about the normalisation needed to get a covariance matrix, but you get the idea: it uses the "small" full-rank Gram matrix instead).
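The identity behind that PCA trick is easy to check numerically: the nonzero eigenvalues of the $n\times n$ matrix $DD^T$ and the $d\times d$ matrix $D^TD$ coincide, and the remaining $d-n$ eigenvalues of the big matrix are zero. A quick NumPy sanity check:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 6, 40
D = rng.standard_normal((n, d))

small = np.linalg.eigvalsh(D @ D.T)   # n x n Gram matrix, full rank
big = np.linalg.eigvalsh(D.T @ D)     # d x d, rank at most n

# eigvalsh returns eigenvalues in ascending order: the top n of the big
# matrix match the small Gram matrix, the rest are numerically zero
```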

Is there a similar "trick" to sample from multivariate Gaussians?


So, from the link mentioned in the comments, I put together this Matlab code, which is already much faster. Is that correct?

Dc = bsxfun( @minus, D, mean(D,1) );  % demean before the SVD
[U,L,V] = svd( Dc, 'econ' );          % Dc = U*L*V', with L of size n-by-n
S = bsxfun( @plus, randn( m, size(D,1) ) * (L / sqrt(size(D,1)-1)) * V', mean(D,1) );
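The same idea in NumPy (the function name is mine): demean first, take the economy SVD, and rescale by $\sqrt{n-1}$ so the samples reproduce cov(D). Only $n$-dimensional noise is drawn and no $d\times d$ matrix is ever formed:

```python
import numpy as np

def sample_lowrank_gaussian(D, m, rng):
    """Sample from N(mean(D), cov(D)) using only the economy SVD of the
    demeaned n x d data matrix."""
    n = D.shape[0]
    mu = D.mean(axis=0)
    Dc = D - mu                                        # demean before the SVD
    U, s, Vt = np.linalg.svd(Dc, full_matrices=False)  # Dc = U @ diag(s) @ Vt
    # cov(D) = Vt.T @ diag(s**2/(n-1)) @ Vt, so diag(s/sqrt(n-1)) @ Vt
    # is a (rank-deficient) square root of the covariance
    return rng.standard_normal((m, s.size)) * (s / np.sqrt(n - 1)) @ Vt + mu
```

The cost is an SVD of an $n\times d$ matrix plus an $m\times n$ by $n\times d$ product, versus forming and factoring a $d\times d$ covariance.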
  • Oh, I think this might be related: http://stats.stackexchange.com/q/159313/44129 – Jonathan H Jun 12 '16 at 12:20
  • Instead of using the sample covariance matrix, readers may want to consider a method designed for large $P$, small $N$. One example with very nice properties such as optimal shrinkage and guaranteed positive definiteness is here: https://strimmerlab.github.io/software/corpcor/index.html. But regardless of the estimation scheme, the computational aspect of this question is interesting. – eric_kernfeld Aug 25 '21 at 20:28

1 Answer


You can phrase this in a slightly more general way: you need to compute $\sqrt{M}\,x$ for a vector $x$ and a matrix $M$, where it is easy to multiply vectors by $M$ but infeasible to store $M$ explicitly or compute e.g. a Cholesky decomposition of it. There are methods designed for exactly this matrix-free problem -- take a look at the references below.
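To make the matrix-free setting concrete, here is a sketch (not taken from the references above, which use more refined variants) of the classic Lanczos approach to approximating $\sqrt{M}\,v$ for a symmetric PSD $M$ given only a `matvec` routine: build a $k$-step Krylov basis $Q$ with tridiagonal $T \approx Q^T M Q$, then use $\sqrt{M}\,v \approx \|v\|\, Q \sqrt{T}\, e_1$:

```python
import numpy as np

def lanczos_sqrt_mv(matvec, v, k):
    """Approximate sqrt(M) @ v for symmetric PSD M, given only matvec(x) = M @ x.

    A bare-bones sketch: no breakdown handling, full reorthogonalization for
    stability. Exact (up to rounding) when k equals the dimension of v.
    """
    nrm = np.linalg.norm(v)
    Q = np.zeros((v.size, k))
    alpha = np.zeros(k)
    beta = np.zeros(max(k - 1, 0))
    Q[:, 0] = v / nrm
    for j in range(k):
        w = matvec(Q[:, j])
        alpha[j] = Q[:, j] @ w
        # remove components along all previous basis vectors (reorthogonalize)
        w -= Q[:, :j + 1] @ (Q[:, :j + 1].T @ w)
        if j < k - 1:
            beta[j] = np.linalg.norm(w)   # assumes no breakdown (beta > 0)
            Q[:, j + 1] = w / beta[j]
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    evals, evecs = np.linalg.eigh(T)      # T is small (k x k), so this is cheap
    sqrt_T = evecs * np.sqrt(np.clip(evals, 0.0, None)) @ evecs.T
    return nrm * (Q @ sqrt_T[:, 0])       # sqrt(T) @ e1 is its first column
```

In practice $k$ is chosen much smaller than the dimension; the references above discuss error control and preconditioning.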

Pleiss, G., Jankowiak, M., Eriksson, D., Damle, A., & Gardner, J. R. (2020). Fast matrix square roots with applications to Gaussian processes and Bayesian optimization. arXiv preprint arXiv:2006.11267.

Allen, E. J., Baglama, J., & Boyd, S. K. (2000). Numerical approximation of the product of the square root of a matrix with a vector. Linear Algebra and its Applications, 310(1-3), 167-181.

https://scicomp.stackexchange.com/questions/11369/applying-matrix-square-root-inverse-in-matrix-free-regime?noredirect=1&lq=1
