I understand that for sampling from a univariate Gaussian, we can write $x = g(\epsilon) = \mu + \sigma\epsilon$ with $\epsilon \sim \mathcal{N}(0, 1)$, and then differentiate this transformation with respect to $\mu$ and $\sigma$. How does this work for sampling from a multivariate Gaussian with covariance matrix $\Sigma$?
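To make my understanding of the univariate case concrete, here is a minimal numpy sketch (the values of `mu` and `sigma` are just arbitrary examples):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.5, 0.7  # arbitrary example parameters

# reparameterised sample: eps ~ N(0, 1), x = mu + sigma * eps
eps = rng.standard_normal()
x = mu + sigma * eps

# the transformation is differentiable in the parameters:
# dx/dmu = 1, dx/dsigma = eps
```

So the randomness lives entirely in $\epsilon$, and gradients flow through the deterministic map $g$.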
In this paper (p. 19) they write $g(\epsilon) = \mu + L\epsilon$, with $LL^T = \Sigma$. What is this $L$? Is it just a Cholesky factorisation of $\Sigma$? Coming from the univariate case I was expecting $\epsilon$ to be scalar, in which case the terms only add if $L$ is a vector, and I don't see how a matrix decomposition of the covariance fits into that.
What is the transformation for the reparameterisation trick on a multivariate Gaussian? And then what is its gradient with respect to $\Sigma$?
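For concreteness, here is what I would try in numpy if $L$ really were the lower-triangular Cholesky factor and $\epsilon \sim \mathcal{N}(0, I)$ were a vector (this is my guess at the paper's meaning, not something it states explicitly; `mu` and `Sigma` are arbitrary examples):

```python
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([1.0, -2.0])            # example mean vector
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])        # example covariance (positive definite)

# guess: L is the lower-triangular Cholesky factor, so L @ L.T == Sigma
L = np.linalg.cholesky(Sigma)

# reparameterised sample: eps ~ N(0, I), x = mu + L @ eps
eps = rng.standard_normal(2)
x = mu + L @ eps

# sanity check: the factorisation reproduces Sigma
assert np.allclose(L @ L.T, Sigma)
```

This runs and `x` has the right covariance over repeated draws, but I'd like to confirm this is the intended $L$, and how one then differentiates with respect to $\Sigma$ itself.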