
I understand that for sampling from a univariate Gaussian, we can use $x = g(\epsilon) = \mu + \epsilon \sigma $ and then differentiate this transformation with respect to $\mu, \sigma$. How does this work for sampling a multivariate Gaussian with a covariance matrix $\Sigma$?
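To make the univariate case concrete, here is a minimal sketch, assuming PyTorch; the variable names are illustrative. The noise $\epsilon$ is sampled once with no gradient, and the gradient of $x$ flows through $\mu$ and $\sigma$:

```python
import torch

mu = torch.tensor(0.5, requires_grad=True)
sigma = torch.tensor(2.0, requires_grad=True)

eps = torch.randn(())   # eps ~ N(0, 1); no gradient flows through the sample itself
x = mu + eps * sigma    # x ~ N(mu, sigma^2), differentiable in mu and sigma

x.backward()
print(mu.grad)          # dx/dmu    = 1
print(sigma.grad)       # dx/dsigma = eps
```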

In this paper (p19) they say $g(\epsilon) = \mu + L\epsilon$, with $LL^T = \Sigma$. What is this $L$? It can't be a Cholesky factorisation: for the terms to add, $L\epsilon$ must be a vector, but covariance matrices won't decompose this way.

What is the transformation for the reparameterisation trick on a multivariate Gaussian? And then what is its gradient with respect to $\Sigma$?

1 Answer


I was being stupid: it is in fact the Cholesky factorisation, and $L\epsilon$ is a vector for a vector $\epsilon$. The transformation works because for $\epsilon \sim \mathcal{N}(0, I)$ we have $\operatorname{Cov}(\mu + L\epsilon) = L\operatorname{Cov}(\epsilon)L^T = LL^T = \Sigma$. Found my answer here, in case it helps anyone else who is confused.
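For the gradient with respect to $\Sigma$, a sketch of the multivariate version, again assuming PyTorch (the names `A`, `d`, and the toy loss are illustrative). `torch.linalg.cholesky` is differentiable, so autograd gives the gradient of any downstream loss with respect to $\Sigma$ through $L$:

```python
import torch

d = 3
mu = torch.zeros(d, requires_grad=True)

# A is only used to build an arbitrary positive-definite Sigma for the demo
A = torch.randn(d, d)
Sigma = (A @ A.T + d * torch.eye(d)).requires_grad_()

L = torch.linalg.cholesky(Sigma)   # lower-triangular L with L @ L.T == Sigma
eps = torch.randn(d)               # eps ~ N(0, I)
x = mu + L @ eps                   # x ~ N(mu, Sigma), differentiable in mu and Sigma

x.sum().backward()                 # any scalar loss; the sum is just a stand-in
print(Sigma.grad)                  # gradient w.r.t. Sigma, via the Cholesky factor
```

In practice one often parameterises $L$ directly (e.g. as an unconstrained lower-triangular matrix) rather than differentiating through the Cholesky of $\Sigma$, but the two are equivalent for this purpose.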