I understand that for sampling from a univariate Gaussian, we can write $x = g(\epsilon) = \mu + \sigma\epsilon$ with $\epsilon \sim \mathcal{N}(0, 1)$, and then differentiate this transformation with respect to $\mu$ and $\sigma$. How does this work for sampling from a multivariate Gaussian with covariance matrix $\Sigma$?
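To make my understanding of the univariate case concrete, here is a minimal numpy sketch (the values of `mu` and `sigma` are just arbitrary examples):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.5, 0.7  # arbitrary example parameters

# reparameterised sample: eps ~ N(0, 1), x = mu + sigma * eps
eps = rng.standard_normal()
x = mu + sigma * eps

# the transformation is differentiable in the parameters:
# dx/dmu = 1, dx/dsigma = eps
```

So the randomness lives entirely in $\epsilon$, and gradients flow through the deterministic map $g$.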
In this paper (p. 19) they write $g(\epsilon) = \mu + L\epsilon$, with $LL^T = \Sigma$. What is this $L$? Is it just a Cholesky factorisation of $\Sigma$? Coming from the univariate case I was expecting $\epsilon$ to be scalar, in which case the terms only add if $L$ is a vector, and I don't see how a matrix decomposition of the covariance fits into that.
What is the transformation for the reparameterisation trick on a multivariate Gaussian? And then what is its gradient with respect to $\Sigma$?
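For concreteness, here is what I would try in numpy if $L$ really were the lower-triangular Cholesky factor and $\epsilon \sim \mathcal{N}(0, I)$ were a vector (this is my guess at the paper's meaning, not something it states explicitly; `mu` and `Sigma` are arbitrary examples):

```python
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([1.0, -2.0])            # example mean vector
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])        # example covariance (positive definite)

# guess: L is the lower-triangular Cholesky factor, so L @ L.T == Sigma
L = np.linalg.cholesky(Sigma)

# reparameterised sample: eps ~ N(0, I), x = mu + L @ eps
eps = rng.standard_normal(2)
x = mu + L @ eps

# sanity check: the factorisation reproduces Sigma
assert np.allclose(L @ L.T, Sigma)
```

This runs and `x` has the right covariance over repeated draws, but I'd like to confirm this is the intended $L$, and how one then differentiates with respect to $\Sigma$ itself.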