The reparameterization trick for a VAE can be applied to any distribution, as long as you can find a way to express that distribution (or an approximation of it) in terms of:
- The parameters emitted from the encoder
- Some random number generator. For a Gaussian VAE, this is a $\mathcal{N}(0,1)$ distribution, because if $z \sim \mathcal{N}(0,1)$, then $x = z\sigma + \mu \sim \mathcal{N}(\mu,\sigma^2)$. Other distributions may employ a different random number generator. A Dirichlet VAE uses $\mathcal{U}(0,1)$ draws pushed through an approximation to the inverse gamma CDF, and normalizing the resulting independent gamma random variables by their sum yields a Dirichlet distribution.
This transformation needs to be differentiable with respect to the encoder's outputs so that back-propagation can flow through the sampling step; that is the whole point of re-parameterization.
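For concreteness, here is a minimal sketch of the Gaussian reparameterization in PyTorch (the names `reparameterize`, `mu`, and `log_var` are illustrative, not from any particular codebase): gradients flow through `mu` and `log_var`, while the $\mathcal{N}(0,1)$ noise is treated as a constant input.

```python
import torch

def reparameterize(mu, log_var):
    """Draw z ~ N(mu, sigma^2) as a differentiable function of mu and log_var."""
    std = torch.exp(0.5 * log_var)   # sigma = exp(log_var / 2)
    eps = torch.randn_like(std)      # eps ~ N(0, 1), no gradient flows into eps
    return mu + eps * std            # z = mu + sigma * eps

# Toy usage: gradients reach mu and log_var even though sampling is involved.
mu = torch.zeros(3, requires_grad=True)
log_var = torch.zeros(3, requires_grad=True)
z = reparameterize(mu, log_var)
z.sum().backward()
print(mu.grad, log_var.grad)
```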
The choice of re-parameterization can influence how the encoding works -- ideally, we want all components of the model to be used in encoding and decoding the data.
This article provides more details in the context of a Gaussian and Dirichlet VAE.
"Dirichlet Variational Autoencoder" by Weonyoung Joo, Wonsung Lee, Sungrae Park, and Il-Chul Moon:
This paper proposes Dirichlet Variational Autoencoder (DirVAE) using a Dirichlet prior for a continuous latent variable that exhibits the characteristic of the categorical probabilities. To infer the parameters of DirVAE, we utilize the stochastic gradient method by approximating the Gamma distribution, which is a component of the Dirichlet distribution, with the inverse Gamma CDF approximation. Additionally, we reshape the component collapsing issue by investigating two problem sources, which are decoder weight collapsing and latent value collapsing, and we show that DirVAE has no component collapsing; while Gaussian VAE exhibits the decoder weight collapsing and Stick-Breaking VAE shows the latent value collapsing. The experimental results show that 1) DirVAE models the latent representation result with the best log-likelihood compared to the baselines; and 2) DirVAE produces more interpretable latent values with no collapsing issues which the baseline models suffer from. Also, we show that the learned latent representation from the DirVAE achieves the best classification accuracy in the semi-supervised and the supervised classification tasks on MNIST, OMNIGLOT, and SVHN compared to the baseline VAEs. Finally, we demonstrated that the DirVAE augmented topic models show better performances in most cases.
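The abstract describes approximating the inverse CDF of the Gamma distribution so that Dirichlet samples can be reparameterized. Below is a minimal sketch of one common form of that idea (not necessarily the exact approximation used in the paper; the helper names `approx_gamma_icdf` and `dirichlet_reparameterize` are illustrative): uniform noise is pushed through an approximate inverse Gamma CDF, $F^{-1}(u;\alpha) \approx (u\,\alpha\,\Gamma(\alpha))^{1/\alpha}$, and the resulting gamma variates are normalized onto the simplex.

```python
import torch

def approx_gamma_icdf(u, alpha):
    """Approximate inverse CDF of Gamma(alpha, 1); reasonable for small alpha."""
    # F^{-1}(u; alpha) ~= (u * alpha * Gamma(alpha))^(1 / alpha)
    return (u * alpha * torch.exp(torch.lgamma(alpha))) ** (1.0 / alpha)

def dirichlet_reparameterize(alpha):
    """Draw an approximate Dirichlet(alpha) sample differentiably w.r.t. alpha."""
    u = torch.rand_like(alpha)                        # u ~ U(0, 1), treated as constant
    gamma = approx_gamma_icdf(u, alpha)               # approximate Gamma(alpha_k, 1) draws
    return gamma / gamma.sum(dim=-1, keepdim=True)    # normalize onto the simplex

# Toy usage: alpha would normally be produced by the encoder.
alpha = torch.tensor([0.5, 0.5, 0.5], requires_grad=True)
z = dirichlet_reparameterize(alpha)
z[0].backward()
print(z.detach(), alpha.grad)
```

Because the normalization step divides each gamma variate by their sum, gradients with respect to every component of `alpha` survive, which is what allows the Dirichlet latent variable to be trained end-to-end with back-propagation.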