The reparameterization trick for a VAE can be applied to any distribution, as long as you can find a way to express that distribution (or an approximation of it) in terms of:
- The parameters emitted from the encoder
- Some random number generator. For a Gaussian VAE, this is a $\mathcal{N}(0,1)$ distribution, because if $z \sim \mathcal{N}(0,1)$, then $x = z\sigma + \mu \sim \mathcal{N}(\mu,\sigma^2)$. Other distributions may employ a different random number generator. A Dirichlet VAE uses $\mathcal{U}(0,1)$ draws pushed through an approximation to the inverse gamma CDF, and normalizing the resulting independent gamma random variables by their sum yields a Dirichlet distribution.
This transformation needs to be differentiable with respect to the encoder's outputs so that back-propagation can flow through the sampling step; that is the whole point of re-parameterization.
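For concreteness, here is a minimal sketch of the Gaussian reparameterization in PyTorch (the names `reparameterize`, `mu`, and `log_var` are illustrative, not from any particular codebase): gradients flow through `mu` and `log_var`, while the $\mathcal{N}(0,1)$ noise is treated as a constant input.

```python
import torch

def reparameterize(mu, log_var):
    """Draw z ~ N(mu, sigma^2) as a differentiable function of mu and log_var."""
    std = torch.exp(0.5 * log_var)   # sigma = exp(log_var / 2)
    eps = torch.randn_like(std)      # eps ~ N(0, 1), no gradient flows into eps
    return mu + eps * std            # z = mu + sigma * eps

# Toy usage: gradients reach mu and log_var even though sampling is involved.
mu = torch.zeros(3, requires_grad=True)
log_var = torch.zeros(3, requires_grad=True)
z = reparameterize(mu, log_var)
z.sum().backward()
print(mu.grad, log_var.grad)
```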
The choice of re-parameterization can influence how the encoding works -- ideally, we want all components of the model to be used in encoding and decoding the data.
This article provides more details in the context of a Gaussian and Dirichlet VAE.
"Dirichlet Variational Autoencoder" by Weonyoung Joo, Wonsung Lee, Sungrae Park, and Il-Chul Moon:
This paper proposes Dirichlet Variational Autoencoder (DirVAE) using a Dirichlet prior for a continuous latent variable that exhibits the characteristic of the categorical probabilities. To infer the parameters of DirVAE, we utilize the stochastic gradient method by approximating the Gamma distribution, which is a component of the Dirichlet distribution, with the inverse Gamma CDF approximation. Additionally, we reshape the component collapsing issue by investigating two problem sources, which are decoder weight collapsing and latent value collapsing, and we show that DirVAE has no component collapsing; while Gaussian VAE exhibits the decoder weight collapsing and Stick-Breaking VAE shows the latent value collapsing. The experimental results show that 1) DirVAE models the latent representation result with the best log-likelihood compared to the baselines; and 2) DirVAE produces more interpretable latent values with no collapsing issues which the baseline models suffer from. Also, we show that the learned latent representation from the DirVAE achieves the best classification accuracy in the semi-supervised and the supervised classification tasks on MNIST, OMNIGLOT, and SVHN compared to the baseline VAEs. Finally, we demonstrated that the DirVAE augmented topic models show better performances in most cases.
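The abstract describes approximating the inverse CDF of the Gamma distribution so that Dirichlet samples can be reparameterized. Below is a minimal sketch of one common form of that idea (not necessarily the exact approximation used in the paper; the helper names `approx_gamma_icdf` and `dirichlet_reparameterize` are illustrative): uniform noise is pushed through an approximate inverse Gamma CDF, $F^{-1}(u;\alpha) \approx (u\,\alpha\,\Gamma(\alpha))^{1/\alpha}$, and the resulting gamma variates are normalized onto the simplex.

```python
import torch

def approx_gamma_icdf(u, alpha):
    """Approximate inverse CDF of Gamma(alpha, 1); reasonable for small alpha."""
    # F^{-1}(u; alpha) ~= (u * alpha * Gamma(alpha))^(1 / alpha)
    return (u * alpha * torch.exp(torch.lgamma(alpha))) ** (1.0 / alpha)

def dirichlet_reparameterize(alpha):
    """Draw an approximate Dirichlet(alpha) sample differentiably w.r.t. alpha."""
    u = torch.rand_like(alpha)                        # u ~ U(0, 1), treated as constant
    gamma = approx_gamma_icdf(u, alpha)               # approximate Gamma(alpha_k, 1) draws
    return gamma / gamma.sum(dim=-1, keepdim=True)    # normalize onto the simplex

# Toy usage: alpha would normally be produced by the encoder.
alpha = torch.tensor([0.5, 0.5, 0.5], requires_grad=True)
z = dirichlet_reparameterize(alpha)
z[0].backward()
print(z.detach(), alpha.grad)
```

Because the normalization step divides each gamma variate by their sum, gradients with respect to every component of `alpha` survive, which is what allows the Dirichlet latent variable to be trained end-to-end with back-propagation.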