Variational Autoencoder with Dirichlet distributed latent space using the Weibull Distribution

Question

My goal is to create a VAE with a Dirichlet-distributed latent space. Since the reparameterization trick does not work directly for the Dirichlet distribution, I am trying to approximate the Gamma distribution with the Weibull distribution, from which I would then generate my Dirichlet-distributed random samples.
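To make the sampling idea concrete, here is a minimal sketch using only the Python standard library. It assumes the usual construction in which independent Gamma draws are normalized to obtain a Dirichlet sample, with each Gamma draw replaced by a Weibull draw obtained through the inverse CDF. The function names and the example parameters are hypothetical and chosen only for illustration.

```python
import math
import random

def sample_weibull(k, lam, rng=random):
    """Reparameterized Weibull(k, lam) draw: lam * (-ln(1 - u))**(1/k), u ~ U(0, 1)."""
    u = rng.random()  # u lies in [0, 1), so 1 - u lies in (0, 1] and the log is finite
    return lam * (-math.log(1.0 - u)) ** (1.0 / k)

def sample_approx_dirichlet(ks, lams, rng=random):
    """Normalize independent Weibull draws that stand in for Gamma(alpha_i, 1) draws."""
    draws = [sample_weibull(k, lam, rng) for k, lam in zip(ks, lams)]
    total = sum(draws)
    return [d / total for d in draws]

# Illustrative shape/scale parameters (hypothetical values, not fitted to any Gamma).
rng = random.Random(0)
z = sample_approx_dirichlet([1.0, 1.5, 2.0], [1.0, 2.0, 3.0], rng)
```

Because the noise `u` does not depend on `k` or `lam`, the sample is a differentiable function of the parameters, which is what the reparameterization trick needs.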

To calculate the best approximation for a given distribution $\mathrm{Gamma}(\alpha,\beta)$, I minimize the KL divergence between the two distributions (see: WHAI: WEIBULL HYBRID AUTOENCODING INFERENCE FOR DEEP TOPIC MODELING):

$$ f(k, \lambda) = \mathrm{KL}(\mathrm{Weibull}(k,\lambda)||\mathrm{Gamma}(\alpha,\beta)) $$

$$ f(k, \lambda) = -[\alpha\cdot \ln(\lambda)-\frac{\gamma\cdot \alpha}{k}-\ln(k)-\beta\cdot\lambda\cdot\Gamma(1+\frac{1}{k})+\gamma+1+\alpha\cdot \ln(\beta)-\ln(\Gamma(\alpha))] $$

Here $\gamma$ denotes the Euler-Mascheroni constant.
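As a sanity check on the closed form, the expression can be evaluated directly. The sketch below is a plain transcription of $f(k,\lambda)$ using only the standard library; the function name `kl_weibull_gamma` is hypothetical. A useful test case is $\alpha = \beta = 1$, where $\mathrm{Gamma}(1,1)$ and $\mathrm{Weibull}(1,1)$ are both the unit exponential distribution, so the divergence should vanish.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant, the gamma in f

def kl_weibull_gamma(k, lam, alpha, beta=1.0):
    """Evaluate f(k, lam) = KL(Weibull(k, lam) || Gamma(alpha, beta)) term by term."""
    bracket = (
        alpha * math.log(lam)
        - EULER_GAMMA * alpha / k
        - math.log(k)
        - beta * lam * math.gamma(1.0 + 1.0 / k)
        + EULER_GAMMA
        + 1.0
        + alpha * math.log(beta)
        - math.lgamma(alpha)
    )
    return -bracket

kl_same = kl_weibull_gamma(1.0, 1.0, alpha=1.0, beta=1.0)   # identical distributions
kl_other = kl_weibull_gamma(2.0, 1.5, alpha=2.0, beta=1.0)  # a mismatched pair
```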

If I set $\beta = 1$ and differentiate $f$ with respect to $k$ and $\lambda$, I should get:

$$ \frac{\partial{f}}{\partial{k}} = -[\frac{\gamma\cdot \alpha}{k^2}-\frac{1}{k}-\lambda\cdot\Gamma'(1+\frac{1}{k})\cdot(-\frac{1}{k^2})] $$

$$ \frac{\partial{f}}{\partial{\lambda}} = -[\frac{\alpha}{\lambda}-\Gamma(1+\frac{1}{k})] $$
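One way to check these two derivatives is to compare them with central finite differences of $f$. The sketch below does this with the standard library only. It uses the identity $\Gamma'(x) = \Gamma(x)\,\psi(x)$, and since the standard library has no digamma function, `digamma` here is a small hand-written approximation (recurrence plus asymptotic series). The helper names and the test point are assumptions made for illustration.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def digamma(x):
    """psi(x) for x > 0: shift upward by recurrence, then use the asymptotic series."""
    shift = 0.0
    while x < 6.0:
        shift -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return shift + series

def f(k, lam, alpha):
    """KL(Weibull(k, lam) || Gamma(alpha, 1)), i.e. the objective with beta = 1."""
    return -(alpha * math.log(lam) - EULER_GAMMA * alpha / k - math.log(k)
             - lam * math.gamma(1.0 + 1.0 / k) + EULER_GAMMA + 1.0 - math.lgamma(alpha))

def df_dk(k, lam, alpha):
    """Partial derivative of f with respect to k, as written in the question."""
    x = 1.0 + 1.0 / k
    gamma_prime = math.gamma(x) * digamma(x)  # Gamma'(x) = Gamma(x) * psi(x)
    return -(EULER_GAMMA * alpha / k**2 - 1.0 / k - lam * gamma_prime * (-1.0 / k**2))

def df_dlam(k, lam, alpha):
    """Partial derivative of f with respect to lambda, as written in the question."""
    return -(alpha / lam - math.gamma(1.0 + 1.0 / k))

def central_diff(g, x, h=1e-6):
    """Symmetric finite-difference estimate of g'(x)."""
    return (g(x + h) - g(x - h)) / (2.0 * h)

# Arbitrary test point (hypothetical values).
k0, lam0, alpha0 = 2.0, 1.5, 2.0
err_k = abs(df_dk(k0, lam0, alpha0) - central_diff(lambda k: f(k, lam0, alpha0), k0))
err_lam = abs(df_dlam(k0, lam0, alpha0) - central_diff(lambda l: f(k0, l, alpha0), lam0))
```

Small values of `err_k` and `err_lam` at a few test points suggest the analytic expressions agree with the objective.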

To calculate the best approximation for a given distribution $Gamma(\alpha,1)$ I would now set $\frac{\partial{f}}{\partial{k}} = 0$ and $\frac{\partial{f}}{\partial{\lambda}} = 0$.
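A sketch of how the two conditions could be solved, under the following reading: $\frac{\partial f}{\partial\lambda} = 0$ gives $\lambda = \alpha / \Gamma(1+\frac{1}{k})$ directly, and substituting this into $\frac{\partial f}{\partial k} = 0$ with $\Gamma'(x) = \Gamma(x)\,\psi(x)$ appears to reduce the problem to the scalar equation $k = \alpha\,(\gamma + \psi(1+\frac{1}{k}))$. I am not aware of an elementary closed form for $k$, so the code below finds the root by bisection. The `digamma` helper is a hand-written approximation and `fit_weibull_to_gamma` is a hypothetical name.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def digamma(x):
    """psi(x) for x > 0: shift upward by recurrence, then use the asymptotic series."""
    shift = 0.0
    while x < 6.0:
        shift -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return shift + series

def fit_weibull_to_gamma(alpha, iterations=100):
    """Return (k, lam) satisfying both stationarity conditions for Gamma(alpha, 1)."""
    def h(k):
        # Residual of the reduced condition k = alpha * (gamma + psi(1 + 1/k)).
        return k - alpha * (EULER_GAMMA + digamma(1.0 + 1.0 / k))

    lo, hi = 1e-3, 1.0
    while h(hi) <= 0.0:   # grow the bracket until the residual is positive
        hi *= 2.0
    while h(lo) >= 0.0:   # shrink the lower end until the residual is negative
        lo /= 2.0
    for _ in range(iterations):  # plain bisection on the bracketed root
        mid = 0.5 * (lo + hi)
        if h(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    k = 0.5 * (lo + hi)
    # lam = alpha / Gamma(1 + 1/k), computed in log space to avoid overflow.
    lam = math.exp(math.log(alpha) - math.lgamma(1.0 + 1.0 / k))
    return k, lam

k1, lam1 = fit_weibull_to_gamma(1.0)  # Gamma(1, 1) is exponential, so expect k = lam = 1
k2, lam2 = fit_weibull_to_gamma(2.0)
```

The case $\alpha = 1$ is a convenient check, because $\mathrm{Gamma}(1,1)$ is exactly $\mathrm{Weibull}(1,1)$.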

My questions are:

Am I on the right track, and if so, can anybody help me express $k$ and $\lambda$ explicitly?

The KL part in the ELBO is, according to DIRICHLET VARIATIONAL AUTOENCODER, calculated like this:

$$ \mathrm{KL}(Q||P)=\sum \log(\Gamma(\alpha_k))-\sum \log(\Gamma(\hat{\alpha}_k))+\sum(\hat{\alpha}_k-\alpha_k)\cdot\psi(\hat{\alpha}_k) $$

What choice of prior is advisable here?
The following question is not that important for my endeavor, but clearing it up would be helpful: at each optimization step within a VAE, the posterior is updated. Shouldn't the prior then be set to the latest posterior? How can that work if I choose distinct values for the parameters of the prior? (cf. Auto-Encoding Variational Bayes, Appendix B)
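For reference, the KL term from the second question can be evaluated as in the sketch below. It assumes that $\hat{\alpha}_k$ are the posterior parameters produced by the encoder, that $\alpha_k$ are the prior parameters, and that $\psi$ is the digamma function. The `digamma` helper is a hand-written approximation, and the prior and posterior values are hypothetical examples, not a recommendation.

```python
import math

def digamma(x):
    """psi(x) for x > 0: shift upward by recurrence, then use the asymptotic series."""
    shift = 0.0
    while x < 6.0:
        shift -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return shift + series

def kl_q_p(alpha_prior, alpha_hat):
    """Sum over k of lgamma(alpha_k) - lgamma(alpha_hat_k) + (alpha_hat_k - alpha_k) * psi(alpha_hat_k)."""
    total = 0.0
    for a, a_hat in zip(alpha_prior, alpha_hat):
        total += math.lgamma(a) - math.lgamma(a_hat) + (a_hat - a) * digamma(a_hat)
    return total

prior = [1.0, 1.0, 1.0]                      # hypothetical prior parameters
kl_zero = kl_q_p(prior, [1.0, 1.0, 1.0])     # posterior equal to the prior
kl_pos = kl_q_p(prior, [2.0, 0.5, 3.0])      # posterior different from the prior
```

The term is zero when the posterior parameters match the prior and positive otherwise, which is the behavior expected of a KL penalty.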

I know that there are other methods (e.g. Implicit Reparameterization Gradients) to achieve my goal, but I want to try this method for practice.
