
I'm looking at the following implementation of a VAE: https://github.com/jmtomczak/vae_vpflows/blob/master/models/VAE.py

The KL divergence term is implemented as:

# KL
log_p_z = log_Normal_standard(z_q, dim=1)
log_q_z = log_Normal_diag(z_q, z_q_mean, z_q_logvar, dim=1)
KL = -(log_p_z - log_q_z)
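
Written out, these three lines seem to compute a single-sample Monte Carlo estimate of the KL term (this reading is my own, not something the repo states explicitly):

$$\mathrm{KL}\big(q(z \mid x)\,\|\,p(z)\big) = \mathbb{E}_{q(z \mid x)}\!\left[\log q(z \mid x) - \log p(z)\right] \approx -\big(\log p(z_q) - \log q(z_q \mid x)\big), \qquad z_q \sim q(z \mid x),$$

rather than the closed-form Gaussian KL I am used to seeing.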

z_q is a batch of samples drawn from the approximate posterior q(z|x), and z_q_mean and z_q_logvar are the predicted means and log-variances that parameterise it. log_Normal_standard and log_Normal_diag are implemented as follows:

import torch

def log_Normal_diag(x, mean, log_var, average=False, dim=None):
    # log density of N(x; mean, diag(exp(log_var))), with the constant
    # -0.5*log(2*pi) per dimension omitted
    log_normal = -0.5 * (log_var + torch.pow(x - mean, 2) / torch.exp(log_var))
    if average:
        return torch.mean(log_normal, dim)
    else:
        return torch.sum(log_normal, dim)

def log_Normal_standard(x, average=False, dim=None):
    # log density of the standard normal N(x; 0, I), again without the
    # -0.5*log(2*pi) constant
    log_normal = -0.5 * torch.pow(x, 2)
    if average:
        return torch.mean(log_normal, dim)
    else:
        return torch.sum(log_normal, dim)
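
To convince myself of what this computes, here is a minimal self-contained sketch (my own, not from the repo) that averages the same estimator over many samples and compares it against the closed-form KL between N(mu, diag(sigma^2)) and N(0, I); the posterior parameters are hypothetical values chosen only for illustration:

import torch

torch.manual_seed(0)

# hypothetical 2-dimensional posterior parameters, chosen only for illustration
mu = torch.tensor([0.5, -1.0])
log_var = torch.tensor([0.2, -0.3])
std = torch.exp(0.5 * log_var)

# draw many samples z ~ q(z|x) via the reparameterisation trick
n_samples = 100_000
z = mu + std * torch.randn(n_samples, 2)

# log densities with the same constants dropped as in the helpers above;
# the -0.5*log(2*pi) terms are common to both and cancel in the difference
log_q_z = torch.sum(-0.5 * (log_var + (z - mu) ** 2 / torch.exp(log_var)), dim=1)
log_p_z = torch.sum(-0.5 * z ** 2, dim=1)
kl_mc = (log_q_z - log_p_z).mean()

# closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over dimensions
kl_exact = 0.5 * torch.sum(torch.exp(log_var) + mu ** 2 - 1.0 - log_var)

print(f"Monte Carlo estimate: {kl_mc.item():.4f}")
print(f"Closed form:          {kl_exact.item():.4f}")

Up to sampling noise the two numbers match, which is what made me think this is a stochastic estimate of the KL rather than the analytic expression.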

I'm unfamiliar with this calculation of the KL divergence for lognormal distributions, and I can't find any supplementary material that matches this formulation.

Can anyone point me to equations that match this formulation?

  • Check out https://stats.stackexchange.com/questions/7440/kl-divergence-between-two-univariate-gaussians – Jan Kukacka Jun 26 '19 at 07:47
  • to clarify: it's not a lognormal distribution; it's the log density of a normal distribution. – shimao Jun 26 '19 at 08:02
  • @JanKukacka I've looked at the derivation you provided but it is quite different from what I see here. On further reading, the KL divergence expressed as (log_p_z - log_q_z) seems to be specified in terms of the log density ratio, which I read about here: https://tiao.io/post/density-ratio-estimation-for-kl-divergence-minimization-between-implicit-distributions/. So it isn't directly comparing distributions but their densities at z. The log_p_z density is given by the standard normal density function – Michael Anslow Jun 26 '19 at 15:19
  • $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$. I suppose we ignore the normalising term for it because this is not important in optimisation? The same is done for the generalised normal distribution, $f(x) = \frac{\beta}{2\alpha\Gamma(1/\beta)}\, e^{-(|x-\mu|/\alpha)^{\beta}}$. – Michael Anslow Jun 26 '19 at 15:27
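
For completeness, writing out the difference that the snippet computes: with $q(z \mid x) = \mathcal{N}(\mu, \operatorname{diag}(\sigma^2))$ and $p(z) = \mathcal{N}(0, I)$,

$$\log q(z \mid x) - \log p(z) = -\frac{1}{2} \sum_j \left( \log \sigma_j^2 + \frac{(z_j - \mu_j)^2}{\sigma_j^2} - z_j^2 \right),$$

because both log densities carry the same $-\frac{d}{2}\log(2\pi)$ constant, which cancels. Dropping the normalising constants therefore changes nothing, while the parameter-dependent $\log \sigma_j^2$ term must be (and is) kept in log_Normal_diag.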
