
I have been puzzled trying to convert the MSE of a VAE into a log-likelihood. Relevant questions:

What is bits per dimension (bits/dim) exactly (in pixel CNN papers)?

Why is mean squared error the cross-entropy between the empirical distribution and a Gaussian model?

Relevant Discussion: Reddit: [Discussion] Calculation of bits/dims

In the paper Masked Autoregressive Flow for Density Estimation, they provide a formula for going from "pixel space" to logit space, but I don't understand the logic behind it.

[Image: the paper's transform from pixel space to logit space]

They normalize the pixel values and then multiply by some hyperparameter that is chosen arbitrarily.

They then derive this formula:

[Image: the paper's bits-per-pixel formula, written in terms of $p(x)$ and the pixels $x_i$]

Here it is not clear whether $x_i$ is an image in the dataset/batch or a pixel of image $x$ (most likely the latter, but I'm still unsure).

Even assuming $x_i$ is a pixel value of image $x$, it is not clear what $p(x)$ is for my VAE trained on MSE.

Iordanis

1 Answer


> They normalize the pixel values and then multiply by some hyperparameter that is chosen arbitrarily.

Actually, they're scaling and shifting the pixel interval [0,255] to $[\lambda, 1-\lambda]$ and then applying the logit to it. As you know, the logit can't handle 0 or 1, since $\mathrm{logit}(x)=\ln\frac{x}{1-x}$ goes to $-\infty$ at 0 and $+\infty$ at 1, so they have to put a floor and a ceiling on its inputs. The floor $\lambda$ is arbitrary in some sense.
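As a minimal sketch of that mapping, assuming the transform has the form $x = \mathrm{logit}\left(\lambda + (1-2\lambda)\,\frac{z}{256}\right)$ with $z$ a (dequantized) pixel value in $[0, 256)$. The divisor 256 and the function names are assumptions for illustration, not taken verbatim from the paper:

```python
import numpy as np

def to_logit_space(z, lam):
    """Hypothetical helper: map pixel values z in [0, 256) to logit space.

    Assumes x = logit(lam + (1 - 2*lam) * z / 256).
    """
    z = np.asarray(z, dtype=np.float64)
    u = lam + (1.0 - 2.0 * lam) * z / 256.0  # squeezed into [lam, 1 - lam)
    return np.log(u) - np.log1p(-u)  # logit(u) = ln(u / (1 - u))

def from_logit_space(x, lam):
    """Hypothetical inverse: map logit-space values back to [0, 256)."""
    s = 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=np.float64)))  # logistic sigmoid
    return 256.0 * (s - lam) / (1.0 - 2.0 * lam)
```

Without the $\lambda$ floor, a pixel with $z = 0$ would map to $\mathrm{logit}(0) = -\infty$; with it, every input lands at a finite point, and the map is invertible.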

They also "dequantize" the pixel values by adding random noise, so that the values become, sort of, continuous.
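A minimal sketch of dequantization, assuming 8-bit integer pixels in $\{0,\dots,255\}$ and noise drawn uniformly from $[0,1)$ (the noise distribution and the function name are assumptions here). Each integer level $k$ is spread over the interval $[k, k+1)$, so a continuous density can be fitted to the result:

```python
import numpy as np

def dequantize(pixels, rng):
    """Hypothetical helper: add U[0, 1) noise to integer pixels in {0, ..., 255}.

    Each integer level k is spread over [k, k + 1), so the output lies in [0, 256).
    """
    pixels = np.asarray(pixels, dtype=np.float64)
    return pixels + rng.uniform(0.0, 1.0, size=pixels.shape)
```

By Jensen's inequality, with uniform noise like this the average continuous log-density of the dequantized data is a lower bound on the log-probability the model assigns to the original integer pixels, so bits per pixel computed this way are not optimistic.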

> Here it is not clear whether $x_i$ is an image in the dataset/batch or a pixel of image $x$ (most likely the latter, but I'm still unsure)

They denote by $x$ the set of pixels of an image with $D$ pixels, but in logit space. So $x_i$ is a single pixel in logit space. The result of the formula is the negative log-density expressed in bits per pixel, which they get from $p(x)$, the density in logit space.
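To make the change of variables concrete, here is a sketch under the assumption that pixel values $z_i \in [0,256)$ relate to logit-space values by $z_i = 256\,\frac{\sigma(x_i) - \lambda}{1 - 2\lambda}$, where $\sigma$ is the logistic sigmoid (the inverse of the logit). Then $\frac{dz_i}{dx_i} = \frac{256\,\sigma(x_i)(1-\sigma(x_i))}{1-2\lambda}$, and the density in pixel space is $p_z(z) = p(x)\prod_i \left|\frac{dz_i}{dx_i}\right|^{-1}$. Taking $-\log_2$ and dividing by $D$ gives bits per pixel:

$$b(x) = -\frac{\ln p(x)}{D \ln 2} - \log_2(1-2\lambda) + 8 + \frac{1}{D}\sum_{i=1}^{D}\left[\log_2 \sigma(x_i) + \log_2\bigl(1-\sigma(x_i)\bigr)\right]$$

where the $8$ is $\log_2 256$. The function below (name hypothetical) evaluates this for one image; in practice one would average over test images.

```python
import numpy as np

def bits_per_pixel(log_px, x, lam):
    """Hypothetical helper: bits per pixel in pixel space from a logit-space density.

    log_px : natural-log density ln p(x) of one image under the model (scalar)
    x      : that image in logit space (any shape; D = number of elements)
    lam    : the lambda used in the pixel-to-logit transform
    """
    x = np.asarray(x, dtype=np.float64).ravel()
    D = x.size
    ln2 = np.log(2.0)
    # ln sigma(x) = -ln(1 + e^{-x}) and ln(1 - sigma(x)) = -ln(1 + e^{x}),
    # computed stably with logaddexp, then converted to base 2
    log2_s = -np.logaddexp(0.0, -x) / ln2
    log2_one_minus_s = -np.logaddexp(0.0, x) / ln2
    return (-log_px / (D * ln2)
            - np.log2(1.0 - 2.0 * lam)
            + 8.0
            + np.mean(log2_s + log2_one_minus_s))
```

As a sanity check, if the model's density were uniform over pixel space $[0,256)^D$, this returns exactly 8 bits per pixel, the cost of storing raw 8-bit pixels.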

Aksakal
  • So is $p(x)$ calculated directly from the $x=$ relationship? If so, what is it exactly in practice? – Iordanis Feb 18 '20 at 16:09
  • In the paper, in this particular equation, it's the density directly from the model. Remember that they logit-transform the 8-bit (256-level) pixels, z space, into x space, which is "almost" $x\in(-\infty,\infty)$. The model is fit in this space and produces the density $p(x)$; that's what they use in this equation. – Aksakal Feb 18 '20 at 16:16
  • Sorry, it is still not clear to me where to get $p(x)$ from in the VAE context. – Iordanis Feb 18 '20 at 17:06
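On the VAE follow-up, one common reading (an assumption, not something stated in the paper) is that training with MSE corresponds to a Gaussian decoder $p(x \mid z) = \mathcal{N}(x; \mu(z), \sigma^2 I)$ with a fixed $\sigma$, because

$$-\ln p(x\mid z) = \frac{D\cdot\mathrm{MSE}}{2\sigma^2} + \frac{D}{2}\ln(2\pi\sigma^2),$$

where MSE is the mean over the $D$ dimensions of $(x-\mu(z))^2$. A VAE does not give $\ln p(x)$ exactly, but the ELBO $\mathbb{E}_{q(z\mid x)}[\ln p(x\mid z)] - \mathrm{KL}(q(z\mid x)\,\|\,p(z))$ is a lower bound on it, so using $-\mathrm{ELBO}$ in place of $-\ln p(x)$ gives an upper bound on bits per dimension. A sketch, where $\sigma$, the assumption that dequantized pixels were divided by 256 before training, and the function names are all hypothetical:

```python
import numpy as np

def gaussian_nll_from_mse(mse, D, sigma):
    """-ln p(x|z) in nats for a Gaussian decoder N(mu(z), sigma^2 I), fixed sigma.

    mse : mean over the D dimensions of (x - mu(z))^2 for one image
    """
    return D * mse / (2.0 * sigma**2) + 0.5 * D * np.log(2.0 * np.pi * sigma**2)

def vae_bits_per_dim(mse, kl, D, sigma, scale=256.0):
    """Hypothetical helper: bits/dim from a single-sample -ELBO estimate.

    kl    : KL(q(z|x) || p(z)) in nats for that image
    scale : factor the pixels were divided by before training (assumed 256);
            adds log2(scale) bits/dim to express the result on the pixel scale
    """
    # single-sample estimate of -ELBO in nats; in expectation it is >= -ln p(x)
    neg_elbo = gaussian_nll_from_mse(mse, D, sigma) + kl
    return neg_elbo / (D * np.log(2.0)) + np.log2(scale)
```

The result depends on the chosen $\sigma$, which plain MSE training leaves unspecified, and it is only a meaningful bound if the pixels were dequantized; a Gaussian density evaluated at discrete pixel values is not a probability of those values.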