I'm not an ML scientist, but I'm trying to understand how a variational autoencoder works.

I'll take as reference the following diagram, which couldn't be used for backpropagation as-is because it includes a sampling step, but it captures what I don't understand anyway. The diagram is taken from this link.

[Diagram: a VAE with inputs $x_1,\ldots,x_6$, a hidden layer $a_1,\ldots,a_4$, outputs $\mu_1,\mu_2,\sigma_1,\sigma_2$, a sampling step, and a decoder]

I'm specifically going to focus on the encoder part. My understanding is that $x_1,\ldots, x_6$ are real values (the features) and that in the second layer each of $a_1, \ldots, a_4$ is another real value.

Now we have these functions $\mu_1,\mu_2,\sigma_1,\sigma_2$. From the diagram it seems that both $\mu_1$ and $\mu_2$ compute the mean of the vector $a = (a_1,\ldots,a_4)$, but if that were the case then $\mu_1(a) = \mu_2(a)$ and I don't see the point of it. I'd make a similar observation for the $\sigma_1, \sigma_2$ functions.

The question is, w.r.t. that diagram, how exactly are $\mu_1$ and $\mu_2$ computed?

1 Answer

The point of a variational autoencoder is to have an encoder that produces a probability distribution for a given input. In this model, the latent probability distribution is 2 independent normals, equivalently a bivariate normal distribution with mean vector $\begin{bmatrix}\mu_1 \\ \mu_2 \end{bmatrix}$ and covariance matrix $\begin{bmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{bmatrix}$. Each input is mapped to its own probability distribution. Then you sample from that distribution, and the decoder reconstructs the input given that random draw from the distribution.
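
For concreteness, here is a minimal sketch in PyTorch of sampling from that per-input distribution; the values of $\mu$ and $\sigma$ below are made up, and the shift-and-scale form is the usual reparameterization trick so the sampling step stays differentiable with respect to $\mu$ and $\sigma$:

```python
import torch

# Hypothetical encoder outputs for a single input x
mu = torch.tensor([0.3, -1.2])    # [mu_1, mu_2]
sigma = torch.tensor([0.8, 0.5])  # [sigma_1, sigma_2]

# Draw z ~ N(mu, diag(sigma^2)) via the reparameterization trick:
# sample eps ~ N(0, I), then shift and scale it, so gradients can flow back to mu and sigma.
eps = torch.randn_like(mu)
z = mu + sigma * eps  # this latent sample is what the decoder reconstructs from
```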

Importantly, $\mu$ and $\sigma$ are not parameters of the network. They are the outputs of the encoder. The way that the model finds good values of $\mu$ and $\sigma$ is by updating the parameters (weights and biases) of the network.

With this in mind, it's important to recognize that $\mu_i$ doesn't compute the mean of $a$; it's the encoder's estimate of the $i$-th mean parameter of the latent distribution for that observation. Likewise, the $\sigma_i$ are estimates of the standard deviations (the diagonal of the covariance matrix) of that latent distribution. When your model learns a disentangled latent representation, each component $i$ corresponds to a different feature of that representation, so the components will not be equal in general.
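
To make that concrete, here is a minimal sketch of one common way an encoder produces $\mu$ and $\sigma$ (the layer sizes are chosen to match the diagram; this is an illustration, not necessarily how the diagram's network is parameterized): $\mu$ and $\log\sigma$ are separate linear heads on the shared hidden layer $a$, each with its own weights, so $\mu_1$ and $\mu_2$ are generally different functions of $a$.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, n_features=6, n_hidden=4, n_latent=2):
        super().__init__()
        self.hidden = nn.Linear(n_features, n_hidden)        # x -> a
        self.mu_head = nn.Linear(n_hidden, n_latent)         # a -> [mu_1, mu_2]
        self.log_sigma_head = nn.Linear(n_hidden, n_latent)  # a -> [log sigma_1, log sigma_2]

    def forward(self, x):
        a = torch.relu(self.hidden(x))
        mu = self.mu_head(a)                       # each mu_i is its own weighted sum of a
        sigma = torch.exp(self.log_sigma_head(a))  # predict log sigma so sigma stays positive
        return mu, sigma
```

Training updates the weights of `mu_head` and `log_sigma_head` (along with the rest of the network), which is exactly how the model "finds good values" of $\mu$ and $\sigma$ without those quantities being parameters themselves.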

More details and general information about VAEs are available in this thread: What are variational autoencoders and to what learning tasks are they used?

  • So wait... are these $\mu$'s and $\sigma$'s learnt as well? – user8469759 Aug 26 '19 at 19:00
  • @user8469759 $\mu$ and $\sigma$ are not parameters of the network. They are the **outputs** of the encoder. The way that the model finds good values of $\mu$ and $\sigma$ is by updating the weights and biases of the network. – Sycorax Aug 26 '19 at 19:22
  • So they're literally the output of some activation function, it's just how we interpret them then, is that right? – user8469759 Aug 27 '19 at 08:32
  • It's a matter of interpretation and use. The re-parameterization trick uses $\mu$ and $\sigma$ so that you can draw a random normal deviate and still apply back-prop to the encoder. – Sycorax Aug 27 '19 at 11:40
  • But am I correct in saying that in a VAE we want to learn a distribution such that when we sample from it using a Gaussian we get a sample from the distribution representing the data? (I'm reading through the link you provided.) – user8469759 Aug 27 '19 at 13:00
  • You can ask a new question by clicking the "Ask Question" button. – Sycorax Aug 27 '19 at 13:00
  • Which question? It's still the same question. – user8469759 Aug 30 '19 at 08:46
  • I think I got what you meant in your answer anyway. The encoder of the network represents a sampling algorithm. After we do the reparameterization trick we can do backpropagation. The $\mu$ and $\sigma$ parameters are, as you said, outputs of the decoder. We don't learn these parameters ($\mu$s and $\sigma$s), but the deep network of the encoder learns parameters that allow it to compute the $\mu$s and $\sigma$s as functions of the input $x$. Therefore the deep network defines $q_{\Phi}(z | x) = q_{\Phi}( z | \mu(x), \sigma(x))$. Is this correct? – user8469759 Aug 30 '19 at 09:38
  • Mostly. $\mu$ and $\sigma$ are the outputs of the **encoder.** The **decoder** reconstructs the input from the latent representation. If you've found my answer helpful, please consider up-voting and/or accepting it. – Sycorax Aug 30 '19 at 12:29
  • I'll accept once everything is clarified don't worry. Sorry anyway in my last comment I meant "encoder" (i.e. passing from features to latent space). With this change is my understanding correct? – user8469759 Aug 30 '19 at 13:05
  • Yes. (15 characters) – Sycorax Aug 30 '19 at 13:18