I'm not an ML scientist, but I'm trying to understand how a variational autoencoder works.

I'll take as reference the following diagram, which couldn't be used for backpropagation as-is because it includes a sampling step, but it captures what I don't understand anyway. The diagram is taken from this link.

[Diagram: a VAE with inputs $x_1,\ldots,x_6$, a hidden layer $a_1,\ldots,a_4$, outputs $\mu_1,\mu_2,\sigma_1,\sigma_2$, a sampling step, and a decoder]

I'm specifically going to focus on the encoder part. My understanding is that $x_1,\ldots, x_6$ are real values (the features) and that in the second layer each of $a_1, \ldots, a_4$ is another real value.

Now we have these functions $\mu_1,\mu_2,\sigma_1,\sigma_2$. From the diagram it seems that both $\mu_1$ and $\mu_2$ compute the mean of the vector $a = (a_1,\ldots,a_4)$, but if that were the case then $\mu_1(a) = \mu_2(a)$ and I don't see the point of it. I'd make a similar observation for the $\sigma_1, \sigma_2$ functions.

The question is, w.r.t. that diagram, how exactly are $\mu_1$ and $\mu_2$ computed?

1 Answer

The point of a variational autoencoder is to have an encoder that produces a probability distribution for a given input. In this model, the latent probability distribution is 2 independent normals, equivalently a bivariate normal distribution with mean vector $\begin{bmatrix}\mu_1 \\ \mu_2 \end{bmatrix}$ and covariance matrix $\begin{bmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{bmatrix}$. Each input is mapped to its own probability distribution. Then you sample from that distribution, and the decoder reconstructs the input given that random draw from the distribution.
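
For concreteness, here is a minimal sketch in PyTorch of sampling from that per-input distribution; the values of $\mu$ and $\sigma$ below are made up, and the shift-and-scale form is the usual reparameterization trick so the sampling step stays differentiable with respect to $\mu$ and $\sigma$:

```python
import torch

# Hypothetical encoder outputs for a single input x
mu = torch.tensor([0.3, -1.2])    # [mu_1, mu_2]
sigma = torch.tensor([0.8, 0.5])  # [sigma_1, sigma_2]

# Draw z ~ N(mu, diag(sigma^2)) via the reparameterization trick:
# sample eps ~ N(0, I), then shift and scale it, so gradients can flow back to mu and sigma.
eps = torch.randn_like(mu)
z = mu + sigma * eps  # this latent sample is what the decoder reconstructs from
```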

Importantly, $\mu$ and $\sigma$ are not parameters of the network. They are the outputs of the encoder. The way that the model finds good values of $\mu$ and $\sigma$ is by updating the parameters (weights and biases) of the network.

With this in mind, it's important to recognize that $\mu_i$ doesn't compute the mean of $a$; it's the encoder's estimate of the $i$-th mean parameter of the latent distribution for that observation. Likewise, the $\sigma_i$ are estimates of the standard deviations (the diagonal of the covariance matrix) of that latent distribution. When your model learns a disentangled latent representation, each component $i$ corresponds to a different feature of that representation, so the components will not be equal in general.
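
To make that concrete, here is a minimal sketch of one common way an encoder produces $\mu$ and $\sigma$ (the layer sizes are chosen to match the diagram; this is an illustration, not necessarily how the diagram's network is parameterized): $\mu$ and $\log\sigma$ are separate linear heads on the shared hidden layer $a$, each with its own weights, so $\mu_1$ and $\mu_2$ are generally different functions of $a$.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, n_features=6, n_hidden=4, n_latent=2):
        super().__init__()
        self.hidden = nn.Linear(n_features, n_hidden)        # x -> a
        self.mu_head = nn.Linear(n_hidden, n_latent)         # a -> [mu_1, mu_2]
        self.log_sigma_head = nn.Linear(n_hidden, n_latent)  # a -> [log sigma_1, log sigma_2]

    def forward(self, x):
        a = torch.relu(self.hidden(x))
        mu = self.mu_head(a)                       # each mu_i is its own weighted sum of a
        sigma = torch.exp(self.log_sigma_head(a))  # predict log sigma so sigma stays positive
        return mu, sigma
```

Training updates the weights of `mu_head` and `log_sigma_head` (along with the rest of the network), which is exactly how the model "finds good values" of $\mu$ and $\sigma$ without those quantities being parameters themselves.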

More details and general information about VAEs are available in this thread: What are variational autoencoders and to what learning tasks are they used?

  • So wait... are these $\mu$'s and $\sigma$'s learnt as well? – user8469759 Aug 26 '19 at 19:00
  • @user8469759 $\mu$ and $\sigma$ are not parameters of the network. They are the **outputs** of the encoder. The way that the model finds good values of $\mu$ and $\sigma$ is by updating the weights and biases of the network. – Sycorax Aug 26 '19 at 19:22
  • So they're literally the output of some activation function, it's just how we interpret them then, is that right? – user8469759 Aug 27 '19 at 08:32
  • It's a matter of interpretation and use. The re-parameterization trick uses $\mu$ and $\sigma$ so that you can draw a random normal deviate and still apply back-prop to the encoder. – Sycorax Aug 27 '19 at 11:40
  • But am I correct in saying that in a VAE we want to learn a distribution such that when we sample from it using a Gaussian we get a sample from the distribution representing the data? (I'm reading through the link you provided.) – user8469759 Aug 27 '19 at 13:00
  • You can ask a new question by clicking the "Ask Question" button. – Sycorax Aug 27 '19 at 13:00
  • Which question? It's still the same question. – user8469759 Aug 30 '19 at 08:46
  • I think I got what you meant in your answer anyway. The encoder of the network represents a sampling algorithm. After we do the reparameterization trick we can do backpropagation. The $\mu$ and $\sigma$ parameters are, as you said, outputs of the decoder. We don't learn these parameters ($\mu$s and $\sigma$s), but the deep network of the encoder learns parameters that allow it to compute the $\mu$s and $\sigma$s as functions of the input $x$. Therefore the deep network defines $q_{\Phi}(z | x) = q_{\Phi}( z | \mu(x), \sigma(x))$. Is this correct? – user8469759 Aug 30 '19 at 09:38
  • Mostly. $\mu$ and $\sigma$ are the outputs of the **encoder.** The **decoder** reconstructs the input from the latent representation. If you've found my answer helpful, please consider up-voting and/or accepting it. – Sycorax Aug 30 '19 at 12:29
  • I'll accept once everything is clarified don't worry. Sorry anyway in my last comment I meant "encoder" (i.e. passing from features to latent space). With this change is my understanding correct? – user8469759 Aug 30 '19 at 13:05
  • Yes. (15 characters) – Sycorax Aug 30 '19 at 13:18