I found a really good answer from user ajmooch on Reddit and decided to post it here in case someone else has the same misconceptions I had:
There are several things to keep in mind here.
The first thing is that the BCE objective for the Generator can more
accurately be stated as "the images output by the generator should be
assigned a high probability by the Discriminator." It's not the BCE you
might see in a binary reconstruction loss, which would be BCE(G(Z), X)
where G(Z) is a generated image and X is a real sample; it's BCE(D(G(Z)), 1),
where D(G(Z)) is the probability the discriminator assigns to
the generated image. Given a "perfect" generator whose outputs are always
photorealistic, the D(G(Z)) values should always be close to 1.
In practice there are difficulties getting this kind of
convergence (the training is somewhat inherently unstable), but that is
the goal.
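To make that concrete, here's a minimal PyTorch sketch of the generator's objective. The networks are hypothetical stand-ins (real ones would be deep conv nets), and all names and shapes here are my assumptions for illustration, not from the original post:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Hypothetical stand-ins for the generator and discriminator.
    z_dim, x_dim, batch_size = 64, 784, 16
    G = nn.Linear(z_dim, x_dim)                            # noise -> "image"
    D = nn.Sequential(nn.Linear(x_dim, 1), nn.Sigmoid())   # image -> probability

    # The generator's objective is BCE(D(G(z)), 1), *not* BCE(G(z), x):
    # the target is the discriminator's output, not a pixel-wise match.
    z = torch.randn(batch_size, z_dim)   # latent noise, independent of any data
    d_fake = D(G(z))                     # probability D assigns to the fakes
    g_loss = F.binary_cross_entropy(d_fake, torch.ones_like(d_fake))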
The second is that in the standard GAN algorithm, the latent vector
(the "random noise" which the generator receives as input and has to
turn into an image) is sampled independently of the training data. If you
were to use the MSE between the outputs of the GAN and a single image,
you might get some sort of result out, but you'd effectively be saying
"given this (random) Z, produce this specific X," and you'd be
implicitly forcing the generator to learn a nonsensical embedding of
the image. If you think of the Z vector as a high-level description of
the image, it would be like showing the generator a dog three times and
asking it to produce the same dog given three different (and uncorrelated)
descriptions of that dog. Compare this with something like a VAE, which
has an explicit inference mechanism (the encoder network of the VAE
infers Z values given an image sample) and then attempts to
reconstruct a given image using those inferred Zs. The GAN does not
attempt to reconstruct an image, so in its vanilla form it doesn't
make sense to compare its outputs to a set of samples using MSE or
MAE.
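A quick sketch of that contrast, with a hypothetical encoder E added alongside the G from above (again, all shapes and module names are made-up illustration):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    z_dim, x_dim, batch_size = 64, 784, 16
    G = nn.Linear(z_dim, x_dim)          # generator / decoder
    E = nn.Linear(x_dim, z_dim)          # encoder (only a VAE-like model has this)
    x = torch.randn(batch_size, x_dim)   # a batch of "real" samples

    # Ill-posed: z is sampled independently of x, so demanding that G(z)
    # reproduce this particular x forces a nonsensical embedding.
    z = torch.randn(batch_size, z_dim)
    bad_loss = F.mse_loss(G(z), x)

    # Well-posed (VAE-style): z is inferred *from* x by the encoder, so a
    # pixel-wise reconstruction loss between G(E(x)) and x makes sense.
    recon_loss = F.mse_loss(G(E(x)), x)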
There's been some recent work on incorporating similarity
metrics into GAN training: this OpenAI paper adds an MSE objective
between G(Z) and X in the feature space of the discriminator's final FC
layer (a la Discriminative Regularization), which seems to work really
well for semi-supervised learning (based on their insanely good SVHN
results) but doesn't really improve sample quality.
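A rough sketch of that kind of feature-space matching term. Splitting D into a feature extractor and a classifying head is my assumption for illustration, and matching batch-mean features is one common form of this idea, not necessarily the paper's exact recipe:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    z_dim, x_dim, feat_dim, batch_size = 64, 784, 128, 16
    G = nn.Linear(z_dim, x_dim)
    D_feat = nn.Linear(x_dim, feat_dim)   # everything up to the final FC layer
    D_head = nn.Sequential(nn.Linear(feat_dim, 1), nn.Sigmoid())

    x = torch.randn(batch_size, x_dim)    # real batch
    z = torch.randn(batch_size, z_dim)

    # Compare real and generated batches in the discriminator's feature
    # space (an MSE on mean features) instead of comparing raw pixels.
    feat_loss = F.mse_loss(D_feat(G(z)).mean(0), D_feat(x).mean(0))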
You can also slam VAEs and GANs together (as I have done and as
several others have done before me) and use the inference mechanism of
the VAE to provide guidance for the GAN generator, such that it makes
sense to do some pixel-wise comparisons for reconstructions.
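As a rough sketch of that hybrid setup, combining an assumed encoder E with the GAN pieces from above. The architecture and the loss weighting here are arbitrary illustration, not the exact recipe from any of those papers:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Hypothetical VAE/GAN hybrid: the encoder infers z from a real image,
    # the generator doubles as the decoder, and the discriminator still
    # provides the adversarial signal.
    z_dim, x_dim, batch_size = 64, 784, 16
    E = nn.Linear(x_dim, z_dim)                            # inference network
    G = nn.Linear(z_dim, x_dim)                            # generator / decoder
    D = nn.Sequential(nn.Linear(x_dim, 1), nn.Sigmoid())   # discriminator

    x = torch.randn(batch_size, x_dim)
    x_recon = G(E(x))    # reconstruction via inferred z: pixel-wise loss is now meaningful
    d_fake = D(x_recon)

    recon_loss = F.mse_loss(x_recon, x)   # pixel-wise term, valid thanks to E
    adv_loss = F.binary_cross_entropy(d_fake, torch.ones_like(d_fake))
    g_loss = recon_loss + 0.1 * adv_loss  # arbitrary illustrative weighting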